Tue 06 September 2016

Creating Pandas DataFrames from Lists and Dictionaries

Whenever I am doing analysis with pandas, my first goal is to get data into a pandas DataFrame using one of the many available options. In the vast majority of cases, I use read_excel, read_csv, or read_sql.

However, there are instances when I just have a few lines of data or some calculations that I want to include in my analysis. In these cases, it is helpful to know how to create DataFrames from standard python data structures such as lists or dictionaries. The basic process is not difficult, but because there are several different options, it is helpful to understand how each works. I can never remember whether I should use from_dict, from_records, from_items or the default DataFrame constructor. Normally, through some trial and error, I figure it out. Since it is still confusing to me, I thought I would walk through several examples to clarify the different approaches. At the end of the article, I briefly show how this can be useful when generating Excel reports.
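As a rough illustration of the options named above, here is a minimal sketch that builds the same small table three ways: the default DataFrame constructor with a list of row dictionaries, from_dict with a dictionary of columns, and from_records with a list of tuples. The account names and numbers are made up for illustration and are not taken from the article. from_items is left out because it was removed from later pandas releases.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration.
labels = ["account", "Jan", "Feb"]

# Option 1: a list of dictionaries, one dictionary per row.
sales_rows = [
    {"account": "Jones LLC", "Jan": 150, "Feb": 200},
    {"account": "Alpha Co", "Jan": 200, "Feb": 210},
]
df_rows = pd.DataFrame(sales_rows)

# Option 2: a dictionary of lists, one list per column.
sales_columns = {
    "account": ["Jones LLC", "Alpha Co"],
    "Jan": [150, 200],
    "Feb": [200, 210],
}
df_columns = pd.DataFrame.from_dict(sales_columns)

# Option 3: a list of tuples, with the column labels supplied separately.
sales_tuples = [("Jones LLC", 150, 200), ("Alpha Co", 200, 210)]
df_records = pd.DataFrame.from_records(sales_tuples, columns=labels)

# Select the columns explicitly so the order does not depend on the pandas version.
print(df_rows[labels])
print(df_columns[labels])
print(df_records[labels])
```

All three approaches should produce the same two-row table. Which one is most convenient mostly depends on whether the data starts out organized by row or by column.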

Mon 29 August 2016

Introduction to Data Visualization with Altair

Posted by Chris Moffitt in articles

Despite being over a year old, one of the most popular articles I have written is Overview of Python Visualization Tools. After these many months, it is still one of the most frequently searched for, linked to and read articles on this site. I think this fact speaks to the hunger in the python community for one visualization tool to rise above the rest. I am not sure I want (or need) one to “win” but I do continue to watch the changes in this space with interest.

All of the tools I mentioned in the original article are still alive and many have changed quite a bit over the past year or so. Anyone looking for a visualization tool should investigate the options and see which ones meet their needs. They all have something to offer and different use-cases will drive different solutions.

In the spirit of keeping up with the latest options in this space, I recently heard about Altair, which calls itself a “declarative statistical visualization library for Python.” One of the things that piqued my interest was that it is developed by Brian Granger and Jake Vanderplas. Brian is a core developer in the IPython project and very active in the scientific python community. Jake is also active in the scientific python community and has written a soon-to-be-released O’Reilly book called Python Data Science Handbook. Both of these individuals are extremely accomplished and knowledgeable about python and the various tools in the python scientific ecosystem. Because of their backgrounds, I was very curious to see how they approached this problem.

Tue 23 August 2016

Lessons Learned from Analyze This! Challenge

Posted by Chris Moffitt in articles

I recently had the pleasure of participating in a crowd-sourced data science competition in the Twin Cities called Analyze This! I wanted to share some of my thoughts and experiences on the process - especially how this challenge helped me learn more about how to apply data science theory and open source tools to real world problems.

I also hope this article can encourage others in the Twin Cities to participate in future events. For those of you not in the Minneapolis-St. Paul metro area, maybe this can help motivate you to start up a similar event in your area. I thoroughly enjoyed the experience and got a lot out of the process. Read on for more details.

Wed 15 June 2016

Excel “Filter and Edit” - Demonstrated in Pandas

Posted by Chris Moffitt in articles

I have heard from various people that my previous articles on common Excel tasks in pandas were useful in helping new pandas users translate Excel processes into equivalent pandas code. This article will continue that tradition by illustrating various pandas indexing examples using Excel’s Filter function as a model for understanding the process.
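To make the "filter and edit" idea concrete, here is a minimal sketch of one way it can look in pandas: a boolean mask plays the role of Excel's Filter, and .loc with that mask updates only the matching rows. The column names and values are hypothetical and are not taken from the article.

```python
import pandas as pd

# Hypothetical data, invented purely for illustration.
df = pd.DataFrame({
    "account": ["Jones LLC", "Alpha Co", "Blue Inc"],
    "state": ["MN", "WI", "MN"],
    "commission": [0.02, 0.02, 0.02],
})

# "Filter": build a boolean mask, much like choosing a value in an Excel filter.
is_mn = df["state"] == "MN"
filtered = df[is_mn]
print(filtered)

# "Edit": use .loc with the mask and a column label to change only those rows.
df.loc[is_mn, "commission"] = 0.05
print(df)
```

Assigning through .loc with both the mask and the column label modifies the original DataFrame directly, rather than a filtered copy of it.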

Mon 16 May 2016

Sharing Your Python Case Studies

Posted by Chris Moffitt in articles

I would like to offer this blog as a platform for people to share their success stories with python. Over the past couple of weeks, I have had a handful of conversations related to the topic of how to get python implemented in an organization. In these conversations, I have noticed a lot of common themes related to getting the process started and sustaining it over time.

Wed 06 April 2016

Interactive Data Analysis with Python and Excel

Posted by Chris Moffitt in articles

I have written several times about the usefulness of pandas as a data manipulation/wrangling tool and how it can be used to efficiently move data to and from Excel. There are cases, however, where you need an interactive environment for data analysis, and trying to pull that together in pure python in a user-friendly manner would be difficult. This article will discuss how to use xlwings to tie Excel, Python and pandas together to build a data analysis tool that pulls information from an external database, manipulates it and presents it to the user in a familiar spreadsheet format.

Tue 26 January 2016

Learn More About Pandas By Building and Using a Weighted Average Function

Posted by Chris Moffitt in articles

Pandas includes multiple built-in functions such as sum, mean, max, min, etc. that you can apply to a DataFrame or grouped data. However, building and using your own function is a good way to learn more about how pandas works and can increase your productivity with data wrangling and analysis.

The weighted average is a good example use case because it is an easy to understand but useful formula that is not included in pandas. I find that it can be more intuitive than a simple average when looking at certain collections of data. Building a weighted average function in pandas is relatively simple but can be incredibly useful when combined with other pandas functions such as groupby.

This article will discuss the basics of why you might choose to use a weighted average to look at your data, then walk through how to build and use this function in pandas. The basic principles shown in this article will be helpful for building more complex analysis in pandas and should also be helpful in understanding how to work with grouped data in pandas.
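As a minimal sketch of the idea, the function below computes a weighted average as the sum of value times weight divided by the sum of the weights, then applies it to the whole DataFrame and to each group via groupby. The function name, column names and numbers are hypothetical and chosen only for illustration; the article's own implementation may differ.

```python
import pandas as pd

# Hypothetical data, invented purely for illustration.
df = pd.DataFrame({
    "group": ["A", "A", "B", "B"],
    "price": [10.0, 20.0, 30.0, 50.0],
    "quantity": [1, 3, 2, 2],
})


def weighted_average(frame, value_col, weight_col):
    """Return sum(value * weight) / sum(weight) for the given columns."""
    values = frame[value_col]
    weights = frame[weight_col]
    return (values * weights).sum() / weights.sum()


# Weighted average over the whole DataFrame.
overall = weighted_average(df, "price", "quantity")
print(overall)

# Weighted average per group: select the needed columns, then apply the
# function to each group's sub-DataFrame.
by_group = df.groupby("group")[["price", "quantity"]].apply(
    weighted_average, "price", "quantity"
)
print(by_group)
```

With these numbers, the simple average price is 27.5, while the quantity-weighted average is 28.75, because the higher-priced rows carry more of the total quantity.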

Wed 30 December 2015

Getting to the “Plateau of Productivity” with Python

Posted by Chris Moffitt in articles

As we close out the year, I wanted to take a step back and write a post that will motivate people to learn python and apply it to their daily jobs. Based on some comments I’ve received (and my own personal observations), some people struggle to get started on this journey. They see the potential value of using python in their jobs but are not sure where to start and cannot find the time to take the first steps. Closely related to this challenge is finding the perseverance to make it through the inevitable barriers you will encounter. My goal in this article is to provide some items to keep in mind so that you can be successful in your endeavors to learn python and apply it to your job. If you take the time (definitely no easy task) to develop your python skills, you can reap many benefits - outside of the obvious ones you may have started out seeking.

Mon 07 December 2015

Creating Advanced Excel Workbooks with Python

Posted by Chris Moffitt in articles

I have written several articles about using python and pandas to manipulate data and create useful Excel output. In my experience, no matter how strong the python tools are, there are times when you need to rely on Excel as the vehicle to communicate your message or further analyze the data. This article will walk through some additional improvements you can make to your Excel-based output by:

Adding Excel tables with XlsxWriter
Inserting custom VBA into your Excel file
Using COM for merging multiple Excel worksheets

Mon 26 October 2015

Pandas 0.17 Release and Other Notes

Posted by Chris Moffitt in articles

As many of you know, pandas released version 0.17.0 on October 9th. In typical pandas fashion there are a bunch of updates, bug fixes and new features which I encourage you to read all about here. I do not plan to go through all of the changes but there are a couple of key things that I think will be useful to me in my daily work that I will explore briefly in this article. In addition, I am including a couple of other tips and tricks for pandas that I use on a frequent basis and hope will be useful to you.

Practical Business Python
