I would like to offer this blog as a platform for people to share their success stories with python. Over the past couple of weeks, I have had a handful of conversations about how to get python implemented in an organization. In these conversations, I have noticed a lot of common themes related to getting the process started and sustaining it over time. Some of the key items are:
- How do I figure out where to start?
- What resources help newbies vs. more experienced users?
- How do I select a good problem to tackle?
- How do I operationalize a solution and sustain it over time?
I am hopeful that the combination of real-world case studies plus the detailed articles I have done in the past will be a helpful guide for people on this journey. Please read on for more of the back story and learn how you can help.
On Saturday, April 23rd, I presented at Minnebar #11. The topic of my presentation was “Escaping Excel Hell with Python and Pandas.” For those who are interested, I have placed a copy of the slides as well as my example notebook in my GitHub repo. My presentation boiled down to a few key points:
- People find themselves in a position where they need to solve a fairly basic data wrangling task and reach for Excel as that solution.
- Excel is really not an ideal tool for the solution but it is the only one many people know.
- Frequently the Excel “solution” evolves and grows over time into an unmanageable mess.
- Python plus pandas is a really good solution to this problem.
- If someone can build a super gnarly Excel formula, they could probably learn to code in python.
- One approach to solving this problem is to show the “Excel Alpha Geek” how they can use python to solve their problems in a better way.
Overall, the feedback was positive and I think people enjoyed the presentation. There’s just one problem. When I asked the people in the room, “How many of you know about or use python?”, the overwhelming majority raised their hands. While it is always good to speak to a friendly audience, I feel like I was probably preaching to the choir. This group mostly knew about the python solution and would be able to evaluate its application to their needs. How do we reach people who only know VBA?
Through this blog, I have had the good fortune to speak to some really smart people who are interested in the same thing I am. Basically, they feel that there is a big opportunity to introduce python into organizations and help people accomplish their jobs in a more efficient way. They have all had the experience of seeing organizations struggle with fairly simple processes because they were stuck in the Excel mindset. Many of these people have then introduced python into their workplace and seen tremendous improvements in productivity.
I have had similar experiences. Here is a small example from just the other day.
I asked someone to help pull some disparate data together and summarize it. The analyst (who is plenty smart) did the following tasks:
- pulled data from 2 or 3 systems
- exported and formatted the data for Excel
- pasted it into multiple tabs on a workbook
- did a bunch of pivot tables, vlookups, manual manipulations and formulas to get the data to answer the question
I saw the results (which were what I was looking for) and then said: “Ok, thanks for doing this. How much time would it take for you to update this every week?” The pained look on his face confirmed my suspicions. It was probably several hours of work - based on the way the solution was built. Clearly time that he did not want to sign up for.
Since this was data I had familiarity with, I used the python+pandas approach and built a ~100 line script that does the same thing in a cleaner and more repeatable fashion. I probably spent as much time on the script as he did for the Excel creation. I do not say this to boast. I say this to highlight how much opportunity there is to streamline and improve day to day processes.
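To give a flavor of what that kind of script looks like, here is a minimal sketch. It is not the actual script: the two small tables, the column names and the `summarize` function are all made up for illustration, and a real version would read exports from the source systems instead of building the data inline. The idea is simply that `merge` stands in for the vlookups and `pivot_table` stands in for the Excel pivot tables.

```python
import pandas as pd

# Hypothetical stand-ins for exports pulled from two separate systems.
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "month": ["Jan", "Jan", "Feb", "Feb"],
    "amount": [100.0, 250.0, 75.0, 300.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["East", "West", "East"],
})


def summarize(orders, customers):
    """Join the two exports and total the amounts by region and month."""
    # merge() plays the role of VLOOKUP: attach a region to every order.
    combined = orders.merge(customers, on="customer_id", how="left")
    # pivot_table() plays the role of the Excel pivot table.
    return combined.pivot_table(
        index="region",
        columns="month",
        values="amount",
        aggfunc="sum",
        fill_value=0,
    )


summary = summarize(orders, customers)
print(summary)
```

Once the logic lives in a function like this, the weekly update is a matter of pointing it at fresh exports and running it again, rather than repeating the copy and paste steps by hand.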
As I mentioned above, I have spoken to several people working on products to help with the python deployment problem. During one of the conversations, someone mentioned something along the lines that working in San Francisco gives people a distorted view of what the average work place is really like. This person mentioned that almost everyone at a company like Facebook has the ability to write custom SQL queries against their massive database. Sure enough, I looked this up and found:
Facebook uses Presto for interactive queries against several internal data stores, including their 300PB data warehouse. Over 1,000 Facebook employees use Presto daily to run more than 30,000 queries that in total scan over a petabyte each per day.
I don’t know about you but I certainly don’t work in an area where people write queries against petabytes of data!
I was talking to someone that had recently moved to a new position at a local government entity. She is a savvy user but not a developer. Our exchange went something like this (names and acronyms changed to protect the innocent):
Me: “What are you working on in your new job?”
Amy: “I am helping them upgrade their system to Excel and Access.”
Me: “Uhh. Upgrading to Excel and Access. What in the world are they using now?”
Amy: “I don’t know. Some kind of green screen thing named BINGO.”
Amy: “Yeah, they hope to have it replaced by mid-2017.”
Me: “Oh. Ok…”
My point with these anecdotes is that there is a huge disconnect between the extreme of a highly technical company like Facebook and the rest of the world just trying to do their job. It’s a huge chasm and you cannot assume that a multi-petabyte database solution is going to work for someone trying to migrate away from a terminal solution or a heavily Excel-driven mindset.
Get To The Point
As I was thinking about these various observations, I wanted to try to draw out some common threads. I strongly believe that python is a great tool to help with these types of organizational problems but there are challenges:
- How do we let people know that python would be a good solution?
- Assuming they buy in to python, how do they get started?
- How do you simply and efficiently deploy python-based solutions?
Regarding point #3, Wes McKinney wrote a good article about the challenges and the python community’s opportunity to fix this. The community has made progress. It is still a challenge but I’m hopeful people will take up Wes’ call to action.
I want to focus on points #1 and #2. I don’t know that I can build a technical solution but I think there may be an opportunity to share best practices with others and raise awareness of python and how it could be used to help people solve their day to day challenges.
A couple of weeks ago, this thread on reddit was extremely active and illustrated the interest people had in learning about real world examples of how python helped them solve a problem. There were lots of really good ideas and lots of interest in learning more.
What I would like to do is offer to help people post their solutions as case studies on this blog. The main goals would be:
- Show concrete examples of how python helped solve a real world business problem.
- The issue could be as big or as small as you’d like but I would lean towards solutions built by individuals or very small teams - not a massive project.
- You can share as much or as little as you’d like.
- Posting here would provide a level of anonymity (if desired). I think people are hesitant to talk about their work solutions for fear that someone will come after them.
- The technical solution is probably not as interesting as explaining universal challenges like:
  - Organizational buy-in and change management
  - What went well, what didn’t
  - What would you do differently?
The true value might not be in the actual sharing of code but in the ideas and processes used to solve a problem and make it scalable. In many situations, the challenges are not technical in nature.
I think there is a real need to spread this information out in a format that is less threatening to a non-programmer. If we could get some good case studies out there it might spark some ideas and help people understand how to tackle their own problems.
If you are interested in sharing your experiences, let me know. I would be more than willing to work with you to put together as much or as little detail as you would like in order to get the word out there. This can be a small but meaningful way that you could give back to the community.
So, what do you think? Put your thoughts in the comments and reach out to me if you have any great ideas.