This page contains a collection of books and other content that I have found useful and hope you will as well. Any links that point to Amazon are affiliate links, which means this site will receive a small referral commission for any purchases made through them. The rest of the content is freely available and very useful.
Python for Data Analysis by Wes McKinney is the definitive reference for pandas. In October 2017, Wes published the 2nd edition, which includes many updates for the latest versions of pandas. In addition to the excellent pandas content, Wes includes introductory examples of statsmodels, patsy and scikit-learn. Anyone who does any level of work in pandas would benefit from having this book on their bookshelf. All the material is available at the pydata-book GitHub repository, but I encourage you to purchase the book if you are able.
The Python Data Science Handbook by Jake VanderPlas is an excellent overview of the core elements of the Python data science toolkit. This book is like 5 books in one, with great coverage of IPython, NumPy, pandas, matplotlib and machine learning with scikit-learn. I highly recommend this book to anyone who has basic Python experience and is planning to work with any of the tools it covers. All of the content has generously been made available as notebooks, so you can review it before purchasing the book.
Data Science for Business by Foster Provost and Tom Fawcett is a very useful book for thinking about how to use data science to solve business problems. The book does not cover any specific language and is light on math, but it is very heavy on the fundamental concepts of data science and how to implement them in real life. This is a useful resource when it comes to figuring out how to apply technology in a complicated business setting.
Effective Pandas by Tom Augspurger is a short book that collects several of his blog posts. It does a fabulous job of describing idiomatic pandas code. This book is best for someone who has a basic understanding of Python and some exposure to pandas. I continually come back to the content to find new and more efficient ways to use pandas. All of the content is available on GitHub, but please consider purchasing the book if you find it useful.
A Whirlwind Tour of Python by Jake VanderPlas is a quick but insightful introduction to Python that is available for free. It focuses on basic and essential Python syntax and hopes “readers will walk away with a solid foundation from which to explore the data science stack.” If you have limited experience working with Python, this is a good place to get started.
Storytelling With Data by Cole Nussbaumer Knaflic is an essential guide for anyone who is trying to communicate effectively with data. The majority of the recommendations on this page are technical in nature, but this book is the exception. It does not talk about how to use Python (or any tool) to build a visualization. However, it offers extremely practical advice on how to present information to your audience in the way that will be most effective in getting your point across. I think this is a must-have book for anyone who performs data analysis and uses those results to try to drive change in an organization.
The Big Book of Dashboards by Steve Wexler, Jeffrey Shaffer and Andy Cotgreave is a well-thought-out walkthrough of 28 different real-world dashboards and provides inspiration for creating your own. This book is a good companion to Storytelling With Data because it does not focus on technology but instead on what makes a visualization compelling and actionable.
I am a sucker for cheat sheets. I find them useful once I have the basic syntax of a package down and just need a quick refresher. Here are a few of the ones I refer back to frequently.
The official pandas cheat sheet is a nice summary of data wrangling functions in pandas. It does not cover everything pandas can do but it is a good reminder of the core concepts.
Over at the Mark Graph blog, there is a really detailed 12-page pandas DataFrame cheat sheet that is worth checking out. It goes into a lot more detail than the official pandas cheat sheet but is most useful to someone who has basic familiarity with pandas.
The Mark Graph blog also has a nice matplotlib cheat sheet that’s worth adding to your library.