Reading HTML tables with Pandas
Posted by Chris Moffitt in articles
The pandas read_html() function is a quick and convenient way to turn an HTML
table into a pandas DataFrame. This function can be useful for quickly incorporating tables
from various websites without figuring out how to scrape the site’s HTML.
However, there can be some challenges in cleaning and formatting the data before analyzing
it. In this article, I will discuss how to use pandas read_html() to read and
clean several Wikipedia HTML tables so that you can use them for further numeric analysis.
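
As a quick preview of the workflow, here is a minimal sketch of the read-then-clean pattern. It does not use a real Wikipedia page: the inline HTML table, the country names, and the footnote marker are made up for illustration so the example runs without a network connection. It also assumes an HTML parser that pandas can use (such as lxml, or BeautifulSoup with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for a table scraped from a web page.
# The "[1]" footnote marker mimics the residue often left in Wikipedia tables.
html = """
<table>
  <tr><th>Country</th><th>Population</th></tr>
  <tr><td>Aland</td><td>1,234[1]</td></tr>
  <tr><td>Borduria</td><td>5,678</td></tr>
</table>
"""

# read_html() returns a list of DataFrames, one per <table> it finds.
tables = pd.read_html(StringIO(html))
df = tables[0]

# The footnote keeps the column from being parsed as numeric, so strip
# the marker and any thousands separators, then convert explicitly.
df["Population"] = pd.to_numeric(
    df["Population"]
    .astype(str)
    .str.replace(r"\[\d+\]", "", regex=True)
    .str.replace(",", "", regex=False)
)

print(df.dtypes)
```

For a live page you would typically pass the URL to read_html() instead of a StringIO object, then pick the table you need out of the returned list.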