Pandas Functions for Fast Analysis
1. read_html()
Web scraping is one of the things that brings many people to Python, yet lots of people don’t know that Pandas has a web scraping function of its own. With read_html(), all you have to do is pass in a URL, and Pandas parses the HTML tables on that page and returns them as a list of data frames.
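As a minimal sketch of how this looks in practice, the example below parses a small inline HTML snippet instead of a live page so that it runs offline. The table contents and variable names are made up for illustration, and read_html() needs an HTML parser such as lxml installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML standing in for a downloaded web page.
html = """
<table>
  <thead>
    <tr><th>city</th><th>population</th></tr>
  </thead>
  <tbody>
    <tr><td>Oslo</td><td>700000</td></tr>
    <tr><td>Bergen</td><td>290000</td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(html))
cities = tables[0]
print(cities)
```

A URL string can be passed to read_html() in place of the StringIO object. Because the result is always a list, you pick the table you want by index.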
The corr() function returns a correlation matrix for each pair of numeric columns in a data frame.
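Here is a small sketch using a made-up data frame. The column names are hypothetical, and the non-numeric column is filtered out first with select_dtypes() so the call behaves the same across Pandas versions.

```python
import pandas as pd

# Hypothetical data: weight rises in lockstep with height.
df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180],
    "weight_kg": [50, 60, 70, 80],
    "name": ["a", "b", "c", "d"],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
corr_matrix = df.select_dtypes("number").corr()
print(corr_matrix)
```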
Getting rid of duplicate values is an important step in data analysis. There are lots of convoluted ways to do it, but Pandas has a super simple function you can use: drop_duplicates().
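A quick sketch of removing duplicates with drop_duplicates(), first across whole rows and then based on a single column. The data and column names are invented for the example.

```python
import pandas as pd

# Hypothetical data with one fully repeated row (id 2, Bergen).
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "city": ["Oslo", "Bergen", "Bergen", "Oslo"],
})

# Drop rows that are identical in every column.
deduped = df.drop_duplicates()

# Drop rows that repeat a value in "city", keeping the first occurrence.
one_per_city = df.drop_duplicates(subset=["city"])

print(deduped)
print(one_per_city)
```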
Histograms are an important part of exploratory data analysis. Many people import Matplotlib or Seaborn for their exploratory visualizations, but Pandas actually has simple functions to create histograms and other graphs. The hist() function returns a histogram for each numeric column.
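A minimal sketch of the histogram call, assuming Matplotlib is installed (Pandas uses it for plotting behind the scenes). The data is made up, and the non-interactive "Agg" backend is chosen only so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; assumption for headless runs

import pandas as pd

# Hypothetical numeric data.
df = pd.DataFrame({
    "age": [22, 25, 31, 35, 41, 48, 52, 60],
    "income": [28, 32, 40, 45, 51, 58, 61, 70],
})

# One histogram per numeric column; returns an array of Matplotlib axes.
axes = df.hist(bins=5)
```

In a notebook the plots render on their own. In a script, calling matplotlib.pyplot.show() or savefig() afterwards displays or saves them.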
Handling date values in Pandas can be an annoyance. The to_datetime() function helps because it accepts many kinds of input (strings, numbers, lists, or whole columns) and converts the values to datetime. Once converted, you know you are working with the correct data type!
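The sketch below converts a column of date strings and shows one way to handle values that cannot be parsed. The sample strings are invented for illustration.

```python
import pandas as pd

# Hypothetical column of dates stored as plain strings.
raw = pd.Series(["2021-01-15", "2021-02-20", "2021-03-05"])
dates = pd.to_datetime(raw)
print(dates.dtype)

# With errors="coerce", unparseable values become NaT instead of raising.
messy = pd.Series(["2021-01-15", "not a date"])
cleaned = pd.to_datetime(messy, errors="coerce")
print(cleaned)
```

Once a column is datetime, the .dt accessor (for example dates.dt.month) gives access to its components.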
dtypes is an attribute of a data frame rather than a function you call. It returns the data type of each column, which makes it a great first step for exploratory data analysis. It can be hard to tell an integer column from a float column just by looking at the values, for example, and dtypes gives you that info and more.
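A short sketch of inspecting column types on a made-up data frame. Note that dtypes is accessed without parentheses.

```python
import pandas as pd

# Hypothetical data with an integer, a float, and a text column.
df = pd.DataFrame({
    "count": [1, 2, 3],
    "price": [1.5, 2.0, 3.25],
    "label": ["a", "b", "c"],
})

# dtypes is an attribute: a Series mapping column name to data type.
column_types = df.dtypes
print(column_types)
```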
Like dtypes, shape is an attribute rather than a function. It returns the shape (size) of your data frame as a (rows, columns) tuple, so you know how many rows and columns you have. The size of your data frame can have an impact on how you choose to analyze it, or on whether you want to start by filtering out some data.
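For example, with a small hypothetical data frame, the tuple can be unpacked directly into a row count and a column count:

```python
import pandas as pd

# Hypothetical data: 3 rows, 2 columns.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "city": ["Oslo", "Bergen", "Trondheim"],
})

# shape is a (rows, columns) tuple; no parentheses needed.
n_rows, n_cols = df.shape
print(n_rows, n_cols)
```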
Pandas has a simple function for getting rid of null values: dropna(). It can be configured in a few ways. Called with no arguments, it drops any row that has a missing value; called with the subset argument, it only drops rows where the named columns have missing values.
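Both configurations of dropna() can be sketched on a small made-up data frame. The column names are hypothetical.

```python
import pandas as pd

# Hypothetical data with gaps in "age" and "city".
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "age": [30, None, 25],
    "city": ["Oslo", "Bergen", None],
})

# Drop any row that has a missing value in any column.
complete_rows = df.dropna()

# Drop only the rows where "age" is missing.
age_known = df.dropna(subset=["age"])

print(complete_rows)
print(age_known)
```

Both calls return a new data frame and leave the original untouched.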
String data can require a lot of extra steps to clean and prepare for analysis. Pandas has many great functions to make this faster. One is str.strip(), which removes leading and trailing whitespace from every value in a string column.
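A minimal sketch of stripping whitespace from a column of strings, with sample values invented for the example:

```python
import pandas as pd

# Hypothetical column with stray leading and trailing spaces.
cities = pd.Series(["  Oslo", "Bergen  ", " Trondheim "])

# .str.strip() removes whitespace at both ends of each value.
clean_cities = cities.str.strip()
print(clean_cities.tolist())
```

Spaces inside a value are left alone; only the ends are trimmed.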
Similar to the histogram function mentioned above, there is also a simple Pandas function for making a box plot: boxplot(). A box plot is a great way to understand the median, quartiles, and outliers of your numeric data.
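As with histograms, here is a short sketch that assumes Matplotlib is installed. The data is made up, and the "Agg" backend is used only so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; assumption for headless runs

import pandas as pd

# Hypothetical numeric data on a similar scale.
df = pd.DataFrame({
    "math_score": [55, 62, 70, 74, 81, 88, 95],
    "reading_score": [48, 60, 66, 72, 79, 85, 99],
})

# Draw one box per listed column on a shared axis.
ax = df.boxplot(column=["math_score", "reading_score"])
```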
I hope these functions help you with your Python data work :)