The Hidden Pandas Functions You Should Know

Web Scraping

pd.read_html("URL")

Web scraping is a common reason to use Python. Pandas has a powerful yet little-known function that takes a URL, parses the HTML tables on that page, and returns them as a list of DataFrames, so you can start working with the tabular data right away. It needs an HTML parser such as lxml installed. Here is the full documentation.
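
To try this without hitting a live site, here is a minimal sketch that feeds read_html a small HTML snippet instead of a URL. The table contents and the variable names are made up for illustration, and the try/except only covers the case where no HTML parser is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical page content; in practice you would pass a URL string instead.
html = """
<table>
  <tr><th>city</th><th>population</th></tr>
  <tr><td>Oslo</td><td>700000</td></tr>
  <tr><td>Bergen</td><td>290000</td></tr>
</table>
"""

try:
    # read_html returns a list with one DataFrame per <table> it finds.
    tables = pd.read_html(StringIO(html))
except ImportError:
    # No HTML parser available (lxml, or beautifulsoup4 plus html5lib).
    tables = []

if tables:
    df = tables[0]  # pick the table you want by position
    print(df.head())
```

Because the result is a list, a page with several tables usually needs an index (or the match argument) to pick out the one you care about.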

Find all the Null Values in a Dataframe

df.isnull().sum()

This chains .isnull() and .sum() to return a Series that lists every column in the data frame alongside its count of null values. Finding null values is an important part of EDA and data cleaning.
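
As a quick illustration, here is a small sketch on a made-up data frame (the column names and values are assumptions, not data from this article):

```python
import numpy as np
import pandas as pd

# Toy data with a few missing entries.
df = pd.DataFrame(
    {
        "name": ["Ann", None, "Cy"],
        "age": [31, np.nan, np.nan],
        "city": ["Oslo", "Rome", "Lima"],
    }
)

# isnull() marks each cell True/False; sum() then counts the True values per column.
null_counts = df.isnull().sum()
print(null_counts)  # name: 1, age: 2, city: 0
```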

Export your Dataframe to an Excel File

df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')

Moving data back and forth between Excel and Python can be tricky, but many data scientists deal with this workflow frequently. This function writes your dataframe to an Excel file, creating the file or overwriting one that already exists at that path. All you need to do is specify the file path and the sheet name as arguments. Writing .xlsx files requires an Excel engine such as openpyxl to be installed. Here is the full Pandas to Excel documentation.
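
Here is a small round-trip sketch: write a data frame to a temporary .xlsx file and read it back. The data and the temporary path are made up for illustration, and the try/except only handles the case where no Excel engine (for example openpyxl) is installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"product": ["A", "B"], "units": [3, 5]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "myDataFrame.xlsx"
    try:
        # index=False keeps the row index out of the spreadsheet.
        df.to_excel(path, sheet_name="Sheet1", index=False)
        round_trip = pd.read_excel(path, sheet_name="Sheet1")
    except ImportError:
        # No Excel engine available to write or read .xlsx files.
        round_trip = None

print(round_trip)
```

Leaving out index=False adds an extra unnamed column holding the row index, which is a common surprise when the file is opened in Excel.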

Find Duplicates

print(df.duplicated().sum())

There are multiple ways to find duplicate rows in your dataset. The function above is the easiest: it flags every row that exactly repeats an earlier row and prints how many there are. If it prints “0”, there are no duplicates and you are good to go!
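
A short sketch on made-up data shows what gets counted, plus drop_duplicates() as one way to act on the result (the column names here are assumptions):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 2, 2, 3, 3, 3],
        "value": ["a", "b", "b", "c", "c", "c"],
    }
)

# duplicated() is False for the first occurrence of a row and True for each repeat.
n_duplicates = df.duplicated().sum()
print(n_duplicates)  # 3: one repeat of (2, "b") and two repeats of (3, "c")

# Keep only the first occurrence of each row.
deduped = df.drop_duplicates()
print(len(deduped))  # 3
```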

Correlation Matrix

df.corr(numeric_only=True)

Understanding the correlations between your numerical columns is a great first step in deciding what type of analysis you want to apply to the data. Pandas has a function that you can tack on to any dataframe to produce a correlation matrix (Pearson by default). Since pandas 2.0 the numeric columns are no longer selected automatically, so pass numeric_only=True to skip text columns instead of getting an error.
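
Here is a minimal sketch with made-up columns, assuming pandas 1.5 or newer (where the numeric_only argument is available on corr):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "hours": [1, 2, 3, 4],
        "score": [2, 4, 6, 8],   # rises in step with hours
        "errors": [8, 6, 4, 2],  # falls in step with hours
        "label": ["a", "b", "c", "d"],  # text column, left out of the matrix
    }
)

# numeric_only=True restricts the matrix to the numeric columns.
corr = df.corr(numeric_only=True)
print(corr)
```

In this toy data, hours and score are perfectly positively correlated and hours and errors are perfectly negatively correlated, so the matrix is easy to sanity-check by eye.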

Fill Null Values with Zeros

df.replace(np.nan, 0, inplace=True)

This function takes your entire data frame and fills the null values with zeros, or whatever value you put in the second argument (np here is NumPy, imported with import numpy as np). Use the number 0 rather than the string "0" so that numeric columns stay numeric. It is a quick way to get rid of null values and avoid errors and dead ends later in your analysis, though keep in mind that the zeros will then count toward statistics such as means. If you are not sure whether null values will affect your analysis, I advise you to either fill them or delete the rows that contain them.
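
Both options, filling and deleting, can be sketched on a made-up data frame. This example uses fillna(0), a closely related pandas method swapped in for the replace call, and dropna() for the deletion route:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "units": [1.0, np.nan, 3.0],
        "price": [np.nan, 2.5, 4.0],
    }
)

# Option 1: fill every null with a numeric zero (returns a new data frame).
filled = df.fillna(0)

# Option 2: drop every row that contains at least one null.
dropped = df.dropna()

print(filled)
print(dropped)  # only the last row has no missing values
```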

Remove White Spaces from Text

String data can require a lot of extra steps to clean and prepare for analysis. Pandas has many great functions to make this faster. One is the strip function, available on text columns as .str.strip(). It removes unwanted leading and trailing whitespace from each value in a string column.
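
A minimal sketch, using a made-up name column:

```python
import pandas as pd

df = pd.DataFrame({"name": ["  Ann", "Bob  ", " Cy "], "age": [31, 45, 27]})

# .str.strip() trims whitespace at both ends of each string in the column.
df["name"] = df["name"].str.strip()

print(df["name"].tolist())  # ['Ann', 'Bob', 'Cy']
```

Spaces inside a value are left alone, so "New  York" would keep its double space; .str.lstrip() and .str.rstrip() trim only one side.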

Jake from Mito

Exploring the future of Python and Spreadsheets