9 Pandas Functions You Need to Know

Jake from Mito
3 min read · Mar 10, 2022

I will leave a rambling intro for the end of this article — I imagine you are here to learn important info. Here are 9 functions that I recommend you learn, with all the info you need to get started and no filler :)

Find Unique Values in a Column

df["column_name"].unique()

This function lets you quickly see which distinct values a column contains. It does not tell you how often each value occurs; for the distribution of the data, use df["column_name"].value_counts().
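Here is a minimal sketch of the difference between listing distinct values and counting them. The DataFrame and its "city" column are made up for illustration.

```python
import pandas as pd

# Hypothetical example data; the column name "city" is invented for this sketch.
df = pd.DataFrame({"city": ["Paris", "Lima", "Paris", "Oslo", "Paris", "Lima"]})

# unique() returns the distinct values in order of first appearance.
distinct = df["city"].unique()

# value_counts() returns how often each value occurs, most frequent first.
counts = df["city"].value_counts()

print(distinct)
print(counts)
```

If you only need the number of distinct values, df["city"].nunique() returns it directly.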

Web Scraping

pd.read_html("URL")

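Web scraping is a key reason that someone might use Python. Pandas has a powerful yet little-known function that takes a URL and returns every HTML table found at that address as a list of DataFrames, so you can start handling the tabular data right away. It needs an HTML parser such as lxml installed. See the pandas documentation for read_html for the full details.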
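To try this without a network connection, you can hand read_html a string of HTML instead of a URL. The sketch below assumes an HTML parser (lxml, or bs4 with html5lib) is installed; the sample table and the helper name tables_from_html are hypothetical.

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>name</th><th>score</th></tr>
  <tr><td>Ada</td><td>90</td></tr>
  <tr><td>Linus</td><td>85</td></tr>
</table>
"""


def tables_from_html(html_text):
    """Return every <table> in html_text as a list of DataFrames."""
    # read_html always returns a list, even when the page has a single table.
    return pd.read_html(StringIO(html_text))
```

Calling tables_from_html(SAMPLE_HTML)[0] should give you the first (and here only) table as a DataFrame. With a real page, pass the URL string to pd.read_html and pick the table you want by its position in the list.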

Correlation Matrix

df.corr()

Understanding the correlations between your numerical columns is a great first step in deciding what type of analysis you want to apply to the data. You can tack this function on to any dataframe to get a correlation matrix (Pearson by default) for its numeric columns. In pandas 2.0 and later, non-numeric columns are no longer skipped automatically, so pass numeric_only=True if your dataframe also contains text columns.
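A small sketch of a correlation matrix on a mixed dataframe. It assumes pandas 1.5 or newer (where the numeric_only argument exists), and the column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: two numeric columns and one text column.
df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180],
    "weight_kg": [50, 60, 70, 80],
    "name": ["a", "b", "c", "d"],
})

# numeric_only=True restricts the calculation to numeric columns,
# so the text column "name" is left out of the matrix.
corr = df.corr(numeric_only=True)

print(corr)
```

The result is a square DataFrame indexed by column name, so corr.loc["height_cm", "weight_kg"] looks up a single pair.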

Replace Null Values with Zeros (for example)

df.replace(np.nan, 0, inplace=True)

Getting rid of null values can be a key aspect of data cleaning and data analysis, though not all analyses require this step. There are many ways of going about this, but this simple one-line call takes all the null values and replaces them with zeros (np here is NumPy, imported with import numpy as np). You can switch out the 0 for anything else. Keep in mind that a quoted value such as "0" inserts a text string rather than a number, which turns a numeric column into a mixed one.
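As one of the other ways of going about this, fillna does the same job and also accepts a different fill value per column. This is a sketch with made-up data; the column names "price" and "qty" are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({"price": [1.5, np.nan, 3.0], "qty": [np.nan, 2.0, 4.0]})

# Replace every null with the number 0 (returns a new DataFrame).
zeros = df.fillna(0)

# Use a different fill value per column: the column mean for "price", 0 for "qty".
custom = df.fillna({"price": df["price"].mean(), "qty": 0})

print(zeros)
print(custom)
```

Both calls return a new DataFrame and leave df unchanged, which makes it easy to compare the result with the original.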

Export your DataFrame to an Excel File

df.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1')

Understanding how to go back and forth between Excel and Python can be tricky, but many data scientists will find themselves grappling with this workflow frequently. This function writes your dataframe to an Excel file, creating the file if it does not exist and overwriting it if it does. All you need to do is specify the file path and the sheet name as arguments. Writing .xlsx files requires an Excel engine such as openpyxl to be installed. See the pandas documentation for DataFrame.to_excel for the full list of options.
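To show both directions of the Excel workflow, here is a sketch that writes a dataframe to a temporary .xlsx file and reads it back. It assumes openpyxl is installed; the helper name excel_round_trip is hypothetical.

```python
import os
import tempfile

import pandas as pd


def excel_round_trip(df, sheet_name="Sheet1"):
    """Write df to a temporary .xlsx file and read it back as a new DataFrame."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "myDataFrame.xlsx")
        # index=False keeps the row index out of the spreadsheet.
        df.to_excel(path, sheet_name=sheet_name, index=False)
        # read_excel is the counterpart for loading a sheet into pandas.
        return pd.read_excel(path, sheet_name=sheet_name)
```

Calling excel_round_trip(df) on a small dataframe should hand back the same rows and columns. Leaving out index=False adds the index as an extra first column in the sheet.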

Retrieve Dataframe Information

Return the number of rows and columns in your dataframe (shape is an attribute, so it takes no parentheses):

df.shape

Get summary statistics about your dataframe:

df.describe()
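A quick sketch of both calls on a small, made-up dataframe. Note that describe() summarizes only the numeric columns by default.

```python
import pandas as pd

# Hypothetical data: one numeric column and one text column.
df = pd.DataFrame({"age": [20, 30, 40], "city": ["Paris", "Lima", "Oslo"]})

# shape is a (rows, columns) tuple, so it can be unpacked directly.
n_rows, n_cols = df.shape

# describe() returns count, mean, std, min, quartiles and max per numeric column.
summary = df.describe()

print(n_rows, n_cols)
print(summary)
```

To include text columns in the summary as well, you can call df.describe(include="all").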

Select Specific Data Types in your dataset

df.select_dtypes(include='int64')

With this function, you can select the columns in your DataFrame that contain a specific data type.
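Besides an exact dtype such as 'int64', select_dtypes also accepts the broader "number" selector and an exclude argument. The sketch below uses invented column names.

```python
import pandas as pd

# Hypothetical data with an integer, a float and a text column.
df = pd.DataFrame({
    "count": pd.Series([1, 2, 3], dtype="int64"),
    "ratio": [0.1, 0.2, 0.3],
    "name": ["a", "b", "c"],
})

# Only the columns whose dtype is exactly int64.
ints = df.select_dtypes(include="int64")

# All numeric columns, integer and float alike.
numeric = df.select_dtypes(include="number")

# Everything that is not numeric.
non_numeric = df.select_dtypes(exclude="number")

print(list(ints.columns), list(numeric.columns), list(non_numeric.columns))
```

Each call returns a new DataFrame containing only the matching columns, so you can chain it with other functions, for example df.select_dtypes(include="number").describe().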

Take a random sample of a dataset

df.sample(n = 400)
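A short sketch of sampling by row count and by fraction. The random_state value is an arbitrary choice that makes the sample reproducible, and the data is made up.

```python
import pandas as pd

# Hypothetical dataset with 1,000 rows.
df = pd.DataFrame({"value": range(1000)})

# Draw 400 rows without replacement; random_state fixes the random seed.
sample_n = df.sample(n=400, random_state=42)

# Alternatively, draw a fraction of the rows (10% here).
sample_frac = df.sample(frac=0.1, random_state=42)

print(len(sample_n), len(sample_frac))
```

Without replacement, n cannot exceed the number of rows in the dataframe; pass replace=True if you need a larger sample.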

Of course, there is an endless number of functions you can learn. These are just a selection that has helped me in my data science work. Please let me know in the comments if there are others that you recommend.

I hope these functions are helpful :)
