Everything you need to know about joins in Python

Jake from Mito
2 min readMar 9, 2023

--

Merging dataframes allows you to combine multiple data sources into one. In the data world, enriching data often includes merging two datasets together. These datasets can be similar or different. Pandas is a great way to do this in a few lines of code. Below you’ll find easy “copy and paste -able” code to merge dataframes in seconds.

The code you probably want

Merge two datasets with an lookup join (similar to a VLOOKUP in Excel):

import pandas as pd
fav_color_df = pd.DataFrame({
"name": ["Jeff", "Bill", "Peter"],
"favorite_color": ["blue", "red", "yellow"]
})
fav_food_df = pd.DataFrame({
"name": ["Peter", "Jeff", "Bill", "Maggie"],
"favorite_food": ["apple", "banana", "orange", "tangerine"]
})
# First, drop duplicates so that we get the first entry for the lookup dataframe
fav_food_df.drop_duplicates(subset=["name"], inplace=True)
# Then, do the lookup merge
merged_df = fav_color_df.merge(fav_food_df, left_on=['name'], right_on=['name'], how='left')

Other Common Use Cases

Merge with an inner join.

An inner join will only keep the entries where the key exists in both datasets.

merged_df = fav_color_df.merge(fav_food_df, left_on=['name'], right_on=['name'], how='inner')

Merge with a left join

A left join will only keep the entries where the key exists in the first dataframe.

merged_df = fav_color_df.merge(fav_food_df, left_on=['name'], right_on=['name'], how='left')

Merge with a right join

A left join will only keep the entries where the key exists in the second dataframe.

merged_df = fav_color_df.merge(fav_food_df, left_on=['name'], right_on=['name'], how='right')

Merge with an outer join

Includes all rows from both dataframes.

merged_df = fav_color_df.merge(fav_food_df, left_on=['name'], right_on=['name'], how='outer')

Merge with an inner join

Includes rows that are present in both dataframes.

merged_df = fav_color_df.merge(fav_food_df, left_on=['name'], right_on=['name'], how='inner')

What is this blog post

At Mito, we’ve spent years working with users to automate their data work in Python. As such, we’ve got a great sense of what the most common use cases are for common pandas functions.

This blog post is part of a series of tutorials — designed to be the best pandas documentation for you most of the time. There’s no BS marketing language at the top of this blog post, and we rank (easy to copy!) code snippets depending on how likely you are to be encountering them.

If you’re looking to save time writing Python code, we recommend checking out Mito — we built it with the same set of learnings that helped us build the best pandas tutorials ever!

With Mito, you can add new columns and write formulas just like you would in Excel or Google sheets — and we write the Python code for you. See our documentation here.

You can find more blogs like this here.

--

--