4 Tools To Get into Data Science

Jake from Mito
3 min read · May 25, 2022


1. Mito

Mito lets users work with Python through a familiar spreadsheet interface. Each edit made in the Mito spreadsheet generates the equivalent Python code, so users can see the Python for their operations without writing it themselves. Mito supports common spreadsheet and data science operations such as filtering, merging, pivoting, and graphing.
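To give a feel for what "equivalent Python" means, here is a rough, hand-written sketch of the kind of pandas code that a filter followed by a pivot corresponds to. This is not actual Mito output, and the `sales` data and its column names are made up for the illustration.

```python
import pandas as pd

# Hypothetical sales data, invented for this illustration.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "product": ["A", "B", "A", "B"],
    "units": [10, 3, 7, 12],
})

# Filter: keep only rows with at least 5 units sold.
filtered = sales[sales["units"] >= 5]

# Pivot: total units per region.
pivot = filtered.pivot_table(index="region", values="units", aggfunc="sum")
print(pivot)
```

In a spreadsheet these would be two clicks; in code they are two lines of pandas, which is the kind of translation a code-generating spreadsheet handles for you.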

To install Mito, use these two commands in your terminal:

python -m pip install mitoinstaller
python -m mitoinstaller install

Then, to open the Mitosheet interface, run this in a Jupyter notebook:

import mitosheet
mitosheet.sheet()

Here is a link to the full install instructions.

Mito graphing uses the Plotly Python package to create interactive, shareable charts. Examples are in the Mito documentation:

docs.trymito.io

2. Beautiful Soup

Beautiful Soup is a Python package for extracting data from HTML and XML documents, which in practice means web scraping.

To install:

$ pip install beautifulsoup4

Then parse a document, where html_doc is a string containing the HTML:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

The package provides simple commands for navigating the parsed HTML.

To retrieve the title:

soup.title

To find all the URLs in the page:

for link in soup.find_all('a'):
    print(link.get('href'))

To retrieve all the text on a page:

print(soup.get_text())
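The snippets above assume an `html_doc` string already exists. Here is a small self-contained sketch that puts the three commands together; the HTML and the example.com URLs are made up for the illustration.

```python
from bs4 import BeautifulSoup

# A small, made-up HTML document used only for this illustration.
html_doc = """
<html><head><title>Example Page</title></head>
<body>
<p>Read the <a href="https://example.com/docs">docs</a> or the
<a href="https://example.com/blog">blog</a>.</p>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Text of the <title> tag.
page_title = soup.title.string

# The href attribute of every <a> tag, in document order.
links = [a.get("href") for a in soup.find_all("a")]

# All visible text on the page, with tags stripped.
text = soup.get_text()

print(page_title)
print(links)
```

For a real page you would first download the HTML (for example with an HTTP client) and pass the response body in place of `html_doc`.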

Here is the full documentation.

3. Scrubadub

This package is all about removing personally identifiable information (PII) from datasets. As privacy laws become more stringent (as they should!), Python users and businesses at large are increasingly focused on removing personal data.

Here are examples of the PII that this package removes (taken from the package documentation):

https://scrubadub.readthedocs.io/en/stable/

To install:

pip install scrubadub

Here is an overview of how the package works:

https://scrubadub.readthedocs.io/en/stable/usage.html

4. Lux — Automated Visualization Suggestions and Generation

Lux takes any DataFrame you pass in and analyzes it for possible visualizations, then presents pre-configured charts for you to choose from. All you have to do is click the visualization you want and you're done, no coding required! Lux is great for those who want to get to visualizations more quickly in Python. Even for more advanced Python users, getting the syntax for a visualization right can be a time-consuming process.

To install Lux:

pip install lux-api

Then import it alongside pandas:

import lux
import pandas as pd
https://github.com/lux-org/lux

Lux will recommend and present a variety of charts, ranging from general data exploration charts to geographical ones (when the data includes geographic columns).

With Lux’s intent feature, you can specify the columns you are interested in and it will recommend charts specific to those columns.

df.intent = ["Column1","Column2"]
df
https://github.com/lux-org/lux
