4 Tools To Get into Data Science
1. Mito
Mito lets you work with Python through a familiar spreadsheet interface. Each edit you make in the Mito spreadsheet generates the equivalent Python code, so you can see the code for your operations without writing it yourself. Mito supports common spreadsheet and data science operations such as filtering, merging, pivoting, and graphing.
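To give a sense of what code for such operations looks like, here is a hand-written pandas sketch of a filter followed by a pivot. This is not actual Mito output; the DataFrame and its column names (region, product, sales) are made up for illustration.

```python
import pandas as pd

# Hypothetical data standing in for a sheet you might edit in a spreadsheet
df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "A", "B"],
    "sales": [100, 250, 80, 120, 300],
})

# Filter: keep rows where sales is at least 100
filtered = df[df["sales"] >= 100]

# Pivot: total sales per region, with one column per product
pivot = filtered.pivot_table(
    index="region", columns="product", values="sales", aggfunc="sum"
)
print(pivot)
```

Each spreadsheet-style step maps to one short pandas statement, which is why reading generated code can be a gentle way to learn the library.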
To install Mito, run these two commands in your terminal:
python -m pip install mitoinstaller
python -m mitoinstaller install
Then, to open the Mitosheet interface, run the following in a Jupyter notebook:
import mitosheet
mitosheet.sheet()
Here is a link to the full install instructions.
Mito graphing uses the Plotly Python package to create interactive, shareable charts.
2. Beautiful Soup
Beautiful Soup is a Python package for extracting data from HTML and XML documents, which in practice means web scraping.
To install:
$ pip install beautifulsoup4
To parse a document, where html_doc is a string holding the page's HTML:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
The package provides simple commands for navigating the parsed HTML.
To retrieve the title:
soup.title
To find all the URLs in the page:
for link in soup.find_all('a'):
    print(link.get('href'))
To retrieve all the text on a page:
print(soup.get_text())
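Putting those calls together, the sketch below parses a small inline HTML string so it runs without a network request. The contents of html_doc and the example.com URLs are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page content; in practice this would come from a file or an HTTP response
html_doc = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <p>Two links:</p>
    <a href="https://example.com/one">First</a>
    <a href="https://example.com/two">Second</a>
  </body>
</html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# .title is the <title> tag; .string is the text inside it
page_title = soup.title.string

# Collect the href attribute of every <a> tag
urls = [link.get("href") for link in soup.find_all("a")]

# All text on the page, with the tags stripped out
text = soup.get_text()

print(page_title)
print(urls)
```

Using link.get("href") rather than link["href"] means an anchor tag without an href yields None instead of raising an error, which is usually what you want when scraping pages you do not control.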
Here is the full documentation.
3. Scrubadub
This package is all about removing personally identifiable data from datasets. As privacy laws become more and more stringent (as they should!), Python users and businesses at large are increasingly focused on removing personal data.
Examples of the PII that this package removes are listed in the package documentation:
https://scrubadub.readthedocs.io/en/stable/
To install:
pip install scrubadub
Here is an overview of how the package works:
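In scrubadub itself, the entry point is a single call, scrubadub.clean(text), which returns the text with detected PII replaced by placeholders. To illustrate the idea without the library, here is a deliberately simplified stand-in that uses only the standard library. It handles just email addresses with a rough regular expression and is not how scrubadub is implemented; the scrub_emails helper and the {{EMAIL}} placeholder are choices made for this sketch.

```python
import re

# Rough email pattern; good enough for a demo, not a complete address matcher
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrub_emails(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_RE.sub("{{EMAIL}}", text)


raw = "Contact Jane at jane.doe@example.com or support@example.org."
cleaned = scrub_emails(raw)
print(cleaned)
```

Note that the name "Jane" survives in this sketch. Catching names, phone numbers, and other kinds of PII takes more than one regular expression, which is the work a dedicated package takes on.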
4. Lux — Automated Visualization Suggestions and Generation
Lux takes any DataFrame you pass in and analyzes it for possible visualizations, then presents pre-configured visualizations for you to choose from. All you have to do is click the visualization you want, with no coding required. Lux is great for those who want to reach visualizations more quickly in Python. Even for more advanced Python users, getting the syntax for a visualization right can be a time-consuming process.
To install Lux:
pip install lux-api
Then import it alongside pandas:
import lux
import pandas as pd
Lux will recommend and present a variety of charts, from exploratory views of the data to geographic charts when the data supports them.
With Lux’s intent feature, you can specify the columns you are interested in, and Lux will recommend charts specific to those columns. Here df is a pandas DataFrame you have already loaded:
df.intent = ["Column1","Column2"]
df