Data Science

3 Must-Have Projects For Your Data Science Portfolio

And a comprehensive roadmap of how to build them!

Aakash N S


Photo by Scott Graham on Unsplash

Table Of Contents
· Introduction
· Project 1: Exploratory Data Analysis and Visualization (EDA)
· Project 2: Classical Machine Learning on Structured Data
· Project 3: Deep Learning on Unstructured Data
· Conclusion

Before you start applying for data science jobs, make sure to complete
at least one project in each of these three important domains:

  1. Exploratory Data Analysis and Visualization
  2. Classical Machine Learning on Tabular Data
  3. Deep Learning (Computer Vision/NLP)

You can host your projects on your GitHub or Jovian profile. Here’s mine:

Project 1: Exploratory Data Analysis and Visualization (EDA)

Check out these projects for inspiration:

  1. Analyzing your WhatsApp messages by Michael Chia Yin
  2. Understanding your Browsing Patterns using Pandas by Kartik Godawat
  3. What Makes a Student Prefer a University by Daniela Cruz

Here are the steps for building a project on EDA & visualization:

  1. Find a real-world dataset of your choice online
  2. Use NumPy & Pandas to parse, clean & analyze data
  3. Use Matplotlib & Seaborn to create visualizations
  4. Ask and answer interesting questions about the data
  5. Document & publish your work in a Jupyter notebook or blog post
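
To make the steps above concrete, here is a minimal sketch of the parse, clean, question and plot loop. The tiny taxi-trip table, its column names and the output filename are hypothetical stand-ins for whatever real dataset you pick, and the sketch uses plain Matplotlib only (Seaborn would work just as well for the chart).

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render to a file, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical stand-in for a real-world dataset (normally pd.read_csv(...))
raw = pd.DataFrame({
    "pickup_time": ["2021-01-01 08:15", "2021-01-01 18:40", "2021-01-02 09:05",
                    "2021-01-02 23:30", "2021-01-03 08:50", None],
    "distance_km": [2.5, 7.1, np.nan, 12.4, 3.3, 5.0],
    "fare": [6.0, 15.5, 9.0, 27.0, 7.5, 11.0],
})

# Parse and clean: drop rows without a timestamp, fill missing distances
trips = raw.dropna(subset=["pickup_time"]).copy()
trips["pickup_time"] = pd.to_datetime(trips["pickup_time"])
trips["distance_km"] = trips["distance_km"].fillna(trips["distance_km"].median())

# Ask a question: are morning trips (6-10 am) pricier per kilometre?
trips["hour"] = trips["pickup_time"].dt.hour
trips["fare_per_km"] = trips["fare"] / trips["distance_km"]
trips["is_morning"] = trips["hour"].between(6, 10)
summary = trips.groupby("is_morning")["fare_per_km"].mean()
print(summary)

# Visualize: fare against distance
fig, ax = plt.subplots(figsize=(5, 3))
ax.scatter(trips["distance_km"], trips["fare"])
ax.set_xlabel("Distance (km)")
ax.set_ylabel("Fare")
ax.set_title("Fare vs. distance")
fig.tight_layout()
fig.savefig("fare_vs_distance.png")
plt.close(fig)
```

In a real project, each question you ask would get its own short cell like the `summary` computation, followed by a sentence or two explaining what the numbers show.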

Take our course on Data Analysis with Python: Zero to Pandas to learn the skills required for building projects on Exploratory Data Analysis and Visualization.

Project 2: Classical Machine Learning on Structured Data

Check out these projects for inspiration:

  1. New York Taxi Fare Prediction by Allen Kong
  2. Predicting the Auction Price of Bulldozers by Ankur Singh
  3. Building the Hogwarts Sorting Hat using Logistic Regression by Ekaterina Derevyanka

Here are the steps for building a classical machine learning project:

  1. Find an interesting tabular dataset online (typically in CSV/JSON format)
  2. Identify the type of problem: regression, classification, unsupervised learning, etc.
  3. Clean the data if required and perform exploratory data analysis
  4. Do some feature engineering, i.e., create new & useful features from existing ones
  5. Identify the right modeling approaches e.g. decision trees, regression, gradient boosting, etc.
  6. Train a model and evaluate its performance using K-fold cross-validation
  7. Experiment with different modeling approaches & hyperparameters
  8. Document & publish your work in a Jupyter notebook or blog post
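
As a rough illustration of steps 4 to 7, the sketch below engineers one feature, then compares two modeling approaches with 5-fold cross-validation in scikit-learn. The synthetic "property price" data and its column names are assumptions made up for this example; swap in your own tabular dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (a stand-in for a real CSV file)
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "length_m": rng.uniform(5, 20, n),
    "width_m": rng.uniform(3, 10, n),
    "age_years": rng.integers(0, 50, n),
})
df["price"] = (1000 * df["length_m"] * df["width_m"]
               - 500 * df["age_years"]
               + rng.normal(0, 2000, n))

# Feature engineering: combine two existing columns into a more useful one
df["area_m2"] = df["length_m"] * df["width_m"]

X = df.drop(columns="price")
y = df["price"]

# K-fold cross-validation: every row is used for validation exactly once
cv = KFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "linear_regression": LinearRegression(),
    "gradient_boosting": GradientBoostingRegressor(random_state=42),
}

scores = {}
for name, model in models.items():
    fold_scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    scores[name] = fold_scores.mean()
    print(f"{name}: mean R^2 = {scores[name]:.3f} over {len(fold_scores)} folds")
```

Reporting the mean score across folds, rather than a single train/test split, gives a steadier basis for comparing approaches and hyperparameters.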

Check out these courses on Coursera and Udemy to learn the skills required to build a classical machine learning project.

Project 3: Deep Learning on Unstructured Data

Check out these projects for inspiration:

  1. Blindness Detection using Image Classification
  2. Generating New Artworks using GANs
  3. Bounding Box Prediction using PyTorch
  4. Classifying Environment Audio Recordings

Here are the steps for building a deep learning project:

  1. Find an interesting unstructured dataset online (images, text, audio, etc.)
  2. Identify the type of problem: regression, classification, generative modeling, etc.
  3. Identify the type of neural network you need: fully-connected, convolutional, recurrent, etc.
  4. Prepare the dataset for training (set up batches, apply augmentations & transforms)
  5. Define a network architecture and set up a training loop
  6. Train the model and evaluate its performance using a validation/test set
  7. Experiment with different network architectures, hyperparameters & regularization techniques
  8. Document and publish your work in a Jupyter notebook or blog post
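
The steps above can be sketched in PyTorch as follows. This is a minimal example under stated assumptions: the "images" are random tensors with an artificial brightness pattern (a stand-in for a real image dataset), the network is a deliberately tiny convolutional model, and augmentations are skipped for brevity.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)

# Hypothetical data: 200 grayscale 16x16 "images" in two classes.
# Class 1 images get a brighter top half so there is something to learn.
images = torch.randn(200, 1, 16, 16)
labels = torch.randint(0, 2, (200,))
images[labels == 1, :, :8, :] += 1.0

# Prepare the dataset: train/validation split and batching
dataset = TensorDataset(images, labels)
train_ds, val_ds = random_split(dataset, [160, 40])
train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)
val_dl = DataLoader(val_ds, batch_size=64)

# Define a small convolutional network
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),           # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(8 * 8 * 8, 2),   # two output classes
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

def evaluate(model, loader):
    """Return classification accuracy of `model` on `loader`."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for xb, yb in loader:
            preds = model(xb).argmax(dim=1)
            correct += (preds == yb).sum().item()
            total += len(yb)
    return correct / total

# Training loop with validation after every epoch
history = []
for epoch in range(5):
    model.train()
    for xb, yb in train_dl:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
    val_acc = evaluate(model, val_dl)
    history.append(val_acc)
    print(f"epoch {epoch + 1}: validation accuracy = {val_acc:.2f}")
```

For a real project, the experimentation step would mean changing the architecture, learning rate or regularization and comparing the resulting `history` values across runs.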

Take our course on Deep Learning with PyTorch: Zero to GANs to learn the skills required to build a deep learning project.

Conclusion

Where to Find Datasets for Your Projects?

Here are some sources for finding exciting and unique datasets:

  1. Kaggle Datasets (use the opendatasets library for downloading datasets)
  2. Past Kaggle Competitions (check the “Completed” tab)
  3. awesome-public-datasets on GitHub
  4. FastAI Course Datasets
  5. Curated Deep Learning Datasets

You can also export your personal data from applications like Google Chrome, WhatsApp, Facebook, Instagram, Apple, Fitbit, etc., to analyze and predict your own behavior!
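
As a small illustration of working with exported personal data, here is a sketch that parses a chat export and counts messages per sender. The line format (`date, time - sender: message`) and the sample text are assumptions made for this example; real exports differ by app, locale and version, so expect to adjust the pattern.

```python
import re
from collections import Counter

# Assumed line format: "12/31/20, 21:15 - Alice: Happy new year!"
LINE_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{2,4}), (\d{1,2}:\d{2}) - ([^:]+): (.*)$"
)

# Made-up sample standing in for an exported chat file
sample_export = """\
12/31/20, 21:15 - Alice: Happy new year!
12/31/20, 21:16 - Bob: Same to you
1/1/21, 09:02 - Alice: Brunch today?
1/1/21, 09:05 - Bob: Sure
and bring the board game
1/1/21, 09:06 - Alice: Will do
"""

def parse_chat(text):
    """Turn raw export text into a list of message dictionaries."""
    messages = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            date, time, sender, body = match.groups()
            messages.append(
                {"date": date, "time": time, "sender": sender, "text": body}
            )
        elif messages:
            # A line without a timestamp continues the previous message
            messages[-1]["text"] += "\n" + line
    return messages

messages = parse_chat(sample_export)
counts = Counter(m["sender"] for m in messages)
print(counts.most_common())
```

From here, the list of dictionaries can be loaded into a Pandas data frame for the kind of analysis described in Project 1.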

What are you building? Tweet at us and let us know! We’d love to feature your project on our Community Medium Publication.
