This roadmap contains 16 chapters that can be completed in 8 months, whether you are a fresher in the field or an experienced professional who wants to transition into data science.
Data structures are the most important topic to learn, not only for data scientists but for everyone working in computer science. Studying data structures gives you an internal understanding of how software works.
Understand these topics
Types of Algorithm Analysis
Asymptotic Notation, Big-O, Omega, Theta
Stacks
Queues
Linked List
Trees
Graphs
Sorting
Searching
Hashing
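To tie the searching and Big-O topics together, here is a minimal Python sketch of binary search on a sorted list. It counts loop iterations so you can see the O(log n) behaviour in practice. The function name `binary_search` and the step counter are illustrative choices, not part of any library.

```python
import math


def binary_search(items, target):
    """Return (index, steps) for target in the sorted list items, or (-1, steps) if absent."""
    low, high = 0, len(items) - 1
    steps = 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if items[mid] == target:
            return mid, steps
        if items[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return -1, steps


data = list(range(0, 2_000_000, 2))  # one million sorted even numbers
index, steps = binary_search(data, 1_234_566)
worst_case = math.ceil(math.log2(len(data) + 1))  # upper bound on loop iterations
print(index, steps, worst_case)
```

Each iteration halves the search range. The loop therefore runs at most about log2(n) + 1 times, where a linear scan of the same list could need up to one million comparisons.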
3 | Pandas, NumPy, Matplotlib
NumPy gives Python support for n-dimensional arrays. For two-dimensional (tabular) data, Pandas is the best library for analysis. You can use other tools, but many of them rely on drag-and-drop features and have limitations. Pandas can be customized to your needs because you write code tailored to the real-life problem.
NumPy
Vectors, Matrix
Operations on Matrix
Mean, Variance, and Standard Deviation
Reshaping Arrays
Transpose and Determinant of Matrix
Diagonal Operations, Trace
Add, Subtract, Multiply, Dot, and Cross Product
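The NumPy topics above map onto a handful of functions. The sketch below runs through them on small made-up matrices and vectors. The values are arbitrary and chosen so the results are easy to check by hand.

```python
import numpy as np

a = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([[1.0, 0.0], [4.0, 2.0]])

print(a + b)             # element-wise addition
print(a - b)             # element-wise subtraction
print(a * b)             # element-wise (Hadamard) product
print(a @ b)             # matrix product, same as np.dot(a, b) for 2-D arrays
print(a.T)               # transpose
print(np.linalg.det(a))  # determinant, 2*3 - 1*1 = 5 up to floating-point rounding
print(np.trace(a))       # trace: sum of the main diagonal
print(np.diag(a))        # main diagonal as a 1-D array

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])
print(np.dot(v, w))      # dot product of two vectors
print(np.cross(v, w))    # cross product of two 3-D vectors

m = np.arange(6).reshape(2, 3)      # reshape 0..5 into 2 rows x 3 columns
print(m.mean(), m.var(), m.std())   # var/std are population versions by default (ddof=0)
```

Note that `*` multiplies element by element. Use `@` (or `np.dot`) when you mean matrix multiplication.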
Pandas
Series and DataFrames
Slicing, Rows, and Columns
Operations on DataFrame
Different ways to create DataFrame
Read, Write Operations with CSV files
Handling Missing Values, Replacing Values, and Regular Expressions
GroupBy and Concatenation
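Here is a compact sketch of the Pandas operations listed above. The sales figures are made up, and filling the gap with the column mean is just one possible strategy for missing values. An in-memory buffer replaces a real CSV file so the example runs without touching the disk.

```python
import io

import pandas as pd

# Made-up data; a dict of columns is one of several ways to create a DataFrame
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Delhi", "Mumbai"],
    "sales": [250.0, None, 300.0, 150.0, 400.0],
})

# Missing values: fill the gap with the column mean (275.0 for this data)
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Slicing: rows by position, then rows by condition plus columns by label
first_two = df.iloc[:2]
pune_rows = df.loc[df["city"] == "Pune", ["city", "sales"]]

# Regular expression: keep rows whose city starts with "P" or "D"
p_or_d = df[df["city"].str.contains(r"^[PD]", regex=True)]

# GroupBy: total sales per city
totals = df.groupby("city")["sales"].sum()

# Concatenation: append more rows underneath
extra = pd.DataFrame({"city": ["Chennai"], "sales": [120.0]})
combined = pd.concat([df, extra], ignore_index=True)

# CSV write and read; the StringIO buffer stands in for a file on disk
buffer = io.StringIO()
combined.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)
print(totals)
print(round_trip)
```

With real files you would pass a path such as `"sales.csv"` (a hypothetical file name) to `to_csv` and `read_csv` instead of the buffer.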
Matplotlib
Graph Basics
Format Strings in Plots
Label Parameters, Legend
Bar Chart, Pie Chart, Histogram, Scatter Plot
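The Matplotlib basics above (format strings, labels, legends, bar charts) fit into one small figure. This is a sketch with made-up data. It selects the non-interactive "Agg" backend and saves to memory so it also runs on a machine without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render without opening a window
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

# Format string "go--": green circle markers joined by a dashed line
ax_line.plot(x, y, "go--", label="y = x^2")
ax_line.set_xlabel("x")
ax_line.set_ylabel("y")
ax_line.set_title("Line plot")
ax_line.legend()

ax_bar.bar(["A", "B", "C"], [3, 7, 5], color="tab:blue")
ax_bar.set_title("Bar chart")

buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # in an interactive session, plt.show() displays it instead
plt.close(fig)
```

Pie charts, histograms and scatter plots follow the same pattern through `ax.pie`, `ax.hist` and `ax.scatter`.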
4 | Statistics
Descriptive Statistics
Measure of Frequency and Central Tendency
Measure of Dispersion
Probability Distribution
Gaussian Normal Distribution
Skewness and Kurtosis
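The descriptive statistics above can be computed with Python's built-in `statistics` module. Skewness and kurtosis are written out from their moment definitions. This sketch uses the population (uncorrected) formulas on a small made-up sample. Libraries such as pandas apply bias corrections by default, so their numbers can differ slightly.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency and frequency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)            # most frequent value

# Dispersion
variance = statistics.pvariance(data)   # population variance
std_dev = statistics.pstdev(data)       # population standard deviation
spread = max(data) - min(data)          # range

# Shape: moment-based skewness and excess kurtosis (population versions)
n = len(data)
m2 = sum((x - mean) ** 2 for x in data) / n
m3 = sum((x - mean) ** 3 for x in data) / n
m4 = sum((x - mean) ** 4 for x in data) / n
skewness = m3 / m2 ** 1.5               # > 0 means a longer right tail
excess_kurtosis = m4 / m2 ** 2 - 3      # 0 for a normal (Gaussian) distribution
print(mean, median, mode, variance, std_dev, spread, skewness, excess_kurtosis)
```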
Regression Analysis
Continuous and Discrete Functions
Goodness of Fit
Normality Test
ANOVA
Homoscedasticity
Linear and Non-Linear Relationship with Regression
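A simple linear regression with a goodness-of-fit measure takes only a few lines. The sketch below fits a straight line by ordinary least squares with NumPy's `np.polyfit` and computes R^2 = 1 - SS_res / SS_tot. The data points are made up to roughly follow y = 2x + 1.

```python
import numpy as np

# Made-up observations that roughly follow y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8, 11.0])

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares fit of a degree-1 polynomial
y_hat = slope * x + intercept

# Goodness of fit: share of the variance in y explained by the line
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Residuals: under homoscedasticity their spread should not change systematically with x
residuals = y - y_hat
print(f"y = {slope:.3f}x + {intercept:.3f}, R^2 = {r_squared:.4f}")
```

Plotting `residuals` against `x` is a common visual check for homoscedasticity and for non-linear patterns a straight line misses. A higher `deg` fits a polynomial, which is one way to model a non-linear relationship.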
Inferential Statistics
t-Test
z-Test
Hypothesis Testing
Type I and Type II errors
t-Test and its types
One way ANOVA
Two way ANOVA
Chi-Square Test
Applying these tests to continuous and categorical data
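Several of the inferential tests above are available in SciPy's `scipy.stats` module. The sketch below runs a two-sample t-test, a one-way ANOVA and a chi-square test of independence on made-up scores and counts. The 0.05 significance level is a conventional choice, not a rule: it is the Type I error rate you accept.

```python
from scipy import stats

# Made-up exam scores for three teaching methods (continuous data)
group_a = [78, 82, 85, 88, 90, 79, 84, 86]
group_b = [72, 75, 80, 70, 78, 74, 77, 73]
group_c = [80, 83, 79, 85, 81, 84, 82, 78]

# Two-sample t-test; equal_var=False gives Welch's version (no equal-variance assumption)
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

alpha = 0.05                 # probability of a Type I error we are willing to accept
reject_null = p_value < alpha

# One-way ANOVA: do the three group means differ?
f_stat, anova_p = stats.f_oneway(group_a, group_b, group_c)

# Chi-square test of independence on a made-up 2x2 contingency table (categorical data)
table = [[30, 10],
         [15, 25]]
chi2, chi_p, dof, expected = stats.chi2_contingency(table)
print(t_stat, p_value, reject_null, f_stat, anova_p, chi2, chi_p, dof)
```

For 2x2 tables, `chi2_contingency` applies Yates's continuity correction by default (`correction=True`), so its statistic is a little smaller than the uncorrected textbook formula.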
5 | Machine Learning
The best way to master machine learning algorithms is to work with the Scikit-Learn library. Scikit-Learn provides ready-made implementations of these algorithms, and you can use one simply by creating an object of its class. These are the algorithms you must know, covering both supervised and unsupervised machine learning:
Linear Regression
Logistic Regression
Decision Tree
Gradient Descent
Random Forest
Ridge and Lasso Regression
Naive Bayes
Support Vector Machine
KMeans Clustering
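Using Scikit-Learn follows the same pattern for every algorithm: create an object of the estimator class, call `fit`, then `predict`. This sketch shows the pattern for one supervised model (LinearRegression) and one unsupervised model (KMeans) on tiny made-up datasets.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Supervised: made-up points that follow y = 3x + 2 exactly
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])  # features must be a 2-D array
y = 3 * X.ravel() + 2

model = LinearRegression()   # create an object of the estimator class
model.fit(X, y)              # learn coefficients from the data
prediction = model.predict(np.array([[10.0]]))

# Unsupervised: two well-separated groups of made-up 2-D points
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)  # cluster ids; which group gets 0 or 1 is arbitrary
print(model.coef_, model.intercept_, prediction, labels)
```

The other algorithms in the list (LogisticRegression, DecisionTreeClassifier, RandomForestClassifier, GaussianNB, SVC and so on) are used the same way. Swapping models is mostly a matter of changing one line.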
Other Concepts and Topics for ML
Measuring Accuracy
Bias-Variance Trade-off
Applying Regularization
Elastic Net Regression
Predictive Analytics
Exploratory Data Analysis
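Regularization and accuracy measurement are easiest to understand by evaluating on data the model has not seen. This sketch compares plain least squares with Ridge (L2), Lasso (L1) and Elastic Net (L1 + L2) on a synthetic regression problem. It then measures classification accuracy on the Iris dataset bundled with Scikit-Learn. The `alpha` values are arbitrary starting points; in practice you would tune them, for example with cross-validation.

```python
from sklearn.datasets import load_iris, make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, LogisticRegression, Ridge
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split

# Regression: synthetic data, split into training and held-out test sets
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "linear": LinearRegression(),
    "ridge (L2)": Ridge(alpha=1.0),
    "lasso (L1)": Lasso(alpha=0.1),
    "elastic net (L1 + L2)": ElasticNet(alpha=0.1, l1_ratio=0.5),
}
scores = {}
for name, estimator in models.items():
    estimator.fit(X_train, y_train)
    scores[name] = r2_score(y_test, estimator.predict(X_test))  # R^2 on unseen data

# Classification: accuracy on a stratified held-out split
X_iris, y_iris = load_iris(return_X_y=True)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=0, stratify=y_iris)
clf = LogisticRegression(max_iter=1000)
clf.fit(Xc_train, yc_train)
accuracy = accuracy_score(yc_test, clf.predict(Xc_test))
print(scores, accuracy)
```

Regularization deliberately adds a little bias (it shrinks coefficients) to reduce variance. That is the bias-variance trade-off in action, and it pays off most when there are many features relative to samples.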
6 | MLOps
You can master any one of the major cloud service providers: AWS, GCP, or Azure. Once you understand one of them, switching to another is easy.
We will focus on AWS (Amazon Web Services) first.
Deploy ML models using Flask
Amazon Lex - Natural Language Understanding
Amazon Polly - Text to Speech
Amazon Transcribe - Speech to Text
Amazon Textract - Extract Text
Amazon Rekognition - Image Applications
Amazon SageMaker - Building and deploying models
Working with Deep Learning on AWS
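Deploying a model with Flask usually means wrapping its prediction call in an HTTP endpoint. This is a minimal sketch, not a production setup. The `/predict` route, the `x` field and the linear `predict` function are hypothetical stand-ins for a real trained model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for a trained model; a real service would load the
# trained model once at start-up (for example from a file saved with joblib).
WEIGHT, BIAS = 3.0, 2.0


def predict(value: float) -> float:
    """Hypothetical linear model used only to illustrate the endpoint."""
    return WEIGHT * value + BIAS


@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # Expect a JSON body such as {"x": 10}
    payload = request.get_json(silent=True)  # None if the body is not valid JSON
    try:
        value = float(payload["x"])
    except (KeyError, TypeError, ValueError):
        return jsonify({"error": "JSON body must contain a numeric 'x'"}), 400
    return jsonify({"prediction": predict(value)})
```

Save the file as `app.py` and run `flask --app app run` (Flask 2.2 or later) to start the development server. A POST request with the body `{"x": 10}` to `http://127.0.0.1:5000/predict` should then return a prediction of 32.0. The built-in server is meant for development only. On AWS, the same app would typically run behind a production WSGI server.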
7 | Natural Language Processing
If you are interested in working with text, you should do some of the work an NLP engineer does and understand how language models work.
Sentiment analysis
POS Tagging, Parsing
Text preprocessing
Stemming and Lemmatization
Sentiment classification using Naive Bayes
TF-IDF, N-grams
Machine Translation, BLEU Score
Text Generation, Summarization, ROUGE Score
Language Modeling, Perplexity
Building a text classifier
Identifying the gender
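Several items above (TF-IDF, N-grams, Naive Bayes sentiment classification, building a text classifier) come together in a short Scikit-Learn pipeline. The six training sentences are made up and far too few for a real model; they only show the mechanics. MultinomialNB is formally defined for integer counts, but Scikit-Learn's documentation notes that fractional counts such as TF-IDF may also work in practice.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up training set; a real sentiment model needs far more labelled text
texts = [
    "I love this movie, it was wonderful",
    "What a great and enjoyable film",
    "Absolutely fantastic acting and story",
    "I hated this movie, it was terrible",
    "What a boring and awful film",
    "Absolutely dreadful acting and story",
]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# Lowercase the text, build TF-IDF features over unigrams and bigrams,
# then train a multinomial Naive Bayes classifier on those features
classifier = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    MultinomialNB(),
)
classifier.fit(texts, labels)

predictions = classifier.predict(["a wonderful and fantastic story", "a terrible, boring film"])
print(predictions)
```

Stemming or lemmatization (for example with NLTK or spaCy) would normally be added as a preprocessing step before vectorizing.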
8 | Computer Vision
To work on image and video analytics, you need to master computer vision, and to work on computer vision, you first have to understand images.
PyTorch Tensors
Understanding pretrained models like AlexNet and ResNet (trained on the ImageNet dataset)
Neural Networks
Building a perceptron
Building a single layer neural network
Building a deep neural network
Recurrent neural network for sequential data analysis
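Before reaching for PyTorch, it helps to see the perceptron learning rule by itself. This sketch trains a single perceptron with a step activation on the logical AND function in plain Python. AND is linearly separable, so the rule converges. The function names are illustrative.

```python
def train_perceptron(samples, targets, learning_rate=1.0, epochs=20):
    """Classic perceptron learning rule for binary targets in {0, 1}."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            output = 1 if activation > 0 else 0   # step activation
            error = target - output               # 0 when the prediction is correct
            # Move weights and bias toward the correct answer
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, x):
    """Apply the trained perceptron to one input."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_targets = [0, 0, 0, 1]
weights, bias = train_perceptron(inputs, and_targets)
print(weights, bias, [predict(weights, bias, x) for x in inputs])
```

A single perceptron cannot learn XOR, because XOR is not linearly separable. That limitation is what motivates stacking layers into the deep neural networks listed above. In PyTorch, the same computation is a `torch.nn.Linear` layer operating on tensors.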