Overview
This online data science course covers the essential concepts for working with real datasets in Python. You’ll learn NumPy fundamentals such as array creation, indexing, slicing, and core operations, so you can process data faster and avoid slow manual calculations. You’ll also learn how to combine and split arrays and apply arithmetic functions to prepare data for analysis and modeling. You’ll understand concepts like Bayes’ theorem and common distributions, then use statistics such as central tendency, dispersion, boxplots, and correlation to explain patterns clearly. You’ll also build hands-on EDA skills, including 5-point summaries, skewness checks, missing value imputation, outlier handling, encoding categorical data, and scaling, so your datasets become reliable inputs for machine learning. Learn core models like regression, decision trees, and k-means clustering, and apply these skills to projects like the MovieLens dataset and building an Azure chatbot. By the end of the course, you’ll be able to summarize data and make confident decisions using probability, distributions, and descriptive statistics.
Syllabus
- Introduction to NumPy
- Introduction to NumPy, Indexing an array, Slicing an array, Operations on an array, Arithmetic functioning in NumPy, Concatenation of arrays, Splitting of arrays.
- Introduction to Pandas
- Introduction to Pandas, Introduction to data structures, Introduction to Pandas Series and creating Series, Manipulating Series, Introduction to DataFrames and creating DataFrame, Manipulating the DataFrames, Reading data from different sources.
- Introduction to Probability and Distributions
- Introduction to Probability and Distributions, Probability - Meaning and Concepts, Rules for Computing Probability, Marginal Probability and Example, Bayes Theorem and Example, Binomial Distribution and Example, Normal Distribution and Example, Poisson Distribution and Example.
- Introduction to Descriptive Statistics
- Role of statistics in data analysis, key statistical methods and terms, types of data and attributes, and visualization techniques; includes central tendency, dispersion measures, empirical rules, boxplots, and correlation analysis.
- Introduction to Exploratory Data Analysis (EDA)
- Introduction to EDA, Descriptive data measures, 5-point summary and skewness of data, Box-plot, covariance and coefficient of correlation, Let's get our hands dirty with code, Univariate and multivariate analysis, Encoding categorical data, Scaling and normalization, Preprocessing, Imputing missing values, Working with outliers.
- Supervised Learning - Linear Regression
- Concepts of machine learning and importance, Feature or mathematical space, Supervised machine learning - Introduction, Linear regression and its Pearson’s coefficient, Linear regression mathematically and coefficient of determination.
- Supervised Learning - Logistic Regression
- Overview of Logistic Regression as a classification algorithm, Understanding the sigmoid function.
- Introduction to Decision Trees
- Concept and structure of Decision Trees.
- Introduction to Ensemble Techniques
- Ensemble methods, Bagging, Bagging - Hands-on exercise, Boosting, Types of boosting, AdaBoost - Hands-on exercise, Gradient Boosting - Hands-on exercise, Random Forest.
- Introduction to Unsupervised Learning
- Unsupervised learning, Clustering - types and distance, K-means clustering.
Taught by
Prof. Mukesh Rao and Dr. Abhinanda Sarkar