Overview

This online data science course covers the essential concepts for working with real datasets in Python. You’ll learn NumPy fundamentals such as array creation, indexing, slicing, and core operations, so you can process data faster and avoid slow manual calculations. You’ll also learn how to combine and split arrays and apply arithmetic functions to prepare data for analysis and modeling.

You’ll understand concepts like Bayes theorem and common distributions, then use statistics such as central tendency, dispersion, boxplots, and correlation to explain patterns clearly. You’ll also build hands-on EDA skills, including 5-point summaries, skewness checks, missing value imputation, outlier handling, encoding categorical data, and scaling, so your datasets become reliable inputs for machine learning.

You’ll learn core models like regression, decision trees, and k-means clustering, and apply these skills to projects such as analyzing the MovieLens dataset and building an Azure chatbot. By the end of the course, you’ll be able to summarize data and make confident decisions using probability, distributions, and descriptive statistics.

Syllabus

Introduction to NumPy

Introduction to NumPy, Indexing an array, Slicing an array, Operations on an array, Arithmetic functions in NumPy, Concatenation of arrays, Splitting of arrays.
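As a rough preview of the topics in this module, the sketch below touches indexing, slicing, element-wise arithmetic, concatenation, and splitting. The array values are made up for illustration and are not taken from the course materials.

```python
import numpy as np

# Six hypothetical values in a 1-D array.
a = np.array([10, 20, 30, 40, 50, 60])

first = a[0]       # indexing: a single element
middle = a[1:4]    # slicing: the elements at positions 1, 2 and 3
doubled = a * 2    # element-wise arithmetic, no Python loop needed
total = a.sum()    # a core aggregate operation

# Concatenate two arrays, then split the result into three equal parts.
b = np.array([70, 80, 90])
joined = np.concatenate([a, b])
parts = np.split(joined, 3)
```

Note that `np.split` with an integer argument requires the array length to be divisible by that number; here the nine elements split evenly into three arrays of three.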

Introduction to Pandas

Introduction to Pandas, Introduction to data structures, Introduction to Pandas Series and creating Series, Manipulating Series, Introduction to DataFrames and creating DataFrame, Manipulating the DataFrames, Reading data from different sources.
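The following is a minimal sketch of the kind of Series and DataFrame work this module lists. The titles, ratings, and column names are hypothetical, and the CSV text is read from an in-memory string so the example needs no file on disk.

```python
import io

import pandas as pd

# A Series is a labeled 1-D array; arithmetic applies to every element.
ratings = pd.Series([3.5, 4.0, 2.5], index=["a", "b", "c"])
ratings_plus_one = ratings + 1

# A DataFrame is a table of named columns (hypothetical data).
df = pd.DataFrame({"title": ["A", "B", "C"], "rating": [3.5, 4.0, 2.5]})
df["high"] = df["rating"] >= 3.5   # add a derived boolean column
top = df[df["high"]]               # keep only the rows where it is True

# Reading from a source: here, CSV text held in memory.
csv_text = "title,rating\nA,3.5\nB,4.0\n"
from_csv = pd.read_csv(io.StringIO(csv_text))
```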

Introduction to Probability and Distributions

Introduction to Probability and Distributions, Probability - Meaning and Concepts, Rules for Computing Probability, Marginal Probability and Example, Bayes Theorem and Example, Binomial Distribution and Example, Normal Distribution and Example, Poisson Distribution and Example.
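To make the listed formulas concrete, here is a small sketch using only the Python standard library. The function names and the numbers in the usage lines are illustrative assumptions, not course material.

```python
import math


def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(A | B) from P(A), P(B | A) and P(B | not A), via Bayes theorem."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Probability of exactly k events when the average rate is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Hypothetical screening test: 1% prevalence, 90% sensitivity,
# 5% false-positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)

# Two heads in four fair coin flips.
two_heads = binomial_pmf(k=2, n=4, p=0.5)
```

With these assumed inputs the posterior stays well below the test's 90% sensitivity, because the condition is rare to begin with.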

Introduction to Descriptive Statistics

Role of statistics in data analysis, key statistical methods and terms, types of data and attributes, and visualization techniques; includes central tendency, dispersion measures, empirical rules, boxplots, and correlation analysis.
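A brief sketch of central tendency, dispersion, and correlation, using the standard library on a small made-up sample. The `pearson` helper is written out by hand for illustration and is an assumption of this example.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample

mean = statistics.mean(data)       # central tendency
median = statistics.median(data)
mode = statistics.mode(data)
spread = statistics.pstdev(data)   # population standard deviation


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


r = pearson([1, 2, 3], [2, 4, 6])  # a perfectly linear relationship
```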

Introduction to Exploratory Data Analysis (EDA)

Introduction to EDA, Descriptive data measures, 5-point summary and skewness of data, Boxplot, Covariance and coefficient of correlation, Let's get our hands dirty with code, Univariate and multivariate analysis, Encoding categorical data, Scaling and normalization, Preprocessing, Imputing missing values, Working with outliers.
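Several of these preprocessing steps can be sketched in a few lines of NumPy. The values are invented, and the specific choices (median imputation, the 1.5 × IQR rule, min-max scaling) are common conventions assumed here, not necessarily the ones the course uses.

```python
import numpy as np

# Hypothetical measurements with one missing value and one extreme value.
values = np.array([1.0, 2.0, 3.0, np.nan, 5.0, 6.0, 7.0, 8.0, 100.0])

# Impute the missing value with the median of the observed values.
median = np.nanmedian(values)
filled = np.where(np.isnan(values), median, values)

# 5-point summary: minimum, Q1, median, Q3, maximum.
five_point = np.percentile(filled, [0, 25, 50, 75, 100])
q1, q3 = five_point[1], five_point[3]
iqr = q3 - q1

# Flag outliers with the 1.5 * IQR rule, then drop them.
is_outlier = (filled < q1 - 1.5 * iqr) | (filled > q3 + 1.5 * iqr)
clean = filled[~is_outlier]

# Min-max scaling maps the remaining values onto the range [0, 1].
scaled = (clean - clean.min()) / (clean.max() - clean.min())
```

Whether to drop, cap, or keep flagged outliers depends on the dataset; dropping is used here only to keep the sketch short.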

Supervised Learning - Linear Regression

Concepts of machine learning and importance, Feature or mathematical space, Supervised machine learning - Introduction, Linear regression and Pearson’s correlation coefficient, The mathematics of linear regression and the coefficient of determination.
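As an illustration of how the regression line, Pearson's coefficient, and the coefficient of determination relate, the sketch below fits a straight line by least squares with NumPy. The `fit_line` helper and the data points are hypothetical.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares line y = slope * x + intercept, plus R squared."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)   # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)    # total variation
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical data that is close to, but not exactly, a straight line.
hours = [1, 2, 3, 4, 5]
scores = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r_squared = fit_line(hours, scores)

# Pearson's correlation coefficient between the same two variables.
pearson_r = np.corrcoef(hours, scores)[0, 1]
```

For a simple linear regression with one feature, the coefficient of determination equals the square of Pearson's coefficient, so `pearson_r ** 2` and `r_squared` agree up to floating-point error.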

Supervised Learning - Logistic Regression

Overview of Logistic Regression as a classification algorithm, Understanding the sigmoid function.
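The sigmoid function squashes any real number into the range (0, 1), which is why logistic regression can treat its output as a class probability. Below is a minimal sketch; the `predict_class` helper, its weight and bias values, and the 0.5 threshold are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Map a real number to (0, 1); written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_class(x, weight, bias, threshold=0.5):
    """Classify a single feature value with a one-feature logistic model."""
    probability = sigmoid(weight * x + bias)
    return 1 if probability >= threshold else 0


# Hypothetical model: the decision boundary sits at x = 2.
label_high = predict_class(3.0, weight=1.0, bias=-2.0)
label_low = predict_class(1.0, weight=1.0, bias=-2.0)
```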

Introduction to Decision Trees

Concept and structure of Decision Trees.

Introduction to Ensemble Techniques

Ensemble methods, Bagging, Bagging - Hands-on exercise, Boosting, Types of boosting, AdaBoost - Hands-on exercise, Gradient Boosting - Hands-on exercise, Random Forest.