Overview
A basic statistics course typically covers the analysis of univariate and bivariate data and introduces statistical inference using confidence intervals and hypothesis tests. In this course we take the next step in mastering the fundamentals of data science. We explore multivariate data with numerical and graphical tools, study multivariate distributions, and tackle the problem of multiple testing. We build linear regression and ANOVA models, conduct inference, and verify the model assumptions. Throughout the course we emphasize the benefits of data visualization, exploratory data analysis, and outlier detection. All analyses are performed using R.
Syllabus
Module 1: Introduction to R
Module 2: Univariate data
- Random variables and distributions
- Statistical models and estimators
- Maximum likelihood estimators
- Quantile-Quantile plot and test for normality
- Transformation to normality
- Inference about the mean
Module 3: Bivariate data
- Exploratory data analysis, covariance, correlation
- Bivariate distributions
- Bivariate normal distribution
- Simple linear regression
- One-way analysis of variance
Module 4: Multivariate data
- Exploratory data analysis
- Multivariate distributions
- Multivariate normal distribution
- Statistical models and estimators
- Test for multivariate normality
- Outlier detection
- Inference about the mean (multiple testing)
Module 5: Multiple linear regression
- The linear regression model
- The least squares estimator
- Properties of the LS estimator
- Analysis of variance
- Statistical properties
- Statistical inference
- Verifying the model assumptions
- Outlier detection
Taught by
Mia Hubert, Stefan Van Aelst and Tim Verdonck