IBM

Data Science with R - Capstone Project

IBM via Coursera

Overview

In this capstone course, you will apply various data science skills and techniques that you have learned as part of the previous courses in the IBM Data Science with R Specialization or IBM Data Analytics with Excel and R Professional Certificate. For this project, you will assume the role of a Data Scientist who has recently joined an organization and be presented with a challenge that requires data collection, analysis, basic hypothesis testing, visualization, and modeling to be performed on real-world datasets. You will collect and understand data from multiple sources, conduct data wrangling and preparation with Tidyverse, perform exploratory data analysis with SQL, Tidyverse and ggplot2, model data with linear regression, create charts and plots to visualize the data, and build an interactive dashboard. The project will culminate with a presentation of your data analysis report, with an executive summary for the various stakeholders in the organization.

Syllabus

  • Module 1 - Capstone Overview and Data Collection
    • In this module, you will be introduced to the Data Science with R Capstone Project and the problem scenario you will be working on throughout the project. You will explore the datasets used in the project and understand how data can be collected from different sources. You will learn how to gather data using web scraping techniques to extract information from HTML pages and how to use HTTP requests with the OpenWeather API to retrieve weather data. The collected data will then be organized into structured formats such as data frames for further analysis.
  • Module 2 - Data Wrangling
    • In this module, you will learn how to clean and prepare datasets for analysis through various data wrangling techniques. You will work with web-scraped data and apply methods such as renaming columns, cleaning text using regular expressions, and removing unnecessary links or characters. You will also learn how to handle missing data, convert categorical values into numeric formats, and perform data normalization to prepare the dataset for further analysis. Through hands-on labs, you will practice using functions and data manipulation techniques to transform raw data into a clean and structured format.
  • Module 3 - Performing Exploratory Data Analysis with SQL, Tidyverse & ggplot2
    • At this stage of the Capstone Project, you have gained some valuable working knowledge of data collection and data wrangling. You have also learned a lot about SQL querying and visualization. Congratulations! Now it's time to apply some of your new knowledge and learn about Exploratory Data Analysis (EDA) techniques, again through practice. You can use the datasets you wrangled in the previous module. However, if you had any issues completing the wrangling, no worries: we have prepared some clean datasets for you to use. You will be asked to complete three labs.
  • Module 4 - Predictive Analysis
    • In this module, you will learn how to build and evaluate regression models to predict hourly bike-sharing demand using weather and datetime data. You will begin by constructing a baseline linear regression model and then improve the model by incorporating polynomial, interaction, and regularization terms. Through hands-on labs, you will compare different models and evaluate their performance using metrics such as R-squared and RMSE. You will also analyze the influence of predictor variables by examining and visualizing their coefficients.
  • Module 5 - Building an R Shiny Dashboard App
    • In this module, you will learn how to build an interactive dashboard application using R Shiny to visualize bike-sharing demand predictions. You will create a dashboard that integrates a Leaflet map to display predicted demand across different cities and allows users to explore the results through interactive controls such as dropdown menus. You will also enhance the dashboard by incorporating data visualizations with ggplot2 to display detailed bike-sharing demand trends for selected cities. Through hands-on labs, you will gain practical experience in designing and improving interactive data applications.
  • Module 6 - Present Your Data-Driven Insights
    • In this final module, you will focus on presenting the results of your capstone project. You will create a comprehensive PowerPoint presentation that highlights the key steps of your analysis, the insights you discovered, and the outcomes of your predictive modeling work. You will learn best practices for structuring and communicating data-driven findings effectively. After preparing your presentation, you will submit your final project either through an AI-graded submission or a peer-reviewed submission.

Taught by

Yan Luo

Reviews

4.6 rating at Coursera based on 111 ratings
