Data scientists spend most of their time cleaning data. In this course, you will learn to convert and manipulate messy data to extract what you need.
Overview
Syllabus
- Data Extraction Fundamentals
- Review the fundamentals of the tabular data formats CSV and Excel, and learn about the JSON format.
- Problem Set: Data Extraction Fundamentals
- Gain experience working with .csv files and wrangling JSON.
- Data in More Complex Formats
- Start working with XML and learn how to use BeautifulSoup to scrape web pages.
- Problem Set: Data in More Complex Formats
- Practice working with XML and parsing HTML with BeautifulSoup.
- Data Quality
- Learn about what can make data "dirty" and find out how you can audit your data for quality.
- Problem Set: Data Quality
- Practice auditing and cleaning dirty data.
- Working with MongoDB
- Find out how to use MongoDB to store and query your data.
- Problem Set: Working with MongoDB
- Practice wrangling data and inserting data into MongoDB.
- Analyzing Data
- Learn how to create more sophisticated MongoDB queries using pipelines and operators.
- Problem Set: Analyzing Data
- Gain experience with MongoDB pipelines and operators.
- Case Study: OpenStreetMap Data
- Go through a case study showing how to audit, clean and prepare OpenStreetMap data for insertion into a database.
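As a rough illustration of the audit, clean, and prepare workflow named in the case study, here is a minimal Python sketch. It is not taken from the course materials: the sample XML, the `STREET_TYPE_FIXES` mapping, and the helper names `clean_street`, `shape_node`, and `shape_osm` are all assumptions made for this example. It parses OpenStreetMap-style XML with the standard library, normalizes abbreviated street types, and produces dictionaries that could then be stored as documents in a database.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample in the style of an OpenStreetMap export (not real data).
SAMPLE_OSM = """<osm>
  <node id="1" lat="41.9" lon="-87.6">
    <tag k="addr:street" v="N. Lincoln Ave"/>
    <tag k="amenity" v="cafe"/>
  </node>
  <node id="2" lat="41.8" lon="-87.7">
    <tag k="addr:street" v="West Lake St."/>
  </node>
</osm>"""

# Assumed cleaning rule: expand abbreviated street types.
STREET_TYPE_FIXES = {
    "Ave": "Avenue",
    "Ave.": "Avenue",
    "St": "Street",
    "St.": "Street",
}


def clean_street(name):
    """Expand an abbreviated street type at the end of a street name."""
    parts = name.split()
    if parts and parts[-1] in STREET_TYPE_FIXES:
        parts[-1] = STREET_TYPE_FIXES[parts[-1]]
    return " ".join(parts)


def shape_node(elem):
    """Turn one <node> element into a dict that can be stored as a document."""
    doc = {
        "id": elem.get("id"),
        "type": elem.tag,
        "pos": [float(elem.get("lat")), float(elem.get("lon"))],
    }
    for tag in elem.findall("tag"):
        key, value = tag.get("k"), tag.get("v")
        if key == "addr:street":
            # Group address fields under a nested "address" dict.
            doc.setdefault("address", {})["street"] = clean_street(value)
        else:
            doc[key] = value
    return doc


def shape_osm(xml_text):
    """Parse the XML text and shape every <node> element."""
    root = ET.fromstring(xml_text)
    return [shape_node(node) for node in root.iter("node")]


if __name__ == "__main__":
    print(json.dumps(shape_osm(SAMPLE_OSM), indent=2))
```

The resulting list of dictionaries is the kind of structure a MongoDB driver could insert as documents; the insertion step is left out here so the sketch runs without a database.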
Taught by
Shannon Bradshaw
Reviews
3.0 rating, based on 9 Class Central reviews
-
This course has two main parts, SQL and MongoDB. The SQL lecturer gave enough detail for students to understand, but the Data Wrangling and MongoDB lectures skipped many details and did not explain the code well. Students certainly need to invest extra time studying, and I would not recommend the course to anyone without Python knowledge. It is the worst course so far in the Data Analyst Nanodegree.
-
I do not know how anyone can rate the class more than 1 star. The learning materials were pathetic, and not usually on topic. I have had to go through the materials at least 3 times and consulted with Senior Data Analysts at my current employer.…
-
Where to start.
Although self-paced, the course lacked direction and clear expectations. I have worked through the lessons so many times, and still the materials are not on target for the exercise. The "Project Rubric" was a waste of a mouse click, as it gave no specifics about what is expected.
I have spent more than 5 months learning from other sites (read: non-Udacity) how to proceed, but I feel like a ship without a rudder and with no destination. It boggles the mind that there is no desired outcome for the project... Just... explore the data... What does that mean, anyway?
-
I think the project at the end was very helpful. Getting to clean data, convert it from XML to JSON, load it into MongoDB and analyze it took effort. At the end, it seemed worth it.
-
I would highly recommend this course. Since this course is self-paced, it is possible to finish it very quickly, as the material is not tough to comprehend. I didn't find any "Final Project Instructions" in the course, though.