Overview
This specialization provides a comprehensive understanding of relational databases, data preprocessing, and big data technologies. Learners will explore database design, implementation, and applications, along with data preparation and analysis techniques. The specialization is designed to help data scientists and analysts manage and analyze large datasets efficiently using industry-standard tools.
Syllabus
- Course 1: Introduction to Relational Databases
- Course 2: Relational Database Design
- Course 3: Relational Database Implementation and Applications
- Course 4: Data Preparation and Analysis
- Course 5: Big Data Technologies
Courses
- Relational Database Design
This course provides you with the opportunity to learn about relational database design. You will gain an in-depth understanding of the design principles and methodologies involved in creating well-structured, normalized, and efficient relational databases to manage data for small, medium, and large-scale enterprises. Database design skills will enable you to excel in careers such as Database Administrator, Data Analyst, Software Developer, Data Engineer, and Business Intelligence Developer, capitalizing on the ability to create robust and efficient data solutions for any organization. These are among the most sought-after careers across many industries today.
At the end of this course, you will be able to:
  - Describe the process and the design aspects involved in relational database design.
  - Interpret the main components of an Entity-Relationship diagram (ERD) using Unified Modeling Language (UML) notation.
  - Develop Entity-Relationship diagrams using basic and extended Entity-Relationship features in relational design.
  - Translate Entity-Relationship diagrams into logical schemas (relation schemas).
  - Describe the theory and practical application of functional dependencies in relational database design.
  - Use the theory to recognize candidate keys and primary keys.
  - Derive minimal and canonical covers of functional dependencies.
  - Describe the principles of database normalization.
  - Identify and apply normalization techniques.
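As a rough illustration of the functional-dependency objectives listed above, the sketch below computes an attribute closure and uses it to enumerate candidate keys. It is not taken from the course materials. The relation R(A, B, C, D), its dependencies, and the function names `closure` and `candidate_keys` are hypothetical, and attributes are assumed to be single letters so that a string such as "CD" can stand for the set {C, D}.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the closure of `attrs` under the functional dependencies `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is an iterable of
    single-letter attribute names (an assumption made for this sketch).
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already implied, the right side is implied too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Enumerate minimal attribute sets whose closure is the whole relation."""
    keys = []
    attrs = sorted(relation)
    # Smaller sets are tried first, so any superset of a known key is skipped.
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(relation):
                keys.append(candidate)
    return keys


# Hypothetical relation R(A, B, C, D) with A -> B, B -> C, CD -> A
R = {"A", "B", "C", "D"}
FDS = [("A", "B"), ("B", "C"), ("CD", "A")]

print(sorted(closure({"A"}, FDS)))
print([sorted(key) for key in candidate_keys(R, FDS)])
```

The brute-force enumeration is exponential in the number of attributes, which is acceptable for textbook-sized relations but not for wide schemas.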
- Introduction to Relational Databases
Database management systems are a crucial part of most large-scale industry and open-source systems. This course will introduce you to important concepts of database systems and design. You will learn what relational databases are, what they are used for, the theory underlying their design, and how to query and modify a database using the declarative SQL language.
At the end of the course, you will be able to:
  - Describe what relational databases are, and how they are used.
  - Master the Relational Database Model.
  - Demonstrate proficiency in formal relational database theory.
  - Demonstrate comprehensive SQL skills.
  - Apply database knowledge to practical problems.
Software Requirements: Jupyter Notebooks, SQL
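To give a sense of what "declarative" means here, the following minimal sketch modifies and queries a tiny database from Python. It is an illustration only, not course material. It assumes Python's built-in `sqlite3` module as a stand-in engine, and the `student` and `enrollment` tables and their contents are hypothetical.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in engine (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(id),
        course     TEXT NOT NULL,
        grade      REAL
    );
    INSERT INTO student VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO enrollment VALUES
        (1, 'Databases', 3.5), (1, 'Statistics', 3.0), (2, 'Databases', 4.0);
""")

# Modify: the statement says which rows to change, not how to find them.
conn.execute(
    "UPDATE enrollment SET grade = 4.0 WHERE student_id = 1 AND course = 'Databases'"
)

# Query: join, group, and aggregate are declared; the engine picks the plan.
rows = conn.execute("""
    SELECT s.name, COUNT(*) AS courses, AVG(e.grade) AS avg_grade
    FROM student AS s
    JOIN enrollment AS e ON e.student_id = s.id
    GROUP BY s.id, s.name
    ORDER BY s.name
""").fetchall()

print(rows)
```

The same statements should run with little or no change in a Jupyter notebook cell against other SQL engines, although data types and some functions differ between systems.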
- Relational Database Implementation and Applications
In today's data-driven world, the ability to work with relational databases is an essential skill for professionals in various fields. This course is designed to equip you with the knowledge and practical skills needed to become proficient in database management and application development. Whether you are pursuing a career as a database administrator, software developer, or data analyst, this course provides you with a strong foundation to excel in your chosen field.
By the end of this course, students will be able to:
• Describe relational databases and their core components, including tables, rows, columns, and keys.
• Implement relational databases and use indexes, views, triggers, temporary tables, functions, and stored procedures.
• Describe the role of these objects in enforcing business logic and data integrity in a database environment.
• Apply database design and SQL knowledge to real-world application development.
• Develop database-driven applications using programming languages such as Java, Python, or C/C++, and frameworks.
• Describe how indexing and hashing support efficient search operations.
• Describe the concepts of transactions and their properties (ACID: Atomicity, Consistency, Isolation, Durability).
• Define concurrency control and understand the impact of uncontrolled concurrent transactions on data integrity.
Software Requirements: VS Code editor, MySQL Workbench, PostgreSQL
To succeed in this course, learners should possess a solid understanding of relational database design. If you haven't yet mastered these skills, we strongly recommend completing Introduction to Relational Databases and Relational Database Design beforehand. These foundational courses are designed to equip you with the essential knowledge necessary to excel in this material.
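The atomicity property mentioned in the objectives above can be sketched in a few lines. This is an illustration under stated assumptions, not part of the course: it uses Python's built-in `sqlite3` module instead of the MySQL and PostgreSQL tools the course lists, and the `account` table and `transfer` function are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL/PostgreSQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst`; both updates apply or neither does."""
    try:
        # The connection context manager commits on success and rolls back
        # the whole transaction if any statement raises an exception.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was undone too.
        return False


ok = transfer(conn, 1, 2, 30)       # within the available balance
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(ok, failed, balances)
```

The credit is deliberately issued before the debit so that the failing transfer leaves a partial change for the rollback to undo. Isolation and concurrency control are not exercised here, since the sketch uses a single connection.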
- Data Preparation and Analysis
This course introduces the necessary concepts and common techniques for analyzing data. The primary emphasis is on the process of data analysis, including data preparation, descriptive analytics, model training, and result interpretation. The process starts with removing distractions and anomalies, followed by discovering insights, formulating propositions, validating evidence, and finally building professional-grade solutions. Following the process properly, regularly, and transparently brings credibility and increases the impact of the results.
This course will cover topics including Exploratory Data Analysis, Feature Screening, Segmentation, Association Rules, Nearest Neighbors, Clustering, Decision Tree, Linear Regression, Logistic Regression, and Performance Evaluation. It will also review statistical theory, matrix algebra, and computational techniques as necessary.
This course prepares students to carry out the data preparation and analysis process. Besides developing Python code for carrying out the process, students will learn to tune the software tools for the most efficient implementation and optimal performance. At the end of this course, students will have built their inventory of data analysis code and their confidence in advocating their propositions to business stakeholders.
Required Textbook: This course does not mandate any textbooks because the lecture notes are self-contained.
Optional Materials: A Practitioner's Guide to Machine Learning (abbreviated PGML for Reading)
Software Requirements: Python version 3.11 or above with the latest compatible versions of NumPy, SciPy, Pandas, Scikit-learn, and Statsmodels libraries.
To succeed in this course, learners should possess a basic knowledge of linear algebra and statistics, basic set theory and probability theory, and have basic Python and SQL skills. A few courses that can help equip you with the database knowledge needed for this course are: Introduction to Relational Databases, Relational Database Design, and Relational Database Implementation and Applications.
- Big Data Technologies
Big data is the area of informatics focusing on datasets whose size is beyond the ability of typical database and other software tools to capture, store, analyze, and manage. This course provides a rapid immersion into the area of big data and the technologies that have recently emerged to manage it. We start with an introduction to the characteristics of big data and an overview of the associated technology landscape, and continue with an in-depth exploration of Hadoop, the leading open-source framework for big data processing. Here the focus is on the most important Hadoop components, such as Hive, Pig, stream processing, and Spark, as well as architectural patterns for applying these components. We continue with an exploration of the range of specialized (NoSQL) database systems architected to address the challenges of managing large volumes of data. Overall, the objective is to develop a sense of how to make sound decisions in the adoption and use of these technologies and how to deploy them economically on modern cloud computing infrastructure.
Taught by
Gerald Balekaki, Jawahar Panchal, Ming-Long Lam and Yousef Elmehdwi