Overview
Explore the challenges of managing schema evolution in data lakes, and the solutions to them, in this EuroPython 2021 conference talk. Learn best practices for storage, control, scalability, and availability in data lake design. Discover how Episource tackled the task of storing and searching evolving nested JSON data produced by their NLP engine, which processes millions of medical documents. Gain insights into implementing a solution that uses the Avro format for schema evolution, a schema registry for version control, and Athena for distributed SQL queries. Understand how "schema-on-write" and "schema-on-read" approaches each help maintain data integrity and compatibility across schema changes.
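To make the schema evolution idea concrete, here is a minimal sketch of how an Avro-style reader schema can resolve a record written under an older schema: fields missing from the old record take the reader's default, and fields the reader does not know are dropped. This is a simplified pure-Python illustration, not the Avro library and not the implementation shown in the talk. The schema, the field names (`doc_id`, `entities`, `confidence`, `legacy_flag`), and the helper `resolve_record` are all hypothetical.

```python
from typing import Any

# Hypothetical reader schema (version 2). The "confidence" field was added
# after the first records were written, so it declares a default value.
READER_SCHEMA_V2 = {
    "type": "record",
    "name": "DocumentEntities",
    "fields": [
        {"name": "doc_id", "type": "string"},
        {"name": "entities", "type": {"type": "array", "items": "string"}},
        {"name": "confidence", "type": ["null", "double"], "default": None},
    ],
}


def resolve_record(written: dict[str, Any], reader_schema: dict[str, Any]) -> dict[str, Any]:
    """Project a written record onto the reader schema (simplified resolution).

    - A field present in the record is copied as-is.
    - A field absent from the record takes the reader's default.
    - A field absent from the record with no default is an error.
    - Fields the reader schema does not list are ignored.
    """
    resolved: dict[str, Any] = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in written:
            resolved[name] = written[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return resolved


# A record written under a hypothetical version 1 schema: it has no
# "confidence" field, and it carries a "legacy_flag" that version 2 removed.
old_record = {"doc_id": "doc-001", "entities": ["diabetes"], "legacy_flag": True}
new_view = resolve_record(old_record, READER_SCHEMA_V2)
```

In this sketch, adding a field with a default and removing a field are both compatible changes, whereas adding a required field without a default would make older records unreadable. A schema registry typically enforces checks of this kind before a new schema version is accepted.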
Syllabus
Prakshi Yadav - Data lake: Design for schema evolution
Taught by
EuroPython Conference