Overview
Explore why accurate query execution cost estimation matters in distributed big data environments in this 28-minute conference talk from DSC EUROPE 24. Vadim Opolski examines how the choice of query execution plan affects performance in tools like Hadoop, Hive, and Spark, and highlights the limitations of current open-source optimization approaches, which rely primarily on heuristics and statistics. Learn about emerging AI-based cost estimation methods that use machine learning models trained on multi-client data, as implemented in commercial platforms like Databricks and Vertica. See the measurable performance improvements these AI-driven approaches achieve, validated through the TPC-DS and TPC-H benchmarks, and understand how open-source tools could collect and use data to train similar models. Gain insight into how these advances could improve query optimization across the big data ecosystem.
Syllabus
AI-driven query plan optimization in big data processing tools | Vadim Opolski | DSC EUROPE 24
Taught by
Data Science Conference