Open Lakehouse and AI - Building Foundations with ClickHouse, Apache Iceberg, LLMs, and AWS S3 Tables

Altinity via YouTube

Overview

Explore how to integrate databases, data lakes, and real-time systems for AI workloads through three conference talks from industry experts.

Learn to build cost-efficient, high-performance AI architectures by integrating ClickHouse with Apache Iceberg, separating compute and storage while using open table formats and caching strategies to accelerate queries on cold data.

Discover how large language models can translate natural language questions into SQL on streaming data by combining Apache Kafka, Iceberg, and query engines via the Model Context Protocol, while addressing translation challenges, benchmarking, security, and adoption considerations.

Master Amazon S3 Tables, AWS's fully managed Apache Iceberg service, including its core features such as atomic snapshots, high-throughput operations, automated maintenance and compaction, performance tuning options, and migration paths for streaming analytics workloads.

Gain practical insights through live demonstrations of query acceleration using swarm compute and caching, real-time pipeline construction with LLMs, and hands-on exploration of copy-on-write versus merge-on-read update modes. The talks are aimed at data engineers, architects, and AI practitioners working with modern data infrastructure.

Syllabus

00:00:00 - Introduction
00:06:40 - Building a foundation for AI with ClickHouse and Apache Iceberg - Altinity
00:15:08 - Challenges of shared‑nothing architecture and AI workloads
00:17:26 - Introducing Apache Iceberg and ClickHouse integration
00:24:30 - Parquet & MergeTree performance benchmarks and Iceberg catalog
00:27:57 - Hybrid tables and tiered storage strategies
00:31:10 - Demo - Query acceleration using swarm compute and caching
00:37:00 - Discussion and Q&A on open formats
00:53:49 - Teaching Databases to Speak Human with LLMs and MCP - Confluent
00:54:42 - Demo - Building a real‑time pipeline and LLM
00:59:29 - Challenges of text‑to‑SQL translation and benchmark evolution
01:06:30 - Connecting LLMs to data via MCP tools
01:09:52 - Streaming data with Apache Kafka and Trino
01:14:42 - Evaluation, security and governance considerations
01:18:30 - Adoption outlook and conclusion
01:21:00 - Conclusion and Q&A
01:27:12 - Managed Apache Iceberg With Amazon S3 Tables - AWS
01:27:50 - Amazon S3 use cases
01:29:29 - Iceberg advantages and capabilities
01:33:36 - Copy‑on‑write vs merge‑on‑read update modes
01:36:47 - Overview of Amazon S3 Tables service and momentum
01:41:55 - Maintenance and compaction features
01:51:11 - Performance tuning, streaming analytics, Agents, & MCP
01:54:25 - Iceberg REST catalog endpoints: Glue vs S3 IRC, and integration choices
01:56:38 - S3-to-S3 Tables migration and conclusion

Taught by

Altinity
