Query Inside the File - AI Engineering for Audio, Video and Sensor Data

Explore AI engineering techniques for processing audio, video, and sensor data in this 22-minute conference talk from MLOps World GenAI Summit 2025. Learn how to get more out of unstructured data by querying specific segments within large media files rather than processing whole files. See real-world applications in which audio, video, and sensor data are transformed into structured, queryable assets directly from S3, enabling precise segmentation, object detection, event filtering, and context-aware LLM prompts. Learn to use DataChain with Pydantic data models to manipulate complex data types such as bounding boxes, video frames, and time-based slices, improving inference speed and accuracy while reducing cost. Ask targeted questions like "What's happening in this 12-second clip where two people enter the car?" instead of processing gigabytes of unnecessary data. Finally, see how these pieces combine into scalable pipelines for real-world multimodal workflows that process massive media files in production environments.
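To make the idea of Pydantic data models for bounding boxes and time-based slices concrete, here is a minimal sketch. It is not taken from the talk and does not use DataChain's API. The class names (`BBox`, `FrameDetections`, `ClipSlice`), the helper `slices_with_label`, the field names, and the S3 URI in the usage note are all hypothetical, chosen only for illustration. It assumes Pydantic is installed and Python 3.9 or later.

```python
from pydantic import BaseModel, Field


class BBox(BaseModel):
    """One detected object in pixel coordinates (hypothetical schema)."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float = Field(ge=0.0, le=1.0)


class FrameDetections(BaseModel):
    """All boxes detected in the frame at a given timestamp."""
    timestamp_s: float = Field(ge=0.0)
    boxes: list[BBox] = Field(default_factory=list)


class ClipSlice(BaseModel):
    """A time-based slice of a media file, identified by URI and time range."""
    uri: str
    start_s: float = Field(ge=0.0)
    end_s: float = Field(ge=0.0)

    def duration_s(self) -> float:
        return self.end_s - self.start_s


def slices_with_label(uri, frames, label, min_count=1, max_gap_s=1.0):
    """Return slices covering runs of frames with >= min_count boxes of `label`.

    A qualifying frame extends the current slice when it lies within
    max_gap_s of the previous qualifying frame; otherwise it starts a new one.
    """
    slices = []
    start = prev = None
    for frame in sorted(frames, key=lambda f: f.timestamp_s):
        count = sum(1 for box in frame.boxes if box.label == label)
        if count < min_count:
            continue
        if start is None:
            start = prev = frame.timestamp_s
        elif frame.timestamp_s - prev <= max_gap_s:
            prev = frame.timestamp_s
        else:
            slices.append(ClipSlice(uri=uri, start_s=start, end_s=prev))
            start = prev = frame.timestamp_s
    if start is not None:
        slices.append(ClipSlice(uri=uri, start_s=start, end_s=prev))
    return slices
```

Calling `slices_with_label("s3://example-bucket/video.mp4", frames, "person", min_count=2)` on per-frame detections would return only the time ranges in which at least two people are visible. A downstream step could then send just those ranges, rather than the whole file, to a model or an LLM prompt.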