Efficient, High-Performance AI Inferencing with Intel Xeon 6

Discover how Intel Xeon 6 processors deliver cost-effective, high-performance AI inference for enterprise environments in this 14-minute conference talk from Ray Summit 2025. Learn why demand for AI inference is accelerating and now outpaces training in enterprise settings, and explore how CPU-based inference excels in RAG applications, intelligent chatbots, automated content creation, document summarization, and agentic workflows where predictable latency and security are critical. Examine Intel Xeon 6's AI-centric hardware advancements, including AMX (Advanced Matrix Extensions) for inference speedups, enhanced memory bandwidth through MRDIMMs, high multi-core parallelism for scalable inference, and Intel TDX for confidential computing and secure execution. Understand how vLLM integration and Intel's turnkey enterprise solutions (Intel AI for Enterprise Inference and Intel AI Enterprise RAG) accelerate deployment and reduce infrastructure complexity. See how Anyscale Ray amplifies Xeon's capabilities by orchestrating distributed inference across hybrid infrastructure, using autoscaling, resource-aware scheduling, and workload portability to handle variable AI traffic and form a unified, enterprise-grade compute fabric that makes AI inference accessible across industries.