Overview
Learn about Context Platform Engineering, a specialized set of skills and tools for optimizing inference systems for agent-swarm context at any scale, in this 24-minute conference talk. The session covers:

- How to design, size, and configure AI inference platforms to maximize KV-cache hit rates, which Manus AI calls the single most important metric for production-stage AI agents.
- WEKA's new open-source context platform engineering toolkit, which translates the Service Level Agreement (SLA) requirements of AI agents into Service Level Objectives (SLOs) for the agent+LLM inference platform.
- Research results from WEKA Labs providing new observability into the unit and aggregate KV-cache hit rates consumed by swarms of leading AI coding agents.
- Benchmark results for sizing agent-swarm context for arbitrary working sets, including context window sizes and per-agent latency, concurrency, and throughput SLOs across modern GPU memory hierarchies, with support for KV-cache offloading plug-ins such as vLLM/LMCache, SGLang HiCache, and NVIDIA Dynamo KVBM/NIXL.

The talk is presented by Val Bercovici, Chief AI Officer at WEKA, former CTO of NetApp/SolidFire, and a founding governing board member of the CNCF, the Kubernetes foundation within the Linux Foundation.
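To give a feel for the sizing problem the talk addresses, here is a back-of-envelope KV-cache calculation. The model dimensions below (a GQA transformer roughly shaped like Llama-3-70B: 80 layers, 8 KV heads, head dimension 128, fp16) and the swarm size are illustrative assumptions, not figures from the talk:

```python
# Back-of-envelope KV-cache sizing for an agent swarm.
# Model shape and agent counts are illustrative assumptions only.

def kv_bytes_per_token(n_layers: int, n_kv_heads: int,
                       head_dim: int, dtype_bytes: int = 2) -> int:
    """Bytes of KV cache one token occupies: a K and a V tensor
    per layer, each holding n_kv_heads * head_dim values."""
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

def swarm_kv_bytes(context_tokens: int, n_agents: int,
                   **model: int) -> int:
    """Aggregate KV-cache working set if every agent keeps a full
    context window resident (worst case, no prefix sharing)."""
    return kv_bytes_per_token(**model) * context_tokens * n_agents

# Assumed GQA model shape, loosely Llama-3-70B-like, fp16 KV.
model = dict(n_layers=80, n_kv_heads=8, head_dim=128, dtype_bytes=2)

per_token = kv_bytes_per_token(**model)       # 327,680 B = 320 KiB/token
one_agent = per_token * 131_072               # one 128k context: 40 GiB
swarm = swarm_kv_bytes(131_072, 32, **model)  # 32 agents: 1.25 TiB

print(f"per token:       {per_token / 2**10:.0f} KiB")
print(f"one agent @128k: {one_agent / 2**30:.0f} GiB")
print(f"32-agent swarm:  {swarm / 2**40:.2f} TiB")
```

At 1.25 TiB of worst-case working set for just 32 agents, the swarm far exceeds a single GPU's HBM, which is why the talk's themes of maximizing KV-cache hit rates and offloading cache across the GPU memory hierarchy (vLLM/LMCache, SGLang HiCache, NVIDIA Dynamo KVBM/NIXL) matter for sizing.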
Syllabus
Context Platform Engineering to Reduce Token Anxiety — Val Bercovici, WEKA
Taught by
AI Engineer