Accelerating AI/ML Inference Workloads on Kubernetes
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn about the latest developments in AI/ML inference workloads on Kubernetes in this 32-minute conference talk from CNCF. Explore how the Kubernetes Working Group Serving (WG Serving) addresses the challenges posed by generative AI through enhanced serving infrastructure. Discover the group's initiatives to optimize compute-intensive inference workloads that rely on specialized accelerators and whose requirements differ from those of traditional web services and stateful databases. Gain detailed insight into WG Serving's workstreams and ongoing developments, and learn how model server authors and practitioners can leverage Kubernetes capabilities for serving workloads. Understand how to contribute to the advancement of AI/ML inference on Kubernetes and participate in this evolving technological landscape.
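To make the accelerator-serving scenario concrete, the fragment below is a minimal sketch of how an inference workload might request a GPU on Kubernetes today. The image name and resource values are placeholders, not anything from the talk; `nvidia.com/gpu` is the standard extended-resource name exposed by the NVIDIA device plugin.

```yaml
# Illustrative sketch only: image, labels, and resource values are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
    spec:
      containers:
      - name: inference
        image: example.com/model-server:latest  # hypothetical model server image
        resources:
          limits:
            nvidia.com/gpu: 1  # one GPU, scheduled via the device plugin's extended resource
```

WG Serving's workstreams explore how serving-specific needs (startup time, accelerator sharing, autoscaling on inference metrics) go beyond what a plain Deployment like this expresses.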
Syllabus
WG Serving: Accelerating AI/ML Inference Workloads on Kubernetes - E.A. Gutierrez, Y. Tang
Taught by
CNCF [Cloud Native Computing Foundation]