Manage Cloud Native LLM Workloads Across Edge and Cloud Seamlessly Using KubeEdge and WasmEdge
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
This conference talk explores how to deploy Large Language Models (LLMs) beyond data centers to edge devices by integrating KubeEdge and WasmEdge. Learn how this combination addresses key challenges in edge AI deployment, including maintaining inference accuracy on resource-constrained devices and simplifying deployment across heterogeneous hardware. Discover how WasmEdge provides a lightweight, portable runtime under 50 MB with no external dependencies, while KubeEdge Sedna orchestrates edge-cloud collaboration by monitoring inference accuracy and automatically routing requests to cloud-based models when needed. See a demonstration of how small LLMs deliver fast local inference at the edge, with seamless escalation to larger cloud models when higher accuracy is required. The presenters showcase how inference workloads written in Rust and compiled to WebAssembly can be deployed across edge and cloud environments without modification. This solution has been deployed in production across multiple industries, including aerospace and banking.
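The edge-cloud collaboration described above follows a confidence-based fallback pattern: the small edge model answers first, and the request is escalated to the larger cloud model only when the local result is not confident enough. The following is a minimal sketch of that routing logic in Rust; the function names, the stub models, and the length-based confidence heuristic are all hypothetical illustrations, not the actual KubeEdge Sedna or WasmEdge APIs.

```rust
#[derive(Debug)]
struct InferenceResult {
    answer: String,
    confidence: f32, // model's self-reported confidence in [0, 1]
}

// Hypothetical stub for a small LLM served locally by WasmEdge at the edge.
fn edge_infer(prompt: &str) -> InferenceResult {
    // Toy heuristic: pretend short prompts are easy for the small model.
    let confidence = if prompt.len() < 20 { 0.95 } else { 0.40 };
    InferenceResult {
        answer: format!("edge answer to '{prompt}'"),
        confidence,
    }
}

// Hypothetical stub for a larger cloud-hosted model.
fn cloud_infer(prompt: &str) -> InferenceResult {
    InferenceResult {
        answer: format!("cloud answer to '{prompt}'"),
        confidence: 0.99,
    }
}

// Sedna-style joint inference: serve locally when the edge model is
// confident enough, otherwise escalate the request to the cloud.
fn infer_with_fallback(prompt: &str, threshold: f32) -> InferenceResult {
    let local = edge_infer(prompt);
    if local.confidence >= threshold {
        local
    } else {
        cloud_infer(prompt)
    }
}

fn main() {
    println!("{:?}", infer_with_fallback("2 + 2?", 0.8));
    println!("{:?}", infer_with_fallback("Summarize this 40-page report", 0.8));
}
```

In the real system the same Wasm binary runs unchanged at both tiers; only the model it is paired with differs, which is what makes the fallback transparent to the caller.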
Syllabus
Manage Cloud Native LLM Workloads Across Edge and Cloud Seamlessly Using KubeE... Vivian Hu & Fei Xu
Taught by
CNCF [Cloud Native Computing Foundation]