Overview
Explore the practical implementation of large language model inference directly on mobile devices within React Native applications in this 31-minute conference talk. Discover why development teams are shifting inference to on-device processing, focusing on benefits like improved reliability, enhanced privacy protection, and reduced latency. Learn about the critical constraints that model size imposes, and examine OS-provisioned alternatives available across different platforms. Compare acceleration options including GPU, NPU, and CPU implementations, understanding the specific tradeoffs between iOS and Android environments. Gain insights into debugging performance bottlenecks within abstraction layers such as OpenCL, and evaluate inference frameworks including TensorFlow Lite, ONNX Runtime, ExecuTorch, MLC, llama.cpp, and Apple's native solutions. Master practical optimization techniques, including quantization strategies and compile-time improvements, that can significantly enhance performance. Understand the current landscape of on-device AI in React Native, including which solutions are production-ready today and which areas still require careful consideration of tradeoffs when building mobile AI applications.
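To make this concrete, below is a minimal sketch of on-device text generation, assuming the llama.rn bindings for llama.cpp (one of the frameworks mentioned above). The model path, context size, and sampling values are illustrative assumptions; the API names follow llama.rn's documented usage, not material from the talk itself.

import { initLlama } from 'llama.rn';

async function runLocalCompletion(prompt: string): Promise<string> {
  // Load a quantized GGUF model shipped with the app or downloaded at
  // runtime. Quantization (e.g. Q4_K_M) is what lets billion-parameter
  // models fit within mobile memory limits.
  const context = await initLlama({
    model: 'file:///path/to/model-q4_k_m.gguf', // hypothetical path
    n_ctx: 2048,      // context window; larger values cost more memory
    n_gpu_layers: 99, // offload layers to Metal on iOS; CPU fallback elsewhere
  });

  // Stream tokens as they decode; with no network round-trip involved,
  // latency depends only on the device's compute.
  const { text } = await context.completion(
    {
      prompt,
      n_predict: 128,  // cap on generated tokens
      temperature: 0.7,
    },
    (data) => {
      // Partial-result callback: each call carries one newly decoded token.
      console.log(data.token);
    },
  );

  await context.release(); // free native memory once finished
  return text;
}

The other frameworks listed in the talk expose the same moving parts in different shapes: load a quantized model, pick an accelerator (GPU, NPU, or CPU), and stream tokens back to the UI.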
Syllabus
LLM Inference On-Device in React Native: The Practical Aspects by Artur Morys-Magiera
Taught by
Callstack Engineers