Overview
Explore how the NNTrainer open-source project achieves memory-efficient Large Language Model (LLM) inference on edge devices in this 26-minute conference talk. Discover how NNTrainer, originally optimized for training neural networks on memory-constrained devices, repurposes its battle-proven memory schedulers and memory-storage cooperation infrastructure to enable larger LLMs to run on smaller devices.

Learn about NNTrainer's key technologies for minimizing memory footprint, including memory-efficient tensor scheduling that intelligently manages memory resources and a novel approach that leverages flash storage as auxiliary memory to extend capacity for larger models. See practical examples of running LLMs on devices using NNTrainer's optimization techniques.

Understand how this Linux Foundation AI & Data project, currently part of the NNStreamer organization and under review to become an independent LFAI project, contributes to making advanced AI models more accessible on resource-limited hardware through innovative memory management solutions.
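The "flash storage as auxiliary memory" idea described above can be sketched in a few lines: keep model weights in a file on flash and memory-map individual layers on demand, so resident RAM stays proportional to the layers currently executing rather than the whole model. This is a minimal, hypothetical illustration of the general technique using NumPy's `memmap`; it is not NNTrainer's actual API, and the function and layer names are invented for the example.

```python
import os
import tempfile
import numpy as np

def save_weights(path, weights):
    """Persist each layer's weights contiguously to one file on flash.

    Returns a table mapping layer name -> (byte offset, shape, dtype)
    so layers can later be mapped back in individually.
    """
    offsets = {}
    pos = 0
    with open(path, "wb") as f:
        for name, w in weights.items():
            f.write(w.tobytes())
            offsets[name] = (pos, w.shape, w.dtype)
            pos += w.nbytes
    return offsets

def load_layer(path, offsets, name):
    """Memory-map a single layer's weights from flash.

    The OS pages the data in lazily, so only the layer actually being
    executed occupies physical memory - the core of using storage as
    auxiliary memory for models larger than RAM.
    """
    pos, shape, dtype = offsets[name]
    return np.memmap(path, dtype=dtype, mode="r", offset=pos, shape=shape)

# Hypothetical two-layer "model" small enough to demonstrate the flow.
weights = {
    "layer0": np.arange(6, dtype=np.float32).reshape(2, 3),
    "layer1": np.ones((3, 2), dtype=np.float32),
}
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
offsets = save_weights(path, weights)

# Map one layer at a time instead of loading the full model into RAM.
w0 = load_layer(path, offsets, "layer0")
w1 = load_layer(path, offsets, "layer1")
```

A real scheduler (as the talk suggests) would additionally decide *when* to map and release each tensor during the inference graph's execution, overlapping flash reads with computation; the sketch above only shows the storage-backed tensor part.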
Syllabus
Memory-Efficient LLM Inference on Edge Devices With NNTrainer - Eunju Yang & Donghak Park
Taught by
Linux Foundation