

GPULlama3 Java - Beyond CPU Inference with Modern Java

Devoxx via YouTube

Overview

Explore GPU-accelerated Large Language Model inference in Java through this conference talk, which demonstrates how to leverage modern JDK features and TornadoVM for high-performance AI applications. Learn to implement local LLM inference using Java 21+'s Vector API and projects such as JLama and llama3.java, moving beyond traditional CPU-only approaches without requiring Python or specialized runtimes.

Discover GPULlama3.java, an open-source framework that extends llama3.java with TornadoVM integration to offload inference computation to GPUs while maintaining full Java compatibility. Master techniques for enabling half-precision data types in the JVM, expressing GPU-optimized matrix operations, implementing fast Flash Attention algorithms, and ensuring compatibility with popular open-source models including Llama 2/3, Gemma, and Mistral.

Understand how to integrate with LangChain4j for seamless GPU execution in Java-based inference engines, and watch live demonstrations running on diverse hardware from Apple Silicon to high-end NVIDIA GPUs. Gain practical insights into using TornadoVM's profiling and analysis tools to evaluate GPU performance during inference, providing a complete roadmap for building scalable AI applications on the JVM with modern acceleration techniques in a fully Java-native stack.
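To make the half-precision idea concrete, here is a minimal sketch of one technique the talk touches on: storing model weights as FP16 bit patterns on the JVM (halving memory footprint) while accumulating in FP32. It uses only the standard-library `Float.floatToFloat16`/`Float.float16ToFloat` methods introduced in JDK 20; the class and method names are illustrative, not taken from GPULlama3.java itself.

```java
// Sketch: FP16 weight storage on the JVM (JDK 20+).
// Weights are kept as short[] (IEEE 754 binary16 bit patterns)
// and widened to float on the fly; class/method names are hypothetical.
public final class HalfPrecisionDemo {

    // Convert an FP32 weight row to FP16 bit patterns (halves memory).
    static short[] toFp16(float[] w) {
        short[] h = new short[w.length];
        for (int i = 0; i < w.length; i++) {
            h[i] = Float.floatToFloat16(w[i]);
        }
        return h;
    }

    // Dot product reading FP16 weights, accumulating in FP32 for accuracy.
    static float dotFp16(short[] w, float[] x) {
        float acc = 0f;
        for (int i = 0; i < w.length; i++) {
            acc += Float.float16ToFloat(w[i]) * x[i];
        }
        return acc;
    }

    public static void main(String[] args) {
        // These values are exactly representable in FP16,
        // so the round-trip is lossless here.
        float[] w = {0.5f, -1.25f, 2.0f};
        float[] x = {2.0f, 4.0f, 1.0f};
        float y = dotFp16(toFp16(w), x);
        System.out.println(y); // 0.5*2 - 1.25*4 + 2*1 = -2.0
    }
}
```

In a real inference engine this per-element conversion would be vectorized (e.g. via the Vector API) or offloaded by TornadoVM, but the storage trick is the same.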

Syllabus

GPULlama3.java: Beyond CPU Inference with Modern Java, by Michalis Papadimitriou

Taught by

Devoxx

Reviews

Start your review of GPULlama3 Java - Beyond CPU Inference with Modern Java
