
YouTube

Running Open Source LLMs Locally on RTX 5090 - Performance and Capabilities

MattVidPro AI via YouTube

Overview

Explore the capabilities of running powerful open-source large language models (LLMs) locally on an RTX 5090 GPU in this 33-minute video. Learn how to use LM Studio to run models like DeepSeek R1 and Gemma 3 27B entirely on your own hardware, with no cloud dependencies. The demonstration covers text generation and multimodal AI applications, and discusses future possibilities for local video and image generation. As part 2 of a three-part series on the RTX 5090, this tutorial provides a comprehensive walkthrough from initial setup to pushing the limits with 32B-parameter models, analyzing performance metrics, and testing capabilities including humor comprehension and meme analysis. Timestamps guide you through each section, from basic installation to advanced model configuration and real-world applications.
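Beyond chatting in LM Studio's own interface, the app can also serve loaded models through a local OpenAI-compatible HTTP API (by default at http://localhost:1234/v1), which lets you script against them. The sketch below is a minimal, hedged example of that workflow; the model identifier shown is an assumption and should be replaced with whatever model you have loaded in LM Studio.

```python
# Minimal sketch: querying a model served by LM Studio's local
# OpenAI-compatible server (default base URL http://localhost:1234/v1).
# The model name below is a placeholder assumption -- substitute the
# identifier of the model you actually loaded in LM Studio.
import json
import urllib.request


def build_chat_request(prompt,
                       model="deepseek-r1-distill-qwen-32b",  # assumed name
                       base_url="http://localhost:1234/v1"):
    """Build an OpenAI-style chat-completions request for the local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def ask(prompt):
    """Send the prompt; requires LM Studio running with its server enabled."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(ask("In one sentence, why run LLMs locally instead of in the cloud?"))
```

Because everything runs on localhost, the prompt and the model's response never leave your machine, which is the core appeal of the local setup demonstrated in the video.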

Syllabus

00:00 Introduction to Running LLMs Locally
01:27 Setting Up LM Studio
01:58 Testing DeepSeek R1 on RTX 5090
02:42 Exploring Model Settings and Performance
03:48 Generating Content with DeepSeek R1
06:19 Loading Larger Models
09:44 Pushing the Limits with 32B Models
12:47 Reflections on Local AI Performance
16:55 Introduction to Gemma 3 27B
17:15 Setting Up the Model
18:00 First Impressions and Performance
18:47 Roasting with Gemma
22:48 Analyzing Memes and Humor
27:53 Exploring Smaller LLMs
30:49 Conclusion and Future Plans

Taught by

MattVidPro AI

