LLMOps: OpenVINO Toolkit int4 Quantization of Llama 3.2 3B and Inference on CPU
The Machine Learning Engineer via YouTube
Overview
Learn how to convert the Llama 3.2 3B-parameter model to OpenVINO IR format and quantize it to 4-bit integer (int4) precision. Follow along as model conversion and quantization are demonstrated step by step, then see how to run inference on a CPU using Chain-of-Thought (CoT) prompts with the optimized model. An accompanying Jupyter notebook is available for hands-on practice with the LLMOps techniques covered in this 26-minute tutorial on data science and machine learning.
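The workflow described above can be sketched with the `optimum-intel` integration for OpenVINO. This is a minimal illustration, not the notebook's actual code: the model checkpoint name, quantization parameters (`ratio`, `group_size`), and output directory are assumptions chosen for the example.

```python
# Sketch: export Llama 3.2 3B to OpenVINO IR with int4 weight quantization,
# then run CPU inference with a Chain-of-Thought prompt.
# Checkpoint id and quantization settings below are illustrative assumptions.

def build_cot_prompt(question: str) -> str:
    """Wrap a question in a simple Chain-of-Thought instruction."""
    return f"{question}\nLet's think step by step."

if __name__ == "__main__":
    from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig
    from transformers import AutoTokenizer

    model_id = "meta-llama/Llama-3.2-3B-Instruct"  # assumed checkpoint

    # int4 weight-only quantization; ratio/group_size are example values
    q_config = OVWeightQuantizationConfig(bits=4, ratio=0.8, group_size=128)

    # export=True converts the PyTorch checkpoint to OpenVINO IR on the fly;
    # inference then runs on CPU by default
    model = OVModelForCausalLM.from_pretrained(
        model_id, export=True, quantization_config=q_config
    )
    model.save_pretrained("llama32-3b-int4-ov")  # IR + compressed weights

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    prompt = build_cot_prompt(
        "If I have 3 apples and buy 2 more, how many do I have?"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

An equivalent export can also be done from the command line with `optimum-cli export openvino --weight-format int4`, loading the saved IR directory afterwards instead of re-exporting each run.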
Syllabus
LLMOps: OpenVino Toolkit quantization 4int LLama3.2 3B, Inference CPU #datascience #machinelearning
Taught by
The Machine Learning Engineer