Building LLM-Based Apps with Llama 3 at 1,000 Tokens per Second on the SambaNova AI Platform
AI Engineer via YouTube
Overview
Learn to build high-performance LLM applications using Llama 3 at 1,000 tokens per second on SambaNova's AI platform in this intermediate-level workshop. Discover SambaNova's full-stack generative AI platform powered by the SN40L AI chip and explore Samba-1, a trillion-parameter composition of experts (CoE) model designed for enterprise applications.

Build and deploy an end-to-end question-answering system with retrieval-augmented generation (RAG) for enterprise search, using a comprehensive technology stack: the LangChain framework, Unstructured for text preprocessing, E5-large-v2 embeddings, the ChromaDB vector store, and the Llama-3-8B-Instruct model. Gain hands-on experience through practical exercises with provided GitHub repositories, step-by-step Jupyter notebooks, and Streamlit applications, using SambaNova API keys for both the CoE and Llama 3 endpoints.

Designed for tech professionals and engineers interested in enterprise generative AI applications, this workshop requires programming experience (preferably Python), a GitHub account, and a laptop for the practical components.
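The RAG pipeline described above can be sketched in miniature. This is a self-contained illustration only: the toy bag-of-words embedding stands in for E5-large-v2, the in-memory store stands in for ChromaDB, and the final prompt assembly stands in for a call to Llama-3-8B-Instruct via the SambaNova endpoint (all class and function names here are hypothetical, not the workshop's code).

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; the workshop uses E5-large-v2 instead."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class ToyVectorStore:
    """In-memory vector store standing in for ChromaDB."""

    def __init__(self):
        self.docs = []  # list of (embedding, original text)

    def add(self, text: str) -> None:
        self.docs.append((embed(text), text))

    def query(self, question: str, k: int = 1) -> list:
        q = embed(question)
        ranked = sorted(self.docs, key=lambda d: cosine(d[0], q), reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(question: str, store: ToyVectorStore) -> str:
    """Assemble a RAG prompt. In the workshop, this prompt would be sent
    to Llama-3-8B-Instruct through the SambaNova API; here we only build it."""
    context = "\n".join(store.query(question))
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    store = ToyVectorStore()
    store.add("SambaNova builds the SN40L AI chip")
    store.add("Paris is the capital of France")
    print(build_prompt("Which chip does SambaNova build?", store))
```

The real workshop swaps each stand-in for a production component (Unstructured for chunking, E5-large-v2 for embeddings, ChromaDB for storage, LangChain to wire it together), but the retrieve-then-prompt flow is the same.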
Syllabus
[Full Workshop] Llama 3 at 1,000 tok/s on the SambaNova AI Platform
Taught by
AI Engineer