Code Your Own Llama 4 LLM from Scratch - Full Course

freeCodeCamp.org via freeCodeCamp

Class Central Classrooms (beta)

YouTube videos curated by Class Central.

Classroom Contents

  1. 0:00:00 Introduction to the Course
  2. 0:00:15 Llama 4 Overview and Ranking
  3. 0:00:26 Course Prerequisites
  4. 0:00:43 Course Approach for Beginners
  5. 0:01:27 Why Code Llama from Scratch?
  6. 0:02:20 Understanding LLMs and Text Generation
  7. 0:03:11 How LLMs Predict the Next Word
  8. 0:04:13 Probability Distribution of Next Words
  9. 0:05:11 The Role of Data in Prediction
  10. 0:05:51 Probability Distribution and Word Prediction
  11. 0:08:01 Sampling Techniques
  12. 0:08:22 Greedy Sampling
  13. 0:09:09 Random Sampling
  14. 0:09:52 Top-K Sampling
  15. 0:11:02 Temperature Sampling for Controlling Randomness
  16. 0:12:56 What Are Tokens?
  17. 0:13:52 Tokenization Example: "Hello world"
  18. 0:14:30 How LLMs Learn Semantic Meaning
  19. 0:15:23 Token Relationships and Context
  20. 0:17:17 The Concept of Embeddings
  21. 0:21:37 Tokenization Challenges
  22. 0:22:15 Large Vocabulary Size
  23. 0:23:28 Handling Misspellings and New Words
  24. 0:28:42 Introducing Subword Tokens
  25. 0:30:16 Byte Pair Encoding (BPE) Overview
  26. 0:34:11 Understanding Vector Embeddings
  27. 0:36:59 Visualizing Embeddings
  28. 0:40:50 The Embedding Layer
  29. 0:45:31 Token Indexing and Swapping Embeddings
  30. 0:48:10 Coding Your Own Tokenizer
  31. 0:49:41 Implementing Byte Pair Encoding
  32. 0:52:13 Initializing Vocabulary and Pre-tokenization
  33. 0:55:12 Splitting Text into Words
  34. 1:01:57 Calculating Pair Frequencies
  35. 1:06:35 Merging Frequent Pairs
  36. 1:10:04 Updating Vocabulary and Tokenization Rules
  37. 1:13:30 Implementing the Merges
  38. 1:19:52 Encoding Text with the Tokenizer
  39. 1:26:07 Decoding Tokens Back to Text
  40. 1:33:05 Self-Attention Mechanism
  41. 1:37:07 Query, Key, and Value Vectors
  42. 1:40:13 Calculating Attention Scores
  43. 1:41:50 Applying Softmax
  44. 1:43:09 Weighted Sum of Values
  45. 1:45:18 Self-Attention Matrix Operations
  46. 1:53:11 Multi-Head Attention
  47. 1:57:55 Implementing Self-Attention
  48. 2:10:40 Masked Self-Attention
  49. 2:37:09 Rotary Positional Embeddings (RoPE)
  50. 2:38:08 Understanding Positional Information
  51. 2:40:58 How RoPE Works
  52. 2:49:03 Implementing RoPE
  53. 2:56:47 Feed-Forward Networks (FFN)
  54. 2:58:50 Linear Layers and Activations
  55. 3:02:19 Implementing FFN
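The sampling chapters listed above (greedy, top-K, and temperature sampling) can be sketched in a few lines of Python. The toy probability distribution and function names below are illustrative assumptions for this listing, not code taken from the course itself:

```python
import math
import random

# Hypothetical next-word distribution an LLM might assign after a prompt
# such as "The cat sat on the" (illustrative values only).
probs = {"mat": 0.55, "sofa": 0.20, "floor": 0.15, "moon": 0.07, "piano": 0.03}

def greedy_sample(probs):
    # Greedy sampling: always pick the single most likely next word.
    return max(probs, key=probs.get)

def top_k_sample(probs, k=3):
    # Top-K sampling: keep only the k most likely words,
    # renormalize their probabilities, then draw one at random.
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    words = [w for w, _ in top]
    weights = [p / total for _, p in top]
    return random.choices(words, weights=weights)[0]

def apply_temperature(probs, temperature=1.0):
    # Temperature rescales the log-probabilities before re-normalizing:
    # temperature < 1 sharpens the distribution (more deterministic),
    # temperature > 1 flattens it (more random).
    logits = {w: math.log(p) / temperature for w, p in probs.items()}
    z = sum(math.exp(l) for l in logits.values())
    return {w: math.exp(l) / z for w, l in logits.items()}

print(greedy_sample(probs))          # "mat"
print(top_k_sample(probs, k=3))      # one of the 3 likeliest words
cool = apply_temperature(probs, temperature=0.5)
print(cool["mat"])                   # sharpened: larger than the original 0.55
```

At temperature 0.5 the probability mass concentrates on "mat", while a temperature above 1 would spread mass toward the unlikely words, which is the trade-off the course's temperature chapter explores.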
