Get 20% off all career paths from fullstack to AI
Learn AI, Data Science & Business — Earn Certificates That Get You Hired
Overview
AI, Data Science & Cloud Certificates from Google, IBM & Meta — 40% Off
One plan covers every Professional Certificate on Coursera. 40% off Coursera Plus Annual.
Unlock All Certificates
Learn how to extend the context length of Large Language Models (LLMs) during inference through a technical deep dive video that introduces grouped self-attention as an alternative to classical transformer self-attention mechanisms. Explore the challenges of out-of-distribution issues related to positional encoding when LLMs process text sequences beyond their pre-training context window. Examine implementation details, smooth transition techniques, and benchmark data while following along with code demonstrations based on the research paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning." Master practical solutions for handling longer sequences in neural networks without requiring model retraining or fine-tuning.
Syllabus
Introduction
Theory
Main idea
Implementation
SelfExtend LLM
Deep Dive
Smooth Transition
Benchmark Data
Publication
Code Implementation
Taught by
Discover AI