Overview
Explore a technical seminar presentation by Princeton University's Abhishek Panigrahi examining the computational power of a single forward pass through transformer architectures. The talk covers two key results: first, how moderate-sized BERT-style language models learn linguistic structure and parse trees during pre-training, with insights drawn from experiments on synthetic data generated by probabilistic context-free grammars (PCFGs) and connections to the inside-outside algorithm; second, the in-context learning capabilities of large language models, studied through the Transformer in Transformer (TinT) framework, which shows how a 1.3B-parameter model can simulate and fine-tune a 125M-parameter model within a single forward pass. Learn about the implications of these findings for understanding transformer inference and for potential architectural improvements, and gain insight into the internal workings of transformer models and their ability to carry out complex computations during inference.
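To make the first thread concrete, below is a minimal sketch of the inside pass of the inside-outside algorithm for a PCFG in Chomsky normal form, the classical dynamic program the talk relates to what BERT-style models learn during pre-training. The function name, rule encoding, and toy grammar (inside_probabilities, the rule dictionaries) are illustrative assumptions, not code from the presentation:

```python
from collections import defaultdict

def inside_probabilities(words, lexical_rules, binary_rules):
    """Inside pass of the inside-outside algorithm for a PCFG in
    Chomsky normal form.

    lexical_rules: dict mapping (A, word) -> P(A -> word)
    binary_rules:  dict mapping (A, B, C) -> P(A -> B C)
    Returns beta, where beta[(i, j)][A] is the probability that
    nonterminal A derives the span words[i..j] (inclusive).
    """
    n = len(words)
    beta = defaultdict(lambda: defaultdict(float))

    # Base case: spans of length 1 come from lexical rules A -> word.
    for i, w in enumerate(words):
        for (A, word), p in lexical_rules.items():
            if word == w:
                beta[(i, i)][A] += p

    # Longer spans: combine two adjacent sub-spans via a rule A -> B C.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):  # split point between B and C
                for (A, B, C), p in binary_rules.items():
                    b = beta[(i, k)].get(B, 0.0)
                    c = beta[(k + 1, j)].get(C, 0.0)
                    if b and c:
                        beta[(i, j)][A] += p * b * c
    return beta

# Toy grammar: S -> NP VP, NP -> "dogs", VP -> "bark"
lexical = {("NP", "dogs"): 1.0, ("VP", "bark"): 1.0}
binary = {("S", "NP", "VP"): 1.0}
beta = inside_probabilities(["dogs", "bark"], lexical, binary)
print(beta[(0, 1)]["S"])  # 1.0: probability that S derives the sentence
```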
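For the second thread, TinT's actual result is a construction built entirely out of transformer layers; the sketch below does not reproduce that construction, only the input-output behavior being simulated: one gradient step of fine-tuning a smaller model on in-context examples, followed by a prediction on a query, all inside a single forward() call. The wrapper name, loss choice, and learning rate are hypothetical:

```python
import torch
import torch.nn as nn
from torch.func import functional_call, grad

class SimulatedFineTuner(nn.Module):
    """Illustrative wrapper: its forward pass (a) computes one gradient
    step for an inner model on a few "context" examples, then (b) runs
    the updated inner model on a query, with no external optimizer.
    TinT realizes an analogous computation purely within transformer
    layers; this sketch only mirrors the interface."""

    def __init__(self, inner: nn.Module, lr: float = 0.1):
        super().__init__()
        self.inner = inner
        self.lr = lr

    def forward(self, ctx_x, ctx_y, query_x):
        params = dict(self.inner.named_parameters())

        def loss_fn(p):
            pred = functional_call(self.inner, p, (ctx_x,))
            return nn.functional.mse_loss(pred, ctx_y)

        # One simulated SGD step on the context examples.
        grads = grad(loss_fn)(params)
        updated = {k: v - self.lr * grads[k] for k, v in params.items()}

        # Evaluate the "fine-tuned" inner model on the query.
        return functional_call(self.inner, updated, (query_x,))

inner = nn.Linear(4, 1)
model = SimulatedFineTuner(inner)
ctx_x, ctx_y = torch.randn(8, 4), torch.randn(8, 1)
out = model(ctx_x, ctx_y, torch.randn(2, 4))
print(out.shape)  # torch.Size([2, 1])
```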
Syllabus
Abhishek Panigrahi | On the Power of Forward Pass through Transformer Architectures
Taught by
Harvard CMSA