Improving LLM Generalization by Selecting and Synthesizing Data

Watch a technical lecture by Stanford Assistant Professor Tatsunori Hashimoto on improving language model generalization through data selection and synthesis. Discover methods for controlling the gap between pretraining and evaluation: algorithmically filtering training data to focus on benchmark-relevant distributions, and adapting to new domains through synthetic data generation. Learn from Hashimoto's expertise in statistical machine learning and natural language processing, including his work on instruction-following, controllable language models, differentially private fine-tuning, and benchmarks for LM safety and capabilities. His honors include the NSF CAREER award and best paper awards at ICML, ICLR, and CHI.