On Memorization of Large Language Models in Logical Reasoning
Overview
Watch a 48-minute research talk by Google Research scientist Chiyuan Zhang at the Simons Institute exploring how large language models (LLMs) handle logical reasoning tasks. Dive into findings on the complex relationship between memorization and genuine reasoning in LLMs, drawn from experiments with Knights and Knaves puzzles. Learn how these models can achieve near-perfect accuracy on training puzzles through memorization yet struggle with slight variations, while still showing improved generalization after fine-tuning. Examine detailed analyses, including perturbation tests, transfer across difficulty levels, model probing, and experiments with incorrect answers, that reveal how LLMs balance memorization against actual reasoning when solving logical problems. Understand the implications of per-sample memorization scoring for determining when models rely on memorized patterns versus engaging in true logical reasoning.
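For readers unfamiliar with the puzzle format the talk builds on, the sketch below shows a minimal brute-force Knights and Knaves solver in Python, plus a hypothetical per-sample memorization check in the spirit of the overview: flag a training puzzle as "memorized" when the model answers the original correctly but fails a locally perturbed variant. All names (solve_kk, looks_memorized) and the flagging rule are illustrative assumptions, not the talk's actual code or its exact memorization score.

```python
from itertools import product
from typing import Callable, Dict, List

# A statement maps a full knight/knave assignment to the claim's truth value.
Statement = Callable[[Dict[str, bool]], bool]

def solve_kk(people: List[str],
             statements: Dict[str, Statement]) -> List[Dict[str, bool]]:
    """Brute-force all knight/knave assignments and keep the consistent ones.

    Knights always tell the truth and knaves always lie, so an assignment
    is consistent iff each speaker's claim is true exactly when the
    speaker is a knight.
    """
    solutions = []
    for values in product([True, False], repeat=len(people)):
        assignment = dict(zip(people, values))  # True = knight, False = knave
        if all(statements[p](assignment) == assignment[p] for p in people):
            solutions.append(assignment)
    return solutions

def looks_memorized(original_correct: bool,
                    perturbed_correct: List[bool]) -> bool:
    """Illustrative per-sample proxy (an assumption, not the talk's exact
    score): a training puzzle looks memorized when the model solves the
    original but fails at least one locally perturbed variant."""
    return original_correct and not all(perturbed_correct)

# Example: A says "B is a knave"; B says "A and I are the same kind."
puzzle: Dict[str, Statement] = {
    "A": lambda a: not a["B"],        # truth value of A's claim
    "B": lambda a: a["A"] == a["B"],  # truth value of B's claim
}
print(solve_kk(["A", "B"], puzzle))
# -> [{'A': True, 'B': False}]: A is a knight, B is a knave.
```

Under these assumptions, one would generate perturbed variants of a training puzzle (e.g. flipping one statement or renaming a character), recompute ground truth with solve_kk, and pass the model's per-variant correctness into looks_memorized; puzzles it flags are candidates where the model may be recalling a seen answer rather than reasoning.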
Syllabus
On Memorization of Large Language Models in Logical Reasoning
Taught by
Simons Institute