Self-(In)Correct: LLMs Struggle with Discriminating Self-Generated Responses
Center for Language & Speech Processing (CLSP), JHU via YouTube
Overview
Watch a 12-minute research presentation from Johns Hopkins University's Center for Language & Speech Processing that examines whether Large Language Models (LLMs) can effectively improve their own outputs through self-discrimination. Learn about a unified framework developed to compare the generative and discriminative capabilities of LLMs, and discover key findings that challenge the assumption that these models can enhance their performance through self-judgment alone. Explore experimental analyses conducted on a range of open-source and industrial LLMs, which reveal that models do not consistently discriminate between previously generated alternatives any better than they generate initial responses. Presented by researcher Dongwei Jiang, this talk discusses findings from the paper investigating the limitations of LLMs in self-correction and discrimination tasks.
Syllabus
Self-(In)Correct: LLMs Struggle with Discriminating Self-Generated Responses --- AAAI 2025
Taught by
Center for Language & Speech Processing (CLSP), JHU