Overview
This talk by Grigorios Chrysos from the University of Wisconsin-Madison explores the fundamental question of whether neural networks truly need activation functions. Examine how high-order interactions among input elements might provide sufficient expressivity for complex tasks without traditional activation functions. Learn about the challenges activation functions pose for deep learning theory, network dynamics analysis, interpretability, and privacy. Discover research findings on networks that achieve strong performance in demanding static tasks like ImageNet recognition and sequence-to-sequence tasks such as arithmetic operations and language modeling without conventional activation functions. Part of the "The Future of Language Models and Transformers" series at the Simons Institute, this 56-minute presentation challenges core assumptions about neural network architecture design.
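The idea of replacing activation functions with high-order interactions can be illustrated with a minimal sketch. The layer below (an assumed toy construction, not the speaker's exact model) combines a linear term with a second-order bilinear term, so pairwise products of input elements supply the nonlinearity instead of an activation function:

```python
import numpy as np

def polynomial_layer(x, W1, W2, b):
    """A second-order layer with no activation function.

    Output o-th unit: (W1 @ x)[o] + x^T W2[o] x + b[o].
    The quadratic term captures pairwise interactions x_i * x_j,
    providing nonlinearity without any elementwise activation.
    """
    linear = W1 @ x
    quadratic = np.einsum('oij,i,j->o', W2, x, x)
    return linear + quadratic + b

rng = np.random.default_rng(0)
d_in, d_out = 4, 3
x = rng.normal(size=d_in)
W1 = rng.normal(size=(d_out, d_in))
W2 = rng.normal(size=(d_out, d_in, d_in))  # one bilinear form per output unit
b = np.zeros(d_out)

y = polynomial_layer(x, W1, W2, b)
print(y.shape)  # (3,)
```

Stacking such layers yields a polynomial of increasing degree in the input, which is one way networks can remain expressive on tasks like those discussed in the talk without conventional activations.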
Syllabus
Towards sequence-to-sequence models without activation functions
Taught by
Simons Institute