Where Does Negative Transfer Come From? On the Implicit Bias of SGD in Multi-Task Learning

Explore the causes of negative transfer in multi-task learning in this 29-minute conference talk by David Mueller from the Center for Language & Speech Processing at Johns Hopkins University. Examine the relationship between task conflict and negative transfer, and see that negative transfer can occur even without significant task conflict. Learn about the role of optimization temperature in negative transfer, and how poorly chosen hyperparameters, rather than inherent task incompatibility, may be responsible for suboptimal performance. Investigate how these findings connect to the implicit bias of Stochastic Gradient Descent (SGD), which suggests a preference for solutions with high gradient coherence. Consider the limitations of current explanations for negative transfer, including task conflict and implicit bias, and the resulting need for new multi-task optimization methods. Challenge conventional wisdom about neural network generalization and gain insights that could reshape approaches to multi-task learning optimization.
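
For readers new to the terminology, the short sketch below illustrates one common way to quantify task conflict: the cosine similarity between the gradients of two task losses with respect to shared parameters. It is not taken from the talk. The linear model, the helper names (`task_gradient`, `cosine_similarity`, `optimization_temperature`) and the synthetic data are all hypothetical, and treating optimization temperature as the ratio of learning rate to batch size is an assumption borrowed from common usage in the SGD literature, not a definition confirmed by this description.

```python
import numpy as np

def task_gradient(w, X, y):
    """Gradient of the loss 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    residual = X @ w - y
    return X.T @ residual / len(y)

def cosine_similarity(g1, g2, eps=1e-12):
    """Cosine of the angle between two gradients; negative values indicate conflict."""
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2) + eps))

def optimization_temperature(learning_rate, batch_size):
    """Assumed definition: learning rate divided by batch size."""
    return learning_rate / batch_size

# Synthetic setup (hypothetical): two regression tasks sharing one weight vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
w_true_a = rng.normal(size=5)
w_true_b = -w_true_a          # task B is built to oppose task A
y_a = X @ w_true_a
y_b = X @ w_true_b

w = np.zeros(5)               # shared parameters at initialization
g_a = task_gradient(w, X, y_a)
g_b = task_gradient(w, X, y_b)

conflict = cosine_similarity(g_a, g_b)   # close to -1 for this opposing pair
agreement = cosine_similarity(g_a, g_a)  # close to +1 by construction
temperature = optimization_temperature(learning_rate=0.1, batch_size=32)

print(f"cosine(task A, task B) = {conflict:.3f}")
print(f"temperature (lr / batch size) = {temperature}")
```

In this toy setup the two tasks are opposed on purpose, so their gradients point in opposite directions. The talk's claim is the more surprising converse: negative transfer can appear even when such a conflict measure is not significant.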