Exploring Geometric Representational Disparities Between Multilingual and Bilingual Translation Models

Dive into a 40-minute conference talk presented by Neha Verma, a third-year PhD student at the Center for Language & Speech Processing (CLSP) at Johns Hopkins University. Explore the geometric differences between the representations learned by bilingual translation models and those learned by one-to-many multilingual translation models. Examine how these models use the dimensions of their underlying vector space, as measured by isotropy metrics such as intrinsic dimensionality and IsoScore. Learn that, for a given language pair, multilingual model decoder representations are consistently less isotropic than those of the bilingual counterpart. Investigate how much of this anisotropy in multilingual decoder representations can be attributed to modeling language-specific information, which potentially limits the remaining representational capacity. Gain insights into what these disparities imply for parameter efficiency and overall performance in multilingual machine translation, particularly in one-to-many translation settings.