Overview
Explore, in this research presentation, how pure Transformer architectures can effectively model molecular structures and properties without relying on predefined graph representations or physical priors. Learn about a study that challenges the dominance of Graph Neural Networks (GNNs) in molecular machine learning by demonstrating that unmodified Transformers trained directly on Cartesian coordinates can achieve competitive performance in predicting molecular energies and forces. Discover how Transformers naturally learn physically consistent patterns, including attention weights that decay with interatomic distance, while retaining flexibility across different molecular environments because no biases are hard-coded into the architecture. Examine the comparison between Transformers and state-of-the-art equivariant GNNs on the OMol25 dataset, which shows that Transformers can match GNN performance under equivalent training compute budgets. Finally, consider the implications of scaling behavior that follows the empirical scaling laws observed in other domains, and how this research points toward standardized, scalable architectures for molecular modeling that may eliminate the need for hard-coded graph inductive biases in drug discovery applications.
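To make the core idea concrete, here is a minimal, hypothetical PyTorch sketch (not the study's actual model or code) of the setup the talk describes: atomic numbers and raw Cartesian coordinates are fed into an off-the-shelf Transformer encoder with no neighbor lists, cutoffs, or equivariance constraints, and forces are recovered as the negative gradient of the predicted energy. All class, variable, and parameter names below are invented for illustration.

```python
import torch
import torch.nn as nn

class PlainTransformerEnergyModel(nn.Module):
    """Hypothetical sketch: an unmodified Transformer encoder reading raw
    Cartesian coordinates per atom. No graph is ever constructed; every
    atom attends to every other atom."""

    def __init__(self, num_elements=100, d_model=256, nhead=8, num_layers=6):
        super().__init__()
        self.element_embed = nn.Embedding(num_elements, d_model)
        self.coord_proj = nn.Linear(3, d_model)  # raw (x, y, z) -> d_model
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.energy_head = nn.Linear(d_model, 1)

    def forward(self, atomic_numbers, coords):
        # atomic_numbers: (batch, n_atoms) long; coords: (batch, n_atoms, 3)
        x = self.element_embed(atomic_numbers) + self.coord_proj(coords)
        h = self.encoder(x)  # full all-pairs attention, no distance cutoff
        # Sum per-atom contributions into a total energy per molecule
        return self.energy_head(h).sum(dim=1).squeeze(-1)

model = PlainTransformerEnergyModel()
z = torch.randint(1, 20, (2, 12))                # two 12-atom molecules
pos = torch.randn(2, 12, 3, requires_grad=True)  # Cartesian positions
energy = model(z, pos)
# Forces as the negative gradient of energy w.r.t. atomic positions
forces = -torch.autograd.grad(energy.sum(), pos)[0]
```

Because nothing in this architecture encodes interatomic distance, any distance-dependent attention pattern the trained model exhibits must be learned from data, which is the observation the presentation highlights.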
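The "predictable scaling" claim refers to the empirical power-law relationship between loss and training compute familiar from language modeling. As a rough illustration of the functional form (the constants here are generic fit parameters, not values reported in the talk):

```latex
% Empirical compute scaling law: validation loss L as a function of
% training compute C; a and b are constants fit to measured runs.
L(C) \approx a \, C^{-b}, \qquad a, b > 0
```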
Syllabus
Transformers Discover Molecular Structure Without Graph Priors | Tobias Kreiman
Taught by
Valence Labs