Tokenized and Continuous Embedding Compressions of Protein Sequence and Structure
Valence Labs via YouTube
Overview
Explore a lecture on machine learning representations of proteins, focusing on the joint distribution of sequence and structure. Follow an analysis of ESMFold embeddings that uncovers massive activations and considers their implications. Learn about continuous compression schemes that significantly reduce the size of ESMFold embeddings while retaining structural information and performance on protein function benchmarks. Discover a tokenized all-atom structure vocabulary that enables high reconstruction accuracy from sequence alone. Examine the CHEAP (Compressed Hourglass Embedding Adaptations of Proteins) embeddings and the HPCT (Hourglass Protein Compression Transformer) architecture, and their potential for compact representation of protein structure and sequence. Gain insights into information content asymmetries between sequence and structure, and into the democratization of representations captured by large models. Investigate downstream applications of CHEAP embeddings, including generation, search, and prediction. The lecture concludes with a Q&A session.
Syllabus
- Introduction
- Background
- CHEAP
- Q&A
Taught by
Valence Labs