Data Reliability Scoring

Learn about data reliability scoring methods for assessing how trustworthy a dataset is in this conference talk from Harvard's Center of Mathematical Sciences and Applications. Explore the challenge of evaluating data quality when a dataset may be noisy, biased, or strategically manipulated and no ground truth is available. Discover the Gram Determinant Score, a novel reliability measure that uses only the reported data and auxiliary observations to assess how well a dataset aligns with the unobserved truth. Examine the theoretical foundations and provable guarantees of this score, including its ability to preserve natural reliability orderings. Review experimental results demonstrating that the score captures data quality in synthetic noise scenarios and in applications to contrastive learning embeddings. Gain insights into the challenges of strategic data reporting and into statistical approaches to reliability assessment for data-driven decision-making.