Combining Formal and Informal Information in Bayesian Program Analysis via Soft Evidences

Watch this 16-minute conference presentation from OOPSLA 2025 introducing a neural-symbolic approach to program analysis that combines formal analysis techniques with informal information through Bayesian inference. Learn how researchers Tianchi Li and Xin Zhang from Peking University convert a conventional Datalog program analysis into a probabilistic one by attaching probabilities to analysis rules, so that the system ranks potential alarms by likelihood instead of simply flagging them. Discover how neural networks estimate the probability of analysis facts from informal information such as variable names and string constants, and how these estimates are encoded as "soft evidences", essentially noisy sensors that feed additional context into the probabilistic analysis. Explore the applications demonstrated in the talk: a pointer analysis on Java benchmarks improved using variable name information, and a taint analysis for Android applications that considers inter-component communication. For true alarms in the pointer analysis, the reported ranking improvements are 55.4% in inversion count, 44.9% in mean rank, and 58% in median rank. Examine how the soft evidence mechanism generalizes across analysis types, including demonstrations on taint analysis and interval analysis for C programs using dynamic execution information, establishing a systematic framework for incorporating human-readable program features into formal analysis tools.
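
To make the "noisy sensor" idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the rule `alarm :- a, b`, the function name `posterior_alarm`, and every probability below are hypothetical, chosen only for illustration. It assumes a single derivation rule that fires with probability `rule_prob` when both input facts hold, and it models a neural network's confidence in fact `a` as a sensor that has been observed to fire.

```python
from itertools import product


def posterior_alarm(prior_a, prior_b, rule_prob, sensor_a=None):
    """Return P(alarm | soft evidence) by brute-force enumeration.

    Hypothetical model: alarm :- a, b, where the rule fires with
    probability `rule_prob` when both facts hold, and never otherwise.

    `sensor_a` is an assumed neural-network confidence that fact `a`
    holds. It is treated as a noisy sensor observed to fire, with
    P(sensor=1 | a=1) = sensor_a and P(sensor=1 | a=0) = 1 - sensor_a.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b in product([0, 1], repeat=2):
        # Prior weight of this assignment to the input facts.
        weight = (prior_a if a else 1 - prior_a) * (prior_b if b else 1 - prior_b)
        # Soft evidence multiplies in the sensor likelihood for `a`.
        if sensor_a is not None:
            weight *= sensor_a if a else 1 - sensor_a
        p_alarm = rule_prob if (a and b) else 0.0
        numerator += weight * p_alarm
        denominator += weight
    return numerator / denominator


# Two hypothetical alarms that are identical except for what the
# (assumed) network says about their supporting fact `a`.
alarms = {
    "alarm_1": posterior_alarm(0.5, 0.5, 0.9, sensor_a=0.9),
    "alarm_2": posterior_alarm(0.5, 0.5, 0.9, sensor_a=0.1),
}
ranking = sorted(alarms, key=alarms.get, reverse=True)
print(ranking)
```

Without a sensor both alarms would receive the same probability and could not be separated. With the sensor attached, the alarm whose supporting fact the network considers plausible moves up the ranking, which is the effect the presentation describes at a much larger scale.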