AutoFeedback - Scaling Human Feedback with Custom Evaluation Models

Explore AutoFeedback, a system for scaling human feedback in LLM applications, in this 40-minute conference talk by Arjun Bansal, CEO and co-founder of Log10. The talk covers custom evaluation models that combine the strengths of human and model-based evaluation to make LLM evaluation more efficient and more accurate. Learn how these models, built with in-context learning and fine-tuning, achieved a 44% reduction in absolute error on a 7-point grading task. See how the models generate explanations for their grades, which makes the grading more transparent and interpretable. Understand the synthetic bootstrapping procedure that enables fine-tuning with as few as 25-50 human-labeled examples, approaching the accuracy of models trained on larger datasets while costing more than 10x less than human annotation.
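
To make the headline metric concrete, here is a minimal sketch of how a reduction in absolute error on a 7-point grading task could be computed. It assumes the figure is a relative reduction in mean absolute error against human grades; the description does not state the exact definition or the baseline, so treat that as an assumption. The function names and the toy grades below are hypothetical and are not results from the talk.

```python
def mean_absolute_error(predicted, reference):
    """Average absolute gap between model grades and human grades."""
    if len(predicted) != len(reference):
        raise ValueError("grade lists must have the same length")
    if not reference:
        raise ValueError("grade lists must not be empty")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error is lower than baseline_error."""
    if baseline_error == 0:
        raise ValueError("baseline error must be non-zero")
    return (baseline_error - new_error) / baseline_error


# Hypothetical grades on a 1-7 scale (illustrative only).
human_grades = [7, 5, 3, 6, 2, 4]      # reference labels from human reviewers
baseline_grades = [5, 5, 5, 4, 4, 4]   # e.g. an off-the-shelf model used as a judge
custom_grades = [6, 5, 4, 6, 2, 4]     # e.g. a custom evaluation model

baseline_mae = mean_absolute_error(baseline_grades, human_grades)
custom_mae = mean_absolute_error(custom_grades, human_grades)
reduction = relative_reduction(baseline_mae, custom_mae)

print(f"baseline MAE: {baseline_mae:.3f}")
print(f"custom MAE:   {custom_mae:.3f}")
print(f"relative reduction in absolute error: {reduction:.0%}")
```

With real data, the human grades would come from a held-out set that neither model saw during fine-tuning or prompt construction, so that the comparison reflects agreement with human reviewers on new examples.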