AppWorld: Reliable Evaluation of Interactive Agents in a World of Apps and People

Center for Language & Speech Processing (CLSP), JHU via YouTube

Overview

Watch a research talk on AppWorld, a simulation environment for evaluating how well AI agents perform everyday digital tasks. The talk covers the development of a high-fidelity simulated world of nine common applications, such as Amazon, Gmail, and Venmo, in which AI assistants handle complex scenarios like splitting bills with roommates by writing code interactively and making API calls. Learn why reliable evaluation is hard for complex tasks that have multiple valid solution paths, and see how current leading language models such as GPT-4 perform on these tasks. The talk closes with future research directions for multimodal, collaborative, and socially intelligent AI agents that learn from environmental feedback and adapt to new situations. Presented by Harsh Trivedi, a PhD candidate at Stony Brook University, whose work on AppWorld earned a Best Resource Paper award at ACL'24.

Syllabus

AppWorld: Reliable Evaluation of Interactive Agents in a World of Apps and People -- Harsh Trivedi

Taught by

Center for Language & Speech Processing (CLSP), JHU
