Hamel Husain

ML Engineer & Data Scientist

Talk

AI Evals Pitfalls

Generic benchmarks won't save you. Most teams building LLM applications make the same evaluation mistakes: they adopt off-the-shelf metrics, skip error analysis, automate too early, exclude domain experts from the process, and more. This talk covers five common pitfalls I've observed across hundreds of AI projects, along with strategies to avoid them. You'll learn why looking at your data matters more than dashboards, when to build automated evaluators, and how to develop metrics that capture what "good" means for your specific use case.

About

ML Engineer & Data Scientist who has recently turned himself into a meme on AI Evals.