Talk
Sub-Second or Uninstalled: Engineering Low-Latency Voice Agents
Voice AI has a brutal constraint: if your agent takes more than a few seconds to respond, users stop talking to it. This talk is a field guide to latency engineering for production speech-to-speech systems. We'll cover hiding tool execution behind natural conversation, taming context rot, forcing deterministic workflows onto non-deterministic models, and choosing when to use fast SLMs versus delegating to slower, smarter ones. Expect practical Python patterns drawn from a voice agent that serves real users daily.
About
Smit Shah is the Founder and CTO of Enata, an AI-native product that acts as a second brain for field sales teams across healthcare, industrials, and consumer packaged goods. He specializes in distributed systems, applied machine learning, and ML infrastructure, with a focus on building reliable AI systems in production. Previously, Smit worked on large-scale systems and ML platforms at Microsoft, Amazon, and Google, and on applied ML at Snorkel AI and Martian.
