Talk
Building Production-Ready Multimodal ELT Pipelines for Hugging Face with dlt, Lance, and Ibis
Today, AI agents can generate workable ELT pipeline scripts in Python within seconds. But generating a script is not the same as operating a pipeline in production. Agent-produced pipelines often resemble the work of a very fast junior data engineer: functional, but lacking performance considerations, a deployment strategy, lifecycle management, and architectural consistency. Scaled across teams, this approach quickly becomes expensive to review and nearly impossible to standardize.

In production systems, we address these challenges with frameworks and interoperable architectures: we encapsulate solved problems, enforce modularity, and provide reusable abstractions. AI agents benefit from the same constraints. In this talk, we will demonstrate how to build a production-ready multimodal ELT pipeline in Python that uses dlt for standardized extraction and loading; Lance as a high-performance multimodal storage layer; Ibis within dltHub for complex preprocessing beyond what is practical in SQL; and dltHub data quality checks to ensure the dataset meets production standards, before automatically publishing the resulting dataset to Hugging Face.
About
Matthaus Krzykowski is the CEO and co-founder of dltHub, makers of dlt, the most popular open-source Python library for moving data. dltHub's goal is to make code-first data engineering in Python possible end to end: accessible, reliable, and production-ready. As Python became one of the dominant languages for data, machine learning, and AI, Matthaus saw that production data ingestion remained fragmented, slow, and overly SQL-centric. dlt takes a different approach: it is an interoperable Python library that powers production workloads at over 8,000 companies without requiring containers, GUIs, or platform lock-in. Because it is fully code-first, dlt can be embedded directly into notebooks, applications, and AI automation workflows. dltHub extends this with a hosted platform that helps teams move faster from prototype to production pipelines. Matthaus built his first production AI agent in 2016 while working on NLU systems at Rasa, and has spent the last decade pushing for data engineering to feel more like software engineering.
