The Unbreakable Ecosystem: How LLMs Fix Your Data Before It Breaks.
In the modern data landscape, the primary bottleneck for artificial intelligence is rarely the model architecture itself, but rather the reliability and "freshness" of the underlying data infrastructure. As a practitioner at Meta, I have observed that even the most sophisticated AI models are only as effective as the pipelines that feed them. Traditional data engineering is often reactive: when an upstream API changes or a schema drifts, pipelines break, causing cascading downstream failures that can take hours or even days of manual engineering intervention to resolve.
This presentation explores the shift from manual data maintenance to AI-Augmented Data Operations (DataOps). We will dive into a detailed industry case study on the implementation of a "Self-Healing" framework designed to manage petabyte-scale data flows. By integrating Large Language Models (LLMs) directly into the orchestration layer (such as Airflow or Dagster), we have moved beyond simple "if-then" error handling. Attendees will learn how generative agents can analyze a failed job’s stack trace, compare malformed incoming payloads against historical schema metadata, and autonomously generate the SQL transformations or regex fixes needed to maintain data continuity. This approach doesn't just fix errors; it documents them and proposes permanent code PRs for engineering review, effectively bridging the gap between raw data chaos and operational excellence.
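To make the payload-versus-schema comparison concrete, here is a minimal sketch in Python. It is not the framework from the case study: it swaps the LLM for a simple string-similarity heuristic from the standard library, and every name in it (detect_drift, propose_renames, build_select, staging_events, the column names) is hypothetical and chosen only for illustration.

```python
from difflib import get_close_matches


def detect_drift(expected_columns, payload):
    """Return (missing, unexpected) column names for one incoming record."""
    expected = set(expected_columns)
    incoming = set(payload)
    return sorted(expected - incoming), sorted(incoming - expected)


def propose_renames(missing, unexpected, cutoff=0.6):
    """Pair each missing column with its closest unexpected column, if any.

    Stand-in for the generative step: a real agent would reason over the
    stack trace and schema history rather than string similarity alone.
    """
    renames = {}
    candidates = list(unexpected)
    for column in missing:
        match = get_close_matches(column, candidates, n=1, cutoff=cutoff)
        if match:
            renames[match[0]] = column      # incoming name -> expected name
            candidates.remove(match[0])     # each incoming column is used once
    return renames


def build_select(expected_columns, renames, source_table):
    """Render a SELECT that maps drifted column names back to expected ones."""
    by_target = {target: source for source, target in renames.items()}
    parts = []
    for column in expected_columns:
        source = by_target.get(column)
        parts.append(f"{source} AS {column}" if source else column)
    return f"SELECT {', '.join(parts)} FROM {source_table}"


# Hypothetical schema metadata and a drifted payload (user_id became userId).
expected = ["user_id", "event_time", "group_id"]
payload = {"userId": 1, "event_time": "2024-01-01T00:00:00Z", "group_id": 7}

missing, unexpected = detect_drift(expected, payload)
renames = propose_renames(missing, unexpected)
proposed_sql = build_select(expected, renames, "staging_events")
```

In this sketch the proposed SQL is only a candidate. In the workflow described above it would be recorded and submitted for engineering review rather than silently becoming the permanent fix.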
Furthermore, we will address the critical "human-in-the-loop" aspect. While the AI manages the immediate recovery to prevent business downtime, we will discuss the governance frameworks necessary to ensure these autonomous fixes remain transparent and accountable. This session is designed to equip practitioners with a blueprint for reducing "Mean Time to Recovery" (MTTR) and shifting the Data Engineer’s role from a "break-fix" plumber to a strategic architect of resilient AI systems. This is a crucial step in "Shaping the AI-Driven Future" by ensuring the foundations of our digital world are as intelligent as the applications they support.
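One way to picture the human-in-the-loop requirement is an approval gate with an audit trail. The sketch below is an assumption about what such a gate could look like, not the governance framework discussed in the session; the ProposedFix class, its status names, and the actor names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProposedFix:
    """A stopgap fix that stays provisional until a human reviews it."""
    job_id: str
    sql: str
    status: str = "proposed"
    audit_log: list = field(default_factory=list)

    def _log(self, actor, action):
        # Every state change is recorded with who did it and when.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "status": self.status,
        })

    def apply_as_stopgap(self, agent_name):
        """The agent may apply the fix immediately, but only provisionally."""
        if self.status != "proposed":
            raise ValueError(f"cannot apply a fix in state {self.status!r}")
        self.status = "applied_pending_review"
        self._log(agent_name, "applied stopgap fix")

    def review(self, reviewer, approved):
        """A human reviewer either approves the fix or triggers a rollback."""
        if self.status != "applied_pending_review":
            raise ValueError(f"nothing to review in state {self.status!r}")
        self.status = "approved" if approved else "rolled_back"
        self._log(reviewer, "approved fix" if approved else "rejected fix")


# Hypothetical usage: the agent restores the pipeline, an engineer signs off.
fix = ProposedFix(job_id="daily_events_load", sql="SELECT 1")
fix.apply_as_stopgap("healing-agent")
fix.review("on-call-engineer", approved=True)
```

The design choice illustrated here is that the agent can restore service quickly, while the record of who applied and who approved each change keeps the fix transparent and accountable.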
Anjan Rajkumar is a Senior Data Engineer at Meta within the Social Experiences Team, where he leads high-priority initiatives for Facebook Groups and cross-app content sharing. With over 16 years of experience in architecting robust data solutions, Anjan specializes in building scalable pipelines using Scala, Spark, and Python. At Meta, he currently spearheads interest-based strategies and "North Star" metric development to drive global user engagement, while leading organization-level automated data quality and dashboard compliance workstreams.
Prior to joining Meta, Anjan served as a Senior Lead Data Engineer at Vyopta (an HP business), where he led digital transformation efforts, including migrating legacy pipelines to Big Data technologies and optimizing Spark-based workflows. He holds a Master of Science in Management Information Systems from Texas A&M University’s Mays Business School. Anjan is passionate about leveraging AI and advanced analytics to transform complex data into resilient, self-healing infrastructure.