How Netflix's Data Bridge orchestrates 300,000 data pipeline jobs per week
Netflix runs one of the most complex data ecosystems on the planet. Hundreds of internal teams, dozens of purpose-built datastores, petabytes of data flowing between systems daily. Somewhere in that ecosystem, an engineer needs to move data from system A to system B. Should be straightforward, right?
It wasn't. For years, the answer to "how do I move data at Netflix" depended on which systems were involved, which team you asked, and which bespoke tool someone had built for a similar use case three years ago. Netflix's engineering culture of "Freedom and Responsibility" is great for empowering teams to pick the best tool for the job. It's less great for preventing fifteen teams from building fifteen slightly different data movement tools.
In January 2026, Netflix's Data Movement Platform team published a detailed technical blog about Data Bridge, the unified control plane they built to fix this. The numbers are worth pausing on: roughly 20,000 distinct data movement jobs, more than three dozen source-destination pairs, about 300,000 job executions per week. And it's considered the standard path for batch data movement at Netflix.
This post goes deep on Data Bridge's architecture, what problems it actually solves, and what data platform teams at smaller companies can take from Netflix's approach.
The fragmentation problem
Before Data Bridge, Netflix's data movement situation looked something like this: a team needed to move data from Cassandra to their Iceberg-based data warehouse. They'd look at existing tools, find that none of them perfectly fit their source-destination combination or their specific transformation requirements, and build yet another solution. That solution would work for their use case but be invisible to other teams facing similar needs.
Multiply this across hundreds of teams over several years and you get a fragmented mess. Multiple tools doing roughly the same thing, each with its own API, its own configuration format, its own monitoring setup, and its own operational burden.
Three specific problems made this worse than just duplicated effort:
Poor separation of concerns. Users' intent (move data from A to B on a schedule) was tangled up with implementation details (configure these Spark parameters, use this specific connector version, set this batch size). That meant the platform team couldn't upgrade or swap underlying engines without breaking user workflows. Every infrastructure improvement required coordinating with every user.
No unified observability. When a data movement job failed, the debugging path depended on which tool was running it. There was no single place to see all data movements, their status, their SLAs, or their failure patterns. For a company running 300,000 jobs a week, that's a serious operational gap.
No standard for new integrations. When Netflix adopted a new datastore, every data movement tool needed to build support for it independently. Or more often, nobody did, and teams built one-off scripts to bridge the gap.
This isn't a Netflix-specific problem. I'd argue most companies with 50+ engineers hit some version of it. The tools are different (Airflow, dbt, Fivetran, custom scripts), but the pattern is identical: fragmentation creeps in because every team optimizes locally.
How Data Bridge works
Data Bridge is a unified control plane for batch data movement. That sentence has a lot of precision packed into it, so let me unpack it.
Control plane, not data plane. Data Bridge doesn't actually move data. It doesn't have its own execution engine. It's an abstraction layer that sits between the user and the various systems that do the heavy lifting. When you tell Data Bridge to move data from Cassandra to Iceberg on a daily schedule, Data Bridge figures out which connector to use, which execution engine to run it on, and how to configure the job. The actual data movement happens on whatever engine Data Bridge selects.
Unified. One API, one configuration model, one monitoring system for all batch data movement. Whether you're moving data between Kafka and S3, Cassandra and Iceberg, or Google Sheets and your data warehouse, you interact with Data Bridge the same way.
Batch data movement. Data Bridge handles batch/scheduled data movement. Netflix has a separate platform called Data Mesh for streaming data movement and real-time processing, built on Kafka and Flink. The two are complementary.
The architecture has three layers:
Intent layer. Users describe what they want: source, destination, schedule, and optional transformations. They don't specify how to execute it. This is the API that users interact with.
Connector layer. Data Bridge has a growing catalog of connectors, each defining how to move data between a specific source-destination pair. A connector encapsulates the implementation details: which engine to use, what parameters to set, how to handle retries and error classification.
Execution layer. Most connectors route to Maestro, Netflix's workflow orchestrator, for actual execution. Maestro handles scheduling, retries, error classification, notifications, and logging. Data Bridge creates a Maestro workflow per data movement job, and that workflow is owned and managed by Data Bridge, hidden from the user.
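The three layers can be sketched in a few lines of Python. This is a hypothetical illustration of the shape of the design, not Netflix's actual API; every name here (the `Intent` class, the `CONNECTORS` table, the `plan` function) is invented for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Intent:
    """Intent layer: what the user declares, with no engine details."""
    source: str
    destination: str
    schedule: str                      # cron-style schedule
    transform_sql: Optional[str] = None

# Connector layer: maps a (source, destination) pair to an implementation
# the platform team owns and can change without touching user intent.
CONNECTORS = {
    ("cassandra", "iceberg"): {"engine": "spark", "subworkflow": "cass_to_iceberg_v2"},
    ("kafka", "s3"):          {"engine": "spark", "subworkflow": "kafka_to_s3_v1"},
}

def plan(intent: Intent) -> dict:
    """Control plane: resolve intent to an execution plan. Note that this
    function never moves data itself; it only decides how the job runs."""
    connector = CONNECTORS[(intent.source, intent.destination)]
    return {"schedule": intent.schedule, **connector}

job = plan(Intent("cassandra", "iceberg", "0 4 * * *"))
```

The point of the sketch: the user-facing `Intent` contains nothing that would break if the platform team swapped Spark for something else tomorrow.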
The execution architecture has a clever two-layer workflow structure. The top-level Maestro workflow is a thin wrapper that calls the Data Bridge API at runtime to fetch the current connector configuration and subworkflow identifier. The actual data movement logic lives in a subworkflow. This design means the Data Bridge team can swap the underlying implementation of any connector at runtime by updating the subworkflow identifier in the connection metadata, without redeploying the top-level workflow and without the user knowing or caring.
That's the key architectural insight. By separating the user's intent from the execution implementation and making the execution layer hot-swappable, you can upgrade infrastructure without user migration projects.
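The hot-swap mechanism is worth making concrete. Here is a toy sketch of the two-layer workflow idea, assuming (hypothetically) that the thin top-level workflow resolves its subworkflow from connection metadata at run time; the metadata shape and function names are illustrative only.

```python
# Connection metadata owned by the control plane, not baked into the workflow.
connection_metadata = {"job-42": {"subworkflow": "cass_to_iceberg_v2"}}

def fetch_subworkflow(job_id: str) -> str:
    """Stand-in for the runtime call the thin wrapper workflow makes
    to the control plane API."""
    return connection_metadata[job_id]["subworkflow"]

def run_top_level(job_id: str) -> str:
    # Thin wrapper: resolve the implementation at run time, then delegate.
    sub = fetch_subworkflow(job_id)
    return f"launched {sub}"

before = run_top_level("job-42")

# The platform team swaps the implementation by editing metadata only;
# no redeploy of the top-level workflow, no change visible to the user.
connection_metadata["job-42"]["subworkflow"] = "cass_to_iceberg_v3"
after = run_top_level("job-42")
```

Because the wrapper resolves the subworkflow on every run rather than at deploy time, the swap takes effect on the very next execution.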
Why Maestro, not a new execution engine
Netflix already had Maestro, their workflow orchestrator that handles over a million tasks per day. It's battle-tested, supports periodic scheduling, integrates with Netflix's identity and access control systems, handles retries with detailed error classification, and provides notification and logging out of the box.
The Data Bridge team explicitly chose to build on Maestro rather than create a new execution engine. Their reasoning was practical: all of those capabilities (scheduling, retries, access control, logging) would need to be rebuilt from scratch in any new system. Maestro already had them, and it was already proven at Netflix scale.
This is a decision pattern worth noting. When you're building a platform layer, the temptation is to build everything from the ground up so you have full control. The risk is that you spend 18 months rebuilding scheduling and retry logic before you even get to the hard part (the routing and connector abstraction). Netflix avoided that trap by treating Maestro as the execution substrate and focusing Data Bridge on the control plane and user experience.
Maestro is also open-sourced as of mid-2024, which means this isn't entirely a "works only at Netflix scale" story. Teams running Airflow or Dagster could apply the same pattern: build a thin control plane that translates user intent into DAGs/workflows on the orchestrator you already have.
SQL transformations and what comes next
One recent addition to Data Bridge is support for stateless SQL-based transformations. Users can now define lightweight SQL transforms as part of their data movement job, instead of having to set up a separate transformation pipeline.
This saw quick adoption, and that makes sense. If you're already describing a data movement job (source, destination, schedule), adding a SQL transform is a natural extension. Without it, users would need to move data from A to a staging area, run a separate dbt or Spark job to transform it, and then move the result to B: three steps where one would do.
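A stateless SQL transform applied mid-movement can be demonstrated with a toy pipeline. This uses sqlite3 purely as a stand-in for both the source and destination; the table, the job's `transform_sql` string, and the whole setup are illustrative, not Netflix's implementation.

```python
import sqlite3

# "Source" datastore with some rows.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE plays (title TEXT, minutes INTEGER)")
src.executemany("INSERT INTO plays VALUES (?, ?)",
                [("A", 5), ("B", 120), ("C", 90)])

# The user-supplied stateless transform, declared as part of the job spec.
transform_sql = "SELECT title, minutes FROM plays WHERE minutes >= 60"

# "Destination" datastore: the transform runs as part of the movement,
# so only the transformed rows land here.
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE plays (title TEXT, minutes INTEGER)")
dst.executemany("INSERT INTO plays VALUES (?, ?)",
                src.execute(transform_sql).fetchall())

moved = dst.execute("SELECT COUNT(*) FROM plays").fetchone()[0]
```

One job spec, one hop, no staging area: that's the step count reduction the feature buys.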
The Data Bridge team says they're considering support for additional transformation types beyond SQL. The architecture supports it because transformations are just another parameter in the connector configuration. The execution engine handles the actual work.
This is also where Data Bridge's future direction gets interesting. The team mentioned embedding data movement closer to datastore control planes, so that data movement can be initiated from the datastore's own UI or API rather than requiring users to interact with Data Bridge separately. That would make data movement almost invisible to the user, triggered as a natural consequence of creating or configuring a datastore.
What smaller teams can take from this
You are not Netflix. You probably don't have 20,000 data movement jobs or three dozen source-destination pairs. But the underlying problem scales down.
If your team has more than three ways to move data between systems (Fivetran for ingestion, Airflow DAGs for custom moves, dbt for transforms, and a handful of cron scripts that nobody wants to touch), you have a version of the fragmentation problem. Not at Netflix scale, but the same symptoms: no single inventory of data movement jobs, and custom work required every time you add a new integration.
The Data Bridge pattern suggests a few practical takeaways:
Separate intent from implementation. Even if you don't build a full control plane, establish a standard way for users to describe data movement jobs. A YAML config that specifies source, destination, schedule, and optional transforms, checked into Git. That alone gives you a single inventory of all data movement happening in your org.
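A minimal sketch of what that intent spec might look like once parsed, plus the validation a thin service would run on it. Field names and the spec shape are hypothetical, chosen only to show that the config carries intent (what and when) and nothing about execution (how).

```python
# Parsed form of a hypothetical YAML intent file checked into Git.
job_spec = {
    "name": "orders_to_warehouse",
    "source": "postgres://orders",
    "destination": "iceberg://warehouse.orders",
    "schedule": "0 2 * * *",
    "transform_sql": None,   # optional
}

REQUIRED = {"name", "source", "destination", "schedule"}

def validate(spec: dict) -> list:
    """Return the sorted list of required intent fields that are missing."""
    return sorted(REQUIRED - spec.keys())

missing = validate(job_spec)
```

Collect every such file under one directory and `git log` becomes your audit trail for data movement.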
Build on your existing orchestrator. If you're running Airflow, build your abstraction layer on top of Airflow. Don't replace it. Create a thin service that generates Airflow DAGs from user intent configs. You get all of Airflow's scheduling, retry, and monitoring capabilities without rebuilding them.
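One way to sketch that thin service is a generator that renders an Airflow DAG file from the intent config. Emitting the DAG as text keeps this example dependency-free; a real service might construct DAG objects directly. The `datamove` CLI in the template is hypothetical, as is everything else here.

```python
DAG_TEMPLATE = '''\
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="{name}", schedule="{schedule}", catchup=False) as dag:
    move = BashOperator(
        task_id="move_data",
        bash_command="datamove --src {source} --dst {destination}",
    )
'''

def render_dag(spec: dict) -> str:
    """Translate a user intent config into an Airflow DAG file."""
    return DAG_TEMPLATE.format(**spec)

dag_code = render_dag({
    "name": "orders_to_warehouse",
    "schedule": "0 2 * * *",
    "source": "postgres://orders",
    "destination": "iceberg://warehouse.orders",
})
```

Drop the rendered files into your DAGs folder and Airflow's scheduling, retries, and UI come along for free, which is exactly the Maestro move at a smaller scale.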
Make the execution layer swappable. When you define a data movement job, don't hard-code the execution path. Use a connector pattern where the "how" can change without changing the "what." This pays off the first time you need to swap a connector or upgrade an engine version. Without it, you're doing a manual migration across every affected job.
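The payoff shows up clearly in a toy connector registry: jobs reference only a (source, destination) pair, never an engine version, so an upgrade is a single registry edit rather than a per-job migration. All names below are invented for the example.

```python
# The "how": owned by the platform, keyed by source-destination pair.
registry = {
    ("postgres", "iceberg"): {"runner": "spark", "image": "mover:2.1"},
}

# The "what": job specs never mention an engine or version.
jobs = [
    {"name": "orders", "source": "postgres", "destination": "iceberg"},
    {"name": "users",  "source": "postgres", "destination": "iceberg"},
]

def resolve(job: dict) -> dict:
    """Bind a job's 'what' to the registry's current 'how' at run time."""
    return {**job, **registry[(job["source"], job["destination"])]}

# Upgrade the engine image once; every job picks it up on next resolution.
registry[("postgres", "iceberg")]["image"] = "mover:2.2"
images = {resolve(j)["image"] for j in jobs}
```

Two jobs, one edit, zero user-facing changes: the same hot-swap property Data Bridge gets from its subworkflow indirection.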
Centralize observability. Even if you can't unify the execution layer right away, you can centralize monitoring. A single dashboard that shows all data movement jobs, their status, SLAs, and failure rates. That's often the first win because it makes the fragmentation visible, and visible problems get fixed faster than invisible ones.
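The first version of that dashboard can be as simple as a rollup of run outcomes into per-job failure rates. The run records below are illustrative; the point is only that once every tool reports into one place, this aggregation becomes possible at all.

```python
from collections import Counter

# Run records collected from every data movement tool into one store.
runs = [
    {"job": "orders", "status": "success"},
    {"job": "orders", "status": "failed"},
    {"job": "orders", "status": "success"},
    {"job": "users",  "status": "success"},
]

def failure_rates(runs):
    """Aggregate run outcomes into a failure rate per job."""
    total, failed = Counter(), Counter()
    for r in runs:
        total[r["job"]] += 1
        if r["status"] == "failed":
            failed[r["job"]] += 1
    return {job: failed[job] / total[job] for job in total}

rates = failure_rates(runs)
```

A sorted view of `rates` is already a crude SLA dashboard, and it tends to surface the flaky one-off scripts nobody knew were failing.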
The tradeoffs
Data Bridge is a control plane, which means it adds a layer of abstraction between users and execution engines. Abstractions have costs.
Debugging gets harder. When a data movement job fails, the failure might be in the connector, the Maestro workflow, the subworkflow, or the underlying execution engine. The Data Bridge team built observability hooks to forward execution status to centralized metrics and alerting, but there's still more indirection than in a system where users directly configure and run their own jobs.
The connector catalog is a bottleneck. Data Bridge supports more than three dozen source-destination pairs, but Netflix has more datastores than that. The team acknowledged that demand for new connectors outpaces their capacity to build them, and they're working on making it easier for other teams to contribute connectors. That's the classic platform team scaling problem: the platform is only as useful as its coverage, and coverage is limited by engineering bandwidth.
Intent-based APIs trade flexibility for simplicity. If your data movement job has an unusual requirement that the connector doesn't support, you're stuck waiting for the platform team to extend the connector or building a workaround. That friction is the cost of the standardization benefit. For most jobs, the trade is worth it. For the 5% of edge cases, it's annoying.
These are real tradeoffs. But they're the tradeoffs of a mature platform, not of a broken system. The alternative, fifteen teams building fifteen one-off tools, has worse tradeoffs. You just don't see them as clearly because they're distributed across the org.