

MAR 17, 2026
Last quarter, I was on a call with a data lead at a Series E fintech. His team had 4,000+ data pipelines running in Airflow. I asked him how many of those were still serving active dashboards. Long pause. "At least half of them," he said. "Maybe." That conversation keeps coming back to me because it's the same story everywhere. Not because these are bad teams. They're usually very good teams that grew fast and didn't have time to stop and organize the mess accumulating behind them. This post i
MAR 12, 2026
Netflix runs one of the most complex data ecosystems on the planet. Hundreds of internal teams, dozens of purpose-built datastores, petabytes of data flowing between systems daily. Somewhere in that ecosystem, an engineer needs to move data from system A to system B. Should be straightforward, right? It wasn't. For years, the answer to "how do I move data at Netflix" depended on which systems were involved, which team you asked, and which bespoke tool someone had built for a similar use case th

MAR 10, 2026
Zepto is an Indian quick-commerce company that promises 10-minute grocery delivery. That promise depends on data. Demand forecasting, inventory optimization, rider routing, and dozens of daily dashboards for business teams all run on top of their data infrastructure. At the scale they operate (millions of orders, 200+ TB processed daily), even small inefficiencies in how compute resources get allocated compound quickly. Late last year, Zepto's data engineering team noticed something that should

MAR 9, 2026
There's a pattern showing up across the data platform teams I pay attention to. Zepto built an internal DataPortal that routes Databricks jobs between Spark clusters and SQL warehouses based on workload metadata. Netflix built Data Bridge, a unified control plane that abstracts execution engines away from users entirely. Uber built data access proxies that route Presto, Spark, and Hive queries to different clusters based on query weight and data location. Three companies, three different scales

MAR 7, 2026
Last week I was talking to a Head of Data at a Series C fintech. She had nine people on her data team. Three of them, she told me, spent most of their week just keeping integrations alive between Fivetran, Airflow, dbt, and Looker. Nobody was building anything new. The entire job had become making sure Tool A still talked to Tool B after someone changed a config. This post is about how that happens, why it's getting worse, and what the actual cost looks like when you break it down. The stack

MAR 7, 2026
Here's a pattern I keep seeing. A team runs Snowflake for everything: batch transforms, ad-hoc analyst queries, dashboard refreshes, and that one product analytics job that scans 2 billion clickstream rows every hour. Their monthly bill keeps climbing. Query performance gets worse as workloads compete for the same warehouse. The response is usually to throw a bigger warehouse at it, which makes the bill climb faster. The problem isn't Snowflake. The problem is treating one engine as the answer

MAR 3, 2026
Three months back, I was catching up with a CDO at a NASDAQ-listed consumer tech company. Their cloud data platform bill had tripled in eight months. Not because data volume tripled. Because every team (analytics, data science, product, marketing) was running every query they could think of through the same platform. Ad-hoc exploration and heavy dbt transforms and dashboard refreshes and ML feature generation. Same compute, same bill. Their CFO had started asking questions. The platform doesn't matt