Why your "one platform" data strategy is costing you more than you think
Three months ago I was catching up with a CDO at a NASDAQ-listed consumer tech company. Their cloud data platform bill had tripled in eight months. Not because data volume tripled, but because every team (analytics, data science, product, marketing) was running every query they could think of through the same platform: ad-hoc exploration, heavy dbt transforms, dashboard refreshes, ML feature generation. Same compute, same bill. Their CFO had started asking questions.
The platform doesn't matter for this story. Could be Snowflake. Could be Databricks. Could be BigQuery. The pattern is the same everywhere: you pick one engine, it works, and then every new workload gets routed there because that's where the data lives and nobody wants to manage a second system.
This post is about why that pattern breaks down, what it actually costs, and what the alternative looks like without adding chaos.
One platform for everything is a hidden tax
Every cloud data platform makes tradeoffs. Snowflake is excellent at SQL analytics on structured data. Databricks is strong for Python-heavy workloads and ML. BigQuery is deeply serverless and tightly coupled to GCP. Each one is a solid default for specific use cases.
The problem is when "solid default" becomes "only option." That's when costs get weird.
Here's what I see over and over. A data engineering team sizes their compute for the hardest job they run, usually a morning dbt build or a heavy transformation pipeline. Then that same compute serves dashboards, analyst ad-hoc queries, data science notebooks, and maybe some streaming jobs. Most of those workloads need a fraction of the resources. But nobody provisions separate compute tiers because it's easier to share, and nobody has time to audit which queries actually need what.
On Snowflake, that means defaulting to a Medium or Large warehouse for everything, which is a 4-8x cost premium over right-sized compute. On Databricks, it means leaving interactive clusters running all day for jobs that take minutes. On BigQuery, it means flat-rate slots sitting idle 90% of the time because you bought enough capacity for peak load.
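On Snowflake, the fix is mechanical: split the shared warehouse into per-workload warehouses with aggressive auto-suspend. A minimal sketch of what that looks like (warehouse names and sizes here are hypothetical, not a prescription):

```sql
-- Hypothetical setup: separate warehouses sized per workload
-- instead of one Large warehouse serving everything.
CREATE WAREHOUSE IF NOT EXISTS transform_wh
  WAREHOUSE_SIZE = 'LARGE'      -- sized for the morning dbt build
  AUTO_SUSPEND = 60             -- seconds idle before suspending
  AUTO_RESUME = TRUE;

CREATE WAREHOUSE IF NOT EXISTS bi_wh
  WAREHOUSE_SIZE = 'XSMALL'     -- selective reads on pre-transformed data
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;
```

The equivalent move on Databricks is job clusters instead of all-day interactive clusters; on BigQuery, it's autoscaling reservations instead of fixed peak-sized slots.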
The waste pattern is the same regardless of vendor. You're paying several times what the work actually requires, because one-size compute doesn't fit every workload.
And it's not just cost. Each platform has architectural limits that show up when you push it outside its sweet spot. Need sub-second response times at high concurrency for a user-facing dashboard? Cloud warehouses aren't built for that. Want to run iterative local development without burning cloud credits? Not an option in a cloud-only architecture. Need real-time ingestion and low-latency analytics on event streams? You're bolting capabilities onto an engine that wasn't designed for them.
Different workloads have different compute profiles. That sounds obvious when you say it out loud. Batch transforms need raw throughput. Dashboards need fast reads on pre-aggregated data. Exploration needs cheap, iterative compute. ML training needs memory and sometimes GPUs. Forcing all of that through one engine means everything except the workload you sized for is either over-provisioned or under-served.
Why teams stay single-platform anyway
If the tradeoffs are this clear, why is single-platform the default?
Operational simplicity is the real answer. One platform means one set of credentials, one monitoring setup, one cost center, one set of access controls. Every additional engine adds operational surface area. For a small team, that overhead is real.
Data gravity keeps it that way. Once your data lives in one platform, moving it somewhere else for specific workloads means copies, sync jobs, and the governance question of which version is authoritative. Most teams look at that complexity and decide the cost premium of single-platform is the lesser evil.
Vendor messaging reinforces it. Snowflake, Databricks, and Google are all trying to be your entire data platform. Snowflake added Snowpark for Python workloads. Databricks added SQL warehouses for BI. BigQuery added continuous queries for streaming. The pitch is always the same: stay here, we'll handle everything. And for a while it's true enough.
But "true enough" has a cost, and it compounds. Every quarter, the team grows, workload diversity increases, and the gap between what the platform does well and what you're asking it to do gets wider. At some point, your data platform cost becomes a board-level conversation for reasons that have nothing to do with data volume.
What multi-engine actually looks like
The alternative isn't running five platforms with five copies of your data. That's worse, not better.
What changed is Apache Iceberg. Before Iceberg, multi-engine meant multi-copy. Export from your warehouse to S3, point a different engine at it, and now you have two copies with no shared governance. The operational cost killed the idea for most teams.
Iceberg lets you store data once in open format on object storage. Snowflake, Databricks, Spark, Trino, DuckDB, and ClickHouse can all operate against the same tables (some engines with full write support, some read-only). One copy of data, multiple compute engines, each handling what it's best at. Netflix, Apple, LinkedIn, and Stripe all run production workloads on Iceberg. All three major platforms, Snowflake, Databricks, and BigQuery, added native Iceberg support. It's not experimental.
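To make that concrete, here is roughly what the cheap-engine side looks like: DuckDB querying an Iceberg table that your warehouse maintains, via DuckDB's iceberg extension. The bucket path is hypothetical, and S3 credential setup is omitted:

```sql
-- DuckDB reading the same Iceberg table your warehouse writes.
-- Table path is a placeholder; credentials/httpfs setup omitted.
INSTALL iceberg;
LOAD iceberg;

SELECT count(*)
FROM iceberg_scan('s3://analytics-lake/warehouse/orders');
```

No export job, no second copy: the engine reads the table's own metadata and data files directly from object storage.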
For a mid-market data team, maybe 10-30 people with one primary platform, the workload split usually looks something like this.
Your primary platform (Snowflake, Databricks, whatever you already have) stays as the governed transformation engine. Heavy dbt runs, complex joins, the stuff that needs raw compute. Size it for the job, run it, tear it down. This is what it's good at, and you're paying for throughput. That's the right trade.
BI and dashboard queries are the first thing worth moving. Most BI queries are fast, selective reads on already-transformed data. They don't need heavy compute. Headset, a cannabis analytics platform, routed 94% of their Looker queries from Snowflake to DuckDB through a routing layer called Greybeam and cut their BI compute costs by 83%. They changed a connection string. That was it. The same pattern applies if you're on Databricks or BigQuery: if your BI tool is pulling lightweight queries through expensive compute, you're overpaying for reads.
User-facing analytics with high concurrency is a different workload entirely. If you're building product dashboards or customer-facing analytics that need sub-second latency at thousands of concurrent queries, ClickHouse or StarRocks are purpose-built for that. Cloud warehouses aren't. Benchmarks consistently show 2-3x better performance on ordered data with fewer resources. This isn't a platform replacement. It's a serving layer.
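The "purpose-built" part is mostly about physical layout. A hedged sketch of a ClickHouse serving-layer table (schema and column names are illustrative, not from any real system): the ORDER BY key matches the dashboard's access pattern, which is where the sub-second reads come from.

```sql
-- Serving-layer table: data is stored sorted by (tenant_id, event_time),
-- so a per-tenant dashboard query touches a narrow, contiguous range.
CREATE TABLE events_serving
(
    tenant_id   UInt64,
    event_time  DateTime,
    event_type  LowCardinality(String),
    value       Float64
)
ENGINE = MergeTree
ORDER BY (tenant_id, event_time);
```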
Dev and testing should run locally whenever possible. Every dbt run, validation script, and exploratory query during development burns cloud credits for no reason. DuckDB runs in-process on a laptop, handles hundreds of gigabytes on a modern machine, and costs nothing. Whatever platform you're on, your dev environment doesn't need to be the same as production.
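With dbt this is a one-file change: the same project, a local DuckDB target for development and a warehouse target for production. A sketch of the profile, assuming the dbt-duckdb adapter is installed; project, account, and object names are hypothetical:

```yaml
# profiles.yml sketch: local dev target, cloud prod target.
my_project:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: dev.duckdb        # local file on your laptop, zero cloud spend
    prod:
      type: snowflake
      account: my_account
      warehouse: transform_wh
      database: analytics
      schema: public
```

`dbt run` during development hits the local file; `dbt run --target prod` hits the warehouse. Nothing about the models changes.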
The key point: you're not replacing your primary data platform. You're right-sizing its role. It stays as the backbone for governance, transformation, and heavy compute. Other engines handle the workloads where it's either too expensive or architecturally wrong.
What this costs you in return
Multi-engine has real tradeoffs, and I think the biggest one is organizational, not technical. Someone has to own the decision of which workloads go where. That person needs to understand cost and performance characteristics of each engine well enough to route correctly. Most teams don't have that role defined yet. Without it, you end up with an accidental multi-engine architecture, which is worse than intentional single-engine.
Iceberg is maturing fast but still has friction. DuckDB's Iceberg support was read-only through early 2025. Catalog interoperability between Snowflake's Polaris, Databricks' Unity Catalog, and open-source options like Nessie is still messy. You will spend time on plumbing that feels like it should already be solved.
Governance gets harder, not simpler. In a single platform, access control is one system. Multiple engines reading the same Iceberg tables need catalog-level governance. It's solvable. It's not solved out of the box.
Migration is incremental, not instant. Start with one workload. BI serving or dev/test are the usual first candidates because the risk is low and the savings are immediate. Prove it works, build confidence, expand. Budget 3-6 months for the first workload to be production-stable.
My rough heuristic: if your total platform spend is under $5K/month and you have a small team, single-engine is fine. The operational overhead of adding engines will eat the savings. Above $20K monthly with a team big enough to absorb the complexity, the math changes. Headset saved 83% on BI compute alone. Definite, an analytics startup, found that most of their customers' analytical data was under 2TB and didn't need a distributed warehouse at all, so they moved entirely to DuckDB.
Start with what you already have
The data platform market is converging on a pattern: open storage, multiple compute engines, unified governance. Every major vendor is moving this direction even if they'd prefer you stayed exclusively on their compute.
You don't have to architect the whole thing tomorrow. Pull up your query history, whatever platform you're on. Snowflake has query_history, Databricks has the query profile in SQL warehouses, BigQuery has INFORMATION_SCHEMA.JOBS. Filter for queries with execution times under 30 seconds. Add up what those queries cost you. That number is what you're overpaying for lightweight compute run through heavyweight infrastructure. That workload is your first candidate.
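On Snowflake, that audit is a single query against the account usage view. A starting-point sketch (the 30-second cutoff and 30-day window are the assumptions from above, adjust to taste):

```sql
-- Lightweight queries run through heavyweight compute, last 30 days.
-- Columns follow Snowflake's ACCOUNT_USAGE.QUERY_HISTORY view;
-- TOTAL_ELAPSED_TIME is in milliseconds.
SELECT
  warehouse_name,
  warehouse_size,
  COUNT(*)                              AS query_count,
  SUM(total_elapsed_time) / 1000 / 3600 AS total_exec_hours
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP)
  AND total_elapsed_time < 30 * 1000    -- under 30 seconds
GROUP BY warehouse_name, warehouse_size
ORDER BY query_count DESC;
```

If most of your query volume lands in this bucket on a Medium or Large warehouse, that's your first migration candidate.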
