Why your data team needs more than one compute engine
Here's a pattern I keep seeing. A team runs Snowflake for everything: batch transforms, ad-hoc analyst queries, dashboard refreshes, and that one product analytics job that scans 2 billion clickstream rows every hour. Their monthly bill keeps climbing. Query performance gets worse as workloads compete for the same warehouse. The response is usually to throw a bigger warehouse at it, which makes the bill climb faster.
The problem isn't Snowflake. The problem is treating one engine as the answer to every question your data platform has to answer. And that assumption is costing teams real money.
This post is about multi-compute: running different engines for different workloads, inside the same data platform. Why more teams are doing it, what it actually looks like, and where it gets messy.
One engine, many problems
Most data teams start with a single compute engine. Snowflake, Databricks, BigQuery. Pick one, move everything there, call it a day. For a while, this works fine. The warehouse handles your dbt models, your Looker dashboards, and the occasional analyst query without breaking a sweat.
Then the workloads multiply. Product wants clickstream analytics on billions of events. Finance needs reliable reporting with strict SLAs. The ML team starts running feature engineering jobs that hog compute for hours. Analysts are queueing behind batch transforms at 9am every morning, wondering why their dashboards are slow.
What you're really dealing with is a resource contention problem dressed up as a performance problem. Everything runs on one engine, so everything competes for the same pool of compute. The fixes are all expensive. Bigger warehouses mean more credits. Multi-cluster auto-scaling helps with concurrency spikes but burns through spend fast. A team I talked to recently cut their Snowflake bill by 40% just by shortening auto-suspend timers and right-sizing their warehouses. But right-sizing only goes so far when the fundamental issue is that you're asking one system to be good at everything.
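To see why shortening auto-suspend timers moves the bill so much, it helps to run the arithmetic. The sketch below is a back-of-envelope estimator, not billing data: the credit rates, warehouse sizes, and usage hours are illustrative assumptions.

```python
# Back-of-envelope estimate of how idle time before auto-suspend affects
# Snowflake spend. All rates and hours below are illustrative assumptions.

CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}
DOLLARS_PER_CREDIT = 3.00  # varies by edition and contract

def monthly_cost(size: str, active_hours_per_day: float,
                 idle_hours_per_day: float) -> float:
    """Cost of a warehouse billed for active time plus the idle time
    it sits warm before auto-suspend kicks in (30-day month)."""
    billed_hours = (active_hours_per_day + idle_hours_per_day) * 30
    return CREDITS_PER_HOUR[size] * billed_hours * DOLLARS_PER_CREDIT

# A Large warehouse active 6h/day that idles 4h/day before suspending...
before = monthly_cost("L", 6, 4)
# ...versus the same warehouse with an aggressive auto-suspend timer.
after = monthly_cost("L", 6, 0)
print(before, after, 1 - after / before)
```

With these assumed numbers, eliminating four idle hours a day cuts the bill by 40%, without touching a single query. The point of the sketch is that idle billed time scales the bill linearly, which is why suspend timers are usually the first lever teams pull.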
There's a cost angle too. Snowflake charges the same credit rate whether you're running a critical revenue report or a throwaway exploratory query from a junior analyst. Every query gets the same treatment. That's simple, but it's not efficient. You end up paying warehouse rates for workloads that don't need warehouse-grade compute.
Then there's the blast radius. If your single engine has an outage or a performance degradation, everything goes down together. Your batch pipelines, your dashboards, your data apps. One bad query from an analyst can cascade into a warehouse-wide slowdown that takes out the CFO's morning revenue dashboard. That's not a hypothetical. I've heard that exact story at least five times.
Why "just optimize your warehouse" isn't enough
The standard advice is warehouse isolation within your existing platform. Separate your ETL warehouse from your BI warehouse. Run batch jobs on a dedicated cluster. This helps, but it's still optimization within the constraints of a single engine.
Snowflake is excellent at structured SQL analytics. It's not built for sub-100ms latency on high-concurrency product analytics hitting billions of rows. Databricks is great for large-scale Spark processing and ML workloads. It's overkill for a simple nightly dbt run that transforms a few million rows. BigQuery's serverless model works well for bursty ad-hoc queries but gets expensive fast for always-on workloads.
Each engine was designed with certain workload profiles in mind. Push a workload outside an engine's sweet spot, and you pay for it in performance, cost, or both.
The ClickHouse team published a cost-performance benchmark in late 2025 comparing five major cloud data warehouses (Snowflake, Databricks, ClickHouse Cloud, BigQuery, and Redshift) across 1 billion, 10 billion, and 100 billion row datasets. The results showed order-of-magnitude differences in cost-per-query depending on the workload type. No single engine won across all query patterns.
That's the core insight behind multi-compute. You stop trying to find one engine that's adequate at everything and start matching engines to workloads.
What multi-compute actually looks like
Multi-compute isn't a product you buy. It's an architectural decision: run different compute engines against the same (or coordinated) data, choosing each engine based on what the workload requires.
In practice, this usually means your data platform has two or three engines rather than one. A common setup at mid-market companies running Snowflake:
Snowflake stays the primary warehouse for governed BI reporting, dbt transforms, and the canonical data models that finance, marketing, and ops depend on. It's the system of record for your business metrics.
ClickHouse (or StarRocks, or Apache Druid) handles the high-volume, low-latency workloads. Product analytics on clickstream data. User-facing dashboards where response time matters. Event aggregations that would be slow and expensive in a general-purpose warehouse.
Spark on Databricks (or open-source Spark, or DuckDB for smaller jobs) handles heavy transformations, ML feature engineering, and anything that benefits from distributed processing or Python-native workflows.
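A setup like this implies a routing decision for every workload. The rules below are a minimal sketch of that decision, with thresholds and engine assignments that are my own assumptions for illustration, not recommendations; real routing also weighs governance, team skills, and data freshness.

```python
# Illustrative routing rules for the three-engine setup described above.
# Thresholds and engine names are assumptions for the sketch.

from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    latency_ms_target: int      # how fast queries must return
    rows_scanned: int           # typical rows touched per query
    needs_python: bool = False  # ML / feature-engineering style work

def pick_engine(w: Workload) -> str:
    if w.needs_python or w.rows_scanned > 10_000_000_000:
        return "spark"       # distributed processing, Python-native workflows
    if w.latency_ms_target < 500 and w.rows_scanned > 100_000_000:
        return "clickhouse"  # high-volume, low-latency analytics
    return "snowflake"       # governed BI, dbt transforms, system of record

print(pick_engine(Workload("product_analytics", 200, 2_000_000_000)))
print(pick_engine(Workload("finance_reporting", 5_000, 50_000_000)))
print(pick_engine(Workload("feature_eng", 60_000, 1_000_000, needs_python=True)))
```

Even a crude rule set like this makes the architecture legible: anyone on the team can answer "why does this run on ClickHouse?" by pointing at the workload's latency target and scan volume.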
The enabling technology is open table formats. Apache Iceberg has become the de facto standard for multi-engine data access. Store data once in Parquet on S3 or GCS, query it from Spark, Trino, Snowflake, ClickHouse, Flink, and a growing list of other engines. No data duplication. No proprietary lock-in. By 2025, every major warehouse vendor had shipped native Iceberg support. Databricks acquired Tabular (founded by Iceberg's creators). Snowflake launched its Open Catalog. The format war is effectively over.
This matters because it makes multi-compute practical for teams that aren't Netflix-scale. You don't need custom data movement pipelines between engines. You write your canonical data to Iceberg tables, and each engine reads from the same source.
A B2B SaaS company I spoke with was ingesting about 3 billion click events per month into Snowflake. Their product analytics team needed interactive queries on this data. Snowflake could do it, but only on an XL warehouse running continuously, costing them north of $12k/month for just that workload. They moved the clickstream data into ClickHouse. Same queries, sub-200ms response times, at a fraction of the cost. Snowflake stayed the home for their clean business data: ARR, churn, pipeline metrics, customer segmentation. Each system did what it was built for.
The cost impact from workload routing like this is real. Teams I've talked to report 2x to 5x cost reductions on specific workloads moved to a more appropriate engine. For teams spending $50k+ per month on a single warehouse, shifting even 20-30% of workloads elsewhere can produce meaningful savings.
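Those percentages are easy to sanity-check. The numbers below are illustrative assumptions (a $50k bill, a quarter of spend moved, a 3x reduction on what moves), not figures from any specific team:

```python
# Quick sanity check on the savings math, using illustrative numbers.
monthly_spend = 50_000       # current single-warehouse bill, $
moved_fraction = 0.25        # share of spend shifted to a specialized engine
cost_reduction = 3.0         # moved workloads run ~3x cheaper there

moved_cost_before = monthly_spend * moved_fraction     # spend being moved
moved_cost_after = moved_cost_before / cost_reduction  # same work, new engine
savings = moved_cost_before - moved_cost_after
print(round(savings))  # monthly savings, before new operational overhead
```

Under these assumptions that's roughly $8.3k/month, or about $100k/year, which is why the comparison against the added operational cost of a second engine is usually worth doing explicitly.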
There are reliability benefits too. When product analytics runs on ClickHouse and finance reporting runs on Snowflake, an issue in one doesn't take down the other. You can set different SLAs for different engines based on criticality. Your revenue dashboard on Snowflake gets five-nines. Your experimental product analytics cluster gets "best effort." A single-engine architecture doesn't give you that option.
Where this gets hard
Multi-compute introduces real complexity, and teams should go in with eyes open.
Orchestration is the first thing that gets harder. When data flows through multiple engines, you need Airflow or Dagster workflows that submit jobs to Snowflake, trigger Spark runs, and coordinate data landing in ClickHouse. That's more moving parts than a pure dbt-on-Snowflake setup, and more ways for things to break at 3am.
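The coordination surface is easier to see in code. This is a deliberately minimal stdlib sketch, not an Airflow or Dagster DAG; the step names and callables are placeholders, and a real orchestrator adds retries, alerting, and backfills on top of exactly this dependency structure.

```python
# Minimal sketch of the coordination a multi-engine pipeline needs:
# each step targets a different engine and depends on earlier steps landing.
# Step bodies are placeholders standing in for real job submissions.

def run_dbt_on_snowflake():   print("snowflake: dbt models built")
def run_spark_features():     print("spark: feature tables written")
def load_into_clickhouse():   print("clickhouse: serving tables loaded")

# (name, callable, depends_on) triples, in intended execution order.
steps = [
    ("dbt_models", run_dbt_on_snowflake, []),
    ("features",   run_spark_features,   ["dbt_models"]),
    ("serving",    load_into_clickhouse, ["features"]),
]

done = set()
for name, fn, deps in steps:
    missing = [d for d in deps if d not in done]
    if missing:
        raise RuntimeError(f"{name} blocked on unmet dependencies: {missing}")
    fn()
    done.add(name)
```

Every arrow between engines in this sketch is a place where credentials, data formats, and failure semantics differ, which is the concrete form of "more ways for things to break at 3am."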
There's also an expertise cost. Running ClickHouse well requires different skills than running Snowflake. Self-hosted ClickHouse demands real operational knowledge: shard management, replica placement, merge tuning. ClickHouse Cloud reduces this, but you're still adding a second system your team needs to learn. For a team of three or four data engineers already stretched thin, that might be too much to absorb.
Governance gets messier. When data lives in one warehouse, lineage and access control are relatively contained. When the same data is queryable from three different engines, you need governance that spans all of them. Catalogs like Atlan help, but it's still more surface area than a single-engine setup.
And the "which engine for which workload" decision isn't always clear-cut. Some workloads sit in a gray zone where Snowflake handles it fine and ClickHouse would be marginally better. The overhead of routing to a separate engine might not justify the savings. Multi-compute pays off most clearly when you have distinct workload clusters with obviously different requirements: high-volume low-latency analytics vs. complex governed BI reporting vs. heavy batch processing.
None of this solves your upstream problems either. Bad data quality running through three engines is still bad data quality. Analysts writing ungoverned SQL everywhere will now have more places to do it.
So who should actually do this? If you're spending under $10k/month on Snowflake and most of your workloads are standard BI, you probably don't need multi-compute yet. If you're north of $50k/month and a big chunk goes to workloads your warehouse isn't optimized for, it pays for itself quickly.
Where to start
The days of routing every workload through one engine are ending for teams at any real scale. Not because Snowflake or Databricks are bad. They're excellent at what they're designed for. But no single engine is the best choice for every workload, and the cost of pretending otherwise adds up fast.
If you're feeling cost or performance pressure, start small. Identify your most expensive or worst-performing workload class. Evaluate whether a specialized engine handles it better. Run both in parallel for a month, compare cost and latency. You don't have to rearchitect everything at once. You just have to stop assuming one tool fits every job.
