$ lexprog.com

// notes from an old coder -- php, databases, and the occasional rant

[June 05, 2026] ClickHouse

ClickHouse Materialized Views: Data Pipelines

ClickHouse Materialized Views: Data Pipelines

────────────────────────────────────────────────────────

ClickHouse Materialized Views: Data Pipelines

Tip: Basic Materialized View

CREATE MATERIALIZED VIEW daily_stats
ENGINE = SummingMergeTree()
ORDER BY (date, url)
AS SELECT toDate(created_at) as date, url, count() as views
FROM page_views GROUP BY date, url;

Automatically populates from source table.

Gotcha: MV Only Captures New Inserts

Existing data in the source table is not backfilled. Populate manually first.

Tip: TO Clause for Explicit Target

CREATE MATERIALIZED VIEW mv_name TO target_table
AS SELECT ... FROM source_table;

Separates the view logic from the storage table.

Gotcha: MV Runs Synchronously

Inserts into the source table wait for the MV to process. Slow MVs slow down inserts.

Tip: Chained Materialized Views

CREATE MATERIALIZED VIEW hourly_stats TO daily_stats
AS SELECT ... FROM raw_events;

Chain multiple MVs for multi-level aggregation.

Gotcha: MV Schema Must Match

The MV's SELECT output must match the target table's schema exactly.

Tip: Order of Columns in ORDER BY Matters Massively

ClickHouse's primary key is defined by ORDER BY. Put high-cardinality columns first for better data skipping. ORDER BY (timestamp, user_id) is very different from ORDER BY (user_id, timestamp) in query performance.

Tip: Use LowCardinality for Enum-Like Strings

Strings like status, country, browser benefit from LowCardinality(String) — it's stored as a dictionary internally, reducing storage 10x and speeding up scans.

Gotcha: Mutations Are Heavy

ALTER TABLE ... UPDATE and DELETE in ClickHouse create new parts instead of modifying in place. A single mutation on a large table can take hours and block merges. Design for append-only from day one.

Senior Insight

ClickHouse materialized views are different from PostgreSQL's — they're triggered by data insertion, not query time. When you INSERT into a source table, ClickHouse runs a transformation and stores the result in a target table. I use this for real-time data rollups: raw event data → materialized view → pre-aggregated by hour. The gotcha: materialized views don't backfill existing data. You need to INSERT historical data into the target table separately.

Source: ClickHouse Blog (https://clickhouse.com/blog), Altinity Blog (https://altinity.com/blog), Altinity Knowledge Base (https://kb.altinity.com/)

────────────────────────────────────────────────────────
<-- back to posts