ClickHouse Materialized Views: Data Pipelines
ClickHouse Materialized Views: Data Pipelines
ClickHouse Materialized Views: Data Pipelines
Tip: Basic Materialized View
CREATE MATERIALIZED VIEW daily_stats
ENGINE = SummingMergeTree()
ORDER BY (date, url)
AS SELECT toDate(created_at) as date, url, count() as views
FROM page_views GROUP BY date, url;
Automatically populates from source table.
Gotcha: MV Only Captures New Inserts
Existing data in the source table is not backfilled. Populate manually first.
Tip: TO Clause for Explicit Target
CREATE MATERIALIZED VIEW mv_name TO target_table
AS SELECT ... FROM source_table;
Separates the view logic from the storage table.
Gotcha: MV Runs Synchronously
Inserts into the source table wait for the MV to process. Slow MVs slow down inserts.
Tip: Chained Materialized Views
CREATE MATERIALIZED VIEW hourly_stats TO daily_stats
AS SELECT ... FROM raw_events;
Chain multiple MVs for multi-level aggregation.
Gotcha: MV Schema Must Match
The MV's SELECT output must match the target table's schema exactly.
Tip: Order of Columns in ORDER BY Matters Massively
ClickHouse's primary key is defined by ORDER BY. Put high-cardinality columns first for better data skipping. ORDER BY (timestamp, user_id) is very different from ORDER BY (user_id, timestamp) in query performance.
Tip: Use LowCardinality for Enum-Like Strings
Strings like status, country, browser benefit from LowCardinality(String) — it's stored as a dictionary internally, reducing storage 10x and speeding up scans.
Gotcha: Mutations Are Heavy
ALTER TABLE ... UPDATE and DELETE in ClickHouse create new parts instead of modifying in place. A single mutation on a large table can take hours and block merges. Design for append-only from day one.
Senior Insight
ClickHouse materialized views are different from PostgreSQL's — they're triggered by data insertion, not query time. When you INSERT into a source table, ClickHouse runs a transformation and stores the result in a target table. I use this for real-time data rollups: raw event data → materialized view → pre-aggregated by hour. The gotcha: materialized views don't backfill existing data. You need to INSERT historical data into the target table separately.
Source: ClickHouse Blog (https://clickhouse.com/blog), Altinity Blog (https://altinity.com/blog), Altinity Knowledge Base (https://kb.altinity.com/)