ClickHouse AggregatingMergeTree: State
ClickHouse AggregatingMergeTree: State
ClickHouse AggregatingMergeTree: State
Tip: Aggregate Function Columns
CREATE TABLE stats (
date Date,
visits AggregateFunction(sum, UInt64),
users AggregateFunction(uniq, UInt64)
) ENGINE = AggregatingMergeTree()
ORDER BY date;
Gotcha: Special Insert Syntax
INSERT INTO stats SELECT
toDate(now()),
sumState(toUInt64(1)),
uniqState(toUInt64(user_id))
FROM page_views;
Tip: Query with Merge
SELECT date, sumMerge(visits), uniqMerge(users) FROM stats GROUP BY date;
Gotcha: State Type
Aggregate columns use special state types, not regular values.
Tip: Materialized View
CREATE MATERIALIZED VIEW daily_stats
ENGINE = AggregatingMergeTree()
ORDER BY date
AS SELECT toDate(created_at) as date, sumState(1) as visits FROM events;
Gotcha: Complex Setup
AggregatingMergeTree is more complex than SummingMergeTree. Use when you need multiple aggregate functions.
Tip: Order of Columns in ORDER BY Matters Massively
ClickHouse's primary key is defined by ORDER BY. Put high-cardinality columns first for better data skipping. ORDER BY (timestamp, user_id) is very different from ORDER BY (user_id, timestamp) in query performance.
Tip: Use LowCardinality for Enum-Like Strings
Strings like status, country, browser benefit from LowCardinality(String) — it's stored as a dictionary internally, reducing storage 10x and speeding up scans.
Gotcha: Mutations Are Heavy
ALTER TABLE ... UPDATE and DELETE in ClickHouse create new parts instead of modifying in place. A single mutation on a large table can take hours and block merges. Design for append-only from day one.
Senior Insight
AggregatingMergeTree stores intermediate aggregation states (from uniqState, avgState, quantileState) that can be combined later. I use it for real-time analytics where raw data arrives continuously but final aggregation happens at query time. The state functions (-State variants) store compact binary representations that combine efficiently. A dashboard query using AggregatingMergeTree performs in milliseconds regardless of data volume because the heavy computation happens during insertion.
Source: ClickHouse Blog (https://clickhouse.com/blog), Altinity Blog (https://altinity.com/blog), Altinity Knowledge Base (https://kb.altinity.com/)