$ lexprog.com

// notes from an old coder -- php, databases, and the occasional rant

[July 17, 2025] ClickHouse

ClickHouse AggregatingMergeTree: State

ClickHouse AggregatingMergeTree: State

────────────────────────────────────────────────────────

ClickHouse AggregatingMergeTree: State

Tip: Aggregate Function Columns

CREATE TABLE stats (
    date Date,
    visits AggregateFunction(sum, UInt64),
    users AggregateFunction(uniq, UInt64)
) ENGINE = AggregatingMergeTree()
ORDER BY date;

Gotcha: Special Insert Syntax

INSERT INTO stats SELECT
    toDate(now()),
    sumState(toUInt64(1)),
    uniqState(toUInt64(user_id))
FROM page_views;

Tip: Query with Merge

SELECT date, sumMerge(visits), uniqMerge(users) FROM stats GROUP BY date;

Gotcha: State Type

Aggregate columns use special state types, not regular values.

Tip: Materialized View

CREATE MATERIALIZED VIEW daily_stats
ENGINE = AggregatingMergeTree()
ORDER BY date
AS SELECT toDate(created_at) as date, sumState(1) as visits FROM events;

Gotcha: Complex Setup

AggregatingMergeTree is more complex than SummingMergeTree. Use when you need multiple aggregate functions.

Tip: Order of Columns in ORDER BY Matters Massively

ClickHouse's primary key is defined by ORDER BY. Put high-cardinality columns first for better data skipping. ORDER BY (timestamp, user_id) is very different from ORDER BY (user_id, timestamp) in query performance.

Tip: Use LowCardinality for Enum-Like Strings

Strings like status, country, browser benefit from LowCardinality(String) — it's stored as a dictionary internally, reducing storage 10x and speeding up scans.

Gotcha: Mutations Are Heavy

ALTER TABLE ... UPDATE and DELETE in ClickHouse create new parts instead of modifying in place. A single mutation on a large table can take hours and block merges. Design for append-only from day one.

Senior Insight

AggregatingMergeTree stores intermediate aggregation states (from uniqState, avgState, quantileState) that can be combined later. I use it for real-time analytics where raw data arrives continuously but final aggregation happens at query time. The state functions (-State variants) store compact binary representations that combine efficiently. A dashboard query using AggregatingMergeTree performs in milliseconds regardless of data volume because the heavy computation happens during insertion.

Source: ClickHouse Blog (https://clickhouse.com/blog), Altinity Blog (https://altinity.com/blog), Altinity Knowledge Base (https://kb.altinity.com/)

────────────────────────────────────────────────────────
<-- back to posts