$ lexprog.com

// notes from an old coder -- php, databases, and the occasional rant

[October 13, 2024] ClickHouse

ClickHouse CollapsingMergeTree: Counters

ClickHouse CollapsingMergeTree: Counters

────────────────────────────────────────────────────────

ClickHouse CollapsingMergeTree: Counters

Tip: Sign Column

CREATE TABLE metrics (
    user_id UInt64,
    event String,
    value Int64,
    sign Int8
) ENGINE = CollapsingMergeTree(sign)
ORDER BY (user_id, event);

Gotcha: Sign Values

1 = add, -1 = subtract. Rows with matching keys and opposite signs cancel.

Tip: Update Counter

INSERT INTO metrics VALUES (1, 'page_views', 1, -1);
INSERT INTO metrics VALUES (1, 'page_views', 5, 1);

The old value is cancelled, new value remains.

Gotcha: Summing Before Collapse

If multiple rows with the same sign exist, they're summed before cancellation.

Tip: Query with FINAL

SELECT user_id, event, sum(value * sign) FROM metrics FINAL GROUP BY user_id, event;

Gotcha: Not for Arbitrary Updates

CollapsingMergeTree is for counters, not general updates.

Tip: Order of Columns in ORDER BY Matters Massively

ClickHouse's primary key is defined by ORDER BY. Put high-cardinality columns first for better data skipping. ORDER BY (timestamp, user_id) is very different from ORDER BY (user_id, timestamp) in query performance.

Tip: Use LowCardinality for Enum-Like Strings

Strings like status, country, browser benefit from LowCardinality(String) — it's stored as a dictionary internally, reducing storage 10x and speeding up scans.

Gotcha: Mutations Are Heavy

ALTER TABLE ... UPDATE and DELETE in ClickHouse create new parts instead of modifying in place. A single mutation on a large table can take hours and block merges. Design for append-only from day one.

Senior Insight

CollapsingMergeTree enables real-time state tracking by using 'sign' rows (+1 for add, -1 for cancel). I've used it for tracking active user sessions, inventory levels, and financial balances where you need the current state derived from a sequence of events. The pattern: insert a +1 row when a session starts and a -1 row when it ends. The SELECT ... FINAL query returns only the current state. It's ClickHouse's most elegant solution for mutable state in an append-only world.

Source: ClickHouse Blog (https://clickhouse.com/blog), Altinity Blog (https://altinity.com/blog), Altinity Knowledge Base (https://kb.altinity.com/)

────────────────────────────────────────────────────────
<-- back to posts