$ lexprog.com

// notes from an old coder -- php, databases, and the occasional rant

[November 07, 2024] ClickHouse

ClickHouse VersionedCollapsingMergeTree: History

ClickHouse VersionedCollapsingMergeTree: History

────────────────────────────────────────────────────────

ClickHouse VersionedCollapsingMergeTree: History

Tip: Version Column

CREATE TABLE users (
    id UInt64,
    name String,
    version UInt64,
    sign Int8
) ENGINE = VersionedCollapsingMergeTree(sign, version)
ORDER BY id;

Gotcha: Higher Version Wins

Rows with higher version numbers take precedence during collapse.

Tip: Track Changes

INSERT INTO users VALUES (1, 'John', 1, 1);
INSERT INTO users VALUES (1, 'Jane', 2, 1);

Version 2 replaces version 1.

Gotcha: Sign Still Required

Both sign and version are needed for proper collapsing.

Tip: Query Latest State

SELECT * FROM users FINAL WHERE id = 1;

Gotcha: History Preservation

Old versions are collapsed during merge. For full history, use a separate table.

Tip: Order of Columns in ORDER BY Matters Massively

ClickHouse's primary key is defined by ORDER BY. Put high-cardinality columns first for better data skipping. ORDER BY (timestamp, user_id) is very different from ORDER BY (user_id, timestamp) in query performance.

Tip: Use LowCardinality for Enum-Like Strings

Strings like status, country, browser benefit from LowCardinality(String) — it's stored as a dictionary internally, reducing storage 10x and speeding up scans.

Gotcha: Mutations Are Heavy

ALTER TABLE ... UPDATE and DELETE in ClickHouse create new parts instead of modifying in place. A single mutation on a large table can take hours and block merges. Design for append-only from day one.

Senior Insight

VersionedCollapsingMergeTree extends CollapsingMergeTree with explicit version tracking. I use it when the same row can be updated multiple times and you need to track the current version. The pattern: insert a +1 row with version 1, then a -1 row with version 1 followed by a +1 row with version 2. This ensures the correct revision is active. It's more complex than CollapsingMergeTree but necessary for applications with multiple updates per entity.

Source: ClickHouse Blog (https://clickhouse.com/blog), Altinity Blog (https://altinity.com/blog), Altinity Knowledge Base (https://kb.altinity.com/)

────────────────────────────────────────────────────────
<-- back to posts