ClickHouse ReplacingMergeTree: Deduplication
ClickHouse ReplacingMergeTree: Deduplication
ClickHouse ReplacingMergeTree: Deduplication
Tip: Version Column
CREATE TABLE users (
id UInt64,
name String,
version UInt64
) ENGINE = ReplacingMergeTree(version)
ORDER BY id;
Gotcha: Dedup During Merge
Deduplication happens during background merges, not on insert.
Tip: Without Version
ENGINE = ReplacingMergeTree()
Keeps the last inserted row per key.
Gotcha: FINAL for Consistent Reads
SELECT * FROM users FINAL WHERE id = 1;
Forces deduplication at query time.
Tip: OPTIMIZE TABLE
OPTIMIZE TABLE users FINAL;
Forces immediate merge and dedup.
Gotcha: Not for Real-Time
Don't rely on ReplacingMergeTree for real-time dedup. Use it for eventual consistency.
Tip: Order of Columns in ORDER BY Matters Massively
ClickHouse's primary key is defined by ORDER BY. Put high-cardinality columns first for better data skipping. ORDER BY (timestamp, user_id) is very different from ORDER BY (user_id, timestamp) in query performance.
Tip: Use LowCardinality for Enum-Like Strings
Strings like status, country, browser benefit from LowCardinality(String) — it's stored as a dictionary internally, reducing storage 10x and speeding up scans.
Gotcha: Mutations Are Heavy
ALTER TABLE ... UPDATE and DELETE in ClickHouse create new parts instead of modifying in place. A single mutation on a large table can take hours and block merges. Design for append-only from day one.
Senior Insight
ReplacingMergeTree is ClickHouse's answer to upsert operations. It deduplicates rows with the same sorting key during merges, keeping only the last row (by version column or insertion time). I use it for maintaining the latest state of frequently updated entities. The critical note: deduplication only happens during merges, which are asynchronous. Immediate SELECT queries may return duplicates. Use FINAL or aggregate with argMax() for consistent results.
Source: ClickHouse Blog (https://clickhouse.com/blog), Altinity Blog (https://altinity.com/blog), Altinity Knowledge Base (https://kb.altinity.com/)