$ lexprog.com

// notes from an old coder -- php, databases, and the occasional rant

[March 07, 2024] ClickHouse

ClickHouse Table Engines

ClickHouse Table Engines: Tips & Tricks

────────────────────────────────────────────────────────

ClickHouse Table Engines: Tips & Tricks

Tip: MergeTree is the Default

CREATE TABLE events (
    id UInt64,
    event String,
    created_at DateTime
) ENGINE = MergeTree()
ORDER BY (created_at, event);

Use this for 90% of cases.

Gotcha: ReplacingMergeTree Deduplicates

CREATE TABLE users (
    id UInt64,
    name String,
    version UInt64
) ENGINE = ReplacingMergeTree(version)
ORDER BY id;

Keeps only the row with the highest version per id. Deduplication happens during merges.

Tip: SummingMergeTree for Pre-Aggregation

CREATE TABLE sales (
    product_id UInt64,
    amount UInt64,
    created_at DateTime
) ENGINE = SummingMergeTree()
ORDER BY (product_id, created_at);

Automatically sums amount for rows with the same key.

Gotcha: AggregatingMergeTree

CREATE TABLE daily_stats (
    date Date,
    views AggregateFunction(sum, UInt64)
) ENGINE = AggregatingMergeTree()
ORDER BY date;

Stores aggregate functions, not raw values. Requires special insert syntax.

Tip: Log Engines for Small Data

CREATE TABLE logs (message String) ENGINE = StripeLog;

No indexes, no sorting. Good for small lookup tables.

Gotcha: Engine Choice Affects Query Performance

MergeTree: fast reads, good compression. Log engines: fast writes, poor reads.

Tip: ORDER BY Defines Data Layout

ORDER BY (created_at, event_type)

Data is physically sorted by these columns. Queries filtering on the first column are fastest.

Gotcha: PARTITION BY

PARTITION BY toYYYYMM(created_at)

Partitions by month. Makes it easy to drop old data.

Tip: Order of Columns in ORDER BY Matters Massively

ClickHouse's primary key is defined by ORDER BY. Put high-cardinality columns first for better data skipping. ORDER BY (timestamp, user_id) is very different from ORDER BY (user_id, timestamp) in query performance.

Tip: Use LowCardinality for Enum-Like Strings

Strings like status, country, browser benefit from LowCardinality(String) — it's stored as a dictionary internally, reducing storage 10x and speeding up scans.

Gotcha: Mutations Are Heavy

ALTER TABLE ... UPDATE and DELETE in ClickHouse create new parts instead of modifying in place. A single mutation on a large table can take hours and block merges. Design for append-only from day one.

Senior Insight

The MergeTree engine family is ClickHouse's heart, and choosing the right engine variant is critical. MergeTree is the base — use it for most data. SummingMergeTree pre-aggregates numeric columns, ideal for counters. AggregatingMergeTree stores intermediate aggregation states, perfect for real-time rollups. I once had a dashboard that ran a 10-second query on MergeTree and a 50ms query on AggregatingMergeTree with the same data. Choose your engine based on query patterns.

Source: ClickHouse Blog (https://clickhouse.com/blog), Altinity Blog (https://altinity.com/blog), Altinity Knowledge Base (https://kb.altinity.com/)

────────────────────────────────────────────────────────
<-- back to posts