ClickHouse MergeTree: Optimization Tips
ClickHouse MergeTree: Optimization Tips
ClickHouse MergeTree: Optimization Tips
Tip: ORDER BY Defines Data Layout
ORDER BY (created_at, event_type)
Data is physically sorted by these columns. Queries filtering on the first column are fastest.
Gotcha: Primary Key = Sorting Key
In ClickHouse, ORDER BY defines both. Choose the most selective column first.
Tip: Partition by Month
PARTITION BY toYYYYMM(created_at)
Makes it easy to drop old data: ALTER TABLE events DROP PARTITION 202401.
Gotcha: Too Many Partitions Slow Down Queries
Each partition adds overhead. Aim for dozens, not thousands, of partitions.
Tip: SAMPLE BY for Approximate Queries
ORDER BY (user_id, created_at)
SAMPLE BY user_id
Enables SAMPLE 0.1 for 10% sampling.
Gotcha: TTL for Auto-Deletion
TTL created_at + INTERVAL 90 DAY DELETE
Automatically removes old data during merges.
Tip: Order of Columns in ORDER BY Matters Massively
ClickHouse's primary key is defined by ORDER BY. Put high-cardinality columns first for better data skipping. ORDER BY (timestamp, user_id) is very different from ORDER BY (user_id, timestamp) in query performance.
Tip: Use LowCardinality for Enum-Like Strings
Strings like status, country, browser benefit from LowCardinality(String) — it's stored as a dictionary internally, reducing storage 10x and speeding up scans.
Gotcha: Mutations Are Heavy
ALTER TABLE ... UPDATE and DELETE in ClickHouse create new parts instead of modifying in place. A single mutation on a large table can take hours and block merges. Design for append-only from day one.
Senior Insight
I've adopted a strict separation between ClickHouse clusters for different data temperatures. Hot data (7 days) goes on fast NVMe storage with replication. Warm data (90 days) goes on SSD without replication. Cold data (beyond 90 days) goes to S3 with the S3 table engine. This tiered approach optimizes cost without sacrificing query performance — the hot cluster handles real-time dashboards, while historical analytics queries gracefully tolerate the slower S3 access.
Source: ClickHouse Blog (https://clickhouse.com/blog), Altinity Blog (https://altinity.com/blog), Altinity Knowledge Base (https://kb.altinity.com/)