$ lexprog.com

// notes from an old coder -- php, databases, and the occasional rant

[May 22, 2026] ClickHouse

ClickHouse Query Optimization

ClickHouse Query Optimization: Tips & Tricks

────────────────────────────────────────────────────────

ClickHouse Query Optimization: Tips & Tricks

Tip: ORDER BY Defines Query Speed

ORDER BY (created_at, event_type)

Queries filtering on created_at are fast. Queries filtering only on event_type are slow.

Gotcha: PREWHERE is Faster Than WHERE

SELECT * FROM page_views
PREWHERE created_at >= today() - 7
WHERE url LIKE '/api%';

PREWHERE filters before reading full rows.

Tip: Sampling for Large Datasets

SELECT count() * 10 FROM page_views SAMPLE 0.1;

Queries 10% of data, multiplies result. Much faster for approximate results.

Gotcha: SELECT * Reads All Columns

ClickHouse is column-oriented. Reading unnecessary columns wastes I/O.

SELECT url, count() FROM page_views GROUP BY url;

Tip: Partition Pruning

PARTITION BY toYYYYMM(created_at)

Queries with WHERE created_at >= '2024-01-01' only scan relevant partitions.

Gotcha: EXPLAIN for Query Plans

EXPLAIN actions = 1
SELECT * FROM page_views WHERE url = '/home';

Shows how ClickHouse will execute the query.

Tip: FINAL for ReplacingMergeTree

SELECT * FROM users FINAL WHERE id = 1;

Forces deduplication. Expensive — use sparingly.

Gotcha: max_threads Setting

SET max_threads = 4;

Controls parallelism. Too many threads can hurt performance on small queries.

Tip: Order of Columns in ORDER BY Matters Massively

ClickHouse's primary key is defined by ORDER BY. Put high-cardinality columns first for better data skipping. ORDER BY (timestamp, user_id) is very different from ORDER BY (user_id, timestamp) in query performance.

Tip: Use LowCardinality for Enum-Like Strings

Strings like status, country, browser benefit from LowCardinality(String) — it's stored as a dictionary internally, reducing storage 10x and speeding up scans.

Gotcha: Mutations Are Heavy

ALTER TABLE ... UPDATE and DELETE in ClickHouse create new parts instead of modifying in place. A single mutation on a large table can take hours and block merges. Design for append-only from day one.

Senior Insight

The ClickHouse query profiler is an underutilized tool. It samples stack traces during query execution and shows where CPU time is spent. I use log_queries = 1 and log_query_threads = 1 to capture detailed execution metrics, then query system.query_log to find expensive operations. The insight that consistently surprises me: most slow queries are IO-bound, not CPU-bound. The bottleneck is reading data from disk, not processing it. This makes compression and column selection the most impactful optimizations.

Source: ClickHouse Blog (https://clickhouse.com/blog), Altinity Blog (https://altinity.com/blog), Altinity Knowledge Base (https://kb.altinity.com/)

────────────────────────────────────────────────────────
<-- back to posts