$ lexprog.com

// notes from an old coder -- php, databases, and the occasional rant

[February 06, 2024] ClickHouse

ClickHouse Skip Indexes: Granularity

ClickHouse Skip Indexes: Granularity

────────────────────────────────────────────────────────

ClickHouse Skip Indexes: Granularity

Tip: MinMax Index

ALTER TABLE events ADD INDEX idx_date created_at TYPE minmax GRANULARITY 1;

Gotcha: Bloom Filter Index

ALTER TABLE events ADD INDEX idx_user user_id TYPE bloom_filter(0.01) GRANULARITY 1;

Tip: Set Index

ALTER TABLE events ADD INDEX idx_status status TYPE set(10) GRANULARITY 1;

Gotcha: Granularity

Granularity defines how many index granules are skipped. Higher = more skipping but larger index.

Tip: Materialize Index

ALTER TABLE events MATERIALIZE INDEX idx_date;

Applies index to existing data.

Gotcha: Index Overhead

Skip indexes add storage overhead and slow down inserts.

Tip: Order of Columns in ORDER BY Matters Massively

ClickHouse's primary key is defined by ORDER BY. Put high-cardinality columns first for better data skipping. ORDER BY (timestamp, user_id) is very different from ORDER BY (user_id, timestamp) in query performance.

Tip: Use LowCardinality for Enum-Like Strings

Strings like status, country, browser benefit from LowCardinality(String) — it's stored as a dictionary internally, reducing storage 10x and speeding up scans.

Gotcha: Mutations Are Heavy

ALTER TABLE ... UPDATE and DELETE in ClickHouse create new parts instead of modifying in place. A single mutation on a large table can take hours and block merges. Design for append-only from day one.

Senior Insight

Skip indexes in ClickHouse are data-skipping indexes that help avoid reading entire granules. I use minmax indexes on monotonically increasing columns (like timestamps) to prune partitions, and bloom_filter indexes on high-cardinality columns for equality lookups. The granularity parameter controls how many granules the index covers — lower granularity means more precise skipping but larger indexes. Tuning skip indexes requires understanding your data distribution and query patterns.

Source: ClickHouse Blog (https://clickhouse.com/blog), Altinity Blog (https://altinity.com/blog), Altinity Knowledge Base (https://kb.altinity.com/)

────────────────────────────────────────────────────────
<-- back to posts