$ lexprog.com

// notes from an old coder -- php, databases, and the occasional rant

[October 06, 2025] ClickHouse

ClickHouse Dictionaries: Fast Lookups

ClickHouse Dictionaries: Fast Lookups

────────────────────────────────────────────────────────

ClickHouse Dictionaries: Fast Lookups

Tip: Flat Dictionary

Creates an in-memory dictionary for O(1) lookups. Best for small reference tables.

Gotcha: Dictionary Memory Usage

Flat dictionaries load ALL data into RAM. Use hashed or cache for large datasets.

Tip: dictGet Function

SELECT dictGet('countries', 'name', toUInt64(country_id)) as country_name
FROM events;

Faster than JOIN for dimension lookups.

Gotcha: Dictionary Updates

lifetime controls refresh interval. Changes in the source aren't visible until refresh.

Tip: Cache Dictionary

Only caches frequently accessed entries. Good for large reference tables.

Gotcha: Complex Keys

Dictionaries support composite keys for multi-column lookups.

Tip: Order of Columns in ORDER BY Matters Massively

ClickHouse's primary key is defined by ORDER BY. Put high-cardinality columns first for better data skipping. ORDER BY (timestamp, user_id) is very different from ORDER BY (user_id, timestamp) in query performance.

Tip: Use LowCardinality for Enum-Like Strings

Strings like status, country, browser benefit from LowCardinality(String) — it's stored as a dictionary internally, reducing storage 10x and speeding up scans.

Gotcha: Mutations Are Heavy

ALTER TABLE ... UPDATE and DELETE in ClickHouse create new parts instead of modifying in place. A single mutation on a large table can take hours and block merges. Design for append-only from day one.

Senior Insight

The most expensive ClickHouse mistake I've made was not planning the ORDER BY (primary key) carefully. Unlike PostgreSQL where indexes can be added later, ClickHouse's primary key determines the physical data layout and can't be changed without recreating the table. I spend significant time analyzing query patterns before designing the schema. The ORDER BY should include columns that appear in WHERE clauses for efficient data skipping. Getting this wrong means recreating the table and re-importing all data.

Source: ClickHouse Blog (https://clickhouse.com/blog), Altinity Blog (https://altinity.com/blog), Altinity Knowledge Base (https://kb.altinity.com/)

────────────────────────────────────────────────────────
<-- back to posts