$ lexprog.com

// notes from an old coder -- php, databases, and the occasional rant

[March 09, 2024] ClickHouse

ClickHouse LowCardinality: Optimization

ClickHouse LowCardinality: Optimization

────────────────────────────────────────────────────────

ClickHouse LowCardinality: Optimization

Tip: LowCardinality Type

CREATE TABLE events (
    event_type LowCardinality(String),
    country LowCardinality(FixedString(2))
) ENGINE = MergeTree ORDER BY event_type;

Gotcha: Best for < 10K Unique Values

LowCardinality uses dictionary encoding. If cardinality is high, it wastes memory.

Tip: Automatic Compression

LowCardinality columns are automatically compressed. No manual configuration needed.

Gotcha: Not Compatible with All Functions

Some functions don't support LowCardinality. ClickHouse falls back to regular processing.

Tip: Combine with Enum8

status LowCardinality(Enum8('active' = 1, 'inactive' = 2))

Maximum compression for known-value columns.

Gotcha: Query Performance

LowCardinality speeds up GROUP BY and WHERE on the column. But adds slight overhead on inserts.

Tip: Order of Columns in ORDER BY Matters Massively

ClickHouse's primary key is defined by ORDER BY. Put high-cardinality columns first for better data skipping. ORDER BY (timestamp, user_id) is very different from ORDER BY (user_id, timestamp) in query performance.

Tip: Use LowCardinality for Enum-Like Strings

Strings like status, country, browser benefit from LowCardinality(String) — it's stored as a dictionary internally, reducing storage 10x and speeding up scans.

Gotcha: Mutations Are Heavy

ALTER TABLE ... UPDATE and DELETE in ClickHouse create new parts instead of modifying in place. A single mutation on a large table can take hours and block merges. Design for append-only from day one.

Senior Insight

The LowCardinality data type is ClickHouse's secret weapon for string columns with fewer than 10,000 unique values. It stores values as a dictionary internally, reducing storage by 5-10x and improving query performance because comparisons work on integer dictionary keys instead of strings. I apply LowCardinality to status fields, country codes, campaign names, and any enum-like string column. The only downside: it adds CPU overhead for dictionary lookups during insertion.

Source: ClickHouse Blog (https://clickhouse.com/blog), Altinity Blog (https://altinity.com/blog), Altinity Knowledge Base (https://kb.altinity.com/)

────────────────────────────────────────────────────────
<-- back to posts