$ lexprog.com

// notes from an old coder -- php, databases, and the occasional rant

[December 01, 2025] ClickHouse

ClickHouse Arrays and Nested Data

ClickHouse Arrays: Tips & Tricks

────────────────────────────────────────────────────────

ClickHouse Arrays: Tips & Tricks

Tip: Array Column Type

CREATE TABLE posts (
    id UInt64,
    title String,
    tags Array(String)
) ENGINE = MergeTree()
ORDER BY id;

Native array support — no junction tables needed.

Gotcha: arrayJoin() Expands Arrays

SELECT title, arrayJoin(tags) as tag FROM posts;

One row per tag. Great for tag frequency analysis.

Tip: Array Functions

has(tags, 'php')           -- Contains element
hasAny(tags, ['php', 'py']) -- Contains any
hasAll(tags, ['php', 'laravel']) -- Contains all
length(tags)               -- Array size

Gotcha: Nested Types

CREATE TABLE events (
    id UInt64,
    properties Nested(
        browser String,
        os String,
        country String
    )
) ENGINE = MergeTree()
ORDER BY id;

Nested types are stored as parallel arrays internally.

Tip: groupArray() for Aggregation

SELECT user_id, groupArray(url) as urls FROM page_views GROUP BY user_id;

Collects all URLs visited by each user into an array.

Gotcha: Array Indexing

SELECT tags[1] FROM posts; -- First element (1-based indexing)

ClickHouse uses 1-based array indexing, not 0-based.

Tip: arrayMap() for Transformation

SELECT arrayMap(x -> upper(x), tags) FROM posts;

Applies a function to each array element.

Gotcha: LowCardinality for Array Elements

tags Array(LowCardinality(String))

Massive compression if array elements have few unique values.

Tip: Order of Columns in ORDER BY Matters Massively

ClickHouse's primary key is defined by ORDER BY. Put high-cardinality columns first for better data skipping. ORDER BY (timestamp, user_id) is very different from ORDER BY (user_id, timestamp) in query performance.

Tip: Use LowCardinality for Enum-Like Strings

Strings like status, country, browser benefit from LowCardinality(String) — it's stored as a dictionary internally, reducing storage 10x and speeding up scans.

Gotcha: Mutations Are Heavy

ALTER TABLE ... UPDATE and DELETE in ClickHouse create new parts instead of modifying in place. A single mutation on a large table can take hours and block merges. Design for append-only from day one.

Senior Insight

ClickHouse monitoring is essential because silent failures are common. Parts can accumulate without merging, mutations can stall, and ZooKeeper sessions can expire. I monitor system.merges for merge queue depth, system.replication_queue for replication lag, and system.mutations for stuck mutations. A healthy ClickHouse server processes merges continuously. If merges stall, performance degrades over time until queries become noticeably slow. Proactive monitoring prevents this.

Source: ClickHouse Blog (https://clickhouse.com/blog), Altinity Blog (https://altinity.com/blog), Altinity Knowledge Base (https://kb.altinity.com/)

────────────────────────────────────────────────────────
<-- back to posts