$ lexprog.com

// notes from an old coder -- php, databases, and the occasional rant

[September 07, 2024] ClickHouse

ClickHouse Array Functions: Operations

ClickHouse Array Functions: Operations

────────────────────────────────────────────────────────

ClickHouse Array Functions: Operations

Tip: Array Column

CREATE TABLE posts (
    title String,
    tags Array(String)
) ENGINE = MergeTree ORDER BY title;

Gotcha: Array Join

SELECT arrayJoin(tags) as tag, count() FROM posts GROUP BY tag;

Expands arrays into rows. Each tag becomes a separate row.

Tip: Array Contains

SELECT * FROM posts WHERE has(tags, 'laravel');

Checks if array contains a specific value.

Gotcha: Array Indexing

SELECT tags[1] FROM posts;

1-based indexing (not 0-based like most languages).

Tip: Array Map and Filter

SELECT arrayFilter(x -> x != 'spam', tags) FROM posts;

Removes 'spam' from each tags array.

Gotcha: Array Size Limit

Arrays have no explicit size limit, but large arrays impact query performance.

Tip: Order of Columns in ORDER BY Matters Massively

ClickHouse's primary key is defined by ORDER BY. Put high-cardinality columns first for better data skipping. ORDER BY (timestamp, user_id) is very different from ORDER BY (user_id, timestamp) in query performance.

Tip: Use LowCardinality for Enum-Like Strings

Strings like status, country, browser benefit from LowCardinality(String) — it's stored as a dictionary internally, reducing storage 10x and speeding up scans.

Gotcha: Mutations Are Heavy

ALTER TABLE ... UPDATE and DELETE in ClickHouse create new parts instead of modifying in place. A single mutation on a large table can take hours and block merges. Design for append-only from day one.

Senior Insight

ClickHouse's array functions are surprisingly powerful. I use arraySort(), arrayConcat(), and arrayIntersect() on tag data, groupArray() and groupUniqArray() for aggregation into arrays, and arrayMap() for element-wise transformations. For time-series data, arrayCumSum() creates running totals in a single query. The learning curve is steep, but once you internalize the array function mental model, you can express complex data transformations concisely.

Source: ClickHouse Blog (https://clickhouse.com/blog), Altinity Blog (https://altinity.com/blog), Altinity Knowledge Base (https://kb.altinity.com/)

────────────────────────────────────────────────────────
<-- back to posts