$ lexprog.com

// notes from an old coder -- php, databases, and the occasional rant

[October 12, 2024] ClickHouse

ClickHouse Analytic Queries

ClickHouse Analytic Queries: Tips & Tricks

────────────────────────────────────────────────────────

ClickHouse Analytic Queries: Tips & Tricks

Tip: Time Series Aggregation

SELECT
    toStartOfHour(created_at) as hour,
    count() as views
FROM page_views
GROUP BY hour
ORDER BY hour;

Groups data by hour for time-series analysis.

Gotcha: uniq() vs uniqExact()

SELECT uniq(user_id) FROM page_views;     -- Approximate, fast
SELECT uniqExact(user_id) FROM page_views; -- Exact, slower

uniq() uses HyperLogLog — good enough for most cases.

Tip: Percentiles

SELECT
    quantile(0.5)(response_time) as p50,
    quantile(0.95)(response_time) as p95,
    quantile(0.99)(response_time) as p99
FROM api_logs;

Essential for performance monitoring.

Gotcha: TOP N Queries

SELECT url, count() as views
FROM page_views
WHERE created_at >= today() - 7
GROUP BY url
ORDER BY views DESC
LIMIT 10;

Standard top-N pattern.

Tip: Running Totals

SELECT
    date,
    views,
    sum(views) OVER (ORDER BY date) as running_total
FROM daily_stats;

Window functions work in ClickHouse too.

Gotcha: arrayJoin() for Unnesting

SELECT arrayJoin(tags) as tag, count() FROM posts GROUP BY tag;

Expands arrays into rows.

Tip: Date Truncation Functions

toStartOfDay(dt)    -- Midnight of the day
toStartOfWeek(dt)   -- Start of the week
toStartOfMonth(dt)  -- First of the month
toStartOfHour(dt)   -- Start of the hour

Gotcha: GROUP BY with Aliases

ClickHouse allows GROUP BY aliases in some versions, but it's safer to repeat the expression.

Tip: Order of Columns in ORDER BY Matters Massively

ClickHouse's primary key is defined by ORDER BY. Put high-cardinality columns first for better data skipping. ORDER BY (timestamp, user_id) is very different from ORDER BY (user_id, timestamp) in query performance.

Tip: Use LowCardinality for Enum-Like Strings

Strings like status, country, browser benefit from LowCardinality(String) — it's stored as a dictionary internally, reducing storage 10x and speeding up scans.

Gotcha: Mutations Are Heavy

ALTER TABLE ... UPDATE and DELETE in ClickHouse create new parts instead of modifying in place. A single mutation on a large table can take hours and block merges. Design for append-only from day one.

Senior Insight

ClickHouse requires a fundamental shift in thinking for developers coming from OLTP databases. The database is optimized for bulk reads of millions of rows, not single-row lookups. I've seen developers struggle because they try to use ClickHouse like PostgreSQL — frequent small INSERTs, point SELECTs, UPDATEs. These operations work in ClickHouse but perform poorly. The mental model should be: write in batches, read in ranges, and never update. Once you internalize this, ClickHouse becomes beautiful.

Source: ClickHouse Blog (https://clickhouse.com/blog), Altinity Blog (https://altinity.com/blog), Altinity Knowledge Base (https://kb.altinity.com/)

────────────────────────────────────────────────────────
<-- back to posts