$ lexprog.com

// notes from an old coder -- php, databases, and the occasional rant

[February 01, 2024] ClickHouse

ClickHouse Projections: Query Optimization

ClickHouse Projections: Query Optimization

────────────────────────────────────────────────────────

ClickHouse Projections: Query Optimization

Tip: Create Projection

ALTER TABLE events ADD PROJECTION daily_stats (
    SELECT toDate(created_at) as date, count() as total
    GROUP BY date
);

Gotcha: Projections Update on Insert

Each insert updates all projections. Too many projections slow down writes.

Tip: Projection Selection

ClickHouse automatically uses projections when the query matches. No syntax changes needed.

Gotcha: Projections Need Materialization

Existing data isn't included until you run:

ALTER TABLE events MATERIALIZE PROJECTION daily_stats;

Tip: Projection for Skip Indexes

Projections can act as skip indexes, allowing ClickHouse to skip entire data parts.

Gotcha: Projection Storage

Projections are stored alongside the main data. They increase disk usage.

Tip: Order of Columns in ORDER BY Matters Massively

ClickHouse's primary key is defined by ORDER BY. Put high-cardinality columns first for better data skipping. ORDER BY (timestamp, user_id) is very different from ORDER BY (user_id, timestamp) in query performance.

Tip: Use LowCardinality for Enum-Like Strings

Strings like status, country, browser benefit from LowCardinality(String) — it's stored as a dictionary internally, reducing storage 10x and speeding up scans.

Gotcha: Mutations Are Heavy

ALTER TABLE ... UPDATE and DELETE in ClickHouse create new parts instead of modifying in place. A single mutation on a large table can take hours and block merges. Design for append-only from day one.

Senior Insight

Optimizing ClickHouse queries requires a different mindset than PostgreSQL. Sequential scans are often faster than indexed lookups because columnar storage reads data in large blocks. I focus on: (1) selecting only needed columns (no SELECT *), (2) filtering on the primary key (ORDER BY columns) early, (3) using PREWHERE for filtering columns that aren't in the primary key, and (4) avoiding finalizeAggregation in queries unless necessary.

Source: ClickHouse Blog (https://clickhouse.com/blog), Altinity Blog (https://altinity.com/blog), Altinity Knowledge Base (https://kb.altinity.com/)

────────────────────────────────────────────────────────
<-- back to posts