ClickHouse Async Inserts: Buffering
ClickHouse Async Inserts: Buffering
ClickHouse Async Inserts: Buffering
Tip: Enable Async Inserts
SET async_insert = 1;
Gotcha: Buffer Flush
ClickHouse flushes the buffer when it reaches max_async_insert_bytes or async_insert_busy_timeout.
Tip: Wait for Async
SET wait_for_async_insert = 1;
Waits for the insert to be processed before returning.
Gotcha: Not for All Workloads
Async inserts are great for high-throughput logging. Not ideal for transactional inserts.
Tip: Batch Size
SET max_async_insert_bytes = 10485760;
10MB buffer before flush.
Gotcha: Data Loss on Crash
Unflushed data in the buffer is lost if ClickHouse crashes.
Tip: Order of Columns in ORDER BY Matters Massively
ClickHouse's primary key is defined by ORDER BY. Put high-cardinality columns first for better data skipping. ORDER BY (timestamp, user_id) is very different from ORDER BY (user_id, timestamp) in query performance.
Tip: Use LowCardinality for Enum-Like Strings
Strings like status, country, browser benefit from LowCardinality(String) — it's stored as a dictionary internally, reducing storage 10x and speeding up scans.
Gotcha: Mutations Are Heavy
ALTER TABLE ... UPDATE and DELETE in ClickHouse create new parts instead of modifying in place. A single mutation on a large table can take hours and block merges. Design for append-only from day one.
Senior Insight
Async inserts in ClickHouse (22.4+) buffer incoming data and flush it in batches. I enable this for all HTTP-based data ingestion where latency isn't critical. The async insert mode groups 10,000+ rows into a single insert, dramatically reducing part count and merge overhead. The setting async_insert = 1 with wait_for_async_insert = 0 provides the best throughput. The trade-off: up to 1 second of data staleness (the default flush interval).
Source: ClickHouse Blog (https://clickhouse.com/blog), Altinity Blog (https://altinity.com/blog), Altinity Knowledge Base (https://kb.altinity.com/)