ClickHouse Data Insertion at Scale
ClickHouse Data Insertion: Tips & Tricks
ClickHouse Data Insertion: Tips & Tricks
Tip: Batch Inserts of 10K-100K Rows
$data = [];
for ($i = 0; $i < 10000; $i++) {
$data[] = [
'id' => $i,
'url' => '/page/' . $i,
'user_id' => rand(1, 1000),
'created_at' => now(),
];
}
DB::connection('clickhouse')->table('page_views')->insert($data);
Optimal batch size: 10K-100K rows.
Gotcha: Don't Insert One Row at a Time
Single-row inserts are extremely inefficient in ClickHouse. Always batch.
Tip: Async Inserts
$client->addSendAsyncInterval(1000);
Buffers inserts and sends them periodically.
Gotcha: Insert Format Matters
CSV is faster than JSON for bulk inserts:
INSERT INTO page_views FROM INFILE '/data/views.csv' FORMAT CSV;
Tip: Kafka Integration
CREATE TABLE kafka_views ENGINE = Kafka()
SETTINGS
kafka_broker_list = 'localhost:9092',
kafka_topic_list = 'page_views',
kafka_group_name = 'clickhouse',
kafka_format = 'JSONEachRow';
Real-time ingestion from Kafka.
Gotcha: Insert Deduplication
ClickHouse may deduplicate identical inserts within a short time window. Add a unique column if needed.
Tip: Materialized Views for Auto-Insert
CREATE MATERIALIZED VIEW daily_views
ENGINE = SummingMergeTree()
ORDER BY (date, url)
AS SELECT toDate(created_at) as date, url, count() as views
FROM page_views GROUP BY date, url;
Automatically populates from the source table.
Gotcha: Insert Blocks
ClickHouse groups inserts into "parts." Too many small inserts create too many parts, slowing down merges.
Tip: Order of Columns in ORDER BY Matters Massively
ClickHouse's primary key is defined by ORDER BY. Put high-cardinality columns first for better data skipping. ORDER BY (timestamp, user_id) is very different from ORDER BY (user_id, timestamp) in query performance.
Tip: Use LowCardinality for Enum-Like Strings
Strings like status, country, browser benefit from LowCardinality(String) — it's stored as a dictionary internally, reducing storage 10x and speeding up scans.
Gotcha: Mutations Are Heavy
ALTER TABLE ... UPDATE and DELETE in ClickHouse create new parts instead of modifying in place. A single mutation on a large table can take hours and block merges. Design for append-only from day one.
Senior Insight
ClickHouse is optimized for bulk inserts, not single-row operations. Inserting one row at a time creates tiny parts that overwhelm the merge process. I always buffer to at least 100K rows or insert every 5 seconds (whichever comes first). The Async Insert mode (ClickHouse 22.4+) buffers inserts automatically — I enable it for data ingestion pipelines. The throughput difference is dramatic: 100 single inserts/second vs 100K rows/second in bulk mode.
Source: ClickHouse Blog (https://clickhouse.com/blog), Altinity Blog (https://altinity.com/blog), Altinity Knowledge Base (https://kb.altinity.com/)