MongoDB Aggregation Pipeline in Laravel
MongoDB Aggregation Pipeline: Tips & Tricks
Tip: Basic Aggregation
Post::raw(function ($collection) {
    return $collection->aggregate([
        ['$match' => ['published' => true]],
        ['$group' => ['_id' => '$category_id', 'count' => ['$sum' => 1]]],
        ['$sort' => ['count' => -1]],
    ]);
});
Gotcha: $match Should Come First
Put $match as early as possible, ideally as the first stage. A $match at the start of the pipeline can use indexes like a normal query, and every later stage then processes fewer documents.
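As a rough sketch of filter-first ordering, the builder below counts tags for published posts only and places $match ahead of $unwind. The function name tagCountPipeline and its $publishedOnly flag are hypothetical; the field names follow the examples in this article.

```php
<?php

/**
 * Build a tag-count pipeline. The function name and the $publishedOnly flag
 * are hypothetical; field names follow the examples in this article.
 */
function tagCountPipeline(bool $publishedOnly): array
{
    $pipeline = [
        ['$unwind' => '$tags'],
        ['$group' => ['_id' => '$tags', 'count' => ['$sum' => 1]]],
    ];

    if ($publishedOnly) {
        // Prepend the filter so it runs first and can use an index on `published`.
        array_unshift($pipeline, ['$match' => ['published' => true]]);
    }

    return $pipeline;
}

// Print the stage order of the filtered pipeline.
echo implode(' -> ', array_map('array_key_first', tagCountPipeline(true))), PHP_EOL;
// $match -> $unwind -> $group
```

The returned array can be passed to $collection->aggregate() inside Post::raw() exactly like the inline pipelines shown here.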
Tip: $unwind for Array Data
Post::raw(function ($collection) {
    return $collection->aggregate([
        ['$unwind' => '$tags'],
        ['$group' => ['_id' => '$tags', 'count' => ['$sum' => 1]]],
    ]);
});
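Counts tag frequency across all posts. By default, $unwind drops posts whose tags field is missing, null, or an empty array; set preserveNullAndEmptyArrays if you need to keep them.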
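To make the output shape concrete, here is a plain-PHP emulation of what $unwind followed by $group computes. It does not talk to MongoDB; the sample documents and the countTags helper are made up for illustration.

```php
<?php

// Hypothetical sample documents; in MongoDB these would live in `posts`.
$posts = [
    ['title' => 'A', 'tags' => ['php', 'mongodb']],
    ['title' => 'B', 'tags' => ['php']],
    ['title' => 'C', 'tags' => []],   // empty array: produces no output
    ['title' => 'D'],                 // missing field: produces no output
];

/**
 * Plain-PHP emulation of $unwind on `tags` followed by $group with $sum: 1.
 */
function countTags(array $posts): array
{
    $counts = [];
    foreach ($posts as $post) {
        // $unwind emits one document per array element...
        foreach ($post['tags'] ?? [] as $tag) {
            // ...and $group adds 1 for each emitted document.
            $counts[$tag] = ($counts[$tag] ?? 0) + 1;
        }
    }
    return $counts;
}

print_r(countTags($posts)); // php => 2, mongodb => 1
```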
Gotcha: $lookup for Joins
['$lookup' => [
    'from' => 'categories',
    'localField' => 'category_id',
    'foreignField' => '_id',
    'as' => 'category',
]]
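This is MongoDB's equivalent of a LEFT OUTER JOIN: every post is kept, and matching categories land in the category field, which is always an array (empty when nothing matches). localField and foreignField must hold the same type, so an ObjectId will not match its string form.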
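Because the joined field comes back as an array, a common follow-up is to flatten it. The sketch below assumes each post has at most one category; the function name postsWithCategoryPipeline is hypothetical.

```php
<?php

/**
 * Pipeline that attaches the matching category to each post.
 * Collection and field names follow the fragment above; the function
 * name is hypothetical.
 */
function postsWithCategoryPipeline(): array
{
    return [
        ['$lookup' => [
            'from' => 'categories',
            'localField' => 'category_id',
            'foreignField' => '_id',
            'as' => 'category',
        ]],
        // $lookup always yields an array; flatten it to a single sub-document.
        // preserveNullAndEmptyArrays keeps posts without a matching category,
        // which preserves the left-join behaviour.
        ['$unwind' => [
            'path' => '$category',
            'preserveNullAndEmptyArrays' => true,
        ]],
    ];
}

$pipeline = postsWithCategoryPipeline();
```

Without preserveNullAndEmptyArrays, the $unwind stage would silently turn the left join into an inner join.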
Tip: $project to Shape Output
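['$project' => [
    'title' => 1,
    // $size errors when comments is missing or null, so fall back to an empty array.
    'comment_count' => ['$size' => ['$ifNull' => ['$comments', []]]],
    'author_name' => '$author.name',
]]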
Gotcha: Pipeline Memory Limit
Each pipeline stage is limited to 100 MB of RAM. For larger datasets, set allowDiskUse: true so stages can spill to temporary files on disk; MongoDB 6.0 and later do this by default.
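In PHP the option goes in the second argument of aggregate(). A minimal sketch, assuming the Post model from the earlier examples; the Laravel call is shown as a comment because it needs a running application.

```php
<?php

// Pipeline whose $group and $sort stages may exceed the in-memory limit
// on a large collection.
$pipeline = [
    ['$group' => ['_id' => '$category_id', 'count' => ['$sum' => 1]]],
    ['$sort' => ['count' => -1]],
];

// Options array for MongoDB\Collection::aggregate().
$options = ['allowDiskUse' => true];

// Inside a Laravel app (not executed here):
// Post::raw(function ($collection) use ($pipeline, $options) {
//     return $collection->aggregate($pipeline, $options);
// });
```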
Tip: $facet for Multiple Aggregations
['$facet' => [
    'totalPosts' => [['$count' => 'count']],
    'categories' => [['$group' => ['_id' => '$category_id']]],
]]
Runs several sub-pipelines over the same input documents in a single stage and returns one document with a field per sub-pipeline.
Gotcha: $out Writes to a Collection
['$out' => 'post_stats']
Creates the target collection, or replaces it entirely if it already exists, with the pipeline output. This is destructive, and $out must be the last stage of the pipeline.
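A small guard can catch a misplaced $out before the pipeline is sent to the server. This is only a sketch; postStatsPipeline and outIsLastStage are hypothetical helpers, not part of any library.

```php
<?php

/** Pipeline that materializes per-category counts into `post_stats`. */
function postStatsPipeline(): array
{
    return [
        ['$group' => ['_id' => '$category_id', 'count' => ['$sum' => 1]]],
        // $out must be the final stage; it replaces `post_stats` wholesale.
        ['$out' => 'post_stats'],
    ];
}

/** Hypothetical guard: true when $out, if present, is the last stage. */
function outIsLastStage(array $pipeline): bool
{
    $lastIndex = count($pipeline) - 1;
    foreach ($pipeline as $index => $stage) {
        if (array_key_exists('$out', $stage) && $index !== $lastIndex) {
            return false;
        }
    }
    return true;
}

var_dump(outIsLastStage(postStatsPipeline())); // bool(true)
```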
Tip: Embed or Reference? The 80/20 Rule
If you always access data together, embed it. If you access it independently, reference it. The 16 MB document size limit is the hard boundary; as a rule of thumb, keep most documents under 1 MB.
Tip: Index Your Query Patterns, Not All Fields
Creating indexes on every field wastes RAM and slows down writes. Run explain() on your real queries to find collection scans and in-memory sorts, then index only the fields those queries actually filter and sort on.
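Once you have explain output as a PHP array, spotting the warning signs can be automated. The sketch below walks a nested plan and collects stage names; COLLSCAN and SORT are real plan stage names, but the sample plan is hand-written and heavily trimmed, and planStages is a hypothetical helper.

```php
<?php

/**
 * Collect every "stage" name from a nested explain plan.
 */
function planStages(array $plan): array
{
    $stages = [];
    array_walk_recursive($plan, function ($value, $key) use (&$stages) {
        if ($key === 'stage') {
            $stages[] = $value;
        }
    });
    return $stages;
}

// Hand-written, trimmed sample of a winning plan (illustrative only).
$winningPlan = [
    'stage' => 'SORT',
    'inputStage' => ['stage' => 'COLLSCAN'],
];

$stages = planStages($winningPlan);

// A collection scan or an in-memory sort suggests a missing index.
$needsIndex = in_array('COLLSCAN', $stages, true) || in_array('SORT', $stages, true);

var_dump($needsIndex); // bool(true)
```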
Gotcha: No Transaction Rollback for Index Builds
Building an index on a large collection can take hours. If it fails midway, the partial index is silently discarded. Plan index builds during maintenance windows.
Senior Insight
I've learned to be explicit about MongoDB write concerns. With w: 1, which was the default before MongoDB 5.0, a write is acknowledged by the primary only, so a failover can lose acknowledged writes. Since 5.0 the implicit default in most replica set configurations is w: majority, but I still set it explicitly rather than rely on server version and topology. For critical data, I use w: majority to ensure writes are replicated to a majority of replica set members. The trade-off is latency: waiting for majority acknowledgment adds network round-trips. For a logging system where data loss is acceptable, w: 1 is fine. For financial transactions, w: majority is the minimum.
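One way to make the write concern explicit is in the connection string. The sketch below appends the standard w and wtimeoutMS options; the function name, host, database, and timeout value are placeholders, and it assumes the URI already includes a path component such as /blog.

```php
<?php

/**
 * Append write-concern options to a MongoDB connection string.
 * `w` and `wtimeoutMS` are standard connection-string options; the function
 * name and the default timeout are placeholders.
 */
function withMajorityWrites(string $uri, int $timeoutMs = 5000): string
{
    // Use '&' when the URI already carries options, '?' otherwise.
    $separator = strpos($uri, '?') === false ? '?' : '&';

    return $uri . $separator . http_build_query([
        'w' => 'majority',
        'wtimeoutMS' => $timeoutMs,
    ]);
}

echo withMajorityWrites('mongodb://localhost:27017/blog'), PHP_EOL;
// mongodb://localhost:27017/blog?w=majority&wtimeoutMS=5000
```

The timeout matters: without it, a majority write can block indefinitely when too few members are reachable.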
Source: MongoDB Developer Center (https://www.mongodb.com/developer/), MongoDB Engineering Blog (https://www.mongodb.com/blog/channel/engineering-blog), Studio 3T Blog (https://studio3t.com/blog/)