MongoDB Pipeline Stages: Advanced
Tip: $facet for Multiple Aggregations
['$facet' => [
'totalPosts' => [['$count' => 'count']],
'categories' => [['$group' => ['_id' => '$category_id']]],
'recentPosts' => [['$sort' => ['created_at' => -1]], ['$limit' => 5]],
]]
Runs several sub-pipelines over the same input documents in a single stage, so the collection is read only once.
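A sketch of how the stage might sit in a full pipeline and how its result could be read back. The function names and the status field are hypothetical, and the usage comment assumes the mongodb/mongodb PHP library. $facet emits exactly one document with one array per facet, so a facet that matched nothing comes back as an empty array rather than a zero count.

```php
<?php
// Hypothetical helper: dashboard pipeline for a `posts` collection.
// The `status` field and its 'published' value are assumptions for illustration.
function dashboardPipeline(): array
{
    return [
        // Filter first so every facet works on the reduced set.
        ['$match' => ['status' => 'published']],
        ['$facet' => [
            'totalPosts'  => [['$count' => 'count']],
            'categories'  => [['$group' => ['_id' => '$category_id']]],
            'recentPosts' => [['$sort' => ['created_at' => -1]], ['$limit' => 5]],
        ]],
    ];
}

// Reads the count out of the single $facet result document.
// With no matching posts, `totalPosts` is an empty array, so default to 0.
function totalPostsFrom(array $facetDoc): int
{
    return (int) ($facetDoc['totalPosts'][0]['count'] ?? 0);
}

// Usage, assuming $posts is a MongoDB\Collection and results are mapped to arrays:
// $map = ['root' => 'array', 'document' => 'array', 'array' => 'array'];
// $doc = $posts->aggregate(dashboardPipeline(), ['typeMap' => $map])->toArray()[0];
// echo totalPostsFrom($doc);
```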
Gotcha: $graphLookup for Recursive Queries
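['$graphLookup' => [
'from' => 'categories',
'startWith' => '$parent_id',
'connectFromField' => 'parent_id',
'connectToField' => '_id',
'as' => 'ancestors',
]]
Traverses parent-child hierarchies recursively. Unlike $lookup, the stage takes connectFromField and connectToField rather than localField and foreignField, and both are required.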
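To illustrate what the stage computes, here is the same walk written in plain PHP over an in-memory array. This is not MongoDB code: the function name and sample data are made up, and the depth value mirrors what the optional depthField and maxDepth settings would report. The real stage does not guarantee the order of the ancestors array.

```php
<?php
// Plain-PHP illustration of the traversal; not MongoDB code.
// $categories maps _id => parent_id (null for a root category).
function ancestorsOf(array $categories, int $id, ?int $maxDepth = null): array
{
    $ancestors = [];
    $seen = [];                              // each document is visited once, so cycles terminate
    $depth = 0;
    $next = $categories[$id] ?? null;        // startWith: the document's own parent_id

    while ($next !== null && array_key_exists($next, $categories) && !isset($seen[$next])) {
        if ($maxDepth !== null && $depth > $maxDepth) {
            break;                           // maxDepth 0 means "direct parent only"
        }
        $ancestors[] = ['_id' => $next, 'depth' => $depth];
        $seen[$next] = true;
        $next = $categories[$next];          // connectFromField: follow parent_id again
        $depth++;
    }

    return $ancestors;
}

// Sample hierarchy: 1 is the root, 4 is the deepest child.
$sampleCategories = [1 => null, 2 => 1, 3 => 2, 4 => 3];
$sampleAncestors = ancestorsOf($sampleCategories, 4);
```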
Tip: $bucket for Grouping
['$bucket' => [
'groupBy' => '$views',
'boundaries' => [0, 100, 1000, 10000],
'default' => 'other',
'output' => ['count' => ['$sum' => 1]],
]]
Groups numeric data into ranges.
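The boundary rules are easy to get wrong, so here is a plain-PHP sketch of how a value is assigned to a bucket. It is an illustration only, not MongoDB code, and bucketFor is a hypothetical name. Each bucket includes its lower boundary and excludes its upper one, and its _id is the lower boundary. Without a default, a value outside every range makes the real stage fail.

```php
<?php
// Plain-PHP illustration of $bucket assignment; not MongoDB code.
// Returns the bucket _id (the lower boundary) or the default label.
function bucketFor(int $value, array $boundaries, string $default)
{
    for ($i = 0; $i < count($boundaries) - 1; $i++) {
        // Each bucket covers [lower, upper): lower inclusive, upper exclusive.
        if ($value >= $boundaries[$i] && $value < $boundaries[$i + 1]) {
            return $boundaries[$i];
        }
    }
    return $default; // outside every range
}

$boundaries = [0, 100, 1000, 10000];
```

With these boundaries, 99 views falls in bucket 0, 100 views in bucket 100, and 10000 views in 'other', because the last boundary is exclusive.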
Gotcha: $merge for Writing Results
['$merge' => [
'into' => 'post_stats',
'on' => '_id',
'whenMatched' => 'merge',
'whenNotMatched' => 'insert',
]]
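Writes aggregation results to a collection. $merge must be the last stage of the pipeline, and the on field needs a unique index in the target collection (_id has one by default).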
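A minimal sketch of a complete pipeline that ends in this stage, assuming a posts collection with category_id and views fields. The function name and the postCount and totalViews output fields are hypothetical.

```php
<?php
// Hypothetical pipeline: per-category post statistics upserted into `post_stats`.
function categoryStatsPipeline(): array
{
    return [
        ['$group' => [
            '_id'        => '$category_id',
            'postCount'  => ['$sum' => 1],
            'totalViews' => ['$sum' => '$views'],
        ]],
        // Final stage: update the matching document or insert a new one.
        ['$merge' => [
            'into'           => 'post_stats',
            'on'             => '_id',
            'whenMatched'    => 'merge',
            'whenNotMatched' => 'insert',
        ]],
    ];
}

// Usage, assuming $posts is a MongoDB\Collection:
// $posts->aggregate(categoryStatsPipeline());
```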
Tip: $addFields vs $project
$addFields adds new fields while keeping all existing ones. $project with inclusions outputs only the fields you list (plus _id unless you exclude it), so every other field is dropped.
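To make the difference in output shape concrete, the sketch below builds by hand the documents each stage would produce for one sample post. It is plain PHP, not MongoDB code, and the sample document and the popular field are made up.

```php
<?php
// Plain-PHP illustration of the resulting document shapes; not MongoDB code.
$post = ['_id' => 1, 'title' => 'Hello', 'views' => 120, 'body' => 'Lorem ipsum'];

// Like ['$addFields' => ['popular' => ['$gte' => ['$views', 100]]]]:
// every existing field survives and `popular` is appended.
$afterAddFields = $post + ['popular' => $post['views'] >= 100];

// Like ['$project' => ['title' => 1, 'popular' => ['$gte' => ['$views', 100]]]]:
// only _id, title and the computed field remain; views and body are gone.
$afterProject = [
    '_id'     => $post['_id'],
    'title'   => $post['title'],
    'popular' => $post['views'] >= 100,
];
```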
Gotcha: Pipeline Memory Limit
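Each blocking stage, such as $sort or $group, may use at most 100 MB of RAM. Pass allowDiskUse: true so that larger datasets spill to temporary files on disk instead of failing.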
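In the mongodb/mongodb PHP library the flag is passed as an option to aggregate(). The sketch below is a hedged example: the pipeline, the author_id field and both function names are hypothetical.

```php
<?php
// Hypothetical pipeline whose $group and $sort may exceed the in-memory limit.
function viewsByAuthorPipeline(): array
{
    return [
        ['$group' => ['_id' => '$author_id', 'views' => ['$sum' => '$views']]],
        ['$sort'  => ['views' => -1]],
    ];
}

// Options array for MongoDB\Collection::aggregate();
// allowDiskUse lets memory-hungry stages write temporary files to disk.
function largeAggregateOptions(): array
{
    return ['allowDiskUse' => true];
}

// Usage, assuming $posts is a MongoDB\Collection:
// $cursor = $posts->aggregate(viewsByAuthorPipeline(), largeAggregateOptions());
```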
Tip: Embed or Reference? The 80/20 Rule
If you always access data together, embed it. If you access it independently, reference it. The 16 MB BSON document size limit is the hard boundary; in practice, aim to keep most documents under 1 MB.
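As a rough illustration of the two shapes, assume tags are always read with their post while comments are paged and moderated separately. All field names and values below are hypothetical.

```php
<?php
// Hypothetical document shapes; field names are illustrative.

// Embedded: tags are always read together with the post, so they live inside it.
$post = [
    '_id'   => 1,
    'title' => 'Pipeline tips',
    'tags'  => ['mongodb', 'aggregation'],
];

// Referenced: comments are accessed on their own, so each one is a
// separate document that points back at the post through post_id.
$comment = [
    '_id'     => 501,
    'post_id' => 1,
    'body'    => 'Nice write-up',
];
```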
Tip: Index Your Query Patterns, Not All Fields
An index on every field wastes RAM and slows down every write. Use explain() to find in-memory sorts (SORT stages) and collection scans (COLLSCAN stages), then index only the fields your actual queries filter and sort on.
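One way to automate that check is to scan a decoded explain() document for the stage names that signal trouble. The sketch below assumes the explain output has already been converted to a nested PHP array. Both function names are hypothetical, and the exact nesting of the plan varies by server version, which is why the code walks the whole array instead of following a fixed path.

```php
<?php
// Collect every `stage` name found anywhere in a decoded explain() document.
function planStages(array $explain): array
{
    $stages = [];
    array_walk_recursive($explain, function ($value, $key) use (&$stages) {
        if ($key === 'stage' && is_string($value)) {
            $stages[] = $value;
        }
    });
    return $stages;
}

// True when the plan contains a collection scan or an in-memory sort.
function needsIndexReview(array $explain): bool
{
    $stages = planStages($explain);
    return in_array('COLLSCAN', $stages, true) || in_array('SORT', $stages, true);
}

// Made-up plan fragments for illustration.
$unindexedPlan = ['queryPlanner' => ['winningPlan' => [
    'stage' => 'SORT', 'inputStage' => ['stage' => 'COLLSCAN'],
]]];
$indexedPlan = ['queryPlanner' => ['winningPlan' => [
    'stage' => 'FETCH', 'inputStage' => ['stage' => 'IXSCAN'],
]]];
```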
Gotcha: No Transaction Rollback for Index Builds
Building an index on a large collection can take hours. If it fails midway, the partial index is silently discarded. Plan index builds during maintenance windows.
Senior Insight
Each aggregation pipeline stage adds latency. I've audited pipelines with 15+ stages that took 30 seconds because each stage processed the full document stream. The optimization strategy: $match early, $project to reduce document size in intermediate stages, and use $lookup only when necessary. A six-stage pipeline with proper filtering can outperform a three-stage pipeline where $match comes last. Think of the pipeline as a stream, not a sequence of operations.
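The ordering described above might look like this in practice. It is a sketch under stated assumptions: the users collection, the status and author_id fields and the function name are all hypothetical.

```php
<?php
// Hypothetical pipeline that filters early, trims documents, and joins last.
function recentPostsWithAuthorPipeline(int $categoryId): array
{
    return [
        // 1. Filter first so later stages see fewer documents and an index can be used.
        ['$match' => ['category_id' => $categoryId, 'status' => 'published']],
        // 2. Keep only the fields the remaining stages need.
        ['$project' => ['title' => 1, 'author_id' => 1, 'created_at' => 1]],
        ['$sort'  => ['created_at' => -1]],
        ['$limit' => 10],
        // 3. Join last, on at most ten remaining documents.
        ['$lookup' => [
            'from'         => 'users',
            'localField'   => 'author_id',
            'foreignField' => '_id',
            'as'           => 'author',
        ]],
    ];
}

// Stage names in execution order, handy for a quick review of the ordering.
function stageNames(array $pipeline): array
{
    return array_map('array_key_first', $pipeline);
}
```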
Source: MongoDB Developer Center (https://www.mongodb.com/developer/), MongoDB Engineering Blog (https://www.mongodb.com/blog/channel/engineering-blog), Studio 3T Blog (https://studio3t.com/blog/)