$ lexprog.com

// notes from an old coder -- php, databases, and the occasional rant

[March 29, 2026] MongoDB

MongoDB Aggregation Pipeline in Laravel

MongoDB Aggregation Pipeline: Tips & Tricks

────────────────────────────────────────────────────────

Tip: Basic Aggregation

Post::raw(function ($collection) {
    return $collection->aggregate([
        ['$match' => ['published' => true]],
        ['$group' => ['_id' => '$category_id', 'count' => ['$sum' => 1]]],
        ['$sort' => ['count' => -1]],
    ]);
});

Gotcha: $match Should Come First

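Put $match at the start of the pipeline to filter early. A $match in the first stage can use indexes the same way a find() query does. Once documents have been reshaped by stages such as $group or $project, later filters typically run against intermediate results with no index support. Filtering first also means every later stage processes fewer documents.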
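
A minimal sketch of the ordering, assuming the same posts collection as the basic example. The function name publishedPostsByCategory and the count threshold of 5 are made up for illustration.

```php
<?php

// Hypothetical helper: builds a "posts per category" pipeline with the filter first.
function publishedPostsByCategory(): array
{
    return [
        // Filter first so the server can use an index on `published`.
        ['$match' => ['published' => true]],
        ['$group' => ['_id' => '$category_id', 'count' => ['$sum' => 1]]],
        // This second $match runs on grouped results, so it cannot use an index.
        ['$match' => ['count' => ['$gte' => 5]]],
        ['$sort' => ['count' => -1]],
    ];
}

// Usage with the model from the earlier example:
// Post::raw(fn ($collection) => $collection->aggregate(publishedPostsByCategory()));
```

The second $match is fine where it is, because it filters on a value that only exists after $group.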

Tip: $unwind for Array Data

Post::raw(function ($collection) {
    return $collection->aggregate([
        ['$unwind' => '$tags'],
        ['$group' => ['_id' => '$tags', 'count' => ['$sum' => 1]]],
    ]);
});

Counts tag frequency across all posts.
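
In practice you usually want only the most common tags. A sketch that extends the pipeline above with $sort and $limit; the helper name topTags and the default of 10 are assumptions. By default $unwind drops posts whose tags field is missing or an empty array, so those posts do not appear in the counts.

```php
<?php

// Hypothetical helper: pipeline for the N most frequently used tags.
function topTags(int $limit = 10): array
{
    return [
        // One output document per (post, tag) pair.
        ['$unwind' => '$tags'],
        ['$group' => ['_id' => '$tags', 'count' => ['$sum' => 1]]],
        // Most used tags first, then keep only the top $limit.
        ['$sort' => ['count' => -1]],
        ['$limit' => $limit],
    ];
}

// Post::raw(fn ($collection) => $collection->aggregate(topTags(5)));
```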

Gotcha: $lookup for Joins

['$lookup' => [
    'from' => 'categories',
    'localField' => 'category_id',
    'foreignField' => '_id',
    'as' => 'category',
]]

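MongoDB's version of a LEFT OUTER JOIN. The gotcha: the joined field (category here) is always an array, even when exactly one document matches, and it is an empty array when nothing matches.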
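
A common follow-up is to flatten the joined array into a single embedded document with $unwind. A sketch, assuming each post has at most one category; the helper name postsWithCategory is hypothetical.

```php
<?php

// Hypothetical helper: joins each post to its category and flattens the result.
function postsWithCategory(): array
{
    return [
        ['$lookup' => [
            'from' => 'categories',
            'localField' => 'category_id',
            'foreignField' => '_id',
            'as' => 'category',
        ]],
        // Turn the one-element array into an embedded document.
        // preserveNullAndEmptyArrays keeps posts that matched no category,
        // which preserves the left-join behaviour.
        ['$unwind' => [
            'path' => '$category',
            'preserveNullAndEmptyArrays' => true,
        ]],
    ];
}
```

Without preserveNullAndEmptyArrays, posts with no matching category would silently drop out of the result, which turns the join into an inner join.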

Tip: $project to Shape Output

['$project' => [
    'title' => 1,
    // $size errors when the field is missing, so fall back to an empty array
    'comment_count' => ['$size' => ['$ifNull' => ['$comments', []]]],
    'author_name' => '$author.name',
]]

Gotcha: Pipeline Memory Limit

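Each pipeline stage is limited to 100 MB of RAM. A stage that goes over the limit fails with an error unless it is allowed to spill to temporary files on disk. In PHP, pass ['allowDiskUse' => true] as the options argument to aggregate(). MongoDB 6.0 and later enable disk use by default (the allowDiskUseByDefault server parameter), but older servers do not.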
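
A small sketch of passing the option. The wrapper name aggregateLarge is hypothetical, and $collection is assumed to be the MongoDB\Collection instance that Post::raw() hands to its callback.

```php
<?php

// Hypothetical wrapper; $collection is expected to be a MongoDB\Collection
// (for example the one Laravel passes into Post::raw()).
function aggregateLarge($collection, array $pipeline)
{
    // allowDiskUse lets stages that exceed the memory limit write temporary files.
    return $collection->aggregate($pipeline, ['allowDiskUse' => true]);
}

// Post::raw(fn ($collection) => aggregateLarge($collection, [
//     ['$sort' => ['created_at' => -1]],
// ]));
```

Spilling to disk is slower than staying in memory, so it is worth checking first whether an earlier $match or an index would shrink the working set.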

Tip: $facet for Multiple Aggregations

['$facet' => [
    'totalPosts' => [['$count' => 'count']],
    'categories' => [['$group' => ['_id' => '$category_id']]],
]]

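Runs several sub-pipelines over the same input documents in a single pass. The result is one document with one array field per facet (totalPosts and categories here).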
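
Reading the result takes a little care, because $count emits nothing when there are no input documents. A sketch of pulling out the total, assuming the facet document supports array access (a plain array or a BSON document); the helper name totalFromFacet is made up.

```php
<?php

// Hypothetical helper: pulls the total out of the single document $facet returns.
// $facetDoc is assumed to support array access (plain array or BSON document).
function totalFromFacet($facetDoc): int
{
    // With no matching posts, 'totalPosts' is an empty array, so default to 0.
    return (int) ($facetDoc['totalPosts'][0]['count'] ?? 0);
}

// Hand-written sample shaped like the $facet output above.
$sample = [
    'totalPosts' => [['count' => 42]],
    'categories' => [['_id' => 1], ['_id' => 2]],
];
$total = totalFromFacet($sample);
```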

Gotcha: $out Writes to a Collection

['$out' => 'post_stats']

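Creates the target collection, or replaces it entirely if it already exists, with the pipeline output. This is destructive: the previous contents of post_stats are gone once the pipeline completes. $out must be the last stage in the pipeline.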
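
Since $out has to be the final stage, one option is to append it in a single place instead of writing it by hand. A minimal sketch; the helper name withOutStage is hypothetical.

```php
<?php

// Hypothetical helper: appends $out as the final stage of a pipeline.
function withOutStage(array $pipeline, string $target): array
{
    foreach ($pipeline as $stage) {
        // Refuse pipelines that already write somewhere.
        if (array_key_exists('$out', $stage)) {
            throw new InvalidArgumentException('$out must be the final stage');
        }
    }

    $pipeline[] = ['$out' => $target];

    return $pipeline;
}

$pipeline = withOutStage([
    ['$match' => ['published' => true]],
    ['$group' => ['_id' => '$category_id', 'count' => ['$sum' => 1]]],
], 'post_stats');
```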

Tip: Embed or Reference? The 80/20 Rule

If you always access data together, embed it. If you access it independently, reference it. The 16MB document size limit is the hard boundary — stay under 1MB for most documents.

Tip: Index Your Query Patterns, Not All Fields

Creating indexes on every field wastes RAM. Use explain() to find in-memory sorts and collection scans. Index only what your actual queries filter on.
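
A sketch of checking an explain() result for those two warning signs. It assumes the winning plan has already been converted to a plain PHP array and uses the classic plan shape (stage, inputStage, inputStages); the helper name planStages is made up, and the sample plan is hand-written, not real server output.

```php
<?php

// Hypothetical helper: lists every stage name in an explain() winning plan.
// Assumes the classic plan shape (stage / inputStage / inputStages).
function planStages(array $plan): array
{
    $stages = isset($plan['stage']) ? [$plan['stage']] : [];

    $children = $plan['inputStages'] ?? [];
    if (isset($plan['inputStage'])) {
        $children[] = $plan['inputStage'];
    }

    foreach ($children as $child) {
        $stages = array_merge($stages, planStages($child));
    }

    return $stages;
}

// Hand-written sample: an in-memory sort on top of a full collection scan.
$winningPlan = ['stage' => 'SORT', 'inputStage' => ['stage' => 'COLLSCAN']];
$stages = planStages($winningPlan);

// COLLSCAN means no index was used; SORT means the sort happened in memory.
$needsIndex = in_array('COLLSCAN', $stages, true) || in_array('SORT', $stages, true);
```

A plan that shows IXSCAN with no SORT stage is the usual sign that an index covers both the filter and the sort order.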

Gotcha: No Transaction Rollback for Index Builds

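Building an index on a large collection can take hours. If the build fails partway, for example on a duplicate key while building a unique index, MongoDB discards the partial index and the work starts over from scratch. Schedule large index builds for maintenance windows or low-traffic periods.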

Senior Insight

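I've learned to be explicit about MongoDB write concerns rather than trusting the default. Before MongoDB 5.0 the default was w: 1, which acknowledges a write as soon as the primary has it, so a failover can lose acknowledged writes. Since 5.0 the implicit default is w: majority on most replica sets, but deployments with arbiters, or configs that set w explicitly, can still end up on w: 1. For critical data, I set w: majority so a write is acknowledged only after a majority of voting members have it. The trade-off is latency: waiting for majority acknowledgment adds a network round-trip to the secondaries. For a logging system where data loss is acceptable, w: 1 is fine. For financial transactions, w: majority is the minimum.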
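
One way to make the choice explicit is in the connection string, using the standard w and wtimeoutMS URI options. A sketch; the helper name withMajorityWrites, the 5000 ms timeout, and the sample URI are all assumptions.

```php
<?php

// Hypothetical helper: appends an explicit majority write concern to a MongoDB URI.
function withMajorityWrites(string $uri, int $timeoutMs = 5000): string
{
    // Use '&' if the URI already carries options, '?' otherwise.
    $separator = strpos($uri, '?') === false ? '?' : '&';

    // wtimeoutMS caps how long a write waits for majority acknowledgment.
    return $uri . $separator . 'w=majority&wtimeoutMS=' . $timeoutMs;
}

$dsn = withMajorityWrites('mongodb://localhost:27017/blog');
```

The resulting string can go into the dsn entry of the MongoDB connection in config/database.php. Setting a timeout matters because a majority write can otherwise block for a long time when too few members are reachable.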

Source: MongoDB Developer Center (https://www.mongodb.com/developer/), MongoDB Engineering Blog (https://www.mongodb.com/blog/channel/engineering-blog), Studio 3T Blog (https://studio3t.com/blog/)

────────────────────────────────────────────────────────