$ lexprog.com

// notes from an old coder -- php, databases, and the occasional rant

[September 20, 2024] MongoDB

MongoDB Map-Reduce: Legacy Aggregation

────────────────────────────────────────────────────────

Tip: Map Function

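// The mongodb/mongodb library expects Javascript objects here, not plain strings
$map = new MongoDB\BSON\Javascript('function() { emit(this.category_id, 1); }');
$reduce = new MongoDB\BSON\Javascript('function(key, values) { return Array.sum(values); }');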
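To make the data flow concrete, here is a minimal plain-PHP sketch of the logic behind those two functions: map emits a (key, value) pair per document, the pairs are grouped by key, and reduce collapses each group. The sample documents and the helper name simulateMapReduce are made up for illustration. This is not how MongoDB implements map-reduce, just the same idea on a PHP array.

```php
<?php
// Hypothetical sample documents standing in for a products collection.
$docs = [
    ['name' => 'keyboard', 'category_id' => 1],
    ['name' => 'mouse',    'category_id' => 1],
    ['name' => 'monitor',  'category_id' => 2],
];

// Illustrative helper, not part of any MongoDB library.
function simulateMapReduce(array $docs, callable $map, callable $reduce): array
{
    $groups = [];
    foreach ($docs as $doc) {
        // $map returns one [key, value] pair per document, like emit(key, value).
        [$key, $value] = $map($doc);
        $groups[$key][] = $value;
    }

    $result = [];
    foreach ($groups as $key => $values) {
        // $reduce collapses all values emitted for one key into a single value.
        $result[$key] = $reduce($key, $values);
    }
    return $result;
}

$counts = simulateMapReduce(
    $docs,
    fn(array $doc) => [$doc['category_id'], 1],
    fn($key, array $values) => array_sum($values)
);
// $counts is [1 => 2, 2 => 1]
```

One difference worth knowing: the real server may skip reduce for keys with a single value and may call it more than once on partial results, so reduce has to return the same shape it receives.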

Gotcha: Deprecated

Map-reduce has been deprecated since MongoDB 5.0. Use the aggregation pipeline instead; stages such as $group and $merge cover most of what map-reduce was used for.

Tip: Aggregation Replacement

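// A single $group stage does the same counting as the map/reduce pair
$cursor = $collection->aggregate([['$group' => ['_id' => '$category_id', 'count' => ['$sum' => 1]]]]);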

Gotcha: Map-Reduce Performance

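The aggregation pipeline is generally faster than map-reduce and more flexible: stages such as $match, $group, $sort, and $lookup can be chained freely, where map-reduce locks you into a fixed map-then-reduce shape.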

Tip: Output Collection

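// In the mongodb/mongodb library the output target is the third argument;
// $map and $reduce must be MongoDB\BSON\Javascript instances
$collection->mapReduce($map, $reduce, 'category_counts');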
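The aggregation counterpart of an output collection is a trailing $merge stage. The sketch below only builds and prints the pipeline so it runs without a server; the commented-out aggregate() call assumes a $collection object from the mongodb/mongodb library, and the whenMatched/whenNotMatched choices are one reasonable setup, not the only one.

```php
<?php
// Same per-category counts as the map-reduce call above, written to category_counts.
$pipeline = [
    ['$group' => ['_id' => '$category_id', 'count' => ['$sum' => 1]]],
    // $merge has to be the last stage; here it replaces existing rows and inserts new ones.
    ['$merge' => [
        'into'           => 'category_counts',
        'whenMatched'    => 'replace',
        'whenNotMatched' => 'insert',
    ]],
];

// With a live connection this would be executed as:
// $collection->aggregate($pipeline);

// Print the pipeline as JSON, handy for pasting into the shell.
echo json_encode($pipeline, JSON_PRETTY_PRINT), PHP_EOL;
```

$merge needs MongoDB 4.2 or later; on older servers $out does a similar job but replaces the whole target collection.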

Gotcha: JavaScript Execution

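Map-reduce runs your map and reduce functions in the server's JavaScript engine, so every document has to be converted to a JavaScript object and back. Aggregation pipeline stages run as native server code and skip that overhead.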

Tip: Embed or Reference? The 80/20 Rule

If you always access the data together, embed it. If you access it independently, reference it. The 16 MB document size limit is the hard boundary, but as a rule of thumb keep most documents under 1 MB.
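Here is a small sketch of what the two shapes look like as PHP arrays. The user/address/order fields are invented for illustration, and the size check uses JSON length as a rough stand-in for BSON size, which is an assumption and not an exact measurement.

```php
<?php
// Embedded: addresses are always loaded with the user, so they live inside it.
$userEmbedded = [
    '_id'       => 'user-1',
    'name'      => 'Ada',
    'addresses' => [
        ['city' => 'London', 'zip' => 'N1'],
    ],
];

// Referenced: orders are queried on their own, so each one stores the user's _id.
$user = ['_id' => 'user-1', 'name' => 'Ada'];
$orders = [
    ['_id' => 'order-1', 'user_id' => 'user-1', 'total' => 42.50],
    ['_id' => 'order-2', 'user_id' => 'user-1', 'total' => 19.99],
];

// Rough size check. JSON length only approximates BSON size (assumption),
// but it is enough to notice a document drifting toward the 1 MB guideline.
$approxBytes = strlen(json_encode($userEmbedded));
$withinGuideline = $approxBytes < 1024 * 1024;
```

An embedded array that keeps growing without bound is usually the sign that it should have been a reference.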

Tip: Index Your Query Patterns, Not All Fields

Indexing every field wastes RAM and slows down writes. Run explain() on your real queries and look for collection scans (COLLSCAN) and in-memory sorts (SORT stages), then index only the fields those queries actually filter and sort on.
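Once you have explain output as a PHP array, checking it can be automated. The sketch below walks a plan and collects every stage name. The helper planStages is hypothetical, and the $explain array is a hand-written, heavily abbreviated stand-in for real output, which carries many more fields.

```php
<?php
// Illustrative helper: collect every "stage" name found anywhere in a plan.
function planStages(array $plan): array
{
    $stages = [];
    array_walk_recursive($plan, function ($value, $key) use (&$stages) {
        if ($key === 'stage') {
            $stages[] = $value;
        }
    });
    return $stages;
}

// Hand-written stand-in for explain() output (not captured from a real server).
$explain = [
    'queryPlanner' => [
        'winningPlan' => [
            'stage'      => 'SORT',
            'inputStage' => ['stage' => 'COLLSCAN'],
        ],
    ],
];

$stages = planStages($explain);

// A COLLSCAN or SORT stage in the winning plan is a hint that an index may be missing.
$needsIndex = in_array('COLLSCAN', $stages, true) || in_array('SORT', $stages, true);
```

A plan that shows IXSCAN and no SORT stage means the index is covering both the filter and the ordering.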

Gotcha: No Transaction Rollback for Index Builds

Building an index on a large collection can take hours. If it fails midway, the partial index is silently discarded. Plan index builds during maintenance windows.

Senior Insight

The aggregation pipeline is MongoDB's equivalent of SQL's complex queries, and it's far more powerful. My most important lesson: always put $match as the first stage, where it can also use an index. An aggregation that filters 10 million documents down to 1,000 should pass only those 1,000 through the remaining stages. I've optimized pipelines by moving $match from position 5 to position 1 and cut execution time from 30 seconds to 200ms.
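A filter-first pipeline might look like the sketch below. The orders fields (status, customer_id, amount) are invented for illustration, and startsWithMatch is a hypothetical lint-style helper for tests or code review, not a library feature.

```php
<?php
// Hypothetical orders pipeline: filter first, then do the expensive work.
$pipeline = [
    ['$match' => ['status' => 'paid']], // runs first, so it can use an index on status
    ['$group' => ['_id' => '$customer_id', 'total' => ['$sum' => '$amount']]],
    ['$sort'  => ['total' => -1]],
    ['$limit' => 10],
];

// Illustrative check that a pipeline filters before doing anything else.
function startsWithMatch(array $pipeline): bool
{
    return isset($pipeline[0])
        && is_array($pipeline[0])
        && array_key_first($pipeline[0]) === '$match';
}

$ok = startsWithMatch($pipeline);
```

Note the single quotes around stage names and field paths: in double quotes PHP would try to interpolate $match and $amount as variables.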

Source: MongoDB Developer Center (https://www.mongodb.com/developer/), MongoDB Engineering Blog (https://www.mongodb.com/blog/channel/engineering-blog), Studio 3T Blog (https://studio3t.com/blog/)

────────────────────────────────────────────────────────