$ lexprog.com

// notes from an old coder -- php, databases, and the occasional rant

[June 03, 2024] MongoDB

MongoDB Text Search: Implementation

────────────────────────────────────────────────────────

Tip: Create Text Index

$collection->createIndex([
    'title' => 'text',
    'content' => 'text',
]);

A collection can have only one text index, but that one index can cover several fields.
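
Because of that limit, it can help to check for an existing text index before creating one. A minimal sketch, assuming the index list is available as plain arrays with 'name' and 'key' entries (the shape listIndexes reports); findTextIndexName() is my own hypothetical helper, not a library function:

```php
<?php
// Hypothetical helper: returns the name of the first text index, or null.
// $indexes is assumed to be a list of ['name' => ..., 'key' => [...]] entries.
function findTextIndexName(iterable $indexes): ?string
{
    foreach ($indexes as $index) {
        // A text index reports its key as ['_fts' => 'text', '_ftsx' => 1],
        // no matter which fields it actually covers.
        if (in_array('text', (array) $index['key'], true)) {
            return $index['name'];
        }
    }
    return null;
}

// Sample data standing in for real listIndexes output.
$indexes = [
    ['name' => '_id_', 'key' => ['_id' => 1]],
    ['name' => 'title_text_content_text', 'key' => ['_fts' => 'text', '_ftsx' => 1]],
];
$existing = findTextIndexName($indexes);
```

If $existing is not null, drop or reuse that index instead of calling createIndex() with a different text definition. As far as I recall, the index objects returned by $collection->listIndexes() also offer an isText() method that does the same check.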

Gotcha: Text Search Syntax

$collection->find([
    '$text' => ['$search' => 'php laravel'],
]);

Space-separated terms are ORed: this matches documents containing "php" OR "laravel".

Tip: Exact Phrase Search

$collection->find([
    '$text' => ['$search' => '"php laravel"'],
]);

Double quotes inside the search string match the exact phrase.

Gotcha: Exclude Terms

$collection->find([
    '$text' => ['$search' => 'php -mongodb'],
]);

A leading minus sign excludes documents containing "mongodb". A search string made up only of negated terms matches nothing.
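
The three syntax rules above (bare terms, quoted phrases, negated terms) can be combined in one search string. A small sketch of how I might assemble it; buildSearchString() is a hypothetical helper of my own, not part of the MongoDB library:

```php
<?php
// Hypothetical helper: builds a $search string from terms, phrases, exclusions.
function buildSearchString(array $terms = [], array $phrases = [], array $excluded = []): string
{
    $parts = [];
    foreach ($terms as $term) {
        $parts[] = $term; // bare terms are ORed together
    }
    foreach ($phrases as $phrase) {
        // Strip stray quotes so the phrase stays one quoted unit.
        $parts[] = '"' . str_replace('"', '', $phrase) . '"';
    }
    foreach ($excluded as $term) {
        // Exactly one leading minus marks the term as excluded.
        $parts[] = '-' . ltrim($term, '-');
    }
    return implode(' ', $parts);
}

$search = buildSearchString(['php'], ['text index'], ['mongodb']);
// The filter array is what you would hand to $collection->find().
$filter = ['$text' => ['$search' => $search]];
```

Here $search ends up as: php "text index" -mongodb. Keep in mind that once a phrase is present, MongoDB only returns documents containing that phrase, so the bare terms mostly affect scoring rather than matching.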

Tip: Text Score for Ranking

$collection->find(
    ['$text' => ['$search' => 'php']],
    [
        'projection' => ['score' => ['$meta' => 'textScore']],
        // Sort is a find() option; the returned cursor has no sort() method.
        'sort' => ['score' => ['$meta' => 'textScore']],
    ]
);

Gotcha: Language Specification

$collection->createIndex(
    ['title' => 'text', 'content' => 'text'],
    ['default_language' => 'english']
);

Stemming and stop-word removal depend on the index language. English is the default when none is given.
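
The language can also be overridden per query with the $language option of $text. A minimal sketch; textFilter() is a hypothetical helper of mine that only builds the filter array:

```php
<?php
// Hypothetical helper: builds a $text filter, optionally with a query language.
function textFilter(string $search, ?string $language = null): array
{
    $text = ['$search' => $search];
    if ($language !== null) {
        // Overrides the index's default_language for this query only.
        $text['$language'] = $language;
    }
    return ['$text' => $text];
}

// Spanish stemming and stop words for this one search.
$filter = textFilter('bases de datos', 'spanish');
// $collection->find($filter) would run it against a real collection.
```

Passing 'none' as the language turns off stemming and stop-word removal, which is handy for product codes and other tokens you want matched literally.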

Tip: Embed or Reference? The 80/20 Rule

If you always access data together, embed it. If you access it independently, reference it. The 16MB document size limit is the hard boundary; as a rule of thumb, keep most documents under 1MB.
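
To make the two shapes concrete, here is a sketch with made-up post and comment documents. The field names are illustrative only, and the size check uses JSON length as a rough stand-in for BSON size, which is an assumption rather than an exact measure:

```php
<?php
// Embedded: comments always travel with the post.
$postEmbedded = [
    '_id' => 'post-1',
    'title' => 'MongoDB Text Search',
    'comments' => [
        ['author' => 'ana', 'body' => 'Nice write-up'],
    ],
];

// Referenced: comments live in their own collection and point back to the post.
$postReferenced = ['_id' => 'post-1', 'title' => 'MongoDB Text Search'];
$comment = ['_id' => 'c-1', 'post_id' => 'post-1', 'author' => 'ana', 'body' => 'Nice write-up'];

// Hypothetical helper: rough size check against a soft budget (default 1MB).
// JSON length only approximates the BSON size MongoDB actually enforces.
function exceedsBudget(array $doc, int $budgetBytes = 1000000): bool
{
    return strlen(json_encode($doc)) > $budgetBytes;
}

$tooBig = exceedsBudget($postEmbedded);
```

If an embedded array such as comments can grow without bound, that is usually my cue to switch to the referenced shape before the document gets anywhere near the limit.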

Tip: Index Your Query Patterns, Not All Fields

Creating indexes on every field wastes RAM. Use explain() to find in-memory sorts and collection scans. Index only what your actual queries filter on.
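
Reading explain output by eye gets old fast, so here is a sketch that scans a winning plan for the two warning signs. It assumes the plan has already been converted to nested PHP arrays; collectStages() and needsIndexReview() are hypothetical helpers, and the sample plans are hand-written, not real server output:

```php
<?php
// Hypothetical helper: collects every 'stage' name found anywhere in a plan.
function collectStages(array $plan): array
{
    $stages = [];
    foreach ($plan as $key => $value) {
        if ($key === 'stage' && is_string($value)) {
            $stages[] = $value;
        } elseif (is_array($value)) {
            // Recurse into inputStage, inputStages, and any other nested node.
            $stages = array_merge($stages, collectStages($value));
        }
    }
    return $stages;
}

// Flags plans that scan the whole collection or sort in memory.
function needsIndexReview(array $winningPlan): bool
{
    $stages = collectStages($winningPlan);
    return in_array('COLLSCAN', $stages, true) || in_array('SORT', $stages, true);
}

// Hand-written sample plans for illustration.
$slowPlan = [
    'stage' => 'SORT',
    'sortPattern' => ['created_at' => -1],
    'inputStage' => ['stage' => 'COLLSCAN', 'filter' => ['status' => ['$eq' => 'published']]],
];
$indexedPlan = [
    'stage' => 'FETCH',
    'inputStage' => ['stage' => 'IXSCAN', 'keyPattern' => ['status' => 1]],
];

$slowNeedsReview = needsIndexReview($slowPlan);
$indexedNeedsReview = needsIndexReview($indexedPlan);
```

COLLSCAN means no index was used for the filter, and a SORT stage means the sort happened in memory instead of coming from index order. Either one on a frequent query is a good candidate for a new index.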

Gotcha: No Transaction Rollback for Index Builds

Building an index on a large collection can take hours. If the build fails midway, the partial index is discarded and the build has to start again from scratch. Plan index builds for maintenance windows.

Senior Insight

MongoDB's text search with $text is adequate for simple search use cases but doesn't match PostgreSQL's full-text search capabilities. The $text operator performs a case-insensitive search, with stemming for supported languages. I've used it for site search on content management systems with up to 500K documents. Beyond that scale, dedicated search services (Elasticsearch, Meilisearch) provide better relevance tuning and faster search.

Source: MongoDB Developer Center (https://www.mongodb.com/developer/), MongoDB Engineering Blog (https://www.mongodb.com/blog/channel/engineering-blog), Studio 3T Blog (https://studio3t.com/blog/)

────────────────────────────────────────────────────────