Wednesday, April 03, 2013

Enable Full Text Search for MongoDB

If you see this error in the mongo shell:
db.coll.ensureIndex({'content':'text'})
{
    "err" : "text search not enabled",
    "code" : 16633,
    "n" : 0,
    "connectionId" : 1,
    "ok" : 1
}

and this in the mongod log:
[conn1] insert test.system.indexes keyUpdates:0 exception: text search not enabled code:16633 locks(micros) w:336411 336ms

you need to enable full text search when starting mongod (in MongoDB 2.4 it is disabled by default):
mongod --setParameter textSearchEnabled=true
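
If restarting mongod is inconvenient, the same parameter can, as far as I know, also be set at runtime in 2.4 through the setParameter admin command. A minimal sketch under that assumption: the helper name enableTextSearch is made up for illustration, and a runtime setting does not survive a restart, so check the documentation for your version before relying on it.

```javascript
// Hypothetical helper (not part of MongoDB): flips textSearchEnabled
// on a running mongod. Assumes `db` is the mongo shell's database
// handle and that the connection is allowed to run setParameter.
function enableTextSearch(db) {
    return db.adminCommand({ setParameter: 1, textSearchEnabled: true });
}

// In the mongo shell:
// enableTextSearch(db)
```

On a replica set the parameter has to be set on every mongod, not only on the one you are connected to.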

Try again.
db.coll.ensureIndex({'content':'text'})

What's happening in the background? The mongod log shows the index build:
[initandlisten] connection accepted from 127.0.0.1:51347 #1 (1 connection now open)
[conn1] build index test.coll { _fts: "text", _ftsx: 1 }
[conn1]     Index: (1/3) External Sort Progress: 3500/6245 56%
[conn1]     Index: (1/3) External Sort Progress: 5400/6245 86%
[conn1]  external sort used : 413 files in 25 secs
[conn1]     Index: (2/3) BTree Bottom Up Progress: 185800/2616966 7%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 401900/2616966 15%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 554200/2616966 21%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 769700/2616966 29%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 973700/2616966 37%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 1175400/2616966 44%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 1380700/2616966 52%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 1588900/2616966 60%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 1794900/2616966 68%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 1936500/2616966 73%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 2125800/2616966 81%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 2315800/2616966 88%
[conn1]     Index: (2/3) BTree Bottom Up Progress: 2528700/2616966 96%
[conn1]  done building bottom layer, going to commit
[conn1] build index done. scanned 6245 total records. 160.617 secs
[conn1] insert test.system.indexes ninserted:1 keyUpdates:0 locks(micros) w:160642416 160645ms

Check indexes.
db.coll.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "test.coll",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "_fts" : "text",
            "_ftsx" : 1
        },
        "ns" : "test.coll",
        "name" : "content_text",
        "weights" : {
            "content" : 1
        },
        "default_language" : "english",
        "language_override" : "language",
        "textIndexVersion" : 1
    }
]
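
The text index is recognisable by "_fts" : "text" in its key rather than by the field name it was created on. A small sketch for picking text indexes out of the getIndexes() result; the helper name textIndexNames is hypothetical.

```javascript
// Hypothetical helper: returns the names of all text indexes in the
// array returned by db.coll.getIndexes(). A text index is detected
// by its key containing _fts: "text", as in the output above.
function textIndexNames(indexes) {
    var names = [];
    for (var i = 0; i < indexes.length; i++) {
        var key = indexes[i].key || {};
        if (key._fts === "text") {
            names.push(indexes[i].name);
        }
    }
    return names;
}

// In the mongo shell:
// textIndexNames(db.coll.getIndexes())
```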

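Check index size. The text index (about 85 MB) is larger than the data it indexed ("size", about 69 MB) and roughly half of the collection's storageSize (about 179 MB).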
db.coll.stats()
{
    "ns" : "test.coll",
    "count" : 6245,
    "size" : 69054068,
    "avgObjSize" : 11057.496877502002,
    "storageSize" : 178520064,
    "numExtents" : 12,
    "nindexes" : 4,
    "lastExtentSize" : 49213440,
    "paddingFactor" : 1.0000000000003018,
    "systemFlags" : 0,
    "userFlags" : 1,
    "totalIndexSize" : 86314032,
    "indexSizes" : {
        "_id_" : 212576,
        "content_text" : 85381968
    },
    "ok" : 1
}
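
To put a number on the index overhead, the index size can be divided by the data size and by the allocated storage. A sketch only; the helper name indexSizeRatios is made up, and it reads nothing beyond the fields shown in the stats() output above.

```javascript
// Hypothetical helper: compares one index's size with the collection's
// data size ("size") and allocated storage ("storageSize").
// `stats` is the document returned by db.coll.stats().
function indexSizeRatios(stats, indexName) {
    var indexBytes = stats.indexSizes[indexName];
    return {
        toDataSize: indexBytes / stats.size,
        toStorageSize: indexBytes / stats.storageSize
    };
}

// In the mongo shell:
// indexSizeRatios(db.coll.stats(), "content_text")
```

With the numbers above this gives roughly 1.24 for toDataSize and 0.48 for toStorageSize.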

Run a test search.
db.coll.runCommand("text", {search:'Hello'})
{
    "queryDebugString" : "hello||||||",
    "language" : "english",
    "results" : [],
    "stats" : {
        "nscanned" : 4,
        "nscannedObjects" : 0,
        "n" : 4,
        "nfound" : 4,
        "timeMicros" : 157
    },
    "ok" : 1
}
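
The 2.4 text command also accepts a limit option, and each entry in "results" carries a score and the matched document under obj. A hedged sketch of a wrapper around that; the helper name searchText is hypothetical, and `coll` is assumed to be a mongo shell collection such as db.coll.

```javascript
// Hypothetical helper: runs the 2.4-style "text" command and returns
// only the matched documents together with their scores.
function searchText(coll, term, limit) {
    var res = coll.runCommand("text", { search: term, limit: limit });
    if (!res.ok) {
        throw new Error("text command failed: " + res.errmsg);
    }
    var hits = [];
    for (var i = 0; i < res.results.length; i++) {
        hits.push({ score: res.results[i].score, doc: res.results[i].obj });
    }
    return hits;
}

// In the mongo shell:
// searchText(db.coll, "Hello", 5)
```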

Not bad.