Troubleshooting and FAQs

Full Text logs can be found in the file fts.log. This is stored with the other Couchbase Server logs; the exact location depends on your operating system.

Q: I just did a search but I don’t see the document contents, I just see the document IDs. I don’t want to have to click on document IDs.

You need to store fields in the index in order to see result snippets and highlighting. Be aware that storing fields makes your indexes larger and slower to build, so make sure you need these features.
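
As a rough sketch, a search request that asks for stored field contents and highlighting could be assembled as shown here. The field names name and description are hypothetical, and the request only returns content for fields that the index actually stores.

```python
import json


def build_search_request(text, fields=("*",), highlight_fields=None):
    """Build a search request body that asks for stored fields and highlights."""
    request = {
        "query": {"match": text},
        # Only fields that are stored in the index can be returned here.
        "fields": list(fields),
        # An empty object requests highlighting with the default settings.
        "highlight": {},
    }
    if highlight_fields:
        # Limit highlighting to specific (stored) fields.
        request["highlight"] = {"fields": list(highlight_fields)}
    return request


if __name__ == "__main__":
    # "name" and "description" are hypothetical field names.
    body = build_search_request("marketing", ["name", "description"])
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a query request such as the curl example later on this page.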

Q: Why doesn’t a search for X return document Y?

Check the following:

  • How are the search terms being analyzed?

    Text is analyzed when it is written to the index, and separately, search terms are analyzed when you perform a search. These processes are independent. Unless the same analyzer is used for both indexing and search, the terms may not match.

    The analyzer being used on your search terms may not be the one you expected. This may happen if the search is not scoped to a field, or is scoped to the wrong field name.

    If the search clause is scoped to a field, FTS uses the same analyzer that the index mapping dictates for that field.

  • After analysis, what terms is the search looking for in the index?

    Once you know what analyzer is being used on your search query, use http://analysis.blevesearch.com/ to see the exact terms that the analyzer outputs for your search. For example, if you’ve determined that the "en" analyzer is being used, a search for "marketing blunders" will result in term searches for "market" and "blunder."

  • Run a special query called a term search for one term at a time, using each term exactly as it came out of the analyzer in the previous step. Restrict the query to the ID of the document you expect to match (Y in this example), and rerun it for each term.
    curl -XPOST -H "Content-Type: application/json" \
            http://localhost:9200/api/index/perf_fts_index/query \
            -d '{
              "query": {
                "conjuncts": [
                  { "term": "market" },
                  { "ids": ["Y"] }
                ]
              }
            }'
  • Did you get any results?
    If yes, the search query is working fine. If not, check the following:
    • The document (as analyzed) does not contain the terms in question. Review the contents of the document.
    • The terms in the document don’t analyze the way you expected. Try changing the analyzer, or define a custom analyzer.
    • The index mapping may be incorrect and the expected analyzer isn’t actually being used.
  • Advanced: Digging deeper into the index

    Run command-line tools on the pindex that contains the document you expect to match.

    $ cbft-bleve-dump -docID Y -index path_to.pindex

    This dumps everything the index knows about that document.
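
The per-term check described above can be scripted. The following is a minimal sketch that only builds the request bodies, one per analyzed term. The helper names are hypothetical, and the optional field argument assumes you want to scope the term to a single field.

```python
import json


def term_check_query(term, doc_id, field=None):
    """Build a query that matches only if doc_id contains the exact term."""
    term_clause = {"term": term}
    if field is not None:
        # Scope the term search to one field instead of the default field.
        term_clause["field"] = field
    return {"query": {"conjuncts": [term_clause, {"ids": [doc_id]}]}}


def term_check_queries(terms, doc_id, field=None):
    """Return one diagnostic request body per analyzed term."""
    return [term_check_query(term, doc_id, field) for term in terms]


if __name__ == "__main__":
    # Terms as produced by the "en" analyzer for "marketing blunders".
    for body in term_check_queries(["market", "blunder"], "Y"):
        print(json.dumps(body))
```

Each printed body can be posted to the index's query endpoint. A term that returns no hit for document Y is the one to investigate.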

Q: Why is this document in the search results, or why is this document ranked high in the search results?

Check the following:
  • Run the query with the option "explain" set to true, then review the output to understand which terms contributed to the document's score. Higher numbers indicate that a term contributes more weight.
  • Experiment with boost to adjust this. For example, a query like "name:margaret^2 hobby:singing" would give twice as much weight to a match in the name field as the hobby field.
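
As an illustration of both tips, the sketch below builds a query string with a boost and wraps it in a request that has "explain" enabled. The helper names are hypothetical, and the fields come from the example above.

```python
import json


def boosted(field, term, boost=None):
    """Format one query-string clause, optionally with a ^boost suffix."""
    clause = f"{field}:{term}"
    return f"{clause}^{boost}" if boost is not None else clause


def explain_request(query_string):
    """Wrap a query string in a request body that asks for score explanations."""
    return {"query": {"query": query_string}, "explain": True}


if __name__ == "__main__":
    query_string = " ".join([boosted("name", "margaret", 2), boosted("hobby", "singing")])
    print(json.dumps(explain_request(query_string), indent=2))
```

Running the same request with and without the boost, and comparing the explanations, shows how much each term contributes to the final score.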

Q: Why is indexing so slow?

By default, FTS tries to index all the fields in a document, yet you typically only need to search on a few key fields. Indexing all fields requires more computation and produces larger indexes than limiting the index to just the fields you need.

Here are some tips to speed up indexing:
  • Build a custom mapping, set "dynamic": false to disable automatic indexing, and explicitly index only the named fields that you need to search.
  • Avoid numeric/date fields if you do not plan to query on these fields.
  • Avoid storing fields unless you absolutely need snippets or highlighting.
  • Consider disabling term vectors for a particular field if you know that you do not need snippets, highlights, or phrase search.
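
A minimal sketch of a type mapping that follows these tips is shown here. The field names are hypothetical, and the key names reflect the index definition JSON as commonly exported from the UI, which is an assumption. Compare the output with a definition exported from your own cluster before relying on it.

```python
import json


def field_mapping(name, field_type="text", store=False, term_vectors=False):
    """Describe how one named field is indexed."""
    return {
        "enabled": True,
        "dynamic": False,
        "fields": [{
            "name": name,
            "type": field_type,
            "index": True,
            # Leave storage off unless snippets or highlighting are needed.
            "store": store,
            # Leave term vectors off unless highlights or phrase search are needed.
            "include_term_vectors": term_vectors,
        }],
    }


def type_mapping(field_names):
    """Index only the listed fields; automatic (dynamic) indexing is off."""
    return {
        "enabled": True,
        "dynamic": False,
        "properties": {name: field_mapping(name) for name in field_names},
    }


if __name__ == "__main__":
    # "name" and "description" are hypothetical field names.
    print(json.dumps(type_mapping(["name", "description"]), indent=2))
```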

Q: Why is querying so slow?

Slow queries can be caused by searches that require a lot of processing to satisfy. This is commonly because a search includes high frequency terms, that is, terms that appear in many or even all documents. Searches on high frequency terms require more work than searches for low frequency terms and can easily be 100x slower due to the large number of matches.
  • If possible, consider adding high frequency terms to a stop word list and reindexing so they are excluded from the index altogether. High frequency words often contribute little to search effectiveness, and they can have a seriously detrimental impact on search performance, index size, and index build times.
  • Check your indexes to rule out unexpected high frequency terms. Words like "the" and "a" in English are relatively obvious candidates to add to a stop word list, but you may also have unexpected high frequency terms in your index, perhaps because there is a common token in your data you didn’t expect or because a tokenizer is breaking words in an unexpected way, leading to other common terms.
  • Try rewriting queries to be more efficient if you must search on high frequency terms. Using conjunction (AND) queries can help make the search faster. In a query string, you can do this by using the + operator (MUST). For example, if searching for lowFreqTerm highFreqTerm is slow, try +lowFreqTerm highFreqTerm.

    Although it is a different search, it should perform better and may be acceptable.
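
The rewrite can be automated if you have a rough idea of term frequencies. This sketch assumes a hypothetical doc_freq dictionary (term to number of matching documents) and an arbitrary threshold; neither comes from FTS itself.

```python
def require_low_frequency_terms(terms, doc_freq, threshold):
    """Prefix terms that match fewer than `threshold` documents with +.

    Terms missing from doc_freq are treated as rare and are also required.
    """
    rewritten = []
    for term in terms:
        if doc_freq.get(term, 0) < threshold:
            rewritten.append("+" + term)  # MUST: narrows the candidate set
        else:
            rewritten.append(term)  # optional: only affects scoring
    return " ".join(rewritten)


if __name__ == "__main__":
    # Hypothetical document counts for illustration only.
    counts = {"lowFreqTerm": 12, "highFreqTerm": 900000}
    print(require_low_frequency_terms(["lowFreqTerm", "highFreqTerm"], counts, 1000))
```

Requiring the rare term means only documents that contain it need to be scored against the common term.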

Q: Sort isn't working like I think it should. Why do I see some weird characters in my search response object's sort field?

When you sort results on a field that isn't indexed, or when a particular document is missing a value for that field, you will see the following series of Unicode non-printable characters in the sort field: \ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd. The same characters may render differently in a graphical tool or in command-line tools like jq.

      "sort": [
        "����������",
        "hotel_9723",
        "_score"
      ]

Check your index definition to confirm that you're indexing all the fields you intend to sort by. You can control the sort behavior for missing attributes using the missing field in the advanced sort syntax.

Also remember that documents with the same value for every field you specified in the sort field are sorted non-deterministically. Try adding _id, which is guaranteed to be unique, as the final sort key.
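
The sketch below shows one way to build an advanced sort specification with a missing setting and an _id tie-breaker, plus a helper that flags hits whose sort field contains the placeholder characters. The field name city and the helper names are hypothetical.

```python
# The placeholder that appears when a hit has no value for a sort field.
MISSING_SENTINEL = "\ufffd" * 10


def sort_spec(field, missing="last", desc=False):
    """Sort by a field, place missing values first or last, then break ties by _id."""
    return [
        {"by": "field", "field": field, "missing": missing, "desc": desc},
        "_id",
    ]


def hits_missing_sort_value(hits):
    """Return the IDs of hits whose sort array contains the placeholder."""
    return [hit["id"] for hit in hits if MISSING_SENTINEL in hit.get("sort", [])]


if __name__ == "__main__":
    print(sort_spec("city"))
    sample_hits = [
        {"id": "hotel_9723", "sort": [MISSING_SENTINEL, "hotel_9723", "_score"]},
        {"id": "hotel_1", "sort": ["Paris", "hotel_1", "_score"]},
    ]
    print(hits_missing_sort_value(sample_hits))
```

Hits flagged by the second helper point to documents that lack the field, or to a field that the index definition does not cover.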

Q: Are there command-line tools to help troubleshoot?

Yes - cbft-bleve dump and cbft-bleve query.
  • cbft-bleve dump
    • Dump entire contents of pindex file
    • Dump all rows in pindex for single document
    • Dump term dictionary for field in pindex
    • Dump list of fields in pindex
    • Dump index mapping stored inside pindex
  • cbft-bleve query
    • Runs query on single pindex file