total_rows values are too high

There are cases where the total_rows value is higher than expected.

In some scenarios, it is expected for a query to return a total_rows field with a value higher than the maximum number of rows the query can return, that is, the rows returned by a map view query without an explicit limit, skip, startkey or endkey.

The expected scenarios are while a rebalance is in progress and, for a finite period of time, immediately after a failover.

This happens because, in these scenarios, some vbuckets are marked for cleanup in the indexes or temporarily marked as passive, or data is being transferred from the replica index to the main index (after a failover). Although the rows originating from those vbuckets are never returned to queries, they contribute to the reduction value of every view btree, and this value is what is used for the total_rows field in map view query responses (it is simply a counter of the total number of key-value pairs per view).

Ensuring that total_rows always reflects the number of rows originating from documents in active vbuckets would be very expensive and would severely impact performance. For example, we would need to maintain a different value in the btree reductions, one that maps vbucket IDs to row counts:

{"0":56, "1": 2452435, ..., "1023": 432236} 

This would significantly reduce the btrees' branching factor, making them much deeper, using more disk space and taking more time to compute reductions on inserts, updates and deletes.
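The difference between the two reduction schemes can be sketched with a toy model. This is only an illustration, not the actual btree code: the row data, the set of active vbuckets and all variable names are hypothetical, and a real reduction is computed incrementally per btree node rather than over a flat list.

```python
from collections import Counter

# Hypothetical index rows as (vbucket_id, key) pairs.
rows = [(0, "a"), (0, "b"), (1, "c"), (2, "d"), (2, "e"), (2, "f")]

# Assumption for this sketch: vbucket 2 is marked for cleanup, so only
# vbuckets 0 and 1 are active and can contribute rows to query results.
active_vbuckets = {0, 1}

# Single-counter reduction: counts every key-value pair in the view,
# including rows from vbuckets that queries will never return.
total_rows = len(rows)

# Per-vbucket reduction: a map from vbucket ID to row count, which
# would allow counting active vbuckets only, at the cost of a much
# larger reduction value stored in every btree node.
per_vbucket = Counter(vbucket for vbucket, _key in rows)
active_rows = sum(
    count for vbucket, count in per_vbucket.items()
    if vbucket in active_vbuckets
)

print(total_rows, active_rows)
```

In this toy model the single counter reports 6 rows while only 3 belong to active vbuckets, which mirrors the inflated total_rows described above.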

To find out whether there are vbuckets under cleanup, vbuckets in passive state or vbuckets being transferred from the replica index to the main index (on failover), query the following URL:

> curl -s 'http://localhost:8092/_set_view/default/_design/dev_test2/_info' | json_xs
{
   "passive_partitions" : [1, 2, 3],
   "cleanup_partitions" : [],
   "replicas_on_transfer" : [1, 2, 3],
   (....)
}

Note that the example above intentionally hides all non-relevant fields. If any of the fields above is a non-empty list, then total_rows for a view may be higher than expected; that is, we are in one of the expected scenarios mentioned above. In steady state, all of the above fields are empty lists.
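The same check can be scripted. The following is a minimal sketch, assuming the _info response is a JSON object containing the three fields shown above; the function names are hypothetical, and fetch_info is an untested convenience wrapper around the URL from the curl example.

```python
import json
import urllib.request

# Field names taken from the _info response shown above.
TRANSIENT_FIELDS = (
    "passive_partitions",
    "cleanup_partitions",
    "replicas_on_transfer",
)

def fetch_info(url):
    """Fetch and parse the _info document (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))

def non_empty_transient_fields(info):
    """Return the names of the transient-state fields that are non-empty."""
    return [name for name in TRANSIENT_FIELDS if info.get(name)]

def total_rows_may_be_inflated(info):
    """True if any transient-state field is a non-empty list."""
    return bool(non_empty_transient_fields(info))

# Sample taken from the example output above, non-relevant fields omitted.
sample = json.loads(
    '{"passive_partitions": [1, 2, 3],'
    ' "cleanup_partitions": [],'
    ' "replicas_on_transfer": [1, 2, 3]}'
)
print(non_empty_transient_fields(sample))
print(total_rows_may_be_inflated(sample))
```

Against a live node, one would pass the result of fetch_info('http://localhost:8092/_set_view/default/_design/dev_test2/_info') instead of the sample; a True result means total_rows may currently be higher than expected.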