View

View

Edit on GitHub
Platform: iOS Android WPF

A View is a persistent index of documents in a database, which you then query to find data. Couchbase Lite doesn't have a query language like SQL; instead, it uses a technique called map/reduce to generate indexes (views) according to arbitrary app-defined criteria. Queries can then look up a range of rows from a view, and either use the rows' keys and values directly or get the documents they came from.

The main component of a view (other than its name) is its map function. This function is written in the same language as your app—most likely Objective-C or Java—so it’s very flexible. It takes a document's JSON as input, and emits (outputs) any number of key/value pairs to be indexed. The view generates a complete index by calling the map function on every document in the database, and adding each emitted key/value pair to the index, sorted by key. For example, a map function might grind through an address-book database and produce a set of mappings from names to phone numbers. The resulting index is persistent, and updated incrementally as documents change. (It’s very much like the type of index a SQL database creates internally to optimize queries.)

A view may also have a reduce function. If present, it can be used during queries to combine multiple rows into one. It can be used to compute aggregate values like totals or averages, or to group rows by common criteria (like collecting all the artists in a record collection.) We'll explain reduce functions later on.

Remember: a view is not a query, it’s an index. Views are persistent, and need to be updated (incrementally) whenever documents change, so having large numbers of them can be expensive. Instead, it’s better to have a smaller number of views that can be queried in interesting ways.

Creating and initializing views

View objects belong to a Database. You create or find a view by calling the database's viewNamed method, which will create and return a new View if none exists by that name.

Even though a view is persistent, its map and reduce functions aren't: they're just function pointers (or blocks, or inner classes) and have to be registered at runtime, before the view is queried. It's good practice to set up views when your app starts up, right after opening the database:

// Create a view and register its map function:
CBLView* phoneView = [db viewNamed: @"phones"];
[view setMapBlock: MAPBLOCK({
    for (NSString* phone in doc[@"phones"]) {
        emit(phone, doc[@"name"]);
    }
}) version: @"2"];
// Create a view and register its map function:
let phoneView = db.viewNamed("phones")
phoneView.setMapBlock({ (doc, emit) in
    if let phones = doc["phones"] as? [String] {
        for phone in phones {
            emit(phone, doc["name"])
        }
    }
}, version: "2")
// Create a view and register its map function:
View phoneView = database.getView("phones");
phoneView.setMap(new Mapper() {
    @Override
    public void map(Map<String, Object> document, Emitter emitter) {
        List<String> phones = (List) document.get("phones");
        for (String phone : phones) {
            emitter.emit(phone, document.get("name"));
        }
    }
}, "2");
// Create a view and register its map function:
View phoneView = database.getView("phones");
phoneView.setMap(new Mapper() {
    @Override
    public void map(Map<String, Object> document, Emitter emitter) {
        List<String> phones = (List) document.get("phones");
        for (String phone : phones) {
            emitter.emit(phone, document.get("name"));
        }
    }
}, "2");
// Create a view and register its map function:
var view = database.GetView("phones");
view.SetMap((doc, emit) =>
{
    var phones = doc["phones"].AsList<string>();
    foreach(var phone in phones) 
    {
        emit(phone, doc["name"]);
    }
}, "2");

The version parameter to setMapBlock takes a bit of explanation. During development, and as you update the app, you may change the behavior of a map function. This invalidates any existing index generated by that function, so the next time the app runs, the view should rebuild the index from scratch using the new function. Unfortunately the view indexer can't tell that the map function has changed because it can't see its source code! Instead, you have to provide a version string that the indexer can compare, and you must change that string whenever you change the function. The easiest thing to remember is just to set the version to "1" initially, and then increment it every time you edit any source code in the map function (or any function of yours that it calls.)

Querying views

You query a view by using a Query object that you create from a View by calling createQuery. This is a big topic, and is covered in a separate article on the Query class.

Map functions

Understanding map functions

As discussed in the introduction, a map function's job is to look at a document's JSON contents and from them produce (emit) zero or more key/value pairs to be indexed. If you know SQL, you can think of it as corresponding to the expressions that immediately follow the SELECT and WHERE keywords, only more powerful because you have the full power of a programming language available.

For discussion purposes, here's a simple map function in JavaScript:

function(doc) {
    if (doc["type"] == "person")
        emit(doc["name"], doc["phone"]);
}

This function works with a database that contains, among other things, documents representing people, which are tagged with a type property whose value is "person". (This use of a type property is a common idiom.) Every person document contains name and phone properties. The map function simply checks whether the document represents a person, and if it does, it calls emit to add the name and phone number to the index.

The resulting index maps names to phone numbers. You can query it to look up someone by name and find their phone number. You can also query it to get ranges of names, in alphabetical order, which is very useful for driving GUI list views.

Rules for the map function

The map function is called by the indexer to help generate an index, and it has to meet certain requirements, otherwise the index won't be consistent. It's important to understand some rules so you can create a proper map function, otherwise your queries can misbehave in strange ways.

  • It must be a "pure" function: That means any time it's called with the same input, it must produce exactly the same output. In other words, it can't use any external state, just its input JSON.
  • It can't have side effects: It shouldn't change any external state, because it's unpredictable when it's called or how often it's called or in what order documents are passed to it.
  • It must be thread-safe: It may be called on a background thread belonging to the indexer, or even in parallel on several threads at once.

In particular, avoid these common mistakes:

  • Don't do anything that depends on the current date and time -- that breaks the first rule, since your function's output can change depending on the date/time it's called. Common mistakes include emitting the current time as a timestamp, emitting a person's age, or emitting only documents that have been modified in the past week.
  • Don't try to "parameterize" the map function by referring to an external variable whose value you change when querying. It won't work. People sometimes try this because they want to find various subsets of the data, like all the items of a particular color. Instead, emit all the values of that property, and use a key range in the query to pick out the rows with the specific value you want.
  • Don't make any assumptions about when the map function is called. That's an implementation detail of the indexer. (For example, it's not called every time a document changes.)
  • Avoid having the map function call out into complex external code. That code might change later on to be stateful or have side effects, breaking your map function.

Keys and values

Both the key and value passed to emit can be any JSON-compatible objects: not just strings, but also numbers, booleans, arrays, dictionaries/maps, and the special JSON null object (which is distinct from a null/nil pointer.) In addition, the value emitted, but not the key, can be a null/nil pointer. (It's pretty common to not need a value in a view, in which case it's more efficient to not emit one.)

Keys are commonly strings, but it turns out that arrays are a very useful type of key as well. This is because of the way arrays are sorted: given two array keys, the first items are compared first, then if those match the second items are compared, and so on. That means that you can use array keys to establish multiple levels of sorting. If the map function emits keys of the form [lastname, firstname], then the index will be sorted by last name, and entries with the same last name will be sorted by first name, just as if you'd used ORDER BY lastname, firstname in SQL.

Here are the exact rules for sorting (collation) of keys. The most significant factor is the key's object type; keys of one type always sort before or after keys of a different type. This list gives the types in order, and states how objects of that type are compared:

  • null
  • false, true (in that order)
  • Numbers, in numeric order of course
  • Strings, case-insensitive. The exact ordering is specified by the Unicode Collation Algorithm. This is not the same as ASCII ordering, so the results might surprise you -- for example, all symbols, including "~", sort before alphanumeric characters.
  • Arrays, compared item-by-item as described above.
  • Maps/dictionaries, also compared item-by-item. Unfortunately the order of items is ambiguous (since JSON doesn't specify any ordering of keys, and most implementations use hash tables which randomize the order) so using these as keys isn't recommended.

The source document, and redirection

In addition to its key and value, every index row also remembers the ID of the document that emitted it. This can be accessed at query time via the QueryRow.documentID property, or more commonly via the shortcut QueryRow.document which uses the ID to load the Document object.

It can sometimes be useful to redirect this reference, i.e. to make the index row point to a different document instead. You do this by emitting a value that's a dictionary with a key _id whose value is the document ID you want the row to reference. The QueryRow.documentID and accessors will then use this document ID instead.

// This example indexes documents that record Facebook-style "likes".
// When querying, the document we really want to look at is the post being
// liked, so we redirect the emitted row at that document.
[view setMapBlock: MAPBLOCK({
    if ([doc[@"type"] isEqual: @"like"]) {
        NSString* associatedID = doc[@"likedPostID"];
        NSArray* key = @[doc[@"creator"], doc[@"date"]];
        NSDictionary* value = @{@"_id": associatedID};
        emit(key, value);
    }
}) version: @"1"];
// This example indexes documents that record Facebook-style "likes".
// When querying, the document we really want to look at is the post being
// liked, so we redirect the emitted row at that document.
view.setMapBlock({ (doc, emit) -> Void in
    if doc["type"] as? String == "like" {
        let associatedID = doc["likePostID"] as String
        let key = [doc["creator"]!, doc["date"]!]
        let value = ["_id": associatedID]
        emit(key, value)
    }
}, version: "1")
// This example indexes documents that record Facebook-style "likes".
// When querying, the document we really want to look at is the post being
// liked, so we redirect the emitted row at that document.
view.setMap(new Mapper() {
    @Override
    public void map(Map<String, Object> document, Emitter emitter) {
        if (document.get("type").equals("like")) {
            String associatedID = (String) document.get("likedPostID");
            List<Object> key = new ArrayList<Object>();
            key.add(document.get("creator"));
            key.add(document.get("date"));
            HashMap<String, Object> value = new HashMap<String, Object>();
            value.put("_id", associatedID);
            emitter.emit(key, value);
        }
    }
}, "1");
// This example indexes documents that record Facebook-style "likes".
// When querying, the document we really want to look at is the post being
// liked, so we redirect the emitted row at that document.
view.setMap(new Mapper() {
    @Override
    public void map(Map<String, Object> document, Emitter emitter) {
        if (document.get("type").equals("like")) {
            String associatedID = (String) document.get("likedPostID");
            List<Object> key = new ArrayList<Object>();
            key.add(document.get("creator"));
            key.add(document.get("date"));
            HashMap<String, Object> value = new HashMap<String, Object>();
            value.put("_id", associatedID);
            emitter.emit(key, value);
        }
    }
}, "1");
No code example is currently available.

Even if you've used the redirect technique, at query time you can still recover the ID of the actual document that emitted the row, by using the QueryRow.sourceDocumentID property.

Reduce functions

Understanding reduce functions

Reduce functions are the other half of the map/reduce technique. They're optional, and less commonly used. A reduce function post-processes the indexed key/value pairs generated by the map function, by aggregating the values together. Very commonly it counts them, or (if the values are numeric) totals or averages them. The reduce function boils down data the way a chef reduces a sauce. Or if you're a SQL user, reduce functions are like SQL aggregation operators like COUNT or AVERAGE (only you get to define your own.)

In general, most views don't need reduce functions, so don't feel like you're missing something if you haven't written one. But if you find yourself writing a query and counting the returned rows or adding up their values, you could do that more efficiently with a reduce function.

A reduce function takes an ordered list of key/value pairs, aggregates them together into a single object, and returns that object. Here's an example, building on the phone-numbers example up above:

// Create a view and register its map and reduce functions:
CBLView* phoneView = [db viewNamed: @"phones"];
[view setMapBlock: MAPBLOCK({
    for (NSString* phone in doc[@"phones"]) {
        emit(phone, doc[@"name"]);
    }
}) reduceBlock:^id(NSArray *keys, NSArray *values, BOOL rereduce) {
    return @(values.count);
} version: @"2"];
let phoneView = db.viewNamed("phones")
phoneView.setMapBlock({ (doc, emit) -> Void in
    if let phones = doc["phones"] as? [String] {
        for phone in phones {
            emit(phone, doc["name"])
        }
    }
}, reduceBlock: { (keys, values, rereduce) -> AnyObject! in
    return values.count
}, version: "2")
// Create a view and register its map and reduce functions:
View phoneView = database.getView("phones");
phoneView.setMapReduce(new Mapper() {
    @Override
    public void map(Map<String, Object> document, Emitter emitter) {
        List<String> phones = (List) document.get("phones");
        for (String phone : phones) {
            emitter.emit(phone, document.get("name"));
        }
    }
}, new Reducer() {
    @Override
    public Object reduce(List<Object> keys, List<Object> values, boolean rereduce) {
       return new Integer(values.size());
    }
}, "2");
// Create a view and register its map and reduce functions:
View phoneView = database.getView("phones");
phoneView.setMapReduce(new Mapper() {
    @Override
    public void map(Map<String, Object> document, Emitter emitter) {
        List<String> phones = (List) document.get("phones");
        for (String phone : phones) {
            emitter.emit(phone, document.get("name"));
        }
    }
}, new Reducer() {
    @Override
    public Object reduce(List<Object> keys, List<Object> values, boolean rereduce) {
       return new Integer(values.size());
    }
}, "2");
// Create a view and register its map and reduce functions:
var view = database.GetView("phones");
view.SetMapReduce((doc, emit) => 
{
    var phones = doc["phones"].AsList<string>();
    foreach(var phone in phones) 
    {
        emit(phone, doc["name"]);
    }
}, (keys, values, rereduce) => values.ToList().Count, "2.0");

For efficiency, the key/value pairs are passed in as two parallel arrays. This reduce block just counts the number of values and returns that number as an object. We could query this view, with reduce enabled, and get the total number of phone numbers in the database. Or by specifying a key range we could find the number of phone numbers in that range, for example the number in a single area code.

Here's just the body of a reduce function that totals up numbers. (This function would belong in a different view, whose map function emitted numeric values.)

double total = 0;
for (NSNumber* value in values) {
    total += value.doubleValue;
}
return @(total);
var total: Double = 0.0
let numberValues = values as [Double];
for value in numberValues {
    total += value
}
return total
double total = 0;
for (Double value : values) {
    total += value.doubleValue();
}
return new Double(total);
double total = 0;
for (Double value : values) {
    total += value.doubleValue();
}
return new Double(total);
var total = 0.0;
foreach(var value in values.ToList())
{
    total += Convert.ToDouble(value);
}
return total;

This totaling is common enough that CBLView provide a utility to do it for you, the totalValues method.

Rereducing

The previous section ignored the boolean rereduce parameter that's passed to the reduce function. What's it for? Unfortunately, from your perspective as a reduce-function-writer it's just there to make your job a bit harder. The reason it exists is because it's part of a major optimization that makes reducing more efficient for the query engine.

Think of a view with a hundred million rows in its index. To run a reduced query against the whole index (with no startKey or endKey) the database will have to read all hundred million keys and values into memory at once, so it can pass them all to your reduce function. That's a lot of overhead, and on a mobile device it's likely to crash your app.

Instead, the database will read the rows in chunks. It'll read some number of rows into memory, send them to your reduce function, release them from memory, then go on to the next rows. This scales very well, but now there's the problem of what to do with the multiple reduced values returned by your function. Reducing is supposed to produce one end result, not several! The answer is to reduce the list of reduced values -- to re-reduce.

The rereduce parameter is there to tell your reduce function that it's being called in this special re-reduce mode. When re-reducing there are no keys, and the values are the ones already returned by previous runs of the same reduce function. The function's job is, once again, to combine the values into a single value and return it.

Sometimes you can handle re-reduce mode exactly like reduce mode. The second reduce block shown above (the one that totals up the values) can do this. Since its input values are numbers, and its output is a number, the re-reduce is done the same way as the reduce, and it can just ignore the rereduce flag.

But sometimes re-reduce has to work differently, because the output of the reduce stage doesn't look like the indexed values. The first reduce example -- the one that just counts the rows -- is an example. To re-reduce a list of row counts, you can't just count them, you have to add them. Let's revisit that example and add proper support for re-reducing:

// Create a view and register its map and reduce functions:
CBLView* phoneView = [db viewNamed: @"phones"];
[view setMapBlock: MAPBLOCK({
    for (NSString* phone in doc[@"phones"]) {
        emit(phone, doc[@"name"]);
    }
}) reduceBlock:^id(NSArray *keys, NSArray *values, BOOL rereduce) {
    if (rereduce) {
        return [CBLView totalValues: values];  // re-reduce mode adds up counts
    } else {
        return @(values.count);
    }
}) version: @"2"];
// Create a view and register its map and reduce functions:
let phoneView = db.viewNamed("phones")
phoneView.setMapBlock({ (doc, emit) in
    if let phones = doc["phones"] as? [String] {
        for phone in phones {
            emit(phone, doc["name"])
        }
    }
}, reduceBlock: { (keys, values, rereduce) in
    if rereduce {
        return CBLView.totalValues(values) // re-reduce mode adds up counts
    } else {
        return values.count
    }
}, version: "2")
// Create a view and register its map and reduce functions:
View phoneView = database.getView("phones");
phoneView.setMapReduce(new Mapper() {
    @Override
    public void map(Map<String, Object> document, Emitter emitter) {
        List<String> phones = (List) document.get("phones");
        for (String phone : phones) {
            emitter.emit(phone, document.get("name"));
        }
    }
}, new Reducer() {
    @Override
    public Object reduce(List<Object> keys, List<Object> values, boolean rereduce) {
        if (rereduce) {
            return View.totalValues(values);
        } else {
            return new Integer(values.size());
        }
    }
}, "2");
// Create a view and register its map and reduce functions:
View phoneView = database.getView("phones");
phoneView.setMapReduce(new Mapper() {
    @Override
    public void map(Map<String, Object> document, Emitter emitter) {
        List<String> phones = (List) document.get("phones");
        for (String phone : phones) {
            emitter.emit(phone, document.get("name"));
        }
    }
}, new Reducer() {
    @Override
    public Object reduce(List<Object> keys, List<Object> values, boolean rereduce) {
        if (rereduce) {
            return View.totalValues(values);
        } else {
            return new Integer(values.size());
        }
    }
}, "2");
// Create a view and register its map and reduce functions:
let phoneView = database.GetView("phones");
phoneView.SetMapReduce((doc, emit) =>
{
    var phones = doc["phones"].AsList<string>();
    foreach(var phone in phones) 
    {
        emit(phone, doc["name"]);
    }
}, (keys, values, rereduce) => {
    if (rereduce) 
    {
        return View.TotalValues(values.ToList());
    } 
    else 
    {
        return values.ToList().Count;
    }
}, "2.0");

When the rereduce flag is off, this just counts the raw values as before. But when the flag is on, it knows it's been given an array of row counts, so it invokes the totalValues method to add them up.

Now that you know how re-reduce works, we should let you know that Couchbase Lite 1.0 doesn't actually use re-reduce -- your reduce function will always be given index rows, never already-reduced values. The rereduce parameter is in the API for future expansion, because in the future Couchbase Lite will use it. For now, it's up to you whether you want to ignore re-reduce (and maybe find that your reduce function breaks in the future) or code defensively and implement it now even though it isn't used yet.

Rules for the reduce function

The reduce function has the same restrictions as the map function (see above): It must be a "pure" function that always produce the same output given the same input. It must not have side effects. And it must be thread-safe. In addition:

  • Its output should be no larger than its input. Usually this comes naturally. But it is legal to return an array or dictionary, and sometimes people have tried to make reduce functions that transform the input values without actually making them any smaller. The problem with this is that it scales badly, and as the size of the index grows, the indexer will eventually run out of memory and fail.

Development considerations

Map function design

When to emit a whole document as the value? In some places you'll see code that does something like emit(key, doc) , i.e. emitting the document's entire body as the value. (Some people seem to do this by reflex whenever they don't have a specific value in mind.) It's not necessarily bad, but most of the time you shouldn't do it. The benefit is that, by having the document's properties right at hand when you process a query row, it can make querying a little bit faster (saving a trip to the database to load the document.) But the downside is that it makes the view index a lot larger, which can make querying slower. So whether it's a net gain or loss depends on the specific use case. We recommend that you just set the value to null if you don't need to emit any specific value.

Is it OK is the same key is emitted more than once? The index allows duplicate keys, whether emitted by the same document or different documents. A query will return all of those key/value pairs if they match. They'll be sorted by the ID of the document that was responsible for emitting them; if a doc emits the same key multiple times, the order is undefined.

When is the map function called? View indexes are updated on demand when queried. So after a document changes, the next query made to a view will cause that view's map function to be called on the doc's new contents, updating the view index. (But remember that you shouldn't write any code that makes assumptions about when map functions are called.)

If a document has conflicts, which conflicting revision gets indexed? The document's currentRevision, sometimes called the "winning" revision, is the one that you see in the API if you don't request a revision by ID.

Performance

How to improve your view indexing: The main thing you have control over is the performance of your map function, both how long it takes to run and how many objects it allocates. Try profiling your app while the view is indexing and see if a lot of time is spent in the map function; if so, optimize it. See if you can short-circuit the map function and give up early if the document isn't a type that will produce any rows. Also see if you could emit less data. (If you're emitting the entire document as a value, don't.)