Filtering and extracting data

One of the simplest ways to learn about views is to create a basic map function which extracts data from entries. Imagine we have our own blog application and we want to provide a list of blog posts by title. First imagine what the JSON documents would look like for our blog posts:

    {
        "title": "Move Today",
        "body": "We just moved into a new big apartment in Mountain View just off of....",
        "date": "2012/07/30 18:12:10"
    }
    {
        "title": "Bought New Fridge",
        "body": "Our freezer broke down so ordered this new one on Amazon....",
        "date": "2012/09/17 21:13:39"
    }
    {
        "title": "Paint Ball",
        "body": "Had so much fun today when my company took the whole team out for...",
        "date": "2012/09/25 15:52:20"
    }

Then we create our map function which will extract our blog post titles:

    function (doc) {
        if (doc.title) {
            emit(doc.title, null);
        }
    }

This function looks at a JSON document and, if the document has a title attribute, includes that title in the result set as a key. The null indicates that no value should be provided in the result set. In practice, the standard syntax of a view function in Couchbase 2.1.0 is a bit more complex.
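One way to see what this map function produces is to run it outside the server against a few sample documents. The following is a minimal sketch, not how Couchbase Server itself executes views: runMap and the local emit stand-in are hypothetical helpers written for this illustration, and the document bodies are shortened.

```javascript
// Two of the sample blog posts, plus one document without a title.
const docs = [
  { title: "Move Today", body: "(shortened)", date: "2012/07/30 18:12:10" },
  { title: "Bought New Fridge", body: "(shortened)", date: "2012/09/17 21:13:39" },
  { body: "An untitled draft" }
];

// Hypothetical helper: collects every emit() call made by a map function.
function runMap(mapFn, documents) {
  const rows = [];
  // The map function calls a global emit(), so provide a local stand-in.
  globalThis.emit = (key, value) => rows.push({ key, value });
  documents.forEach((doc) => mapFn(doc));
  delete globalThis.emit;
  return rows;
}

// The map function from the text.
const titleMap = function (doc) {
  if (doc.title) {
    emit(doc.title, null);
  }
};

const titleRows = runMap(titleMap, docs);
console.log(titleRows);
```

The untitled draft produces no row because it fails the doc.title check. This sketch keeps rows in document order and does not attempt to reproduce how the server orders its result set.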

Here is how the map function appears when you provide full handling of all JSON document information:

    function (doc, meta) {
        // Check if doc is JSON
        if (meta.type == "json" && doc.title && doc.date) {
            emit(doc.title, doc.date);
        } else {
            // do something with binary value
        }
    }

As a best practice, we want to make sure that the fields we emit actually exist before we emit them to the index. That is why the body of our map function sits inside a conditional: if (doc.title && doc.date). For instance, if a view function tried to emit doc.name.length and the name field did not exist, we would get an “undefined reference” exception and the view function would fail. Checking for the field first avoids this type of error.
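The difference a guard makes can be sketched in plain JavaScript. The name field, the function names unsafeMap and safeMap, and the local emit stand-in are assumptions made for this illustration. In a standard JavaScript engine the failure surfaces as a TypeError.

```javascript
const emitted = [];
// Local stand-in for the emit() that the view engine provides.
function emit(key, value) {
  emitted.push({ key, value });
}

// Unguarded: throws when doc.name is missing.
function unsafeMap(doc) {
  emit(doc.name.length, null);
}

// Guarded: documents without a name are simply skipped.
function safeMap(doc) {
  if (doc.name) {
    emit(doc.name.length, null);
  }
}

const withName = { name: "fridge" };
const withoutName = { title: "Paint Ball" };

let unguardedError = null;
try {
  unsafeMap(withoutName);
} catch (err) {
  unguardedError = err; // reading .length of undefined fails here
}

safeMap(withName);    // emits one row with key 6
safeMap(withoutName); // emits nothing and does not throw
console.log(emitted.length, unguardedError instanceof TypeError);
```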

If you have ever looked at a view in the Couchbase Admin Console, this map function will look more familiar. In Couchbase 2.1.0 we separate a JSON document into two parts: metadata about the entry, such as its expiration, and the entry itself. So our function has the parameter meta for all document metadata and the parameter doc for the document values, such as the title and blog text. The function first looks at the metadata in an if...else statement to determine whether the document is JSON. If it is, the map function extracts the blog title and the date/time of the blog entry.

If the document is binary data, you would need to provide some code to handle it, but typically, if you are going to query indexed data, you would do so on JSON documents.

The emit() function takes two arguments: the first is the key and the second is the value. Each call to emit() creates an entry, or row, in our result set. You can call emit() multiple times in a map function; this creates multiple entries in the result set from a single document. We will discuss that in more depth later.
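As a brief illustration of calling emit() more than once, the sketch below emits one row per tag. The tags field, the document id and the local emit stand-in are hypothetical; the sample blog posts above have no such field.

```javascript
const tagRows = [];
// Local stand-in for the emit() that the view engine provides.
function emit(key, value) {
  tagRows.push({ key, value });
}

// Emits one row per tag, so a single document can yield several rows.
function tagMap(doc, meta) {
  if (meta.type == "json" && doc.title && Array.isArray(doc.tags)) {
    doc.tags.forEach(function (tag) {
      emit(tag, doc.title);
    });
  }
}

// Hypothetical document and metadata, used only for this example.
const post = { title: "Paint Ball", tags: ["work", "fun"] };
tagMap(post, { id: "post_paintball", type: "json" });

console.log(tagRows);
```

Here the single post yields two rows, one keyed by "work" and one keyed by "fun", both carrying the title as the value.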

Once you have your view functions, you store them to Couchbase Server and then query the view to get the result set. When you query your view, Couchbase Server takes the code in your view and runs it on every document persisted on disk. You store your map function as a string in a design document as follows:

    {
        "_id": "_design/blog",
        "language": "javascript",
        "views": {
            "titles": {
                "map": "function (doc, meta) { /* Check if doc is JSON */ if (meta.type == \"json\" && doc.date && doc.title) { emit(doc.date, doc.title); } else { /* do something with binary value */ } }"
            }
        }
    }

The id of every design document starts with the prefix _design/, followed by your name for the design document. We store all view functions in the views attribute and name this particular view titles. Using a Couchbase SDK, you can read the design document in as a file from the file system and store the design document to the server. In this case we name our design document file blog.json:
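Escaping a map function by hand so that it fits in a JSON string is error-prone. One possible approach, sketched here for Node.js, is to write the function as ordinary JavaScript and let JSON.stringify do the escaping. The variable names are illustrative, and this is only one way to produce the file, not a step the SDK requires.

```javascript
// The map function written as normal JavaScript; emit() is supplied by the server.
const titlesMap = function (doc, meta) {
  if (meta.type == "json" && doc.date && doc.title) {
    emit(doc.date, doc.title);
  }
};

const designDoc = {
  _id: "_design/blog",
  language: "javascript",
  views: {
    titles: {
      // Function.prototype.toString() returns the source text of the function.
      map: titlesMap.toString()
    }
  }
};

// JSON.stringify escapes the quotes and newlines inside the map string.
const designDocJson = JSON.stringify(designDoc, null, 2);
console.log(designDocJson);
```

The printed text is valid JSON and could be saved as blog.json, for example with fs.writeFileSync from the Node.js standard library.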

    require 'couchbase'

    client = Couchbase.connect("http://localhost:8091/pools/default/buckets/bucketName")
    client.save_design_doc(File.open('blog.json'))

This code creates a Couchbase client instance with a connection to the bucket bucketName. We then read the design document into memory and write it to Couchbase Server. At this point we can query the view and retrieve our map function results:

    posts = client.design_docs['blog']
    posts.views    #=> ["titles"]
    posts.titles

Couchbase Server will take each document on disk, determine if the document is JSON and then put the blog title and date into a list. Each row in that list includes a key and value:

    Key                      Value
    "2012/07/30 18:12:10"    "Move Today"
    "2012/09/17 21:13:39"    "Bought New Fridge"
    "2012/09/25 15:52:20"    "Paint Ball"

You may wonder how efficient it is to query your view if Couchbase Server runs the map function on every persisted document in the database. Couchbase Server is designed to avoid duplicate work: it runs the function on all documents once, when you first query the view. For subsequent queries on the view, Couchbase Server recomputes the keys and values only for documents that have changed.
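The idea of recomputing rows only for changed documents can be shown with a toy model. This is not Couchbase Server's actual index implementation: createIndex, the rev counter and the in-memory store are all hypothetical, emit is passed in as a parameter to keep the sketch self-contained, and deleted documents are not handled.

```javascript
// Toy model of an incrementally updated index. All names are hypothetical.
function createIndex(mapFn) {
  const rowsByDocId = new Map(); // doc id -> rows emitted for that doc
  const seenRev = new Map();     // doc id -> revision last indexed
  let mapCalls = 0;

  return {
    // Re-run the map function only for documents whose revision changed.
    query(store) {
      for (const [id, entry] of Object.entries(store)) {
        if (seenRev.get(id) === entry.rev) continue;
        const rows = [];
        mapFn(entry.doc, (key, value) => rows.push({ id, key, value }));
        mapCalls += 1;
        rowsByDocId.set(id, rows);
        seenRev.set(id, entry.rev);
      }
      return [...rowsByDocId.values()].flat();
    },
    mapCalls: () => mapCalls
  };
}

const index = createIndex((doc, emit) => {
  if (doc.title && doc.date) emit(doc.date, doc.title);
});

const store = {
  post1: { rev: 1, doc: { title: "Move Today", date: "2012/07/30 18:12:10" } },
  post2: { rev: 1, doc: { title: "Paint Ball", date: "2012/09/25 15:52:20" } }
};

index.query(store); // first query: both documents are mapped
const callsAfterFirst = index.mapCalls();

// Change one document, then query again: only post2 is mapped a second time.
store.post2 = { rev: 2, doc: { title: "Paintball", date: "2012/09/25 15:52:20" } };
const rowsAfterUpdate = index.query(store);
const callsAfterSecond = index.mapCalls();

console.log(callsAfterFirst, callsAfterSecond, rowsAfterUpdate.length);
```

In this model the first query costs two map calls and the second query costs only one more, while the result set still contains a row for each post.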

When you query this view, Couchbase Server returns the result set as JSON. Each row contains the key, the value and the document id, and the response also includes some additional metadata.
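A response of this kind might be handled as in the sketch below. It assumes the usual view result shape of a total_rows count plus a rows array whose entries carry id, key and value; the document ids are made up for this example, and any further metadata is omitted.

```javascript
// Illustrative response body; the document ids are made up for this sketch.
const responseBody = JSON.stringify({
  total_rows: 3,
  rows: [
    { id: "post_move", key: "2012/07/30 18:12:10", value: "Move Today" },
    { id: "post_fridge", key: "2012/09/17 21:13:39", value: "Bought New Fridge" },
    { id: "post_paintball", key: "2012/09/25 15:52:20", value: "Paint Ball" }
  ]
});

// Turn the JSON text back into objects and describe each row.
const result = JSON.parse(responseBody);
const summary = result.rows.map(
  (row) => row.key + " -> " + row.value + " (" + row.id + ")"
);
console.log(summary.join("\n"));
```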