Document design considerations

When you design documents, keep a few additional considerations in mind. They help you decide whether to use one document or several to represent something in your application, and how and when to use references to show relationships between documents. Consider:

  • Whether you will represent the items as separate objects.

  • Whether you want to access the objects together at run time.

  • Whether some data must be atomic; that is, all changes to the data occur at once, or the change fails and none of it is made.

  • Whether you will index and query data through views, which are stored functions you use to find, extract, sort, and perform calculations on documents in Couchbase Server. For more information, see Finding data with views.

The following guidelines describe when to prefer a single document and when to prefer several documents to represent your data.

When you use one document to contain all related data you typically get these benefits:

  • Application data is denormalized.

  • You can read and write related information in one operation.

  • You eliminate the need for client-side joins.

  • If you put all information for a transaction in a single document, you can better guarantee atomicity since any changes will occur to a single document at once.
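The atomicity point can be sketched in a few lines of Python. This is an illustration only: a plain dictionary stands in for the document store, and the function names, the comment1 key, and the simulated failure are assumptions made for this sketch, not part of any Couchbase API.

```python
import copy


def add_comment_aggregate(store, post_id, comment):
    """Single-document design: build the new version, then write it once."""
    updated = copy.deepcopy(store[post_id])
    updated["comments"].append(comment)
    store[post_id] = updated  # one write: the post is either old or new


def add_comment_separate(store, post_id, comment_id, comment, fail_between=False):
    """Separate-document design: two writes, so a failure can land between them."""
    store[comment_id] = {"comment_id": comment_id, **comment}  # write 1
    if fail_between:
        raise RuntimeError("simulated failure between the two writes")
    updated = copy.deepcopy(store[post_id])
    updated["comments"].append(comment_id)
    store[post_id] = updated  # write 2


comment = {"format": "markdown", "body": "Awesome post!"}

# Aggregate: the comment lives inside the post document.
aggregate_store = {"dborkar_Hello_World": {"title": "Hello World", "comments": []}}
add_comment_aggregate(aggregate_store, "dborkar_Hello_World", comment)

# Separate documents: simulate a failure after the first write.
split_store = {"dborkar_Hello_World": {"title": "Hello World", "comments": []}}
try:
    add_comment_separate(
        split_store, "dborkar_Hello_World", "comment1", comment, fail_between=True
    )
except RuntimeError:
    pass

# The comment document exists, but the post does not reference it.
orphaned = (
    "comment1" in split_store
    and "comment1" not in split_store["dborkar_Hello_World"]["comments"]
)
```

In the separate-document case the application has to detect and repair the partially applied change itself; with the aggregate, there is only one write to succeed or fail.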

When you provide a single document to represent an entire entity and any related records, the document is known as an aggregate. You can also choose to use separate documents for different object types in your application. This approach is known as normalization, the opposite of the denormalized single-document approach. In this case you provide cross-references between objects, as we demonstrated earlier in the beer-brewery documents. You typically gain the following from separate documents:

  • Reduce data duplication.

  • May provide better application performance and scalability by keeping documents smaller.

  • Application objects do not need to be in the same document; separate documents may better reflect the objects as they are in the real world.

The following examples demonstrate the use of a single document compared to separate documents for a simple blog. In the blog application a user can create an entry with title and content. Other users can add comments to the post. In the first case, we have a single JSON document to represent a blog post, plus all the comments for the post:

{
  "post_id": "dborkar_Hello_World",
  "author": "dborkar",
  "type": "post",
  "title": "Hello World",
  "format": "markdown",
  "body": "Hello from [Couchbase](",
  "html": "<p>Hello from <a href=\"http: …",
  "comments": [
    {"format": "markdown", "body": "Awesome post!"},
    {"format": "markdown", "body": "Like it."}
  ]
}

The next JSON documents show the same blog post, but split into the entry document and separate comment documents. First is the core blog post document. Notice that the comments key now holds an array of two values, each a reference to a comment document:

{
  "post_id": "dborkar_Hello_World",
  "author": "dborkar",
  "type": "post",
  "title": "Hello World",
  "format": "markdown",
  "body": "Hello from [Couchbase](",
  "html": "<p>Hello from <a href=\"http: …\">",
  "comments": ["comment1_jchris_Hello_world", "comment2_kzeller_Hello_World"]
}

The next document contains the first comment associated with the post. Its comment_id key holds the value 'comment1_jchris_Hello_world', which matches the first entry in the post's comments array; that shared value is what links the comment to the blog post it belongs to:

{
  "comment_id": "comment1_jchris_Hello_world",
  "format": "markdown",
  "body": "Awesome post!"
}
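With separate documents, the application resolves the references itself. The following Python sketch shows one way to do that client-side join. A plain dictionary stands in for the document store, and load_post_with_comments is a hypothetical helper written for this illustration; a real application would make one fetch per key through its database client. The comment keys are the ones listed in the post's comments array.

```python
# In-memory stand-in for a key-value document store (an assumption for
# illustration; a real application would call its database client here).
store = {
    "dborkar_Hello_World": {
        "post_id": "dborkar_Hello_World",
        "author": "dborkar",
        "type": "post",
        "title": "Hello World",
        "comments": ["comment1_jchris_Hello_world", "comment2_kzeller_Hello_World"],
    },
    "comment1_jchris_Hello_world": {
        "comment_id": "comment1_jchris_Hello_world",
        "format": "markdown",
        "body": "Awesome post!",
    },
    "comment2_kzeller_Hello_World": {
        "comment_id": "comment2_kzeller_Hello_World",
        "format": "markdown",
        "body": "Like it.",
    },
}


def load_post_with_comments(store, post_id):
    """Fetch a post, then resolve each comment key with a further lookup."""
    post = dict(store[post_id])  # shallow copy so the stored post is not modified
    post["comments"] = [store[key] for key in post["comments"] if key in store]
    return post


post = load_post_with_comments(store, "dborkar_Hello_World")
bodies = [comment["body"] for comment in post["comments"]]
```

Displaying the post this way costs one lookup for the post plus one per comment, whereas the single-document version needs only one lookup.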

The next example demonstrates our beer and breweries example as single and separate documents. If we wanted to use a single-document approach to represent a beer, it could look like this in JSON:

{
  "beer_id": 10.0,
  "name": "Hoptimus Prime",
  "category": "North American Ale",
  "style": "Imperial or Double India Pale Ale",
  "brewery": {
    "name": "Legacy Brewing Co.",
    "address1": "Easy Peasy St.",
    "address2": "Suite 4",
    "city": "Baltimore",
    "state": "Maryland",
    "zip": "21215",
    "capacity": 10000
  },
  "updated": [2010, 7, 22, 20, 0, 20],
  "available": true
}

In this case we provide information about the brewery as a nested object within the beer. But consider the case where we have more than one beer from the same brewery:

{
  "beer_id": 12.0,
  "name": "Pleny the Hipster",
  "category": "Wheat Beer",
  "style": "Koelsch",
  "brewery": {
    "name": "Legacy Brewing Co.",
    "address1": "Easy Peasy St.",
    "address2": "Suite 4",
    "city": "Baltimore",
    "state": "Maryland",
    "zip": "21215",
    "capacity": 10000
  },
  "updated": [2011, 8, 2, 20, 0, 20],
  "available": true
}

Here we are starting to duplicate information, because the same brewery details appear in each beer document. In this case it makes sense to store the brewery and the beers as separate documents and relate them through fields. The revised beer document appears below. Notice that the brewery field now holds the brewery ID rather than the brewery details:

{
  "beer_id": 10.0,
  "name": "Hoptimus Prime",
  "category": "North American Ale",
  "style": "Imperial or Double India Pale Ale",
  "brewery": "leg_brew_10",
  "updated": [2010, 7, 22, 20, 0, 20],
  "available": true
}

And here is the associated brewery as a separate brewery document. In this case, we may simplify the document structure since it is separate from the beer data, and provide all the brewery information at the same level:

{
  "brewery_id": "leg_brew_10",
  "name": "Legacy Brewing Co.",
  "address1": "Easy Peasy St.",
  "address2": "Suite 4",
  "city": "Baltimore",
  "state": "Maryland",
  "zip": "21215",
  "capacity": 10000
}
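The move from embedded to separate documents can be sketched in Python as follows. The split_beer helper is hypothetical, and the sketch assumes that the embedded brewery object carries its own name field; the beer documents are abbreviated to the fields that matter here.

```python
def split_beer(beer, brewery_id):
    """Return (beer_doc, brewery_doc), replacing the embedded brewery with its ID."""
    brewery_doc = {"brewery_id": brewery_id, **beer["brewery"]}
    beer_doc = {**beer, "brewery": brewery_id}  # new dict; the input is not modified
    return beer_doc, brewery_doc


legacy = {
    "name": "Legacy Brewing Co.",
    "address1": "Easy Peasy St.",
    "address2": "Suite 4",
    "city": "Baltimore",
    "state": "Maryland",
    "zip": "21215",
    "capacity": 10000,
}

# Two beers that each embed a full copy of the same brewery details.
embedded_beers = [
    {"beer_id": 10.0, "name": "Hoptimus Prime", "brewery": dict(legacy)},
    {"beer_id": 12.0, "name": "Pleny the Hipster", "brewery": dict(legacy)},
]

beers = []
breweries = {}
for beer in embedded_beers:
    beer_doc, brewery_doc = split_beer(beer, "leg_brew_10")
    beers.append(beer_doc)
    breweries[brewery_doc["brewery_id"]] = brewery_doc  # same key, stored once
```

After the split, both beer documents point at the single leg_brew_10 brewery document, so a change to the brewery address is made in one place.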