A data modeling exercise typically consists of two phases: logical data modeling and physical data modeling. Logical data modeling focuses on describing your entities and relationships. Physical data modeling takes the logical data model and maps the entities and relationships to physical containers.
The logical data modeling phase focuses on describing your entities and relationships. Logical data modeling is done independently of the requirements and facilities of the underlying database platform.
At a high level, the outcome of this phase is a set of entities (objects) and their attributes that are central to your application's objectives, as well as a description of the relationships between these entities. For example, the entities in an order management application might be users, orders, order items, and products, with relationships such as "a user can have many orders, and each order can have many items".
Let's look at some of the key definitions you need from your logical data modeling exercise:
Let's look at a highly simplified Order Management System as an example.
In the diagram below, Order embeds Items and references external Product (1:n) and Paytype (1:1) documents.
In the diagram below, Order embeds Paytype and references Items, each of which embeds Product.
Logical data modeling starts with a decision on how to map your entities to documents. JSON documents provide great flexibility in mapping 1-to-1, 1-to-many or many-to-many relationships.
At one end, you can model each entity to its own document with references to represent relationships. At the other end, you can embed all related entities into a single large document. However, the right design for your application usually lies somewhere in between. Exactly how you should balance these alternatives depends on the access patterns and requirements of your application.
Let's take a look at the example of a stock management system to track Couchbase-branded swag.
Let's imagine the standard path is:
Embedding:
If we chose to embed all the data in one document, we might end up with something like this:
{
"orderID": 200,
"customer": {
"name": "Steve Rothery",
"address": "11-21 Paul Street",
"city": "London"
},
"products": [
{
"itemCode": "RedTShirt",
"itemName": "Red Couchbase t-shirt",
"supplier": "Lovely t-shirt company",
"location": "warehouse 1, aisle 3, location 4",
"quantityOrdered": 3
},
{
"itemCode": "USB",
"supplier": "Memorysticks Foreva",
"itemName": "Black 8GB USB stick with red Couchbase logo",
"location": "warehouse 1, aisle 42, location 12",
"quantityOrdered": 51
}
],
"status": "paid"
}
Here, everything we need to fulfill the order is stored in one document. Despite having separate customer profile and item details documents, we replicate parts of their data in the order document. This might seem wasteful or even dangerous if you're coming from the relational world. However, it's quite normal for a document database. As we saw earlier, document databases operate around the idea that one document could store everything you need for a particular situation.
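As a minimal sketch of how such an embedded order might be assembled, the Python below copies the relevant fields from canonical documents into one order document. The `customer_profile` and `stock_items` structures and the `build_embedded_order` helper are assumptions made for illustration; they are not part of any Couchbase API, and plain dictionaries stand in for stored documents.

```python
import copy

# Hypothetical canonical documents; plain dicts stand in for stored documents.
customer_profile = {
    "name": "Steve Rothery",
    "address": "11-21 Paul Street",
    "city": "London",
}
stock_items = {
    "RedTShirt": {
        "itemName": "Red Couchbase t-shirt",
        "supplier": "Lovely t-shirt company",
        "location": "warehouse 1, aisle 3, location 4",
    },
    "USB": {
        "itemName": "Black 8GB USB stick with red Couchbase logo",
        "supplier": "Memorysticks Foreva",
        "location": "warehouse 1, aisle 42, location 12",
    },
}


def build_embedded_order(order_id, profile, basket, items):
    """Copy everything needed for fulfilment into a single order document."""
    products = []
    for item_code, quantity in basket:
        # Deep copy so the order keeps its own replica of the item details.
        details = copy.deepcopy(items[item_code])
        products.append({"itemCode": item_code, **details, "quantityOrdered": quantity})
    customer = {field: profile[field] for field in ("name", "address", "city")}
    return {
        "orderID": order_id,
        "customer": customer,
        "products": products,
        "status": "paid",
    }


order = build_embedded_order(
    200, customer_profile, [("RedTShirt", 3), ("USB", 51)], stock_items
)
```

Because the order holds copies, a later change to a stock item document is not reflected in orders that were already written, which is exactly the duplication the paragraph above describes.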
There are, though, some trade-offs to embedding data like this.
First, let's look at what's potentially bad:
So, what benefits does embedding give us? Mostly, it gives us:
When to embed:
You might want to embed data when:
Why are we asking whether reads outnumber writes?
In our example above, each time someone reads our order they're also likely to update the state of the order:
So, here the reads and writes are likely to be fairly balanced.
Imagine, though, that we add a blog to our swag management system and then write a post about our new Couchbase branded USB charger. We'd make two, maybe three, writes to the document while finessing our post. Then, for the rest of that document's lifetime, it'd be all reads. If the post is popular, we could see a hundred or thousand times the number of reads compared to writes.
As the benefits of embedding come at read-time, and the risks mostly at write-time, it seems reasonable to embed all the contents of the blog post page in one document rather than, for example, pull in the author details from a separate profile document.
There's another compelling reason to embed data:
In our swag order above, we're using the customer's address as the despatch address. By embedding the despatch address, as we are, we can easily offer the option to choose a different despatch address for each order. We also get a historic record of where each order went even if the customer later changes the address stored in their profile.
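The following sketch illustrates that point under simple assumptions: `place_order` is a hypothetical helper, the alternative addresses are invented for the example, and dictionaries stand in for stored documents. Each order copies its despatch address at the time it is placed, so later profile edits leave earlier orders unchanged.

```python
def place_order(order_id, profile, despatch_address=None):
    """Create an order that carries its own copy of the despatch address."""
    # Fall back to the profile address when no alternative is supplied.
    address = dict(
        despatch_address
        or {"address": profile["address"], "city": profile["city"]}
    )
    return {
        "orderID": order_id,
        "customer": {"name": profile["name"], **address},
        "status": "paid",
    }


profile = {"name": "Steve Rothery", "address": "11-21 Paul Street", "city": "London"}

first = place_order(200, profile)

# Hypothetical later change of address in the customer's profile.
profile["address"] = "1 Example Road"
second = place_order(201, profile)

# Hypothetical one-off despatch address chosen for a single order.
gift = place_order(202, profile, {"address": "2 Sample Lane", "city": "Leeds"})
```

Here `first` still records the original street address, `second` picks up the updated profile, and `gift` goes to the one-off address without touching the profile at all.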
Referring:
Another way to represent our order would be to refer to the user profile document and stock item details document but not to pull their contents into the order document.
Let's imagine our customer profiles are keyed by the customer's email address and our stock items are keyed by a stock code. We can use those to refer to the original documents:
{
"orderID": 200,
"customer": "steve@gmail.com",
"products": [
{
"itemCode": "RedTShirt",
"quantityOrdered": 3
},
{
"itemCode": "USB",
"quantityOrdered": 51
}
],
"status": "paid"
}
When we view Steve's order, we can fill in the details with three more reads: his user profile (keyed by the email address) and the stock item details (keyed by their item codes).
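To make the read pattern concrete, here is a small sketch that resolves the references at read time. `CountingStore` is an in-memory stand-in for a key-value bucket, not a Couchbase SDK class, and the `order::200` key is an assumed naming scheme chosen for the example.

```python
class CountingStore:
    """In-memory stand-in for a key-value bucket that counts reads."""

    def __init__(self, documents):
        self.documents = documents
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.documents[key]


store = CountingStore({
    "steve@gmail.com": {
        "name": "Steve Rothery",
        "address": "11-21 Paul Street",
        "city": "London",
    },
    "RedTShirt": {"itemName": "Red Couchbase t-shirt"},
    "USB": {"itemName": "Black 8GB USB stick with red Couchbase logo"},
    "order::200": {  # hypothetical key for the order document
        "orderID": 200,
        "customer": "steve@gmail.com",
        "products": [
            {"itemCode": "RedTShirt", "quantityOrdered": 3},
            {"itemCode": "USB", "quantityOrdered": 51},
        ],
        "status": "paid",
    },
})


def load_order_with_details(store, order_key):
    """Read the order, then follow each reference with one further read."""
    order = store.get(order_key)
    customer = store.get(order["customer"])
    products = [{**line, **store.get(line["itemCode"])} for line in order["products"]]
    # Return a new dict so the stored order keeps its references.
    return {**order, "customer": customer, "products": products}


resolved = load_order_with_details(store, "order::200")
```

After the call, `store.reads` is 4: one read for the order itself plus the three additional reads for the profile and the two stock items.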
It requires three additional reads but it gives us some benefits:
There are also disadvantages:
When to Refer:
Referring to canonical instances of documents is a good default when modeling with Couchbase. You should be especially keen to use referrals when:
That last point is particularly important where your documents have unbounded potential for growth.
Imagine we were storing activity logs related to each user of our system. Embedding those logs in the user profile could lead to a rather large document.
It's unlikely we'd breach Couchbase's 20 MB upper limit for an individual document but processing the document on the application side would be less efficient as the log element of the profile grows. It'd be much more efficient to refer to a separate document, or perhaps paginated documents, holding the logs.
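One way to paginate the logs is sketched below. The `log::<user>::<page>` key scheme, the `logPageCount` attribute, and the tiny `PAGE_SIZE` are all assumptions chosen for illustration, and a dictionary stands in for the bucket.

```python
PAGE_SIZE = 3  # illustrative only; a real page would hold far more entries


def log_page_key(user_id, page):
    """Build the key of one log page document (assumed naming scheme)."""
    return f"log::{user_id}::{page}"


def append_log(bucket, user_id, entry):
    """Append an entry to the user's newest log page, adding a page when full."""
    profile = bucket[user_id]
    count = profile.get("logPageCount", 0)
    last_page_full = (
        count > 0
        and len(bucket[log_page_key(user_id, count - 1)]["entries"]) >= PAGE_SIZE
    )
    if count == 0 or last_page_full:
        bucket[log_page_key(user_id, count)] = {"type": "logPage", "entries": []}
        count += 1
        # The profile only tracks how many pages exist, so it stays small.
        profile["logPageCount"] = count
    bucket[log_page_key(user_id, count - 1)]["entries"].append(entry)


bucket = {"steve@gmail.com": {"name": "Steve Rothery"}}
for n in range(1, 8):
    append_log(bucket, "steve@gmail.com", f"event {n}")
```

With seven entries and three entries per page, the profile ends up referring to three log pages, and the profile document itself never grows beyond a name and a counter.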
If... | Then Consider... |
---|---|
Relationship is 1:1 or 1:many | Nest related data as nested objects |
Relationship is many:1 or many:many | Refer to related data as separate docs |
Reads are mostly parent fields | Refer to children as separate docs |
Reads are mostly parent+child fields | Nest children as nested objects |
Writes are mostly either parent or child | Refer to children as separate docs |
Writes are mostly both parent and child | Nest children as nested objects |
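The table can be read as a simple lookup. The sketch below is one possible encoding of it, with the `RULES` mapping, the `suggestions` function, and the short labels such as `"parent+child"` invented for this example. It reports one suggestion per question rather than forcing a single verdict, because the rows can disagree.

```python
NEST = "nest"    # nest children as nested objects
REFER = "refer"  # refer to children as separate docs

# One entry per row of the table above, keyed by (question, answer).
RULES = {
    ("relationship", "1:1"): NEST,
    ("relationship", "1:many"): NEST,
    ("relationship", "many:1"): REFER,
    ("relationship", "many:many"): REFER,
    ("reads", "parent"): REFER,
    ("reads", "parent+child"): NEST,
    ("writes", "parent or child"): REFER,
    ("writes", "parent and child"): NEST,
}


def suggestions(relationship, reads, writes):
    """Return the table's suggestion for each of the three questions."""
    return {
        "relationship": RULES[("relationship", relationship)],
        "reads": RULES[("reads", reads)],
        "writes": RULES[("writes", writes)],
    }


# A blog post with its content: read and written as a whole.
blog_post = suggestions("1:many", "parent+child", "parent and child")

# Activity logs per user: profile reads rarely need the logs.
user_logs = suggestions("1:many", "parent", "parent or child")
```

When the three answers conflict, as they do for `user_logs`, the access patterns of the application decide which consideration carries more weight.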
The physical data model takes the logical data model and maps the entities and relationships to physical containers.
In Couchbase Server, items are used to store associated values that can be accessed with a unique key. Couchbase Server also provides buckets to group items. Based on the access patterns, performance requirements, and atomicity and consistency requirements, you can choose the type of container(s) to use to represent your logical data model.
Data representation and containment in Couchbase Server differ substantially from those in relational databases. The following table provides a high-level comparison to help you get familiar with Couchbase Server containers.
Data representation and containment in Couchbase Server versus relational databases:
Couchbase Server | Relational databases |
---|---|
Buckets | Databases |
Buckets or Items (with type designator attribute) | Tables |
Items (key-value or document) | Rows |
Index | Index |
Items consist of a key and a value. A key is a unique identifier within the bucket. The value can be either binary data or a JSON document. You can mix binary and JSON values inside a bucket.
The JSON document attributes can represent both basic types, such as number, string, and Boolean, and complex types, including embedded documents and arrays. In the example below, a1 and a2 represent attributes that have a numeric and a string value respectively, a3 represents an embedded document, and a4 represents an array of embedded documents.
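As a rough illustration of these ideas, the sketch below models a bucket as a Python dictionary that maps keys to values, mixing JSON documents and a binary value. The `type` attribute plays the role of the type designator from the comparison table; the attribute name, the keys, and the `documents_of_type` helper are assumptions for this example, not Couchbase APIs.

```python
import json

# A dict stands in for a bucket: each key is unique, each value is JSON or binary.
bucket = {
    "steve@gmail.com": json.dumps({"type": "customer", "name": "Steve Rothery"}),
    "RedTShirt": json.dumps({"type": "stockItem", "itemName": "Red Couchbase t-shirt"}),
    "logo.png": b"\x89PNG\r\n\x1a\n",  # binary value, opaque to the helper below
}


def documents_of_type(bucket, doc_type):
    """Select JSON documents by type designator, much like reading one table."""
    selected = {}
    for key, value in bucket.items():
        if isinstance(value, bytes):
            continue  # skip binary items
        document = json.loads(value)
        if document.get("type") == doc_type:
            selected[key] = document
    return selected


customers = documents_of_type(bucket, "customer")
```

Filtering on the designator returns only the customer documents, even though stock items and a binary item share the same bucket.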
{
"a1":number,
"a2":"string",
"a3":{
"b1":[ number, number, number ]
},
"a4":[
{ "c1":"string", "c2":number },
{ "c1":"string", "c2":number }
]
}
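For a concrete instance of this template, the sketch below fills in arbitrary sample values (they are placeholders, not taken from any real data set) and round-trips the document through Python's standard `json` module.

```python
import json

# Arbitrary sample values that follow the template's shape.
document = {
    "a1": 42,                      # number
    "a2": "hello",                 # string
    "a3": {"b1": [1, 2, 3]},       # embedded document holding an array
    "a4": [                        # array of embedded documents
        {"c1": "first", "c2": 1},
        {"c1": "second", "c2": 2},
    ],
}

encoded = json.dumps(document)  # serialize to a JSON string
decoded = json.loads(encoded)   # parse back into Python structures
```

The decoded value equals the original: embedded documents come back as dictionaries and arrays as lists.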