Data Model

Data Model

Data in Couchbase Server can be either JSON documents or binary format.

Document Data Model

Documents are stored in Couchbase Server in JSON format, a simple, lightweight notation that is compact and human readable. JSON supports both basic data types like numbers and strings, and complex types such as embedded dictionaries and arrays.

There are several advantages to using JSON as a data format in Couchbase. JSON is the lingua franca of data interchange in mobile applications and on the web: it is the most common REST API return type by a large margin. Because of this popularity, JSON is easy and efficient to create and consume from any programming language. Serialization and deserialization are very fast. JSON is native to JavaScript, which makes it extremely convenient for web application programming.

In Couchbase, a document usually represents a single instance of an object in the application code. Couchbase documents are like rows in a relational table: each one is a record. Each attribute is like a column in the table.

Unlike a relational table however, documents of varying schemas can be stored in Couchbase. If an application wishes to distinguish between multiple ‘types’, several approaches can be taken (such as adding a ‘type’ field within a document, as shown below).

Couchbase documents themselves can contain nested structures (sub-documents). This allows developers to naturally express many-to-many relationships without requiring a "reference table" or "junction table". It is also expressive for naturally hierarchical data.

To understand the document model in practice, think about an online flight booking application that lets a user search for flights by dates. In a relational data model, there would be tables for airlines, flights, and schedules among many others that might look something like the picture below.
Figure 1. Relational model for flight schedules

In Couchbase however, the document model would likely be a single route object that embeds an array of schedules for all flights between each two airports.
Figure 2. Document data model for flight bookings

In contrast to the relational model, each route document is self-contained. A single request to one Couchbase node can return all the information needed to fulfill the application request. The relative independence of documents has important implications for scalability and latency. For example, documents can be easily replicated or atomically changed without interfering with any other document. This allows Couchbase to scale horizontally by employing a cluster of commodity hardware without introducing contention or requiring nodes to coordinate in order to fulfill application requests.

Dynamic Schema

In Couchbase, when we refer to a schema we refer to the way the application structures its documents. As opposed to traditional RDBMS, schemas in Couchbase are entirely defined and managed by the application.

Couchbase documents get their schemas solely from the structure of their JSON. Instead of the database enforcing the schema and requiring all data to be uniform, Couchbase (and NoSQL in general), passes this control to the application and relaxes it at the database level. This design allows variation among documents, even documents of the same type. Consider a large online retail catalog that contains a wide variety of office products, such as pens and laser printers. Although many properties such as size or weight would apply to both items, only some documents need to have properties like "connectivity type" and "double sided printing." With Couchbase Server, there’s no need to create a multitude of null properties or to define hundreds of different schemas to handle such data – the necessary properties can be included in documents that need them and simply left out of the ones that don’t. Flexible schema documents are also well suited to data of varying size, such as the flight booking example above: it makes no difference that some route documents contain thousands of scheduled flights and others have only a few.

Since a schema in Couchbase is a logical construct defined by the application, schemas are therefore dynamic - they change along with your application, allowing it to be truly agile. For programmers, an important benefit of flexible schemas is that objects can be defined once in application code rather than needing to be defined and kept in sync in both application code and separately in a database schema. Unlike a traditional RDBMS with a fixed schema, a programmer can add properties or structures to some documents without going back and adding them to other documents of the same type. In this way, applications can evolve without requiring programmers to alter the structure of underlying tables, manage schema versioning, or suffer other headaches of schema migrations.

The advantage to an application-defined schema is clear in large and long-lived applications. Developers can change application code and have the updated schema immediately stored and accessible. Documents can seamlessly evolve over time as the web application changes.

Document Design Considerations

Flexible documents make schema evolution easier, but it’s still necessary to design the JSON for best performance and scalability. The main question is how much information to include in a single document. Is it better to have fewer, richer documents that embed complex information, or more, simpler documents that refer to other independently maintained simple documents? To answer this, consider the information access patterns and how objects are managed in application code. Grouping properties together that must be accessed or written at the same time generally yields the best scalability and performance: all information is read or written in a single operation, atomicity is improved because all mutations occur simultaneously, and scalability improves because there are fewer relations between objects. Grouping properties together also ensures that they remain consistent with each other - logically related properties, so long as they exist in the same document, can be read and updated atomically. On the other hand, multiple simple documents generally should be favored when access patterns are predictable and the size of the information transported needs to be kept as small as possible to reduce network usage.

Documents can also establish basic relationships with other documents. Using JOIN you can easily retrieve documents and related documents via N1QL, the Couchbase query (SQL-like) language. Having this relational capacity built-in to N1QL makes referring to other documents significantly simpler in Couchbase as opposed to other NoSQL databases. For a full discussion of Couchbase document design, see Data Modeling Basics.

Normalization and Denormalization

For performance reasons, Couchbase encourages users to keep together data that is accessed together, even though it may lead to duplication, repeating values, or non-atomic values. Relaxing normalization constraints yields better scalability and query performance at the minor cost of possibly increasing the data size.