Phases of Data Modeling

A data modeling exercise typically consists of two phases: logical data modeling and physical data modeling. Logical data modeling focuses on describing your entities and relationships. Physical data modeling takes the logical data model and maps the entities and relationships to physical containers.

Logical Data Modeling

The logical data modeling phase focuses on describing your entities and relationships. Logical data modeling is done independently of the requirements and facilities of the underlying database platform.

At a high level, the outcome of this phase is a set of entities (objects) and their attributes that are central to your application’s objectives, as well as a description of the relationships between these entities. For example, entities in an aerospace application might be "satellite", "module", and "instrument", and their relationships might be "a satellite carries many modules, each of which is made up of many instruments".

Let's look at some of the key definitions you need from your logical data modeling exercise:
  • Entity keys: Each entity instance is identified by a unique key. The unique key can be a composite of multiple attributes or a surrogate key generated using a counter or a UUID generator. Composite (compound) keys can encode immutable properties of the entity, which lets an application do some processing from the key alone, without retrieving the value. The key can also be used to reference the entity instance from other entities in order to represent relationships.
  • Entity attributes: Attributes can be any of the basic data types such as string, numeric, or Boolean, or they can be an array of these types. For example, a satellite might define a number of simple attributes such as name and weight, as well as a complex attribute called launch which in turn contains the attributes launch-date and launch-site.
  • Entity relationships: Entities can have 1-to-1, 1-to-many, or many-to-many relationships. For example, "a satellite has many modules" is a 1-to-many relationship.
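The definitions above can be sketched in code before any database is chosen. The following is a minimal, platform-independent sketch in Python of the satellite example; the class names, the `new_key` helper, and all sample values are hypothetical and exist only for illustration (`launch_date` and `launch_site` stand in for the launch-date and launch-site attributes).

```python
from dataclasses import dataclass, field
from typing import List
import uuid


def new_key() -> str:
    """Hypothetical helper: generate a surrogate key with a UUID generator."""
    return str(uuid.uuid4())


@dataclass
class Instrument:
    key: str   # entity key
    name: str  # simple attribute


@dataclass
class Module:
    key: str
    name: str
    # 1-to-many relationship: a module is made up of many instruments.
    instruments: List[Instrument] = field(default_factory=list)


@dataclass
class Launch:
    # Complex attribute made of two simple attributes.
    launch_date: str
    launch_site: str


@dataclass
class Satellite:
    key: str
    name: str
    weight: float
    launch: Launch
    # 1-to-many relationship: a satellite carries many modules.
    modules: List[Module] = field(default_factory=list)


# Sample values are invented for illustration.
satellite = Satellite(
    key=new_key(),
    name="Example-1",
    weight=1250.0,
    launch=Launch(launch_date="2020-01-01", launch_site="Example Site"),
)
module = Module(key=new_key(), name="imaging")
module.instruments.append(Instrument(key=new_key(), name="camera"))
satellite.modules.append(module)
```

Nothing in this sketch depends on how the data is eventually stored, which is the point of keeping the logical model separate from the physical one.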

You can find various methods of data modeling at https://en.wikipedia.org/wiki/Data_modeling.

Physical Data Modeling

The physical data model takes the logical data model and maps the entities and relationships to physical containers.

In Couchbase Server, items are used to store associated values that can be accessed with a unique key. Couchbase Server also provides buckets to group items. Based on the access patterns, performance requirements, and atomicity and consistency requirements, you can choose the type of container(s) to use to represent your logical data model.

Data representation and containment in Couchbase Server differ substantially from relational databases. The following table provides a high-level comparison to help you get familiar with Couchbase Server containers.
Table 1. Data representation and containment in Couchbase Server versus relational databases

Couchbase Server                                     Relational databases
---------------------------------------------------  --------------------
Buckets                                              Databases
Buckets or Items (with type designator attribute)    Tables
Items (key-value or document)                        Rows
Index                                                Index

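To make the "Items (with type designator attribute)" row of the table concrete, here is a minimal sketch in Python of how rows from two relational tables might be mapped to items that share one bucket. It is an illustration only: the plain dictionary stands in for a bucket, the `rows_to_items` helper is hypothetical, and the `type` attribute name and the `type::id` key layout are assumed conventions rather than anything Couchbase Server requires.

```python
import json

# Hypothetical rows, as they might come from two relational tables.
satellite_rows = [{"id": 1, "name": "Example-1", "weight": 1250.0}]
module_rows = [{"id": 1, "satellite_id": 1, "name": "imaging"}]


def rows_to_items(rows, type_name):
    """Map each row to a (key, JSON value) pair.

    The "type" attribute plays the role of the table name, and the key
    prefix keeps keys unique when several entity types share a bucket.
    """
    items = {}
    for row in rows:
        key = f"{type_name}::{row['id']}"
        items[key] = json.dumps({"type": type_name, **row})
    return items


# A plain dict stands in for a bucket in this sketch.
bucket = {}
bucket.update(rows_to_items(satellite_rows, "satellite"))
bucket.update(rows_to_items(module_rows, "module"))
```

Both source rows have the id 1, yet the resulting items do not collide because the entity type is part of each key.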
Items
Items consist of a key and a value. The key is a unique identifier within the bucket. The value can be either binary or a JSON document. You can mix binary and JSON values inside a bucket.
  • Keys

    Each value (binary or JSON) is identified by a unique key. The key is typically a surrogate key generated using a counter or a UUID generator. Keys are immutable. Thus, if you use composite or compound keys, ensure that you use attributes that don’t change over time.

  • Values
    • Binary values: Binary values can be used for high-performance access to compact data through keys. Encrypted secrets, IoT instrument measurements, proprietary session states, or other non-human-readable data are typical cases for binary data. However, using binary values limits the functionality your application can take advantage of: because binary values have a proprietary representation, Couchbase Server cannot index or query them.
    • JSON values: JSON provides a rich representation for entities. Couchbase Server can parse, index, and query JSON values. JSON provides a name and a value for each attribute. You can find the JSON definition at RFC 7159 or at ECMA 404.
      JSON document attributes can represent both basic types, such as number, string, and Boolean, and complex types, including embedded documents and arrays. In the example below, a1 and a2 represent attributes that have a numeric and a string value respectively, a3 represents an embedded document, and a4 represents an array of embedded documents.
      {  
         "a1":number,
         "a2":"string",
         "a3":{
            "b1":[ number, number, number ]
         },
         "a4":[
            { "c1":"string", "c2":number },
            { "c1":"string", "c2":number }
         ]
      }

      This range of representations gives developers considerable flexibility in modeling their entities and relationships in Couchbase Server documents.
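The key and value options described above can be illustrated with a short Python sketch that uses only the standard library. It shows a counter-based and a UUID-based surrogate key, a concrete JSON value that follows the a1 to a4 schematic, and a compact binary value. The helper names, the `prefix::id` key layout, and the sample values are assumptions made for this example, and no Couchbase SDK calls are involved.

```python
import itertools
import json
import struct
import uuid

# Surrogate keys: one from a counter, one from a UUID generator.
_counter = itertools.count(1)


def counter_key(prefix):
    """Hypothetical helper: key built from a monotonically increasing counter."""
    return f"{prefix}::{next(_counter)}"


def uuid_key(prefix):
    """Hypothetical helper: key built from a random UUID."""
    return f"{prefix}::{uuid.uuid4()}"


# JSON value: a concrete instance of the a1..a4 schematic.
document = {
    "a1": 42,                      # numeric attribute
    "a2": "string value",          # string attribute
    "a3": {"b1": [1, 2, 3]},       # embedded document holding an array
    "a4": [                        # array of embedded documents
        {"c1": "first", "c2": 1},
        {"c1": "second", "c2": 2},
    ],
}
json_value = json.dumps(document)

# Binary value: three instrument readings packed as little-endian
# 32-bit floats. The layout is known only to the application.
binary_value = struct.pack("<3f", 1.5, 2.5, 3.5)

first_key = counter_key("reading")
second_key = counter_key("reading")
random_key = uuid_key("document")
```

The binary value occupies 12 bytes but is opaque to anything that does not know the packing format, whereas the JSON value is larger but self-describing. That difference is the trade-off between the two value types.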

Buckets
Couchbase Server also provides a container called a bucket to group items. Buckets are primarily used to control resource allocation and to define security and storage properties. For data with differing caching and RAM quota needs, compaction requirements, availability requirements and IO priorities, buckets act as the control boundary.

For example, suppose you store medical-codes data containing the drug, symptom, and operation codes for a standards-based electronic health record. This data can be recovered easily from other sources, so a single replica may be sufficient. Patient data, however, may require higher protection with two replicas. To better protect patient data without spending additional space on medical-codes data, you could use separate buckets for these two types of information.

It is important to note that buckets can mix binary and document-based items. All keys within the bucket need to be unique regardless of the value type (binary or JSON document).
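The two bucket properties discussed above, per-bucket settings and key uniqueness across value types, can be modeled with a small in-memory sketch in Python. The `Bucket` class here is purely hypothetical and is not the Couchbase SDK; it only mirrors the medical-codes and patient example, with invented keys and values.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Bucket:
    """Hypothetical in-memory stand-in for a bucket (not a Couchbase API)."""
    name: str
    replicas: int  # a per-bucket setting, as in the example above
    items: Dict[str, Union[str, bytes]] = field(default_factory=dict)

    def insert(self, key, value):
        # Keys must be unique within the bucket, whether the value is
        # a JSON document (str here) or binary (bytes here).
        if key in self.items:
            raise KeyError(f"key already exists in bucket {self.name!r}: {key}")
        self.items[key] = value


# Separate buckets so each data set gets its own replica count.
medical_codes = Bucket(name="medical-codes", replicas=1)
patients = Bucket(name="patients", replicas=2)

medical_codes.insert("drug::0001", json.dumps({"type": "drug", "code": "0001"}))
patients.insert("patient::1", json.dumps({"type": "patient", "name": "Example"}))
patients.insert("patient::1::scan", b"\x00\x01\x02")  # binary item, same bucket

# Reusing an existing key fails even though the new value is binary.
try:
    patients.insert("patient::1", b"raw bytes")
    duplicate_rejected = False
except KeyError:
    duplicate_rejected = True
```

The same key could still be used in `medical_codes`, because uniqueness is scoped to a single bucket.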