Wrapping up our Data Modeling Guide, we talk about standardization and optimization of your documents and modeling and share some resources to help you dive deeper.
We have discussed about docType
in the earlier section above. Please refer Document Key Prefixing section in this document for details on docType.
Every Document ID can be a combination of two or more parts/values, that should be delimited by a character such as a colon or an underscore. Pick a delimiter, and be consistent throughout your enterprise.
It is a best practice to only use a single-byte delimiter since this can make a significant difference based on the volume of data.
Applications are typically versioned using Semantic Versioning, i.e. 2.5.1. Where:
Versioning the application informs users of features, functionality, updates, etc. The term "schemaless", is often associated with NoSQL, while this is technically correct, it is better stated as:
Note: "There is no schema managed by the database, however, there is still a schema, and it is an "Application Enforced Schema." The application is now responsible for enforcing the schema as well as maintaining the integrity of the data and relationships".
As schemas change and evolve, documenting the version of the schema provides a mechanism of notifying applications about the schema version of the document that they're working with. This also enables a migration path for updating models which is discussed further in the Schema Versioning section.
{
"_type": "user",
"_schema": "1.2",
"userId": 123
}
Please refer to Document Management Strategies document for a more thorough discussion of schema versioning.
The use of a leading _
creates a standardized approach to global attributes across all documents within the enterprise.
{
"_type": "user",
"_schema": "1.2",
"_created": 1544734688923
"userId": 123
}
The same can be applied through a top-level property i.e. "meta": {}
.
{
"meta": {
"type": "user",
"schema": "1.2",
"created": 1544734688923
},
"userId": 123
}
Choose an approach that works within your organization and be consistent throughout your applications.
JSON gives us a flexible schema, that allows our models to rapidly adapt to change, this is because the schema is explicitly stored alongside each value. Whereas, in an RDBMS the schema is defined by the table columns, which are defined once. In any database, every byte of stored data adds up, historically this has been abstracted from developers as the schema and the database are managed by a DBA. With an application enforced schema, the model size is now controlled by the application. As developers we tend to be overly verbose when describing variables throughout our applications, this practice tends to carry over to our JSON models. While it is generally preferred to maintain human-readable field names for developer productivity, there are often well-understood abbreviations for many fields that will not reduce document readability.
As a general approach, consider the following options to proactively reduce document sizes:
YYYY-MM-DD
instead of MMM DD, YYYY
.null
, empty String/Array/Object
, or a known default.It is very common in almost any application, there is a need to store a date. This could be when the document was created, modified, when an order was placed, etc. Generally, this date is stored in ISO-8601 format.
Take the date 2018-12-14T03:45:24.478Z
as an example, this is very readable, but is it the most efficient way to store the date? Storing this same date as Unix Epoch Time we can represent this same date as 1544759124478
. ISO-8601 is 24 bytes, where epoch format is 13 bytes, this saves 11 bytes. This might not seem like a lot, but consider this scenario: 500,000,000 documents and each document has an average of 2 date properties. If we used epoch format, we'd save 11,000,000,000 bytes or 11Gb of space.
Now, take this a step further and ask the question, "What level of precision does the application require?". Often times we do not need millisecond precision, we can divide the epoch date accordingly for seconds, minutes, hours, etc. This applies if dates are being stored in Epoch format.
Epoch Date | Precision | Reduction | Output | Length / Bytes |
---|---|---|---|---|
1544759124478 | milliseconds | n/a |
1544759124478 | 13 |
1544759124478 | seconds | / 1000 |
1544759124 | 10 |
1544759124478 | minutes | /1000 / 60 |
25745985 | 8 |
1544759124478 | hours | /1000 / 60 / 60 |
429099 | 6 |
1544759124478 | days | /1000 / 60 / 60 / 24 |
17879 | 5 |
Please refer Document Management Strategies guide for a more in-depth discussion of this topic.