Understanding distributed relational databases

Azure Cosmos DB is a globally distributed database platform for both NoSQL and relational databases of any scale. This article explores distributed relational databases in the context of Azure Cosmos DB’s relational API option.

For more information about other data storage options in Azure, see choosing the right data store in the Azure Architecture Center.

Challenges

When you read about high-volume or high-transaction workloads, it’s easy to assume that they are much larger than anything your application will face. Assuming that your workload will stay small can be safe at the start of a project, idea, or organization. However, that assumption can quickly lead to a scenario where your application’s workload grows far beyond any predictions you have made. It’s not uncommon to hear stories of workloads that reach the maximum throughput or processing power of the single-instance database that was economical and performant at the start of a project.

Relational databases

Relational databases organize data into a tabular (row/column) format, with relations between the different tables in a database. Relational databases are common across enterprises. These enterprises often have a wealth of software developers who have written code against a relational database, or administrators who design schemas and manage relational database platforms. Relational databases also often support transactions with ACID (atomicity, consistency, isolation, durability) guarantees.
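As a minimal sketch of these two ideas (relations between tables, and a transaction that applies entirely or not at all), the following example uses Python's built-in sqlite3 module. SQLite is only a single-node stand-in chosen so the example is self-contained; it is not Azure Cosmos DB, and the customers and orders tables are hypothetical names invented for illustration.

```python
import sqlite3

# In-memory SQLite database, used purely for illustration (single node).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce relations

# Two hypothetical tables; orders.customer_id relates each order to a customer.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total REAL NOT NULL)"
)

# A successful transaction: both rows are committed together.
with conn:
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Contoso')")
    conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (100, 1, 25.0)")

# A failed transaction: the second insert references a missing customer,
# so the first insert in the same transaction is rolled back as well.
try:
    with conn:
        conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (101, 1, 10.0)")
        conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (102, 999, 5.0)")
except sqlite3.IntegrityError:
    pass  # the whole transaction was undone

# Only the order from the committed transaction remains.
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# The relation lets us join each order back to its customer.
joined = conn.execute(
    "SELECT c.name, o.total FROM orders o JOIN customers c ON c.id = o.customer_id"
).fetchall()
```

After the failed transaction, order_count is 1 and joined holds the single Contoso order, which illustrates the atomicity part of ACID on a small scale.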

Unfortunately, many organizations initially configure their relational database systems as a single node, with upper limits on compute, memory, and networking resources. This context can lead to the incorrect assumption that all relational databases are single node by their very nature.

Distributed databases

In many cloud-native whitepapers, it’s common to read about the benefits of NoSQL databases, which can make it seem like relational databases aren't a reasonable choice for large-scale databases or distributed workloads. While many distributed databases are non-relational, there are options for distributed relational database workloads.

Many of these options for distributed relational databases require your organization to plan for large scale and distribution from the beginning of the project. This planning requirement can add significant complexity at the start of a project, because your team must make sure all relevant server nodes are configured, managed, and maintained. The planning, implementation, and networking requirements for a globally distributed relational database can easily grow to be far more complex than standing up a single instance (or node).

Next steps

Want to get started with Azure Cosmos DB?