Merge partitions in Azure Cosmos DB (preview)

APPLIES TO: NoSQL, MongoDB

Merging partitions in Azure Cosmos DB (preview) lets you reduce, in place, the number of physical partitions used by your container. With merge, containers that are fragmented in throughput (low RU/s per partition) or storage (low storage per partition) can have their physical partitions reworked. If a container's throughput was scaled up and now needs to be scaled back down, merge can help resolve throughput fragmentation. For the same amount of provisioned RU/s, fewer physical partitions means each physical partition gets more of the overall RU/s. Minimizing partitions also reduces the chance of rate limiting when a large quantity of data has been removed from a container and RU/s per partition is low. In that case, merge can help clear out unused or empty partitions, resolving storage fragmentation.

Getting started

To get started using partition merge, navigate to the Features page in your Azure Cosmos DB account. Select and enable the Partition merge (preview) feature.

Before enabling the feature, verify that your Azure Cosmos DB account(s) meet all the preview eligibility criteria. Once you've enabled the feature, it takes 15-20 minutes to take effect.

Caution

When merge is enabled on an account, only requests from .NET SDK version >= 3.27.0, Java SDK version >= 4.42.0, or Azure Cosmos DB Spark connector version >= 4.18.0 are allowed on the account, regardless of whether a merge is ongoing. Requests from other SDKs (older .NET SDK, older Java SDK, any JavaScript SDK, any Python SDK, any Go SDK) or unsupported connectors (Azure Data Factory, Azure Search, Azure Functions, Azure Stream Analytics, and others) are blocked and fail. Upgrade to a supported SDK version before enabling the feature. After the feature is enabled or disabled, the change may take 15-20 minutes to fully propagate to the account. In particular, if you disable the feature after you finish using it, it may take 15-20 minutes before requests from SDKs and connectors that aren't supported for merge are allowed again.

Screenshot of Features pane and Partition merge feature.

To check whether an Azure Cosmos DB account is eligible for the preview, you can use the built-in eligibility checker in the Azure portal. From your Azure Cosmos DB account overview page in the Azure portal, navigate to Diagnose and solve problems > Throughput and Scaling > Partition Merge. Run the Check eligibility for partition merge preview diagnostic.

Screenshot of Throughput and Scaling content in Diagnose and solve issues page.

Screenshot of merge eligibility check with table of all preview eligibility criteria.

How to identify containers to merge

Containers that meet both of these conditions are likely to benefit from merging partitions:

  • Condition 1: The current RU/s per physical partition is <3000 RU/s
  • Condition 2: The current average storage in GB per physical partition is <20 GB

Condition 1 often occurs when you've previously scaled up the RU/s (often for a data ingestion) and now want to scale down to steady state. Condition 2 often occurs when you delete a large volume of data, or let it expire through TTL, leaving unused partitions.

Condition 1

To determine the current RU/s per physical partition, from your Azure Cosmos DB account, navigate to Metrics. Select the metric Physical Partition Throughput and filter to your database and container. Apply splitting by PhysicalPartitionId.

For containers using autoscale, this metric shows the max RU/s currently provisioned on each physical partition. For containers using manual throughput, this metric shows the manual RU/s on each physical partition.

In the below example, we have an autoscale container provisioned with 5000 RU/s (scales between 500 - 5000 RU/s). It has five physical partitions and each physical partition has 1000 RU/s.

Screenshot of Azure Monitor metric Physical Partition Throughput in Azure portal.
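
The per-partition figure in the example above can be reproduced with simple arithmetic. The following Python sketch is illustrative only: the function name rus_per_partition and the constant name are hypothetical, and it assumes throughput is spread evenly across physical partitions, as in the example. The Physical Partition Throughput metric remains the source of truth.

```python
# Threshold from Condition 1: RU/s per physical partition below this value
# suggests throughput fragmentation.
RU_PER_PARTITION_THRESHOLD = 3000


def rus_per_partition(provisioned_rus: int, partition_count: int) -> float:
    """Return RU/s per physical partition, assuming an even split."""
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    return provisioned_rus / partition_count


# Example from the text: 5000 RU/s (autoscale max) across five partitions.
per_partition = rus_per_partition(5000, 5)
meets_condition_1 = per_partition < RU_PER_PARTITION_THRESHOLD
print(per_partition, meets_condition_1)  # 1000.0 True
```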

Condition 2

To determine the current average storage per physical partition, first find the overall storage (data + index) of the container.

Navigate to Insights > Storage > Data & Index Usage. The total storage is the sum of the data and index usage. In the below example, the container has a total of 74 GB of storage.

Screenshot of Azure Monitor storage (data + index) metric for container in Azure portal.

Next, find the total number of physical partitions. This is the number of distinct PhysicalPartitionId values in the Physical Partition Throughput chart from Condition 1. In our example, we have five physical partitions.

Finally, calculate: Total storage in GB / number of physical partitions. In our example, we have an average of (74 GB / five physical partitions) = 14.8 GB per physical partition.

Based on conditions 1 and 2, our container can potentially benefit from merging partitions.
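
As a rough sketch of the two checks combined, the following Python snippet evaluates both conditions for the example container. The function names and constants are hypothetical helpers written for this illustration, not part of any Azure SDK, and the inputs are the values you read from the portal metrics.

```python
RU_PER_PARTITION_THRESHOLD = 3000   # Condition 1, in RU/s
STORAGE_PER_PARTITION_THRESHOLD = 20  # Condition 2, in GB


def average_storage_gb(total_storage_gb: float, partition_count: int) -> float:
    """Total storage (data + index) divided by the number of physical partitions."""
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")
    return total_storage_gb / partition_count


def likely_to_benefit(provisioned_rus: int, total_storage_gb: float,
                      partition_count: int) -> bool:
    """True when both Condition 1 and Condition 2 hold."""
    low_throughput = provisioned_rus / partition_count < RU_PER_PARTITION_THRESHOLD
    low_storage = (average_storage_gb(total_storage_gb, partition_count)
                   < STORAGE_PER_PARTITION_THRESHOLD)
    return low_throughput and low_storage


# Example from the text: 5000 RU/s, 74 GB, five physical partitions.
print(average_storage_gb(74, 5))        # 14.8
print(likely_to_benefit(5000, 74, 5))   # True
```

A True result only indicates that the container is a candidate for merge. The -WhatIf simulation described in the next section shows what a merge would actually produce.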

Merging physical partitions

In PowerShell, when the -WhatIf flag is passed in, Azure Cosmos DB runs a simulation and returns the expected result of the merge without running the merge itself. When the flag isn't passed in, the merge executes against the resource. When finished, the command outputs the current amount of storage in KB per physical partition post-merge.

Tip

Before running a merge, it's recommended to set your provisioned RU/s (either manual RU/s or autoscale max RU/s) as close as possible to your desired steady state RU/s post-merge, to help ensure the system calculates an efficient partition layout.

Use Install-Module to install the Az.CosmosDB module with prerelease features enabled.

$parameters = @{
    Name = "Az.CosmosDB"
    AllowPrerelease = $true
    Force = $true
}
Install-Module @parameters

Use Invoke-AzCosmosDBSqlContainerMerge with the -WhatIf parameter to preview the merge without actually performing the operation.

$parameters = @{
    ResourceGroupName = "<resource-group-name>"
    AccountName = "<cosmos-account-name>"
    DatabaseName = "<cosmos-database-name>"
    Name = "<cosmos-container-name>"
    WhatIf = $true
}
Invoke-AzCosmosDBSqlContainerMerge @parameters

Start the merge by running the same command without the -WhatIf parameter.

$parameters = @{
    ResourceGroupName = "<resource-group-name>"
    AccountName = "<cosmos-account-name>"
    DatabaseName = "<cosmos-database-name>"
    Name = "<cosmos-container-name>"
}
Invoke-AzCosmosDBSqlContainerMerge @parameters

Monitor merge operations

Partition merge is a long-running operation and there's no SLA on how long it takes to complete. The time depends on the amount of data in the container and the number of physical partitions. It's recommended to allow at least 5-6 hours for merge to complete.

While partition merge is running on your container, it isn't possible to change the throughput or any container settings (TTL, indexing policy, unique keys, etc.). Wait until the merge operation completes before changing your container settings.

You can track whether merge is still in progress by checking the Activity Log and filtering for the events "Merge the physical partitions of a MongoDB collection" or "Merge the physical partitions of a SQL container".

Limitations

The following are limitations of the merge feature at this time.

Preview eligibility criteria

To enroll in the preview, your Azure Cosmos DB account must meet all the following criteria:

  • Your Azure Cosmos DB account uses API for NoSQL or MongoDB with version >=3.6.
  • Your Azure Cosmos DB account is using provisioned throughput (manual or autoscale). Merge doesn't apply to serverless accounts.
    • Currently, merge isn't supported for shared throughput databases. You may enroll an account that has both shared throughput databases and containers with dedicated throughput (manual or autoscale), but only the containers with dedicated throughput can be merged.
  • Your Azure Cosmos DB account is a single-write region account (merge isn't currently supported for multi-region write accounts).
  • Your Azure Cosmos DB account doesn't use any of the following features:
  • Your Azure Cosmos DB account uses bounded staleness, session, consistent prefix, or eventual consistency (merge isn't currently supported for strong consistency).
  • If you're using API for NoSQL, your application must use the Azure Cosmos DB .NET v3 SDK (version 3.27.0 or higher) or Java v4 SDK (version 4.42.0 or higher). When the merge preview is enabled on your account, the account doesn't accept requests sent from SDKs other than .NET or Java, or from older .NET or Java SDK versions.
    • There are no SDK or driver requirements to use the feature with API for MongoDB.
  • Your Azure Cosmos DB account doesn't use any currently unsupported connectors:
    • Azure Data Factory
    • Azure Stream Analytics
    • Logic Apps
    • Azure Functions
    • Azure Search
    • Azure Cosmos DB Spark connector (versions earlier than 4.18.0)
    • Any third party library or tool that has a dependency on an Azure Cosmos DB SDK that isn't .NET v3 SDK >= 3.27.0 or Java v4 SDK >= 4.42.0
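
Some of the criteria above can be expressed as a simple pre-flight checklist. The Python sketch below is a hypothetical illustration: the AccountProfile class, its field names, and the consistency-level spellings are assumptions made for this example, and you fill in the values yourself. It covers only the API, throughput mode, write-region, and consistency criteria. It doesn't check features, SDKs, or connectors, so the built-in eligibility checker in the Azure portal remains the authoritative test.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Consistency levels the criteria list as supported (strong is not).
# The string spellings are illustrative labels chosen for this sketch.
SUPPORTED_CONSISTENCY = {"BoundedStaleness", "Session", "ConsistentPrefix", "Eventual"}


@dataclass
class AccountProfile:
    """Hypothetical summary of the account settings the criteria refer to."""
    api: str                                         # "NoSQL" or "MongoDB"
    serverless: bool
    multi_region_writes: bool
    consistency: str
    mongo_version: Optional[Tuple[int, int]] = None  # e.g. (4, 2); MongoDB only


def eligibility_issues(account: AccountProfile) -> List[str]:
    """Return the criteria the account fails; an empty list means none found."""
    issues = []
    if account.api == "MongoDB":
        if account.mongo_version is None or account.mongo_version < (3, 6):
            issues.append("API for MongoDB version must be 3.6 or higher")
    elif account.api != "NoSQL":
        issues.append("API must be NoSQL or MongoDB")
    if account.serverless:
        issues.append("serverless accounts are not supported")
    if account.multi_region_writes:
        issues.append("multi-region write accounts are not supported")
    if account.consistency not in SUPPORTED_CONSISTENCY:
        issues.append("consistency level is not supported")
    return issues


ok = AccountProfile(api="NoSQL", serverless=False,
                    multi_region_writes=False, consistency="Session")
blocked = AccountProfile(api="MongoDB", serverless=False, multi_region_writes=True,
                         consistency="Strong", mongo_version=(3, 2))
print(eligibility_issues(ok))            # []
print(len(eligibility_issues(blocked)))  # 3
```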

Account resources and configuration

  • Merge is only available for API for NoSQL and MongoDB accounts. For API for MongoDB accounts, the MongoDB account version must be 3.6 or greater.
  • Merge is only available for single-region write accounts. Multi-region write account support isn't available.
  • Accounts using merge functionality can't also use these features (if these features are added to a merge enabled account, the account can't merge resources):
  • Containers using merge functionality must have their throughput provisioned at the container level. Database-shared throughput support isn't available.
  • Merge is only available for accounts using bounded staleness, session, consistent prefix, or eventual consistency. It isn't currently supported for strong consistency.
  • After a container has been merged, it isn't possible to read the change feed with start time. Support for this feature is planned for the future.

SDK requirements (API for NoSQL only)

Accounts with the merge feature enabled are supported only when you use a supported version of the .NET v3 SDK or Java v4 SDK. When the feature is enabled on your account (regardless of whether you run a merge), you must use only a supported SDK with the account. Requests sent from other SDKs or earlier versions aren't accepted. As long as you're using a supported SDK, your application can continue to run while a merge is ongoing.

Find the latest version of the supported SDK:

SDK          Supported versions   Package manager link
.NET SDK v3  >= 3.27.0            https://www.nuget.org/packages/Microsoft.Azure.Cosmos
Java SDK v4  >= 4.42.0            https://mvnrepository.com/artifact/com.azure/azure-cosmos

Support for other SDKs is planned for the future.
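
When auditing applications before you enable the feature, compare versions numerically rather than as strings: as text, "4.9.0" sorts after "4.42.0" even though it's an older release. The following Python sketch shows one way to do that. The helper names and the "dotnet" and "java" keys are hypothetical, and ignoring any prerelease suffix is a simplifying assumption.

```python
from typing import Tuple

# Minimum versions from the table above.
MIN_SDK_VERSIONS = {"dotnet": (3, 27, 0), "java": (4, 42, 0)}


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse 'major.minor.patch' into a tuple of integers."""
    core = text.split("-", 1)[0]             # drop any "-preview" style suffix
    parts = [int(p) for p in core.split(".")]
    parts += [0] * (3 - len(parts))          # pad "3.27" to 3.27.0
    return tuple(parts[:3])


def is_supported_sdk(sdk: str, version: str) -> bool:
    """True if the SDK is .NET or Java and meets the minimum version."""
    minimum = MIN_SDK_VERSIONS.get(sdk)
    if minimum is None:
        return False                         # other SDKs aren't accepted
    return parse_version(version) >= minimum


print(is_supported_sdk("dotnet", "3.27.0"))  # True
print(is_supported_sdk("java", "4.9.0"))     # False
print(is_supported_sdk("python", "4.5.0"))   # False
```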

Tip

You should ensure that your application has been updated to use a compatible SDK version prior to enrolling in the preview. If you're using a legacy SDK, follow the appropriate migration guide:

Unsupported connectors

If you enroll in the preview, requests from the following connectors fail.

  • Azure Data Factory ¹
  • Azure Stream Analytics ¹
  • Logic Apps ¹
  • Azure Functions ¹
  • Azure Search ¹
  • Azure Cosmos DB Spark connector versions earlier than 4.18.0 ¹
  • Any third party library or tool that has a dependency on an Azure Cosmos DB SDK that isn't .NET v3 SDK >= 3.27.0 or Java v4 SDK >= 4.42.0

¹ Support for these connectors is planned for the future.
