Enable fleet analytics in Azure Cosmos DB (preview)

Important

This feature is currently in preview and is provided without a service-level agreement. At this time, previews aren't recommended for production workloads. Certain features of this preview aren't supported or might have capability constraints. For more information, see the Supplemental Terms of Use for Microsoft Azure Previews.

In this guide, you enable Azure Cosmos DB fleet analytics and send the data to either a Microsoft Fabric workspace or an Azure Data Lake Storage account.

Prerequisites

  • An existing Azure Cosmos DB fleet

    • If you don't have an existing fleet, create a new fleet.

    • Fleet analytics only supports Azure Cosmos DB for NoSQL accounts that are associated with the fleet.

  • An existing Microsoft Fabric workspace

    • The workspace must use OneLake as the default storage location.

    • The workspace must be backed by a licensed or trial Fabric capacity.

  • An existing Azure Storage account compatible with Azure Data Lake Storage (Gen2)

    • The hierarchical namespace feature must be enabled at account creation.
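
If you still need to create a compatible storage account, the following is a minimal sketch using the azure-mgmt-storage Python SDK. The subscription, resource group, account name, and region are placeholders, and is_hns_enabled turns on the hierarchical namespace, which can't be changed after creation.

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.storage import StorageManagementClient

    # Placeholders: fill in your subscription, resource group, account, region
    client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")
    poller = client.storage_accounts.begin_create(
        "<resource-group>",
        "<account-name>",
        {
            "location": "<region>",
            "kind": "StorageV2",
            "sku": {"name": "Standard_LRS"},
            # Hierarchical namespace (ADLS Gen2) can only be set at creation
            "is_hns_enabled": True,
        },
    )
    account = poller.result()
    print(account.name, account.is_hns_enabled)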

Enable fleet analytics

First, configure the resources required for fleet analytics. Use the first procedure to send data to a Microsoft Fabric workspace, or the second procedure to send data to an Azure Storage account.

  1. Sign in to the Azure portal (https://portal.azure.cn).

  2. Navigate to the existing Azure Cosmos DB fleet.

  3. On the page for the fleet, select Fleet analytics in the Monitoring section of the resource menu.

  4. Then, select Add destination.

  5. In the Fleet analytics dialog, select Send to Fabric workspace. Then, select your existing Fabric workspace, select an existing OneLake lakehouse, and then save the destination.

    Screenshot of the fleet analytics dialog for a Fabric workspace in the Azure portal.

  6. Navigate to your Fabric workspace in the Microsoft Fabric portal.

  7. In the Manage section of your workspace, assign the Contributor role to fleet analytics by searching for and selecting the shared Cosmos DB Fleet Analytics service principal.

    Screenshot of the workspace role-based access control configuration in the Fabric portal.

    Important

    Failure to complete this step results in data not being written to your destination Fabric workspace.

  8. Save your changes.

Alternatively, configure an Azure Storage account as the destination:

  1. Sign in to the Azure portal (https://portal.azure.cn).

  2. Navigate to the existing Azure Cosmos DB fleet.

  3. On the page for the fleet, select Fleet analytics in the Monitoring section of the resource menu.

  4. Then, select Add destination.

  5. In the Fleet analytics dialog, select Send to storage account. Then, select your existing Azure Storage account, select an existing container, and then save the destination.

    Screenshot of the fleet analytics dialog for an Azure Storage account in the Azure portal.

  6. Navigate to your Azure Storage account. Then, navigate to the Access Control (IAM) page.

  7. Select the Add role assignment menu option.

  8. On the Add role assignment page, select the Storage Blob Data Contributor role to grant permission to write blobs to the account.

  9. Now, use the + Select members option. In the dialog, search for and select the shared Cosmos DB Fleet Analytics service principal.

    Screenshot of the account role-based access control configuration in the Azure portal.

    Important

    Failure to complete this step results in data not being written to your destination Azure Storage account.

  10. Select Review + assign to create the role assignment.
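
If you prefer to script this role assignment, the following is a minimal sketch using the azure-mgmt-authorization Python SDK. The subscription, resource group, account name, and principal object ID are placeholders; ba92f5b4-2d11-453d-a403-e96b0029c9fe is the built-in role definition ID for Storage Blob Data Contributor.

    import uuid

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.authorization import AuthorizationManagementClient
    from azure.mgmt.authorization.models import RoleAssignmentCreateParameters

    subscription_id = "<subscription-id>"  # placeholder
    client = AuthorizationManagementClient(DefaultAzureCredential(), subscription_id)

    # Scope the assignment to the destination storage account (placeholders)
    scope = (
        f"/subscriptions/{subscription_id}/resourceGroups/<resource-group>"
        "/providers/Microsoft.Storage/storageAccounts/<account-name>"
    )

    # Built-in role definition ID for Storage Blob Data Contributor
    role_definition_id = (
        f"/subscriptions/{subscription_id}/providers/Microsoft.Authorization"
        "/roleDefinitions/ba92f5b4-2d11-453d-a403-e96b0029c9fe"
    )

    client.role_assignments.create(
        scope,
        str(uuid.uuid4()),  # each role assignment needs a new GUID name
        RoleAssignmentCreateParameters(
            role_definition_id=role_definition_id,
            principal_id="<fleet-analytics-principal-object-id>",  # placeholder
            principal_type="ServicePrincipal",
        ),
    )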

Query and visualize data

In a star schema design, retrieving detailed information typically requires joining fact tables with their related dimension tables. This section walks through the steps to query and visualize fleet analytics data by using Microsoft Fabric.

  1. Open your Fabric workspace.

  2. Navigate to your existing OneLake resource.

    Screenshot of an existing OneLake resource within a Microsoft Fabric workspace.

  3. In the SQL endpoint explorer, select any table and run a SELECT TOP 100 query to quickly inspect the data. This query is available from the table's context menu.

    Screenshot of the context menu with the 'SELECT TOP 100' query option in the SQL endpoint explorer within the workspace.

    Tip

    Alternatively, run the following query to view Account-level details:

    SELECT TOP (100) [Timestamp],
        [ResourceId],
        [FleetId],
        [DefaultConsistencyLevel],
        [IsSynapseLinkEnabled],
        [IsFreeTierEnabled],
        [IsBurstEnabled],
        [BackupMode],
        [BackupStrategy],
        [BackupRedundancy],
        [BackupIntervalInMinutes],
        [BackupRetentionIntervalInHours],
        [TotalRUPerSecLimit],
        [APISettings],
        [AccountKeySettings],
        [LastDateAnyAccountKeyRotated]
    FROM [FactAccountHourly]
    
  4. Observe the results of the query. Notice that the rows reference only the ResourceId field; from these results alone, you can't determine the exact database or container for individual rows.

  5. Run this sample query, which joins the DimResource and FactRequestHourly tables, to find your top 100 most active accounts by transactions.

    SELECT TOP 100
        DR.[SubscriptionId],
        DR.[AccountName],
        DR.[ResourceGroup],
        SUM(FRH.[TotalRequestCount]) AS sum_total_requests
    FROM
        [FactRequestHourly] FRH
    JOIN
        [DimResource] DR
        ON FRH.[ResourceId] = DR.[ResourceId]
    WHERE
        FRH.[Timestamp] >= DATEADD(DAY, -7, GETDATE()) -- Filter for the last 7 days
        AND FRH.[ResourceName] IN ('Document', 'StoredProcedure') -- Filter for data plane operations
    GROUP BY
        DR.[AccountName],
        DR.[SubscriptionId],
        DR.[ResourceGroup]
    ORDER BY
        sum_total_requests DESC; -- Order by total requests in descending order
    
  6. Run this query to find the top 100 largest accounts by storage.

    SELECT TOP 100
        DR.[SubscriptionId],
        DR.[AccountName],
        MAX(FRH.[MaxDataStorageInKB] / (1024.0 * 1024.0)) AS DataUsageInGB,
        MAX(FRH.[MaxIndexStorageInKB] / (1024.0 * 1024.0)) AS IndexUsageInGB,
        MAX(
            FRH.[MaxDataStorageInKB] / (1024.0 * 1024.0) + 
            FRH.[MaxIndexStorageInKB] / (1024.0 * 1024.0)
        ) AS StorageInGB
    FROM 
        [FactResourceUsageHourly] FRH
    JOIN 
        [DimResource] DR
        ON FRH.[ResourceId] = DR.[ResourceId]
    WHERE 
        FRH.[Timestamp] >= DATEADD(DAY, -1, GETDATE()) -- Filter for the last 1 day
    GROUP BY 
        DR.[AccountName],
        DR.[SubscriptionId]
    ORDER BY
        StorageInGB DESC; -- Order by total storage usage
    
  7. Now, create a view on the data by opening the context menu and selecting Save as view. Give the view a unique name and then select Ok.

    Screenshot of the 'Save as view' context menu option for a SQL query in the workspace.

    Screenshot of the dialog to specify the name of the new view in the workspace.

    Tip

    Alternatively, create a view directly using this query:

    CREATE VIEW [MostActiveCustomers]
    AS
    SELECT 
        a.ResourceId AS UsageResourceId,
        a.Timestamp,
        a.MeterId,
        a.FleetId,
        a.ConsumedUnits,
        b.ResourceId AS ResourceDetailId
    FROM
        FactMeterUsageHourly a
    INNER JOIN
        DimResource b ON a.ResourceId = b.ResourceId
    
  8. Navigate to the newly created view within the Views folder of your endpoint.

    Screenshot of the 'Views' folder within the hierarchy of a SQL endpoint in the workspace.

  9. Navigate to the view you created (or any query), select Explore this data (preview), and then select Visualize results.

    Screenshot of the menu option to visualize an existing query in the workspace.

  10. On the Power BI landing page, create relevant visuals for your scenario. For example, you can show the percentage of your Azure Cosmos DB workload with the autoscale feature enabled.

    Screenshot of the Power BI visualization dialog for a SQL query or view within the workspace.

This section walks through the steps to create and query a table or DataFrame loaded from data stored in an Azure Storage (Azure Data Lake Storage) account by using Azure Databricks. The steps use a notebook connected to Apache Spark, with Python and SQL cells.
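
Depending on how your notebook environment authenticates to storage, you might first need to give Spark credentials for the account. The following is a minimal sketch using account-key authentication; the account name and key are placeholders, and a service principal with OAuth is the more common production choice.

    # A minimal sketch, assuming account-key authentication; values are
    # placeholders (in Azure Databricks, prefer a secret scope for the key).
    account_name = "<azure-storage-account-name>"
    account_key = "<azure-storage-account-key>"

    spark.conf.set(
        f"fs.azure.account.key.{account_name}.dfs.core.chinacloudapi.cn",
        account_key,
    )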

  1. First, define the Azure Storage account configuration, targeting the container that holds your fleet analytics data.

    # Define storage configuration
    container_name = "<azure-storage-container-name>"
    account_name = "<azure-storage-account-name>"
    base_url = f"abfss://{container_name}@{account_name}.dfs.core.chinacloudapi.cn"
    source_path = f"{base_url}/FactResourceUsageHourly"
    
  2. Create a table over the data. Dropping and recreating the fleet_data table reloads and refreshes the data from the external source (Azure Storage with Azure Data Lake Storage).

    table_name = "fleet_data"
    
    # Drop the table if it exists
    spark.sql(f"DROP TABLE IF EXISTS {table_name}")
    
    # Create the table
    spark.sql(f"""
        CREATE TABLE {table_name}
        USING delta
        LOCATION '{source_path}'
    """)
    
  3. Query and render the results from the fleet_data table.

    # Query and display the table
    df = spark.sql(f"SELECT * FROM {table_name}")
    display(df)
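
    If you prefer the DataFrame API, you can also read the same Delta folder directly without the registered table. A minimal sketch, reusing the source_path defined earlier:

    # Read the Delta folder directly (equivalent to querying fleet_data)
    df = spark.read.format("delta").load(source_path)
    display(df.limit(100))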
    
  4. Define the full list of tables to create for processing fleet analytics data.

    # Table names and folder paths (assumed to match)
    tables = [
        "DimResource",
        "DimMeter",
        "FactResourceUsageHourly",
        "FactAccountHourly",
        "FactRequestHourly",
        "FactMeterUsageHourly"
    ]
    
    # Drop and recreate each table
    for table in tables:
        spark.sql(f"DROP TABLE IF EXISTS {table}")
        spark.sql(f"""
            CREATE TABLE {table}
            USING delta
            LOCATION '{base_url}/{table}'
        """)
    
  5. Run a query using any of those tables. For example, this query finds your top 100 most active accounts by transactions.

    SELECT 
        DR.SubscriptionId,
        DR.AccountName,
        DR.ResourceGroup,
        SUM(FRH.TotalRequestCount) AS sum_total_requests
    FROM 
        FactRequestHourly FRH
    JOIN 
        DimResource DR
        ON FRH.ResourceId = DR.ResourceId
    WHERE 
        FRH.Timestamp >= DATE_SUB(CURRENT_DATE(), 7) -- Filter for the last 7 days
        AND FRH.ResourceName IN ('Document', 'StoredProcedure') -- Filter for Dataplane Operations
    GROUP BY 
        DR.AccountName,
        DR.SubscriptionId,
        DR.ResourceGroup
    ORDER BY
        sum_total_requests DESC
    LIMIT 100; -- Limit to top 100 results
    
  6. Run this query to find the top 100 largest accounts by storage.

    SELECT 
        DR.SubscriptionId,
        DR.AccountName,
        MAX(FRH.MaxDataStorageInKB / (1024.0 * 1024.0)) AS DataUsageInGB,
        MAX(FRH.MaxIndexStorageInKB / (1024.0 * 1024.0)) AS IndexUsageInGB,
        MAX(
            FRH.MaxDataStorageInKB / (1024.0 * 1024.0) + 
            FRH.MaxIndexStorageInKB / (1024.0 * 1024.0)
        ) AS StorageInGB
    FROM 
        FactResourceUsageHourly FRH
    JOIN 
        DimResource DR
        ON FRH.ResourceId = DR.ResourceId
    WHERE 
        FRH.Timestamp >= DATE_SUB(CURRENT_DATE(), 1) -- Filter for the last 1 day
    GROUP BY 
        DR.AccountName,
        DR.SubscriptionId
    ORDER BY
        StorageInGB DESC
    LIMIT 100; -- Limit to top 100 results
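
    These queries run as-is in a SQL cell. To reuse a result from Python, wrap the same SQL in spark.sql. A minimal sketch that persists a trimmed version of the storage query; fleet_top_storage is a hypothetical table name:

    # Run a trimmed version of the storage query from a Python cell and
    # persist the result; "fleet_top_storage" is a hypothetical name.
    top_storage_df = spark.sql("""
        SELECT DR.AccountName,
               MAX(FRH.MaxDataStorageInKB / (1024.0 * 1024.0)) AS DataUsageInGB
        FROM FactResourceUsageHourly FRH
        JOIN DimResource DR ON FRH.ResourceId = DR.ResourceId
        GROUP BY DR.AccountName
    """)
    top_storage_df.write.mode("overwrite").saveAsTable("fleet_top_storage")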