What is Azure Synapse Link for Azure Cosmos DB?

APPLIES TO: SQL API, Azure Cosmos DB API for MongoDB

Azure Synapse Link for Azure Cosmos DB is a cloud-native hybrid transactional and analytical processing (HTAP) capability that enables you to run near real-time analytics over operational data in Azure Cosmos DB. Azure Synapse Link creates a tight, seamless integration between Azure Cosmos DB and Azure Synapse Analytics.

Using Azure Cosmos DB analytical store, a fully isolated column store, Azure Synapse Link enables analytics in Azure Synapse Analytics against your operational data at scale, with no Extract-Transform-Load (ETL) pipelines. Business analysts, data engineers, and data scientists can use Synapse Spark or Synapse SQL interchangeably to run near real-time business intelligence, analytics, and machine learning pipelines. You can do this without impacting the performance of your transactional workloads on Azure Cosmos DB.

The following image shows the Azure Synapse Link integration with Azure Cosmos DB and Azure Synapse Analytics:

Architecture diagram for Azure Synapse Analytics integration with Azure Cosmos DB

Benefits

To analyze large operational datasets while minimizing the impact on the performance of mission-critical transactional workloads, the operational data in Azure Cosmos DB is traditionally extracted and processed by Extract-Transform-Load (ETL) pipelines. ETL pipelines require many layers of data movement, which adds operational complexity and affects the performance of your transactional workloads. They also increase the latency between the time operational data originates and the time it can be analyzed.

Compared to traditional ETL-based solutions, Azure Synapse Link for Azure Cosmos DB offers several advantages, such as:

Reduced complexity with no ETL jobs to manage

Azure Synapse Link allows you to directly access Azure Cosmos DB analytical store using Azure Synapse Analytics without complex data movement. Any updates made to the operational data are visible in the analytical store in near real-time with no ETL or change feed jobs. You can run large-scale analytics against analytical store, from Azure Synapse Analytics, without additional data transformation.

Near real-time insights into your operational data

Using Azure Synapse Link, you can get rich insights on your operational data in near real-time. ETL-based systems tend to have higher latency for analyzing your operational data, because of the many layers needed to extract, transform, and load it. With native integration of Azure Cosmos DB analytical store with Azure Synapse Analytics, you can analyze operational data in near real-time, which enables new business scenarios.

No impact on operational workloads

With Azure Synapse Link, you can run analytical queries against an Azure Cosmos DB analytical store (a separate column store) while the transactional operations are processed using provisioned throughput for the transactional workload (a row-based transactional store). The analytical workload is served independently of the transactional workload traffic and consumes none of the throughput provisioned for your operational data.

Optimized for large-scale analytics workloads

Azure Cosmos DB analytical store is optimized to provide scalability, elasticity, and performance for analytical workloads without any dependency on the compute runtimes. The storage technology is self-managed to optimize your analytics workloads. With built-in support in Azure Synapse Analytics, accessing this storage layer provides simplicity and high performance.

Cost effective

With Azure Synapse Link, you can get a cost-optimized, fully managed solution for operational analytics. It eliminates the extra layers of storage and compute required in traditional ETL pipelines for analyzing operational data.

Azure Cosmos DB analytical store follows a consumption-based pricing model, which is based on data storage, analytical read/write operations, and queries executed. It doesn't require you to provision any throughput, as you do today for the transactional workloads. Accessing your data with highly elastic compute engines from Azure Synapse Analytics makes the overall cost of running storage and compute very efficient.
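The consumption-based model above can be sketched as simple arithmetic. The following is a minimal illustration, not an official calculator: the class and function names are hypothetical, the per-10,000-operation billing granularity is an assumption made for the example, and the rates passed in are placeholders that you would replace with figures from the Azure Cosmos DB analytical store pricing page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalyticalStoreRates:
    """Hypothetical unit prices; these are NOT real Azure prices."""
    storage_per_gb_month: float  # price for one GB stored for one month
    per_10k_write_ops: float     # price per 10,000 analytical write operations
    per_10k_read_ops: float      # price per 10,000 analytical read operations


def estimate_monthly_cost(rates, stored_gb, write_ops, read_ops):
    """Sum the storage, write, and read components for one month.

    No provisioned-throughput term appears here, because the analytical
    store is billed on consumption only.
    """
    storage = stored_gb * rates.storage_per_gb_month
    writes = (write_ops / 10_000) * rates.per_10k_write_ops
    reads = (read_ops / 10_000) * rates.per_10k_read_ops
    return storage + writes + reads


# Placeholder rates, chosen only to make the arithmetic easy to follow.
example_rates = AnalyticalStoreRates(
    storage_per_gb_month=0.02,
    per_10k_write_ops=0.05,
    per_10k_read_ops=0.005,
)
example_cost = estimate_monthly_cost(
    example_rates, stored_gb=100, write_ops=1_000_000, read_ops=2_000_000
)
```

The cost of the Synapse runtime that executes the queries is billed separately and is not included in this sketch.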

Analytics for locally available, multi-region distributed data with multi-region writes

You can run analytical queries effectively against the nearest regional copy of the data in Azure Cosmos DB. Azure Cosmos DB can run analytical workloads distributed across multiple regions alongside transactional workloads in an active-active manner.

Enable HTAP scenarios for your operational data

Synapse Link brings together Azure Cosmos DB analytical store with Azure Synapse Analytics runtime support. This integration enables you to build cloud-native HTAP (hybrid transactional/analytical processing) solutions that generate insights based on real-time updates to your operational data over large datasets. It unlocks new business scenarios: raising alerts based on live trends, building near real-time dashboards, and creating business experiences based on user behavior.

Azure Cosmos DB analytical store

Azure Cosmos DB analytical store is a column-oriented representation of your operational data in Azure Cosmos DB. This analytical store is suitable for fast, cost-effective queries on large operational datasets, without copying data and without impacting the performance of your transactional workloads.

Analytical store automatically picks up high-frequency inserts, updates, and deletes in your transactional workloads in near real-time, as a fully managed capability ("auto-sync") of Azure Cosmos DB. No change feed or ETL is required.

If you have an Azure Cosmos DB account distributed across multiple regions, after you enable analytical store for a container, it is available in all regions for that account. For more information on the analytical store, see the Azure Cosmos DB analytical store overview article.

Integration with Azure Synapse Analytics

With Synapse Link, you can directly connect to your Azure Cosmos DB containers from Azure Synapse Analytics and access the analytical store with no separate connectors.

You can query the data from Azure Cosmos DB analytical store simultaneously, with interop across the different analytics runtimes supported by Azure Synapse Analytics. No additional data transformations are required to analyze the operational data. You can query and analyze the analytical store data using:

  • Synapse Apache Spark with full support for Scala, Python, SparkSQL, and C#. Synapse Spark is central to data engineering and data science scenarios.

  • Serverless SQL pool with T-SQL language and support for familiar BI tools (for example, Power BI Premium).
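As an illustration of the Synapse Spark option, the sketch below loads a container's analytical store into a DataFrame. It assumes the `cosmos.olap` format and the `spark.synapse.linkedService` and `spark.cosmos.container` option names used by the Synapse Spark runtime, and it is meant to run in a Synapse notebook where a `spark` session already exists. The helper names and the linked service and container names are hypothetical.

```python
def analytical_store_read_options(linked_service, container):
    """Build reader options that point Spark at a container's analytical store.

    Option names are assumed from the Synapse Spark runtime; verify them
    against the Synapse documentation for your runtime version.
    """
    return {
        "spark.synapse.linkedService": linked_service,
        "spark.cosmos.container": container,
    }


def load_analytical_store(spark, linked_service, container):
    """Return a DataFrame over the analytical store (not the transactional store)."""
    reader = spark.read.format("cosmos.olap")
    for key, value in analytical_store_read_options(linked_service, container).items():
        reader = reader.option(key, value)
    return reader.load()


# Hypothetical names, shown only to illustrate the shape of the options.
example_options = analytical_store_read_options("CosmosDbLinkedService", "Orders")

# In a Synapse notebook, where `spark` is predefined:
# orders_df = load_analytical_store(spark, "CosmosDbLinkedService", "Orders")
```

Because the read targets the analytical store, a scan like this does not draw on the throughput provisioned for the transactional workload.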

Note

From Azure Synapse Analytics, you can access both analytical and transactional stores in your Azure Cosmos DB container. However, if you want to run large-scale analytics or scans on your operational data, we recommend that you use analytical store to avoid performance impact on transactional workloads.

Note

You can run analytics with low latency in an Azure region by connecting your Azure Cosmos DB container to the Synapse runtime in that region.

This integration enables the following HTAP scenarios for different users:

  • A BI engineer who wants to model and publish a Power BI report to access the live operational data in Azure Cosmos DB directly through Synapse SQL.

  • A data analyst who wants to derive insights from the operational data in an Azure Cosmos DB container by querying it with Synapse SQL, reading the data at scale, and combining those findings with other data sources.

  • A data scientist who wants to use Synapse Spark to find a feature to improve their model and train that model without doing complex data engineering. They can also write the post-inference results of the model into Azure Cosmos DB for real-time scoring on the data through Synapse Spark.

  • A data engineer who wants to make data accessible for consumers by creating SQL or Spark tables over Azure Cosmos DB containers without manual ETL processes.

Security

Synapse Link enables you to run near real-time analytics over your mission-critical data in Azure Cosmos DB. It is vital to make sure that critical business data is stored securely across both transactional and analytical stores. Azure Synapse Link for Azure Cosmos DB is designed to help meet these security requirements through the following features:

  • Data encryption with customer-managed keys - You can seamlessly encrypt the data across transactional and analytical stores using the same customer-managed keys in an automatic and transparent manner. To learn more, see the Configure customer-managed keys article.

Synapse Link is recommended in the following cases:

  • If you are an Azure Cosmos DB customer and you want to run analytics, BI, and machine learning over your operational data, for example:

    • If you are running analytics or BI on your Azure Cosmos DB operational data directly using separate connectors today, or

    • If you are running ETL processes to extract operational data into a separate analytics system.

In such cases, Synapse Link provides a more integrated analytics experience without impacting your transactional store's provisioned throughput.

Synapse Link is not recommended if you have traditional data warehouse requirements such as high concurrency, workload management, and persistence of aggregates across multiple data sources. For more information, see common scenarios that can be powered with Azure Synapse Link for Azure Cosmos DB.

Limitations

  • Azure Synapse Link for Azure Cosmos DB is supported for SQL API and Azure Cosmos DB API for MongoDB. It is not supported for Gremlin API, Cassandra API, and Table API.

  • Analytical store can only be enabled for new containers. To use analytical store for existing containers, migrate data from your existing containers to new containers using Azure Cosmos DB migration tools. You can enable Synapse Link on new and existing Azure Cosmos DB accounts.

  • For containers with analytical store turned on, automatic backup and restore of the data in the analytical store is not supported at this time. When Synapse Link is enabled on a database account, Azure Cosmos DB continues to automatically back up only the data in the transactional store of its containers at the scheduled backup interval. When a container with analytical store turned on is restored to a new account, the container is restored with only the transactional store, and analytical store is not enabled.

  • Accessing the Azure Cosmos DB analytical store with Synapse SQL provisioned is currently not available.
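The limitations above can be expressed as a small pre-flight check. This is only an illustrative sketch of the rules as listed in this article: the function name and the API and runtime labels (`"sql"`, `"mongodb"`, `"provisioned-sql"`) are hypothetical identifiers chosen for the example, not values defined by any Azure SDK.

```python
# API labels for which this article says Synapse Link is supported.
SUPPORTED_APIS = frozenset({"sql", "mongodb"})

# Runtime labels that, per this article, cannot access the analytical store.
UNAVAILABLE_RUNTIMES = frozenset({"provisioned-sql"})


def synapse_link_blockers(api, container_is_new, runtime):
    """Return a list of reasons the planned setup conflicts with the limitations.

    An empty list means none of the listed limitations applies.
    """
    blockers = []
    if api.strip().lower() not in SUPPORTED_APIS:
        blockers.append(f"API '{api}' is not supported by Synapse Link")
    if not container_is_new:
        blockers.append("analytical store can only be enabled for new containers")
    if runtime.strip().lower() in UNAVAILABLE_RUNTIMES:
        blockers.append(f"runtime '{runtime}' cannot access the analytical store")
    return blockers


# A new SQL API container queried from Spark hits none of the limitations.
ok_case = synapse_link_blockers("SQL", True, "synapse-spark")

# An existing Gremlin container queried from provisioned SQL hits all three.
blocked_case = synapse_link_blockers("Gremlin", False, "provisioned-sql")
```

For an existing container, the second blocker is resolved by migrating the data to a new container that has analytical store enabled, as described in the list above.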

Pricing

The billing model of Azure Synapse Link includes the costs incurred by using the Azure Cosmos DB analytical store and the Synapse runtime. To learn more, see the Azure Cosmos DB analytical store pricing and Azure Synapse Analytics pricing articles.

Next steps

To learn more, see the following docs: