What is Azure Synapse Link for Azure Cosmos DB (Preview)?

Important

Azure Synapse Link for Azure Cosmos DB is currently in preview. This preview version is provided without a service level agreement, and it's not recommended for production workloads. For more information, see Supplemental terms of use for Azure previews.

Azure Synapse Link for Azure Cosmos DB is a cloud-native hybrid transactional and analytical processing (HTAP) capability that enables you to run near real-time analytics over operational data in Azure Cosmos DB. Azure Synapse Link creates a tight, seamless integration between Azure Cosmos DB and Azure Synapse Analytics.

Using Azure Cosmos DB analytical store, a fully isolated column store, Azure Synapse Link enables no-ETL (Extract-Transform-Load) analytics in Azure Synapse Analytics against your operational data at scale. Business analysts, data engineers, and data scientists can now use Synapse Spark or Synapse SQL interchangeably to run near real-time business intelligence, analytics, and machine learning pipelines. You can achieve this without impacting the performance of your transactional workloads on Azure Cosmos DB.

The following image shows the Azure Synapse Link integration with Azure Cosmos DB and Azure Synapse Analytics:

Architecture diagram for Azure Synapse Analytics integration with Azure Cosmos DB

Benefits

To analyze large operational datasets while minimizing the impact on the performance of mission-critical transactional workloads, the operational data in Azure Cosmos DB is traditionally extracted and processed by Extract-Transform-Load (ETL) pipelines. ETL pipelines require many layers of data movement, which adds operational complexity and affects the performance of your transactional workloads. They also increase the latency between the time operational data originates and the time it can be analyzed.

Compared to traditional ETL-based solutions, Azure Synapse Link for Azure Cosmos DB offers several advantages, such as:

Reduced complexity with no ETL jobs to manage

Azure Synapse Link allows you to directly access Azure Cosmos DB analytical store using Azure Synapse Analytics without complex data movement. Any updates made to the operational data are visible in the analytical store in near real-time with no ETL or change feed jobs. You can run large-scale analytics against the analytical store from Synapse Analytics without additional data transformation.

Near real-time insights into your operational data

Using Azure Synapse Link, you can now get rich insights on your operational data in near real-time. ETL-based systems tend to have higher latency for analyzing your operational data because of the many layers needed to extract, transform, and load it. With native integration of Azure Cosmos DB analytical store with Azure Synapse Analytics, you can analyze operational data in near real-time, which enables new business scenarios.

No impact on operational workloads

With Azure Synapse Link, you can run analytical queries against an Azure Cosmos DB analytical store (a separate column store) while the transactional operations are processed using provisioned throughput for the transactional workload (a row-based transactional store). The analytical workload is served independently of the transactional workload traffic, without consuming any of the throughput provisioned for your operational data.

Optimized for large-scale analytics workloads

Azure Cosmos DB analytical store is optimized to provide scalability, elasticity, and performance for analytical workloads without any dependency on the compute runtimes. The storage technology is self-managed to optimize your analytics workloads. With built-in support in Azure Synapse Analytics, accessing this storage layer is simple and performs well.

Cost effective

With Azure Synapse Link, you can get a cost-optimized, fully managed solution for operational analytics. It eliminates the extra layers of storage and compute required in traditional ETL pipelines for analyzing operational data.

Azure Cosmos DB analytical store follows a consumption-based pricing model, which is based on data storage, analytical read/write operations, and queries executed. It doesn't require you to provision any throughput, as you do today for the transactional workloads. Accessing your data with highly elastic compute engines from Azure Synapse Analytics makes the overall cost of running storage and compute very efficient.
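
The consumption-based model described above can be sketched as simple arithmetic. This is only an illustration, not the official billing formula: the function name, the per-10,000-operations billing unit, and every price in the usage lines are hypothetical placeholders, and Synapse runtime (query compute) charges are deliberately left out. Use the published pricing pages for real rates.

```python
def estimate_analytical_store_cost(
    storage_gb,
    write_ops,
    read_ops,
    *,
    price_per_gb,
    price_per_10k_writes,
    price_per_10k_reads,
):
    """Estimate a monthly analytical store bill from consumption figures.

    All prices are caller-supplied; the per-10,000-operations unit is an
    assumption made for this sketch.
    """
    storage_cost = storage_gb * price_per_gb
    write_cost = write_ops / 10_000 * price_per_10k_writes
    read_cost = read_ops / 10_000 * price_per_10k_reads
    return {
        "storage": storage_cost,
        "writes": write_cost,
        "reads": read_cost,
        "total": storage_cost + write_cost + read_cost,
    }


# Usage with made-up numbers (not real Azure prices).
cost = estimate_analytical_store_cost(
    storage_gb=100,
    write_ops=20_000,
    read_ops=40_000,
    price_per_gb=0.5,
    price_per_10k_writes=0.25,
    price_per_10k_reads=0.125,
)
print(cost["total"])
```

Note that no throughput figure appears among the inputs: unlike the transactional store, nothing is provisioned up front, so the estimate depends only on what was stored and how many operations ran.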

Analytics for locally available, multi-region distributed, multi-master data

You can run analytical queries effectively against the nearest regional copy of the data in Azure Cosmos DB. Azure Cosmos DB can run multi-region distributed analytical workloads alongside transactional workloads in an active-active manner.

Enable HTAP scenarios for your operational data

Synapse Link brings together Azure Cosmos DB analytical store and Azure Synapse Analytics runtime support. This integration enables you to build cloud-native HTAP (hybrid transactional/analytical processing) solutions that generate insights based on real-time updates to your operational data over large datasets. It unlocks new business scenarios: you can raise alerts based on live trends, build near real-time dashboards, and create business experiences based on user behavior.

Azure Cosmos DB analytical store

Azure Cosmos DB analytical store is a column-oriented representation of your operational data in Azure Cosmos DB. This analytical store is suitable for fast, cost-effective queries on large operational datasets, without copying data or impacting the performance of your transactional workloads.
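
To make "column-oriented representation" concrete, here is a minimal conceptual sketch in Python. It is not the actual storage format or sync mechanism of the analytical store; the `to_columns` helper and the sample `orders` documents are hypothetical and exist only to show why an aggregate over one field is cheaper when values are grouped by column.

```python
def to_columns(documents):
    """Pivot row-oriented documents into a dict of per-field value lists."""
    fields = []
    for doc in documents:
        for key in doc:
            if key not in fields:
                fields.append(key)
    # A document that lacks a field contributes None to that column.
    return {field: [doc.get(field) for doc in documents] for field in fields}


# Row-oriented view: one complete document per item, as in a transactional store.
orders = [
    {"id": "1", "customer": "alice", "total": 30.0},
    {"id": "2", "customer": "bob", "total": 12.5},
    {"id": "3", "customer": "alice", "total": 7.5},
]

# Column-oriented view: all values of each field stored together.
columns = to_columns(orders)

# An aggregate reads only the single column it needs.
total_revenue = sum(columns["total"])
print(total_revenue)
```

In the row layout, summing `total` means visiting every document in full; in the column layout the same query scans one contiguous list, which is the general reason column stores suit analytical scans.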

Analytical store automatically picks up high-frequency inserts, updates, and deletes in your transactional workloads in near real time, as a fully managed capability ("auto-sync") of Azure Cosmos DB. No change feed or ETL is required.

If you have a multi-region distributed Azure Cosmos DB account, after you enable analytical store for a container, it will be available in all regions for that account. For more information on the analytical store, see the Azure Cosmos DB analytical store overview article.

Integration with Azure Synapse Analytics

With Synapse Link, you can now directly connect to your Azure Cosmos DB containers from Azure Synapse Analytics and access the analytical store with no separate connectors.

You can query the data from Azure Cosmos DB analytical store simultaneously, with interop across the different analytics runtimes supported by Azure Synapse Analytics. No additional data transformations are required to analyze the operational data. You can query and analyze the analytical store data using:

  • Synapse Apache Spark with full support for Scala, Python, SparkSQL, and C#. Synapse Spark is central to data engineering and data science scenarios.

  • SQL serverless with the T-SQL language and support for familiar BI tools (for example, Power BI Premium).
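
As a rough illustration of the Synapse Spark path, the sketch below reads a container's analytical store into a DataFrame from a Synapse notebook. It assumes the `cosmos.olap` format and the `spark.synapse.linkedService` / `spark.cosmos.container` options exposed by Synapse Spark pools; because the feature is in preview, verify these names against the current documentation. The linked service name `CosmosDbLinkedService`, the container name `orders`, and both helper functions are hypothetical.

```python
def analytical_store_options(linked_service, container):
    """Build the reader options that point Spark at an analytical store."""
    return {
        "spark.synapse.linkedService": linked_service,
        "spark.cosmos.container": container,
    }


def read_analytical_store(spark, linked_service, container):
    """Load a container's analytical store as a Spark DataFrame.

    `spark` is the SparkSession that a Synapse notebook provides. The read
    targets the analytical store, so it is not expected to consume the
    container's provisioned transactional throughput.
    """
    return (
        spark.read.format("cosmos.olap")
        .options(**analytical_store_options(linked_service, container))
        .load()
    )


# Usage inside a Synapse notebook (names are placeholders):
# df = read_analytical_store(spark, "CosmosDbLinkedService", "orders")
# df.groupBy("customer").count().show()
```

Keeping the option dictionary in its own function makes the configuration easy to inspect or unit test without a running Spark pool.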

Note

From Azure Synapse Analytics, you can access both analytical and transactional stores in your Azure Cosmos DB container. However, if you want to run large-scale analytics or scans on your operational data, we recommend that you use analytical store to avoid performance impact on transactional workloads.

Note

You can run analytics with low latency in an Azure region by connecting your Azure Cosmos DB container to the Synapse runtime in that region.

This integration enables the following HTAP scenarios for different users:

  • A BI engineer who wants to model and publish a report to access the operational data in Azure Cosmos DB directly through Synapse SQL.

  • A data analyst who wants to derive insights from the operational data in an Azure Cosmos DB container by querying it with Synapse SQL, reading the data at scale, and combining those findings with other data sources.

  • A data scientist who wants to use Synapse Spark to find a feature to improve their model and train that model without doing complex data engineering. They can also write the results of the model after inference into Azure Cosmos DB for real-time scoring on the data through Synapse Spark.

  • A data engineer who wants to make data accessible for consumers by creating SQL or Spark tables over Azure Cosmos DB containers without manual ETL processes.

Synapse Link is recommended in the following cases:

  • If you are an Azure Cosmos DB customer and you want to run analytics, BI, and machine learning over your operational data, Synapse Link provides a more integrated analytics experience without impacting your transactional store's provisioned throughput. For example:

    • If you are running analytics or BI on your Azure Cosmos DB operational data directly using separate connectors today, or

    • If you are running ETL processes to extract operational data into a separate analytics system.

Synapse Link is not recommended if you have traditional data warehouse requirements such as high concurrency, workload management, and persistence of aggregates across multiple data sources. For more information, see common scenarios that can be powered with Azure Synapse Link for Azure Cosmos DB.

Limitations

  • During the public preview, Azure Synapse Link is supported only for the Azure Cosmos DB SQL (Core) API. Support for the Azure Cosmos DB API for MongoDB and the Cassandra API is currently in gated preview. To request access to the gated preview, email the Azure Cosmos DB team.

  • Currently, the analytical store can only be enabled for new containers (in both new and existing Azure Cosmos DB accounts).

  • In preview, backup and restore of containers is not supported for database accounts that have Synapse Link enabled. If you have production workloads that require backup and restore functionality, we recommend that you don't enable Synapse Link on those database accounts.

  • Accessing the Azure Cosmos DB analytical store with Synapse SQL serverless is currently in gated preview. To request access, email the Azure Cosmos DB team.

  • Accessing the Azure Cosmos DB analytical store with Synapse SQL provisioned is currently not available.

Pricing

The billing model of Azure Synapse Link includes the costs incurred by using the Azure Cosmos DB analytical store and the Synapse runtime. To learn more, see the Azure Cosmos DB analytical store pricing and Azure Synapse Analytics pricing articles.

Next steps

To learn more, see the following docs: