Azure Synapse Link for Azure Cosmos DB: Near real-time analytics use cases

Azure Synapse Link for Azure Cosmos DB is a cloud-native hybrid transactional and analytical processing (HTAP) capability that enables you to run near real-time analytics over operational data. Synapse Link creates a tight, seamless integration between Azure Cosmos DB and Azure Synapse Analytics.

You might be wondering which industry use cases can take advantage of this cloud-native HTAP capability for near real-time analytics over operational data. Here are three common use cases for Azure Synapse Link for Azure Cosmos DB:

  • Supply chain analytics, forecasting & reporting
  • Real-time personalization
  • Predictive maintenance and anomaly detection in IoT scenarios

Note

Azure Synapse Link for Azure Cosmos DB targets scenarios where enterprise teams want to run near real-time analytics. These analytics run without ETL over operational data generated by transactional applications built on Azure Cosmos DB. Synapse Link does not replace a separate data warehouse when you have traditional data warehouse requirements, such as workload management, high concurrency, and persisted aggregates across multiple data sources.

Supply chain analytics, forecasting & reporting

Research studies show that embedding big data analytics in supply chain operations improves order-to-delivery cycle times and supply chain efficiency.

Manufacturers are moving to cloud-native technologies to break out of the constraints of legacy Enterprise Resource Planning (ERP) and Supply Chain Management (SCM) systems. Supply chains generate increasing volumes of operational data every minute (order, shipment, and transaction data). To handle this, manufacturers need two things: an operational database that scales to these data volumes, and an analytical platform that delivers real-time contextual intelligence to stay ahead of the curve.

The following architecture shows the power of using Azure Cosmos DB as the cloud-native operational database, together with Synapse Link, in supply chain analytics:

Figure: Azure Synapse Link for Azure Cosmos DB in supply chain analytics

Based on the previous architecture, you can achieve the following use cases with Synapse Link for Azure Cosmos DB:

  • Prepare & train predictive pipeline: Generate insights over the operational data across the supply chain by using machine learning. This way you can lower inventory and operations costs, and reduce order-to-delivery times for customers.

    Synapse Link lets you analyze the changing operational data in Azure Cosmos DB without any manual ETL processes. This saves you additional cost, latency, and operational complexity. Synapse Link enables data engineers and data scientists to build robust predictive pipelines:

    • Query operational data from the Azure Cosmos DB analytical store by using the native integration with Apache Spark pools in Azure Synapse Analytics. You can query the data in an interactive notebook or in scheduled remote jobs without complex data engineering.

    • Build machine learning (ML) models with Spark ML algorithms and Azure ML integration in Azure Synapse Analytics.

    • Write the results of model inference back into Azure Cosmos DB for operational near real-time scoring.

  • Operational reporting: Supply chain teams need flexible, custom reports over real-time, accurate operational data. These reports provide a snapshot view of supply chain effectiveness, profitability, and productivity. They allow data analysts and other key stakeholders to continually reevaluate the business and identify areas to adjust to reduce operational costs.

    Synapse Link for Azure Cosmos DB enables rich business intelligence (BI) and reporting scenarios:

    • Query operational data from the Azure Cosmos DB analytical store by using the native integration with Synapse SQL serverless and the full expressiveness of the T-SQL language.

    • Model and publish auto-refreshing BI dashboards over Azure Cosmos DB through Synapse SQL serverless support for familiar BI tools, such as Azure Analysis Services and Power BI Premium.
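The predictive-pipeline loop described above has three steps: read operational data, train a model, and write scores back. It can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Synapse implementation:

- The in-memory `ORDERS` list stands in for rows read from the analytical store. In a Synapse Spark notebook, this read would typically go through the `cosmos.olap` Spark data source.
- A moving-average forecast stands in for a Spark ML model.
- The `sink` dictionary stands in for a Cosmos DB container.
- Field names such as `productId`, `week`, `qty`, and `forecastQty` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical order documents standing in for analytical-store rows.
ORDERS = [
    {"productId": "p1", "week": 1, "qty": 10},
    {"productId": "p1", "week": 2, "qty": 12},
    {"productId": "p1", "week": 3, "qty": 14},
    {"productId": "p1", "week": 4, "qty": 6},
    {"productId": "p1", "week": 4, "qty": 10},
    {"productId": "p2", "week": 1, "qty": 5},
    {"productId": "p2", "week": 2, "qty": 5},
]


def weekly_demand(orders):
    """Aggregate ordered quantity per product and week (the 'query' step)."""
    totals = defaultdict(lambda: defaultdict(int))
    for order in orders:
        totals[order["productId"]][order["week"]] += order["qty"]
    return totals


def forecast_next_week(series, window=3):
    """Moving-average forecast over the most recent `window` weeks (the 'model')."""
    recent_weeks = sorted(series)[-window:]
    return mean(series[w] for w in recent_weeks)


def score_and_write_back(orders, sink, window=3):
    """Score every product and upsert the result into `sink` (the 'write back' step)."""
    for product_id, series in weekly_demand(orders).items():
        sink[product_id] = {
            "id": product_id,
            "forecastQty": forecast_next_week(series, window),
        }
    return sink


if __name__ == "__main__":
    print(score_and_write_back(ORDERS, {}))
```

A moving average keeps the example dependency-free. A real pipeline would swap in a Spark ML estimator and write the scored DataFrame back through the Cosmos DB Spark connector.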

The following is some guidance for integrating batch and streaming data into Azure Cosmos DB:

  • Batch data integration & orchestration: As supply chains grow more complex, supply chain data platforms need to integrate with a variety of data sources and formats. Azure Synapse has the same data integration engine and experiences as Azure Data Factory built in. This integration allows data engineers to create rich data pipelines without a separate orchestration engine.

  • Streaming data integration & processing: With the growth of Industrial IoT (sensors tracking assets from 'floor to store', connected logistics fleets, and so on), there is an explosion of real-time data generated in a streaming fashion. This data needs to be integrated with traditional slow-moving data to generate insights. Azure Stream Analytics is a recommended service for streaming ETL and processing on Azure across a wide range of scenarios. Azure Stream Analytics supports Azure Cosmos DB as a native data sink.
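Streaming ETL of this kind often pre-aggregates raw readings into fixed time windows before writing them to the sink. The sketch below models a tumbling-window average in plain Python; it is not Azure Stream Analytics code. The event fields (`deviceId`, `ts` in seconds, `value`) and the dictionary used as a stand-in for a Cosmos DB container are hypothetical.

```python
from collections import defaultdict

# Hypothetical sensor events; ts is a timestamp in seconds.
EVENTS = [
    {"deviceId": "d1", "ts": 0, "value": 10.0},
    {"deviceId": "d1", "ts": 30, "value": 20.0},
    {"deviceId": "d1", "ts": 65, "value": 30.0},
    {"deviceId": "d2", "ts": 10, "value": 5.0},
]


def tumbling_window_averages(events, window_seconds):
    """Group readings into fixed, non-overlapping windows per device and average them."""
    buckets = defaultdict(list)
    for event in events:
        window_start = event["ts"] - event["ts"] % window_seconds
        buckets[(event["deviceId"], window_start)].append(event["value"])

    docs = []
    for (device_id, start), values in sorted(buckets.items()):
        docs.append({
            # Deterministic id: the same device/window always maps to the same document.
            "id": f"{device_id}-{start}",
            "deviceId": device_id,
            "windowStart": start,
            "windowEnd": start + window_seconds,
            "avgValue": sum(values) / len(values),
            "count": len(values),
        })
    return docs


def upsert_all(container, docs):
    """Upsert documents by id into a dict standing in for the sink container."""
    for doc in docs:
        container[doc["id"]] = doc
    return container


if __name__ == "__main__":
    for doc in tumbling_window_averages(EVENTS, window_seconds=60):
        print(doc)
```

Each output document's `id` is derived from the device and the window start. As a result, re-running the same window in this sketch overwrites the earlier document instead of duplicating it.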

Real-time personalization

Retailers today must build secure and scalable e-commerce solutions that meet the demands of both customers and the business. These e-commerce solutions need to engage customers through customized products and offers, process transactions quickly and securely, and focus on fulfillment and customer service. Azure Cosmos DB, together with Synapse Link for Azure Cosmos DB, allows retailers to generate personalized recommendations for customers in real time. Retailers use low latency and tunable consistency settings for immediate insights, as shown in the following architecture:

Figure: Azure Synapse Link for Azure Cosmos DB in real-time personalization

Synapse Link for Azure Cosmos DB use case:

  • Prepare & train predictive pipeline: You can generate insights over the operational data across your business units or customer segments by using Synapse Spark and machine learning models. This translates into personalized delivery to target customer segments, predictive end-user experiences, and targeted marketing that fits your end-user requirements.
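As a deliberately simple stand-in for the recommendation models mentioned above, the following sketch uses item-to-item co-occurrence. It scores products by how often they were bought together with items in a customer's history. This heuristic is swapped in for illustration and is not the model behind the architecture. The basket data and product names are hypothetical; in the architecture, the baskets would come from the analytical store.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical order history: each set is one basket of product ids.
BASKETS = [
    {"shoes", "socks"},
    {"shoes", "socks", "laces"},
    {"shoes", "hat"},
    {"hat", "scarf"},
]


def co_purchase_counts(baskets):
    """Count how often each pair of products appears in the same basket."""
    counts = defaultdict(Counter)
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(history, counts, top_n=3):
    """Rank products not yet in `history` by co-purchase score with items in it."""
    scores = Counter()
    for item in history:
        for other, count in counts.get(item, {}).items():
            if other not in history:
                scores[other] += count
    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:top_n]]


if __name__ == "__main__":
    counts = co_purchase_counts(BASKETS)
    print(recommend({"shoes"}, counts))
```

In a real deployment, the co-occurrence table (or a trained model) would be refreshed from the analytical store, and the top-N lists written back to Azure Cosmos DB for low-latency lookup.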

IoT predictive maintenance

Industrial IoT innovations have drastically reduced machinery downtime and increased overall efficiency across all fields of industry. One such innovation is predictive maintenance analytics for machinery at the edge of the cloud.

The following architecture uses the cloud-native HTAP capabilities of Azure Synapse Link for Azure Cosmos DB for IoT predictive maintenance:

Figure: Azure Synapse Link for Azure Cosmos DB in IoT predictive maintenance

Synapse Link for Azure Cosmos DB use cases:

  • Prepare & train predictive pipeline: The historical operational data from IoT device sensors can be used to train predictive models such as anomaly detectors. These anomaly detectors are then deployed back to the edge for real-time monitoring. This virtuous loop allows continuous retraining of the predictive models.

  • Operational reporting: With the growth of digital twin initiatives, companies are collecting vast amounts of operational data from a large number of sensors to build a digital copy of each machine. This data serves BI needs, such as understanding trends over historical data. It also powers real-time applications over recent hot data.
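The train-centrally, score-at-the-edge loop in the first bullet can be illustrated with a minimal z-score anomaly detector. It fits a mean and standard deviation on historical sensor readings, then flags new readings that fall too many standard deviations away. The z-score method and the default threshold of 3 are assumptions chosen for this sketch; the article does not specify a detection algorithm. `ZScoreDetector` is a hypothetical name.

```python
from statistics import mean, pstdev


class ZScoreDetector:
    """Flags readings whose distance from the training mean exceeds `threshold` std devs."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold
        self.mu = None
        self.sigma = None

    def fit(self, history):
        """Train on historical readings (e.g. pulled from the analytical store)."""
        self.mu = mean(history)
        self.sigma = pstdev(history)
        return self

    def is_anomaly(self, reading):
        """Score a single new reading (the part that would run at the edge)."""
        if self.mu is None:
            raise RuntimeError("call fit() before scoring")
        if self.sigma == 0:
            # Constant history: any deviation at all is unusual.
            return reading != self.mu
        return abs(reading - self.mu) / self.sigma > self.threshold


if __name__ == "__main__":
    detector = ZScoreDetector().fit([10, 11, 9, 10, 10, 11, 9, 10])
    for reading in [10.5, 13, 8]:
        print(reading, detector.is_anomaly(reading))
```

Retraining in this sketch means calling `fit` again on newer history. After that, the updated parameters (`mu`, `sigma`) are what would be redeployed to the edge.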

Sample scenario: HTAP for Azure Cosmos DB

For nearly a decade, thousands of customers have used Azure Cosmos DB for mission-critical applications. These applications require elastic scale, turnkey multi-region distribution, and multi-master replication for low latency and high availability of both reads and writes in their transactional workloads.

The following list gives an overview of the workload patterns that Azure Cosmos DB supports over operational data:

  • Real-time apps & services
  • Event stream processing
  • BI dashboards
  • Big data analytics
  • Machine learning

Azure Synapse Link enables Azure Cosmos DB not only to power transactional workloads but also to run near real-time analytical workloads over historical operational data. It does so without any ETL requirements and with guaranteed performance isolation from the transactional workloads.

The following image shows the workload patterns using Azure Cosmos DB:

Figure: Azure Synapse Link for Azure Cosmos DB workload patterns

Consider CompanyXYZ, an e-commerce company with multi-region operations. Its case illustrates the benefits of choosing Azure Cosmos DB as the single real-time database powering both the transactional and analytical requirements of an inventory management platform.

  • CompanyXYZ's core business depends on the inventory management system, so availability and reliability are core pillar requirements. Benefits of using Azure Cosmos DB:

    • Through deep integration with Azure infrastructure and transparent multi-master, multi-region replication, Azure Cosmos DB provides industry-leading 99.999% high availability against regional outages.
  • CompanyXYZ's supply chain partners may be in separate geographic locations, but they may need a single view of the product inventory across China to support their local operations. They need to read updates made by other supply chain partners in real time. They also need to make updates at high throughput without worrying about conflicts with other partners. Benefits of using Azure Cosmos DB:

    • With its unique multi-master replication protocol and latch-free, write-optimized transactional store, Azure Cosmos DB guarantees latencies under 10 ms for both indexed reads and writes at the 99th percentile, across all regions.

    • High-throughput ingestion of both batch and streaming data feeds, with real-time indexing in the transactional store.

    • Between the two extremes of strong and eventual consistency, the Azure Cosmos DB transactional store offers three more consistency levels (bounded staleness, session, and consistent prefix). These let you choose the availability versus performance trade-off closest to the business need.

  • CompanyXYZ's supply chain partners have highly fluctuating traffic patterns, ranging from hundreds to millions of requests per second, so the inventory management platform needs to handle unexpected bursts in traffic. Benefits of using Azure Cosmos DB:

    • Azure Cosmos DB's transactional store supports elastic scalability of storage and throughput through horizontal partitioning. Containers and databases configured in Autopilot mode (since renamed autoscale) automatically and instantly scale the provisioned throughput based on application needs. This happens without affecting the availability, latency, throughput, or performance of the workload in any region.
  • CompanyXYZ needs a secure analytics platform to house system-wide historical inventory data and to enable analytics and insights across supply chain partners, business units, and functions. The analytics platform needs to enable collaboration across the system and support three kinds of work over the operational inventory data: traditional BI/reporting use cases, advanced analytics use cases, and predictive intelligent solutions. Benefits of using Synapse Link for Azure Cosmos DB:

    • Synapse Link uses the Azure Cosmos DB analytical store, a fully isolated column store. This enables analytics in Azure Synapse Analytics without Extract-Transform-Load (ETL) over multi-region distributed operational data at scale. Business analysts, data engineers, and data scientists can use Synapse Spark or Synapse SQL interoperably to run near real-time business intelligence, analytics, and machine learning pipelines. They can do so without affecting the performance of their transactional workloads on Azure Cosmos DB. For more details, see the benefits of Synapse Link in Azure Cosmos DB.
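Some simple arithmetic makes the availability, latency, and throughput figures in the CompanyXYZ example concrete. The helpers below are illustrative and do three things:

- compute the yearly downtime budget implied by an availability percentage;
- check a nearest-rank 99th-percentile latency against the 10 ms figure;
- compute the throughput range for autoscale.

The autoscale helper assumes the documented rule that autoscale varies throughput between 10% of the configured maximum RU/s and that maximum. The latency samples are made up.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_budget_minutes(availability_pct: float) -> float:
    """Maximum yearly downtime permitted by an availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


def nearest_rank_percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def meets_latency_target(samples: list[float], target_ms: float = 10.0, p: float = 99) -> bool:
    """True if the p-th percentile latency is below the target."""
    return nearest_rank_percentile(samples, p) < target_ms


def autoscale_range(max_rus: int) -> tuple[int, int]:
    """Assumed autoscale bounds: 10% of the configured maximum up to the maximum."""
    return max_rus // 10, max_rus


if __name__ == "__main__":
    print(round(downtime_budget_minutes(99.999), 2))
    latencies_ms = [3.1, 4.0, 2.7, 5.5, 8.9, 3.3, 4.8, 6.2, 2.9, 9.4]
    print(meets_latency_target(latencies_ms))
    print(autoscale_range(4000))
```

For 99.999% availability, the downtime budget works out to roughly 5.26 minutes per year. With only a handful of latency samples, the nearest-rank p99 is simply the slowest request, which is why percentile checks are normally run over large samples.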

Next steps

To learn more, see the following docs: