Security considerations for data movement in Azure Data Factory

This article describes the basic security infrastructure that data movement services in Azure Data Factory use to help secure your data. Data Factory management resources are built on Azure security infrastructure and use all possible security measures offered by Azure.

In a Data Factory solution, you create one or more data pipelines. A pipeline is a logical grouping of activities that together perform a task. These pipelines reside in the region where the data factory was created.

Even though Data Factory is available in only a few regions, the data movement service is available globally to ensure data compliance, efficiency, and reduced network egress costs.

Azure Data Factory does not store any data except for linked service credentials for cloud data stores, which are encrypted by using certificates. With Data Factory, you create data-driven workflows to orchestrate movement of data between supported data stores, and processing of data by using compute services in other regions or in an on-premises environment. You can also monitor and manage workflows by using SDKs and Azure Monitor.

Data Factory has been certified for:

CSA STAR Certification
ISO 20000-1:2011
ISO 22301:2012
ISO 27001:2013
ISO 27017:2015
ISO 27018:2014
ISO 9001:2015
SOC 1, 2, 3
HIPAA BAA

If you're interested in Azure compliance and how Azure secures its own infrastructure, visit the Azure Trust Center. For the latest list of all Azure compliance offerings, see https://aka.ms/AzureCompliance.

In this article, we review security considerations in the following two data movement scenarios:

  • Cloud scenario: In this scenario, both your source and your destination are publicly accessible through the internet. These include managed cloud storage services such as Azure Storage, Azure SQL Data Warehouse, Azure SQL Database, Amazon S3, and Amazon Redshift; SaaS services such as Salesforce; and web protocols such as FTP and OData. Find a complete list of supported data sources in Supported data stores and formats.
  • Hybrid scenario: In this scenario, either your source or your destination is behind a firewall or inside an on-premises corporate network. Or, the data store is in a private network or virtual network (most often the source) and is not publicly accessible. Database servers hosted on virtual machines also fall under this scenario.

Note

This article has been updated to use the new Azure PowerShell Az module. You can still use the AzureRM module, which will continue to receive bug fixes until at least December 2020. To learn more about the new Az module and AzureRM compatibility, see Introducing the new Azure PowerShell Az module. For Az module installation instructions, see Install Azure PowerShell.

Cloud scenarios

Securing data store credentials

  • Store encrypted credentials in an Azure Data Factory managed store. Data Factory helps protect your data store credentials by encrypting them with certificates managed by Azure. These certificates are rotated every two years (which includes certificate renewal and the migration of credentials). For more information about Azure Storage security, see Azure Storage security overview.
  • Store credentials in Azure Key Vault. You can also store the data store's credential in Azure Key Vault. Data Factory retrieves the credential during the execution of an activity. For more information, see Store credential in Azure Key Vault. A minimal example is sketched after this list.
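The following is a minimal, hedged sketch of the Key Vault approach using the Az PowerShell module: a linked service definition whose password points at a Key Vault secret, deployed with Set-AzDataFactoryV2LinkedService. The resource group, factory, linked service, server, and secret names are placeholder assumptions; check the exact JSON shape against the Store credential in Azure Key Vault article.

```powershell
# Sketch: an Azure SQL Database linked service whose password is read from Azure Key Vault.
# All names below (resource group, factory, key vault linked service, secret) are hypothetical placeholders.
$definition = @'
{
    "name": "AzureSqlLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": "Server=tcp:<server>.database.chinacloudapi.cn,1433;Database=<db>;User ID=<user>;Encrypt=true;Connection Timeout=30",
            "password": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "AzureKeyVaultLinkedService",
                    "type": "LinkedServiceReference"
                },
                "secretName": "SqlDbPassword"
            }
        }
    }
}
'@
Set-Content -Path .\AzureSqlLinkedService.json -Value $definition

# Deploy the definition; Data Factory resolves the secret from Key Vault at activity run time.
Set-AzDataFactoryV2LinkedService -ResourceGroupName "<resource-group>" `
    -DataFactoryName "<data-factory-name>" `
    -Name "AzureSqlLinkedService" `
    -DefinitionFile ".\AzureSqlLinkedService.json"
```

With this pattern, no secret ever appears in the pipeline JSON itself; rotating the secret in Key Vault requires no change to the linked service.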

Data encryption in transit

If the cloud data store supports HTTPS or TLS, all data transfers between data movement services in Data Factory and the cloud data store take place over a secure HTTPS or TLS channel.

Note

All connections to Azure SQL Database and Azure SQL Data Warehouse require encryption (SSL/TLS) while data is in transit to and from the database. When you're authoring a pipeline by using JSON, add the encryption property and set it to true in the connection string. For Azure Storage, you can use HTTPS in the connection string.
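For illustration, the fragments below show what these transport-encryption settings can look like in a connection string. The server, database, account, and key values are placeholder assumptions; confirm the exact property names in the relevant connector documentation.

```powershell
# Sketch: connection strings with transport encryption turned on.
# Server, database, account, and key values are hypothetical placeholders.

# Azure SQL Database / Azure SQL Data Warehouse: Encrypt=True forces SSL/TLS on the wire.
$sqlConnectionString = "Server=tcp:<server>.database.chinacloudapi.cn,1433;Database=<db>;User ID=<user>;Password=<password>;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30"

# Azure Storage: DefaultEndpointsProtocol=https keeps all endpoint traffic on HTTPS.
$storageConnectionString = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.chinacloudapi.cn"
```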

Note

To enable encryption in transit while moving data from Oracle, follow one of these options:

  1. In the Oracle server, go to Oracle Advanced Security (OAS) and configure the encryption settings, which support Triple-DES Encryption (3DES) and Advanced Encryption Standard (AES); refer here for details. ADF automatically negotiates the encryption method to use the one you configure in OAS when establishing the connection to Oracle.
  2. In ADF, you can add EncryptionMethod=1 in the connection string (in the linked service). This uses SSL/TLS as the encryption method. To use this option, you need to disable the non-SSL encryption settings in OAS on the Oracle server side to avoid an encryption conflict. (A sketch of such a connection string follows this list.)
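As a hedged illustration of option 2, the fragment below adds EncryptionMethod=1 to an Oracle connection string used in the linked service definition. The host, port, SID, and credential values are placeholders; option 1 (OAS-negotiated encryption) needs no connection-string change.

```powershell
# Sketch: an Oracle connection string that requests SSL/TLS by adding EncryptionMethod=1.
# Host, SID, and credentials are hypothetical placeholders.
$oracleConnectionString = "Host=<hostname>;Port=1521;Sid=<sid>;User Id=<username>;Password=<password>;EncryptionMethod=1"
```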

Note

The TLS version used is 1.2.

Data encryption at rest

Some data stores support encryption of data at rest. We recommend that you enable the data encryption mechanism for those data stores.

Azure SQL Data Warehouse

Transparent Data Encryption (TDE) in Azure SQL Data Warehouse helps protect against the threat of malicious activity by performing real-time encryption and decryption of your data at rest. This behavior is transparent to the client. For more information, see Secure a database in SQL Data Warehouse.

Azure SQL Database

Azure SQL Database also supports transparent data encryption (TDE), which helps protect against the threat of malicious activity by performing real-time encryption and decryption of the data, without requiring changes to the application. This behavior is transparent to the client. For more information, see Transparent data encryption for SQL Database and Data Warehouse.
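As a hedged sketch (this is a database-side setting, not part of Data Factory itself), TDE can be enabled and verified with the Az.Sql PowerShell module. The resource group, server, and database names below are placeholders.

```powershell
# Sketch: enable Transparent Data Encryption on an Azure SQL database.
# Resource group, server, and database names are hypothetical placeholders.
Set-AzSqlDatabaseTransparentDataEncryption -ResourceGroupName "<resource-group>" `
    -ServerName "<sql-server-name>" `
    -DatabaseName "<database-name>" `
    -State Enabled

# Verify the current TDE state.
Get-AzSqlDatabaseTransparentDataEncryption -ResourceGroupName "<resource-group>" `
    -ServerName "<sql-server-name>" `
    -DatabaseName "<database-name>"
```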

Azure Blob storage and Azure Table storage

Azure Blob storage and Azure Table storage support Storage Service Encryption (SSE), which automatically encrypts your data before persisting it to storage and decrypts it before retrieval. For more information, see Azure Storage Service Encryption for Data at Rest.
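A hedged way to confirm that SSE is in effect for a storage account is to inspect the account's encryption settings with Az PowerShell; the resource group and account names below are placeholders.

```powershell
# Sketch: inspect the Storage Service Encryption settings of a storage account.
# Resource group and account names are hypothetical placeholders.
$account = Get-AzStorageAccount -ResourceGroupName "<resource-group>" -Name "<storage-account-name>"
$account.Encryption.Services.Blob   # shows whether blob service encryption is enabled
$account.Encryption.Services.Table  # shows whether table service encryption is enabled
```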

Amazon S3

Amazon S3 supports both client-side and server-side encryption of data at rest. For more information, see Protecting Data Using Encryption.

Amazon Redshift

Amazon Redshift supports cluster encryption for data at rest. For more information, see Amazon Redshift Database Encryption.

Salesforce

Salesforce supports Shield Platform Encryption, which allows encryption of all files, attachments, and custom fields. For more information, see Understanding the Web Server OAuth Authentication Flow.

Hybrid scenarios

Hybrid scenarios require the self-hosted integration runtime to be installed in an on-premises network, inside a virtual network (Azure), or inside a virtual private cloud (Amazon). The self-hosted integration runtime must be able to access the local data stores. For more information about the self-hosted integration runtime, see How to create and configure self-hosted integration runtime.

[Figure: Self-hosted integration runtime channels]

The command channel allows communication between data movement services in Data Factory and the self-hosted integration runtime. The communication contains information related to the activity. The data channel is used for transferring data between on-premises data stores and cloud data stores.

On-premises data store credentials

The credentials can be stored within Data Factory or be referenced by Data Factory at runtime from Azure Key Vault. If you store credentials within Data Factory, they are always stored encrypted on the self-hosted integration runtime.

  • Store credentials locally. If you directly use the Set-AzDataFactoryV2LinkedService cmdlet with the connection strings and credentials inline in the JSON, the linked service is encrypted and stored on the self-hosted integration runtime. In this case, the credentials flow through the Azure backend service, which is extremely secure, to the self-hosted integration machine, where they are finally encrypted and stored. The self-hosted integration runtime uses Windows DPAPI to encrypt the sensitive data and credential information.

  • Store credentials in Azure Key Vault. You can also store the data store's credential in Azure Key Vault. Data Factory retrieves the credential during the execution of an activity. For more information, see Store credential in Azure Key Vault.

  • Store credentials locally without flowing the credentials through the Azure backend to the self-hosted integration runtime. If you want to encrypt and store credentials locally on the self-hosted integration runtime without having to flow the credentials through the Data Factory backend, follow the steps in Encrypt credentials for on-premises data stores in Azure Data Factory. All connectors support this option. The self-hosted integration runtime uses Windows DPAPI to encrypt the sensitive data and credential information.

    Use the New-AzDataFactoryV2LinkedServiceEncryptedCredential cmdlet to encrypt linked service credentials and sensitive details in the linked service. You can then use the JSON that is returned (with the EncryptedCredential element in the connection string) to create the linked service by using the Set-AzDataFactoryV2LinkedService cmdlet. A minimal sketch of this workflow follows.
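The following is a minimal sketch of that workflow, assuming an on-premises SQL Server linked service and a self-hosted integration runtime named selfhostedIR. The resource group, factory, runtime, and file names are placeholders, and the exact parameter set should be verified against the Encrypt credentials for on-premises data stores article.

```powershell
# Sketch: encrypt linked service credentials on the self-hosted integration runtime,
# then register the linked service from the encrypted output.
# Resource group, factory, integration runtime, and file names are hypothetical placeholders.
$encryptedJson = New-AzDataFactoryV2LinkedServiceEncryptedCredential `
    -ResourceGroupName "<resource-group>" `
    -DataFactoryName "<data-factory-name>" `
    -IntegrationRuntimeName "selfhostedIR" `
    -DefinitionFile ".\SqlServerLinkedService.json"

# The returned JSON contains an EncryptedCredential element instead of the clear-text secret.
Set-Content -Path ".\SqlServerLinkedServiceEncrypted.json" -Value $encryptedJson

Set-AzDataFactoryV2LinkedService -ResourceGroupName "<resource-group>" `
    -DataFactoryName "<data-factory-name>" `
    -Name "SqlServerLinkedService" `
    -DefinitionFile ".\SqlServerLinkedServiceEncrypted.json"
```

Because the encryption happens on the self-hosted integration runtime machine, the clear-text credential never transits the Data Factory backend.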

Ports used when encrypting a linked service on the self-hosted integration runtime

By default, PowerShell uses port 8060 on the machine with the self-hosted integration runtime for secure communication. If necessary, this port can be changed.

[Figure: HTTPS port for the gateway]

Encryption in transit

All data transfers take place over a secure channel (HTTPS and TLS over TCP) to prevent man-in-the-middle attacks during communication with Azure services.

You can also use IPSec VPN or Azure ExpressRoute to further secure the communication channel between your on-premises network and Azure.

Azure Virtual Network is a logical representation of your network in the cloud. You can connect an on-premises network to your virtual network by setting up IPSec VPN (site-to-site) or ExpressRoute (private peering).

The following table summarizes the network and self-hosted integration runtime configuration recommendations based on different combinations of source and destination locations for hybrid data movement.

| Source | Destination | Network configuration | Integration runtime setup |
| --- | --- | --- | --- |
| On-premises | Virtual machines and cloud services deployed in virtual networks | IPSec VPN (point-to-site or site-to-site) | The self-hosted integration runtime should be installed on an Azure virtual machine in the virtual network. |
| On-premises | Virtual machines and cloud services deployed in virtual networks | ExpressRoute (private peering) | The self-hosted integration runtime should be installed on an Azure virtual machine in the virtual network. |
| On-premises | Azure-based services that have a public endpoint | ExpressRoute (Azure peering) | The self-hosted integration runtime can be installed on-premises or on an Azure virtual machine. |

The following images show the use of the self-hosted integration runtime for moving data between an on-premises database and Azure services by using ExpressRoute and IPSec VPN (with Azure Virtual Network):

ExpressRoute

[Figure: Using ExpressRoute with the gateway]

IPSec VPN

[Figure: Using IPSec VPN with the gateway]

Firewall configurations and allow list setup for IP addresses

Firewall requirements for on-premises/private networks

In an enterprise, a corporate firewall runs on the central router of the organization. Windows Firewall runs as a daemon on the local machine on which the self-hosted integration runtime is installed.

The following table provides the outbound port and domain requirements for corporate firewalls:

| Domain names | Outbound ports | Description |
| --- | --- | --- |
| *.servicebus.chinacloudapi.cn | 443 | Required by the self-hosted integration runtime to connect to data movement services in Data Factory. |
| *.frontend.datamovement.azure.cn | 443 | Required by the self-hosted integration runtime to connect to the Data Factory service. |
| download.microsoft.com | 443 | Required by the self-hosted integration runtime for downloading the updates. If you have disabled auto-update, you can skip this. |
| *.core.chinacloudapi.cn | 443 | Used by the self-hosted integration runtime to connect to the Azure storage account when you use the staged copy feature. |
| *.database.chinacloudapi.cn | 1433 | (Optional) Required when you copy from or to Azure SQL Database or Azure SQL Data Warehouse. Use the staged copy feature to copy data to Azure SQL Database or Azure SQL Data Warehouse without opening port 1433. |
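To spot-check that the corporate firewall actually allows this outbound traffic from the self-hosted integration runtime machine, one hedged approach is to probe the endpoints with Test-NetConnection. The host names below are placeholders for the endpoints your data factory actually uses.

```powershell
# Sketch: verify outbound connectivity from the self-hosted integration runtime machine.
# The host names are hypothetical placeholders; substitute your own endpoints.
Test-NetConnection -ComputerName "<namespace>.servicebus.chinacloudapi.cn" -Port 443
Test-NetConnection -ComputerName "download.microsoft.com" -Port 443
Test-NetConnection -ComputerName "<server>.database.chinacloudapi.cn" -Port 1433   # only if copying to/from Azure SQL
```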

Note

You might have to manage ports or set up an allow list for domains at the corporate firewall level, as required by the respective data sources. This table uses Azure SQL Database and Azure SQL Data Warehouse only as examples.

The following table provides the inbound port requirements for Windows Firewall:

| Inbound ports | Description |
| --- | --- |
| 8060 (TCP) | Required by the PowerShell encryption cmdlet (as described in Encrypt credentials for on-premises data stores in Azure Data Factory) and by the credential manager application to securely set credentials for on-premises data stores on the self-hosted integration runtime. |
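If the inbound rule does not already exist on the self-hosted integration runtime machine, a hedged sketch for creating it locally (machine level only, not at the corporate firewall) looks like the following; the rule display name is a placeholder.

```powershell
# Sketch: open inbound TCP 8060 on the self-hosted integration runtime machine only.
# Run in an elevated PowerShell session; the rule display name is a hypothetical placeholder.
New-NetFirewallRule -DisplayName "ADF self-hosted IR credential port" `
    -Direction Inbound `
    -Protocol TCP `
    -LocalPort 8060 `
    -Action Allow
```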

[Figure: Gateway port requirements]

IP configurations and allow list setup in data stores

Some data stores in the cloud also require that you allow the IP address of the machine that accesses the store. Ensure that the IP address of the self-hosted integration runtime machine is allowed or configured appropriately in the firewall.

The following cloud data stores require that you allow the IP address of the self-hosted integration runtime machine. Some of these data stores, by default, might not require an allow list.

Frequently asked questions

Can the self-hosted integration runtime be shared across different data factories?

Yes. More details here.

What are the port requirements for the self-hosted integration runtime to work?

The self-hosted integration runtime makes HTTP-based connections to access the internet. Outbound port 443 must be opened for the self-hosted integration runtime to make this connection. Open inbound port 8060 only at the machine level (not at the corporate firewall level) for the credential manager application. If Azure SQL Database or Azure SQL Data Warehouse is used as the source or the destination, you need to open port 1433 as well. For more information, see the Firewall configurations and allow list setup for IP addresses section.

Next steps

For information about Azure Data Factory Copy Activity performance, see Copy Activity performance and tuning guide.