Azure Data Factory is a cloud-based data integration service that allows you to create workflows for orchestrating and automating data movement and data transformation. Securing Azure Data Factory is crucial to protect sensitive data, ensure compliance, and maintain the integrity of your data workflows.
This article provides guidance on how to best secure your Azure Data Factory deployment.
Network security
Network security is essential for protecting your Azure Data Factory from unauthorized access and potential threats, and for protecting your data in transit. Implementing robust network security measures helps to isolate and secure your data integration processes.
- Isolate and segment workloads using Virtual Networks (VNets): Use VNets to create isolated network environments for your data factory and data sources, enabling segmentation of workloads based on risk. VNets help control traffic within the cloud infrastructure. Depending on your source locations, see:
- Control traffic flow with Network Security Groups (NSGs): Currently this applies only to SSIS integration runtimes and self-hosted integration runtimes joined to your virtual network, and isn't available for managed virtual networks. Apply NSGs to control inbound and outbound traffic for virtual machines and subnets within VNets. Use a "deny by default, permit by exception" approach to restrict traffic flow and protect sensitive resources. If you've joined Azure Data Factory to a virtual network, port 3389 is open to all traffic by default on the NSG that Azure Data Factory automatically creates. Lock the port down so that only your administrators have access. To manage your NSGs, see Network security groups.
- Secure your self-hosted integration runtime nodes by enabling remote access from the intranet with TLS/SSL certificates: Multiple self-hosted integration runtime nodes can be deployed to balance load and provide high availability. Enabling remote access from the intranet with TLS/SSL certificates ensures secure communication between integration runtime nodes.
- Secure service access using Private Links: Securely connect to Azure Data Factory from your self-hosted integration runtime and your Azure platform resources, preventing exposure to the public internet. This enhances data privacy and reduces attack vectors. By using Azure Private Link, you can connect to various platform as a service (PaaS) deployments in Azure via a private endpoint. See Azure Private Link for Data Factory.
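The "deny by default, permit by exception" NSG approach above can be sketched in plain Python. This is an illustrative model of how NSG rules evaluate traffic (rules processed in ascending priority order, first match wins); the rule names and admin CIDR are assumptions, not Azure Data Factory defaults, and real NSGs also include built-in default rules not shown here.

```python
import ipaddress

# Illustrative inbound rule set: lowest priority number is evaluated first.
# The admin subnet 10.0.1.0/24 is a placeholder for your management range.
rules = [
    {"name": "AllowAdminRDP", "priority": 100, "port": 3389,
     "source": "10.0.1.0/24", "access": "Allow"},
    {"name": "DenyAllInbound", "priority": 4096, "port": "*",
     "source": "*", "access": "Deny"},  # deny by default
]

def is_allowed(port: int, source_ip: str) -> bool:
    """Return True if the first matching rule (by priority) allows the traffic."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        port_match = rule["port"] == "*" or rule["port"] == port
        src_match = (rule["source"] == "*" or
                     ipaddress.ip_address(source_ip)
                     in ipaddress.ip_network(rule["source"]))
        if port_match and src_match:
            return rule["access"] == "Allow"
    return False  # no match: implicitly denied

print(is_allowed(3389, "10.0.1.5"))     # admin subnet: allowed
print(is_allowed(3389, "203.0.113.9"))  # internet source: denied
```

Locking down port 3389 in this model means the only Allow rule for it is scoped to the administrators' source range; everything else falls through to the deny rule.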
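As a sketch of what a Private Link connection to a data factory looks like, here is an illustrative private endpoint definition expressed as a Python dictionary. All resource names and IDs are placeholders; the `dataFactory` group ID is the sub-resource used when connecting to the factory itself (a separate `portal` sub-resource targets the authoring UI).

```python
import json

# Illustrative private endpoint body; every <...> value is a placeholder.
private_endpoint = {
    "name": "pe-adf",
    "location": "eastus",
    "properties": {
        "subnet": {
            "id": "/subscriptions/<sub>/resourceGroups/<rg>/providers/"
                  "Microsoft.Network/virtualNetworks/vnet1/subnets/pe-subnet"
        },
        "privateLinkServiceConnections": [{
            "name": "adf-connection",
            "properties": {
                "privateLinkServiceId": "/subscriptions/<sub>/resourceGroups/<rg>/"
                                        "providers/Microsoft.DataFactory/factories/adf1",
                # Sub-resource to expose privately; "portal" covers the UI.
                "groupIds": ["dataFactory"],
            },
        }],
    },
}

print(json.dumps(private_endpoint, indent=2))
```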
Identity management
Identity management ensures that only authorized users and services can access your Azure Data Factory. Implementing strong identity management practices helps to prevent unauthorized access and protect sensitive data.
- Apply least privilege principles: Use Azure Data Factory's role-based access control (RBAC) to assign the minimum necessary permissions to users and services, ensuring that they only have access to what is needed to perform their duties. Regularly review and adjust roles to align with the principle of least privilege. See Roles and permissions in Azure Data Factory.
- Use managed identities for secure access without credentials: Use managed identities in Azure to securely authenticate Azure Data Factory with Azure services, without the need to manage credentials. This provides a secure and simplified way to access resources like Azure Key Vault or Azure SQL Database. See Managed Identities for Azure Data Factory.
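The least-privilege guidance above amounts to scoping role assignments as narrowly as possible. A minimal sketch, using the built-in Data Factory Contributor role with placeholder IDs: the assignment is scoped to one factory rather than to the whole subscription or resource group.

```python
# Illustrative role assignment body; principal ID and resource IDs are
# placeholders. Data Factory Contributor is a built-in Azure role.
role_assignment = {
    "principalId": "<user-or-group-object-id>",
    "roleDefinitionName": "Data Factory Contributor",
    # Narrowest scope that covers the work: a single factory, not
    # "/subscriptions/<sub>" or the resource group.
    "scope": ("/subscriptions/<sub>/resourceGroups/<rg>/"
              "providers/Microsoft.DataFactory/factories/adf1"),
}

# A quick sanity check that the scope stops at the factory level.
assert role_assignment["scope"].endswith("/factories/adf1")
```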
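To illustrate credential-free access with a managed identity, here is a sketch of an Azure SQL Database linked service definition, assuming the documented behavior that a connection string without a user name or password causes Data Factory to authenticate with the factory's managed identity. Server and database names are placeholders.

```python
import json

# Illustrative linked service: no user name, password, or key appears
# anywhere, so the factory's managed identity is used to authenticate.
linked_service = {
    "name": "AzureSqlViaManagedIdentity",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": ("Data Source=tcp:<server>.database.windows.net,1433;"
                                 "Initial Catalog=<database>;")
        },
    },
}

# No secret material is stored in the pipeline definition at all.
assert "Password" not in json.dumps(linked_service)
print(json.dumps(linked_service, indent=2))
```

The managed identity still needs to be granted access on the target (for example, created as a contained database user in Azure SQL) before the connection succeeds.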
Data protection
Implementing robust data protection measures helps to safeguard sensitive information and comply with regulatory requirements. Azure Data Factory doesn't store data itself, so implementing network security and identity management is essential to protect the data in transit. However, there are some tools and practices you can use to further protect your data while it's being processed.
- Encrypt data at rest and in transit: Azure Data Factory encrypts data at rest, including entity definitions and any data cached while runs are in progress. By default, data is encrypted with a randomly generated Azure-managed key that is uniquely assigned to your data factory. For extra security guarantees, you can enable the Bring Your Own Key (BYOK) feature with customer-managed keys in Azure Data Factory. See Encrypt Azure Data Factory with customer-managed keys.
- Restrict the exposure of credentials and secrets: Use Azure Key Vault to securely store and manage sensitive information such as connection strings, secrets, and certificates. Integrate Azure Data Factory with Azure Key Vault to retrieve secrets at runtime, ensuring that sensitive data isn't hard-coded in pipelines or datasets. See Azure Key Vault integration with Data Factory.
- Use Azure Policy to enforce data protection standards: Apply Azure Policy to enforce data protection standards across your Azure Data Factory deployment. This helps to ensure compliance with organizational and regulatory requirements. See Azure Policy built-in definitions for Data Factory.
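The customer-managed key guidance above boils down to pointing the factory's encryption settings at your own Key Vault key. A minimal sketch of the relevant factory resource fragment, with placeholder vault, key, and identity IDs; the factory needs a user-assigned identity that has been granted access to the Key Vault key.

```python
# Illustrative factory resource fragment enabling customer-managed key
# (CMK) encryption; every <...> value is a placeholder.
factory = {
    "name": "adf1",
    "identity": {
        "type": "UserAssigned",
        "userAssignedIdentities": {
            "/subscriptions/<sub>/resourceGroups/<rg>/providers/"
            "Microsoft.ManagedIdentity/userAssignedIdentities/adf-cmk-identity": {}
        },
    },
    "properties": {
        "encryption": {
            # The key that wraps the factory's data encryption key.
            "vaultBaseUrl": "https://<vault-name>.vault.azure.net",
            "keyName": "<key-name>",
            "keyVersion": "<key-version>",
        }
    },
}
```

Revoking the factory's access to this key (or disabling the key) renders the factory's encrypted content unreadable, which is the control BYOK provides.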
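The Key Vault integration above works by replacing an inline secret with a secret reference that Data Factory resolves at runtime. An illustrative linked service definition showing the shape of that reference (the Key Vault linked service name and secret name are placeholders):

```python
import json

# Illustrative linked service: the connection string is not stored inline;
# instead it points at a secret in a Key Vault linked service, fetched at
# runtime. "AzureKeyVaultLS" and the secret name are placeholders.
linked_service = {
    "name": "AzureSqlDatabaseLS",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "AzureKeyVaultLS",
                    "type": "LinkedServiceReference",
                },
                "secretName": "sql-connection-string",
            }
        },
    },
}

print(json.dumps(linked_service, indent=2))
```

Rotating the secret in Key Vault then takes effect without editing or redeploying the pipeline definition.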
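An Azure Policy control is applied by assigning a definition at a scope. A hypothetical sketch of an assignment body, assuming a built-in Data Factory definition; the definition GUID, display name, and scope are all placeholders, not real identifiers.

```python
# Illustrative policy assignment; the definition ID is a placeholder for
# one of the built-in Data Factory policy definitions.
policy_assignment = {
    "name": "adf-cmk-required",
    "properties": {
        "displayName": "Require customer-managed keys for data factories",
        "policyDefinitionId": ("/providers/Microsoft.Authorization/"
                               "policyDefinitions/<definition-guid>"),
        # The scope at which the policy is evaluated and enforced.
        "scope": "/subscriptions/<sub>/resourceGroups/<rg>",
        "enforcementMode": "Default",
    },
}
```

With `enforcementMode` set to `Default`, non-compliant deployments are blocked; `DoNotEnforce` would only report compliance state.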
Related content
- For scenario-based security considerations, see Security considerations for Azure Data Factory.