In this article, you create the infrastructure required to run Apache Airflow on Azure Kubernetes Service (AKS).
- If you haven't already, review the Overview for deploying an Apache Airflow cluster on Azure Kubernetes Service (AKS).
- An Azure subscription. If you don't have one, create a trial subscription.
- Azure CLI version 2.61.0. To install or upgrade, see Install Azure CLI.
- Helm version 3 or later. To install, see Installing Helm.
- kubectl.
- A GitHub repo to store Airflow DAGs.
- Docker installed on your local machine. To install, see Get Docker.
Set the required environment variables for use throughout this guide. The random suffix maps the digits of $RANDOM to lowercase letters (0 becomes a, 9 becomes j). The container registry, key vault, and storage account names must be globally unique, and this gives them a suffix made only of characters all three resource types accept.

random=$(echo $RANDOM | tr '[0-9]' '[a-z]')
export MY_LOCATION=canadacentral
export MY_RESOURCE_GROUP_NAME=apache-airflow-rg
export MY_IDENTITY_NAME=airflow-identity-123
export MY_ACR_REGISTRY=mydnsrandomname${random}
export MY_KEYVAULT_NAME=airflow-vault-${random}-kv
export MY_CLUSTER_NAME=apache-airflow-aks
export SERVICE_ACCOUNT_NAME=airflow
export SERVICE_ACCOUNT_NAMESPACE=airflow
export AKS_AIRFLOW_NAMESPACE=airflow
export AKS_AIRFLOW_CLUSTER_NAME=cluster-aks-airflow
export AKS_AIRFLOW_LOGS_STORAGE_ACCOUNT_NAME=airflowsasa${random}
export AKS_AIRFLOW_LOGS_STORAGE_CONTAINER_NAME=airflow-logs
export AKS_AIRFLOW_LOGS_STORAGE_SECRET_NAME=storage-account-credentials
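Storage account, key vault, and container registry names each follow different naming rules, so a quick local check can catch a bad name before any resource is created. The following is a minimal sketch that assumes the rules listed in its comments (verify them against the current Azure naming documentation); the function names are hypothetical helpers, not part of Azure CLI.

```bash
#!/usr/bin/env bash
# Sketch: validate generated resource names locally before creating anything.
# Naming rules below are assumptions based on Azure's documented limits:
#   storage account:    3-24 chars, lowercase letters and digits only
#   key vault:          3-24 chars, letters, digits, and hyphens; must start
#                       with a letter, end with a letter or digit, no "--"
#   container registry: 5-50 alphanumeric chars

is_valid_storage_account_name() {
  [[ $1 =~ ^[[:lower:][:digit:]]{3,24}$ ]]
}

is_valid_keyvault_name() {
  [[ $1 =~ ^[[:alpha:]][[:alnum:]-]{1,22}[[:alnum:]]$ && $1 != *--* ]]
}

is_valid_acr_name() {
  [[ $1 =~ ^[[:alnum:]]{5,50}$ ]]
}

# Hypothetical helper: report each exported name that breaks its rule.
# Returns non-zero if any name is invalid.
check_generated_names() {
  local status=0
  if ! is_valid_storage_account_name "${AKS_AIRFLOW_LOGS_STORAGE_ACCOUNT_NAME:-}"; then
    echo "invalid storage account name: '${AKS_AIRFLOW_LOGS_STORAGE_ACCOUNT_NAME:-}'" >&2
    status=1
  fi
  if ! is_valid_keyvault_name "${MY_KEYVAULT_NAME:-}"; then
    echo "invalid key vault name: '${MY_KEYVAULT_NAME:-}'" >&2
    status=1
  fi
  if ! is_valid_acr_name "${MY_ACR_REGISTRY:-}"; then
    echo "invalid registry name: '${MY_ACR_REGISTRY:-}'" >&2
    status=1
  fi
  return "$status"
}
```

After exporting the variables, run check_generated_names; a non-zero exit status means at least one name needs a different suffix.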
Create a resource group using the az group create command.

az group create --name $MY_RESOURCE_GROUP_NAME --location $MY_LOCATION --output table
Example output:

Location       Name
-------------  -----------------------
$MY_LOCATION   $MY_RESOURCE_GROUP_NAME
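If you run this guide more than once, you might prefer to create the resource group only when it's missing. The sketch below assumes that az group exists prints true or false; ensure_resource_group is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: create a resource group only if it does not exist yet.
# Assumes "az group exists" prints "true" or "false".
ensure_resource_group() {
  local name=$1 location=$2
  if [[ "$(az group exists --name "$name")" == "true" ]]; then
    echo "Resource group $name already exists; skipping creation."
  else
    az group create --name "$name" --location "$location" --output table
  fi
}
```

For example: ensure_resource_group "$MY_RESOURCE_GROUP_NAME" "$MY_LOCATION". The check only looks at the name, so it won't notice an existing group of the same name in a different location.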
In this step, we create a user-assigned managed identity that the External Secrets Operator uses to access the Airflow passwords stored in Azure Key Vault.
Create a user-assigned managed identity using the az identity create command.

az identity create --name $MY_IDENTITY_NAME --resource-group $MY_RESOURCE_GROUP_NAME --output table
export MY_IDENTITY_NAME_ID=$(az identity show --name $MY_IDENTITY_NAME --resource-group $MY_RESOURCE_GROUP_NAME --query id --output tsv)
export MY_IDENTITY_NAME_PRINCIPAL_ID=$(az identity show --name $MY_IDENTITY_NAME --resource-group $MY_RESOURCE_GROUP_NAME --query principalId --output tsv)
export MY_IDENTITY_NAME_CLIENT_ID=$(az identity show --name $MY_IDENTITY_NAME --resource-group $MY_RESOURCE_GROUP_NAME --query clientId --output tsv)
Example output:

ClientId                              Location      Name               PrincipalId                           ResourceGroup            TenantId
------------------------------------  ------------  -----------------  ------------------------------------  -----------------------  ------------------------------------
00001111-aaaa-2222-bbbb-3333cccc4444  $MY_LOCATION  $MY_IDENTITY_NAME  aaaaaaaa-bbbb-cccc-1111-222222222222  $MY_RESOURCE_GROUP_NAME  aaaabbbb-0000-cccc-1111-dddd2222eeee
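Each of the three az identity show queries can return an empty string if the identity name or resource group is mistyped, and later steps then fail with less obvious errors. A small guard, shown here as a hypothetical require_env helper, stops early instead.

```bash
#!/usr/bin/env bash
# Sketch: fail fast when any named variable is unset or empty.
# Uses bash indirect expansion (${!var}) to read each variable by name.
require_env() {
  local missing=0 var
  for var in "$@"; do
    if [[ -z "${!var:-}" ]]; then
      echo "error: $var is unset or empty" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: require_env MY_IDENTITY_NAME_ID MY_IDENTITY_NAME_PRINCIPAL_ID MY_IDENTITY_NAME_CLIENT_ID. The same check works for the other exported values in this guide, such as KEYVAULTID, KEYVAULTURL, and OIDC_URL.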
Create an Azure Key Vault instance using the az keyvault create command.

az keyvault create --name $MY_KEYVAULT_NAME --resource-group $MY_RESOURCE_GROUP_NAME --location $MY_LOCATION --enable-rbac-authorization false --output table
export KEYVAULTID=$(az keyvault show --name $MY_KEYVAULT_NAME --query "id" --output tsv)
export KEYVAULTURL=$(az keyvault show --name $MY_KEYVAULT_NAME --query "properties.vaultUri" --output tsv)
Example output:

Location      Name               ResourceGroup
------------  -----------------  -----------------------
$MY_LOCATION  $MY_KEYVAULT_NAME  $MY_RESOURCE_GROUP_NAME
Create an Azure Container Registry to store and manage your container images using the az acr create command.

az acr create \
    --name ${MY_ACR_REGISTRY} \
    --resource-group $MY_RESOURCE_GROUP_NAME \
    --sku Premium \
    --location $MY_LOCATION \
    --admin-enabled true \
    --output table
export MY_ACR_REGISTRY_ID=$(az acr show --name $MY_ACR_REGISTRY --resource-group $MY_RESOURCE_GROUP_NAME --query id --output tsv)
Example output:

NAME                  RESOURCE GROUP           LOCATION      SKU      LOGIN SERVER                     CREATION DATE         ADMIN ENABLED
--------------------  -----------------------  ------------  -------  -------------------------------  --------------------  -------------
mydnsrandomnamebfbje  $MY_RESOURCE_GROUP_NAME  $MY_LOCATION  Premium  mydnsrandomnamebfbje.azurecr.cn  2024-11-07T00:32:48Z  True
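The login server in the example output ends in azurecr.cn, while registries in global Azure use azurecr.io. Rather than hard-coding a suffix, you can read it from the registry. This is a minimal sketch; acr_login_server is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: read the registry's login server from Azure rather than assuming
# a domain suffix (for example, azurecr.io in global Azure, azurecr.cn in
# Azure China).
acr_login_server() {
  local registry=$1
  az acr show --name "$registry" --query loginServer --output tsv
}
```

For example, export MY_ACR_LOGIN_SERVER=$(acr_login_server "$MY_ACR_REGISTRY"), where MY_ACR_LOGIN_SERVER is a hypothetical variable name, lets you build full image references such as $MY_ACR_LOGIN_SERVER/airflow:2.9.3.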
Create an Azure Storage account to store the Airflow logs using the az storage account create command. The commands then retrieve the account key, create a blob container for the logs, and store the account name and key as secrets in your key vault.

az storage account create --name $AKS_AIRFLOW_LOGS_STORAGE_ACCOUNT_NAME --resource-group $MY_RESOURCE_GROUP_NAME --location $MY_LOCATION --sku Standard_ZRS --output table
export AKS_AIRFLOW_LOGS_STORAGE_ACCOUNT_KEY=$(az storage account keys list --account-name $AKS_AIRFLOW_LOGS_STORAGE_ACCOUNT_NAME --query "[0].value" -o tsv)
az storage container create --name $AKS_AIRFLOW_LOGS_STORAGE_CONTAINER_NAME --account-name $AKS_AIRFLOW_LOGS_STORAGE_ACCOUNT_NAME --output table --account-key $AKS_AIRFLOW_LOGS_STORAGE_ACCOUNT_KEY
az keyvault secret set --vault-name $MY_KEYVAULT_NAME --name AKS-AIRFLOW-LOGS-STORAGE-ACCOUNT-NAME --value $AKS_AIRFLOW_LOGS_STORAGE_ACCOUNT_NAME
az keyvault secret set --vault-name $MY_KEYVAULT_NAME --name AKS-AIRFLOW-LOGS-STORAGE-ACCOUNT-KEY --value $AKS_AIRFLOW_LOGS_STORAGE_ACCOUNT_KEY
Example output:

AccessTier  AllowBlobPublicAccess  AllowCrossTenantReplication  CreationTime                      EnableHttpsTrafficOnly  Kind       Location      MinimumTlsVersion  Name              PrimaryLocation  ProvisioningState  ResourceGroup            StatusOfPrimary
----------  ---------------------  ---------------------------  --------------------------------  ----------------------  ---------  ------------  -----------------  ----------------  ---------------  -----------------  -----------------------  ---------------
Hot         False                  False                        2024-11-07T00:22:13.323104+00:00  True                    StorageV2  $MY_LOCATION  TLS1_0             airflowsasabfbje  $MY_LOCATION     Succeeded          $MY_RESOURCE_GROUP_NAME  available

Created
---------
True
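The secret names above are the environment variable names with underscores replaced by hyphens, because Key Vault secret names accept only letters, digits, and hyphens (assumed limit: 1 to 127 characters). The following minimal sketch derives the secret name from the variable name and stores the value; to_keyvault_secret_name and store_as_secrets are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: derive a Key Vault secret name from an environment variable name.
# Assumption: secret names allow only letters, digits, and hyphens (1-127
# chars), so underscores are replaced with hyphens.
to_keyvault_secret_name() {
  local name=${1//_/-}
  if [[ ! $name =~ ^[[:alnum:]-]{1,127}$ ]]; then
    echo "error: '$name' is not a valid Key Vault secret name" >&2
    return 1
  fi
  printf '%s\n' "$name"
}

# Hypothetical helper: store each named variable's value as a secret in
# $MY_KEYVAULT_NAME. --output none keeps the secret value out of the terminal.
store_as_secrets() {
  local var secret
  for var in "$@"; do
    secret=$(to_keyvault_secret_name "$var") || return 1
    az keyvault secret set \
      --vault-name "$MY_KEYVAULT_NAME" \
      --name "$secret" \
      --value "${!var}" \
      --output none || return 1
  done
}
```

Running store_as_secrets AKS_AIRFLOW_LOGS_STORAGE_ACCOUNT_NAME AKS_AIRFLOW_LOGS_STORAGE_ACCOUNT_KEY writes the same two secrets as the az keyvault secret set commands in this step, without echoing the secret JSON (which includes the value) to your terminal.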
In this step, we create an AKS cluster with workload identity and OIDC issuer enabled. The workload identity gives the External Secrets Operator service account permission to access the Airflow passwords stored in your key vault.
Create an AKS cluster using the az aks create command.

az aks create \
    --location $MY_LOCATION \
    --name $MY_CLUSTER_NAME \
    --tier standard \
    --resource-group $MY_RESOURCE_GROUP_NAME \
    --network-plugin azure \
    --node-vm-size Standard_DS4_v2 \
    --node-count 3 \
    --auto-upgrade-channel stable \
    --node-os-upgrade-channel NodeImage \
    --attach-acr ${MY_ACR_REGISTRY} \
    --enable-oidc-issuer \
    --enable-blob-driver \
    --enable-workload-identity \
    --zones 1 2 3 \
    --generate-ssh-keys \
    --output table
Example output:

AzurePortalFqdn                                                                               CurrentKubernetesVersion  DisableLocalAccounts  DnsPrefix                           EnableRbac  Fqdn                                                                                   KubernetesVersion  Location      MaxAgentPools  Name              NodeResourceGroup                                     ProvisioningState  ResourceGroup            ResourceUid                           SupportPlan
--------------------------------------------------------------------------------------------  ------------------------  --------------------  ----------------------------------  ----------  -------------------------------------------------------------------------------------  -----------------  ------------  -------------  ----------------  ----------------------------------------------------  -----------------  -----------------------  ------------------------------------  ------------------
apache-air-apache-airflow-r-363a0a-rhf6saad.portal.hcp.$MY_LOCATION.cx.prod.service.azk8s.cn  1.29.9                    False                 apache-air-apache-airflow-r-363a0a  True        apache-air-apache-airflow-r-363a0a-rhf6saad.hcp.$MY_LOCATION.cx.prod.service.azk8s.cn  1.29               $MY_LOCATION  100            $MY_CLUSTER_NAME  MC_apache-airflow-rg_apache-airflow-aks_$MY_LOCATION  Succeeded          $MY_RESOURCE_GROUP_NAME  b1b1b1b1-cccc-dddd-eeee-f2f2f2f2f2f2  KubernetesOfficial
Get the OIDC issuer URL to use for the workload identity configuration using the az aks show command.

export OIDC_URL=$(az aks show --resource-group $MY_RESOURCE_GROUP_NAME --name $MY_CLUSTER_NAME --query oidcIssuerProfile.issuerUrl --output tsv)
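If the OIDC issuer isn't enabled, the query returns an empty value. A workload identity federated credential typically pairs this issuer URL with the subject of a Kubernetes service account token, which has the form system:serviceaccount:<namespace>:<name>. This sketch checks the URL and builds that subject; check_oidc_issuer and service_account_subject are hypothetical helper names, and the federated credential itself isn't created here.

```bash
#!/usr/bin/env bash
# Sketch: confirm the issuer URL was returned, then build the subject that a
# workload identity federated credential would match for a service account.
check_oidc_issuer() {
  local url=$1
  if [[ $url != https://* ]]; then
    echo "error: OIDC issuer URL is missing or not HTTPS: '$url'" >&2
    return 1
  fi
}

# Kubernetes service account tokens use the subject
# system:serviceaccount:<namespace>:<name>.
service_account_subject() {
  local namespace=$1 name=$2
  printf 'system:serviceaccount:%s:%s\n' "$namespace" "$name"
}
```

With the variables set earlier, check_oidc_issuer "$OIDC_URL" && service_account_subject "$SERVICE_ACCOUNT_NAMESPACE" "$SERVICE_ACCOUNT_NAME" prints system:serviceaccount:airflow:airflow.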
Assign the AcrPull role to the kubelet identity using the az role assignment create command.

export KUBELET_IDENTITY=$(az aks show -g $MY_RESOURCE_GROUP_NAME --name $MY_CLUSTER_NAME --output tsv --query identityProfile.kubeletidentity.objectId)
az role assignment create \
    --assignee ${KUBELET_IDENTITY} \
    --role "AcrPull" \
    --scope ${MY_ACR_REGISTRY_ID} \
    --output table
Example output:

CreatedBy                             CreatedOn                         Name                                  PrincipalId                           PrincipalName                         PrincipalType     ResourceGroup            RoleDefinitionId                                                                                                                            RoleDefinitionName  Scope                                                                                                                                                             UpdatedBy                             UpdatedOn
------------------------------------  --------------------------------  ------------------------------------  ------------------------------------  ------------------------------------  ----------------  -----------------------  ------------------------------------------------------------------------------------------------------------------------------------------  ------------------  ----------------------------------------------------------------------------------------------------------------------------------------------------------------  ------------------------------------  --------------------------------
ccccdddd-2222-eeee-3333-ffff4444aaaa  2024-11-07T00:43:26.905445+00:00  b1b1b1b1-cccc-dddd-eeee-f2f2f2f2f2f2  bbbbbbbb-cccc-dddd-2222-333333333333  cccccccc-dddd-eeee-3333-444444444444  ServicePrincipal  $MY_RESOURCE_GROUP_NAME  /subscriptions/aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e/providers/Microsoft.Authorization/roleDefinitions/7f951dda-4ed3-4680-a7ca-43fe172d538d  AcrPull             /subscriptions/aaaa0a0a-bb1b-cc2c-dd3d-eeeeee4e4e4e/resourceGroups/$MY_RESOURCE_GROUP_NAME/providers/Microsoft.ContainerRegistry/registries/mydnsrandomnamebfbje  ccccdddd-2222-eeee-3333-ffff4444aaaa  2024-11-07T00:43:26.905445+00:00
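You can confirm the assignment afterward with az role assignment list. This minimal sketch, using a hypothetical has_acr_pull helper, succeeds only if AcrPull appears among the roles the principal holds at the registry scope.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if the principal holds AcrPull at the given scope.
has_acr_pull() {
  local assignee=$1 scope=$2
  az role assignment list \
    --assignee "$assignee" \
    --scope "$scope" \
    --query "[].roleDefinitionName" \
    --output tsv | grep -qx "AcrPull"
}
```

For example: has_acr_pull "$KUBELET_IDENTITY" "$MY_ACR_REGISTRY_ID" && echo "AcrPull is assigned". Because the cluster was created with --attach-acr, which also grants AcrPull to the kubelet identity on the registry, this check can pass even before the explicit assignment.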
Configure kubectl to connect to your AKS cluster using the az aks get-credentials command.

az aks get-credentials --resource-group $MY_RESOURCE_GROUP_NAME --name $MY_CLUSTER_NAME --overwrite-existing --output table
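By default, az aks get-credentials names the new kubeconfig context after the cluster, so you can verify that kubectl now targets the right cluster before installing anything. This is a minimal sketch; check_kube_context is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: verify kubectl's current context is the expected cluster.
# By default, az aks get-credentials names the context after the cluster.
check_kube_context() {
  local expected=$1 current
  current=$(kubectl config current-context) || return 1
  if [[ $current != "$expected" ]]; then
    echo "error: current context is '$current', expected '$expected'" >&2
    return 1
  fi
}
```

For example: check_kube_context "$MY_CLUSTER_NAME" && kubectl get nodes -L topology.kubernetes.io/zone. The extra column shows how the three nodes are spread across the zones requested with --zones 1 2 3.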
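In this section, we import the container images that the Airflow deployment uses from public registries (Docker Hub, Quay, and registry.k8s.io) into your Azure Container Registry. The az acr import command copies each image directly from the source registry into yours, so nothing is pulled to your local machine. This step ensures that the images are available in your private registry for your AKS cluster. We don't recommend consuming public images directly in a production environment.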
Import the images into your container registry using the az acr import command.

az acr import --name $MY_ACR_REGISTRY --source docker.io/apache/airflow:airflow-pgbouncer-2024.01.19-1.21.0 --image airflow:airflow-pgbouncer-2024.01.19-1.21.0
az acr import --name $MY_ACR_REGISTRY --source docker.io/apache/airflow:airflow-pgbouncer-exporter-2024.06.18-0.17.0 --image airflow:airflow-pgbouncer-exporter-2024.06.18-0.17.0
az acr import --name $MY_ACR_REGISTRY --source docker.io/bitnami/postgresql:16.1.0-debian-11-r15 --image postgresql:16.1.0-debian-11-r15
az acr import --name $MY_ACR_REGISTRY --source quay.io/prometheus/statsd-exporter:v0.26.1 --image statsd-exporter:v0.26.1
az acr import --name $MY_ACR_REGISTRY --source docker.io/apache/airflow:2.9.3 --image airflow:2.9.3
az acr import --name $MY_ACR_REGISTRY --source registry.k8s.io/git-sync/git-sync:v4.1.0 --image git-sync:v4.1.0
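The six import commands differ only in their source and target, so a list-driven loop makes adding or bumping an image a one-line change. This sketch imports the same images; AIRFLOW_IMAGES and import_images are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: data-driven version of the az acr import commands in this step.
# Each entry is "<source image> <target repository:tag>".
# AIRFLOW_IMAGES and import_images are hypothetical names.
AIRFLOW_IMAGES=(
  "docker.io/apache/airflow:airflow-pgbouncer-2024.01.19-1.21.0 airflow:airflow-pgbouncer-2024.01.19-1.21.0"
  "docker.io/apache/airflow:airflow-pgbouncer-exporter-2024.06.18-0.17.0 airflow:airflow-pgbouncer-exporter-2024.06.18-0.17.0"
  "docker.io/bitnami/postgresql:16.1.0-debian-11-r15 postgresql:16.1.0-debian-11-r15"
  "quay.io/prometheus/statsd-exporter:v0.26.1 statsd-exporter:v0.26.1"
  "docker.io/apache/airflow:2.9.3 airflow:2.9.3"
  "registry.k8s.io/git-sync/git-sync:v4.1.0 git-sync:v4.1.0"
)

import_images() {
  local registry=$1 entry src dst
  for entry in "${AIRFLOW_IMAGES[@]}"; do
    # Split the entry into source image and target repository:tag.
    read -r src dst <<< "$entry"
    echo "Importing $src as $dst"
    # Stop at the first failure so a partial import is easy to spot.
    az acr import --name "$registry" --source "$src" --image "$dst" || return 1
  done
}
```

Run it with import_images "$MY_ACR_REGISTRY". Rerunning fails for tags that already exist in the registry; add --force to the az acr import call if you intend to overwrite them.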
Microsoft maintains this article. The following contributors originally wrote it:
- Don High | Principal Customer Engineer
- Satya Chandragiri | Senior Digital Cloud Solution Architect
- Erin Schaffer | Content Developer 2