In this article, you learn how to deploy example scheduler profiles in Azure Kubernetes Service (AKS) to configure advanced scheduling behavior using in-tree scheduling plugins. This guide also explains how to verify the successful application of custom scheduler profiles targeting specific node pools or the entire AKS cluster.
Limitations
- AKS currently doesn't manage the deployment of third-party schedulers or out-of-tree scheduling plugins.
- AKS doesn't support in-tree scheduling plugins targeting the `aks-system` scheduler. This restriction is in place to help prevent unexpected changes to AKS add-ons enabled on your cluster.
Prerequisites
- The Azure CLI version `2.76.0` or later. Run `az --version` to find the version, and run `az upgrade` to upgrade the version. If you need to install or upgrade, see Install Azure CLI.
- Kubernetes version `1.33` or later running on your AKS cluster.
- The `aks-preview` Azure CLI extension version `18.0.0b27` or later.
- The `UserDefinedSchedulerConfigurationPreview` feature flag registered in your Azure subscription.
- Review the supported advanced scheduling concepts and in-tree scheduling plugins on AKS.
Install the aks-preview Azure CLI extension
Important
AKS preview features are available on a self-service, opt-in basis. Previews are provided "as is" and "as available," and they're excluded from the service-level agreements and limited warranty. AKS previews are partially covered by customer support on a best-effort basis. As such, these features aren't meant for production use. For more information, see the following support articles:
Install the `aks-preview` extension using the `az extension add` command.

```shell
az extension add --name aks-preview
```

Update to the latest version of the `aks-preview` extension using the `az extension update` command.

```shell
az extension update --name aks-preview
```
Register the User Defined Scheduler Configuration Preview feature flag
Register the `UserDefinedSchedulerConfigurationPreview` feature flag using the `az feature register` command.

```shell
az feature register --namespace "Microsoft.ContainerService" --name "UserDefinedSchedulerConfigurationPreview"
```

It takes a few minutes for the status to show *Registered*. Verify the registration status using the `az feature show` command.

```shell
az feature show --namespace "Microsoft.ContainerService" --name "UserDefinedSchedulerConfigurationPreview"
```

When the status reflects *Registered*, refresh the registration of the Microsoft.ContainerService resource provider using the `az provider register` command.

```shell
az provider register --namespace "Microsoft.ContainerService"
```
Enable scheduler profile configuration on an AKS cluster
You can enable scheduler profile configuration on a new or existing AKS cluster.
Create an AKS cluster with scheduler profile configuration enabled using the `az aks create` command with the `--enable-upstream-kubescheduler-user-configuration` flag.

```shell
# Set environment variables
export RESOURCE_GROUP=<resource-group-name>
export CLUSTER_NAME=<aks-cluster-name>

# Create an AKS cluster with scheduler profile configuration enabled
az aks create \
    --resource-group $RESOURCE_GROUP \
    --name $CLUSTER_NAME \
    --enable-upstream-kubescheduler-user-configuration \
    --generate-ssh-keys
```

Once the creation process completes, connect to the cluster using the `az aks get-credentials` command.

```shell
az aks get-credentials --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME
```
Verify installation of the scheduler controller
After enabling the feature on your AKS cluster, verify that the custom resource definition (CRD) of the scheduler controller was successfully installed using the `kubectl get` command.

```shell
kubectl get crd schedulerconfigurations.aks.azure.com
```

Note

This command won't succeed if the feature wasn't successfully enabled in the previous section.
Configure node bin-packing
Node bin-packing is a scheduling strategy that maximizes resource utilization by increasing pod density on nodes, within the configured limits. This strategy helps improve cluster efficiency by minimizing wasted resources and lowering the operational cost of maintaining idle or underutilized nodes.

In this example, the configured scheduler prioritizes scheduling pods on nodes with high CPU usage. Specifically, this configuration avoids underutilizing nodes that still have free resources and helps make better use of the resources already allocated to nodes.
Create a file named `aks-scheduler-customization.yaml` and paste in the following manifest:

```yaml
apiVersion: aks.azure.com/v1alpha1
kind: SchedulerConfiguration
metadata:
  name: upstream
spec:
  profiles:
    - schedulerName: node-binpacking-scheduler
      pluginConfig:
        - name: NodeResourcesFit
          args:
            scoringStrategy:
              type: MostAllocated
              resources:
                - name: cpu
                  weight: 1
```

- `NodeResourcesFit` ensures that the scheduler checks whether a node has enough resources to run the pod.
- `scoringStrategy: MostAllocated` tells the scheduler to prefer nodes with high CPU resource usage. This helps achieve better resource utilization by placing new pods on nodes that are already "highly used".
- `resources` specifies that `cpu` is the primary resource considered for scoring. With a weight of `1`, CPU usage carries a relatively equal level of importance in the scheduling decision.
Apply the scheduling configuration manifest using the `kubectl apply` command.

```shell
kubectl apply -f aks-scheduler-customization.yaml
```

To target this scheduling mechanism for specific workloads, update your pod deployments with the following `schedulerName`:

```yaml
...
spec:
  schedulerName: node-binpacking-scheduler
...
```
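For instance, a complete Deployment that opts into the bin-packing profile might look like the following sketch. The `demo-binpack` name and the container image are hypothetical placeholders, not values from this guide:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-binpack   # hypothetical workload name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo-binpack
  template:
    metadata:
      labels:
        app: demo-binpack
    spec:
      # Route these pods through the custom bin-packing profile
      schedulerName: node-binpacking-scheduler
      containers:
        - name: web
          image: mcr.microsoft.com/azuredocs/aks-helloworld:v1   # example image
          resources:
            requests:
              cpu: 100m      # requests matter: NodeResourcesFit scores by allocation
              memory: 128Mi
```

Pods that omit `schedulerName` continue to be placed by the cluster's default scheduler profile.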
Configure pod topology spread
Pod topology spread is a scheduling strategy that distributes pods evenly across failure domains (such as availability zones or regions) to ensure high availability and fault tolerance in the event of zone or node failures. This strategy helps prevent all replicas of a pod from being placed in the same failure domain. For more configuration guidance, see the [Kubernetes Pod Topology Spread Constraints documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/).
Create a file named `aks-scheduler-customization.yaml` and paste in the following manifest:

```yaml
apiVersion: aks.azure.com/v1alpha1
kind: SchedulerConfiguration
metadata:
  name: upstream
spec:
  rawConfig: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
      - schedulerName: pod-distribution-scheduler
        pluginConfig:
          - name: PodTopologySpread
            args:
              apiVersion: kubescheduler.config.k8s.io/v1
              kind: PodTopologySpreadArgs
              defaultingType: List
              defaultConstraints:
                - maxSkew: 1
                  topologyKey: topology.kubernetes.io/zone
                  whenUnsatisfiable: ScheduleAnyway
```

- The `PodTopologySpread` plugin instructs the scheduler to distribute pods as evenly as possible across availability zones.
- `whenUnsatisfiable: ScheduleAnyway` tells the scheduler to schedule pods even when the topology constraints can't be met. This avoids pod scheduling failures when exact distribution isn't feasible.
- The `List` defaulting type applies the default constraints as a list of rules. The scheduler uses the rules in the order they're defined, and they apply to all pods that don't specify custom topology spread constraints.
- `maxSkew: 1` means the number of pods can differ by at most 1 between any two zones.
- `topologyKey: topology.kubernetes.io/zone` indicates that the scheduler should spread pods across availability zones.
Apply the scheduling configuration manifest using the `kubectl apply` command.

```shell
kubectl apply -f aks-scheduler-customization.yaml
```

To target this scheduling mechanism for specific workloads, update your pod deployments with the following `schedulerName`:

```yaml
...
spec:
  schedulerName: pod-distribution-scheduler
...
```
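Because the profile's default constraints apply only to pods that don't declare their own, an individual pod can still override them. The sketch below uses a hypothetical pod name and image, with a stricter constraint than the profile default:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-spread   # hypothetical pod name
  labels:
    app: demo-spread
spec:
  schedulerName: pod-distribution-scheduler
  # Explicit constraints take precedence over the profile's defaultConstraints
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule   # stricter than the profile's ScheduleAnyway default
      labelSelector:
        matchLabels:
          app: demo-spread
  containers:
    - name: app
      image: mcr.microsoft.com/azuredocs/aks-helloworld:v1   # example image
```

With `DoNotSchedule`, this pod stays Pending rather than violating the zonal spread, so use it only for workloads where strict distribution outweighs scheduling latency.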
Assign a scheduler profile to an entire AKS cluster
In your scheduler profile configuration, update the
schedulerNamefield as follows:... ... `- schedulerName: default_scheduler` ... ...Reapply the manifest using the
kubectl applycommand.kubectl apply -f aks-scheduler-customization.yamlNow, this configuration will become the default scheduling operation for your entire AKS cluster.
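Once the profile is the cluster default, workloads no longer need to set `schedulerName` to pick up the custom behavior. A minimal sketch, with a hypothetical name and image:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-default   # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-default
  template:
    metadata:
      labels:
        app: demo-default
    spec:
      # No schedulerName: pods use the cluster's default scheduler profile,
      # which now carries the custom configuration you applied.
      containers:
        - name: web
          image: mcr.microsoft.com/azuredocs/aks-helloworld:v1   # example image
```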
Configure multiple scheduler profiles
You can customize the upstream scheduler with multiple profiles, and customize each profile with multiple plugins, while using the same configuration file. In the following example, we create two scheduling profiles called scheduler-one and scheduler-two.

scheduler-one prioritizes placing pods across zones and nodes for balanced distribution with the following settings:

- Enforces strict zonal distribution and preferred node distribution using `PodTopologySpread`.
- Honors hard pod affinity rules and considers soft affinity rules with `InterPodAffinity`.
- Prefers nodes in specific zones to reduce cross-zone networking using `NodeAffinity`.

scheduler-two prioritizes placing pods on nodes with available storage, CPU, and memory for timely, resource-efficient usage with the following settings:

- Ensures pods are placed on nodes where PVCs can bind to PVs using `VolumeBinding`.
- Validates that nodes and volumes satisfy zonal requirements using `VolumeZone` to avoid cross-zone storage access.
- Prioritizes nodes based on CPU, memory, and ephemeral storage utilization with `NodeResourcesFit`.
- Favors nodes that already have the required container images using `ImageLocality`.
Note
You might need to adjust zones and other parameters based on your workload type.
Create a file named `aks-scheduler-customization.yaml` and paste in the following manifest:

```yaml
apiVersion: aks.azure.com/v1alpha1
kind: SchedulerConfiguration
metadata:
  name: upstream
spec:
  rawConfig: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    percentageOfNodesToScore: 40
    podInitialBackoffSeconds: 1
    podMaxBackoffSeconds: 8
    profiles:
      - schedulerName: scheduler-one
        plugins:
          multiPoint:
            enabled:
              - name: PodTopologySpread
              - name: InterPodAffinity
              - name: NodeAffinity
        pluginConfig:
          # PodTopologySpread with strict zonal distribution
          - name: PodTopologySpread
            args:
              defaultingType: List
              defaultConstraints:
                - maxSkew: 2
                  topologyKey: topology.kubernetes.io/zone
                  whenUnsatisfiable: DoNotSchedule
                - maxSkew: 1
                  topologyKey: kubernetes.io/hostname
                  whenUnsatisfiable: ScheduleAnyway
          - name: InterPodAffinity
            args:
              hardPodAffinityWeight: 1
              ignorePreferredTermsOfExistingPods: false
          - name: NodeAffinity
            args:
              addedAffinity:
                preferredDuringSchedulingIgnoredDuringExecution:
                  - weight: 100
                    preference:
                      matchExpressions:
                        - key: topology.kubernetes.io/zone
                          operator: In
                          values: [chinanorth3-1, chinanorth3-2, chinanorth3-3]
      - schedulerName: scheduler-two
        plugins:
          multiPoint:
            enabled:
              - name: VolumeBinding
              - name: VolumeZone
              - name: NodeAffinity
              - name: NodeResourcesFit
              - name: PodTopologySpread
              - name: ImageLocality
        pluginConfig:
          - name: PodTopologySpread
            args:
              defaultingType: List
              defaultConstraints:
                - maxSkew: 1
                  topologyKey: kubernetes.io/hostname
                  whenUnsatisfiable: DoNotSchedule
          - name: VolumeBinding
            args:
              apiVersion: kubescheduler.config.k8s.io/v1
              kind: VolumeBindingArgs
              bindTimeoutSeconds: 300
          - name: NodeAffinity
            args:
              apiVersion: kubescheduler.config.k8s.io/v1
              kind: NodeAffinityArgs
              addedAffinity:
                preferredDuringSchedulingIgnoredDuringExecution:
                  - weight: 100
                    preference:
                      matchExpressions:
                        - key: topology.kubernetes.io/zone
                          operator: In
                          values: [chinanorth3-1, chinanorth3-2]
          - name: NodeResourcesFit
            args:
              apiVersion: kubescheduler.config.k8s.io/v1
              kind: NodeResourcesFitArgs
              scoringStrategy:
                type: MostAllocated
                resources:
                  - name: cpu
                    weight: 3
                  - name: memory
                    weight: 1
                  - name: ephemeral-storage
                    weight: 2
```

Apply the manifest using the `kubectl apply` command.

```shell
kubectl apply -f aks-scheduler-customization.yaml
```
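As a sketch of how a workload would target one of these profiles, the hypothetical StatefulSet below uses `scheduler-two` so that its pods land on nodes where the persistent volume claim can bind and storage, CPU, and memory are available. The `demo-storage` name, image, and `managed-csi` storage class are assumptions for illustration:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo-storage   # hypothetical workload name
spec:
  serviceName: demo-storage
  replicas: 2
  selector:
    matchLabels:
      app: demo-storage
  template:
    metadata:
      labels:
        app: demo-storage
    spec:
      # Use the storage- and resource-aware profile defined above
      schedulerName: scheduler-two
      containers:
        - name: db
          image: mcr.microsoft.com/azuredocs/aks-helloworld:v1   # example image
          volumeMounts:
            - name: data
              mountPath: /var/lib/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: managed-csi   # assumed AKS storage class
        resources:
          requests:
            storage: 1Gi
```

A similar pattern applies to zone-balanced stateless workloads, which would set `schedulerName: scheduler-one` instead.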
Disable an AKS scheduler profile configuration
To disable the AKS scheduler profile configuration and revert to the default AKS scheduler configuration on the cluster, first delete the `schedulerconfiguration` resource using the `kubectl delete` command.

```shell
kubectl delete schedulerconfiguration upstream || true
```

Note

Ensure that the previous step is complete and confirm that the `schedulerconfiguration` resource was deleted before proceeding to disable this feature.

Disable the feature using the `az aks update` command with the `--disable-upstream-kubescheduler-user-configuration` flag.

```shell
az aks update --subscription="${SUBSCRIPTION_ID}" \
    --resource-group="${RESOURCE_GROUP}" \
    --name="${CLUSTER_NAME}" \
    --disable-upstream-kubescheduler-user-configuration
```

Verify the feature is disabled using the `az aks show` command.

```shell
az aks show --resource-group="${RESOURCE_GROUP}" \
    --name="${CLUSTER_NAME}" \
    --query='properties.schedulerProfile'
```

Your output should indicate that the feature is no longer enabled on your AKS cluster.
Frequently asked questions (FAQ)
What happens if I apply a misconfigured scheduler profile to my AKS cluster?
Once you apply a scheduler profile, AKS checks if it contains a valid configuration of plugins and arguments. If the configuration targets a disallowed scheduler or sets the in-tree scheduling plugins improperly, AKS rejects the configuration and reverts to the last known "accepted" scheduler configuration. This check aims to limit impact on new and existing AKS clusters due to scheduler misconfiguration.
How can I monitor and validate that the scheduler honored my configuration?
There are three recommended methods for observing the results of your applied scheduler profile:
- View the AKS `kube-scheduler` control plane logs to ensure that the scheduler received the configuration from the CRD.
- Run the `kubectl get schedulerconfiguration` command. The output displays a status of `Pending` during the rollout, and `Succeeded` or `Failed` after the scheduler accepts or rejects the configuration.
- Run the `kubectl describe schedulerconfiguration` command. The output displays a more detailed state of the scheduler, including any errors during reconciliation and the current scheduler configuration in effect.
Next steps
To learn more about the AKS scheduler and best practices, see the following resources: