Schedule and deploy batch jobs with Kueue on Azure Kubernetes Service (AKS)

In this article, you learn how to schedule and deploy sample batch jobs on Azure Kubernetes Service (AKS) using Kueue. This guide covers configuring ResourceFlavors and ClusterQueues for fine-grained resource management and submitting jobs through LocalQueues. You also learn how to use Kueue to queue a sample batch job and track it across the Pending, Running, and Finished states.

Important

Open-source software is mentioned throughout AKS documentation and samples. Software that you deploy is excluded from AKS service-level agreements, limited warranty, and Azure support. As you use open-source technology alongside AKS, consult the support options available from the respective communities and project maintainers to develop a plan.

For example, the Ray GitHub repository describes several platforms that vary in response time, purpose, and support level.

Microsoft takes responsibility for building the open-source packages that we deploy on AKS. That responsibility includes having complete ownership of the build, scan, sign, validate, and hotfix process, along with control over the binaries in container images. For more information, see Vulnerability management for AKS and AKS support coverage.

To learn more about Kueue and common use cases for batch workload administrators and users, see Kueue overview on AKS.

Prerequisites

  • An AKS cluster with Kueue installed. To install Kueue, see the Kueue overview on AKS guidance.
  • The kubectl command-line tool installed and configured to connect to your cluster.

Define a ResourceFlavor object

In Kueue, a ResourceFlavor enables fine-grained resource management by associating workloads with specific nodes, taints, tolerations, or availability zones. For nodes, a ResourceFlavor can define characteristics like pricing, availability, brand, model, and architecture (that is, x86 versus ARM CPUs). A ClusterQueue uses these flavors to manage quotas and admission policies for workloads.

This configuration defines a ResourceFlavor without any labels or taints, known as an empty ResourceFlavor. An empty ResourceFlavor is suitable when you don't need to manage separate quotas for different flavors.
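
By contrast, a ResourceFlavor can target specific nodes through node labels and taints. The following sketch assumes an AKS Spot node pool, which carries the kubernetes.azure.com/scalesetpriority label and taint by default; adjust the values to match your own node pools:

    apiVersion: kueue.x-k8s.io/v1beta1
    kind: ResourceFlavor
    metadata:
      name: spot # Hypothetical flavor for Spot node pools
    spec:
      nodeLabels:
        kubernetes.azure.com/scalesetpriority: spot
      nodeTaints:
        - key: kubernetes.azure.com/scalesetpriority
          value: spot
          effect: NoSchedule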

  1. Create and save a ResourceFlavor in a file named resourceflavor-sample.yaml with the following manifest:

    cat << EOF > resourceflavor-sample.yaml
    apiVersion: kueue.x-k8s.io/v1beta1
    kind: ResourceFlavor
    metadata:
      name: on-demand
    EOF
    
  2. Apply the ResourceFlavor manifest using the kubectl apply command.

    kubectl apply -f resourceflavor-sample.yaml
    
  3. Verify the ResourceFlavor manifest was applied using the kubectl get command.

    kubectl get resourceflavors
    

    Example output

    NAME        AGE
    on-demand   5m32s
    

Create a ClusterQueue

A ClusterQueue is a cluster-scoped resource that governs a pool of resources, defining usage limits and Fair Sharing rules. Where applicable, Fair Sharing rules allow another ClusterQueue in the same cohort to borrow unused quota for its pending jobs. Each ClusterQueue specifies which flavors it supports and how much quota is available for each.

This sample ClusterQueue defines:

  • namespaceSelector: {}: Indicates that sample-jobs accepts workloads from any namespace that references this ClusterQueue via a LocalQueue. You can restrict usage, for example to only team A's namespace, with a label selector.
  • coveredResources: ["cpu", "memory"] in resourceGroups: Defines the standard CPU and memory resource types managed by this ClusterQueue.
  • flavor of on-demand nodes with 4 CPUs, 8Gi memory: Only workloads scheduled on on-demand nodes consume this quota. If the cluster uses up this quota, Kueue doesn't admit any other workloads using this flavor unless you allow borrowing from the cohort (see the sketch after this list).
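
Borrowing is configured with a borrowingLimit on a flavor's resources. The following sketch shows a hypothetical second ClusterQueue, team-b, in the same general cohort that can borrow up to 2 additional CPUs of unused quota from other queues in the cohort:

    apiVersion: kueue.x-k8s.io/v1beta1
    kind: ClusterQueue
    metadata:
      name: team-b # Hypothetical queue for a second team
    spec:
      cohort: general
      namespaceSelector: {}
      resourceGroups:
        - coveredResources: ["cpu", "memory"]
          flavors:
            - name: on-demand
              resources:
                - name: "cpu"
                  nominalQuota: 4
                  borrowingLimit: 2 # Can borrow up to 2 extra CPUs from the cohort
                - name: "memory"
                  nominalQuota: 8Gi
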
  1. Create and save a Kueue ClusterQueue in a file named clusterqueue-sample.yaml with the following manifest:

    cat <<EOF > clusterqueue-sample.yaml
    apiVersion: kueue.x-k8s.io/v1beta1
    kind: ClusterQueue
    metadata:
      name: sample-jobs
    spec:
      cohort: general
      namespaceSelector: {} # Accept workloads from any namespace
      resourceGroups:
        - coveredResources: ["cpu", "memory"]
          flavors:
            - name: on-demand
              resources:
                - name: "cpu"
                  nominalQuota: 4
                - name: "memory"
                  nominalQuota: 8Gi
    EOF
    
  2. Apply the ClusterQueue manifest using the kubectl apply command.

    kubectl apply -f clusterqueue-sample.yaml
    
  3. Verify the ClusterQueue manifest was applied using the kubectl get command.

    kubectl get clusterqueues
    

    Example output

    NAME          COHORT    PENDING WORKLOADS
    sample-jobs   general   0
    

Note

The ClusterQueue isn't ready for use until a ResourceFlavor object is configured. If you create a ClusterQueue without any existing ResourceFlavor, workloads referencing it are marked as Inadmissible.
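
To check whether a ClusterQueue is ready to admit workloads, you can inspect its status conditions using the kubectl describe command; the Active condition should indicate that the queue can admit workloads once the referenced ResourceFlavor exists:

    kubectl describe clusterqueue sample-jobs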

Create a LocalQueue

A LocalQueue is a namespace-scoped resource that acts as a gateway for users to submit jobs. A LocalQueue is assigned to one ClusterQueue from which resources are allocated to run its workloads.

This sample LocalQueue configures the following settings:

  • Enables users in the batch-jobs namespace to submit batch workloads to Kueue.
  • Routes the batch workloads to the sample-jobs ClusterQueue, which manages the actual compute resource quotas and scheduling policies.
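
Because LocalQueues are namespace-scoped, multiple teams can share a single ClusterQueue's quota through their own gateways. As a sketch, a hypothetical team-b namespace could reference the same ClusterQueue like this:

    apiVersion: kueue.x-k8s.io/v1beta1
    kind: LocalQueue
    metadata:
      name: team-b-queue # Hypothetical queue name
      namespace: team-b  # Hypothetical namespace
    spec:
      clusterQueue: sample-jobs
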
  1. Create a namespace named batch-jobs using the kubectl create command.

    kubectl create ns batch-jobs
    
  2. Create and save a LocalQueue in a file named localqueue-sample.yaml with the following YAML manifest:

    cat <<EOF > localqueue-sample.yaml
    apiVersion: kueue.x-k8s.io/v1beta1
    kind: LocalQueue
    metadata:
      name: sample-queue
      namespace: batch-jobs
    spec:
      clusterQueue: sample-jobs
    EOF
    
  3. Apply the LocalQueue manifest using the kubectl apply command.

    kubectl apply -f localqueue-sample.yaml
    
  4. Verify the LocalQueue manifest was applied using the kubectl get command.

    kubectl get localqueues --all-namespaces
    

     Example output

    NAMESPACE    NAME           CLUSTERQUEUE   PENDING WORKLOADS   ADMITTED WORKLOADS
    batch-jobs   sample-queue   sample-jobs    0                   0
    

Create two batch jobs

This configuration defines two Kubernetes batch jobs submitted to the batch-jobs namespace and assigned to the sample-queue LocalQueue managed by Kueue. Both jobs are single-instance (parallelism: 1, completions: 1) and use a restartPolicy of Never. The parallelism and completions fields control how many pods run at once and when the job counts as complete: with both set to 1, each job runs one pod at a time and is marked complete once that pod finishes successfully (for contrast, see the fragment after the following list).

  • Job test-batch-1: Requests one CPU and 500Mi memory
  • Job test-batch-2: Requests two CPUs and 1Gi memory
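
For contrast, a job spec fragment like the following sketch (values hypothetical) would run up to two pods concurrently until four pods finish successfully:

    spec:
      parallelism: 2 # Up to two pods run at the same time
      completions: 4 # The job completes after four pods succeed
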
  1. Create and save the two sample batch jobs for the batch-jobs namespace in a file named batch-workloads.yaml with the following manifest:

    cat <<EOF > batch-workloads.yaml
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: test-batch-1
      namespace: batch-jobs
      labels:
        kueue.x-k8s.io/queue-name: sample-queue
    spec:
      parallelism: 1
      completions: 1
      template:
        spec:
          containers:
            - name: dummy-job
              image: registry.k8s.io/e2e-test-images/agnhost:2.53
              command: ["sh", "-c", "echo Running test-batch-1; sleep 60"]
              resources:
                requests:
                  cpu: "1"
                  memory: "500Mi"
                limits:
                  cpu: "1"
                  memory: "500Mi"
          restartPolicy: Never
    ---
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: test-batch-2
      namespace: batch-jobs
      labels:
        kueue.x-k8s.io/queue-name: sample-queue
    spec:
      parallelism: 1
      completions: 1
      template:
        spec:
          containers:
            - name: dummy-job
              image: registry.k8s.io/e2e-test-images/agnhost:2.53
              command: ["sh", "-c", "echo Waiting in queue for CPUs...; sleep 30"]
              resources:
                requests:
                  cpu: "2"
                  memory: "1Gi"
                limits:
                  cpu: "2"
                  memory: "1Gi"
          restartPolicy: Never
    EOF
    
  2. Apply the manifest for the batch jobs using the kubectl apply command.

    kubectl apply -f batch-workloads.yaml
    

Verify the batch jobs are submitted to the LocalQueue

  1. View the status of the batch workloads using the kubectl get command.

    kubectl get workloads --namespace batch-jobs
    

    Example output

     NAME                     ADMITTED   AGE
     job-test-batch-1-6fb85   True       10s
     job-test-batch-2-84f49   False      5s
    
  2. While test-batch-2 is in a Pending state, inspect its Workload object (job-test-batch-2-84f49 in the previous example output) using the kubectl get command.

    kubectl get workload job-test-batch-2-84f49 --namespace batch-jobs -o yaml
    

     Example output

    ...
    ...
    Status:
      Conditions:
        Type:              Admitted
        Status:            False
        Reason:            QuotaUnavailable
        Message:           Insufficient quota in ClusterQueue sample-jobs (flavor on-demand): requested 2 CPUs, available 1
    ...
    ...
    

    After test-batch-1 completes, test-batch-2 will be admitted and run.
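
     To observe the transition as it happens, you can watch the Workload objects until test-batch-2 is admitted:

    kubectl get workloads --namespace batch-jobs --watch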

     Now, the workload status should look like the following example:

    Status:
      Conditions:
        Type:              Admitted
        Status:            True
        Last Transition Time:  1234-56-78T00:00:00Z
      Admission:
        ClusterQueue:      sample-jobs
        PodSetAssignments:
          Name:            main
          Flavors:
            cpu:           on-demand
            memory:        on-demand
          ResourceUsage:
            cpu:           2
            memory:        1Gi
    
  3. View the final status of the batch-jobs namespace using the kubectl get command.

    kubectl get job,deploy,rs,pod,workload --namespace batch-jobs
    

    Example output

    NAME                     STATUS     COMPLETIONS   DURATION   AGE
    job.batch/test-batch-1   Complete   1/1           97s        3m15s
    job.batch/test-batch-2   Complete   1/1           35s        3m15s
    
    NAME                     READY   STATUS      RESTARTS   AGE
    pod/test-batch-1-hb8zl   0/1     Completed   0          3m15s
    pod/test-batch-2-dx9hk   0/1     Completed   0          3m15s
    
    NAME                                             QUEUE          RESERVED IN   ADMITTED   FINISHED   AGE
    workload.kueue.x-k8s.io/job-test-batch-1-6fb85   sample-queue   sample-jobs   True       True       3m15s
    workload.kueue.x-k8s.io/job-test-batch-2-84f49   sample-queue   sample-jobs   True       True       3m15s
    

FAQ

Question 1: How can I confirm that the Kueue controller is available and running as expected?

  1. Confirm the Kueue controller manager pod is running using the kubectl get command.

    kubectl get pods --namespace kueue-system
    

    The Kueue controller manager pod should be in a Running state with 1/1 containers ready, as shown in the following example output:

     NAME                               READY   STATUS    RESTARTS   AGE
     kueue-controller-manager-xxxxxxx   1/1     Running   0          2m
    
  2. If the pod status shows CrashLoopBackOff or Pending, check the deployment logs using the kubectl logs command.

    kubectl logs --namespace kueue-system deployment/kueue-controller-manager
    

Question 2: One or more of the Kueue custom resources (CRDs) are missing when I install via Helm. How can I ensure all of the CRDs are installed?

  1. After installing Kueue with the Kueue overview on AKS guidance, confirm that all of the CRDs are installed using the kubectl get command.

    kubectl get crds | grep kueue
    

    These CRDs should be listed, as shown in the following example output:

    admissionchecks.kueue.x-k8s.io
    clusterqueues.kueue.x-k8s.io
    cohorts.kueue.x-k8s.io
    localqueues.kueue.x-k8s.io
    multikueueclusters.kueue.x-k8s.io
    multikueueconfigs.kueue.x-k8s.io
    provisioningrequestconfigs.kueue.x-k8s.io
    resourceflavors.kueue.x-k8s.io
    topologies.kueue.x-k8s.io
    workloadpriorityclasses.kueue.x-k8s.io
    workloads.kueue.x-k8s.io
    
  2. If one or more of the CRDs are missing, you might see errors in controller logs, failed job queuing, CrashLoopBackOff for the controller, or inability to admit or schedule workloads. In this case, you can manually reinstall the Kueue CRDs using the kubectl apply command.

    kubectl apply -f https://github.com/kubernetes-sigs/kueue/releases/latest/download/kueue-crds.yaml
    

    Note

     If you manually install the CRDs, you need to manually delete them once you're finished, using the kubectl delete command.
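
     For example, you can delete them from the same manifest you applied earlier:

    kubectl delete -f https://github.com/kubernetes-sigs/kueue/releases/latest/download/kueue-crds.yaml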

Question 3: What's the difference between a LocalQueue and a ClusterQueue?

A ClusterQueue is a cluster-scoped resource that defines and governs a pool of compute resources, like CPU, memory, pods, and accelerators, across the entire Kubernetes cluster. A LocalQueue is a namespace-scoped resource that acts as a gateway for users to submit jobs from their own namespace. This separation allows for fine-grained control over resource allocation and multi-tenant scheduling without exposing cluster-wide quotas directly to users.

How they work together:

  1. A user submits a job to a LocalQueue in their namespace.
  2. Kueue routes the job to the referenced ClusterQueue.
  3. The ClusterQueue checks resource availability and quota limits.
  4. If admitted, the job is unsuspended and scheduled.
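
For batch jobs, Kueue implements step 4 through the Job API's suspend field: jobs start suspended and are unsuspended once admitted. As a quick check, you can read this field on one of the jobs from earlier:

    kubectl get job test-batch-2 --namespace batch-jobs -o jsonpath='{.spec.suspend}'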

Next steps

In this article, you:

  • Verified your Kueue installation on your Azure Kubernetes Service (AKS) cluster, including CRDs, controller health, and namespace setup.
  • Configured ClusterQueue and LocalQueue for general-purpose workloads with resource quotas and flavors (such as on-demand).
  • Submitted two batch jobs to demonstrate queuing: one admitted immediately, the second held due to quota limits, then admitted when resources became available.
  • Monitored workload status and controller logs to confirm scheduling behavior and queuing logic.

To learn more about Kueue, visit the following resources: