Ephemeral NVMe data disks provide high-performance, low-latency storage that's ideal for demanding workloads running on Azure Kubernetes Service (AKS). Many modern applications, such as AI/ML training, data analytics, and high-throughput databases, require fast temporary storage to process large volumes of intermediate data efficiently. By using ephemeral NVMe disks, you can significantly improve application responsiveness and throughput, while optimizing for cost and scalability in your AKS clusters.
In contrast to remote disks, whose performance scales with the size of the virtual machine (VM), ephemeral NVMe disks maintain full performance regardless of vCPU count because they're physically attached to the VM and don't rely on a remote disk controller. The difference is notable:
- Ultra Disk: Achieving 400,000 IOPS requires a 112-vCPU VM (for example, Standard_E112ibds_v5).
- Local NVMe: An 8-vCPU VM (for example, Standard_L8s_v3) can deliver 400,000 IOPS.
The local NVMe configuration delivers equivalent IOPS with roughly one-fourteenth the vCPUs (8 versus 112), a substantial reduction in compute resource requirements.
This best practices article focuses on storage considerations for cluster operators. In this article, you learn:
- Common scenarios where ephemeral NVMe data disks provide performance benefits.
- How to identify which VM sizes support ephemeral NVMe data disks.
- How to use ephemeral NVMe data disks for your Kubernetes workloads.
- How ephemeral NVMe data disks work when your AKS nodes use ephemeral OS disks.
- How to measure the performance of your workloads using ephemeral NVMe data disks.
Common scenarios for high-performance workloads
Ephemeral NVMe data disks are ideal for workloads that demand high throughput, low latency, and fast access to temporary or intermediate data. The following scenarios highlight where local NVMe disks provide the most significant benefits:
High-performance databases (for example, PostgreSQL)
For databases such as PostgreSQL, especially in high-availability (HA) or read-intensive deployments, local NVMe disks can dramatically improve transaction throughput and reduce query latency. When used for temporary tablespaces, write-ahead logs (WAL), or as a cache layer, NVMe disks help offload I/O from persistent storage, accelerating analytics and transactional workloads.
Best practices:
- Use NVMe-backed volumes for PostgreSQL temp directories and WAL logs to maximize IOPS and minimize latency.
- For HA scenarios, ensure that persistent data directories remain on durable storage, while using NVMe for non-persistent, high-churn data.
- See PostgreSQL HA on AKS for architecture guidance.
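As a minimal sketch of the first practice, the following pod mounts an NVMe-backed emptyDir volume for a PostgreSQL temporary tablespace. The mount path, image tag, and tablespace setup are illustrative assumptions, not a production configuration:

```bash
# Sketch: NVMe-backed emptyDir for PostgreSQL temporary data (illustrative only).
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: postgres-nvme-temp
spec:
  securityContext:
    fsGroup: 999                # postgres group in the official image
  containers:
  - name: postgres
    image: postgres:16
    env:
    - name: POSTGRES_PASSWORD
      value: example-password   # Use a Secret in real deployments
    volumeMounts:
    - name: pgtemp
      mountPath: /pgtemp        # Then: CREATE TABLESPACE fast_tmp LOCATION '/pgtemp';
  volumes:                      # and SET temp_tablespaces = 'fast_tmp';
  - name: pgtemp
    emptyDir: {}                # Node-local storage; NVMe-backed when present
EOF
```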
AI model hosting and inference (for example, KAITO)
AI model serving platforms like KAITO benefit from NVMe disks for rapid model loading, artifact caching, and high-throughput inference. When models are stored as Open Container Initiative (OCI) artifacts and loaded on demand, local NVMe storage ensures minimal cold start times and efficient batch processing.
Best practices:
- Use NVMe-backed volumes for model cache directories to accelerate model pulls and reduce inference latency.
- For distributed inference, ensure each node has sufficient NVMe capacity to cache frequently used models.
- Integrate with Kubernetes-native storage solutions (for example, Azure Container Storage) for automated management and monitoring.
- See KAITO model as OCI artifacts for architecture guidance.
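As an illustration only (the image names, artifact reference, and cache path below are placeholders rather than KAITO's actual configuration), an init container can warm a node-local model cache before the serving container starts:

```bash
# Sketch: pre-pull an OCI model artifact into a node-local cache (names are placeholders).
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: model-server
spec:
  initContainers:
  - name: fetch-model
    image: ghcr.io/oras-project/oras:v1.2.0    # ORAS CLI; tag is an example
    args: ["pull", "myregistry.azurecr.io/models/llm:v1", "-o", "/model-cache"]
    volumeMounts:
    - name: model-cache
      mountPath: /model-cache
  containers:
  - name: server
    image: my-inference-server:latest          # Placeholder serving image
    volumeMounts:
    - name: model-cache
      mountPath: /model-cache                  # Fast reads at startup
  volumes:
  - name: model-cache
    emptyDir: {}   # Node-local (NVMe-backed) cache; repopulated after reschedule
EOF
```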
Data analytics and ETL pipelines
Workloads that process large volumes of intermediate data, such as Spark, Dask, or custom ETL jobs, can use NVMe disks for shuffle storage, temporary files, and scratch space. This approach reduces bottlenecks during data transformation and aggregation.
Best practices:
- Configure shuffle and temp directories to use NVMe-backed storage.
- Clean up temporary data promptly to maximize available space.
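For example, a Spark job can point its scratch directories at an NVMe mount; /mnt/nvme below is an assumed path that depends on how the disks are formatted and mounted on your nodes:

```bash
# Sketch: direct Spark shuffle and scratch I/O to an assumed NVMe mount point.
# spark.local.dir controls where shuffle and spill files are written.
spark-submit \
  --conf spark.local.dir=/mnt/nvme/spark-scratch \
  my_etl_job.py   # Placeholder job script
```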
Caching layers and key-value stores
Caching solutions and key-value stores (for example, Redis, Memcached, RocksDB) can use NVMe disks as a fast persistence layer or for overflow storage, balancing speed and durability.
Best practices:
- Use NVMe for write-heavy cache workloads where persistence isn't critical.
- Monitor disk usage to avoid eviction or data loss due to node restarts.
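As one hedged example, a Redis pod can keep its append-only file on an NVMe-backed emptyDir, accepting that the data is lost if the pod moves to another node:

```bash
# Sketch: Redis append-only persistence on node-local NVMe storage (cache use only).
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: redis-nvme
spec:
  containers:
  - name: redis
    image: redis:7
    args: ["--appendonly", "yes", "--dir", "/data"]   # Persist the AOF under /data
    volumeMounts:
    - name: redis-data
      mountPath: /data
  volumes:
  - name: redis-data
    emptyDir: {}   # Fast, but not durable across node replacement
EOF
```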
High-performance computing (HPC) and simulation
HPC workloads, including genomics, financial modeling, and scientific simulations, often require rapid access to large datasets and scratch space for intermediate results. NVMe disks provide the necessary bandwidth and low latency for these scenarios.
Check VM sizes with ephemeral NVMe data disks
Ephemeral NVMe data disks are available on select Azure VM sizes that offer local, high-performance storage directly attached to the physical host. These disks are ideal for temporary data, such as caches, scratch files, or intermediate processing, and aren't persisted after a VM is deallocated or stopped. The number and capacity of NVMe disks vary by VM size and family.
To determine which VM sizes support ephemeral NVMe data disks and their configurations, refer to the Azure VM documentation and the AKS supported VM sizes. Look for VM series such as Lsv4 and Ddsv6, which are designed for high-throughput, low-latency workloads.
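If you prefer to query programmatically, the following Azure CLI sketch lists the VM sizes in a region that report the NvmeDiskSizeInMiB capability; adjust the location for your environment:

```bash
# List VM sizes in a region that report local NVMe capacity.
locationName="chinanorth3"

az vm list-skus --resource-type virtualMachines --location $locationName \
  --query "[?capabilities[?name=='NvmeDiskSizeInMiB']].{
    SkuName: name,
    NvmeDiskSizeInMiB: capabilities[?name=='NvmeDiskSizeInMiB'] | [0].value
  }" -o table
```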
The following table lists example VM sizes and their NVMe disk configurations:
| VM Size | Number of NVMe Disks | Total NVMe Capacity (GiB) |
|---|---|---|
| Standard_L4s_v4 | 2 | 894 |
| Standard_L8s_v4 | 4 | 1,788 |
| Standard_L96s_v4 | 12 | 21,456 |
| Standard_D16ds_v6 | 2 | 880 |
| Standard_D32ds_v6 | 4 | 1,760 |
| Standard_D96ds_v6 | 6 | 5,280 |
For AI workloads that require GPU acceleration, consider VM sizes in the NC, ND, and NV series. Some GPU-enabled VM sizes, such as Standard_NC48ads_A100_v4 and Standard_ND96isr_H100_v5, offer local NVMe storage in addition to powerful GPUs. These VMs are suitable for AI training, inference, and other compute-intensive scenarios where both GPU and fast local storage are needed.
Example GPU VM sizes with NVMe disks:
| VM Size | GPU Type | Number of NVMe Disks | Total NVMe Capacity (GiB) |
|---|---|---|---|
| Standard_NC48ads_A100_v4 | 2 x A100 | 2 | 1,788 |
| Standard_NC96ads_A100_v4 | 4 x A100 | 4 | 3,576 |
| Standard_ND96isr_H100_v5 | 8 x H100 | 8 | 28,610 |
| Standard_ND96isr_H200_v5 | 8 x H200 | 8 | 28,610 |
Note
Actual NVMe disk capacity and number might vary by region and VM generation. Not all GPU VM sizes include local NVMe storage. Always verify the latest VM specifications and NVMe disk availability in the Azure documentation, as configurations might change.
Validate the ephemeral NVMe data disk configuration
To confirm that your AKS node is provisioned with ephemeral NVMe data disks, validate the configuration by using the Azure CLI or by inspecting the node directly.
Option 1: Use Azure CLI to check NVMe disk configuration
You can use the Azure CLI to inspect the VM size and attached NVMe disks with the following sample commands.
# Modify location and VM size if needed
locationName="chinanorth3"
vmSize="Standard_L8s_v4"
az vm list-skus --resource-type virtualMachines --location $locationName \
--query "[?name=='$vmSize'].{
SkuName: name,
NvmeDiskSizeInMiB: capabilities[?name=='NvmeDiskSizeInMiB'] | [0].value,
NvmeSizePerDiskInMiB: capabilities[?name=='NvmeSizePerDiskInMiB'] | [0].value
}" -o table
SkuName          NvmeDiskSizeInMiB    NvmeSizePerDiskInMiB
---------------  -------------------  ----------------------
Standard_L8s_v4  1830912              457728
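In this output, NvmeDiskSizeInMiB is the total local NVMe capacity and NvmeSizePerDiskInMiB is the per-disk capacity: 1830912 MiB / 1024 = 1788 GiB in total, spread across 1830912 / 457728 = 4 disks, which matches the Standard_L8s_v4 row in the earlier table.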
Option 2: Use lsblk to check disk and mount layout on the node
Log in to an AKS node:
kubectl get nodes
# Modify the node name from the above list as needed
nodeName="aks-myworkload-22647054-vmss000000"
# Use your preferred approach to log in to the node; kubectl debug is one option.
kubectl debug "node/$nodeName" \
--image=ubuntu \
--profile=sysadmin -it \
-- chroot /host /bin/bash
Once connected, use lsblk to list block devices and identify NVMe disks:
lsblk -o NAME,HCTL,SIZE,MOUNTPOINT,MODEL
NAME     HCTL     SIZE  MOUNTPOINT  MODEL
sr0      0:0:0:2  750K              Virtual DVD-ROM
nvme0n1           110G              Microsoft NVMe Direct Disk v2
NVMe disks typically appear as nvme*n1 and report a model of Microsoft NVMe Direct Disk*. This output confirms the presence and configuration of ephemeral NVMe data disks on your AKS node.
Use ephemeral NVMe data disks in workloads
There are several ways to use ephemeral NVMe data disks in your AKS workloads. The most common approaches are:
emptyDir Volumes
emptyDir is a Kubernetes volume type that uses the node's local storage. When backed by NVMe disks, emptyDir provides high throughput and low latency for temporary data.
To use this method, define an emptyDir volume in your Pod spec. By default, it uses the fastest available storage (NVMe if present).
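The following is a minimal sketch; the pod name and the /scratch mount path are arbitrary choices for this example:

```bash
# Minimal sketch: an emptyDir volume for temporary, node-local data.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo
spec:
  containers:
  - name: app
    image: ubuntu
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch   # Temporary data; removed when the pod is deleted
  volumes:
  - name: scratch
    emptyDir: {}
EOF
```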
Advantages
- Simple to use and configure.
- No external dependencies.
- High performance when backed by NVMe.
Disadvantages
- Data is lost if the Pod is rescheduled to another node.
- No data persistence or replication.
- Limited to a single NVMe disk.
hostPath Volumes
hostPath mounts a specific directory or disk from the node’s filesystem into the Pod. You can target NVMe mount points directly.
To use this method, specify the NVMe disk path (for example, /mnt or /mnt/nvme0n1) in the Pod spec.
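A minimal sketch follows; /mnt is an assumed node path, so confirm the actual NVMe mount point (for example, with lsblk) before relying on it:

```bash
# Sketch: mount an assumed NVMe path from the node directly into a pod.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-demo
spec:
  containers:
  - name: app
    image: ubuntu
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: nvme
      mountPath: /nvme
  volumes:
  - name: nvme
    hostPath:
      path: /mnt            # Node path backed by the local NVMe disk
      type: Directory
EOF
```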
Advantages
- Direct access to NVMe disk.
- Useful for advanced scenarios (for example, custom formatting, partitioning).
Disadvantages
- Tightly coupled to node layout; not portable.
- Security risks if not properly restricted.
- Limited to a single NVMe disk.
Ephemeral NVMe data disks with ephemeral OS disks
When you deploy AKS nodes that have local NVMe data disks, such as the Standard_D2ads_v6 VM size (a single 100 GiB NVMe disk), and opt in to ephemeral OS disks, you might observe that the ephemeral OS disk (for example, 60 GiB) is provisioned from the NVMe capacity. However, the unused NVMe space (in this example, the remaining 40 GiB) isn't available to use, and there's no supported way to access or recover it after the node is created.
This behavior is by design: the ephemeral OS disk requirements dictate how the NVMe device is partitioned at provisioning time. It can be confusing because you don't get access to all of the disk's storage, especially on the many VM sizes that come with only one NVMe disk.
Use the following example to validate this behavior:
# Create a Standard_D2ads_v6 (single 100 GiB NVMe disk) node pool using an ephemeral OS disk with 60 GiB capacity
az aks nodepool add \
--resource-group $resourceGroup \
--cluster-name $clusterName \
--name $nodePoolName \
--node-count 1 \
--node-vm-size Standard_D2ads_v6 \
--node-osdisk-type Ephemeral \
--node-osdisk-size 60
kubectl debug "node/$nodeName" \
--image=ubuntu \
--profile=sysadmin -it \
-- chroot /host /bin/bash
lsblk -o NAME,FSTYPE,LABEL,MOUNTPOINT,SIZE,VENDOR,MODEL
NAME          FSTYPE  LABEL            MOUNTPOINT  SIZE   VENDOR  MODEL
sr0                                                750K   Msft    Virtual DVD-ROM
nvme0n1                                            60G    MSFT    NVMe Accelerator v1.0
|-nvme0n1p1   ext4    cloudimg-rootfs  /           59.9G
|-nvme0n1p14                                       4M
`-nvme0n1p15  vfat    UEFI             /boot/efi   106M
When you use VM sizes with a single local NVMe data disk and enable ephemeral OS disk, the OS consumes the entire NVMe disk, leaving no space available for Kubernetes workloads to provision persistent volumes. For VM sizes with two or more local NVMe data disks, one disk is used for the ephemeral OS, and the others can be used to provision persistent volumes for your workloads.
Current limitations
- The ephemeral OS disk consumes a portion of one local NVMe drive, with the remainder left inaccessible.
- There's no supported way to access or mount the unused NVMe space after node creation.
- You can't update or repartition the NVMe disk post-deployment.
Customer impact
- Reduced usable NVMe capacity compared to what is advertised for the VM size.
- Inability to fully use high-performance local storage for workloads.
- Potential confusion and inconvenience during upgrades or node replacement.
Recommendation
Decide the intended use of local NVMe disks (either for the OS disk or for Kubernetes workload storage) before provisioning AKS nodes. Ephemeral OS disk configuration is immutable after node creation, so planning ahead avoids recreating nodes if requirements change.
Omit the OS disk size input (--node-osdisk-size) when you create AKS nodes with ephemeral OS disks on NVMe-backed VMs, as shown in the example that follows. Doing so prevents misconfiguration and aligns with product documentation, reducing the risk of inaccessible capacity and upgrade issues.
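For example, the node pool from the earlier walkthrough can be created without the explicit size:

```bash
# Same node pool as earlier, but without --node-osdisk-size;
# the platform selects an appropriate ephemeral OS disk size for the VM.
az aks nodepool add \
  --resource-group $resourceGroup \
  --cluster-name $clusterName \
  --name $nodePoolName \
  --node-count 1 \
  --node-vm-size Standard_D2ads_v6 \
  --node-osdisk-type Ephemeral
```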
Note
These considerations are important for user experience and operational efficiency, especially as more VM SKUs with single NVMe disks become available. Follow the latest AKS documentation and monitor Azure updates for enhancements in ephemeral disk management.