Scenario: Azure HDInsight clusters with disk encryption lose Key Vault access

This article describes troubleshooting steps and possible resolutions for issues when interacting with Azure HDInsight clusters.

Issue

The Resource Health Center (RHC) alert "The HDInsight cluster is unable to access the key for BYOK encryption at rest" is shown for Bring Your Own Key (BYOK) clusters whose nodes have lost access to the customer's Key Vault (KV). Similar alerts also appear in the Apache Ambari UI.

Cause

The alert verifies that KV is accessible from the cluster nodes, which covers the network connection, KV health, and the access policy for the user-assigned managed identity. The alert is only a warning of impending broker shutdown on subsequent node reboots; the cluster continues to function until the nodes reboot.

In the Apache Ambari UI, open the Disk Encryption Key Vault Status alert for more information. The alert details give the reason the verification failed.

Resolution

KV/AAD outage

See Azure Key Vault availability and redundancy, and check the Azure status page at https://status.azure.com/ for more details.

KV accidental deletion

  • Restore the deleted key in KV so that the cluster recovers automatically. For more information, see Recover Deleted Key.
  • Contact the KV team to recover from accidental deletions.

KV access policy changed

Restore the access policies for the user-assigned managed identity that the HDInsight cluster uses to access the KV.

Key permitted operations

For each key in KV, you can choose the set of permitted operations. Make sure that the wrap and unwrap operations are enabled for the BYOK key.
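
The check on permitted operations can be sketched in Python. This is a minimal illustration, not an official tool: it assumes you have exported the key's properties as JSON (for example, with the Azure CLI) and that the permitted operations appear under key.keyOps, which is an assumption about the export format. Key Vault names the wrap and unwrap operations wrapKey and unwrapKey. The function name missing_key_ops is hypothetical.

```python
from __future__ import annotations

import json

# Key Vault's names for the wrap and unwrap key operations.
REQUIRED_OPS = ("wrapKey", "unwrapKey")


def missing_key_ops(key_json: str) -> list[str]:
    """Return the required operations that the key does not permit."""
    doc = json.loads(key_json)
    # Assumption: permitted operations are listed under key.keyOps.
    permitted = set(doc.get("key", {}).get("keyOps") or [])
    return [op for op in REQUIRED_OPS if op not in permitted]


if __name__ == "__main__":
    # Sample document with unwrapKey deliberately left out.
    sample = '{"key": {"keyOps": ["encrypt", "decrypt", "wrapKey"]}}'
    print(missing_key_ops(sample))  # prints ['unwrapKey']
```

An empty result means both operations are permitted; anything else lists the operations to enable on the BYOK key.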

Expired key

If the expiry date has passed and the key hasn't been rotated, contact the KV team to clear the expiry date.

KV firewall blocking access

Fix the KV firewall settings to allow BYOK cluster nodes to access the KV.

NSG rules on virtual network blocking access

Check the NSG rules associated with the virtual network attached to the cluster, and make sure that they don't block traffic from the cluster nodes to the KV.
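
One way to test for a network-level block is a plain TCP probe run from a cluster node. The sketch below uses only the Python standard library. It assumes the vault lives in the public Azure cloud, where vault host names follow the pattern <vault-name>.vault.azure.net, and the helper names vault_host and can_reach are hypothetical. A successful connection shows only that port 443 is reachable; it doesn't exercise KV firewall rules or access policies.

```python
import socket


def vault_host(vault_name: str) -> str:
    """Build the Key Vault host name (public Azure cloud pattern assumed)."""
    return f"{vault_name}.vault.azure.net"


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Calling can_reach(vault_host(name)) with your vault's name from a head node or worker node returns False when the connection can't be made, which points to the NSG rules or other network configuration rather than to the key itself.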

Mitigation and prevention steps

KV accidental deletion

  • Configure the Key Vault with a resource lock.
  • Back up keys to your Hardware Security Module (HSM).

Key deletion

Delete the cluster before you delete the key.

KV access policy changed

Regularly audit and test access policies.

Expired key

  • Back up keys to your HSM.
  • Use a key without any expiry set.
  • If expiry needs to be set, rotate the keys before the expiration date.
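
The expiry guidance above can be turned into a small periodic check. The following Python sketch is illustrative only: it assumes the key's expiry is available as an ISO 8601 timestamp with an explicit UTC offset (or is absent when no expiry is set), and the 30-day warning window and the name rotation_status are arbitrary choices, not values from HDInsight or Key Vault.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def rotation_status(expires: Optional[str], now: datetime, warn_days: int = 30) -> str:
    """Classify a key by its expiry timestamp.

    expires: ISO 8601 string with a UTC offset, or None if no expiry is set.
    now: timezone-aware current time.
    """
    if expires is None:
        return "no-expiry"
    expiry = datetime.fromisoformat(expires)
    if expiry <= now:
        return "expired"
    if expiry - now <= timedelta(days=warn_days):
        return "rotate-soon"
    return "ok"


if __name__ == "__main__":
    # Fixed reference time so the example is reproducible.
    now = datetime(2030, 1, 1, tzinfo=timezone.utc)
    print(rotation_status("2030-01-15T00:00:00+00:00", now))  # prints rotate-soon
```

In regular use, pass datetime.now(timezone.utc) as now and treat a "rotate-soon" result as the prompt to rotate the key before the cluster nodes next reboot.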