High Availability (HA) health status monitoring for Azure Database for PostgreSQL - Flexible Server
APPLIES TO: Azure Database for PostgreSQL - Flexible Server
Azure Database for PostgreSQL Flexible Server includes High Availability (HA) health status monitoring, which is built on Azure's Resource Health Check (RHC) framework. It continuously reports the health of HA-enabled instances and notifies you of events that might affect connectivity and availability. The sections below describe each health state and its associated scenarios to help you troubleshoot and maintain HA stability.
Health States
Each HA state is derived from internal signals that represent specific conditions. The possible HA states, and the scenarios that can produce them on your Azure Database for PostgreSQL Flexible Server, are described below.
Available - HA is Healthy
The Available status indicates that your HA-enabled server is operating normally with no detected issues affecting failover readiness. All necessary configurations are intact, and no significant error conditions have been detected.
Degraded - Network Security Group (NSG) or Virtual Appliance Blocking Connections
The Degraded status might appear when NSG rules or a virtual appliance block connections that high availability requires. This configuration issue prevents full HA functionality. Correct it by adjusting the NSG rules or the virtual appliance configuration so that the required connections are allowed.
Degraded - Read-Only State
If your PostgreSQL Flexible Server enters a read-only state, the status changes to Degraded to reflect this restriction. Restoring full functionality typically requires provisioning additional resources or resolving the condition that caused the server to become read-only.
Degraded - High Availability in Degraded State
This status appears when the HA service itself is degraded, possibly because of transient issues or system-level conditions. Implementing retry logic in your application can help mitigate the resulting temporary connectivity disruptions.
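The retry logic mentioned above can be sketched as a small helper that retries an operation with exponential backoff and jitter. This is a minimal illustration, not an official Azure or PostgreSQL API: the name `run_with_retry`, its parameters, and the default delays are all assumptions chosen for the example, and the caller decides which exceptions count as transient.

```python
import random
import time


def run_with_retry(operation, is_transient, max_attempts=5,
                   base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call operation() and retry transient failures with exponential backoff.

    operation    -- zero-argument callable, for example one that opens a
                    connection and runs a query (hypothetical, caller-supplied)
    is_transient -- callable that takes the raised exception and returns True
                    if the failure is worth retrying
    sleep        -- injectable for testing; defaults to time.sleep
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not transient.
            if attempt == max_attempts or not is_transient(exc):
                raise
            # Double the delay on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add up to 50% random jitter so clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

With a driver such as psycopg2, one option is to pass `lambda exc: isinstance(exc, psycopg2.OperationalError)` as `is_transient`. Which errors are safe to retry depends on your application, so treat that choice as an assumption to validate.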
Degraded - Planned Failover Initiated
During a planned failover event initiated for your server, the Degraded status appears, signifying that HA failover processes are active. This is generally a brief and controlled process, and service should resume shortly.
Degraded - Unplanned Failover Initiated
For an unplanned failover, this status indicates an active failover event triggered by unexpected circumstances. This scenario might involve brief connectivity interruptions until the server completes failover procedures.
Degraded - Upgrade Failover Initiated
During system upgrades, your HA server might undergo an upgrade failover to apply necessary updates. While in this state, the server might restrict new connections temporarily, and retry logic should be implemented to handle transient issues effectively.
Configuring Resource Health Alerts
You can set up Resource Health alerts to receive notifications whenever the health status of your HA-enabled PostgreSQL instance changes. You can configure alerts through the Azure portal or with an ARM template, so you stay informed of HA status changes without actively monitoring the portal.
Steps to Configure Resource Health Alerts via Portal
- Navigate to the Azure portal and select your PostgreSQL Flexible Server.
- In the left-hand menu, select "Alerts" under the "Monitoring" section.
- Select "New alert rule" and configure the alert logic based on Resource Health signals.
- Set up the action group to specify how you want to be notified (email, SMS, etc.).
- Review and create the alert rule.
Steps to Create Resource Health Alerts using ARM Template
- Download the ARM template from the Resource Health Alerts ARM Template Guide.
- Customize the template with your specific server details and alert preferences.
- Deploy the ARM template using Azure CLI or Azure PowerShell.
- Verify the deployment and ensure the alerts are active.
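As a rough illustration of the customization step, the sketch below builds a small ARM template for a Resource Health activity log alert and serializes it to JSON. It is an assumption-laden example, not the template from the guide: the function name, the placeholder IDs, the `2017-04-01` API version, and the choice to filter on `resourceId` are all illustrative. Compare the result with the official template before deploying.

```python
import json


def build_resource_health_alert_template(alert_name, server_resource_id,
                                         action_group_id):
    """Return an ARM template (as a dict) for a Resource Health alert.

    All three arguments are caller-supplied placeholders (hypothetical values).
    """
    return {
        "$schema": "https://schema.management.azure.com/schemas/"
                   "2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.Insights/activityLogAlerts",
                "apiVersion": "2017-04-01",  # assumed; check the current guide
                "name": alert_name,
                "location": "Global",
                "properties": {
                    "enabled": True,
                    # Evaluate events across the subscription being deployed to.
                    "scopes": ["[subscription().id]"],
                    "condition": {
                        "allOf": [
                            # Only Resource Health events...
                            {"field": "category", "equals": "ResourceHealth"},
                            # ...for this one server.
                            {"field": "resourceId",
                             "equals": server_resource_id},
                        ]
                    },
                    "actions": {
                        "actionGroups": [
                            {"actionGroupId": action_group_id}
                        ]
                    },
                },
            }
        ],
    }


def write_template(template, path):
    """Write the template to a JSON file that a deployment can consume."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(template, handle, indent=2)
```

A file written this way could then be deployed with, for example, `az deployment group create --resource-group <resource-group> --template-file <path>`, where both placeholders are yours to fill in.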
For more details on setting up alerts, follow these guides:
By using HA Health Status Monitoring, you gain essential insights into your PostgreSQL server's HA performance, enabling a proactive approach to managing uptime and availability.