Managing anomalies

This article helps you understand the managing anomalies capability within the FinOps Framework and how to implement it in the Microsoft Cloud.

Definition

Managing anomalies refers to the practice of detecting and addressing abnormal or unexpected cost and usage patterns in a timely manner.

Use automated tools to detect anomalies and notify stakeholders. Review usage trends periodically to reveal anomalies automated tools may have missed.

Investigate changes in application behaviors, resource utilization, and resource configuration to uncover the root cause of the anomaly.

With a systematic approach to anomaly detection, analysis, and resolution, organizations can minimize unexpected costs that impact budgets and business operations. They can also identify and prevent security and reliability incidents that surface in cost data.

Getting started

When you first start managing cost in the cloud, you use the native tools available in the portal.

  • Start with proactive alerts.
    • Subscribe to anomaly alerts for each subscription in your environment to receive an email when an unusual spike or drop is detected in your normalized usage, compared with historical usage.
    • Consider subscribing to scheduled alerts to share a chart of recent cost trends with stakeholders. Scheduled alerts can help drive awareness as costs change over time and may catch changes the anomaly model missed.
    • Consider creating a budget in Cost Management to track a specific scope or workload. Specify filters and set alerts for both actual and forecast costs for finer-grained targeting.
  • Review costs periodically, using detailed cost breakdowns, usage analytics, and visualizations to identify potential anomalies that may have been missed.
  • Once an anomaly is identified, take appropriate actions to address it.
    • Review the anomaly details with the engineers who manage the related cloud resources. Some automatically detected "anomalies" are planned, or at least known, resource configuration changes made as part of building and managing cloud services.
    • If you need lower-level usage details, review resource utilization in Azure Monitor metrics.
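
For the periodic review step, it can help to have a simple check you run yourself over exported daily costs. The following is a minimal sketch, not the model used by Cost Management anomaly alerts: it assumes you already have a list of daily cost totals for one scope and flags any day that deviates strongly from the trailing window. The function name `flag_anomalies`, the 7-day `window`, and the `threshold` of 3 standard deviations are illustrative assumptions to tune for your own data.

```python
from statistics import mean, stdev


def flag_anomalies(daily_costs, window=7, threshold=3.0):
    """Return the indexes of days whose cost deviates from the trailing window.

    Illustrative only: a trailing mean and standard deviation check,
    not the detection model behind Cost Management anomaly alerts.
    """
    flagged = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        if spread == 0:
            # Perfectly flat history: treat any change as a deviation.
            if daily_costs[i] != baseline:
                flagged.append(i)
            continue
        # Flag both spikes and drops relative to recent history.
        if abs(daily_costs[i] - baseline) / spread > threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily costs for one subscription; day 7 contains a spike.
costs = [100, 102, 98, 101, 99, 103, 100, 250, 101, 97]
print(flag_anomalies(costs))  # [7]
```

A trailing window keeps the check cheap and easy to explain to stakeholders, but a large spike inflates the spread for the following days, so smaller follow-on changes may go unflagged. Treat the output as a prompt for review with the owning engineers, not as a verdict.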

Building on the basics

At this point, you have automated alerts configured and, ideally, views and reports saved to streamline periodic checks.

  • Establish and automate KPIs, such as:
    • Number of anomalies each month or quarter.
    • Total cost impact of anomalies each month or quarter.
    • Response time to detect and resolve anomalies.
    • Number of false positives and false negatives.
  • Expand coverage of your anomaly detection and response process to include all costs.
  • Define, document, and automate workflows to guide the response process when anomalies are detected.
  • Foster a culture of continuous learning, innovation, and collaboration.
    • Regularly review and refine anomaly management processes based on feedback, industry best practices, and emerging technologies.
    • Promote knowledge sharing and cross-functional collaboration to drive continuous improvement in anomaly detection and response capabilities.
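
The KPIs listed above can be computed from a simple log of investigated anomalies. The sketch below assumes a hypothetical `AnomalyRecord` structure that your team maintains for each investigation; the field names, and the choice to count an anomaly found only through manual review as a false negative, are assumptions for illustration rather than part of any Microsoft tool.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AnomalyRecord:
    """One investigated anomaly (hypothetical schema)."""
    started: datetime      # when the abnormal usage began
    detected: datetime     # when someone was alerted or noticed it
    resolved: datetime     # when the root cause was addressed
    cost_impact: float     # unexpected cost attributed to the anomaly
    is_real: bool          # False if the alert was a planned or known change
    found_by_alert: bool   # False if only a manual review caught it


def hours(delta):
    return delta.total_seconds() / 3600


def summarize(records):
    """Compute anomaly KPIs for one reporting period."""
    real = [r for r in records if r.is_real]
    count = len(real)
    return {
        "anomaly_count": count,
        "total_cost_impact": sum(r.cost_impact for r in real),
        # Averages are None when there were no real anomalies in the period.
        "avg_hours_to_detect": (
            sum(hours(r.detected - r.started) for r in real) / count if count else None
        ),
        "avg_hours_to_resolve": (
            sum(hours(r.resolved - r.detected) for r in real) / count if count else None
        ),
        "false_positives": sum(1 for r in records if r.found_by_alert and not r.is_real),
        "false_negatives": sum(1 for r in real if not r.found_by_alert),
    }


# Hypothetical records for one month.
records = [
    AnomalyRecord(datetime(2024, 5, 1, 0), datetime(2024, 5, 1, 6),
                  datetime(2024, 5, 2, 6), 1200.0, True, True),
    AnomalyRecord(datetime(2024, 5, 10, 0), datetime(2024, 5, 10, 2),
                  datetime(2024, 5, 10, 3), 0.0, False, True),
    AnomalyRecord(datetime(2024, 5, 15, 0), datetime(2024, 5, 17, 0),
                  datetime(2024, 5, 17, 12), 300.0, True, False),
]
print(summarize(records))
```

Running a summary like this each month or quarter gives a consistent baseline for judging whether alert tuning and response workflows are improving over time.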

Next steps