Workload management and automation

This article helps you understand the workload management and automation capability within the FinOps Framework and how to implement it in the Microsoft Cloud.

Definition

Workload management and automation refers to running resources only when necessary and at the level or capacity needed for the active workload.

Tag resources based on their up-time requirements. Review resource usage patterns and determine whether resources can be scaled down or even shut down (to stop billing) during off-peak hours. Consider cheaper alternatives where they meet the workload's requirements.
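
For example, a simple up-time tag convention might encode the hours a resource actually needs to run. The following sketch is purely illustrative: the UptimeSchedule tag name and its values are assumptions for this example, not part of the FinOps Framework or any Azure service.

```python
from datetime import datetime

# Hypothetical tag convention (an assumption for this example):
# UptimeSchedule = "weekdays-08-18" means the resource only needs to run
# 8 AM to 6 PM, Monday through Friday. Anything else means "always on".

def should_be_running(tags: dict, now: datetime) -> bool:
    """Return True if a resource with these tags should be running at 'now'."""
    schedule = (tags or {}).get("UptimeSchedule", "always")
    if schedule == "always":
        return True
    if schedule == "weekdays-08-18":
        return now.weekday() < 5 and 8 <= now.hour < 18
    # Unknown values default to "always on" so nothing is stopped by accident.
    return True

# Example: a dev VM tagged for business hours only, checked on a Saturday morning.
print(should_be_running({"UptimeSchedule": "weekdays-08-18"},
                        datetime(2024, 6, 1, 10, 0)))  # False
```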

An effective workload management and automation plan can significantly reduce costs by dynamically adjusting configuration to match supply to demand, ensuring the most efficient utilization.

Getting started

When you first start working with a service, consider the following points:

  • Can the service be stopped, and does stopping it also stop billing?
    • If the service can't be stopped, review alternatives to determine whether any of them can be stopped to stop billing.
    • Pay close attention to noncompute charges that can continue to accrue while a resource is stopped so you're not surprised. Storage is a common example of a cost that continues to be charged even when the compute resource that uses it is no longer running.
  • Does the service support serverless compute?
  • Does the service support autostop or autoshutdown functionality?
    • If you use a service that supports being stopped but doesn't support autostop, consider using a lightweight flow in Power Automate or Logic Apps to stop it on a schedule.
  • Does the service support autoscaling?
    • If the service supports autoscaling, configure it to scale based on your application's needs.
    • Autoscaling can work with autostop behavior for maximum efficiency.
  • Consider automatically stopping nonproduction resources and manually starting them during work hours to avoid unnecessary costs (a minimal automation sketch follows this list).
    • Avoid automatically starting nonproduction resources that aren't used every day.
    • If you choose to autostart, be aware of vacations and holidays when resources might be started automatically but not used.
    • Consider tagging manually stopped resources. Save a query in Azure Resource Graph or a view in the All resources list and pin it to the Azure portal dashboard to confirm that those resources stay stopped (see the Resource Graph sketch after this list).
  • Consider architectural models such as containers and serverless to only use resources when they're needed, and to drive maximum efficiency in key services.
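
As a starting point for the automation mentioned above, the following sketch uses the Azure SDK for Python (the azure-identity and azure-mgmt-compute packages) to deallocate virtual machines that carry a hypothetical Env: dev tag outside of business hours. The subscription ID, tag name, and schedule are placeholders to adapt to your environment.

```python
from datetime import datetime

from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

SUBSCRIPTION_ID = "<your-subscription-id>"  # placeholder: set to your subscription


def stop_dev_vms_outside_business_hours() -> None:
    """Deallocate VMs tagged Env=dev when it's outside 8 AM-6 PM, Mon-Fri."""
    now = datetime.now()
    if now.weekday() < 5 and 8 <= now.hour < 18:
        return  # business hours: leave everything running

    credential = DefaultAzureCredential()
    compute = ComputeManagementClient(credential, SUBSCRIPTION_ID)

    for vm in compute.virtual_machines.list_all():
        tags = vm.tags or {}
        if tags.get("Env", "").lower() != "dev":  # hypothetical tag convention
            continue
        # The resource group name is embedded in the resource ID:
        # /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Compute/...
        resource_group = vm.id.split("/")[4]
        # Deallocate (not just power off) so compute billing stops.
        # Noncompute charges, such as attached disks, continue to accrue.
        compute.virtual_machines.begin_deallocate(resource_group, vm.name)
        print(f"Deallocating {vm.name} in {resource_group}")


if __name__ == "__main__":
    stop_dev_vms_outside_business_hours()
```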
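
Along the same lines, the saved Azure Resource Graph query can flag VMs that were shut down from inside the guest OS but never deallocated, which means compute charges continue. This sketch uses the azure-mgmt-resourcegraph package; the query relies on the extended power state property used in the Resource Graph sample queries.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resourcegraph import ResourceGraphClient
from azure.mgmt.resourcegraph.models import QueryRequest

SUBSCRIPTION_ID = "<your-subscription-id>"  # placeholder

# KQL: VMs reporting "stopped" (not "deallocated"), so compute billing continues.
QUERY = """
Resources
| where type =~ 'microsoft.compute/virtualmachines'
| extend powerState = tostring(properties.extended.instanceView.powerState.code)
| where powerState == 'PowerState/stopped'
| project name, resourceGroup, powerState
"""

client = ResourceGraphClient(DefaultAzureCredential())
result = client.resources(QueryRequest(subscriptions=[SUBSCRIPTION_ID], query=QUERY))

for vm in result.data:
    print(f"{vm['name']} ({vm['resourceGroup']}) is stopped but still allocated")
```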

Building on the basics

At this point, you've set up autoscaling and autostop behaviors. As you move beyond the basics, consider the following points:

  • Automate scaling or stopping for resources that don't support it natively or that have more complex requirements.
  • Consider using Azure Functions to run these automations on a schedule (see the timer-triggered sketch after this list).
  • Assign an "Env" or Environment tag to identify which resources are for development, testing, staging, production, etc.
    • Prefer assigning tags at the subscription or resource group level. Then enable tag inheritance in Azure Policy and Cost Management to cover resources that don't emit tags with their usage data.
    • Consider setting up automated scripts to stop resources with specific up-time profiles (for example, stop developer VMs during off-peak hours if they haven't been used in 2 hours).
    • Document up-time expectations based on specific tag values and what happens when the tag isn't present.
    • Use Azure Policy to track compliance with the tag policy (the Resource Graph sketch after this list shows another way to spot untagged resources).
    • Use Azure Policy to enforce specific configuration rules based on environment.
    • Consider using "override" tags to bypass the standard policy when needed. Track the cost and report them to stakeholders to ensure accountability.
  • Consider establishing and tracking KPIs for low-priority workloads, like development servers.
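
One way to run the scheduled stop scripts described above is a timer-triggered function. The following is a minimal skeleton using the Azure Functions Python v2 programming model; the 7 PM weekday schedule and the function name are examples, and the body is left to the deallocation logic shown in the earlier sketch.

```python
import logging

import azure.functions as func

app = func.FunctionApp()


# NCRONTAB format: {second} {minute} {hour} {day} {month} {day-of-week}.
# "0 0 19 * * 1-5" fires at 7 PM, Monday through Friday (example schedule).
@app.timer_trigger(schedule="0 0 19 * * 1-5", arg_name="timer",
                   run_on_startup=False, use_monitor=False)
def stop_nonproduction_resources(timer: func.TimerRequest) -> None:
    logging.info("Evaluating nonproduction resources for shutdown.")
    # Call the same deallocation logic shown in the earlier sketch, or the
    # scale-down operations that your specific services support.
```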
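
To complement Azure Policy compliance reporting, you can also query Azure Resource Graph directly for resources that are missing the environment tag. This sketch reuses the same Resource Graph approach as earlier; the Env tag name is an assumption carried over from the tagging guidance above.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resourcegraph import ResourceGraphClient
from azure.mgmt.resourcegraph.models import QueryRequest

SUBSCRIPTION_ID = "<your-subscription-id>"  # placeholder

# KQL: resources with no "Env" tag (or an empty one), so they fall outside
# the documented up-time expectations.
QUERY = """
Resources
| where isempty(tostring(tags['Env']))
| project name, type, resourceGroup
"""

client = ResourceGraphClient(DefaultAzureCredential())
result = client.resources(QueryRequest(subscriptions=[SUBSCRIPTION_ID], query=QUERY))

for resource in result.data:
    print(f"Missing Env tag: {resource['name']} ({resource['type']}) "
          f"in {resource['resourceGroup']}")
```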

Next steps