Data ingestion and normalization

This article helps you understand the data ingestion and normalization capability within the FinOps Framework and how to implement it in the Microsoft Cloud.

Definition

Data ingestion and normalization refers to the process of collecting, transforming, and organizing data from various sources into a single, easily accessible repository.

Gather cost, utilization, performance, and other business data from cloud providers, vendors, and on-premises systems. Gathering the data can include:

  • Internal IT data. For example, from a configuration management database (CMDB) or IT asset management (ITAM) systems.
  • Business-specific data, like organizational hierarchies and metrics that map cloud costs to, or quantify, business value. For example, revenue, as defined by your organizational and divisional mission statements.

Consider how data gets reported and plan for data standardization requirements to support reporting on similar data from multiple sources, like cost data from multiple clouds or account types. Prefer open standards and interoperability with and across providers, vendors, and internal tools. Standardization may also require restructuring data in a logical and meaningful way, for example by categorizing or tagging it, so it can be easily accessed, analyzed, and understood.
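
As a rough illustration of standardizing similar data from multiple sources, the following Python sketch renames provider-specific columns to one shared set of names. The target names (ChargePeriodStart, BilledCost, BillingCurrency, ServiceName, ProviderName) follow FOCUS column naming. The provider keys and source column names are hypothetical placeholders, so check them against your actual export schemas before relying on them.

```python
from datetime import date
from decimal import Decimal

# Hypothetical source-to-target column maps. The source column names are
# illustrative only; replace them with the columns in your own exports.
COLUMN_MAPS = {
    "provider_a": {
        "date": "ChargePeriodStart",
        "costInBillingCurrency": "BilledCost",
        "billingCurrency": "BillingCurrency",
        "serviceFamily": "ServiceName",
    },
    "provider_b": {
        "usage_start": "ChargePeriodStart",
        "unblended_cost": "BilledCost",
        "currency": "BillingCurrency",
        "product": "ServiceName",
    },
}


def normalize_row(provider, row):
    """Return a copy of `row` that uses the shared column names and types."""
    normalized = {"ProviderName": provider}
    for source_name, target_name in COLUMN_MAPS[provider].items():
        normalized[target_name] = row[source_name]
    # Parse cost as Decimal to avoid floating-point rounding in sums.
    normalized["BilledCost"] = Decimal(normalized["BilledCost"])
    # Keep only the date part of an ISO 8601 date or timestamp string.
    normalized["ChargePeriodStart"] = date.fromisoformat(
        normalized["ChargePeriodStart"][:10]
    )
    return normalized
```

Once every source passes through a function like normalize_row, reports can group and sum on the shared columns without knowing which provider a row came from.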

When armed with a comprehensive collection of cost and usage information tied to business value, organizations can empower stakeholders and accelerate the goals of other FinOps capabilities. Stakeholders are able to make more informed decisions, leading to more efficient use of resources and potentially significant cost savings.

Before you begin

While data ingestion and normalization is critical to the long-term efficiency and effectiveness of any FinOps practice, it isn't a blocking requirement for your initial set of FinOps investments. If this is your first iteration through the FinOps lifecycle, consider lighter-weight capabilities that can deliver a quicker return on investment, like Data analysis and showback. Data ingestion and normalization can require significant time and effort depending on account size and complexity. We recommend focusing on this capability once you understand the effort involved and have commitment from key stakeholders to support it.

Getting started

When you first start managing cost in the cloud, you use the native tools available in the portal or through Power BI. If you need more, you may download the data for local analysis, or possibly build a small report or merge it with another dataset. Eventually, you need to automate this process, which is where "data ingestion" comes in. As a starting point, we focus on ingesting cost data into a common data store.

  • Before you ingest cost data, think about your reporting needs.
    • Talk to your stakeholders to ensure you have a firm understanding of what they need. Try to understand their motivations and goals to ensure the data or reporting helps them.
    • Identify the data you need, where you can get the data from, and who can give you access. Make note of any common datasets that may require normalization.
    • Determine the level of granularity required and how often the data needs to be refreshed. Daily cost data can be a challenge to manage for a large account. Consider monthly aggregates to reduce costs and increase query performance and reliability if that meets your reporting needs.
  • Consider using a third-party FinOps platform.
    • Review the available third-party solutions in the Azure Marketplace.
    • If you decide to build your own solution, consider starting with FinOps hubs, part of the open source FinOps toolkit provided by Microsoft.
      • FinOps hubs will accelerate your development and help you focus on building the features you need rather than infrastructure.
  • Select the cost details solution that is right for you. We recommend scheduled exports, which push cost data to a storage account on a daily or monthly basis.
    • If you use daily exports, note that data is pushed into a new file each day. Ensure that you select only the latest day's file when reporting on costs.
  • Determine if you need a data integration or workflow technology to process data.
    • In an early phase, you may be able to keep data in the exported storage account without other processing. We recommend that you keep the data there for small accounts with lightweight requirements and minimal customization.
    • If you need to ingest data into a more advanced data store or perform data cleanup or normalization, you may need to implement a data pipeline. Choose a data pipeline orchestration technology.
  • Determine what your data storage requirements are.
    • In an early phase, we recommend using the exported storage account for simplicity and lower cost.
    • If you need an advanced query engine or expect to hit data size limitations within your reporting tools, you should consider ingesting data into an analytical data store. Choose an analytical data store.
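
The guidance above about daily exports and monthly aggregates can be sketched in a few lines of Python. This is a minimal example, not a reference implementation. It assumes each daily file holds month-to-date costs for its billing month, which you should verify against your export configuration. The ExportFile type and its fields are hypothetical stand-ins for however you list and parse files in your storage account.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class ExportFile:
    """Hypothetical description of one exported file."""
    billing_month: str  # for example, "2024-01"
    exported_on: date   # the day the export ran
    rows: tuple         # (service_name, cost) pairs parsed from the file


def latest_per_month(files):
    """Keep only the most recent export for each billing month."""
    latest = {}
    for export in files:
        current = latest.get(export.billing_month)
        if current is None or export.exported_on > current.exported_on:
            latest[export.billing_month] = export
    return latest


def monthly_totals(files):
    """Sum cost per (billing month, service) from the latest exports only."""
    totals = defaultdict(Decimal)
    for month, export in latest_per_month(files).items():
        for service, cost in export.rows:
            totals[(month, service)] += Decimal(cost)
    return dict(totals)
```

Reading only the latest file per month avoids counting the same charges more than once under the month-to-date assumption, and the monthly totals give you a smaller dataset to query when daily granularity isn't required.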

Building on the basics

At this point, you have a data pipeline and are ingesting data into a central data repository. As you move beyond the basics, consider the following points:

  • Normalize data to a standard schema to support aligning and blending data from multiple sources.
    • FinOps hubs includes a Power BI report that normalizes data to the FOCUS schema, which can be a good starting point.
    • For an example of the FOCUS schema with Azure data, see the FOCUS sample report.
  • Complement cloud cost data with organizational hierarchies and budgets.
    • Consider labeling or tagging requirements to map cloud costs to organizational hierarchies.
  • Enrich cloud resource and solution data with internal CMDB or ITAM data.
  • Consider what internal business and revenue metrics are needed to map cloud costs to business value.
  • Determine what other datasets are required based on your reporting needs.
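
To show how tags can map cloud costs to an organizational hierarchy, here is a small Python sketch. The cost-center tag key, the lookup table, and the division and team fields are all hypothetical. In practice, the lookup would come from your own CMDB, ITAM, or finance systems. Rows are assumed to already use the BilledCost and Tags column names.

```python
from decimal import Decimal

# Hypothetical lookup from a cost-center tag value to the org hierarchy.
COST_CENTER_TO_ORG = {
    "cc-100": {"division": "Retail", "team": "Checkout"},
    "cc-200": {"division": "Platform", "team": "Data"},
}
# Fallback for rows with a missing or unknown tag, so no cost is dropped.
UNALLOCATED = {"division": "Unallocated", "team": "Unallocated"}


def enrich(row, tag_key="cost-center"):
    """Return a copy of `row` with division and team fields added."""
    tags = row.get("Tags") or {}
    org = COST_CENTER_TO_ORG.get(tags.get(tag_key), UNALLOCATED)
    return {**row, **org}


def cost_by_division(rows):
    """Total BilledCost per division after enrichment."""
    totals = {}
    for row in map(enrich, rows):
        division = row["division"]
        totals[division] = totals.get(division, Decimal("0")) + Decimal(row["BilledCost"])
    return totals
```

Keeping untagged costs in an explicit Unallocated bucket makes tagging gaps visible in reports instead of silently excluding those costs.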

Next steps