How-to: Onboard your metric data to Metrics Advisor

Use this article to learn how to onboard your data to Metrics Advisor.

Data schema requirements and configuration

Metrics Advisor is a service for time series anomaly detection, diagnostics, and analysis. As an AI-powered service, it uses your data to train its model. The service accepts tables of aggregated data with the following columns:

  • Measure (required): One or more columns containing numeric values.
  • Timestamp (optional): Zero or one column of type DateTime or String. When this column is not set, the timestamp is set to the start time of each ingestion period. Format the timestamp as yyyy-MM-ddTHH:mm:ssZ.
    • The timestamp should match the granularity of the metric. For example, for a daily metric, the hour, minute, and second of the timestamp should be 00:00:00.
  • Dimension (optional): Columns can be of any data type. Be cautious when working with large numbers of columns and values, to prevent an excessive number of dimensions from being processed.

Note

For each metric, there should be only one value per timestamp for each dimension combination. Aggregate your data ahead of onboarding, or use the query to specify the data to be ingested.
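As an illustration of aggregating ahead of onboarding, the following minimal sketch sums raw events into one record per day and dimension combination, and formats the timestamp as yyyy-MM-ddTHH:mm:ssZ. The column names (Timestamp, Country, Income) and the sample values are assumptions chosen for the example; this is not part of Metrics Advisor itself.

```python
from collections import defaultdict
from datetime import datetime, timezone


def aggregate_daily(rows):
    """Sum a measure per (day, dimension combination).

    rows: iterable of (timestamp, country, income), where timestamp is a
    UTC datetime. Column names are hypothetical, chosen for illustration.
    """
    totals = defaultdict(int)
    for ts, country, income in rows:
        # Truncate to the start of the day so the timestamp matches
        # a daily granularity (hour, minute, and second are 00:00:00).
        day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        totals[(day, country)] += income
    return [
        {
            "Timestamp": day.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "Country": country,
            "Income": total,
        }
        for (day, country), total in sorted(totals.items())
    ]


raw_rows = [
    (datetime(2019, 11, 10, 8, 30, tzinfo=timezone.utc), "China", 4000),
    (datetime(2019, 11, 10, 17, 5, tzinfo=timezone.utc), "China", 6000),
    (datetime(2019, 11, 10, 9, 0, tzinfo=timezone.utc), "US", 12000),
]
daily = aggregate_daily(raw_rows)
for record in daily:
    print(record)
```

After aggregation, each (timestamp, Country) pair appears exactly once, which is the shape the note above asks for.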

Avoid loading partial data

Partial data is caused by inconsistencies between the data stored in Metrics Advisor and the data in the data source. This can happen when the data source is updated after Metrics Advisor has finished pulling data. Metrics Advisor pulls data from a given data source only once.

For example, suppose a metric has been onboarded to Metrics Advisor for monitoring. Metrics Advisor successfully retrieves the metric data at timestamp A and performs anomaly detection on it. If the metric data for timestamp A is refreshed in the data source after it has been ingested, the new value won't be retrieved.

You can try to backfill historical data (described later) to mitigate inconsistencies, but this won't trigger new anomaly alerts if alerts for those time points have already been triggered. This process may add workload to the system, and it is not automatic.

To avoid loading partial data, we recommend two approaches:

  • Generate data in one transaction:

    Ensure the metric values for all dimension combinations at the same timestamp are stored to the data source in one transaction. In the above example, wait until data from all data sources is ready, and then load it into Metrics Advisor in one transaction. Metrics Advisor can poll the data feed regularly until data is successfully (or partially) retrieved.

  • Delay data ingestion by setting a proper value for the Ingestion time offset parameter:

    Set the Ingestion time offset parameter for your data feed to delay ingestion until the data is fully prepared. This can be useful for data sources that don't support transactions, such as Azure Table Storage. See advanced settings for details.

Add a data feed using the web-based workspace

After signing in to your Metrics Advisor portal and choosing your workspace, click Get started. Then, on the main page of the workspace, click Add data feed in the left menu.

Add connection settings

Next, you'll input a set of parameters to connect your time series data source.

  • Source Type: The type of data source where your time series data is stored.
  • Granularity: The interval between consecutive data points in your time series data. Metrics Advisor currently supports Yearly, Monthly, Weekly, Daily, Hourly, and Custom. The lowest interval the Custom option supports is 60 seconds.
    • Seconds: The number of seconds when granularityName is set to Customize.
  • Ingest data since (UTC): The baseline start time for data ingestion. startOffsetInSeconds is often used to add an offset to help with data consistency.

Next, you'll need to specify the connection information for the data source, and the custom queries used to convert the data into the required schema. For details on the other fields and on connecting different types of data sources, see Add data feeds from different data sources.

Within the query, use the @StartTime parameter to get metric data for a single timestamp. Metrics Advisor replaces the parameter with a string in yyyy-MM-ddTHH:mm:ss format when it runs the query.
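To preview locally what a query might look like after the parameter is replaced, a minimal sketch such as the following can help. The table name SampleTable, its columns, and the choice to wrap the value in single quotes are assumptions made for illustration; the actual substitution is performed by Metrics Advisor and may differ in detail.

```python
from datetime import datetime


def preview_query(query_template: str, start_time: datetime) -> str:
    """Replace @StartTime with a yyyy-MM-ddTHH:mm:ss string.

    Wrapping the value in single quotes is an assumption for this
    local preview, not documented service behavior.
    """
    formatted = start_time.strftime("%Y-%m-%dT%H:%M:%S")
    return query_template.replace("@StartTime", f"'{formatted}'")


# Hypothetical query against a hypothetical table.
QUERY = (
    "SELECT Timestamp, Country, Language, Income "
    "FROM SampleTable WHERE Timestamp = @StartTime"
)

preview = preview_query(QUERY, datetime(2019, 11, 10))
print(preview)
```

Running the previewed query against your data source for a few timestamps is one way to confirm that it returns data for a single timestamp only.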

Important

The query should return at most one record for each dimension combination at each timestamp, and all records returned by the query must have the same timestamp. Metrics Advisor runs this query for each timestamp to ingest your data. See the FAQ section on queries for more information and examples.
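The two constraints above can be checked on a sample query result before onboarding. The following is a minimal sketch under the assumption that the result is available as a list of dictionaries; the function name and column names are hypothetical.

```python
def check_query_result(records, dimension_columns, timestamp_column="Timestamp"):
    """Return a list of problems found in one query result.

    Checks that all records share one timestamp and that each
    dimension combination appears at most once per timestamp.
    """
    problems = []
    timestamps = {r[timestamp_column] for r in records}
    if len(timestamps) > 1:
        problems.append(f"multiple timestamps: {sorted(timestamps)}")
    seen = set()
    for r in records:
        key = (r[timestamp_column],) + tuple(r[c] for c in dimension_columns)
        if key in seen:
            problems.append(f"duplicate dimension combination: {key}")
        seen.add(key)
    return problems


good = [
    {"Timestamp": "2019-11-10T00:00:00", "Country": "China", "Income": 10000},
    {"Timestamp": "2019-11-10T00:00:00", "Country": "US", "Income": 12000},
]
bad = good + [
    {"Timestamp": "2019-11-10T00:00:00", "Country": "US", "Income": 500},
    {"Timestamp": "2019-11-11T00:00:00", "Country": "US", "Income": 23000},
]
good_problems = check_query_result(good, ["Country"])
bad_problems = check_query_result(bad, ["Country"])
print(good_problems)
print(bad_problems)
```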

Verify and get schema

After the connection string and query string are set, select Verify and get schema to verify the connection and run the query to get your data schema from the data source. This normally takes a few seconds, depending on your data source connection. If there's an error at this step, confirm that:

  • Your connection string and query are correct.
  • Your Metrics Advisor instance is able to connect to the data source if there are firewall settings.

Schema configuration

Once the data schema is loaded, select the appropriate fields.

If the timestamp of a data point is omitted, Metrics Advisor uses the time at which the data point is ingested instead. For each data feed, you can specify at most one column as the timestamp. If you get a message that a column cannot be specified as a timestamp, check your query or data source, and check whether there are multiple timestamps in the query result, not only in the preview data. When performing data ingestion, Metrics Advisor can consume only one chunk of time series data (for example one day or one hour, according to the granularity) from the given source each time.

Selection | Description | Notes
Display Name | Name to be displayed in your workspace instead of the original column name. |
Timestamp | The timestamp of a data point. If omitted, Metrics Advisor uses the time at which the data point is ingested instead. For each data feed, you can specify at most one column as the timestamp. | Optional. Should be specified with at most one column. If you get a "column cannot be specified as Timestamp" error, check your query or data source for duplicate timestamps.
Measure | The numeric values in the data feed. For each data feed, you can specify multiple measures, but at least one column should be selected as a measure. | Should be specified with at least one column.
Dimension | Categorical values. A combination of different values identifies a particular single-dimension time series, for example: country, language, tenant. You can select zero or more columns as dimensions. Note: be cautious when selecting a non-string column as a dimension. | Optional.
Ignore | Ignore the selected column. | Optional. See the text below.

If you want to ignore columns, we recommend updating your query or data source to exclude those columns. You can also ignore columns by using Ignore columns and then selecting Ignore on the specific columns. If a column should be a dimension and is mistakenly set as Ignored, Metrics Advisor may end up ingesting partial data. For example, assume the data from your query is as follows:

Row ID | Timestamp  | Country | Language | Income
1      | 2019/11/10 | China   | ZH-CN    | 10000
2      | 2019/11/10 | China   | EN-US    | 1000
3      | 2019/11/10 | US      | ZH-CN    | 12000
4      | 2019/11/11 | US      | EN-US    | 23000
...    | ...        | ...     | ...      | ...

If Country is a dimension and Language is set as Ignored, then the first and second rows will have the same dimensions. Metrics Advisor will arbitrarily use one value from the two rows. Metrics Advisor will not aggregate the rows in this case.
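The collision can be reproduced with the sample rows from the table. The following minimal sketch groups rows by timestamp and the selected dimension columns and reports any key that maps to more than one value; the helper name is hypothetical.

```python
from collections import defaultdict

rows = [
    {"Timestamp": "2019/11/10", "Country": "China", "Language": "ZH-CN", "Income": 10000},
    {"Timestamp": "2019/11/10", "Country": "China", "Language": "EN-US", "Income": 1000},
    {"Timestamp": "2019/11/10", "Country": "US", "Language": "ZH-CN", "Income": 12000},
    {"Timestamp": "2019/11/11", "Country": "US", "Language": "EN-US", "Income": 23000},
]


def find_collisions(rows, dimensions):
    """Map each (timestamp, dimension values) key with more than one row to its values."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["Timestamp"],) + tuple(row[d] for d in dimensions)
        groups[key].append(row["Income"])
    return {key: values for key, values in groups.items() if len(values) > 1}


# Both columns used as dimensions: every row has a unique key.
with_language = find_collisions(rows, ["Country", "Language"])
# Language ignored: rows 1 and 2 share the key ("2019/11/10", "China").
without_language = find_collisions(rows, ["Country"])
print(with_language)
print(without_language)
```

With Language ignored, the key ("2019/11/10", "China") maps to both 10000 and 1000, and only one of them would be kept.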

Automatic roll up settings

Important

If you'd like to enable root cause analysis and other diagnostic capabilities, the Automatic roll up settings need to be configured. Once enabled, the automatic roll-up settings cannot be changed.

Metrics Advisor can automatically perform aggregation (for example SUM, MAX, MIN) on each dimension during ingestion, and then build a hierarchy that is used in root cause analysis and other diagnostic features.

Consider the following scenarios:

  • I do not need to include the roll-up analysis for my data.

    You do not need to use the Metrics Advisor roll-up.

  • My data has already been rolled up and the dimension value is represented by: NULL or Empty (Default), NULL only, Others.

    This option means Metrics Advisor doesn't need to roll up the data because the rows are already summed. For example, if you select NULL only, then the second data row in the example below will be seen as an aggregation of all countries and language EN-US. The fourth data row, however, which has an empty value for Country, will be seen as an ordinary row, which might indicate incomplete data.

    Country | Language | Income
    China   | ZH-CN    | 10000
    (NULL)  | EN-US    | 999999
    US      | EN-US    | 12000
            | EN-US    | 5000

  • I need Metrics Advisor to roll up my data by calculating Sum/Max/Min/Avg/Count and represent it by a specified string.

    Some data sources, such as Cosmos DB or Azure Blob Storage, do not support certain calculations like group by or cube. Metrics Advisor provides the roll-up option to automatically generate a data cube during ingestion. This option means you need Metrics Advisor to calculate the roll-up using the algorithm you've selected and use the specified string to represent the roll-up in Metrics Advisor. This won't change any data in your data source. For example, suppose you have a set of time series that represents Sales metrics with the dimensions (Country, Region). For a given timestamp, it might look like the following:

    Country       | Region           | Sales
    Canada        | Alberta          | 100
    Canada        | British Columbia | 500
    United States | Montana          | 100

    After enabling Auto Roll Up with Sum, Metrics Advisor will calculate the dimension combinations and sum the metrics during data ingestion. The result might be:

    Country       | Region           | Sales
    Canada        | Alberta          | 100
    NULL          | Alberta          | 100
    Canada        | British Columbia | 500
    NULL          | British Columbia | 500
    United States | Montana          | 100
    NULL          | Montana          | 100
    NULL          | NULL             | 700
    Canada        | NULL             | 600
    United States | NULL             | 100

    (Country=Canada, Region=NULL, Sales=600) means the sum of Sales in Canada (all regions) is 600.

    The following is the equivalent transformation in SQL.

    SELECT
        dimension_1,
        dimension_2,
        ...
        dimension_n,
        sum (metrics_1) AS metrics_1,
        sum (metrics_2) AS metrics_2,
        ...
        sum (metrics_n) AS metrics_n
    FROM
        each_timestamp_data
    GROUP BY
        CUBE (dimension_1, dimension_2, ..., dimension_n);
    

    Consider the following before using the Auto roll up feature:

    • If you want to use SUM to aggregate your data, make sure your metrics are additive in each dimension. Here are some examples of non-additive metrics:
      • Fraction-based metrics. This includes ratios, percentages, and so on. For example, you should not add the unemployment rate of each state to calculate the unemployment rate of the entire country.
      • Overlap in a dimension. For example, you should not add the number of people in each sport to calculate the number of people who like sports, because there is overlap between them: one person can like multiple sports.
    • To ensure the health of the whole system, the size of the cube is limited. Currently, the limit is 1,000,000. If your data exceeds that limit, ingestion will fail for that timestamp.
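The roll-up with Sum described above can also be sketched in Python, as a rough equivalent of GROUP BY CUBE. This is an illustrative sketch, not the service's implementation: the function name is hypothetical, None stands in for the NULL roll-up marker, and the sketch assumes no source row already uses that marker as a dimension value.

```python
from collections import defaultdict
from itertools import product


def cube_sum(rows, dimensions, measure, rollup_value=None):
    """Sum a measure over every combination of kept and rolled-up dimensions."""
    totals = defaultdict(int)
    for row in rows:
        # For each dimension, either keep the actual value or replace it
        # with the roll-up marker, then add the measure to every combination.
        choices = [(row[d], rollup_value) for d in dimensions]
        for key in product(*choices):
            totals[key] += row[measure]
    return dict(totals)


sales = [
    {"Country": "Canada", "Region": "Alberta", "Sales": 100},
    {"Country": "Canada", "Region": "British Columbia", "Sales": 500},
    {"Country": "United States", "Region": "Montana", "Sales": 100},
]

cube = cube_sum(sales, ["Country", "Region"], "Sales")
for key, total in cube.items():
    print(key, total)
```

For the three sample rows this produces the nine combinations shown in the result table, including ("Canada", None) with 600 and (None, None) with 700. Because each input row contributes to 2^n combinations for n dimensions, the cube can grow quickly as dimensions are added.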

Advanced settings

Several advanced settings let you customize how data is ingested, such as specifying an ingestion offset or concurrency. For more information, see the advanced settings section in the data feed management article.

Specify a name for the data feed and check the ingestion progress

Give the data feed a custom name, which will be displayed in your workspace. Then click Submit. On the data feed details page, you can use the ingestion progress bar to view status information.

[Figure: Ingestion progress bar]

To check ingestion failure details:

  1. Click Show Details.
  2. Click Status, then choose Failed or Error.
  3. Hover over a failed ingestion and view the details message that appears.

[Figure: Check failed ingestion]

A Failed status indicates that ingestion for this data source will be retried later. An Error status indicates that Metrics Advisor won't retry for the data source. To reload data, you need to trigger a backfill/reload manually.

You can also reload the progress of an ingestion by clicking Refresh Progress. After data ingestion completes, you can click into metrics and check anomaly detection results.

Next steps