Best practices for using the Anomaly Detector API

The Anomaly Detector API is a stateless anomaly detection service. The accuracy and performance of its results can be affected by:

  • How your time series data is prepared.
  • The Anomaly Detector API parameters that were used.
  • The number of data points in your API request.

Use this article to learn best practices for getting the best results from the API for your data.

When to use batch (entire) or latest (last) point anomaly detection

The Anomaly Detector API's batch detection endpoint lets you detect anomalies across your entire time series data. In this detection mode, a single statistical model is created and applied to each point in the data set. If your time series has the characteristics below, we recommend using batch detection to preview your data in one API call.

  • A seasonal time series, with occasional anomalies.
  • A flat trend time series, with occasional spikes/dips.

We don't recommend using batch anomaly detection for real-time data monitoring, or on time series data that doesn't have the above characteristics.

  • Batch detection creates and applies only one model, so the detection for each point is done in the context of the whole series. If the time series data trends up and down without seasonality, the model may miss some points of change (dips and spikes in the data). Similarly, points of change that are less significant than ones later in the data set may not be counted as significant enough to be incorporated into the model.

  • Batch detection is slower than detecting the anomaly status of the latest point when doing real-time data monitoring, because of the number of points being analyzed.

For real-time data monitoring, we recommend detecting the anomaly status of your latest data point only. By continuously applying latest point detection, you can monitor streaming data more efficiently and accurately.
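Continuous latest point detection can be sketched as a sliding window over the stream. The snippet below is a minimal illustration, not the service's client code: `latest_point_windows` and its `history` parameter are hypothetical names, and the default of 28 earlier points only mirrors the example in this article. Each yielded window would become the `series` of one latest point detection request; the request itself is left out.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, List

# One data point as it would appear in the "series" array of a request.
Point = Dict[str, object]


def latest_point_windows(
    stream: Iterable[Point], history: int = 28
) -> Iterator[List[Point]]:
    """Yield one window per new point: `history` earlier points plus the newest.

    Hypothetical helper: nothing is yielded until enough history has
    accumulated, and the oldest point is dropped as each new one arrives.
    """
    window: Deque[Point] = deque(maxlen=history + 1)
    for point in stream:
        window.append(point)
        if len(window) == history + 1:
            yield list(window)
```

Because the window has a fixed size, each request stays small no matter how long the stream runs, which is the reason latest point detection suits real-time monitoring better than resending the whole series.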

The example below shows the impact these detection modes can have on performance. The first picture shows the result of continuously detecting the anomaly status of the latest point, using the 28 previously seen data points. The red points are anomalies.

Figure: anomaly detection using latest point detection.

Below is the same data set using batch anomaly detection. The model built for the operation has ignored several anomalies, marked by rectangles.

Figure: anomaly detection using batch detection.

Data preparation

The Anomaly Detector API accepts time series data formatted into a JSON request object. A time series can be any numerical data recorded over time in sequential order. You can send windows of your time series data to the Anomaly Detector API endpoint to improve the API's performance. The minimum number of data points you can send is 12, and the maximum is 8640. Granularity is the rate at which your data is sampled.

Data points sent to the Anomaly Detector API must have a valid Coordinated Universal Time (UTC) timestamp and a numerical value.

{
    "granularity": "daily",
    "series": [
      {
        "timestamp": "2018-03-01T00:00:00Z",
        "value": 32858923
      },
      {
        "timestamp": "2018-03-02T00:00:00Z",
        "value": 29615278
      }
    ]
}
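A request body with this shape could be assembled and checked before sending, as in the sketch below. It is only an illustration: `build_request`, `MIN_POINTS`, and `MAX_POINTS` are hypothetical names, and the only rules encoded are the ones stated in this article (12 to 8640 points, UTC timestamps, numerical values).

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Sequence, Tuple

MIN_POINTS = 12    # minimum number of points, as stated in this article
MAX_POINTS = 8640  # maximum number of points, as stated in this article


def build_request(
    points: Sequence[Tuple[datetime, float]], granularity: str
) -> Dict[str, object]:
    """Build a request body from (timestamp, value) pairs (hypothetical helper)."""
    if not MIN_POINTS <= len(points) <= MAX_POINTS:
        raise ValueError(
            f"expected {MIN_POINTS} to {MAX_POINTS} points, got {len(points)}"
        )
    converted: List[Tuple[datetime, float]] = []
    for ts, value in points:
        if ts.tzinfo is None:
            raise ValueError("timestamps must be timezone-aware")
        # Normalize every timestamp to UTC and every value to a float.
        converted.append((ts.astimezone(timezone.utc), float(value)))
    converted.sort(key=lambda pair: pair[0])  # keep the series in time order
    series = [
        {"timestamp": ts.strftime("%Y-%m-%dT%H:%M:%SZ"), "value": value}
        for ts, value in converted
    ]
    return {"granularity": granularity, "series": series}
```

Passing the result through `json.dumps` gives the text to send as the request body.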

If your data is sampled at a non-standard time interval, you can specify it by adding the customInterval attribute to your request. For example, if your series is sampled every 5 minutes, you can add the following to your JSON request:

{
    "granularity" : "minutely", 
    "customInterval" : 5
}

Missing data points

Missing data points are common in evenly distributed time series data sets, especially ones with a fine granularity (a small sampling interval, for example, data sampled every few minutes). Missing less than 10% of the expected number of points in your data shouldn't have a negative impact on your detection results. Consider filling gaps in your data based on its characteristics, for example by substituting data points from an earlier period, linear interpolation, or a moving average.
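As one illustration of gap filling, the sketch below measures how much of a series is missing and fills the gaps by linear interpolation. Both `missing_fraction` and `fill_gaps_linear` are hypothetical helpers, and they assume the points are expected at a fixed `step` between the first and last timestamps; substitution from an earlier period or a moving average may fit your data better.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Series = List[Tuple[datetime, float]]


def missing_fraction(points: Series, step: timedelta) -> float:
    """Share of expected points that are absent between the first and last one."""
    times = sorted(ts for ts, _ in points)
    expected = round((times[-1] - times[0]) / step) + 1
    return 1 - len(times) / expected


def fill_gaps_linear(points: Series, step: timedelta) -> Series:
    """Insert linearly interpolated values wherever neighbours are over one step apart."""
    if not points:
        return []
    ordered = sorted(points, key=lambda pair: pair[0])
    filled: Series = [ordered[0]]
    for (t0, v0), (t1, v1) in zip(ordered, ordered[1:]):
        missing = round((t1 - t0) / step) - 1  # points absent between t0 and t1
        for i in range(1, missing + 1):
            fraction = i / (missing + 1)
            filled.append((t0 + i * step, v0 + (v1 - v0) * fraction))
        filled.append((t1, v1))
    return filled
```

Checking `missing_fraction` first lets you compare a series against the 10% guideline above before deciding whether and how to fill it.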

Aggregate distributed data

The Anomaly Detector API works best on an evenly distributed time series. If your data is randomly distributed, you should aggregate it by a unit of time, such as per minute, hourly, or daily.
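Aggregation by a unit of time might look like the sketch below, which groups irregular events into hourly buckets. `aggregate_hourly` is a hypothetical helper, and averaging each bucket is an assumption; a sum or a count may suit your metric better. Hours with no events produce no bucket, so they show up as gaps.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import DefaultDict, Iterable, List, Tuple


def aggregate_hourly(
    events: Iterable[Tuple[datetime, float]]
) -> List[Tuple[datetime, float]]:
    """Average irregularly timed values into one point per hour (hypothetical helper)."""
    buckets: DefaultDict[datetime, List[float]] = defaultdict(list)
    for ts, value in events:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return [(hour, mean(values)) for hour, values in sorted(buckets.items())]
```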

Anomaly detection on data with seasonal patterns

If you know that your time series data has a seasonal pattern (one that occurs at regular intervals), you can improve both the accuracy and the API response time.

Specifying a period when you construct your JSON request can reduce anomaly detection latency by up to 50%. The period is an integer that specifies roughly how many data points the time series takes to repeat a pattern. For example, a time series with one data point per day and a weekly pattern would have a period of 7, and a time series with one point per hour (with the same weekly pattern) would have a period of 7*24. If you're unsure of your data's patterns, you don't have to specify this parameter.

For best results, provide four periods' worth of data points, plus one additional point. For example, hourly data with a weekly pattern as described above should provide 673 data points in the request body (7 * 24 * 4 + 1).
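The arithmetic above can be captured in a small sketch. `recommended_points` and `with_period` are hypothetical helpers; the `period` and `series` keys follow the request fields described in this article, and trimming to the most recent points is an assumption about which part of the history you want to keep.

```python
from typing import Dict


def recommended_points(period: int, cycles: int = 4) -> int:
    """Number of points to send: `cycles` full periods plus one extra point."""
    return period * cycles + 1


def with_period(body: Dict[str, object], period: int) -> Dict[str, object]:
    """Return a copy of a request body with `period` set and the series trimmed.

    Keeps only the most recent recommended number of points (an assumption;
    a shorter series is returned unchanged in length).
    """
    series = list(body["series"])
    tail = series[-recommended_points(period):]
    return {**body, "series": tail, "period": period}


# Hourly data with a weekly pattern: 7 * 24 * 4 + 1 points.
print(recommended_points(7 * 24))
```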

Sampling data for real-time monitoring

If your streaming data is sampled at a short interval (for example, seconds or minutes), sending the recommended number of data points may exceed the Anomaly Detector API's maximum (8640 data points). If your data shows a stable seasonal pattern, consider sending a sample of your time series data at a larger time interval, such as hours. Sampling your data in this way can also noticeably improve the API response time.
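To make the limit concrete: data sampled every minute with a weekly pattern has 10080 points per period, so four periods plus one point is 40321, well above 8640. The sketch below finds the smallest whole downsampling factor that fits and applies it. Both function names are hypothetical, the smallest factor is only a lower bound (a coarser interval such as hours also works), and averaging each chunk is an assumption; taking every k-th value (`values[::k]`) is an alternative.

```python
from typing import List, Sequence

MAX_POINTS = 8640  # maximum number of points per request, as stated in this article


def smallest_downsample_factor(
    points_per_period: int, cycles: int = 4, max_points: int = MAX_POINTS
) -> int:
    """Smallest k dividing the period so that `cycles` periods plus one point fit."""
    for k in range(1, points_per_period + 1):
        if points_per_period % k:
            continue  # keep the downsampled period a whole number of points
        if cycles * (points_per_period // k) + 1 <= max_points:
            return k
    raise ValueError("no downsampling factor satisfies the limit")


def downsample_mean(values: Sequence[float], k: int) -> List[float]:
    """Average consecutive chunks of k values, dropping a trailing partial chunk."""
    return [sum(values[i:i + k]) / k for i in range(0, len(values) - k + 1, k)]
```

For the per-minute weekly example, the factor found is 5, which gives a period of 2016 points and 8065 points per request.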
