Collect and analyze log data for Azure Cognitive Search

Diagnostic or operational logs provide insight into the detailed operations of Azure Cognitive Search and are useful for monitoring service and workload processes. Internally, some system information is kept on the backend for a short period of time, long enough for investigation and analysis if you file a support ticket. However, if you want to control operational data yourself, configure a diagnostic setting to specify where logging information is collected.

Diagnostic logging is enabled through integration with Azure Monitor.

When you set up diagnostic logging, you are asked to specify a storage mechanism. The following table lists the options for collecting and persisting data.

Resource | Used for
Send to Log Analytics workspace | Events and metrics are sent to a Log Analytics workspace, which can be queried in the portal to return detailed information. For an introduction, see Get started with Azure Monitor logs.
Archive with Blob storage | Events and metrics are archived to a Blob container and stored in JSON files. Logs can be quite granular (by the hour/minute), which is useful for researching a specific incident but not for open-ended investigation. Use a JSON editor to view a raw log file, or Power BI to aggregate and visualize log data.
Stream to Event Hub | Events and metrics are streamed to an Azure Event Hubs service. Choose this as an alternative data collection service for very large logs.

Prerequisites

Create the destination resources (a Log Analytics workspace, an Azure Storage account, or an event hub) in advance so that you can select one or more when configuring diagnostic logging.

Enable data collection

Diagnostic settings specify how logged events and metrics are collected.

  1. Under Monitoring, select Diagnostic settings.

  2. Select + Add diagnostic setting.

  3. Check Log Analytics, select your workspace, and select OperationLogs and AllMetrics.

  4. Save the setting.

  5. After logging has been enabled, use your search service to start generating logs and metrics. It takes time before logged events and metrics become available.

For Log Analytics, it takes several minutes before data is available, after which you can run Kusto queries to return data. For more information, see Monitor query requests.

For Blob storage, it takes one hour before the containers appear in Blob storage. There is one blob, per hour, per container. Containers are created only when there is activity to log or measure. When the data is copied to a storage account, it is formatted as JSON and placed in two containers:

  • insights-logs-operationlogs: for search traffic logs
  • insights-metrics-pt1m: for metrics

Query log information

Two tables contain logs and metrics for Azure Cognitive Search: AzureDiagnostics and AzureMetrics.

  1. Under Monitoring, select Logs.

  2. Enter AzureMetrics in the query window. Run this simple query to get acquainted with the data collected in this table. Scroll across the table to view metrics and values. Notice the record count at the top. If your service has been collecting metrics for a while, you might want to adjust the time interval to get a manageable data set.

  3. Enter the following query to return a tabular result set.

    AzureMetrics
     | project MetricName, Total, Count, Maximum, Minimum, Average
    
  4. Repeat the previous steps, starting with AzureDiagnostics to return all columns for informational purposes, followed by a more selective query that extracts more interesting information.

    AzureDiagnostics
    | project OperationName, resultSignature_d, DurationMs, Query_s, Documents_d, IndexName_s
    | where OperationName == "Query.Search" 
    

Kusto query examples

If you enabled diagnostic logging, you can query AzureDiagnostics for a list of operations that ran on your service and when they ran. You can also correlate activity to investigate changes in performance.

Example: List operations

Return a list of operations and a count of each one.

AzureDiagnostics
| summarize count() by OperationName

Example: Correlate operations

Correlate query requests with indexing operations, and render the data points on a time chart to see whether the operations coincide.

AzureDiagnostics
| where OperationName in ('Query.Search', 'Indexing.Index')
| summarize Count=count(), AvgLatency=avg(DurationMs) by bin(TimeGenerated, 1h), OperationName
| render timechart
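
For logs archived to Blob storage rather than sent to Log Analytics, a similar hourly roll-up can be approximated offline. The following Python sketch is an illustration only, not part of any Azure tooling. The `records` list is made-up sample data, the function name `summarize_by_hour` is hypothetical, and the field names (`timeGenerated`, `operationName`, `durationMS`) are assumed from the log schema table in this article. Sample timestamps are simplified to whole seconds.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample records; field names are assumed from the log schema table.
records = [
    {"timeGenerated": "2018-12-07T00:00:43Z", "operationName": "Query.Search", "durationMS": 50},
    {"timeGenerated": "2018-12-07T00:20:10Z", "operationName": "Query.Search", "durationMS": 70},
    {"timeGenerated": "2018-12-07T00:30:00Z", "operationName": "Indexing.Index", "durationMS": 200},
    {"timeGenerated": "2018-12-07T01:05:00Z", "operationName": "ServiceStats", "durationMS": 5},
]


def summarize_by_hour(records, operations=("Query.Search", "Indexing.Index")):
    """Return {(hour, operation): (count, average latency in ms)}."""
    buckets = defaultdict(list)
    for rec in records:
        # Keep only the operations being correlated.
        if rec["operationName"] not in operations:
            continue
        # Truncate the timestamp to the hour, similar to bin(TimeGenerated, 1h).
        ts = datetime.strptime(rec["timeGenerated"], "%Y-%m-%dT%H:%M:%SZ")
        hour = ts.replace(minute=0, second=0)
        buckets[(hour, rec["operationName"])].append(rec["durationMS"])
    return {key: (len(vals), sum(vals) / len(vals)) for key, vals in buckets.items()}


summary = summarize_by_hour(records)
for (hour, op), (count, avg) in sorted(summary.items()):
    print(hour.isoformat(), op, count, avg)
```

Real timestamps in the logs carry seven fractional digits, which `%f` in `strptime` does not accept, so they would need trimming before parsing.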

Logged operations

Logged events captured by Azure Monitor include those related to indexing and queries. The AzureDiagnostics table in Log Analytics collects operational data related to queries and indexing.

OperationName | Description
ServiceStats | This operation is a routine call to Get Service Statistics, either called directly or implicitly to populate a portal overview page when it is loaded or refreshed.
Query.Search | Query requests against an index. See Monitor queries for information about logged queries.
Indexing.Index | This operation is a call to Add, Update or Delete Documents.
indexes.Prototype | This is an index created by the Import Data wizard.
Indexers.Create | Create an indexer explicitly or implicitly through the Import Data wizard.
Indexers.Get | Returns the name of an indexer whenever the indexer is run.
Indexers.Status | Returns the status of an indexer whenever the indexer is run.
DataSources.Get | Returns the name of the data source whenever an indexer is run.
Indexes.Get | Returns the name of an index whenever an indexer is run.

Log schema

If you are building custom reports, the data structures that contain Azure Cognitive Search log data conform to the schema below. For Blob storage, each blob has one root object called records, which contains an array of log objects. Each blob contains records for all the operations that took place during the same hour.

The following table is a partial list of fields common to resource logging.

Name | Type | Example | Notes
timeGenerated | datetime | "2018-12-07T00:00:43.6872559Z" | Timestamp of the operation
resourceId | string | "/SUBSCRIPTIONS/11111111-1111-1111-1111-111111111111/RESOURCEGROUPS/DEFAULT/PROVIDERS/MICROSOFT.SEARCH/SEARCHSERVICES/SEARCHSERVICE" | Your ResourceId
operationName | string | "Query.Search" | The name of the operation
operationVersion | string | "2020-06-30" | The api-version used
category | string | "OperationLogs" | constant
resultType | string | "Success" | Possible values: Success or Failure
resultSignature | int | 200 | HTTP result code
durationMS | int | 50 | Duration of the operation in milliseconds
properties | object | see the following table | Object containing operation-specific data

Properties schema

The properties below are specific to Azure Cognitive Search.

Name | Type | Example | Notes
Description_s | string | "GET /indexes('content')/docs" | The operation's endpoint
Documents_d | int | 42 | Number of documents processed
IndexName_s | string | "test-index" | Name of the index associated with the operation
Query_s | string | "?search=AzureSearch&$count=true&api-version=2020-06-30" | The query parameters
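
As an illustration of how the two schemas fit together, the following Python sketch reads one archived blob and pulls out the query-related fields. It is a minimal example under stated assumptions: `blob_text` is hypothetical content assembled from the example values in the tables above, `query_rows` is a made-up helper name, and the property names are copied from this article as written. Names in an actual blob may differ, so check a downloaded file before relying on them.

```python
import json

# Hypothetical blob content built from the example values in the tables above.
# Property names are copied from this article; names in a real blob may differ.
blob_text = """
{
  "records": [
    {
      "timeGenerated": "2018-12-07T00:00:43.6872559Z",
      "operationName": "Query.Search",
      "operationVersion": "2020-06-30",
      "category": "OperationLogs",
      "resultType": "Success",
      "resultSignature": 200,
      "durationMS": 50,
      "properties": {
        "Description_s": "GET /indexes('content')/docs",
        "Documents_d": 42,
        "IndexName_s": "test-index",
        "Query_s": "?search=AzureSearch&$count=true&api-version=2020-06-30"
      }
    }
  ]
}
"""


def query_rows(text):
    """Return one summary dict per Query.Search record in a blob."""
    rows = []
    # Each blob has a single root object, "records", holding an array of log objects.
    for record in json.loads(text)["records"]:
        if record.get("operationName") != "Query.Search":
            continue
        props = record.get("properties", {})
        rows.append({
            "index": props.get("IndexName_s"),
            "query": props.get("Query_s"),
            "documents": props.get("Documents_d"),
            "status": record.get("resultSignature"),
            "duration_ms": record.get("durationMS"),
        })
    return rows


for row in query_rows(blob_text):
    print(row)
```

Using `.get` for every field keeps the sketch tolerant of records that omit a property, which is likely for operations other than queries.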

Metrics schema

Metrics are captured for query requests and measured in one-minute intervals. Every metric exposes minimum, maximum, and average values per minute. For more information, see Monitor query requests.

Name | Type | Example | Notes
resourceId | string | "/SUBSCRIPTIONS/11111111-1111-1111-1111-111111111111/RESOURCEGROUPS/DEFAULT/PROVIDERS/MICROSOFT.SEARCH/SEARCHSERVICES/SEARCHSERVICE" | Your resource ID
metricName | string | "Latency" | The name of the metric
time | datetime | "2018-12-07T00:00:43.6872559Z" | The operation's timestamp
average | int | 64 | The average value of the raw samples in the metric time interval, units in seconds or percentage, depending on the metric.
minimum | int | 37 | The minimum value of the raw samples in the metric time interval, units in seconds.
maximum | int | 78 | The maximum value of the raw samples in the metric time interval, units in seconds.
total | int | 258 | The total value of the raw samples in the metric time interval, units in seconds.
count | int | 4 | The number of metrics emitted from a node to the log within the one-minute interval.
timegrain | string | "PT1M" | The time grain of the metric in ISO 8601.

Queries commonly execute in milliseconds, so only queries that are measured in seconds appear in a metric like QPS.

For the Search Queries Per Second metric, minimum is the lowest value for search queries per second that was registered during that minute. The same applies to the maximum value. Average is the aggregate across the entire minute. For example, within one minute, you might have a pattern like this: one second of high load that is the maximum for SearchQueriesPerSecond, followed by 58 seconds of average load, and finally one second with only one query, which is the minimum.
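
The pattern described above can be worked through with a few lines of Python. This is only an arithmetic sketch: the per-second counts (40, 10, and 1) are invented for illustration, `summarize_minute` is a hypothetical helper, and treating the average as a plain arithmetic mean of the per-second values is an assumption, since the exact aggregation used by the service is not described here.

```python
# Hypothetical per-second query counts for one minute:
# one second of high load, 58 seconds of average load, one second with a single query.
per_second = [40] + [10] * 58 + [1]


def summarize_minute(samples):
    """Return the minimum, maximum, and arithmetic mean of per-second samples."""
    return {
        "minimum": min(samples),
        "maximum": max(samples),
        "average": sum(samples) / len(samples),
    }


stats = summarize_minute(per_second)
print(stats)
```

With these numbers the minimum and maximum come from single seconds, while the average stays close to the load seen during the other 58 seconds.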

For Throttled Search Queries Percentage, minimum, maximum, average, and total all have the same value: the percentage of search queries that were throttled, out of the total number of search queries during one minute.

View raw log files

Blob storage is used for archiving log files. You can use any JSON editor to view the log file. If you don't have one, we recommend Visual Studio Code.

  1. In the Azure portal, open your Storage account.

  2. In the left navigation pane, click Blobs. You should see insights-logs-operationlogs and insights-metrics-pt1m. These containers are created by Azure Cognitive Search when the log data is exported to Blob storage.

  3. Click down the folder hierarchy until you reach the .json file. Use the context menu to download the file.

Once the file is downloaded, open it in a JSON editor to view the contents.

Next steps

If you haven't done so already, review the fundamentals of search service monitoring to learn about the full range of oversight capabilities.