Azure HDInsight monitoring data reference

This article contains all the monitoring reference information for this service.

See Monitor HDInsight for details on the data you can collect for Azure HDInsight and how to use it.

Metrics

This section lists all the automatically collected platform metrics for this service. These metrics are also part of the global list of all platform metrics supported in Azure Monitor.

For information on metric retention, see Azure Monitor Metrics overview.

Supported metrics for Microsoft.HDInsight/clusters

The following table lists the metrics available for the Microsoft.HDInsight/clusters resource type.

Table headings

  • Metric - The metric display name as it appears in the Azure portal.
  • Name in REST API - Metric name as referred to in the REST API.
  • Unit - Unit of measure.
  • Aggregation - The default aggregation type. Valid values: Average, Minimum, Maximum, Total, Count.
  • Dimensions - Dimensions available for the metric.
  • Time Grains - Intervals at which the metric is sampled. For example, PT1M indicates that the metric is sampled every minute, PT30M every 30 minutes, PT1H every hour, and so on.
  • DS Export - Whether the metric is exportable to Azure Monitor Logs via Diagnostic Settings. For information on exporting metrics, see Create diagnostic settings in Azure Monitor.
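The time grain values described above (PT1M, PT30M, PT1H, P1D) follow the ISO 8601 duration notation. As a minimal sketch, assuming only day, hour, and minute components need to be handled, they can be converted to Python timedelta values like this. The function name parse_time_grain is hypothetical and not part of any Azure SDK.

```python
import re
from datetime import timedelta

# Matches only the subset of ISO 8601 durations used as time grains in this
# article: an optional day part, then an optional "T" part with hours/minutes.
_GRAIN_PATTERN = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?)?$"
)


def parse_time_grain(grain: str) -> timedelta:
    """Convert a time grain such as 'PT1M' or 'P1D' into a timedelta."""
    match = _GRAIN_PATTERN.match(grain)
    parts = {}
    if match is not None:
        # Keep only the components that were actually present in the string.
        parts = {
            name: int(value)
            for name, value in match.groupdict().items()
            if value is not None
        }
    if not parts:
        raise ValueError(f"Unsupported time grain: {grain!r}")
    return timedelta(**parts)


if __name__ == "__main__":
    for grain in ("PT1M", "PT30M", "PT1H", "P1D"):
        print(grain, "->", parse_time_grain(grain))
```

Strings outside this subset, such as durations with seconds or months, are rejected rather than guessed at.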
Category | Metric | Name in REST API | Unit | Aggregation | Dimensions | Time Grains | DS Export
Availability | Categorized Gateway Requests: Number of gateway requests by categories (1xx/2xx/3xx/4xx/5xx) | CategorizedGatewayRequests | Count | Count, Total | HttpStatus | PT1M, PT1H, P1D | Yes
Availability | Gateway Requests: Number of gateway requests | GatewayRequests | Count | Count, Total | HttpStatus | PT1M, PT1H, P1D | Yes
Availability | REST proxy Consumer RequestThroughput: Number of consumer requests to Kafka REST proxy | KafkaRestProxy.ConsumerRequest.m1_delta | CountPerSecond | Total | Machine, Topic | PT1M, PT1H, P1D | Yes
Availability | REST proxy Consumer Unsuccessful Requests: Consumer request exceptions | KafkaRestProxy.ConsumerRequestFail.m1_delta | CountPerSecond | Total | Machine, Topic | PT1M, PT1H, P1D | Yes
Availability | REST proxy Consumer RequestLatency: Message latency in a consumer request through Kafka REST proxy | KafkaRestProxy.ConsumerRequestTime.p95 | Milliseconds | Average | Machine, Topic | PT1M, PT1H, P1D | Yes
Availability | REST proxy Consumer Request Backlog: Consumer REST proxy queue length | KafkaRestProxy.ConsumerRequestWaitingInQueueTime.p95 | Milliseconds | Average | Machine, Topic | PT1M, PT1H, P1D | Yes
Availability | REST proxy Producer MessageThroughput: Number of producer messages through Kafka REST proxy | KafkaRestProxy.MessagesIn.m1_delta | CountPerSecond | Total | Machine, Topic | PT1M, PT1H, P1D | Yes
Availability | REST proxy Consumer MessageThroughput: Number of consumer messages through Kafka REST proxy | KafkaRestProxy.MessagesOut.m1_delta | CountPerSecond | Total | Machine, Topic | PT1M, PT1H, P1D | Yes
Availability | REST proxy ConcurrentConnections: Number of concurrent connections through Kafka REST proxy | KafkaRestProxy.OpenConnections | Count | Total | Machine, Topic | PT1M, PT1H, P1D | Yes
Availability | REST proxy Producer RequestThroughput: Number of producer requests to Kafka REST proxy | KafkaRestProxy.ProducerRequest.m1_delta | CountPerSecond | Total | Machine, Topic | PT1M, PT1H, P1D | Yes
Availability | REST proxy Producer Unsuccessful Requests: Producer request exceptions | KafkaRestProxy.ProducerRequestFail.m1_delta | CountPerSecond | Total | Machine, Topic | PT1M, PT1H, P1D | Yes
Availability | REST proxy Producer RequestLatency: Message latency in a producer request through Kafka REST proxy | KafkaRestProxy.ProducerRequestTime.p95 | Milliseconds | Average | Machine, Topic | PT1M, PT1H, P1D | Yes
Availability | REST proxy Producer Request Backlog: Producer REST proxy queue length | KafkaRestProxy.ProducerRequestWaitingInQueueTime.p95 | Milliseconds | Average | Machine, Topic | PT1M, PT1H, P1D | Yes
Availability | Number of Active Workers: Number of Active Workers | NumActiveWorkers | Count | Average, Maximum, Minimum | MetricName | PT1M, PT1H, P1D | Yes
Availability | Pending CPU: Pending CPU Requests in YARN | PendingCPU | Count | Average, Maximum, Minimum | <none> | PT1M, PT1H, P1D | Yes
Availability | Pending Memory: Pending Memory Requests in YARN | PendingMemory | Count | Average, Maximum, Minimum | <none> | PT1M, PT1H, P1D | Yes
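The REST API names, aggregations, and time grains in the table above are the values you would pass when requesting metrics through the Azure Monitor metrics REST endpoint. The following is a rough sketch of how such a request URL could be assembled and checked against the table before sending. The helper name build_metrics_url, the example resource ID, and the api-version value are assumptions for illustration; verify the API version against the current REST reference. The sketch only builds the URL and performs no authentication or network call.

```python
from urllib.parse import urlencode

# Aggregations copied from the table above for a few of the listed metrics.
SUPPORTED_AGGREGATIONS = {
    "GatewayRequests": {"Count", "Total"},
    "CategorizedGatewayRequests": {"Count", "Total"},
    "KafkaRestProxy.ConsumerRequestTime.p95": {"Average"},
    "NumActiveWorkers": {"Average", "Maximum", "Minimum"},
    "PendingCPU": {"Average", "Maximum", "Minimum"},
    "PendingMemory": {"Average", "Maximum", "Minimum"},
}

# Time grains listed for every metric in the table.
SUPPORTED_TIME_GRAINS = {"PT1M", "PT1H", "P1D"}


def build_metrics_url(resource_id, metric, aggregation, interval="PT1M",
                      api_version="2018-01-01"):
    """Build a metrics query URL after validating it against the table."""
    if not resource_id.startswith("/"):
        raise ValueError("resource_id must start with '/'")
    if metric not in SUPPORTED_AGGREGATIONS:
        raise ValueError(f"Metric not in the local table: {metric!r}")
    if aggregation not in SUPPORTED_AGGREGATIONS[metric]:
        raise ValueError(f"{aggregation!r} is not listed for {metric!r}")
    if interval not in SUPPORTED_TIME_GRAINS:
        raise ValueError(f"Time grain not listed: {interval!r}")
    query = urlencode({
        "api-version": api_version,  # assumed value, check the REST reference
        "metricnames": metric,
        "aggregation": aggregation,
        "interval": interval,
    })
    return (
        f"https://management.azure.com{resource_id}"
        f"/providers/Microsoft.Insights/metrics?{query}"
    )


if __name__ == "__main__":
    # Hypothetical resource ID, for illustration only.
    cluster_id = (
        "/subscriptions/00000000-0000-0000-0000-000000000000"
        "/resourceGroups/example-rg"
        "/providers/Microsoft.HDInsight/clusters/example-cluster"
    )
    print(build_metrics_url(cluster_id, "GatewayRequests", "Total"))
```

Validating locally against the table is a design choice that surfaces an unsupported aggregation or time grain as a clear error before any request is made.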

Metric dimensions

For information about what metric dimensions are, see Multi-dimensional metrics.

This service has the following dimensions associated with its metrics.

Dimensions for the Microsoft.HDInsight/clusters table include:

  • HttpStatus
  • Machine
  • Topic
  • MetricName
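Dimensions are typically used to split or filter a metric when querying it. As an illustrative sketch, the function below builds a filter expression of the form used by the Azure Monitor metrics API's $filter parameter (for example, Topic eq '*' to return every topic). The function name build_dimension_filter is hypothetical, and the sketch assumes the wildcard '*' or a literal value without quotes is all that is needed.

```python
from typing import Mapping

# Dimension names listed above for Microsoft.HDInsight/clusters.
KNOWN_DIMENSIONS = ("HttpStatus", "Machine", "Topic", "MetricName")


def build_dimension_filter(values: Mapping[str, str]) -> str:
    """Join dimension conditions into a single filter expression."""
    clauses = []
    for name, value in values.items():
        if name not in KNOWN_DIMENSIONS:
            raise ValueError(f"Unknown dimension: {name!r}")
        if "'" in value:
            # Quote escaping is out of scope for this sketch, so reject it.
            raise ValueError("Dimension values containing quotes are not handled")
        clauses.append(f"{name} eq '{value}'")
    return " and ".join(clauses)


if __name__ == "__main__":
    # '*' asks for all values of each dimension.
    print(build_dimension_filter({"Machine": "*", "Topic": "*"}))
```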

Resource logs

This section lists the types of resource logs you can collect for this service. The section pulls from the list of all resource log category types supported in Azure Monitor.

HDInsight doesn't use Azure Monitor resource logs or diagnostic settings. Logs are collected by other methods, including the use of the Log Analytics agent.

Azure Monitor Logs tables

This section lists the Azure Monitor Logs tables relevant to this service, which are available for query by Log Analytics using Kusto queries. The tables contain resource log data and possibly more depending on what is collected and routed to them.
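As a small sketch of what querying one of these tables might look like, the helper below assembles a Kusto query string that returns recent rows from a named table. The function name recent_rows_query is hypothetical. The only column it references is TimeGenerated, the standard timestamp column in Log Analytics tables; it makes no assumption about other columns. Running the query (in the portal or through a client library) is left out.

```python
import re

# Allow only simple identifiers so the table name cannot inject query text.
_TABLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def recent_rows_query(table: str, hours: int = 1, limit: int = 10) -> str:
    """Return a Kusto query for the most recent rows of a table."""
    if not _TABLE_NAME.match(table):
        raise ValueError(f"Not a valid table name: {table!r}")
    if hours <= 0 or limit <= 0:
        raise ValueError("hours and limit must be positive")
    return "\n".join([
        table,
        f"| where TimeGenerated > ago({hours}h)",
        f"| take {limit}",
    ])


if __name__ == "__main__":
    print(recent_rows_query("HDInsightKafkaLogs"))
```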

HDInsight Clusters

Microsoft.HDInsight/Clusters

The available logs and metrics vary depending on your HDInsight cluster type.

Log table mapping

The new Azure Monitor integration implements new tables in the Log Analytics workspace. The following tables show the log table mappings from the classic Azure Monitor integration to the new one.

The New table column shows the name of the new table. The Description column describes the type of logs/metrics that are available in this table. The Classic table column is a list of all the tables from the classic Azure Monitor integration whose data is now present in the new table.

Note

Some tables are completely new and not based on previous tables.

General workload tables

New table | Description | Classic table
HDInsightAmbariSystemMetrics | System metrics collected from Ambari. The metrics now come from each node in the cluster (except for edge nodes) instead of just the two headnodes. Each metric is now a column and each metric is reported once per record. | metrics_cpu_nice_cl, metrics_cpu_system_cl, metrics_cpu_user_cl, metrics_memory_cache_CL, metrics_memory_swap_CL, metrics_memory_total_CL, metrics_memory_buffer_CL, metrics_load_1min_CL, metrics_load_cpu_CL, metrics_load_nodes_CL, metrics_load_procs_CL, metrics_network_in_CL, metrics_network_out_CL
HDInsightAmbariClusterAlerts | Ambari Cluster Alerts from each node in the cluster (except for edge nodes). Each alert is a record in this table. | metrics_cluster_alerts_CL
HDInsightSecurityLogs | Records from the Ambari Audit and Auth Logs. | log_ambari_audit_CL, log_auth_CL
HDInsightRangerAuditLogs | All records from the Ranger Audit log for ESP clusters. | ranger_audit_logs_CL
HDInsightGatewayAuditLogs_CL | The Gateway nodes audit information. Same format as the classic table, and still located in the Custom Logs section. | log_gateway_Audit_CL

Spark workload

Note

Spark application related tables have been replaced with 11 new Spark tables that give more in-depth information about your Spark workloads.

New table | Description | Classic table
HDInsightSparkLogs | All logs related to Spark and its related components: Livy and Jupyter. | log_livy_CL, log_jupyter_CL, log_spark_CL, log_sparkappsexecutors_CL, log_sparkappsdrivers_CL
HDInsightSparkApplicationEvents | Event information for Spark Applications including Submission and Completion time, App ID, and AppName. Useful for keeping track of when applications started and completed. |
HDInsightSparkBlockManagerEvents | Event information related to Spark's Block Manager. Includes information such as executor memory usage. |
HDInsightSparkEnvironmentEvents | Event information related to the Environment an application executes in, including Spark Deploy Mode, Master, and information about the Executor. |
HDInsightSparkExecutorEvents | Event information about Spark Executor usage by an Application. |
HDInsightSparkExtraEvents | Event information that doesn't fit into any other Spark table. |
HDInsightSparkJobEvents | Information about Spark Jobs including their start and end times, result, and associated stages. |
HDInsightSparkSqlExecutionEvents | Event information on Spark SQL Queries including their plan info, description, and start and end times. |
HDInsightSparkStageEvents | Event information for Spark Stages including their start and completion times, failure status, and detailed execution information. |
HDInsightSparkStageTaskAccumulables | Performance metrics for stages and tasks. |
HDInsightTaskEvents | Event information for Spark Tasks including start and completion time, associated stages, execution status, and task type. |
HDInsightJupyterNotebookEvents | Event information for Jupyter Notebooks. |

Hadoop/YARN workload

New table | Description | Classic table
HDInsightHadoopAndYarnMetrics | JMX metrics from the Hadoop and YARN frameworks. Contains all the same JMX metrics as the previous Custom Logs tables, plus additional metrics from the Timeline Server, Node Manager, and Job History Server. Contains one metric per record. | metrics_resourcemanager_clustermetrics_CL, metrics_resourcemanager_jvm_CL, metrics_resourcemanager_queue_root_CL, metrics_resourcemanager_queue_root_joblauncher_CL, metrics_resourcemanager_queue_root_default_CL, metrics_resourcemanager_queue_root_thriftsvr_CL
HDInsightHadoopAndYarnLogs | All logs generated from the Hadoop and YARN frameworks. | log_mrjobsummary_CL, log_resourcemanager_CL, log_timelineserver_CL, log_nodemanager_CL

Hive/LLAP workload

New table | Description | Classic table
HDInsightHiveAndLLAPMetrics | JMX metrics from the Hive and LLAP frameworks. Contains all the same JMX metrics as the previous Custom Logs tables, one metric per record. | llap_metrics_hiveserver2_CL, llap_metrics_hs2_metrics_subsystemllap_metrics_jvm_CL, llap_metrics_llap_daemon_info_CL, llap_metrics_buddy_allocator_info_CL, llap_metrics_deamon_jvm_CL, llap_metrics_io_CL, llap_metrics_executor_metrics_CL, llap_metrics_metricssystem_stats_CL, llap_metrics_cache_CL
HDInsightHiveAndLLAPLogs | Logs generated from Hive, LLAP, and their related components: WebHCat and Zeppelin. | log_hivemetastore_CL, log_hiveserver2_CL, log_hiveserve2interactive_CL, log_webhcat_CL, log_zeppelin_zeppelin_CL

Kafka workload

New table | Description | Classic table
HDInsightKafkaMetrics | JMX metrics from Kafka. Contains all the same JMX metrics as the old Custom Logs tables, plus other important metrics. One metric per record. | metrics_kafka_CL
HDInsightKafkaLogs | All logs generated from the Kafka Brokers. | log_kafkaserver_CL, log_kafkacontroller_CL

HBase workload

New table | Description | Classic table
HDInsightHBaseMetrics | JMX metrics from HBase. Contains all the same JMX metrics from the previous tables. In contrast with the previous tables, each row contains one metric. | metrics_regionserver_CL, metrics_regionserver_wal_CL, metrics_regionserver_ipc_CL, metrics_regionserver_os_CL, metrics_regionserver_replication_CL, metrics_restserver_CL, metrics_restserver_jvm_CL, metrics_hmaster_assignmentmanager_CL, metrics_hmaster_ipc_CL, metrics_hmaser_os_CL, metrics_hmaster_balancer_CL, metrics_hmaster_jvm_CL, metrics_hmaster_CL, metrics_hmaster_fs_CL
HDInsightHBaseLogs | Logs from HBase and its related components: Phoenix and HDFS. | log_regionserver_CL, log_restserver_CL, log_phoenixserver_CL, log_hmaster_CL, log_hdfsnamenode_CL, log_garbage_collector_CL

Oozie workload

New table | Description | Classic table
HDInsightOozieLogs | All logs generated from the Oozie framework. | Log_oozie_CL
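When migrating saved queries from the classic integration, the mappings in the tables above can be kept as a simple lookup. The sketch below covers only a handful of rows copied from those tables, and the names NEW_TO_CLASSIC and new_table_for are hypothetical. It resolves a classic table name to the new table that now holds its data; it does not translate column names, which differ between the two integrations.

```python
# A subset of the mappings listed in the tables above (new table -> classic tables).
NEW_TO_CLASSIC = {
    "HDInsightAmbariClusterAlerts": ["metrics_cluster_alerts_CL"],
    "HDInsightSecurityLogs": ["log_ambari_audit_CL", "log_auth_CL"],
    "HDInsightRangerAuditLogs": ["ranger_audit_logs_CL"],
    "HDInsightKafkaMetrics": ["metrics_kafka_CL"],
    "HDInsightKafkaLogs": ["log_kafkaserver_CL", "log_kafkacontroller_CL"],
    "HDInsightOozieLogs": ["Log_oozie_CL"],
}

# Invert the mapping so a classic table name can be looked up directly.
CLASSIC_TO_NEW = {
    classic: new
    for new, classics in NEW_TO_CLASSIC.items()
    for classic in classics
}


def new_table_for(classic_table: str) -> str:
    """Return the new table that contains the data of a classic table."""
    try:
        return CLASSIC_TO_NEW[classic_table]
    except KeyError:
        raise KeyError(f"No mapping recorded for {classic_table!r}") from None


if __name__ == "__main__":
    for name in ("log_auth_CL", "log_kafkacontroller_CL"):
        print(name, "->", new_table_for(name))
```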

Activity log

The linked table lists the operations that may be recorded in the activity log for this service. This is a subset of all the possible resource provider operations in the activity log.

For more information on the schema of activity log entries, see Activity Log schema.