Azure Machine Learning monitoring data reference

This article contains all the monitoring reference information for this service.

See Monitor Machine Learning for details on the data you can collect for Azure Machine Learning and how to use it.

Metrics

This section lists all the automatically collected platform metrics for this service.

For information on metric retention, see Azure Monitor Metrics overview. The resource provider for these metrics is Microsoft.MachineLearningServices/workspaces.

The metric categories are Model, Quota, Resource, Run, and Traffic. Quota information is for Machine Learning compute only. Run provides information on training runs for the workspace.

Supported metrics for Microsoft.MachineLearningServices/workspaces

The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces resource type.

  • All columns might not be present in every table.

Table headings

  • Category - The metrics group or classification.
  • Metric - The metric display name as it appears in the Azure portal.
  • Name in REST API - The metric name as referred to in the REST API.
  • Unit - Unit of measure.
  • Aggregation - The default aggregation type. Valid values: Average (Avg), Minimum (Min), Maximum (Max), Total (Sum), Count.
  • Dimensions - Dimensions available for the metric.
  • Time Grains - Intervals at which the metric is sampled. For example, PT1M indicates that the metric is sampled every minute, PT30M every 30 minutes, PT1H every hour, and so on.
  • DS Export - Whether the metric is exportable to Azure Monitor Logs via diagnostic settings. For information on exporting metrics, see Create diagnostic settings in Azure Monitor.

Supported metrics for Microsoft.MachineLearningServices/workspaces/onlineEndpoints

The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces/onlineEndpoints resource type.


Supported metrics for Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments

The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments resource type.


For information about what metric dimensions are, see Multi-dimensional metrics. This service has the following dimensions associated with its metrics.

Dimension Description
Cluster Name The name of the compute cluster resource. Available for all quota metrics.
Vm Family Name The name of the VM family used by the cluster. Available for quota utilization percentage.
Vm Priority The priority of the VM. Available for quota utilization percentage.
CreatedTime Only available for CpuUtilization and GpuUtilization.
DeviceId ID of the device (GPU). Only available for GpuUtilization.
NodeId ID of the node where the job is running. Only available for CpuUtilization and GpuUtilization.
RunId ID of the run/job. Only available for CpuUtilization and GpuUtilization.
ComputeType The compute type that the run used. Only available for Completed runs, Failed runs, and Started runs.
PipelineStepType The type of PipelineStep used in the run. Only available for Completed runs, Failed runs, and Started runs.
PublishedPipelineId The ID of the published pipeline used in the run. Only available for Completed runs, Failed runs, and Started runs.
RunType The type of run. Only available for Completed runs, Failed runs, and Started runs.
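Splitting a metric by a dimension means grouping its samples by that dimension's value before aggregating. The Python sketch below illustrates the idea locally with made-up GpuUtilization samples that carry NodeId and DeviceId dimensions. It isn't an Azure Monitor API; Azure Monitor performs this splitting for you in the portal and REST API, and the helper name average_by is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical GpuUtilization samples. Each sample carries its dimension
# values (NodeId, DeviceId) alongside the measured value; the numbers are made up.
samples = [
    {"NodeId": "node-0", "DeviceId": "0", "value": 80.0},
    {"NodeId": "node-0", "DeviceId": "1", "value": 60.0},
    {"NodeId": "node-1", "DeviceId": "0", "value": 90.0},
    {"NodeId": "node-0", "DeviceId": "0", "value": 70.0},
]


def average_by(samples, dimension):
    """Group samples by one dimension and average the value within each group."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[dimension]].append(sample["value"])
    return {key: mean(values) for key, values in groups.items()}


if __name__ == "__main__":
    print(average_by(samples, "NodeId"))
    print(average_by(samples, "DeviceId"))
```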

The valid values for the RunType dimension are:

Value Description
Experiment Non-pipeline runs.
PipelineRun A pipeline run, which is the parent of a StepRun.
StepRun A run for a pipeline step.
ReusedStepRun A run for a pipeline step that reuses a previous run.
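When you split run metrics by RunType, it can help to separate pipeline step runs from the other run types. The following Python sketch encodes the four documented values as an enum; the class and the is_pipeline_step helper are hypothetical names used only for illustration.

```python
from enum import Enum


class RunType(str, Enum):
    """The valid values of the RunType dimension (hypothetical enum)."""

    EXPERIMENT = "Experiment"          # non-pipeline runs
    PIPELINE_RUN = "PipelineRun"       # parent of StepRun runs
    STEP_RUN = "StepRun"               # a run for a pipeline step
    REUSED_STEP_RUN = "ReusedStepRun"  # a pipeline step that reuses a previous run


def is_pipeline_step(run_type: str) -> bool:
    """Return True for runs of a single pipeline step, including reused ones.

    Raises ValueError for a value that isn't one of the documented RunType values.
    """
    return RunType(run_type) in (RunType.STEP_RUN, RunType.REUSED_STEP_RUN)
```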

Resource logs

This section lists the types of resource logs you can collect for this service. The section draws on the list of all resource log category types supported in Azure Monitor.

Azure Monitor Logs tables

This section lists the Azure Monitor Logs tables relevant to this service. You can query these tables in Log Analytics by using Kusto queries.

This service uses the following tables to store resource log data.

Machine Learning

Microsoft.MachineLearningServices/workspaces

  • AzureActivity
  • AMLOnlineEndpointConsoleLog
  • AMLOnlineEndpointTrafficLog
  • AMLOnlineEndpointEventLog
  • AzureMetrics
  • AMLComputeClusterEvent
  • AMLComputeClusterNodeEvent
  • AMLComputeJobEvent
  • AMLRunStatusChangedEvent
  • AMLComputeCpuGpuUtilization
  • AMLComputeInstanceEvent
  • AMLDataLabelEvent
  • AMLDataSetEvent
  • AMLDataStoreEvent
  • AMLDeploymentEvent
  • AMLEnvironmentEvent
  • AMLInferencingEvent
  • AMLModelsEvent
  • AMLPipelineEvent
  • AMLRunEvent

Microsoft.MachineLearningServices/registries

  • AzureActivity
  • AmlRegistryReadEventsLog
  • AmlRegistryWriteEventsLog
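As an example of querying these tables, the following Python sketch builds a Kusto query that counts JobFailed events per cluster in the compute job event table. The helper name is hypothetical. The columns it uses (TimeGenerated, EventType, ClusterName) come from the AmlComputeJobEvent schema later in this article. Kusto table names are case-sensitive, and this sketch assumes the AmlComputeJobEvent casing used in that schema heading, so confirm the exact table name shown in your Log Analytics workspace.

```python
from datetime import timedelta


def build_failed_jobs_query(lookback: timedelta, table: str = "AmlComputeJobEvent") -> str:
    """Build a Kusto query that counts JobFailed events per cluster (hypothetical helper).

    The default table name casing is an assumption; check it in your workspace.
    """
    hours = int(lookback.total_seconds() // 3600)
    if hours < 1:
        raise ValueError("lookback must be at least one hour")
    return "\n".join(
        [
            table,
            f"| where TimeGenerated > ago({hours}h)",
            '| where EventType == "JobFailed"',
            "| summarize FailedJobs = count() by ClusterName",
            "| order by FailedJobs desc",
        ]
    )


if __name__ == "__main__":
    print(build_failed_jobs_query(timedelta(days=1)))
```

You can paste the resulting query into Log Analytics or, if you use the azure-monitor-query package for Python, pass it to LogsQueryClient.query_workspace.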

Activity log

The linked table lists the operations that can be recorded in the activity log for this service. These operations are a subset of all the possible resource provider operations in the activity log.

For more information on the schema of activity log entries, see Activity Log schema.

The following table lists some operations related to Machine Learning that might be recorded in the activity log. For a complete listing of Microsoft.MachineLearningServices operations, see Microsoft.MachineLearningServices resource provider operations.

Operation Description
Creates or updates a Machine Learning workspace A workspace was created or updated
CheckComputeNameAvailability Check if a compute name is already in use
Creates or updates the compute resources A compute resource was created or updated
Deletes the compute resources A compute resource was deleted
List secrets An operation listed secrets for a Machine Learning workspace

Log schemas

Azure Machine Learning uses the following schemas.

AmlComputeJobEvent table

Property Description
TimeGenerated Time when the log entry was generated
OperationName Name of the operation associated with the log event
Category Name of the log event
JobId ID of the submitted job
ExperimentId ID of the Experiment
ExperimentName Name of the Experiment
CustomerSubscriptionId Subscription ID where the experiment and job were submitted
WorkspaceName Name of the machine learning workspace
ClusterName Name of the Cluster
ProvisioningState State of the Job submission
ResourceGroupName Name of the resource group
JobName Name of the Job
ClusterId ID of the cluster
EventType Type of the Job event. For example, JobSubmitted, JobRunning, JobFailed, JobSucceeded.
ExecutionState State of the job (the run). For example, Queued, Running, Succeeded, or Failed.
ErrorDetails Details of the job error
CreationApiVersion API version used to create the job
ClusterResourceGroupName Resource group name of the cluster
TFWorkerCount Count of TensorFlow (TF) workers
TFParameterServerCount Count of TensorFlow (TF) parameter servers
ToolType Type of tool used
RunInContainer Flag that indicates whether the job runs inside a container
JobErrorMessage Detailed message of the job error
NodeId ID of the node where the job is running
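Because each job emits several events (for example, JobSubmitted followed by JobFailed or JobSucceeded), you often want only the latest event per job. The following Python sketch works on rows you've already exported from this table and returns the jobs whose most recent event is JobFailed, together with their JobErrorMessage. The rows are made up, only a few of the documented columns are included, and failed_jobs is a hypothetical helper.

```python
from datetime import datetime

# Hypothetical rows exported from the AmlComputeJobEvent table; values are made up.
rows = [
    {"TimeGenerated": "2024-05-01T10:00:00", "JobName": "train-a", "EventType": "JobSubmitted", "JobErrorMessage": ""},
    {"TimeGenerated": "2024-05-01T10:05:00", "JobName": "train-a", "EventType": "JobFailed", "JobErrorMessage": "Out of memory"},
    {"TimeGenerated": "2024-05-01T10:01:00", "JobName": "train-b", "EventType": "JobSubmitted", "JobErrorMessage": ""},
    {"TimeGenerated": "2024-05-01T10:20:00", "JobName": "train-b", "EventType": "JobSucceeded", "JobErrorMessage": ""},
]


def failed_jobs(rows):
    """Return {JobName: JobErrorMessage} for jobs whose latest event is JobFailed."""
    latest = {}
    for row in sorted(rows, key=lambda r: datetime.fromisoformat(r["TimeGenerated"])):
        latest[row["JobName"]] = row  # later events overwrite earlier ones
    return {
        name: row["JobErrorMessage"]
        for name, row in latest.items()
        if row["EventType"] == "JobFailed"
    }


if __name__ == "__main__":
    print(failed_jobs(rows))
```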

AmlComputeClusterEvent table

Property Description
TimeGenerated Time when the log entry was generated
OperationName Name of the operation associated with the log event
Category Name of the log event
ProvisioningState Provisioning state of the cluster
ClusterName Name of the cluster
ClusterType Type of the cluster
CreatedBy User who created the cluster
CoreCount Count of the cores in the cluster
VmSize VM size of the cluster
VmPriority Priority of the nodes created in the cluster: Dedicated or LowPriority
ScalingType Type of cluster scaling: manual or auto
InitialNodeCount Initial node count of the cluster
MinimumNodeCount Minimum node count of the cluster
MaximumNodeCount Maximum node count of the cluster
NodeDeallocationOption How the node should be deallocated
Publisher Publisher of the cluster type
Offer Offer with which the cluster is created
Sku SKU of the nodes (VMs) created in the cluster
Version Version of the image used when the nodes (VMs) are created
SubnetId SubnetId of the cluster
AllocationState Cluster allocation state
CurrentNodeCount Current node count of the cluster
TargetNodeCount Target node count of the cluster while scaling up/down
EventType Type of event during cluster creation.
NodeIdleTimeSecondsBeforeScaleDown Idle time in seconds before cluster is scaled down
PreemptedNodeCount Preempted node count of the cluster
IsResizeGrow Flag indicating that cluster is scaling up
VmFamilyName Name of the VM family of the nodes that can be created in the cluster
LeavingNodeCount Leaving node count of the cluster
UnusableNodeCount Unusable node count of the cluster
IdleNodeCount Idle node count of the cluster
RunningNodeCount Running node count of the cluster
PreparingNodeCount Preparing node count of the cluster
QuotaAllocated Allocated quota to the cluster
QuotaUtilized Utilized quota of the cluster
AllocationStateTransitionTime Transition time from one state to another
ClusterErrorCodes Error codes received during cluster creation or scaling
CreationApiVersion API version used to create the cluster
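Several of these columns can be combined into derived values. The following Python sketch computes a quota utilization percentage from QuotaUtilized and QuotaAllocated and infers a scaling direction by comparing TargetNodeCount with CurrentNodeCount. Both outputs are hypothetical derived values, not columns of the table, and the sketch assumes QuotaUtilized and QuotaAllocated are reported in the same unit.

```python
def summarize_cluster_event(row: dict) -> dict:
    """Derive quota utilization and scaling direction from one AmlComputeClusterEvent row.

    Hypothetical helper; the returned keys are derived values, not table columns.
    """
    allocated = row["QuotaAllocated"]
    utilized = row["QuotaUtilized"]
    current = row["CurrentNodeCount"]
    target = row["TargetNodeCount"]

    # TargetNodeCount is the node count the cluster is scaling toward.
    if target > current:
        scaling = "up"
    elif target < current:
        scaling = "down"
    else:
        scaling = "steady"

    # Avoid dividing by zero for clusters with no allocated quota.
    utilization = 100.0 * utilized / allocated if allocated else None
    return {
        "ClusterName": row["ClusterName"],
        "QuotaUtilizationPercent": utilization,
        "Scaling": scaling,
    }
```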

AmlComputeInstanceEvent table

Property Description
Type Name of the log event, AmlComputeInstanceEvent
TimeGenerated Time (UTC) when the log entry was generated
Level The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
CorrelationId A GUID used to group together a set of related events, when applicable.
OperationName The name of the operation associated with the log entry
Identity The identity of the user or application that performed the operation.
AadTenantId The Microsoft Entra tenant ID the operation was submitted for.
AmlComputeInstanceName The name of the compute instance associated with the log entry.
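The CorrelationId column, which appears in several of these tables, lets you reassemble related events. The following Python sketch groups exported rows by CorrelationId in time order and reports the ResultType of the latest event in each group. The rows and short IDs are made up (real correlation IDs are GUIDs), and the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def group_by_correlation(events):
    """Group events that share a CorrelationId, ordered by TimeGenerated."""
    groups = defaultdict(list)
    ordered = sorted(events, key=lambda e: datetime.fromisoformat(e["TimeGenerated"]))
    for event in ordered:
        correlation_id = event.get("CorrelationId")
        if correlation_id:  # CorrelationId is only set when applicable
            groups[correlation_id].append(event)
    return dict(groups)


def latest_result(events):
    """Map each CorrelationId to the ResultType of its most recent event."""
    return {cid: group[-1]["ResultType"] for cid, group in group_by_correlation(events).items()}


# Hypothetical AmlComputeInstanceEvent rows; values are made up.
events = [
    {"TimeGenerated": "2024-05-01T10:00:00", "CorrelationId": "a", "ResultType": "Started"},
    {"TimeGenerated": "2024-05-01T10:02:00", "CorrelationId": "a", "ResultType": "Succeeded"},
    {"TimeGenerated": "2024-05-01T10:01:00", "CorrelationId": "b", "ResultType": "Started"},
    {"TimeGenerated": "2024-05-01T10:03:00", "CorrelationId": "b", "ResultType": "Failed"},
    {"TimeGenerated": "2024-05-01T10:04:00", "CorrelationId": "", "ResultType": "Succeeded"},
]
```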

AmlDataLabelEvent table

Property Description
Type Name of the log event, AmlDataLabelEvent
TimeGenerated Time (UTC) when the log entry was generated
Level The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
CorrelationId A GUID used to group together a set of related events, when applicable.
OperationName The name of the operation associated with the log entry
Identity The identity of the user or application that performed the operation.
AadTenantId The Microsoft Entra tenant ID the operation was submitted for.
AmlProjectId The unique identifier of the Azure Machine Learning project.
AmlProjectName The name of the Azure Machine Learning project.
AmlLabelNames The label class names that are created for the project.
AmlDataStoreName The name of the data store where the project's data is stored.

AmlDataSetEvent table

Property Description
Type Name of the log event, AmlDataSetEvent
TimeGenerated Time (UTC) when the log entry was generated
Level The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
AmlWorkspaceId The unique ID (a GUID) of the Azure Machine Learning workspace.
OperationName The name of the operation associated with the log entry
Identity The identity of the user or application that performed the operation.
AadTenantId The Microsoft Entra tenant ID the operation was submitted for.
AmlDatasetId The ID of the Azure Machine Learning Data Set.
AmlDatasetName The name of the Azure Machine Learning Data Set.

AmlDataStoreEvent table

Property Description
Type Name of the log event, AmlDataStoreEvent
TimeGenerated Time (UTC) when the log entry was generated
Level The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
AmlWorkspaceId The unique ID (a GUID) of the Azure Machine Learning workspace.
OperationName The name of the operation associated with the log entry
Identity The identity of the user or application that performed the operation.
AadTenantId The Microsoft Entra tenant ID the operation was submitted for.
AmlDatastoreName The name of the Azure Machine Learning Data Store.

AmlDeploymentEvent table

Property Description
Type Name of the log event, AmlDeploymentEvent
TimeGenerated Time (UTC) when the log entry was generated
Level The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
OperationName The name of the operation associated with the log entry
Identity The identity of the user or application that performed the operation.
AadTenantId The Microsoft Entra tenant ID the operation was submitted for.
AmlServiceName The name of the Azure Machine Learning Service.

AmlInferencingEvent table

Property Description
Type Name of the log event, AmlInferencingEvent
TimeGenerated Time (UTC) when the log entry was generated
Level The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
OperationName The name of the operation associated with the log entry
Identity The identity of the user or application that performed the operation.
AadTenantId The Microsoft Entra tenant ID the operation was submitted for.
AmlServiceName The name of the Azure Machine Learning Service.

AmlModelsEvent table

Property Description
Type Name of the log event, AmlModelsEvent
TimeGenerated Time (UTC) when the log entry was generated
Level The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
OperationName The name of the operation associated with the log entry
Identity The identity of the user or application that performed the operation.
AadTenantId The Microsoft Entra tenant ID the operation was submitted for.
ResultSignature The HTTP status code of the event. Typical values include 200, 201, and 202.
AmlModelName The name of the Azure Machine Learning Model.
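Because ResultSignature holds an HTTP status code, you can separate successful operations from failed ones by status class. The following Python sketch counts, per OperationName, the exported rows whose ResultSignature is outside the 2xx range. The rows are made up, "ModelRead" and "ModelWrite" are placeholder operation names rather than real ones, and non_success_by_operation is a hypothetical helper.

```python
from collections import Counter


def non_success_by_operation(events):
    """Count rows whose ResultSignature is not a 2xx status, per OperationName."""
    counts = Counter()
    for event in events:
        status = int(event["ResultSignature"])
        if not 200 <= status < 300:
            counts[event["OperationName"]] += 1
    return counts


# Hypothetical AmlModelsEvent rows with placeholder operation names.
events = [
    {"OperationName": "ModelWrite", "ResultSignature": "201"},
    {"OperationName": "ModelWrite", "ResultSignature": "409"},
    {"OperationName": "ModelRead", "ResultSignature": "200"},
    {"OperationName": "ModelRead", "ResultSignature": "404"},
    {"OperationName": "ModelRead", "ResultSignature": "500"},
]
```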

AmlPipelineEvent table

Property Description
Type Name of the log event, AmlPipelineEvent
TimeGenerated Time (UTC) when the log entry was generated
Level The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
AmlWorkspaceId The unique ID (a GUID) of the Azure Machine Learning workspace.
AmlWorkspaceName The name of the Azure Machine Learning workspace.
OperationName The name of the operation associated with the log entry
Identity The identity of the user or application that performed the operation.
AadTenantId The Microsoft Entra tenant ID the operation was submitted for.
AmlModuleId The unique ID (a GUID) of the module.
AmlModelName The name of the Azure Machine Learning Model.
AmlPipelineId The ID of the Azure Machine Learning pipeline.
AmlParentPipelineId The ID of the parent Azure Machine Learning pipeline (in the case of cloning).
AmlPipelineDraftId The ID of the Azure Machine Learning pipeline draft.
AmlPipelineDraftName The name of the Azure Machine Learning pipeline draft.
AmlPipelineEndpointId The ID of the Azure Machine Learning pipeline endpoint.
AmlPipelineEndpointName The name of the Azure Machine Learning pipeline endpoint.

AmlRunEvent table

Property Description
Type Name of the log event, AmlRunEvent
TimeGenerated Time (UTC) when the log entry was generated
Level The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
ResultType The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved.
OperationName The name of the operation associated with the log entry
AmlWorkspaceId The unique ID (a GUID) of the Azure Machine Learning workspace.
Identity The identity of the user or application that performed the operation.
AadTenantId The Microsoft Entra tenant ID the operation was submitted for.
RunId The unique ID of the run.

AmlEnvironmentEvent table

Property Description
Type Name of the log event, AmlEnvironmentEvent
TimeGenerated Time (UTC) when the log entry was generated
Level The severity level of the event. Must be one of Informational, Warning, Error, or Critical.
OperationName The name of the operation associated with the log entry
Identity The identity of the user or application that performed the operation.
AadTenantId The Microsoft Entra tenant ID the operation was submitted for.
AmlEnvironmentName The name of the Azure Machine Learning environment configuration.
AmlEnvironmentVersion The version of the Azure Machine Learning environment configuration.

AMLOnlineEndpointTrafficLog table (preview)

Property Description
Method The method requested by the client.
Path The path requested by the client.
SubscriptionId The subscription ID of the online endpoint.
AzureMLWorkspaceId The ID of the Machine Learning workspace that contains the online endpoint.
AzureMLWorkspaceName The name of the Machine Learning workspace that contains the online endpoint.
EndpointName The name of the online endpoint.
DeploymentName The name of the online deployment.
Protocol The protocol of the request.
ResponseCode The final response code returned to the client.
ResponseCodeReason The final response code reason returned to the client.
ModelStatusCode The response status code from the model.
ModelStatusReason The response status reason from the model.
RequestPayloadSize The total bytes received from the client.
ResponsePayloadSize The total bytes sent back to the client.
UserAgent The user-agent header of the request, including comments but truncated to a max of 70 characters.
XRequestId The request ID generated by Azure Machine Learning for internal tracing.
XMSClientRequestId The tracking ID generated by the client.
TotalDurationMs Duration in milliseconds from the request start time to the last response byte sent back to the client. If the client disconnected, it measures from the start time to client disconnect time.
RequestDurationMs Duration in milliseconds from the request start time to the last byte of the request received from the client.
ResponseDurationMs Duration in milliseconds from the request start time to the first response byte read from the model.
RequestThrottlingDelayMs Delay in milliseconds in request data transfer due to network throttling.
ResponseThrottlingDelayMs Delay in milliseconds in response data transfer due to network throttling.

For more information on this log, see Monitor online endpoints.
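The traffic log columns support simple latency and error summaries. The following Python sketch takes rows exported from AMLOnlineEndpointTrafficLog and computes the request count, the share of responses with a 5xx ResponseCode, and nearest-rank percentiles of TotalDurationMs. The helper names are hypothetical, and the choice of the nearest-rank method (rather than interpolation) is an assumption made to keep the results easy to verify by hand.

```python
import math


def nearest_rank_percentile(values, pct):
    """Return the pct-th percentile of values by using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def summarize_traffic(rows):
    """Summarize exported AMLOnlineEndpointTrafficLog rows (hypothetical helper)."""
    if not rows:
        raise ValueError("rows must not be empty")
    durations = [row["TotalDurationMs"] for row in rows]
    server_errors = sum(1 for row in rows if int(row["ResponseCode"]) >= 500)
    return {
        "requests": len(rows),
        "server_error_rate": server_errors / len(rows),
        "p50_total_ms": nearest_rank_percentile(durations, 50),
        "p95_total_ms": nearest_rank_percentile(durations, 95),
    }
```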

AMLOnlineEndpointConsoleLog

Property Description
TimeGenerated The timestamp (UTC) of when the log was generated.
OperationName The operation associated with the log record.
InstanceId The ID of the instance that generated this log record.
DeploymentName The name of the deployment associated with the log record.
ContainerName The name of the container where the log was generated.
Message The content of the log.

For more information on this log, see Monitor online endpoints.

AMLOnlineEndpointEventLog (preview)

Property Description
TimeGenerated The timestamp (UTC) of when the log was generated.
OperationName The operation associated with the log record.
InstanceId The ID of the instance that generated this log record.
DeploymentName The name of the deployment associated with the log record.
Name The name of the event.
Message The content of the event.

For more information on this log, see Monitor online endpoints.