Azure Machine Learning monitoring data reference
This article contains all the monitoring reference information for this service.
See Monitor Machine Learning for details on the data you can collect for Azure Machine Learning and how to use it.
Metrics
This section lists all the automatically collected platform metrics for this service.
For information on metric retention, see Azure Monitor Metrics overview. The resource provider for these metrics is Microsoft.MachineLearningServices/workspaces.
The metrics categories are Model, Quota, Resource, Run, and Traffic. Quota information is for Machine Learning compute only. Run provides information on training runs for the workspace.
Supported metrics for Microsoft.MachineLearningServices/workspaces
The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces resource type.
- All columns might not be present in every table.
Table headings
- Category - The metrics group or classification.
- Metric - The metric display name as it appears in the Azure portal.
- Name in REST API - The metric name as referred to in the REST API.
- Unit - Unit of measure.
- Aggregation - The default aggregation type. Valid values: Average (Avg), Minimum (Min), Maximum (Max), Total (Sum), Count.
- Dimensions - Dimensions available for the metric.
- Time Grains - Intervals at which the metric is sampled. For example, PT1M indicates that the metric is sampled every minute, PT30M every 30 minutes, PT1H every hour, and so on.
- DS Export - Whether the metric is exportable to Azure Monitor Logs via diagnostic settings. For information on exporting metrics, see Create diagnostic settings in Azure Monitor.
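The time grain values shown above (PT1M, PT30M, PT1H) are ISO 8601 durations. As a minimal sketch, they can be converted to Python `timedelta` objects for use in client-side calculations. The helper name `parse_time_grain` is hypothetical and not part of any Azure SDK, and the pattern assumes only the day, hour, minute, and second designators appear.

```python
import re
from datetime import timedelta

# Subset of ISO 8601 durations: optional days, then optional hours,
# minutes, and seconds after the "T" separator (for example PT1M, PT1H, P1D).
_GRAIN = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_time_grain(grain: str) -> timedelta:
    """Convert a time grain such as 'PT1M' into a timedelta."""
    match = _GRAIN.match(grain)
    if match is None:
        raise ValueError(f"Unsupported time grain: {grain!r}")
    # Keep only the designators that were actually present in the string.
    parts = {name: int(value) for name, value in match.groupdict().items() if value}
    if not parts:
        raise ValueError(f"Time grain has no components: {grain!r}")
    return timedelta(**parts)


for grain in ("PT1M", "PT30M", "PT1H"):
    print(grain, "->", parse_time_grain(grain))
```

Week, month, and year designators are deliberately left out of this sketch because they don't map to a fixed `timedelta`.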
Supported metrics for Microsoft.MachineLearningServices/workspaces/onlineEndpoints
The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces/onlineEndpoints resource type.
Supported metrics for Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments
The following table lists the metrics available for the Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments resource type.
For information about what metric dimensions are, see Multi-dimensional metrics. This service has the following dimensions associated with its metrics.
Dimension | Description |
---|---|
Cluster Name | The name of the compute cluster resource. Available for all quota metrics. |
Vm Family Name | The name of the VM family used by the cluster. Available for quota utilization percentage. |
Vm Priority | The priority of the VM. Available for quota utilization percentage. |
CreatedTime | Only available for CpuUtilization and GpuUtilization. |
DeviceId | ID of the device (GPU). Only available for GpuUtilization. |
NodeId | ID of the node created where job is running. Only available for CpuUtilization and GpuUtilization. |
RunId | ID of the run/job. Only available for CpuUtilization and GpuUtilization. |
ComputeType | The compute type that the run used. Only available for Completed runs, Failed runs, and Started runs. |
PipelineStepType | The type of PipelineStep used in the run. Only available for Completed runs, Failed runs, and Started runs. |
PublishedPipelineId | The ID of the published pipeline used in the run. Only available for Completed runs, Failed runs, and Started runs. |
RunType | The type of run. Only available for Completed runs, Failed runs, and Started runs. |
The valid values for the RunType dimension are:
Value | Description |
---|---|
Experiment | Non-pipeline runs. |
PipelineRun | A pipeline run, which is the parent of a StepRun. |
StepRun | A run for a pipeline step. |
ReusedStepRun | A run for a pipeline step that reuses a previous run. |
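The availability rules for the run and utilization dimensions, together with the RunType values, can be captured as a small lookup. The following is only a sketch: the names `DIMENSION_METRICS`, `dimensions_for`, and `is_valid_run_type` are hypothetical, the metric labels are copied from the dimension descriptions and might not match the REST API metric names, and the quota dimensions are omitted because the individual quota metrics aren't enumerated here.

```python
# Metric labels exactly as they appear in the dimension descriptions.
UTILIZATION = {"CpuUtilization", "GpuUtilization"}
RUN_COUNTS = {"Completed runs", "Failed runs", "Started runs"}

# Which metrics each dimension is documented as available for.
DIMENSION_METRICS = {
    "CreatedTime": UTILIZATION,
    "DeviceId": {"GpuUtilization"},
    "NodeId": UTILIZATION,
    "RunId": UTILIZATION,
    "ComputeType": RUN_COUNTS,
    "PipelineStepType": RUN_COUNTS,
    "PublishedPipelineId": RUN_COUNTS,
    "RunType": RUN_COUNTS,
}

# Valid values for the RunType dimension.
RUN_TYPES = {"Experiment", "PipelineRun", "StepRun", "ReusedStepRun"}


def dimensions_for(metric: str) -> list:
    """Return the dimensions documented for a metric, sorted by name."""
    return sorted(dim for dim, metrics in DIMENSION_METRICS.items() if metric in metrics)


def is_valid_run_type(value: str) -> bool:
    """Check a RunType filter value against the documented values."""
    return value in RUN_TYPES


print(dimensions_for("GpuUtilization"))
print(is_valid_run_type("StepRun"))
```

A lookup like this can be used to reject an unsupported dimension filter before a metrics query is sent.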
Resource logs
This section lists the types of resource logs you can collect for this service. The section pulls from the list of all resource logs category types supported in Azure Monitor.
Azure Monitor Logs tables
This section refers to all of the Azure Monitor Logs tables relevant to this service, which are available for query by Log Analytics using Kusto queries.
This service uses the following tables to store resource log data.
Machine Learning
Microsoft.MachineLearningServices/workspaces
- AzureActivity
- AMLOnlineEndpointConsoleLog
- AMLOnlineEndpointTrafficLog
- AMLOnlineEndpointEventLog
- AzureMetrics
- AMLComputeClusterEvent
- AMLComputeClusterNodeEvent
- AMLComputeJobEvent
- AMLRunStatusChangedEvent
- AMLComputeCpuGpuUtilization
- AMLComputeInstanceEvent
- AMLDataLabelEvent
- AMLDataSetEvent
- AMLDataStoreEvent
- AMLDeploymentEvent
- AMLEnvironmentEvent
- AMLInferencingEvent
- AMLModelsEvent
- AMLPipelineEvent
- AMLRunEvent
Microsoft.MachineLearningServices/registries
- AzureActivity
- AmlRegistryReadEventsLog
- AmlRegistryWriteEventsLog
Activity log
The linked table lists the operations that can be recorded in the activity log for this service. These operations are a subset of all the possible resource provider operations in the activity log.
For more information on the schema of activity log entries, see Activity Log schema.
The following table lists some operations related to Machine Learning that may be created in the activity log. For a complete listing of Microsoft.MachineLearningServices operations, see Microsoft.MachineLearningServices resource provider operations.
Operation | Description |
---|---|
Creates or updates a Machine Learning workspace | A workspace was created or updated |
CheckComputeNameAvailability | Check if a compute name is already in use |
Creates or updates the compute resources | A compute resource was created or updated |
Deletes the compute resources | A compute resource was deleted |
List secrets | An operation listed secrets for a Machine Learning workspace |
Log schemas
Azure Machine Learning uses the following schemas.
AmlComputeJobEvent table
Property | Description |
---|---|
TimeGenerated | Time when the log entry was generated |
OperationName | Name of the operation associated with the log event |
Category | Name of the log event |
JobId | ID of the Job submitted |
ExperimentId | ID of the Experiment |
ExperimentName | Name of the Experiment |
CustomerSubscriptionId | ID of the subscription where the Experiment and Job were submitted |
WorkspaceName | Name of the machine learning workspace |
ClusterName | Name of the Cluster |
ProvisioningState | State of the Job submission |
ResourceGroupName | Name of the resource group |
JobName | Name of the Job |
ClusterId | ID of the cluster |
EventType | Type of the Job event. For example, JobSubmitted, JobRunning, JobFailed, JobSucceeded. |
ExecutionState | State of the job (the Run). For example, Queued, Running, Succeeded, Failed |
ErrorDetails | Details of job error |
CreationApiVersion | API version used to create the job |
ClusterResourceGroupName | Resource group name of the cluster |
TFWorkerCount | Count of TF workers |
TFParameterServerCount | Count of TF parameter servers |
ToolType | Type of tool used |
RunInContainer | Flag describing if job should be run inside a container |
JobErrorMessage | Detailed message of the job error |
NodeId | ID of the node created where job is running |
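As an illustration of how these properties might be used from Log Analytics, the following sketch builds a Kusto query that lists recently failed jobs. The function name `failed_jobs_query` is hypothetical, the table name is written as it appears in the heading above (Kusto table names are case-sensitive, so adjust it if your workspace differs), and the `JobFailed` value is taken from the EventType examples. Submitting the query to a workspace is outside the scope of this sketch.

```python
def failed_jobs_query(lookback_hours: int = 24) -> str:
    """Return a Kusto query listing recent failed compute jobs."""
    if lookback_hours <= 0:
        raise ValueError("lookback_hours must be positive")
    return "\n".join([
        "AmlComputeJobEvent",
        # Restrict to the lookback window using the log timestamp.
        f"| where TimeGenerated > ago({lookback_hours}h)",
        # EventType value taken from the examples in the schema table.
        '| where EventType == "JobFailed"',
        "| project TimeGenerated, JobName, ClusterName, ExecutionState, ErrorDetails",
        "| order by TimeGenerated desc",
    ])


print(failed_jobs_query(6))
```

The projected columns all come from the schema table; swap in other properties, such as JobErrorMessage or NodeId, as needed.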
AmlComputeClusterEvent table
Property | Description |
---|---|
TimeGenerated | Time when the log entry was generated |
OperationName | Name of the operation associated with the log event |
Category | Name of the log event |
ProvisioningState | Provisioning state of the cluster |
ClusterName | Name of the cluster |
ClusterType | Type of the cluster |
CreatedBy | User who created the cluster |
CoreCount | Count of the cores in the cluster |
VmSize | VM size of the cluster |
VmPriority | Priority of the nodes created inside a cluster (Dedicated or LowPriority) |
ScalingType | Type of cluster scaling (manual or auto) |
InitialNodeCount | Initial node count of the cluster |
MinimumNodeCount | Minimum node count of the cluster |
MaximumNodeCount | Maximum node count of the cluster |
NodeDeallocationOption | How the node should be deallocated |
Publisher | Publisher of the cluster type |
Offer | Offer with which the cluster is created |
Sku | SKU of the node/VM created inside the cluster |
Version | Version of the image used while Node/VM is created |
SubnetId | SubnetId of the cluster |
AllocationState | Cluster allocation state |
CurrentNodeCount | Current node count of the cluster |
TargetNodeCount | Target node count of the cluster while scaling up/down |
EventType | Type of event during cluster creation. |
NodeIdleTimeSecondsBeforeScaleDown | Idle time in seconds before cluster is scaled down |
PreemptedNodeCount | Preempted node count of the cluster |
IsResizeGrow | Flag indicating that cluster is scaling up |
VmFamilyName | Name of the VM family of the nodes that can be created inside cluster |
LeavingNodeCount | Leaving node count of the cluster |
UnusableNodeCount | Unusable node count of the cluster |
IdleNodeCount | Idle node count of the cluster |
RunningNodeCount | Running node count of the cluster |
PreparingNodeCount | Preparing node count of the cluster |
QuotaAllocated | Allocated quota to the cluster |
QuotaUtilized | Utilized quota of the cluster |
AllocationStateTransitionTime | Transition time from one state to another |
ClusterErrorCodes | Error code received during cluster creation or scaling |
CreationApiVersion | API version used while creating the cluster |
AmlComputeInstanceEvent table
Property | Description |
---|---|
Type | Name of the log event, AmlComputeInstanceEvent |
TimeGenerated | Time (UTC) when the log entry was generated |
Level | The severity level of the event. Must be one of Informational, Warning, Error, or Critical. |
ResultType | The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved. |
CorrelationId | A GUID used to group together a set of related events, when applicable. |
OperationName | The name of the operation associated with the log entry |
Identity | The identity of the user or application that performed the operation. |
AadTenantId | The Microsoft Entra tenant ID the operation was submitted for. |
AmlComputeInstanceName | The name of the compute instance associated with the log entry. |
AmlDataLabelEvent table
Property | Description |
---|---|
Type | Name of the log event, AmlDataLabelEvent |
TimeGenerated | Time (UTC) when the log entry was generated |
Level | The severity level of the event. Must be one of Informational, Warning, Error, or Critical. |
ResultType | The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved. |
CorrelationId | A GUID used to group together a set of related events, when applicable. |
OperationName | The name of the operation associated with the log entry |
Identity | The identity of the user or application that performed the operation. |
AadTenantId | The Microsoft Entra tenant ID the operation was submitted for. |
AmlProjectId | The unique identifier of the Azure Machine Learning project. |
AmlProjectName | The name of the Azure Machine Learning project. |
AmlLabelNames | The label class names which are created for the project. |
AmlDataStoreName | The name of the data store where the project's data is stored. |
AmlDataSetEvent table
Property | Description |
---|---|
Type | Name of the log event, AmlDataSetEvent |
TimeGenerated | Time (UTC) when the log entry was generated |
Level | The severity level of the event. Must be one of Informational, Warning, Error, or Critical. |
ResultType | The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved. |
AmlWorkspaceId | A GUID and unique ID of the Azure Machine Learning workspace. |
OperationName | The name of the operation associated with the log entry |
Identity | The identity of the user or application that performed the operation. |
AadTenantId | The Microsoft Entra tenant ID the operation was submitted for. |
AmlDatasetId | The ID of the Azure Machine Learning Data Set. |
AmlDatasetName | The name of the Azure Machine Learning Data Set. |
AmlDataStoreEvent table
Property | Description |
---|---|
Type | Name of the log event, AmlDataStoreEvent |
TimeGenerated | Time (UTC) when the log entry was generated |
Level | The severity level of the event. Must be one of Informational, Warning, Error, or Critical. |
ResultType | The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved. |
AmlWorkspaceId | A GUID and unique ID of the Azure Machine Learning workspace. |
OperationName | The name of the operation associated with the log entry |
Identity | The identity of the user or application that performed the operation. |
AadTenantId | The Microsoft Entra tenant ID the operation was submitted for. |
AmlDatastoreName | The name of the Azure Machine Learning Data Store. |
AmlDeploymentEvent table
Property | Description |
---|---|
Type | Name of the log event, AmlDeploymentEvent |
TimeGenerated | Time (UTC) when the log entry was generated |
Level | The severity level of the event. Must be one of Informational, Warning, Error, or Critical. |
ResultType | The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved. |
OperationName | The name of the operation associated with the log entry |
Identity | The identity of the user or application that performed the operation. |
AadTenantId | The Microsoft Entra tenant ID the operation was submitted for. |
AmlServiceName | The name of the Azure Machine Learning Service. |
AmlInferencingEvent table
Property | Description |
---|---|
Type | Name of the log event, AmlInferencingEvent |
TimeGenerated | Time (UTC) when the log entry was generated |
Level | The severity level of the event. Must be one of Informational, Warning, Error, or Critical. |
ResultType | The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved. |
OperationName | The name of the operation associated with the log entry |
Identity | The identity of the user or application that performed the operation. |
AadTenantId | The Microsoft Entra tenant ID the operation was submitted for. |
AmlServiceName | The name of the Azure Machine Learning Service. |
AmlModelsEvent table
Property | Description |
---|---|
Type | Name of the log event, AmlModelsEvent |
TimeGenerated | Time (UTC) when the log entry was generated |
Level | The severity level of the event. Must be one of Informational, Warning, Error, or Critical. |
ResultType | The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved. |
OperationName | The name of the operation associated with the log entry |
Identity | The identity of the user or application that performed the operation. |
AadTenantId | The Microsoft Entra tenant ID the operation was submitted for. |
ResultSignature | The HTTP status code of the event. Typical values include 200, 201, and 202. |
AmlModelName | The name of the Azure Machine Learning Model. |
AmlPipelineEvent table
Property | Description |
---|---|
Type | Name of the log event, AmlPipelineEvent |
TimeGenerated | Time (UTC) when the log entry was generated |
Level | The severity level of the event. Must be one of Informational, Warning, Error, or Critical. |
ResultType | The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved. |
AmlWorkspaceId | A GUID and unique ID of the Azure Machine Learning workspace. |
AmlWorkspaceId | The name of the Azure Machine Learning workspace. |
OperationName | The name of the operation associated with the log entry |
Identity | The identity of the user or application that performed the operation. |
AadTenantId | The Microsoft Entra tenant ID the operation was submitted for. |
AmlModuleId | A GUID and unique ID of the module. |
AmlModelName | The name of the Azure Machine Learning Model. |
AmlPipelineId | The ID of the Azure Machine Learning pipeline. |
AmlParentPipelineId | The ID of the parent Azure Machine Learning pipeline (in the case of cloning). |
AmlPipelineDraftId | The ID of the Azure Machine Learning pipeline draft. |
AmlPipelineDraftName | The name of the Azure Machine Learning pipeline draft. |
AmlPipelineEndpointId | The ID of the Azure Machine Learning pipeline endpoint. |
AmlPipelineEndpointName | The name of the Azure Machine Learning pipeline endpoint. |
AmlRunEvent table
Property | Description |
---|---|
Type | Name of the log event, AmlRunEvent |
TimeGenerated | Time (UTC) when the log entry was generated |
Level | The severity level of the event. Must be one of Informational, Warning, Error, or Critical. |
ResultType | The status of the event. Typical values include Started, In Progress, Succeeded, Failed, Active, and Resolved. |
OperationName | The name of the operation associated with the log entry |
AmlWorkspaceId | A GUID and unique ID of the Azure Machine Learning workspace. |
Identity | The identity of the user or application that performed the operation. |
AadTenantId | The Microsoft Entra tenant ID the operation was submitted for. |
RunId | The unique ID of the run. |
AmlEnvironmentEvent table
Property | Description |
---|---|
Type | Name of the log event, AmlEnvironmentEvent |
TimeGenerated | Time (UTC) when the log entry was generated |
Level | The severity level of the event. Must be one of Informational, Warning, Error, or Critical. |
OperationName | The name of the operation associated with the log entry |
Identity | The identity of the user or application that performed the operation. |
AadTenantId | The Microsoft Entra tenant ID the operation was submitted for. |
AmlEnvironmentName | The name of the Azure Machine Learning environment configuration. |
AmlEnvironmentVersion | The version of the Azure Machine Learning environment configuration. |
AMLOnlineEndpointTrafficLog table (preview)
Property | Description |
---|---|
Method | The requested method from client. |
Path | The requested path from client. |
SubscriptionId | The machine learning subscription ID of the online endpoint. |
AzureMLWorkspaceId | The machine learning workspace ID of the online endpoint. |
AzureMLWorkspaceName | The machine learning workspace name of the online endpoint. |
EndpointName | The name of the online endpoint. |
DeploymentName | The name of the online deployment. |
Protocol | The protocol of the request. |
ResponseCode | The final response code returned to the client. |
ResponseCodeReason | The final response code reason returned to the client. |
ModelStatusCode | The response status code from model. |
ModelStatusReason | The response status reason from model. |
RequestPayloadSize | The total bytes received from the client. |
ResponsePayloadSize | The total bytes sent back to the client. |
UserAgent | The user-agent header of the request, including comments but truncated to a max of 70 characters. |
XRequestId | The request ID generated by Azure Machine Learning for internal tracing. |
XMSClientRequestId | The tracking ID generated by the client. |
TotalDurationMs | Duration in milliseconds from the request start time to the last response byte sent back to the client. If the client disconnected, it measures from the start time to client disconnect time. |
RequestDurationMs | Duration in milliseconds from the request start time to the last byte of the request received from the client. |
ResponseDurationMs | Duration in milliseconds from the request start time to the first response byte read from the model. |
RequestThrottlingDelayMs | Delay in milliseconds in request data transfer due to network throttling. |
ResponseThrottlingDelayMs | Delay in milliseconds in response data transfer due to network throttling. |
For more information on this log, see Monitor online endpoints.
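Because TotalDurationMs, RequestDurationMs, and ResponseDurationMs are all measured from the request start time, subtracting them gives a rough latency breakdown for a single traffic log row. This is a sketch based only on the property descriptions above: the names `LatencyBreakdown` and `latency_breakdown` are hypothetical, the row is assumed to be a plain dictionary keyed by property name, and the middle segment includes any forwarding overhead as well as model processing time.

```python
from dataclasses import dataclass


@dataclass
class LatencyBreakdown:
    upload_ms: float      # request start -> last request byte received
    model_wait_ms: float  # last request byte -> first response byte from the model
    download_ms: float    # first model byte -> last response byte sent to the client


def latency_breakdown(row: dict) -> LatencyBreakdown:
    """Split one traffic log row into three consecutive latency segments."""
    total = float(row["TotalDurationMs"])
    request = float(row["RequestDurationMs"])
    response = float(row["ResponseDurationMs"])
    return LatencyBreakdown(
        upload_ms=request,
        # Clamp at zero in case the segments overlap, for example when the
        # model starts responding before the request is fully received.
        model_wait_ms=max(response - request, 0.0),
        download_ms=max(total - response, 0.0),
    )


sample = {"TotalDurationMs": 120, "RequestDurationMs": 10, "ResponseDurationMs": 90}
print(latency_breakdown(sample))
```

If the client disconnected early, TotalDurationMs ends at the disconnect time, so the last segment can be shorter than the actual response transfer would have been.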
AMLOnlineEndpointConsoleLog
Property | Description |
---|---|
TimeGenerated | The timestamp (UTC) of when the log was generated. |
OperationName | The operation associated with the log record. |
InstanceId | The ID of the instance that generated this log record. |
DeploymentName | The name of the deployment associated with the log record. |
ContainerName | The name of the container where the log was generated. |
Message | The content of the log. |
For more information on this log, see Monitor online endpoints.
AMLOnlineEndpointEventLog (preview)
Property | Description |
---|---|
TimeGenerated | The timestamp (UTC) of when the log was generated. |
OperationName | The operation associated with the log record. |
InstanceId | The ID of the instance that generated this log record. |
DeploymentName | The name of the deployment associated with the log record. |
Name | The name of the event. |
Message | The content of the event. |
For more information on this log, see Monitor online endpoints.
Related content
- See Monitor Machine Learning for a description of monitoring Machine Learning.
- See Monitor Azure resources with Azure Monitor for details on monitoring Azure resources.