Diagnostic logging in Azure Databricks

Important

This feature is in Public Preview.

Azure Databricks provides comprehensive end-to-end diagnostic logs of activities performed by Azure Databricks users, allowing your enterprise to monitor detailed Azure Databricks usage patterns.

Configure diagnostic log delivery

Note

Diagnostic logs require the Azure Databricks Premium Plan.

  1. Log in to the Azure portal as an Owner or Contributor for the Azure Databricks workspace and click your Azure Databricks Service resource.

  2. In the Monitoring section of the sidebar, click the Diagnostic settings tab.

  3. Click Turn on diagnostics.

    Azure Databricks diagnostic settings

  4. On the Diagnostic settings page, provide the following configuration:

    Name

    Enter a name for the logs to create.

    Archive to a storage account

    To use this option, you need an existing storage account to connect to. To create a new storage account in the portal, see Create a storage account and follow the instructions to create an Azure Resource Manager, general-purpose account. Then return to this page in the portal to select your storage account. It might take a few minutes for newly created storage accounts to appear in the drop-down menu. For information about additional costs incurred by writing to a storage account, see Azure Storage pricing.

    Stream to an event hub

    To use this option, you need an existing Azure Event Hubs namespace and event hub to connect to. To create an Event Hubs namespace, see Create an Event Hubs namespace and an event hub by using the Azure portal. Then return to this page in the portal to select the Event Hubs namespace and policy name. For information about additional costs incurred by writing to an event hub, see Azure Event Hubs pricing.

    Send to Log Analytics

    To use this option, either use an existing Log Analytics workspace or create a new one by following the steps to create a new workspace in the portal. For information about additional costs incurred by sending logs to Log Analytics, see Azure Monitor pricing.

    Azure Databricks diagnostic settings

  5. Choose the services you want diagnostic logs for and set retention policies.

    Retention applies only to storage accounts. If you do not want to apply a retention policy and you want to retain data forever, set Retention (days) to 0.

  6. Select Save.

  7. If you receive an error that says "Failed to update diagnostics for . The subscription is not registered to use microsoft.insights," follow the Troubleshoot Azure Diagnostics instructions to register the account and then retry this procedure.

  8. If you want to change how your diagnostic logs are saved at any point in the future, return to this page to modify the diagnostic log settings for your account.

Turn on logging using PowerShell

  1. Start an Azure PowerShell session and sign in to your Azure account with the following command:

    Connect-AzAccount
    

    If you do not have Azure PowerShell installed already, use the following commands to install Azure PowerShell and import the Az module.

    Install-Module -Name Az -AllowClobber
    Import-Module Az
    
  2. In the pop-up browser window, enter your Azure account user name and password. Azure PowerShell gets all of the subscriptions that are associated with this account and, by default, uses the first one.

    If you have more than one subscription, you might have to specify the subscription that contains your Azure Databricks workspace. To see the subscriptions for your account, type the following command:

    Get-AzSubscription
    

    To specify the subscription that's associated with the Azure Databricks account that you're logging, type the following command:

    Set-AzContext -SubscriptionId <subscription ID>
    
  3. Set your Log Analytics resource name to a variable named logAnalytics, where ResourceName is the name of the Log Analytics workspace.

    $logAnalytics = Get-AzResource -ResourceGroupName <resource group name> -ResourceName <resource name> -ResourceType "Microsoft.OperationalInsights/workspaces"
    
  4. Set the Azure Databricks service resource name to a variable named databricks, where ResourceName is the name of the Azure Databricks service.

    $databricks = Get-AzResource -ResourceGroupName <your resource group name> -ResourceName <your Azure Databricks service name> -ResourceType "Microsoft.Databricks/workspaces"
    
  5. To enable logging for Azure Databricks, use the Set-AzDiagnosticSetting cmdlet with variables for the new storage account, the Azure Databricks service, and the category to enable for logging. Run the following command and set the -Enabled flag to $true:

    Set-AzDiagnosticSetting -ResourceId $databricks.ResourceId -WorkspaceId $logAnalytics.ResourceId -Enabled $true -name "<diagnostic setting name>" -Category <comma separated list>
    

Enable logging by using Azure CLI

  1. Open PowerShell.

  2. Use the following command to connect to your Azure account:

    az login
    
  3. Run the following diagnostic setting command:

    az monitor diagnostic-settings create --name <diagnostic name> `
        --resource-group <log analytics workspace resource group> `
        --workspace <log analytics name or object ID> `
        --resource <target resource object ID> `
        --logs '[
        {
          \"category\": <category name>,
          \"enabled\": true
        }
        ]'
    
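The `--logs` argument is a JSON array with one entry per category. If you enable several categories at once, it can be convenient to generate that array programmatically. A minimal Python sketch (the helper name and the category names in the example are illustrative):

```python
import json

def build_logs_argument(categories):
    """Build the JSON array passed to `az monitor diagnostic-settings create --logs`.

    Each entry enables one diagnostic log category.
    """
    return json.dumps([{"category": c, "enabled": True} for c in categories])

# Illustrative category names; use the categories available for your workspace.
logs_arg = build_logs_argument(["clusters", "jobs"])
print(logs_arg)
```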

REST API

Use the LogSettings API.

Request

PUT https://management.azure.com/{resourceUri}/providers/microsoft.insights/diagnosticSettings/{name}?api-version=2017-05-01-preview

Request body

{
    "properties": {
    "workspaceId": "<log analytics resourceId>",
    "logs": [
      {
        "category": "<category name>",
        "enabled": true,
        "retentionPolicy": {
          "enabled": false,
          "days": 0
        }
      }
    ]
  }
}
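The request above can also be assembled programmatically. The following Python sketch only builds the URL and body from the template shown; authentication (an Azure AD bearer token) and actually sending the PUT are left to the caller, and the helper name and parameters are illustrative:

```python
import json

API_VERSION = "2017-05-01-preview"

def diagnostic_settings_request(resource_uri, name, workspace_id, categories):
    """Build the URL and JSON body for the diagnostic settings PUT request."""
    url = (
        f"https://management.azure.com/{resource_uri}"
        f"/providers/microsoft.insights/diagnosticSettings/{name}"
        f"?api-version={API_VERSION}"
    )
    body = {
        "properties": {
            "workspaceId": workspace_id,
            "logs": [
                {
                    "category": c,
                    "enabled": True,
                    # Retention disabled here, matching the body shown above.
                    "retentionPolicy": {"enabled": False, "days": 0},
                }
                for c in categories
            ],
        }
    }
    return url, json.dumps(body)
```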

Diagnostic log delivery

Once logging is enabled for your account, Azure Databricks automatically starts sending diagnostic logs to your delivery location on a periodic basis. Logs are available within 24 to 72 hours of activation. On any given day, Azure Databricks delivers at least 99% of diagnostic logs within the first 24 hours, and the remaining 1% in no more than 72 hours.

Diagnostic log schema

The schema of diagnostic log records is as follows:

Field                          Description
operationversion               The schema version of the diagnostic log format.
time                           UTC timestamp of the action.
properties.sourceIPAddress     The IP address of the source request.
properties.userAgent           The browser or API client used to make the request.
properties.sessionId           Session ID of the action.
identities                     Information about the user that makes the requests:

                               * email: User email address.
category                       The service that logged the request.
operationName                  The action, such as login, logout, read, write, etc.
properties.requestId           Unique request ID.
properties.requestParams       Parameter key-value pairs used in the event.
properties.response            Response to the request:

                               * errorMessage: The error message if there was an error.
                               * result: The result of the request.
                               * statusCode: HTTP status code that indicates whether the request succeeds or not.
properties.logId               The unique identifier for the log messages.
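When consuming exported records, the dotted field paths in the table above can be checked mechanically as a quick sanity test. A sketch, with the field list taken from the table and the helper name illustrative:

```python
# Dotted field paths from the diagnostic log schema table.
SCHEMA_FIELDS = [
    "operationversion",
    "time",
    "properties.sourceIPAddress",
    "properties.userAgent",
    "properties.sessionId",
    "identities",
    "category",
    "operationName",
    "properties.requestId",
    "properties.requestParams",
    "properties.response",
    "properties.logId",
]

def missing_fields(record):
    """Return the schema fields (dotted paths) absent from a log record dict."""
    missing = []
    for path in SCHEMA_FIELDS:
        node = record
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing
```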

Events

The category and operationName properties identify an event in a log record. Azure Databricks provides diagnostic logs for the following services:

  • DBFS
  • Clusters
  • Pools
  • Accounts
  • Jobs
  • Notebook
  • SSH
  • Workspace
  • Secrets
  • SQL Permissions

If actions take a long time, the request and response are logged separately, but the request and response pair have the same properties.requestId.

With the exception of mount-related operations, Azure Databricks diagnostic logs do not include DBFS-related operations.

Note

Automated actions, such as resizing a cluster due to autoscaling or launching a job due to scheduling, are performed by the user System-User.

Sample log output

The following JSON sample is an example of Azure Databricks log output:

{
    "TenantId": "<your tenant id>",
    "SourceSystem": "|Databricks|",
    "TimeGenerated": "2019-05-01T00:18:58Z",
    "ResourceId": "/SUBSCRIPTIONS/SUBSCRIPTION_ID/RESOURCEGROUPS/RESOURCE_GROUP/PROVIDERS/MICROSOFT.DATABRICKS/WORKSPACES/PAID-VNET-ADB-PORTAL",
    "OperationName": "Microsoft.Databricks/jobs/create",
    "OperationVersion": "1.0.0",
    "Category": "jobs",
    "Identity": {
        "email": "mail@contoso.com",
        "subjectName": null
    },
    "SourceIPAddress": "131.0.0.0",
    "LogId": "201b6d83-396a-4f3c-9dee-65c971ddeb2b",
    "ServiceName": "jobs",
    "UserAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36",
    "SessionId": "webapp-cons-webapp-01exaj6u94682b1an89u7g166c",
    "ActionName": "create",
    "RequestId": "ServiceMain-206b2474f0620002",
    "Response": {
        "statusCode": 200,
        "result": "{\"job_id\":1}"
    },
    "RequestParams": {
        "name": "Untitled",
        "new_cluster": "{\"node_type_id\":\"Standard_DS3_v2\",\"spark_version\":\"5.2.x-scala2.11\",\"num_workers\":8,\"spark_conf\":{\"spark.databricks.delta.preview.enabled\":\"true\"},\"cluster_creator\":\"JOB_LAUNCHER\",\"spark_env_vars\":{\"PYSPARK_PYTHON\":\"/databricks/python3/bin/python3\"},\"enable_elastic_disk\":true}"
    },
    "Type": "DatabricksJobs"
}
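Note that in the sample above, the `result` field inside `Response` is itself a JSON-encoded string, so pulling values out of it takes a second parse. A small sketch over that fragment:

```python
import json

# The Response fragment from the sample record above.
sample_response = {
    "statusCode": 200,
    "result": "{\"job_id\":1}",
}

# result is a JSON string nested inside the record, so parse it separately.
result = json.loads(sample_response["result"])
print(result["job_id"])
```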

Analyze diagnostic logs

If you selected the Send to Log Analytics option when you turned on diagnostic logging, diagnostic data from your container is forwarded to Azure Monitor logs within 24 to 72 hours.

Before you view your logs, verify that your Log Analytics workspace has been upgraded to use the new Kusto query language. To check, open the Azure portal and select Log Analytics on the far left. Then select your Log Analytics workspace. If you get a message to upgrade, see Upgrade your Azure Log Analytics workspace to new log search.

To view your diagnostic data in Azure Monitor logs, open the Log Search page from the left menu or the Management area of the page. Then enter your query into the Log search box.

Azure Log Analytics

Queries

Here are some additional queries that you can enter into the Log search box. These queries are written in Kusto Query Language.

  • To query all users who have accessed the Azure Databricks workspace and their location:

    DatabricksAccounts
    | where ActionName contains "login"
    | extend d=parse_json(Identity)
    | project UserEmail=d.email, SourceIPAddress
    
  • To check the Spark versions used:

    DatabricksClusters
    | where ActionName == "create"
    | extend d=parse_json(RequestParams)
    | extend SparkVersion= d.spark_version
    | summarize Count=count() by tostring(SparkVersion)
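If logs are archived to a storage account rather than Log Analytics, the same kind of filtering can be done locally. A Python sketch of the first query above, with the record layout assumed from the sample log output (where Identity may arrive as a JSON string):

```python
import json

def login_events(records):
    """Mimic the first Kusto query: users who logged in, with source IPs."""
    out = []
    for r in records:
        if "login" in r.get("ActionName", ""):
            identity = r["Identity"]
            # Identity may be serialized as a JSON string in exported records.
            if isinstance(identity, str):
                identity = json.loads(identity)
            out.append((identity["email"], r["SourceIPAddress"]))
    return out

# Two illustrative records shaped like the sample log output above.
records = [
    {"ActionName": "login", "Identity": "{\"email\": \"mail@contoso.com\"}",
     "SourceIPAddress": "131.0.0.0"},
    {"ActionName": "create", "Identity": "{\"email\": \"other@contoso.com\"}",
     "SourceIPAddress": "10.0.0.1"},
]
print(login_events(records))
```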