Frequently asked questions about Azure Databricks

This article lists the top questions you might have related to Azure Databricks. It also lists some common problems you might encounter while using Databricks. For more information, see What is Azure Databricks.

Can I use Azure Key Vault to store keys/secrets to be used in Azure Databricks?

Yes. You can use Azure Key Vault to store keys/secrets for use with Azure Databricks. For more information, see Azure Key Vault-backed scopes.
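
As an illustration of how a Key Vault-backed scope is wired up, the sketch below builds the JSON body for the Databricks Secrets API call POST /api/2.0/secrets/scopes/create. It assumes the request shape of that API (scope_backend_type set to AZURE_KEYVAULT, plus the vault's ARM resource ID and DNS name); the scope and vault names are hypothetical, and you should check the field names against the current Secrets API reference. As far as we know, creating this kind of scope through the API requires an Azure AD token rather than a personal access token.

```python
import json


def keyvault_scope_payload(scope_name: str, vault_resource_id: str,
                           vault_dns_name: str) -> dict:
    """Build the JSON body for POST /api/2.0/secrets/scopes/create.

    Assumes the Databricks Secrets API shape for an Azure Key Vault-backed
    scope; verify field names against the current API reference.
    """
    return {
        "scope": scope_name,
        "scope_backend_type": "AZURE_KEYVAULT",
        "backend_azure_keyvault": {
            "resource_id": vault_resource_id,  # ARM resource ID of the vault
            "dns_name": vault_dns_name,        # the vault URI (DNS name)
        },
        # Grants MANAGE permission on the new scope to all workspace users.
        "initial_manage_principal": "users",
    }


if __name__ == "__main__":
    # Hypothetical scope and vault names, for illustration only.
    body = keyvault_scope_payload(
        "my-kv-scope",
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.KeyVault/vaults/my-vault",
        "https://my-vault.vault.azure.net/",
    )
    print(json.dumps(body, indent=2))
    # Once the scope exists, a notebook reads a secret with:
    #   dbutils.secrets.get(scope="my-kv-scope", key="<secret-name>")
```

The secret values themselves stay in Key Vault; the scope only points Databricks at the vault.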

Can I use Azure Virtual Networks with Azure Databricks?

Yes. You can use an Azure Virtual Network (VNet) with Azure Databricks. For more information, see Deploying Azure Databricks in your Azure Virtual Network.

How do I access Azure Data Lake Storage from a notebook?

Follow these steps:

  1. In Azure Active Directory (Azure AD), provision a service principal, and record its key.
  2. Assign the necessary permissions to the service principal in Data Lake Storage.
  3. To access a file in Data Lake Storage, use the service principal credentials in your notebook.

For more information, see Use Azure Data Lake Storage with Azure Databricks.
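
Step 3 can be sketched as a set of Spark configuration entries. This sketch assumes Azure Data Lake Storage Gen2 accessed through the ABFS driver with OAuth client credentials (Gen1 uses a different, fs.adl.* set of keys). The storage account, tenant ID, application ID, and secret scope names are hypothetical placeholders.

```python
def adls_gen2_oauth_conf(storage_account: str, tenant_id: str,
                         client_id: str, client_secret: str) -> dict:
    """Spark settings for service principal (OAuth) access to one ADLS Gen2 account.

    Assumes the ABFS driver's per-account OAuth configuration keys.
    """
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,  # application ID
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

# In a Databricks notebook (spark and dbutils are predefined there), for example:
#   secret = dbutils.secrets.get(scope="my-kv-scope", key="sp-secret")
#   for key, value in adls_gen2_oauth_conf("mystorageacct", "<tenant-id>",
#                                          "<application-id>", secret).items():
#       spark.conf.set(key, value)
#   df = spark.read.csv("abfss://<container>@mystorageacct.dfs.core.windows.net/<path>")
```

Reading the service principal's key from a secret scope, rather than pasting it into the notebook, keeps the credential out of notebook source and revision history.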

Fix common problems

Here are a few problems you might encounter with Databricks.

Issue: This subscription is not registered to use the namespace 'Microsoft.Databricks'

Error message

"This subscription is not registered to use the namespace 'Microsoft.Databricks'. See https://aka.ms/rps-not-found for how to register subscriptions. (Code: MissingSubscriptionRegistration)"

Solution

  1. Go to the Azure portal.
  2. Select Subscriptions, select the subscription you are using, and then select Resource providers.
  3. In the list of resource providers, find Microsoft.Databricks and select Register. You must have the Contributor or Owner role on the subscription to register the resource provider.
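
If you prefer to script the registration instead of using the portal, the Azure CLI command az provider register --namespace Microsoft.Databricks does the same thing. The sketch below shows the equivalent Azure Resource Manager REST call ("Providers - Register", sent as a POST). It assumes API version 2021-04-01 and that you already have an Azure AD access token for management.azure.com; the subscription ID is a placeholder.

```python
import json
import urllib.request

ARM_ENDPOINT = "https://management.azure.com"


def provider_register_url(subscription_id: str, namespace: str,
                          api_version: str = "2021-04-01") -> str:
    """URL of the ARM 'Providers - Register' operation (called with POST)."""
    return (f"{ARM_ENDPOINT}/subscriptions/{subscription_id}"
            f"/providers/{namespace}/register?api-version={api_version}")


def register_provider(subscription_id: str, namespace: str,
                      access_token: str) -> dict:
    """Send the register request. access_token must be an Azure AD token for ARM."""
    req = urllib.request.Request(
        provider_register_url(subscription_id, namespace),
        data=b"",  # empty body, so urllib sends Content-Length: 0
        method="POST",
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example (not executed here):
#   register_provider("<subscription-id>", "Microsoft.Databricks", "<token>")
```

Registration is asynchronous, so the provider may show Registering for a short time before it becomes Registered.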

Issue: Your account {email} does not have the Owner or Contributor role on the Databricks workspace resource in the Azure portal

Error message

"Your account {email} does not have Owner or Contributor role on the Databricks workspace resource in the Azure portal. This error can also occur if you are a guest user in the tenant. Ask your administrator to grant you access or add you as a user directly in the Databricks workspace."

Solution

The following are two possible solutions to this issue:

  • To initialize the tenant, you must be signed in as a regular user of the tenant, not as a guest user. You must also have the Contributor role on the Databricks workspace resource. You can grant a user access from the Access control (IAM) tab of your Databricks workspace in the Azure portal.

  • This error might also occur if your email domain name is assigned to multiple directories in Azure AD. To work around this issue, create a new user in the directory that contains the subscription with your Databricks workspace:

    a. In the Azure portal, go to Azure AD. Select Users and Groups > Add a user.

    b. Add a user with an @<tenant_name>.onmicrosoft.com email instead of an @<your_domain> email. You can find this option in Custom Domains, under Azure AD in the Azure portal.

    c. Grant this new user the Contributor role on the Databricks workspace resource.

    d. Sign in to the Azure portal with the new user, and find the Databricks workspace.

    e. Launch the Databricks workspace as this user.
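
Granting the Contributor role on the workspace resource can also be scripted, for example with az role assignment create --assignee <user> --role Contributor --scope <workspace resource ID>. The sketch below builds the equivalent Azure Resource Manager role assignment request (sent as a PUT). It assumes API version 2022-04-01 and the built-in Contributor role definition ID; the subscription, resource group, workspace name, and user object ID are placeholders you supply.

```python
import uuid

# Built-in Azure role definition ID for "Contributor".
CONTRIBUTOR_ROLE_ID = "b24988ac-6180-42a0-ab88-20f7382dd24c"


def contributor_assignment_request(subscription_id: str, resource_group: str,
                                   workspace_name: str, principal_id: str,
                                   api_version: str = "2022-04-01"):
    """Return (url, body) for a PUT that grants Contributor on a Databricks workspace."""
    scope = (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
             f"/providers/Microsoft.Databricks/workspaces/{workspace_name}")
    assignment_name = str(uuid.uuid4())  # role assignment names are GUIDs
    url = (f"https://management.azure.com{scope}"
           f"/providers/Microsoft.Authorization/roleAssignments/{assignment_name}"
           f"?api-version={api_version}")
    body = {
        "properties": {
            "roleDefinitionId": (
                f"/subscriptions/{subscription_id}/providers"
                f"/Microsoft.Authorization/roleDefinitions/{CONTRIBUTOR_ROLE_ID}"),
            "principalId": principal_id,  # Azure AD object ID of the (new) user
        }
    }
    return url, body
```

Scoping the assignment to the workspace resource, rather than the whole subscription, gives the user only the access this error message asks for.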

Issue: Your account {email} has not been registered in Databricks

Solution

If you did not create the workspace and need to be added as a user, contact the person who created the workspace. Have that person add you by using the Azure Databricks Admin Console. For instructions, see Adding and managing users. If you created the workspace and still get this error, try selecting Initialize Workspace again from the Azure portal.
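
As an alternative to the Admin Console, a workspace admin can add a user through the workspace SCIM API. The sketch below only builds the request; it assumes the SCIM endpoint path /api/2.0/preview/scim/v2/Users and the standard SCIM core User schema, and the workspace URL and email address are hypothetical. The admin would send it as a POST with their own token.

```python
SCIM_USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"


def scim_add_user_request(workspace_url: str, email: str):
    """Return (url, headers, body) for adding a user via the workspace SCIM API.

    Assumes the Databricks SCIM Users endpoint; the caller must be a workspace admin.
    """
    url = f"{workspace_url.rstrip('/')}/api/2.0/preview/scim/v2/Users"
    headers = {"Content-Type": "application/scim+json"}
    body = {"schemas": [SCIM_USER_SCHEMA], "userName": email}
    return url, headers, body

# Hypothetical example:
#   scim_add_user_request("https://adb-1234567890123456.7.azuredatabricks.net/",
#                         "someone@contoso.com")
```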

Issue: Cloud provider launch failure while setting up the cluster (PublicIPCountLimitReached)

Error message

"Cloud Provider Launch Failure: A cloud provider error was encountered while setting up the cluster. For more information, see the Databricks guide. Azure error code: PublicIPCountLimitReached. Azure error message: Cannot create more than 10 public IP addresses for this subscription in this region."

Background

Databricks clusters use one public IP address per node (including the driver node). Azure subscriptions have public IP address limits per region. Thus, cluster creation and scale-up operations may fail if they would cause the number of public IP addresses allocated to that subscription in that region to exceed the limit. This limit also includes public IP addresses allocated for non-Databricks usage, such as custom user-defined VMs.

In general, clusters only consume public IP addresses while they are active. However, PublicIPCountLimitReached errors may continue to occur for a short period of time even after other clusters are terminated. This is because Databricks temporarily caches Azure resources when a cluster is terminated. Resource caching is by design, since it significantly reduces the latency of cluster startup and autoscaling in many common scenarios.
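
The accounting described above can be written as a quick check: a cluster needs one address per worker plus one for the driver, and it fits only if that, added to every public IP already allocated in the subscription and region, stays within the regional limit. This is a back-of-the-envelope sketch; the function names are hypothetical, and how many addresses are in use is something you look up yourself.

```python
def public_ips_for_cluster(num_workers: int) -> int:
    """One public IP per node: every worker plus the driver."""
    return num_workers + 1


def launch_fits(ips_in_use: int, regional_limit: int, num_workers: int) -> bool:
    """True if the cluster's addresses fit under the regional limit.

    ips_in_use should count every public IP in the subscription and region,
    including non-Databricks VMs and recently terminated clusters whose
    resources Databricks may still be caching.
    """
    return ips_in_use + public_ips_for_cluster(num_workers) <= regional_limit


print(launch_fits(3, 10, 8))  # False: 3 + 9 = 12 > 10
print(launch_fits(1, 10, 8))  # True:  1 + 9 = 10
```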

Solution

If your subscription has already reached its public IP address limit for a given region, do one of the following:

  • Create new clusters in a different Databricks workspace. The other workspace must be located in a region in which you have not reached your subscription's public IP address limit.
  • Request an increase to your public IP address limit. Choose Quota as the Issue Type, and Networking: ARM as the Quota Type. In Details, request a Public IP Address quota increase. For example, if your limit is currently 60, all 60 addresses are in use, and you want to create a 100-node cluster, request a limit increase to 160.
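
The quota arithmetic in the example can be expressed as a one-line rule: the new limit must cover the addresses already in use plus one per node of the new cluster (driver included), and never drop below the current limit. A minimal sketch with a hypothetical function name:

```python
def limit_to_request(current_limit: int, ips_in_use: int, new_nodes: int) -> int:
    """Smallest regional limit that leaves room for new_nodes more public IPs.

    new_nodes counts every node of the new cluster, including the driver.
    """
    return max(current_limit, ips_in_use + new_nodes)


print(limit_to_request(60, 60, 100))  # 160
print(limit_to_request(60, 10, 20))   # 60: no increase needed
```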

Issue: A second type of cloud provider launch failure while setting up the cluster (MissingSubscriptionRegistration)

Error message

"Cloud Provider Launch Failure: A cloud provider error was encountered while setting up the cluster. For more information, see the Databricks guide. Azure error code: MissingSubscriptionRegistration. Azure error message: The subscription is not registered to use namespace 'Microsoft.Compute'. See https://aka.ms/rps-not-found for how to register subscriptions."

Solution

  1. Go to the Azure portal.
  2. Select Subscriptions, select the subscription you are using, and then select Resource providers.
  3. In the list of resource providers, find Microsoft.Compute and select Register. You must have the Contributor or Owner role on the subscription to register the resource provider.

For more detailed instructions, see Resource providers and types.
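
Because both MissingSubscriptionRegistration errors in this article come from unregistered resource providers, it can help to check their state up front. The sketch below inspects the parsed JSON returned by the ARM "Providers - List" call (GET https://management.azure.com/subscriptions/{subscription-id}/providers?api-version=2021-04-01) and reports which required namespaces are not yet Registered. The required list holds only the two namespaces named in this article; the sample response is made up for illustration, and a real response may be paged through nextLink.

```python
# Namespaces named in this article; extend as needed.
REQUIRED_NAMESPACES = ("Microsoft.Databricks", "Microsoft.Compute")


def unregistered_providers(providers_response: dict,
                           required=REQUIRED_NAMESPACES) -> list:
    """Return required namespaces whose registrationState is not 'Registered'.

    Namespace comparison is case-insensitive, as ARM namespaces are.
    """
    states = {
        p["namespace"].lower(): p.get("registrationState")
        for p in providers_response.get("value", [])
    }
    return [ns for ns in required if states.get(ns.lower()) != "Registered"]


sample = {"value": [
    {"namespace": "Microsoft.Databricks", "registrationState": "Registered"},
    {"namespace": "Microsoft.Compute", "registrationState": "NotRegistered"},
]}
print(unregistered_providers(sample))  # ['Microsoft.Compute']
```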

Issue: Azure Databricks needs permissions to access resources in your organization that only an admin can grant.

Background

Azure Databricks is integrated with Azure Active Directory. You can set permissions within Azure Databricks (for example, on notebooks or clusters) by specifying users from Azure AD. For Azure Databricks to list the names of the users from your Azure AD, it needs read permission to that information, and consent must be given. If consent has not already been granted, you see this error.

Solution

Sign in to the Azure portal as a global administrator. In Azure Active Directory, go to the User settings tab and make sure Users can consent to apps accessing company data on their behalf is set to Yes.

Next steps