Manage & increase quotas for resources with Azure Machine Learning

APPLIES TO: Basic edition, Enterprise edition

In this article, you learn about the preconfigured limits on Azure resources for your Azure Machine Learning subscription and the quotas you can manage. These limits are in place to prevent budget overruns due to fraud and to honor Azure capacity constraints.

As with other Azure services, certain resources associated with Azure Machine Learning have limits. These limits range from a cap on the number of workspaces to limits on the underlying compute used for model training or inference/scoring.

As you design and scale your Azure Machine Learning resources for production workloads, consider these limits. For example, if your cluster doesn't reach the target number of nodes, you may have reached an Azure Machine Learning Compute cores limit for your subscription. If you want to raise a limit or quota above the default limit, open an online customer support request at no charge. Because of Azure capacity constraints, limits can't be raised above the maximum limit value shown in the following tables. If there is no maximum limit column, the resource doesn't have adjustable limits.
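
As a rough illustration of the cluster example above, the following sketch estimates how many nodes a cluster can reach under a cores quota. The function name, its parameters, and the sample numbers (a 24-core quota, 4-core nodes) are hypothetical and chosen only for illustration; the actual enforcement happens inside Azure.

```python
def reachable_nodes(target_nodes, cores_per_node, cores_quota, cores_in_use=0):
    """Return how many nodes fit in the remaining cores quota, capped at the target."""
    if cores_per_node <= 0:
        raise ValueError("cores_per_node must be positive")
    remaining = max(cores_quota - cores_in_use, 0)
    return min(target_nodes, remaining // cores_per_node)


# Hypothetical numbers: 24-core quota, 8 cores already in use, 4-core nodes.
print(reachable_nodes(target_nodes=10, cores_per_node=4, cores_quota=24, cores_in_use=8))  # 4
```

In this sketch a cluster that targets 10 nodes stalls at 4, which is the symptom described above.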

Special considerations

  • A quota is a credit limit, not a capacity guarantee. If you have large-scale capacity needs, contact Azure support. You can also increase your quotas.

  • Your quota is shared across all the services in your subscriptions, including Azure Machine Learning. The only exception is Azure Machine Learning compute, which has a quota separate from the core compute quota. Be sure to calculate quota usage across all services when you evaluate your capacity needs.

  • Default limits vary by offer category type (such as Free Trial or Pay-As-You-Go) and by VM series (such as Dv2, F, and G).

Default resource quotas

Here is a breakdown of the quota limits by resource type within your Azure subscription.

Important

Limits are subject to change. The latest limits can always be found in the service-level quota document for all of Azure.

Virtual machines

For each Azure subscription, there is a limit on the number of virtual machines you can have, whether they are used across your services or standalone. This limit applies at the region level, both to the total number of cores and on a per-family basis.

Virtual machine cores have a regional total limit and a regional per-size-series (Dv2, F, and so on) limit, and the two are enforced separately. For example, consider a subscription with a US East total VM core limit of 30, an A series core limit of 30, and a D series core limit of 30. This subscription can deploy 30 A1 VMs, or 30 D1 VMs, or a combination of the two that doesn't exceed a total of 30 cores (for example, 10 A1 VMs and 20 D1 VMs).
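
The two separately enforced limits in the example above can be sketched as a small check. This is a minimal illustration, not how Azure evaluates requests: the function name and data layout are hypothetical, and it takes from the example that A1 and D1 VMs use one core each, so core counts equal VM counts.

```python
def can_deploy(requested_cores, family_limits, total_limit):
    """Check a regional request against per-family limits and the regional total.

    requested_cores maps a VM family name to the cores requested for it.
    """
    for family, cores in requested_cores.items():
        if cores > family_limits.get(family, 0):
            return False  # the per-family limit is exceeded
    return sum(requested_cores.values()) <= total_limit  # the regional total limit


limits = {"A": 30, "D": 30}
print(can_deploy({"A": 10, "D": 20}, limits, total_limit=30))  # True
print(can_deploy({"A": 30, "D": 1}, limits, total_limit=30))   # False: 31 cores in total
```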

Resource | Limit
Subscriptions per Azure Active Directory tenant | Unlimited
Coadministrators per subscription | Unlimited
Resource groups per subscription | 980
Azure Resource Manager API request size | 4,194,304 bytes
Tags per subscription¹ | 50
Unique tag calculations per subscription¹ | 10,000
Subscription-level deployments per location | 800²

¹ You can apply up to 50 tags directly to a subscription. However, the subscription can contain an unlimited number of tags that are applied to resource groups and resources within the subscription. The number of tags per resource or resource group is limited to 50. Resource Manager returns a list of unique tag names and values in the subscription only when the number of tags is 10,000 or less. You can still find a resource by tag when the number exceeds 10,000.

² If you reach the limit of 800 deployments, delete deployments that are no longer needed from the history. To delete subscription-level deployments, use Remove-AzDeployment or az deployment sub delete.

Azure Machine Learning Compute

For Azure Machine Learning Compute, there is a default quota limit on both the number of cores and the number of unique compute resources allowed per region in a subscription. This quota is separate from the VM core quota above, and the core limits are not shared between the two resource types, because AmlCompute is a managed service that deploys resources in a hosted-on-behalf-of model.

Available resources:

  • Dedicated cores per region have a default limit of 24 to 300, depending on your subscription offer type, with higher defaults for EA and CSP offer types. The number of dedicated cores per subscription can be increased, and it is different for each VM family. Certain specialized VM families, such as the NCv2, NCv3, and ND series, start with a default of zero cores. To discuss increase options, contact Azure support by raising a quota request.

  • Low-priority cores per region have a default limit of 100 to 3,000, depending on your subscription offer type, with higher defaults for EA and CSP offer types. The number of low-priority cores per subscription can be increased, and it is a single value across VM families. Contact Azure support to discuss increase options.

  • Clusters per region have a default limit of 200. This limit is shared between training clusters and compute instances (a compute instance counts as a single-node cluster for quota purposes). Contact Azure support if you want to request an increase beyond this limit.

  • The following are other strict limits that cannot be exceeded.

Resource | Maximum limit
Maximum workspaces per resource group | 800
Maximum nodes in a single Azure Machine Learning Compute (AmlCompute) resource | 100 nodes
Maximum GPU MPI processes per node | 1-4
Maximum GPU workers per node | 1-4
Maximum job lifetime | 21 days¹
Maximum job lifetime on a low-priority node | 7 days²
Maximum parameter servers per node | 1

¹ The maximum lifetime refers to the time between when a run starts and when it finishes. Completed runs persist indefinitely; data for runs not completed within the maximum lifetime is not accessible.
² Jobs on a low-priority node can be preempted whenever there is a capacity constraint. We recommend that you implement checkpointing in your job.
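
The checkpointing recommendation above can be sketched with the standard library alone. This is one minimal approach under stated assumptions, not an Azure Machine Learning API: the function name, the JSON checkpoint layout, and the stop_after parameter (which only simulates a preemption) are all hypothetical.

```python
import json
import os
import tempfile


def run_job(total_steps, checkpoint_path, stop_after=None):
    """Process steps and persist progress so that a restarted job resumes.

    stop_after simulates a preemption after that many steps in this invocation.
    """
    state = {"next_step": 0, "running_sum": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # resume from the last saved step

    done_this_run = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return None  # simulated preemption: the job stops without a result
        state["running_sum"] += state["next_step"]  # stand-in for real work
        state["next_step"] += 1
        done_this_run += 1
        # Write to a temporary file and rename it, so an interruption
        # never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
    return state["running_sum"]


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "checkpoint.json")
    print(run_job(10, path, stop_after=4))  # None: preempted after 4 steps
    print(run_job(10, path))                # 45: resumes at step 4 and finishes
```

In a real job the checkpoint would need to live in storage that survives the node, since a preempted low-priority node does not keep its local disk for the restarted job.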

Azure Machine Learning Pipelines

For Azure Machine Learning Pipelines, there is a quota limit on the number of steps in a pipeline and on the number of schedule-based runs of published pipelines per region in a subscription.

  • The maximum number of steps allowed in a pipeline is 30,000.
  • The maximum sum of schedule-based runs and blob pulls for blob-triggered schedules of published pipelines per subscription per month is 100,000.
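
Because the monthly limit above is a sum, frequent polling by blob-triggered schedules can use it up quickly. The following back-of-the-envelope sketch assumes that each poll of a blob-triggered schedule counts as one blob pull and that a month has 30 days; the 5-minute polling interval and the function names are hypothetical.

```python
MAX_MONTHLY_RUNS_AND_BLOB_PULLS = 100_000  # limit stated above


def monthly_blob_pulls(polling_interval_minutes, days_in_month=30):
    """Pulls per month for one blob-triggered schedule polled at a fixed interval."""
    return (days_in_month * 24 * 60) // polling_interval_minutes


def within_monthly_quota(schedule_based_runs, blob_pulls):
    """True if the combined monthly count stays within the limit."""
    return schedule_based_runs + blob_pulls <= MAX_MONTHLY_RUNS_AND_BLOB_PULLS


pulls = monthly_blob_pulls(polling_interval_minutes=5)
print(pulls)  # 8640
print(within_monthly_quota(schedule_based_runs=1_000, blob_pulls=pulls * 11))  # True
print(within_monthly_quota(schedule_based_runs=1_000, blob_pulls=pulls * 12))  # False
```

Under these assumptions, eleven such schedules plus 1,000 scheduled runs fit within the limit, and a twelfth schedule exceeds it.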

Container instances

There is also a limit on the number of container instances that you can spin up in a given time period (scoped hourly) or across your entire subscription. For the limits, see Container Instances limits.

Storage

There is also a limit on the number of storage accounts per region in a given subscription. The default limit is 250 and includes both Standard and Premium Storage accounts. If you require more than 250 storage accounts in a given region, make a request through Azure Support. The Azure Storage team will review your business case and may approve up to 250 storage accounts for a given region.

Workspace level quota

To better manage resource allocations for the Azure Machine Learning Compute target (AmlCompute) across workspaces, a feature lets you distribute subscription level quotas (by VM family) and configure them at the workspace level. By default, every workspace has the same quota as the subscription level quota for any VM family. However, as the number of workspaces grows and workloads of varying priority start sharing the same resources, users need a way to share capacity better and avoid resource contention. Azure Machine Learning addresses this in its managed compute offering by letting users set a maximum quota for a particular VM family on each workspace. This is analogous to distributing your capacity between workspaces, and users can also choose to over-allocate to drive maximum utilization.

To set quotas at the workspace level, go to any workspace in your subscription and click Usage + quotas in the left pane. Then select the Configure quotas tab to view the quotas, expand any VM family, and set a quota limit on any workspace listed under that VM family. You cannot set a negative value or a value higher than the subscription level quota. By default, all workspaces are assigned the entire subscription quota to allow full utilization of the allocated quota.
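
The rules above (no negative values, no value higher than the subscription level quota, over-allocation allowed) can be sketched as a small validation helper. The function name and the workspace names are hypothetical, and this is only an illustration of the rules, not the portal's implementation.

```python
def is_over_allocated(subscription_quota, workspace_quotas):
    """Validate per-workspace core quotas for one VM family.

    Raises ValueError for a value that the rules above do not allow.
    Returns True if the workspace quotas add up to more than the
    subscription level quota, which is permitted (over-allocation).
    """
    for workspace, quota in workspace_quotas.items():
        if quota < 0:
            raise ValueError(f"{workspace}: quota cannot be negative")
        if quota > subscription_quota:
            raise ValueError(f"{workspace}: quota exceeds the subscription level quota")
    return sum(workspace_quotas.values()) > subscription_quota


# Hypothetical 100-core subscription quota for one VM family.
print(is_over_allocated(100, {"ws-dev": 40, "ws-prod": 80}))  # True: 120 cores assigned
print(is_over_allocated(100, {"ws-dev": 40, "ws-prod": 60}))  # False
```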

[Figure: Azure Machine Learning workspace level quota]

Note

This is an Enterprise edition feature only. If you have both Basic and Enterprise edition workspaces in your subscription, you can use this feature to set quotas only on your Enterprise workspaces. Your Basic workspaces continue to have the subscription level quota, which is the default behavior.

You need subscription level permissions to set quotas at the workspace level. This is enforced so that individual workspace owners cannot edit or increase their own quotas and encroach on resources set aside for another workspace. A subscription admin is therefore best suited to allocate and distribute these quotas across workspaces.

View your usage and quotas

The Azure Machine Learning Compute quota on your subscription is managed separately from other Azure resource quotas. To view this quota, you need to drill down into Machine Learning services.

  1. On the left pane, select Machine Learning service, and then select any workspace from the list shown.

  2. On the next blade, under the Support + troubleshooting section, select Usage + quotas to view your current quota limits and usage.

  3. Select a subscription to view the quota limits. Remember to filter to the region you are interested in.

  4. You can now toggle between a subscription level view and a workspace level view:

    • Subscription view: This view shows your usage of core quota by VM family. You can expand each family by workspace, and expand each workspace by its actual cluster names. It is best suited to quickly drilling into the core usage for a particular VM family, to see the breakdown by workspace and then by the underlying clusters of each workspace. The general convention in this view is (usage/quota), where usage is the current number of scaled-up cores and quota is the logical maximum number of cores that the resource can scale to. For each workspace, the quota is the workspace level quota (as explained above), which denotes the maximum number of cores that you can scale to for a particular VM family. Similarly, for a cluster, the quota is the number of cores corresponding to the maximum number of nodes that the cluster can scale to, as defined by the max_nodes property.

    • Workspace view: This view shows your usage of core quota by workspace. You can expand each workspace by VM family, and expand each family by its actual cluster names. It is best suited to quickly drilling into the core usage for a particular workspace, to see the breakdown by VM family and then by the underlying clusters of each family.
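
The (usage/quota) convention described in the subscription view can be sketched as follows. The cluster names, node counts, the 4-core node size, and the 48-core workspace quota are hypothetical; only the relationship between max_nodes and the cluster quota comes from the description above.

```python
def cluster_cores(current_nodes, max_nodes, cores_per_node):
    """Return (usage, quota) in cores for one cluster."""
    usage = current_nodes * cores_per_node  # cores currently scaled up
    quota = max_nodes * cores_per_node      # cores at the max_nodes setting
    return usage, quota


def format_usage(usage, quota):
    """Render a pair in the (usage/quota) convention."""
    return f"({usage}/{quota})"


# Hypothetical clusters of 4-core nodes in one workspace, for one VM family.
clusters = {
    "cpu-cluster": cluster_cores(current_nodes=2, max_nodes=4, cores_per_node=4),
    "batch-cluster": cluster_cores(current_nodes=0, max_nodes=10, cores_per_node=4),
}
workspace_quota = 48  # hypothetical workspace level quota for this VM family
workspace_usage = sum(usage for usage, _ in clusters.values())

print("workspace", format_usage(workspace_usage, workspace_quota))
for name, (usage, quota) in clusters.items():
    print(" ", name, format_usage(usage, quota))
```

In this sketch the cluster quotas (16 and 40 cores) add up to more than the workspace quota of 48, which illustrates that a cluster's quota is only the ceiling implied by its own max_nodes setting.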

Viewing your quota for various other Azure resources, such as virtual machines, storage, and network, is easy through the Azure portal.

  1. On the left pane, select All services, and then select Subscriptions under the General category.

  2. From the list of subscriptions, select the subscription whose quota you are looking for.

  3. Select Usage + quotas to view your current quota limits and usage. Use the filters to select the provider and locations.

Request quota increases

If you want to raise a limit or quota above the default limit, open an online customer support request at no charge.

Limits can't be raised above the maximum limit value shown in the tables. If there is no maximum limit, the resource doesn't have adjustable limits. See the step-by-step instructions on how to increase your quota.

When you request a quota increase, select the service whose quota you want to raise, such as Machine Learning service, Container Instances, or Storage. In addition, for Azure Machine Learning Compute, you can click the Request Quota button while viewing the quota by following the steps above.