Copy applications and data to pool nodes

Azure Batch supports several ways to get data and applications onto compute nodes so that they are available to tasks. Some data and applications are required to run the entire job and so need to be installed on every node. Others are required only for a specific task, or need to be installed for the job but not on every node. Batch has tools for each of these scenarios.

  • Pool start task resource files: For applications or data that need to be installed on every node in the pool. Use this method along with either an application package or the start task's resource file collection to run an install command.

    Examples:

      • Use the start task command line to move or install applications.

      • Specify a list of specific files or containers in an Azure storage account. For more information, see add#resourcefile in the REST documentation.

      • Every job that runs on the pool runs MyApplication.exe, which must first be installed with MyApplication.msi. If you use this mechanism, set the start task's wait for success property to true so that tasks are not scheduled on a node until the start task has completed successfully. For more information, see add#starttask in the REST documentation.

  • Application package references on the pool: For applications or data that need to be installed on every node in the pool. An application package has no install command of its own, but you can use a start task to run any install command. Use this method if your application doesn't require installation or consists of a large number of files. Application packages are well suited to large numbers of files because they combine many file references into a small payload. If you try to include more than 100 separate resource files in one task, the Batch service might run into internal system limitations for a single task. Also use application packages if you have rigorous versioning requirements, where you might have many different versions of the same application and need to choose between them. For more information, read Deploy applications to compute nodes with Batch application packages.

  • Job preparation task resource files: For applications or data that must be installed for the job to run, but don't need to be installed on the entire pool. For example, if your pool runs many different types of jobs and only one job type needs MyApplication.msi, it makes sense to put the installation step into a job preparation task. For more information about job preparation tasks, see Run job preparation and job release tasks on Batch compute nodes.

  • Task resource files: For when an application or data is relevant only to an individual task. For example, you have five tasks, each of which processes a different file and then writes its output to blob storage. In this case, specify the input file in each task's resource files collection, because each task has its own input file.
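The options above can be sketched as request-body fragments. This is a minimal illustration, not a complete pool, job, or task definition: the field names (startTask, resourceFiles, waitForSuccess, applicationPackageReferences, jobPreparationTask) follow the Batch REST API as I understand it, while the storage URL, container names, application ID, and file names are hypothetical placeholders.

```python
import json

# Hypothetical storage account URL; a real one usually needs a SAS token.
STORAGE = "https://mystorageaccount.blob.core.windows.net"


def resource_file(container: str, blob: str) -> dict:
    """Build one resource file entry that downloads `blob` to the same relative path."""
    return {"httpUrl": f"{STORAGE}/{container}/{blob}", "filePath": blob}


def install_step() -> dict:
    """Install command plus the installer it needs, usable at pool or job scope."""
    return {
        "commandLine": "cmd /c msiexec /i MyApplication.msi /quiet",
        "resourceFiles": [resource_file("installers", "MyApplication.msi")],
        # Do not schedule tasks until the install has succeeded.
        "waitForSuccess": True,
    }


# Pool scope: every node runs the install step when it joins the pool.
pool_fragment = {
    "startTask": install_step(),
    # Hypothetical application package, selected by ID and version.
    "applicationPackageReferences": [{"applicationId": "mytools", "version": "1.0"}],
}

# Job scope (an alternative to the start task): only nodes that run
# this job execute the install step.
job_fragment = {"jobPreparationTask": install_step()}

# Task scope: five tasks, each with its own input file.
tasks = [
    {
        "id": f"task-{i}",
        "commandLine": f"MyApplication.exe input{i}.dat",
        "resourceFiles": [resource_file("inputs", f"input{i}.dat")],
    }
    for i in range(1, 6)
]

if __name__ == "__main__":
    print(json.dumps(tasks[0], indent=2))
```

The same install step is reused for the pool and job fragments only to show that the two scopes take the same shape; in practice you would pick one of them for a given application.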

Determine the scope required of a file

You need to determine the scope of a file: is it required for a pool, a job, or a task? Files scoped to the pool should use pool application packages or a start task. Files scoped to the job should use a job preparation task. Applications are a good example of files scoped at the pool or job level. Files scoped to the task should use task resource files.
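The decision above can be written down as a small lookup. This is only an illustrative sketch; the function name and its two boolean parameters are hypothetical and simply encode the rules stated in this section.

```python
def mechanism_for(needed_on_every_node: bool, needed_by_whole_job: bool) -> str:
    """Map a file's scope to the Batch mechanism this section recommends."""
    if needed_on_every_node:
        # Pool scope
        return "pool application packages or a start task"
    if needed_by_whole_job:
        # Job scope
        return "job preparation task"
    # Task scope
    return "task resource files"


if __name__ == "__main__":
    # An installer used by one job type only: job scope.
    print(mechanism_for(needed_on_every_node=False, needed_by_whole_job=True))
```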

Other ways to get data onto Batch compute nodes

There are other ways to get data onto Batch compute nodes that are not officially integrated into the Batch REST API. Because you control the Azure Batch nodes and can run custom executables on them, you can pull data from any number of custom sources, as long as the Batch node has connectivity to the source and the credentials for that source are available on the node. A few common examples are:

  • Downloading data from SQL
  • Downloading data from other web services or custom locations
  • Mapping a network share
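As one illustration of the second example, a task could run a small script that downloads a file from a web service before the real work starts. This is a sketch using only the Python standard library; the environment variable name DATA_SOURCE_TOKEN and the bearer-token scheme are assumptions made for the example, not something Batch provides.

```python
import os
import shutil
import urllib.request
from pathlib import Path


def fetch(url: str, dest: Path, token_env: str = "DATA_SOURCE_TOKEN") -> Path:
    """Download `url` to `dest`, sending a bearer token if one is configured."""
    headers = {}
    token = os.environ.get(token_env)  # hypothetical credential location
    if token:
        headers["Authorization"] = f"Bearer {token}"

    request = urllib.request.Request(url, headers=headers)
    dest.parent.mkdir(parents=True, exist_ok=True)

    # Stream the response to disk instead of holding it all in memory.
    with urllib.request.urlopen(request, timeout=60) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

On a compute node, a natural destination is the task's working directory, for example `Path(os.environ.get("AZ_BATCH_TASK_WORKING_DIR", ".")) / "input.dat"`. How the credential reaches the node is left open here.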

Azure storage

Blob storage has download scalability targets. Azure storage file share scalability targets are the same as for a single blob. The size of the data to download affects the number of nodes and pools you need.
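To see why size and node count interact, consider a deliberately simplified model in which every node downloads the same blob at the same time and the blob's throughput target is shared evenly between them. The function below and all numbers passed to it are hypothetical placeholders, not actual Azure limits; check the current scalability targets before sizing a real pool.

```python
def staging_time_seconds(
    blob_size_mib: float,
    node_count: int,
    blob_target_mib_s: float,
    node_limit_mib_s: float,
) -> float:
    """Estimate how long each node needs to download one shared blob.

    Simplified model: the blob's throughput target is split evenly
    across nodes, and no node can exceed its own network limit.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    per_node_rate = min(node_limit_mib_s, blob_target_mib_s / node_count)
    return blob_size_mib / per_node_rate


if __name__ == "__main__":
    # Placeholder figures only: a 1000 MiB blob, a 500 MiB/s blob target,
    # and a 100 MiB/s per-node limit.
    for nodes in (2, 10, 50):
        print(nodes, staging_time_seconds(1000, nodes, 500, 100))
```

Under this model, adding nodes beyond the point where the blob target is saturated lengthens staging time for every node, which is one reason to spread large inputs across several blobs or pools.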