Get started with Batch SDK for Node.js

Learn the basics of building a Batch client in Node.js using the Azure Batch Node.js SDK. We take a step-by-step approach: first we walk through a scenario for a batch application, then we set it up using a Node.js client.

Prerequisites

This article assumes that you have a working knowledge of Node.js and familiarity with Linux. It also assumes that you have an Azure account set up with access rights to create Batch and Storage services.

We recommend reading the Azure Batch Technical Overview before you go through the steps outlined in this article.

The tutorial scenario

Let us understand the Batch workflow scenario. We have a simple script written in Python that downloads all CSV files from an Azure Blob storage container and converts them to JSON. To process multiple storage account containers in parallel, we can deploy the script as an Azure Batch job.

Azure Batch Architecture

The following diagram depicts how we can scale the Python script using Azure Batch and a Node.js client.

Diagram: Azure Batch scenario

The Node.js client deploys a Batch job with a preparation task (explained in detail later) and a set of tasks whose number depends on the number of containers in the storage account. You can download the scripts from the GitHub repository.

Tip

The Node.js client in the specified link does not contain specific code to be deployed as an Azure Function app. You can refer to the following links for instructions on creating one.

Build the application

Now, let us follow the process step by step to build the Node.js client:

Step 1: Install Azure Batch SDK

You can install the Azure Batch SDK for Node.js using the npm install command.

npm install azure-batch

This command installs the latest version of the azure-batch Node.js SDK.

Tip

In an Azure Function app, you can go to the Kudu console on the Azure Function's Settings tab to run npm install commands, in this case to install the Azure Batch SDK for Node.js.

Step 2: Create an Azure Batch account

You can create the account from the Azure portal or from the command line (PowerShell or the Azure CLI).

The following commands create one through the Azure CLI.

Create a resource group. Skip this step if you already have one in which you want to create the Batch account:

az group create -n "<resource-group-name>" -l "<location>"

Next, create an Azure Batch account.

az batch account create -l "<location>" -g "<resource-group-name>" -n "<batch-account-name>"

Each Batch account has its corresponding access keys. These keys are needed to create further resources in the Azure Batch account. A good practice for a production environment is to use Azure Key Vault to store these keys. You can then create a service principal for the application; using this service principal, the application can create an OAuth token to access the keys from the key vault. A sketch of this approach follows the key listing command below.

az batch account keys list -g "<resource-group-name>" -n "<batch-account-name>"

Copy and store the key to be used in the subsequent steps.
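The following is a minimal sketch of the Key Vault approach mentioned above. It assumes the account key has already been stored as a Key Vault secret and that the @azure/identity and @azure/keyvault-secrets packages are installed; the vault URL, secret name, and service principal values are placeholders.

// Hedged sketch: read the Batch account key from Key Vault with a service principal.
// Assumes the @azure/identity and @azure/keyvault-secrets packages, and that the
// key was previously stored as the secret 'batch-account-key'.
var identity = require('@azure/identity');
var keyvault = require('@azure/keyvault-secrets');

var credential = new identity.ClientSecretCredential('<tenant-id>', '<client-id>', '<client-secret>');
var secretClient = new keyvault.SecretClient('<key-vault-url>', credential);

secretClient.getSecret('batch-account-key').then(function (secret) {
    // secret.value now holds the Batch account key for use in the next step
    console.log('Retrieved Batch account key from Key Vault');
});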

Step 3: Create an Azure Batch service client

The following code snippet first imports the azure-batch Node.js module and then creates a Batch service client. You first need to create a SharedKeyCredentials object with the Batch account key copied in the previous step.

// Initializing Azure Batch variables
var batch = require('azure-batch');

var accountName = '<azure-batch-account-name>';
var accountKey = '<account-key-downloaded>';
var accountUrl = '<account-url>';

// Create Batch credentials object using account name and account key
var credentials = new batch.SharedKeyCredentials(accountName, accountKey);

// Create Batch service client
var batch_client = new batch.ServiceClient(credentials, accountUrl);

The Azure Batch URI can be found on the Overview tab of the Azure portal. It is of the format:

https://accountname.location.batch.chinacloudapi.cn

Refer to the screenshot:

Screenshot: Azure Batch URI

Step 4: Create an Azure Batch pool

An Azure Batch pool consists of multiple VMs (also known as Batch nodes). The Azure Batch service deploys tasks on these nodes and manages them. You can define the following configuration parameters for your pool.

  • Type of Virtual Machine image
  • Size of Virtual Machine nodes
  • Number of Virtual Machine nodes

Tip

The size and number of Virtual Machine nodes largely depend on the number of tasks you want to run in parallel and on the tasks themselves. We recommend testing to determine the ideal number and size.

The following code snippet creates the configuration parameter objects.

// Creating Image reference configuration for Ubuntu Linux VM
var imgRef = {publisher:"Canonical",offer:"UbuntuServer",sku:"14.04.2-LTS",version:"latest"}

// Creating the VM configuration object with the SKUID
var vmconfig = {imageReference:imgRef,nodeAgentSKUId:"batch.node.ubuntu 14.04"}

// Setting the VM size to Standard F4
var vmSize = "STANDARD_F4"

//Setting number of VMs in the pool to 4
var numVMs = 4

Tip

For the list of Linux VM images available for Azure Batch and their SKU IDs, see List of virtual machine images.

Once the pool configuration is defined, you can create the Azure Batch pool. The Batch pool command creates Azure Virtual Machine nodes and prepares them to receive tasks for execution. Each pool should have a unique ID for reference in subsequent steps.

The following code snippet creates an Azure Batch pool.

// Create a unique Azure Batch pool ID
var poolid = "processcsvpool";
var poolConfig = {id:poolid, displayName:poolid, vmSize:vmSize, virtualMachineConfiguration:vmconfig, targetDedicatedComputeNodes:numVMs, enableAutoScale:false};
// Creating the pool
var pool = batch_client.pool.add(poolConfig, function(error, result){
    if(error != null){console.log(error.response);}
});

You can check the status of the created pool and make sure that its state is "active" before going ahead with the submission of a job to that pool.

var cloudPool = batch_client.pool.get(poolid,function(error,result,request,response){
        if(error == null)
        {

            if(result.state == "active")
            {
                console.log("Pool is active");
            }
        }
        else
        {
            if(error.statusCode==404)
            {
                console.log("Pool not found yet returned 404...");    

            }
            else
            {
                console.log("Error occurred while retrieving pool data");
            }
        }
        });

The following is a sample result object returned by the pool.get function.

{ id: 'processcsv_201721152',
  displayName: 'processcsv_201721152',
  url: 'https://<batch-account-name>.chinaeast.batch.chinacloudapi.cn/pools/processcsv_201721152',
  eTag: '<eTag>',
  lastModified: 2017-03-27T10:28:02.398Z,
  creationTime: 2017-03-27T10:28:02.398Z,
  state: 'active',
  stateTransitionTime: 2017-03-27T10:28:02.398Z,
  allocationState: 'resizing',
  allocationStateTransitionTime: 2017-03-27T10:28:02.398Z,
  vmSize: 'standard_a1',
  virtualMachineConfiguration:
   { imageReference:
      { publisher: 'Canonical',
        offer: 'UbuntuServer',
        sku: '14.04.2-LTS',
        version: 'latest' },
     nodeAgentSKUId: 'batch.node.ubuntu 14.04' },
  resizeTimeout:
   { [Number: 900000]
     _milliseconds: 900000,
     _days: 0,
     _months: 0,
     _data:
      { milliseconds: 0,
        seconds: 0,
        minutes: 15,
        hours: 0,
        days: 0,
        months: 0,
        years: 0 },
     _locale:
      Locale {
        _calendar: [Object],
        _longDateFormat: [Object],
        _invalidDate: 'Invalid date',
        ordinal: [Function: ordinal],
        _ordinalParse: /\d{1,2}(th|st|nd|rd)/,
        _relativeTime: [Object],
        _months: [Object],
        _monthsShort: [Object],
        _week: [Object],
        _weekdays: [Object],
        _weekdaysMin: [Object],
        _weekdaysShort: [Object],
        _meridiemParse: /[ap]\.?m?\.?/i,
        _abbr: 'en',
        _config: [Object],
        _ordinalParseLenient: /\d{1,2}(th|st|nd|rd)|\d{1,2}/ } },
  currentDedicated: 0,
  targetDedicated: 4,
  enableAutoScale: false,
  enableInterNodeCommunication: false,
  taskSlotsPerNode: 1,
  taskSchedulingPolicy: { nodeFillType: 'Spread' } }

Step 5: Submit an Azure Batch job

An Azure Batch job is a logical group of similar tasks. In our scenario, it is "Process CSV to JSON." Each task here could be processing the CSV files present in one Azure Storage container.

These tasks run in parallel and are deployed across multiple nodes, orchestrated by the Azure Batch service.

Tip

You can use the taskSlotsPerNode property to specify the maximum number of tasks that can run concurrently on a single node, as in the sketch below.
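For example, a hedged sketch of the pool configuration from step 4 with four task slots per node could look like the following. Treat the property name as an assumption: which name applies depends on your azure-batch SDK version.

// Sketch: allow up to 4 concurrent task slots per node when adding the pool.
// The property name is assumed; older azure-batch versions call it maxTasksPerNode.
var poolConfig = {id:poolid, displayName:poolid, vmSize:vmSize,
    virtualMachineConfiguration:vmconfig, targetDedicatedComputeNodes:numVMs,
    taskSlotsPerNode:4, enableAutoScale:false};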

Preparation task

The VM nodes created are blank Ubuntu nodes. Often, you need to install a set of programs as prerequisites. Typically, for Linux nodes you can have a shell script that installs the prerequisites before the actual tasks run; however, it could be any programmable executable. The shell script in this example installs Python-pip and the Azure Storage SDK for Python.

You can upload the script to an Azure Storage account and generate a SAS URI to access it. This process can also be automated using the Azure Storage Node.js SDK, as in the sketch below.
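The following is a minimal sketch of that automation using the legacy azure-storage package. The storage account name and key, the container name batchscripts, and the local file name are illustrative assumptions.

// Hedged sketch: upload the preparation script and generate a read-only SAS URI
// with the legacy azure-storage package. Names below are placeholders.
var azureStorage = require('azure-storage');
var blobService = azureStorage.createBlobService('<storage-account-name>', '<storage-account-key>');

blobService.createContainerIfNotExists('batchscripts', function (error) {
    if (error) { return console.log(error); }
    blobService.createBlockBlobFromLocalFile('batchscripts', 'startup_prereq.sh', 'startup_prereq.sh', function (error) {
        if (error) { return console.log(error); }
        // SAS token with read permission, valid for one day
        var sasToken = blobService.generateSharedAccessSignature('batchscripts', 'startup_prereq.sh', {
            AccessPolicy: {
                Permissions: azureStorage.BlobUtilities.SharedAccessPermissions.READ,
                Expiry: new Date(Date.now() + 24 * 60 * 60 * 1000)
            }
        });
        // Full blob URI including the SAS token, usable as a blobSource value
        console.log(blobService.getUrl('batchscripts', 'startup_prereq.sh', sasToken));
    });
});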

Tip

A job preparation task runs only on the VM nodes where the specific task needs to run. If you want prerequisites to be installed on all nodes regardless of the tasks that run on them, you can use the startTask property when adding a pool (a sketch follows). You can use the preparation task definition shown later in this section for reference.
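As a hedged sketch of that alternative, the pool from step 4 could be created with a startTask whose fields are assumed to mirror the preparation task options described next:

// Sketch: install prerequisites on every node of the pool using startTask.
// Field names are assumed to match the preparation task options shown below.
var pool_start_task = {commandLine:"sudo sh startup_prereq.sh > startup.log",
    resourceFiles:[{'blobSource':'Blob SAS URI','filePath':'startup_prereq.sh'}],
    waitForSuccess:true, runElevated:true};
var poolConfigWithStartTask = {id:poolid, displayName:poolid, vmSize:vmSize,
    virtualMachineConfiguration:vmconfig, targetDedicatedComputeNodes:numVMs,
    enableAutoScale:false, startTask:pool_start_task};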

A preparation task is specified when the Azure Batch job is submitted. The following are the preparation task configuration parameters:

  • ID: A unique identifier for the preparation task
  • commandLine: Command line to execute the task executable
  • resourceFiles: Array of objects that provide details of the files that need to be downloaded for this task to run. The following are its options
    • blobSource: The SAS URI of the file
    • filePath: Local path to download and save the file
    • fileMode: Only applicable to Linux nodes; fileMode is in octal format with a default value of 0770
  • waitForSuccess: If set to true, the tasks do not run if the preparation task fails
  • runElevated: Set it to true if elevated privileges are needed to run the task.

The following code snippet shows a sample preparation task configuration:

var job_prep_task_config = {id:"installprereq",commandLine:"sudo sh startup_prereq.sh > startup.log",resourceFiles:[{'blobSource':'Blob SAS URI','filePath':'startup_prereq.sh'}],waitForSuccess:true,runElevated:true}

If there are no prerequisites to be installed for your tasks to run, you can skip the preparation task. The following code creates a job with the display name "process csv files."

// Setting up Batch pool configuration
var pool_config = {poolId:poolid}
// Setting up Job configuration along with preparation task
var jobId = "processcsvjob"
var job_config = {id:jobId,displayName:"process csv files",jobPreparationTask:job_prep_task_config,poolInfo:pool_config}
// Adding Azure batch job to the pool
var job = batch_client.job.add(job_config,function(error,result){
    if(error != null)
    {
        console.log("Error submitting job : " + error.response);
    }});

Step 6: Submit Azure Batch tasks for a job

Now that our process csv job is created, let us create the tasks for that job. Assuming we have four containers, we have to create four tasks, one for each container.

If we look at the Python script, it accepts two parameters:

  • container name: The Storage container to download files from
  • pattern: An optional file name pattern parameter (a hypothetical example follows this list)
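For instance, a task command line passing both parameters might look like the following; the --pattern flag name is an assumption about the script's interface, since only --container appears in the sample code.

// Hypothetical command line passing both parameters to the Python script.
// The --pattern flag name is assumed and not taken from the original sample.
var exampleCommandLine = 'python processcsv.py --container con1 --pattern "*.csv"';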

Assuming we have four containers, "con1", "con2", "con3", and "con4", the following code shows how to submit four tasks to the Azure Batch job "process csv" that we created earlier.

// storing container names in an array
var container_list = ["con1","con2","con3","con4"];
container_list.forEach(function(val,index){

    var container_name = val;
    var taskID = container_name + "_process";
    var task_config = {id:taskID,displayName:'process csv in ' + container_name,commandLine:'python processcsv.py --container ' + container_name,resourceFiles:[{'blobSource':'<blob SAS URI>','filePath':'processcsv.py'}]};
    // Tasks are added to the job, not directly to the pool
    var task = batch_client.task.add(jobId,task_config,function(error,result){
        if(error != null)
        {
            console.log(error.response);
        }
        else
        {
            console.log("Task for container : " + container_name + " submitted successfully");
        }
    });

});

The code adds multiple tasks to the job. Each task is executed on a node in the pool of VMs that was created. If the number of tasks exceeds the number of available task slots (the number of VMs in the pool multiplied by the taskSlotsPerNode property), the tasks wait until a node becomes available. This orchestration is handled by Azure Batch automatically.

The portal has detailed views of the task and job statuses. You can also use the list and get functions in the Azure Batch Node.js SDK, as in the sketch below. Details are provided in the documentation link.
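As a minimal sketch of client-side monitoring, assuming the batch_client and jobId variables defined in the earlier steps, the task.list function can be used to poll task states:

// Sketch: list the tasks of the job and print their current states.
// Assumes the batch_client and jobId variables from the previous steps.
batch_client.task.list(jobId, function(error, result){
    if(error != null)
    {
        console.log(error.response);
    }
    else
    {
        result.forEach(function(task){
            console.log("Task " + task.id + " is in state " + task.state);
        });
    }
});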

Next steps