Advanced entry script authoring

This article shows how to write entry scripts for specialized use cases.

Prerequisites

This article assumes you already have a trained machine learning model that you intend to deploy with Azure Machine Learning. To learn more about model deployment, see this tutorial.

Automatically generate a Swagger schema

To automatically generate a schema for your web service, provide a sample of the input and/or output in the constructor for one of the defined type objects. The type and sample are used to automatically create the schema. Azure Machine Learning then creates an OpenAPI (Swagger) specification for the web service during deployment.

Warning

You must not use sensitive or private data for sample input or output. The Swagger page for AML-hosted inferencing exposes the sample data.

These types are currently supported:

  • pandas
  • numpy
  • pyspark
  • Standard Python object

To use schema generation, include the open-source inference-schema package, version 1.1.0 or later, in your dependencies file. For more information on this package, see https://github.com/Azure/InferenceSchema. To generate a conforming Swagger document for automated web service consumption, the run() function in the scoring script must have the following API shape:

  • A first parameter of type "StandardPythonParameterType", named Inputs and nested.
  • An optional second parameter of type "StandardPythonParameterType", named GlobalParameters.
  • A return value that is a dictionary of type "StandardPythonParameterType", named Results and nested.

Define the input and output sample formats in the input_sample and output_sample variables, which represent the request and response formats for the web service. Use these samples in the input and output function decorators on the run() function. The following scikit-learn example uses schema generation.

Power BI compatible endpoint

The following example demonstrates how to define the API shape according to the preceding instructions. This method is supported for consuming the deployed web service from Power BI. (Learn more about how to consume the web service from Power BI.)

import json
import os
import pickle
import numpy as np
import pandas as pd
import azureml.train.automl
import joblib
from sklearn.linear_model import Ridge

from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.standard_py_parameter_type import StandardPythonParameterType
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType


def init():
    global model
    # Replace filename if needed.
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'sklearn_regression_model.pkl')
    # Deserialize the model file back into a sklearn model.
    model = joblib.load(model_path)


# Provide three sample inputs for schema generation.
numpy_sample_input = NumpyParameterType(np.array([[1,2,3,4,5,6,7,8,9,10],[10,9,8,7,6,5,4,3,2,1]],dtype='float64'))
pandas_sample_input = PandasParameterType(pd.DataFrame({'name': ['Sarah', 'John'], 'age': [25, 26]}))
standard_sample_input = StandardPythonParameterType(0.0)

# This is a nested input sample. Any item wrapped by `ParameterType` is described by the schema.
sample_input = StandardPythonParameterType({'input1': numpy_sample_input,
                                            'input2': pandas_sample_input,
                                            'input3': standard_sample_input})

sample_global_parameters = StandardPythonParameterType(1.0)  # this is optional
sample_output = StandardPythonParameterType([1.0, 1.0])
outputs = StandardPythonParameterType({'Results': sample_output})  # 'Results' is case sensitive


# 'Inputs' is case sensitive. The 'GlobalParameters' decorator is optional,
# and 'GlobalParameters' is also case sensitive.
@input_schema('Inputs', sample_input)
@input_schema('GlobalParameters', sample_global_parameters)
@output_schema(outputs)
def run(Inputs, GlobalParameters):
    # the parameters here have to match those in decorator, both 'Inputs' and 
    # 'GlobalParameters' here are case sensitive
    try:
        data = Inputs['input1']
        # data is converted to the target format (here, a NumPy array)
        assert isinstance(data, np.ndarray)
        result = model.predict(data)
        return result.tolist()
    except Exception as e:
        error = str(e)
        return error
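
To make the request shape concrete, here is a minimal sketch of a JSON body that a client might send to the endpoint defined above. The key names mirror the decorator names in the example. Serializing the pandas input as a list of records is an assumption about how PandasParameterType expects its data by default, so check it against the Swagger document generated for your service.

```python
import json

# Hypothetical request body for the run(Inputs, GlobalParameters) example above.
payload = {
    "Inputs": {
        # input1: 2-D numeric array matching the NumpyParameterType sample (2 x 10)
        "input1": [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]],
        # input2: DataFrame rows as records (assumed serialization format)
        "input2": [{"name": "Sarah", "age": 25}, {"name": "John", "age": 26}],
        # input3: plain Python value matching the StandardPythonParameterType sample
        "input3": 0.0,
    },
    # Optional second parameter
    "GlobalParameters": 1.0,
}

body = json.dumps(payload)
```

The resulting string can be posted to the scoring URI with a Content-Type of application/json.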

Binary (i.e. image) data

If your model accepts binary data, like an image, you must modify the score.py file used for your deployment to accept raw HTTP requests. To accept raw data, use the AMLRequest class in your entry script and add the @rawhttp decorator to the run() function.

Here's an example of a score.py that accepts binary data:

from azureml.contrib.services.aml_request import AMLRequest, rawhttp
from azureml.contrib.services.aml_response import AMLResponse
from PIL import Image
import json


def init():
    print("This is init()")


@rawhttp
def run(request):
    print("This is run()")
    
    if request.method == 'GET':
        # For this example, just return the URL for GETs.
        respBody = str.encode(request.full_path)
        return AMLResponse(respBody, 200)
    elif request.method == 'POST':
        file_bytes = request.files["image"]
        image = Image.open(file_bytes).convert('RGB')
        # For a real-world solution, you would preprocess the image
        # and send it to the model. Then return the response.

        # For demonstration purposes, this example just returns the size of the image as the response.
        return AMLResponse(json.dumps(image.size), 200)
    else:
        return AMLResponse("bad request", 500)

Important

The AMLRequest class is in the azureml.contrib namespace. Entities in this namespace change frequently as we work to improve the service. Anything in this namespace should be considered a preview that's not fully supported by Microsoft.

If you need to test this in your local development environment, you can install the components by using the following command:

pip install azureml-contrib-services

The AMLRequest class only gives you access to the raw posted data in score.py; there's no client-side component. From a client, you post data as usual. For example, the following Python code reads an image file and posts the data:

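import requests

# `service` is assumed to be the deployed web service object returned by the deployment.
scoring_uri = service.scoring_uri
image_path = 'test.jpg'
with open(image_path, 'rb') as image_file:
    files = {'image': image_file.read()}
response = requests.post(scoring_uri, files=files)

print(response.json())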

Cross-origin resource sharing (CORS)

Cross-origin resource sharing is a way to allow resources on a webpage to be requested from another domain. CORS works via HTTP headers sent with the client request and returned with the service response. For more information on CORS and valid headers, see Cross-origin resource sharing in Wikipedia.

To configure your model deployment to support CORS, use the AMLResponse class in your entry script. This class allows you to set the headers on the response object.

The following example sets the Access-Control-Allow-Origin header for the response from the entry script:

from azureml.contrib.services.aml_request import AMLRequest, rawhttp
from azureml.contrib.services.aml_response import AMLResponse

def init():
    print("This is init()")

@rawhttp
def run(request):
    print("This is run()")
    print("Request: [{0}]".format(request))
    if request.method == 'GET':
        # For this example, just return the URL for GETs.
        respBody = str.encode(request.full_path)
        return AMLResponse(respBody, 200)
    elif request.method == 'POST':
        reqBody = request.get_data(False)
        # For a real-world solution, you would load the data from reqBody
        # and send it to the model. Then return the response.

        # For demonstration purposes, this example
        # adds a header and returns the request body.
        resp = AMLResponse(reqBody, 200)
        resp.headers['Access-Control-Allow-Origin'] = "http://www.example.com"
        return resp
    else:
        return AMLResponse("bad request", 500)

Important

The AMLResponse class is in the azureml.contrib namespace. Entities in this namespace change frequently as we work to improve the service. Anything in this namespace should be considered a preview that's not fully supported by Microsoft.

If you need to test this in your local development environment, you can install the components by using the following command:

pip install azureml-contrib-services

Warning

Azure Machine Learning routes only POST and GET requests to the containers running the scoring service. This can cause errors because browsers use OPTIONS requests to pre-flight CORS requests.
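
The CORS example in this section hard-codes a single origin. If more than one origin should be allowed, one possible approach is to compute the header from an allow-list. The sketch below uses only plain Python; the names ALLOWED_ORIGINS and cors_headers are hypothetical helpers, not part of the Azure Machine Learning SDK.

```python
# Hypothetical allow-list of origins that may call the service from a browser.
ALLOWED_ORIGINS = {"http://www.example.com"}


def cors_headers(request_origin, allowed_origins=ALLOWED_ORIGINS):
    """Return CORS response headers for an allowed origin, or an empty dict."""
    if request_origin in allowed_origins:
        return {
            # Echo back the specific origin rather than using a wildcard.
            "Access-Control-Allow-Origin": request_origin,
            # Tell caches that the response varies by the Origin request header.
            "Vary": "Origin",
        }
    return {}
```

Inside run(), the returned dictionary could be copied onto the response one key at a time, in the same way the example above sets resp.headers['Access-Control-Allow-Origin']. Reading the caller's origin from request.headers is an assumption that the request object exposes headers as a mapping.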

Load registered models

There are two ways to locate models in your entry script:

  • AZUREML_MODEL_DIR: An environment variable containing the path to the model location.
  • Model.get_model_path: An API that returns the path to the model file using the registered model name.

AZUREML_MODEL_DIR

AZUREML_MODEL_DIR is an environment variable created during service deployment. You can use this environment variable to find the location of the deployed model(s).

The following table describes the value of AZUREML_MODEL_DIR depending on the number of models deployed:

Deployment | Environment variable value
Single model | The path to the folder containing the model.
Multiple models | The path to the folder containing all models. Models are located by name and version in this folder ($MODEL_NAME/$VERSION).

During model registration and deployment, models are placed in the AZUREML_MODEL_DIR path, and their original filenames are preserved.

To get the path to a model file in your entry script, combine the environment variable with the file path you're looking for.

Single model example

import os

# Example when the model is a file
model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'sklearn_regression_model.pkl')

# Example when the model is a folder containing a file
file_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'my_model_folder', 'sklearn_regression_model.pkl')

Multiple model example

In this scenario, two models are registered with the workspace:

  • my_first_model: Contains one file (my_first_model.pkl) and has only one version (1).
  • my_second_model: Contains one file (my_second_model.pkl) and has two versions, 1 and 2.

When the service is deployed, both models are provided in the deploy operation:

first_model = Model(ws, name="my_first_model", version=1)
second_model = Model(ws, name="my_second_model", version=2)
service = Model.deploy(ws, "myservice", [first_model, second_model], inference_config, deployment_config)

In the Docker image that hosts the service, the AZUREML_MODEL_DIR environment variable contains the directory where the models are located. In this directory, each model is located in a directory path of MODEL_NAME/VERSION, where MODEL_NAME is the name of the registered model and VERSION is the version of the model. The files that make up the registered model are stored in these directories.

In this example, the paths would be $AZUREML_MODEL_DIR/my_first_model/1/my_first_model.pkl and $AZUREML_MODEL_DIR/my_second_model/2/my_second_model.pkl.

# Example when the model is a file, and the deployment contains multiple models
first_model_name = 'my_first_model'
first_model_version = '1'
first_model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), first_model_name, first_model_version, 'my_first_model.pkl')
second_model_name = 'my_second_model'
second_model_version = '2'
second_model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), second_model_name, second_model_version, 'my_second_model.pkl')
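
When a deployment contains several models, the path construction above can be wrapped in a small helper. The following is a sketch only; model_file_path is a hypothetical name, not an SDK function. It fails with a clear message when AZUREML_MODEL_DIR is not set, because os.getenv() returns None in that case and os.path.join() would otherwise raise a less helpful TypeError.

```python
import os


def model_file_path(model_name, version, filename):
    """Build $AZUREML_MODEL_DIR/<model_name>/<version>/<filename>."""
    model_dir = os.getenv("AZUREML_MODEL_DIR")
    if model_dir is None:
        # The variable is created during service deployment, so it is
        # usually missing when the script runs anywhere else.
        raise RuntimeError("AZUREML_MODEL_DIR is not set")
    # Accept the version as an int or a string.
    return os.path.join(model_dir, model_name, str(version), filename)
```

With this helper, the two paths in the example become model_file_path('my_first_model', 1, 'my_first_model.pkl') and model_file_path('my_second_model', 2, 'my_second_model.pkl').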

get_model_path

When you register a model, you provide a model name that's used for managing the model in the registry. You use this name with the Model.get_model_path() method to retrieve the path of the model file or files on the local file system. If you register a folder or a collection of files, this API returns the path of the directory that contains those files.

When you register a model, you give it a name. The name corresponds to where the model is placed, either locally or during service deployment.
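
As a rough sketch of how the two lookup methods might be combined for a single-model deployment, the hypothetical helper below prefers AZUREML_MODEL_DIR when it's set and otherwise falls back to Model.get_model_path(). The function name locate_model and the fallback order are assumptions made for illustration, not a documented pattern. The azureml-core import is deferred so that the SDK is needed only when the fallback path is taken.

```python
import os


def locate_model(model_name, filename):
    """Return a path to the model, trying AZUREML_MODEL_DIR first."""
    model_dir = os.getenv("AZUREML_MODEL_DIR")
    if model_dir:
        # Single-model layout: the file sits directly under the model folder.
        return os.path.join(model_dir, filename)
    # Fallback: resolve the path from the registered model name.
    # Deferred import, so azureml-core is required only on this branch.
    from azureml.core.model import Model
    return Model.get_model_path(model_name)
```

Keep in mind that Model.get_model_path() returns a directory rather than a file when the registered model is a folder or a collection of files, so the caller may still need to join a filename onto the result.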

Framework-specific examples

More entry script examples for specific machine learning use cases can be found below:

Next steps