使用 ONNX 模型和 SQL 机器学习部署和预测Deploy and make predictions with an ONNX model and SQL machine learning

在本快速入门中,你将了解如何训练模型,将其转换为 ONNX 并部署到 Azure SQL Edge,然后使用上传的 ONNX 模型对数据运行本机 PREDICT。In this quickstart, you'll learn how to train a model, convert it to ONNX, deploy it to Azure SQL Edge, and then run native PREDICT on data using the uploaded ONNX model.

本快速入门基于 scikit-learn 并使用 Boston Housing 数据集This quickstart is based on scikit-learn and uses the Boston Housing dataset.

开始之前Before you begin

  • 如果使用的是 Azure SQL Edge,但尚未部署 Azure SQL Edge 模块,请按照使用 Azure 门户部署 SQL Edge 的步骤进行操作。If you're using Azure SQL Edge, and you haven't deployed an Azure SQL Edge module, follow the steps of deploy SQL Edge using the Azure portal.

  • 安装 Azure Data StudioInstall Azure Data Studio.

  • 安装本快速入门所需的 Python 包:Install Python packages needed for this quickstart:

    1. 打开连接到 Python 3 Kernel 的新笔记本Open New Notebook connected to the Python 3 Kernel.
    2. 单击“管理包”Click Manage Packages
    3. 在“已安装”选项卡中,在已安装包列表中查找以下 Python 包。In the Installed tab, look for the following Python packages in the list of installed packages. 如果未安装这些包中的任何一个,请选择“添加新包”选项卡,搜索该包,然后单击“安装” 。If any of these packages are not installed, select the Add New tab, search for the package, and click Install .
      • scikit-learnscikit-learn
      • numpynumpy
      • onnxmltoolsonnxmltools
      • onnxruntimeonnxruntime
      • pyodbcpyodbc
      • setuptoolssetuptools
      • skl2onnxskl2onnx
      • sqlalchemysqlalchemy
  • 对于下面的每个脚本部分,请将其输入到 Azure Data Studio 笔记本的单元格中,然后运行单元格。For each script part below, enter it in a cell in the Azure Data Studio notebook and run the cell.

训练管道Train a pipeline

拆分数据集,以使用功能来预测房子的中值。Split the dataset to use features to predict the median value of a house.

import numpy as np
import onnxmltools
import onnxruntime as rt
import pandas as pd
import skl2onnx
import sklearn
import sklearn.datasets

from sklearn.datasets import load_boston
boston = load_boston()
boston

df = pd.DataFrame(data=np.c_[boston['data'], boston['target']], columns=boston['feature_names'].tolist() + ['MEDV'])
 
target_column = 'MEDV'
 
# Split the data frame into features and target
x_train = pd.DataFrame(df.drop([target_column], axis = 1))
y_train = pd.DataFrame(df.iloc[:,df.columns.tolist().index(target_column)])

print("\n*** Training dataset x\n")
print(x_train.head())

print("\n*** Training dataset y\n")
print(y_train.head())

输出Output :

*** Training dataset x

        CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  \
0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900  1.0  296.0
1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0
2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0
3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0
4  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0

    PTRATIO       B  LSTAT  
0     15.3  396.90   4.98  
1     17.8  396.90   9.14  
2     17.8  392.83   4.03  
3     18.7  394.63   2.94  
4     18.7  396.90   5.33  

*** Training dataset y

0    24.0
1    21.6
2    34.7
3    33.4
4    36.2
Name: MEDV, dtype: float64

创建一个管道来训练 LinearRegression 模型。Create a pipeline to train the LinearRegression model. 还可以使用其他回归模型。You can also use other regression models.

from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

continuous_transformer = Pipeline(steps=[('scaler', RobustScaler())])

# All columns are numeric - normalize them
preprocessor = ColumnTransformer(
    transformers=[
        ('continuous', continuous_transformer, [i for i in range(len(x_train.columns))])])

model = Pipeline(
    steps=[
        ('preprocessor', preprocessor),
        ('regressor', LinearRegression())])

# Train the model
model.fit(x_train, y_train)

检查模型的准确性,然后计算 R2 评分和均方误差。Check the accuracy of the model and then calculate the R2 score and mean squared error.

# Score the model
from sklearn.metrics import r2_score, mean_squared_error
y_pred = model.predict(x_train)
sklearn_r2_score = r2_score(y_train, y_pred)
sklearn_mse = mean_squared_error(y_train, y_pred)
print('*** Scikit-learn r2 score: {}'.format(sklearn_r2_score))
print('*** Scikit-learn MSE: {}'.format(sklearn_mse))

输出Output :

*** Scikit-learn r2 score: 0.7406426641094094
*** Scikit-learn MSE: 21.894831181729206

将模型转换为 ONNXConvert the model to ONNX

将数据类型转换为支持的 SQL 数据类型。Convert the data types to the supported SQL data types. 还需要对其他数据帧进行此转换。This conversion will be required for other dataframes as well.

from skl2onnx.common.data_types import FloatTensorType, Int64TensorType, DoubleTensorType

def convert_dataframe_schema(df, drop=None, batch_axis=False):
    inputs = []
    nrows = None if batch_axis else 1
    for k, v in zip(df.columns, df.dtypes):
        if drop is not None and k in drop:
            continue
        if v == 'int64':
            t = Int64TensorType([nrows, 1])
        elif v == 'float32':
            t = FloatTensorType([nrows, 1])
        elif v == 'float64':
            t = DoubleTensorType([nrows, 1])
        else:
            raise Exception("Bad type")
        inputs.append((k, t))
    return inputs

使用 skl2onnx,将 LinearRegression 模型转换为 ONNX 格式,并将其保存在本地。Using skl2onnx, convert the LinearRegression model to the ONNX format and save it locally.

# Convert the scikit model to onnx format
onnx_model = skl2onnx.convert_sklearn(model, 'Boston Data', convert_dataframe_schema(x_train), final_types=[('variable1',FloatTensorType([1,1]))])
# Save the onnx model locally
onnx_model_path = 'boston1.model.onnx'
onnxmltools.utils.save_model(onnx_model, onnx_model_path)

测试 ONNX 模型Test the ONNX model

将模型转换为 ONNX 格式后,对模型进行评分,以显示性能几乎没有下降。After converting the model to ONNX format, score the model to show little to no degradation in performance.

备注

ONNX 运行时使用浮点数而不是双精度数,因此可能会出现小差异。ONNX Runtime uses floats instead of doubles so small discrepancies are possible.

import onnxruntime as rt
sess = rt.InferenceSession(onnx_model_path)

y_pred = np.full(shape=(len(x_train)), fill_value=np.nan)

for i in range(len(x_train)):
    inputs = {}
    for j in range(len(x_train.columns)):
        inputs[x_train.columns[j]] = np.full(shape=(1,1), fill_value=x_train.iloc[i,j])

    sess_pred = sess.run(None, inputs)
    y_pred[i] = sess_pred[0][0][0]

onnx_r2_score = r2_score(y_train, y_pred)
onnx_mse = mean_squared_error(y_train, y_pred)

print()
print('*** Onnx r2 score: {}'.format(onnx_r2_score))
print('*** Onnx MSE: {}\n'.format(onnx_mse))
print('R2 Scores are equal' if sklearn_r2_score == onnx_r2_score else 'Difference in R2 scores: {}'.format(abs(sklearn_r2_score - onnx_r2_score)))
print('MSE are equal' if sklearn_mse == onnx_mse else 'Difference in MSE scores: {}'.format(abs(sklearn_mse - onnx_mse)))
print()

输出Output :

*** Onnx r2 score: 0.7406426691136831
*** Onnx MSE: 21.894830759270633

R2 Scores are equal
MSE are equal

插入 ONNX 模型Insert the ONNX model

将 Azure SQL Edge 或 Azure SQL 托管实例中的模型存储在 onnx 数据库的 models 表中。Store the model in Azure SQL Edge or Azure SQL Managed Instance, in a models table in a database onnx. 在连接字符串中,指定“服务器地址”、“用户名”和“密码”。In the connection string, specify the server address , username , and password .

import pyodbc

server = '' # SQL Server IP address
username = '' # SQL Server username
password = '' # SQL Server password

# Connect to the master DB to create the new onnx database
connection_string = "Driver={ODBC Driver 17 for SQL Server};Server=" + server + ";Database=master;UID=" + username + ";PWD=" + password + ";"

conn = pyodbc.connect(connection_string, autocommit=True)
cursor = conn.cursor()

database = 'onnx'
query = 'DROP DATABASE IF EXISTS ' + database
cursor.execute(query)
conn.commit()

# Create onnx database
query = 'CREATE DATABASE ' + database
cursor.execute(query)
conn.commit()

# Connect to onnx database
db_connection_string = "Driver={ODBC Driver 17 for SQL Server};Server=" + server + ";Database=" + database + ";UID=" + username + ";PWD=" + password + ";"

conn = pyodbc.connect(db_connection_string, autocommit=True)
cursor = conn.cursor()

table_name = 'models'

# Drop the table if it exists
query = f'drop table if exists {table_name}'
cursor.execute(query)
conn.commit()

# Create the model table
query = f'create table {table_name} ( ' \
    f'[id] [int] IDENTITY(1,1) NOT NULL, ' \
    f'[data] [varbinary](max) NULL, ' \
    f'[description] varchar(1000))'
cursor.execute(query)
conn.commit()

# Insert the ONNX model into the models table
query = f"insert into {table_name} ([description], [data]) values ('Onnx Model',?)"

model_bits = onnx_model.SerializeToString()

insert_params  = (pyodbc.Binary(model_bits))
cursor.execute(query, insert_params)
conn.commit()

加载数据Load the data

将数据加载到 SQL 中。Load the data into SQL.

首先,创建两个表,“功能”和“目标”,以存储 Boston housing 数据集的子集。First, create two tables, features and target , to store subsets of the Boston housing dataset.

  • “功能”包含用于预测目标中值的所有数据。Features contains all data being used to predict the target, median value.
  • “目标”包含数据集中每个记录的中值。Target contains the median value for each record in the dataset.
import sqlalchemy
from sqlalchemy import create_engine
import urllib

db_connection_string = "Driver={ODBC Driver 17 for SQL Server};Server=" + server + ";Database=" + database + ";UID=" + username + ";PWD=" + password + ";"

conn = pyodbc.connect(db_connection_string)
cursor = conn.cursor()

features_table_name = 'features'

# Drop the table if it exists
query = f'drop table if exists {features_table_name}'
cursor.execute(query)
conn.commit()

# Create the features table
query = \
    f'create table {features_table_name} ( ' \
    f'    [CRIM] float, ' \
    f'    [ZN] float, ' \
    f'    [INDUS] float, ' \
    f'    [CHAS] float, ' \
    f'    [NOX] float, ' \
    f'    [RM] float, ' \
    f'    [AGE] float, ' \
    f'    [DIS] float, ' \
    f'    [RAD] float, ' \
    f'    [TAX] float, ' \
    f'    [PTRATIO] float, ' \
    f'    [B] float, ' \
    f'    [LSTAT] float, ' \
    f'    [id] int)'

cursor.execute(query)
conn.commit()

target_table_name = 'target'

# Create the target table
query = \
    f'create table {target_table_name} ( ' \
    f'    [MEDV] float, ' \
    f'    [id] int)'

x_train['id'] = range(1, len(x_train)+1)
y_train['id'] = range(1, len(y_train)+1)

print(x_train.head())
print(y_train.head())

最后,使用 sqlalchemyx_trainy_train pandas 数据帧分别插入表 featurestargetFinally, use sqlalchemy to insert the x_train and y_train pandas dataframes into the tables features and target, respectively.

db_connection_string = 'mssql+pyodbc://' + username + ':' + password + '@' + server + '/' + database + '?driver=ODBC+Driver+17+for+SQL+Server'
sql_engine = sqlalchemy.create_engine(db_connection_string)
x_train.to_sql(features_table_name, sql_engine, if_exists='append', index=False)
y_train.to_sql(target_table_name, sql_engine, if_exists='append', index=False)

现在可以查看数据库中的数据了。Now you can view the data in the database.

使用 ONNX 模型运行 PREDICTRun PREDICT using the ONNX model

通过 SQL 中的模型,使用上传的 ONNX 模型对数据运行本机 PREDICT。With the model in SQL, run native PREDICT on the data using the uploaded ONNX model.

备注

将笔记本内核更改为 SQL,以运行剩余单元格。Change the notebook kernel to SQL to run the remaining cell.

USE onnx

DECLARE @model VARBINARY(max) = (
        SELECT DATA
        FROM dbo.models
        WHERE id = 1
        );

WITH predict_input
AS (
    SELECT TOP (1000) [id]
        , CRIM
        , ZN
        , INDUS
        , CHAS
        , NOX
        , RM
        , AGE
        , DIS
        , RAD
        , TAX
        , PTRATIO
        , B
        , LSTAT
    FROM [dbo].[features]
    )
SELECT predict_input.id
    , p.variable1 AS MEDV
FROM PREDICT(MODEL = @model, DATA = predict_input, RUNTIME=ONNX) WITH (variable1 FLOAT) AS p;

后续步骤Next Steps