使用 ONNX 在 Azure SQL Edge 上部署 ML 模型Deploy ML model on Azure SQL Edge using ONNX

本 Azure SQL Edge 铁矿石杂质预测教程由三部分组成,在第三部分中,你将:In part three of this three-part tutorial for predicting iron ore impurities in Azure SQL Edge, you'll:

  1. 使用 Azure Data Studio 连接到 Azure SQL Edge 实例中的 SQL 数据库。Use Azure Data Studio to connect to SQL Database in the Azure SQL Edge instance.
  2. 在 Azure SQL Edge 中通过 ONNX 预测铁矿石杂质。Predict iron ore impurities with ONNX in Azure SQL Edge.

连接到 Azure SQl Edge 实例中的 SQL 数据库Connect to the SQL Database in the Azure SQl Edge instance

  1. 打开 Azure Data Studio。Open Azure Data Studio.

  2. 在“欢迎使用”选项卡中,使用以下详细信息启动一个新连接:In the Welcome tab, start a new connection with the following details:

    字段Field Value
    连接类型Connection type Microsoft SQL ServerMicrosoft SQL Server
    服务器Server 为此演示创建的 VM 中提及的公共 IP 地址Public IP address mentioned in the VM that was created for this demo
    用户名Username sasa
    密码Password 创建 Azure SQL Edge 实例时使用的强密码The strong password that was used while creating the Azure SQL Edge instance
    数据库Database 默认Default
    服务器组Server group 默认Default
    名称(可选)Name (optional) 提供可选名称Provide an optional name
  3. 单击“连接”Click Connect

  4. 在“文件”部分中,打开一个新笔记本或使用键盘快捷方式 Alt + Windows + N。In the File section, open a new notebook or use the keyboard shortcut Alt + Windows + N.

  5. 将内核设置为 Python 3。Set the kernel to Python 3.

通过 ONNX 预测铁矿石杂质Predict iron ore impurities with ONNX

在 Azure Data Studio 笔记本中输入以下 Python 代码,然后运行该代码。Enter the following Python code in the Azure Data Studio notebook and run it.

  1. 首先,安装并导入所需的包。First, install and import the required packages.

    !pip install azureml.core -q
    !pip install azureml.train.automl -q
    !pip install matplotlib -q
    !pip install pyodbc -q
    !pip install spicy -q
    
    import logging
    from matplotlib import pyplot as plt
    import numpy as np
    import pandas as pd
    import pyodbc
    
    from scipy import stats
    from scipy.stats import skew #for some statistics
    
    import azureml.core
    from azureml.core.experiment import Experiment
    from azureml.core.workspace import Workspace
    from azureml.train.automl import AutoMLConfig
    from azureml.train.automl import constants
    
  2. 为回归试验定义 Azure AutoML 工作区和 AutoML 试验配置。Define the Azure AutoML workspace and AutoML experiment configuration for the regression experiment.

    ws = Workspace(subscription_id="<Azure Subscription ID>",
                   resource_group="<resource group name>",
                   workspace_name="<ML workspace name>")
    # Choose a name for the experiment.
    experiment_name = 'silic_percent2-Forecasting-onnx'
    experiment = Experiment(ws, experiment_name)
    
  3. 将数据集导入 panda 帧。Import the dataset into a panda frame. 出于模型定型的目的,请使用来自 Kaggle 的定型数据集挖掘进程中的质量预测For the purpose of the model training, use the training data set Quality Prediction in a Mining Process from Kaggle. 下载数据文件,然后将其保存到本地开发计算机上。Download the data file and save it locally on your development machine. 你将使用此数据来预测精矿中的杂质含量。You'll use this data to predict how much impurity is in the ore concentrate.

    df = pd.read_csv("<local path where you have saved the data file>",decimal=",",parse_dates=["date"],infer_datetime_format=True)
    df = df.drop(['date'],axis=1)
    df.describe()
    
  4. 分析数据以确定任何偏度。Analyze the data to identify any skewness. 在此过程中,查看数据帧中每列的分布和倾斜信息。During this process, look at the distribution and the skew information for each of the columns in the data frame.

    ## We can use a histogram chart to view the data distribution for the Dataset. In this example, we are looking at the histogram for the "% Silica Concentrate" 
    ## and the "% Iron Feed". From the histogram, you'll notice the data distribution is skewed for most of the features in the dataset. 
    
    f, (ax1,ax2,ax3) = plt.subplots(1,3)
    ax1.hist(df['% Iron Feed'], bins='auto')
    #ax1.title = 'Iron Feed'
    ax2.hist(df['% Silica Concentrate'], bins='auto')
    #ax2.title = 'Silica Concentrate'
    ax3.hist(df['% Silica Feed'], bins='auto')
    #ax3.title = 'Silica Feed'
    
  5. 检查并修复数据的偏度级别。Check and fix the level of skewness in the data.

    ##Check data skewness with the skew or the kurtosis function in spicy.stats
    ##Skewness using the spicy.stats skew function
    for i in list(df):
            print('Skew value for column "{0}" is: {1}'.format(i,skew(df[i])))
    
    #Fix the Skew using Box Cox Transform
    from scipy.special import boxcox1p
    for i in list(df):
        if(abs(skew(df[i])) >= 0.20):
            #print('found skew in column - {0}'.format(i))
            df[i] = boxcox1p(df[i], 0.10)
            print('Skew value for column "{0}" is: {1}'.format(i,skew(df[i])))
    
  6. 检查其他功能与预测功能的关联性。Check the correlation of other features with the prediction feature. 如果关联不高,则删除这些功能。If the correlation is not high, remove those features.

    silic_corr = df.corr()['% Silica Concentrate']
    silic_corr = abs(silic_corr).sort_values()
    drop_index= silic_corr.index[:8].tolist()
    df = df.drop(drop_index, axis=1)
    df.describe()
    
  7. 启动 AzureML 试验,查找并训练最佳算法。Start the AzureML experiment to find and train the best algorithm. 在这种情况下,将使用所有回归算法进行测试,其主要指标为标准均方根误差 (NRMSE)。In this case, you're testing with all the regression algorithms, with a primary metric of Normalized Root Mean Squared Error (NRMSE). 有关详细信息,请参阅 Azure ML 试验主要指标For more information, see Azure ML Experiments Primary Metric. 以下代码将启动在本地运行 ML 试验。The following code will start a local run of the ML experiment.

    ## Define the X_train and the y_train data sets for the AutoML experiments. X_Train are the inputs or the features, while y_train is the outcome or the prediction result. 
    
    y_train = df['% Silica Concentrate']
    x_train = df.iloc[:,0:-1]
    automl_config = AutoMLConfig(task = 'regression',
                                 primary_metric = 'normalized_root_mean_squared_error',
                                 iteration_timeout_minutes = 60,
                                 iterations = 10,                        
                                 X = x_train, 
                                 y = y_train,
                                 featurization = 'off',
                                 enable_onnx_compatible_models=True)
    
    local_run = experiment.submit(automl_config, show_output = True)
    best_run, onnx_mdl = local_run.get_output(return_onnx_model=True)
    
  8. 在 Azure SQL Edge 数据库中加载用于本地评分的模型。Load the model in Azure SQL Edge database for local scoring.

    ## Load the Model into a SQL Database.
    ## Define the Connection string parameters. These connection strings will be used later also in the demo.
    server = '<SQL Server IP address>'
    username = 'sa' # SQL Server username
    password = '<SQL Server password>'
    database = 'IronOreSilicaPrediction'
    db_connection_string = "Driver={ODBC Driver 17 for SQL Server};Server=" + server + ";Database=" + database + ";UID=" + username + ";PWD=" + password + ";"
    conn = pyodbc.connect(db_connection_string, autocommit=True)
    cursor = conn.cursor()
    
    # Insert the ONNX model into the models table
    query = f"insert into models ([description], [data]) values ('Silica_Percentage_Predict_Regression_NRMSE_New1',?)"
    model_bits = onnx_mdl.SerializeToString()
    insert_params  = (pyodbc.Binary(model_bits))
    cursor.execute(query, insert_params)
    conn.commit()
    cursor.close()
    conn.close()
    
  9. 最后,使用 Azure SQL Edge 模型通过定型模型执行预测。Finally, use the Azure SQL Edge model to perform predictions using the trained model.

    ## Define the Connection string parameters. These connection strings will be used later also in the demo.
    server = '<SQL Server IP address>'
    username = 'sa' # SQL Server username
    password = '<SQL Server password>'
    database = 'IronOreSilicaPrediction'
    db_connection_string = "Driver={ODBC Driver 17 for SQL Server};Server=" + server + ";Database=" + database + ";UID=" + username + ";PWD=" + password + ";"
    conn = pyodbc.connect(db_connection_string, autocommit=True)
    #cursor = conn.cursor()
    query = \
            f'declare @model varbinary(max) = (Select [data] from [dbo].[Models] where [id] = 1);' \
            f' with d as ( SELECT  [timestamp] ,cast([cur_Iron_Feed] as real) [__Iron_Feed] ,cast([cur_Silica_Feed]  as real) [__Silica_Feed]' \
            f',cast([cur_Starch_Flow] as real) [Starch_Flow],cast([cur_Amina_Flow] as real) [Amina_Flow]' \
            f' ,cast([cur_Ore_Pulp_pH] as real) [Ore_Pulp_pH] ,cast([cur_Flotation_Column_01_Air_Flow] as real) [Flotation_Column_01_Air_Flow]' \
            f' ,cast([cur_Flotation_Column_02_Air_Flow] as real) [Flotation_Column_02_Air_Flow]' \
            f' ,cast([cur_Flotation_Column_03_Air_Flow] as real) [Flotation_Column_03_Air_Flow]' \
            f' ,cast([cur_Flotation_Column_07_Air_Flow] as real) [Flotation_Column_07_Air_Flow]' \
            f' ,cast([cur_Flotation_Column_04_Level] as real) [Flotation_Column_04_Level]' \
            f' ,cast([cur_Flotation_Column_05_Level] as real) [Flotation_Column_05_Level]' \
            f' ,cast([cur_Flotation_Column_06_Level] as real) [Flotation_Column_06_Level]' \
            f' ,cast([cur_Flotation_Column_07_Level] as real) [Flotation_Column_07_Level]' \
            f' ,cast([cur_Iron_Concentrate] as real) [__Iron_Concentrate]' \
            f' FROM [dbo].[IronOreMeasurements1]' \
            f' where timestamp between dateadd(hour,-1,getdate()) and getdate()) ' \
            f' SELECT d.*, p.variable_out1' \
            f' FROM PREDICT(MODEL = @model, DATA = d) WITH(variable_out1 numeric(25,17)) as p;' 
    
    df_result = pd.read_sql(query,conn)
    df_result.describe()
    
  10. 使用 Python,创建相对于铁料、日期时间和硅料的预测二氧化硅百分比图表。Using Python, create a chart of the predicted silica percentage against the iron feed, datetime, and silica feed.

    import plotly.graph_objects as go
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=df_result['timestamp'],y=df_result['__Iron_Feed'],mode='lines+markers',name='Iron Feed',line=dict(color='firebrick', width=2)))
    fig.add_trace(go.Scatter(x=df_result['timestamp'],y=df_result['__Silica_Feed'],mode='lines+markers',name='Silica Feed',line=dict(color='green', width=2)))
    fig.add_trace(go.Scatter(x=df_result['timestamp'],y=df_result['variable_out1'],mode='lines+markers',name='Silica Percent',line=dict(color='royalblue', width=3)))
    fig.update_layout(height= 600, width=1500,xaxis_title='Time')
    fig.show()
    

后续步骤Next steps

有关在 Azure SQL Edge 中使用 ONNX 模型的详细信息,请参阅在 SQL Edge 中使用 ONNX 实现机器学习和 AIFor more information on using ONNX models in Azure SQL Edge, see Machine learning and AI with ONNX in SQL Edge.