Debug Kusto query language inline Python using VS Code

Azure Data Explorer supports running Python code embedded in Kusto query language using the python() plugin. The plugin runtime is hosted in a sandbox, an isolated and secure Python environment. The python() plugin extends the native functionality of Kusto query language with the large archive of OSS Python packages. This extension lets you run advanced algorithms, such as machine learning, artificial intelligence, statistical analysis, and time series analysis, as part of the query.

Kusto query language tools aren't convenient for developing and debugging Python algorithms. Therefore, develop the algorithm in your preferred Python integrated development environment, such as Jupyter, PyCharm, Visual Studio, or VS Code. When the algorithm is complete, copy and paste it into KQL. To improve and streamline this workflow, Azure Data Explorer supports integration between the Kusto Explorer or Web UI clients and VS Code for authoring and debugging KQL inline Python code.

Note

This workflow can only be used to debug relatively small input tables (up to a few MB). Therefore, you may need to limit the input for debugging. If you need to process a large table, limit it for debugging using | take, | sample, or where rand() < 0.x.

Prerequisites

  1. Install the Python Anaconda Distribution. In Advanced Options, select Add Anaconda to my PATH environment variable.
  2. Install Visual Studio Code.
  3. Install the Python extension for Visual Studio Code.

Run your query in your client application

  1. In your client application, prefix a query containing inline Python with set query_python_debug;

  2. Run the query.

    • Kusto Explorer: VS Code is automatically launched with the debug_python.py script.
    • Kusto Web UI:
      1. Download and save debug_python.py, df.txt, and kargs.txt. In the window that appears, select Allow. Save the files in the selected directory.

        Screenshot: Web UI downloads the inline Python files

      2. Right-click debug_python.py and open it with VS Code. The debug_python.py script contains the inline Python code from the KQL query, prefixed by template code that initializes the input dataframe from df.txt and the dictionary of parameters from kargs.txt.

  3. In VS Code, launch the debugger: Debug > Start Debugging (F5), then select the Python configuration. The debugger launches and automatically sets a breakpoint so that you can debug the inline code.

How does inline Python debugging in VS Code work?

  1. The query is parsed and executed in the server until the required | evaluate python() clause is reached.
  2. The Python sandbox is invoked, but instead of running the code, it serializes the input table, the dictionary of parameters, and the code, and sends them back to the client.
  3. These three objects are saved in three files: df.txt, kargs.txt, and debug_python.py, in the selected directory (Web UI) or in the client %TEMP% directory (Kusto Explorer).
  4. VS Code is launched, preloaded with the debug_python.py file. The file contains prefix code that initializes df and kargs from their respective files, followed by the Python script embedded in the KQL query.
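
The round trip in the steps above can be illustrated with a small local sketch. The on-disk formats are assumptions made for illustration only: the actual layout of df.txt and kargs.txt is not described here, so CSV and JSON are used as stand-ins, and the helper names save_debug_files and load_debug_inputs are hypothetical rather than part of any Kusto tool.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


# ASSUMPTION: CSV for df.txt and JSON for kargs.txt are illustrative choices,
# not the documented formats used by Kusto Explorer or the Web UI.
def save_debug_files(directory, df, kargs, code):
    """Write the three debug artifacts, mimicking what the client receives."""
    directory = Path(directory)
    df.to_csv(directory / "df.txt", index=False)
    (directory / "kargs.txt").write_text(json.dumps(kargs), encoding="utf-8")
    (directory / "debug_python.py").write_text(code, encoding="utf-8")


def load_debug_inputs(directory):
    """Play the role of the template prefix: rebuild df and kargs from disk."""
    directory = Path(directory)
    df = pd.read_csv(directory / "df.txt")
    kargs = json.loads((directory / "kargs.txt").read_text(encoding="utf-8"))
    return df, kargs


# Inline Python taken from the query example in this article.
inline_code = 'exp = kargs["exp"]\nresult = df\nresult["x4"] = df["x"].pow(exp)\n'

with tempfile.TemporaryDirectory() as tmp:
    save_debug_files(tmp, pd.DataFrame({"x": [1, 2, 3, 4]}), {"exp": 4}, inline_code)
    df, kargs = load_debug_inputs(tmp)
    saved_files = sorted(p.name for p in Path(tmp).iterdir())

print(saved_files)
print(df)
print(kargs)
```

The point of the sketch is the separation of concerns: the inputs travel as data files, and the script only needs a short prefix to rebuild df and kargs before the unchanged inline code runs under the debugger.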

Query example

  1. Run the following KQL query in your client application:

    range x from 1 to 4 step 1
    | evaluate python(typeof(*, x4:int), 
    'exp = kargs["exp"]\n'
    'result = df\n'
    'result["x4"] = df["x"].pow(exp)\n'
    , pack('exp', 4))
    

    See the resulting table:

    x    x4
    1    1
    2    16
    3    81
    4    256
  2. Run the same KQL query in your client application, prefixed with set query_python_debug;:

    set query_python_debug;
    range x from 1 to 4 step 1
    | evaluate python(typeof(*, x4:int), 
    'exp = kargs["exp"]\n'
    'result = df\n'
    'result["x4"] = df["x"].pow(exp)\n'
    , pack('exp', 4))
    
  3. VS Code is launched:

    Screenshot: VS Code launched

  4. VS Code debugs the code and prints the 'result' dataframe in the debug console:

    Screenshot: VS Code debugging
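
For reference, the debugging session in this example amounts to running the inline code against a small dataframe. The following is a minimal sketch that builds df and kargs by hand instead of loading them from df.txt and kargs.txt, which is an assumption made to keep the example self-contained.

```python
import pandas as pd

# Stand-ins for what the generated prefix would load from df.txt and kargs.txt.
df = pd.DataFrame({"x": range(1, 5)})  # mirrors: range x from 1 to 4 step 1
kargs = {"exp": 4}                     # mirrors: pack('exp', 4)

# Inline Python from the query, unchanged.
exp = kargs["exp"]
result = df
result["x4"] = df["x"].pow(exp)

print(result)
```

Note that result = df binds a second name to the same dataframe rather than copying it, so the new x4 column also appears on df. That is harmless here, but it is worth knowing when you inspect variables in the debugger.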

Note

There may be differences between the Python sandbox image and your local installation. Check the sandbox image for specific packages by querying the plugin.
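
To make such differences visible on the local side, one option is to list the locally installed versions of the packages your inline code imports and compare them with what the sandbox reports. This is a sketch only: the function name local_package_versions and the package list are illustrative, and it covers the local environment, not the sandbox query itself.

```python
from importlib import metadata


def local_package_versions(names):
    """Return {package: version or None} for the local Python environment."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed locally
    return versions


# Package names are examples only; list the ones your inline code imports.
report = local_package_versions(["pandas", "numpy", "scipy"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```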