Reference solution for distributed image model inference

This article and its accompanying notebooks describe a reference solution for distributed image model inference. The solution is based on a common setup shared by many real-world image applications. The setup assumes that many images are stored in an object store. It also assumes that you have several trained deep learning (DL) models for image classification and object detection, and that you want to apply them to the stored images. For example, you might use MobileNetV2 to detect people in user-uploaded photos to help protect privacy.

You might also re-train the models later and need to update previously computed predictions. Loading many images and applying DL models to them is both I/O-heavy and compute-heavy. Fortunately, the inference workload is embarrassingly parallel, because each image is scored independently, so in principle it can be distributed easily. This guide walks you through a practical solution with two major stages:

  1. ETL images into a Delta table. A dedicated ETL job simplifies both data management and the inference task.
  2. Perform distributed inference using pandas UDFs.

Notebooks

The following notebooks demonstrate the reference solution using the installed PyTorch and TensorFlow tf.keras libraries.

ETL image dataset into a Delta table notebook

Distributed inference via PyTorch and pandas UDF notebook

Distributed inference via Keras and pandas UDF notebook