Featurization for transfer learning

Azure Databricks supports featurization with deep learning models. A pre-trained deep learning model can compute features that are then used by other downstream models. Azure Databricks supports featurization at scale by distributing the computation across a cluster. You can perform featurization with the deep learning libraries included in Databricks Runtime ML, including TensorFlow and PyTorch.

Azure Databricks also supports transfer learning, a technique closely related to featurization. Transfer learning lets you reuse knowledge from one problem domain in a related domain. Featurization is itself a simple and powerful method for transfer learning: computing features with a pre-trained deep learning model transfers knowledge about good features from the original domain.

This article demonstrates how to compute features for transfer learning with a pre-trained TensorFlow model, using the following workflow:

  1. Start with a pre-trained deep learning model, in this case an image classification model from tensorflow.keras.applications.
  2. Truncate the last layer(s) of the model. The modified model produces a tensor of features as output, rather than a prediction.
  3. Apply that model to a new image dataset from a different problem domain to compute features for the images.
  4. Use these features to train a new model. The following notebook omits this final step. For examples of training a simple model such as logistic regression, refer to Machine learning and deep learning.
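Steps 1 through 3 can be sketched on a single machine as follows. This is a minimal illustration, not the notebook's code: the choice of ResNet50, the 224x224 input size, and the helper names `build_featurizer` and `featurize` are assumptions made for the example.

```python
import numpy as np


def build_featurizer(weights="imagenet"):
    """Steps 1 and 2: load a pre-trained model without its final layers."""
    # Imported lazily so this module can be loaded without TensorFlow installed.
    from tensorflow.keras.applications.resnet50 import ResNet50

    # include_top=False drops the fully connected classification layer;
    # pooling="avg" averages over the spatial dimensions, so the model
    # returns one feature vector per image instead of class predictions.
    return ResNet50(weights=weights, include_top=False, pooling="avg")


def featurize(model, images):
    """Step 3: compute feature vectors for a batch of RGB images.

    images: array of shape (n, 224, 224, 3) with pixel values in [0, 255].
    """
    from tensorflow.keras.applications.resnet50 import preprocess_input

    # Copy to float32 first because preprocess_input may modify its input.
    batch = preprocess_input(np.array(images, dtype="float32"))
    return model.predict(batch, verbose=0)
```

With the default `weights="imagenet"`, Keras downloads the pre-trained weights on first use. For ResNet50 truncated this way, each image yields a feature vector of length 2048, which a downstream model can use as input in step 4.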

The following notebook uses pandas UDFs to perform the featurization step. pandas UDFs, and their newer variant Scalar Iterator pandas UDFs, offer flexible APIs, support any deep learning library, and deliver high performance.
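The sketch below shows one way a Scalar Iterator pandas UDF could be structured for featurization. It is an illustration under stated assumptions rather than the notebook's implementation: `featurize_batches`, `make_featurize_udf`, `load_predict_fn`, and `decode` are hypothetical names, and the input column is assumed to hold one encoded image per row. The point of the iterator form is that the model is loaded once per iterator and reused for every batch.

```python
from typing import Iterator

import numpy as np
import pandas as pd


def featurize_batches(batches, load_predict_fn, decode):
    """Core loop: load the model once, then featurize each pandas batch.

    load_predict_fn: hypothetical callable returning predict(array) -> features.
    decode: hypothetical callable turning one column value into a NumPy array.
    """
    predict = load_predict_fn()  # expensive; runs once, not once per batch
    for batch in batches:
        inputs = np.stack([decode(value) for value in batch])
        features = np.asarray(predict(inputs))
        # Flatten each row's features into a plain list of floats.
        yield pd.Series(features.reshape(len(batch), -1).tolist())


def make_featurize_udf(load_predict_fn, decode):
    """Wrap the core loop as a Scalar Iterator pandas UDF (requires PySpark 3.0+)."""
    from pyspark.sql.functions import pandas_udf
    from pyspark.sql.types import ArrayType, FloatType

    # The Iterator[pd.Series] -> Iterator[pd.Series] type hints select
    # the Scalar Iterator variant.
    @pandas_udf(ArrayType(FloatType()))
    def featurize_udf(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
        yield from featurize_batches(batches, load_predict_fn, decode)

    return featurize_udf
```

Assuming a DataFrame `df` with a binary `content` column, the UDF could then be applied as `df.select(make_featurize_udf(load_predict_fn, decode)("content"))`. Keeping the core loop as a plain Python generator also makes it possible to test the logic with pandas alone, without a Spark cluster.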

Featurization and transfer learning with TensorFlow

Get notebook