Preprocess data

On large datasets, you can use Spark SQL and MLlib for feature engineering. Third-party libraries included in Databricks Runtime ML, such as scikit-learn, also provide useful helper methods. For examples, see the following machine learning notebooks for scikit-learn and MLlib:
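As a rough illustration of the scikit-learn helper methods mentioned above, the following minimal sketch scales a numeric column and one-hot encodes a categorical column. The toy data and the column meanings (`age`, `plan`) are hypothetical and are not taken from the referenced notebooks. It assumes scikit-learn and NumPy are available, as they are in Databricks Runtime ML.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: column 0 is numeric (age), column 1 is categorical (plan).
rows = np.array(
    [[25, "basic"], [40, "premium"], [31, "basic"], [58, "trial"]],
    dtype=object,
)

# Standardize the numeric column and one-hot encode the categorical column.
preprocessor = ColumnTransformer(
    transformers=[
        ("numeric", StandardScaler(), [0]),
        ("categorical", OneHotEncoder(handle_unknown="ignore"), [1]),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

features = preprocessor.fit_transform(rows)
print(features.shape)  # one scaled column plus one column per distinct plan
```

This approach fits data that sits comfortably in memory on a single node. For datasets too large for that, the equivalent steps would typically be expressed with Spark SQL and MLlib transformers instead.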

For more complex deep learning feature processing, this example notebook illustrates how to use transfer learning for featurization: