Azure Databricks datasets

Azure Databricks includes a variety of datasets mounted to the Databricks File System (DBFS) that you can use either to learn Apache Spark or to test algorithms. These datasets appear in examples throughout the documentation.

To browse these files, you can use Databricks Utilities (dbutils). The following snippet lists all of the Databricks datasets:

# dbutils and display() are predefined in Databricks notebooks; no import is needed.
display(dbutils.fs.ls("/databricks-datasets"))
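If you prefer plain Python to dbutils, a minimal sketch of the same listing might read the directory through the /dbfs local mount instead. This assumes the cluster exposes DBFS at /dbfs (as the README example on this page does), and list_datasets is a hypothetical helper, not a Databricks API.

```python
import os


def list_datasets(root="/dbfs/databricks-datasets"):
    """Return the sorted names of the subdirectories directly under root.

    Hypothetical helper: assumes root is readable through the local file
    API, for example via the /dbfs mount on a Databricks cluster.
    """
    with os.scandir(root) as entries:
        # Keep only directories; top-level files such as README.md are skipped.
        return sorted(e.name for e in entries if e.is_dir())


# On a cluster (assumption: /dbfs is mounted):
# print(list_datasets())
```

Unlike dbutils.fs.ls, which returns FileInfo records for files and directories alike, this sketch returns only directory names, which is usually what you want when picking a dataset.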

You can print the README for any dataset to get more information about it.

# /dbfs is the local mount point for DBFS, so Python's built-in open() can read it.
with open("/dbfs/databricks-datasets/README.md") as f:
    readme = f.read()

print(readme)
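README filenames can differ between datasets, so a minimal sketch for reading "the README of any dataset" might search a directory for the first file whose name starts with "readme", ignoring case. That naming convention is an assumption and may not hold for every dataset; read_readme is a hypothetical helper.

```python
import os


def read_readme(dataset_dir):
    """Return the text of the first README-like file in dataset_dir, or None.

    Hypothetical helper: assumes the README's filename starts with "readme"
    (any case, any extension), which may not be true for every dataset.
    """
    for name in sorted(os.listdir(dataset_dir)):
        path = os.path.join(dataset_dir, name)
        # Skip subdirectories whose names happen to start with "readme".
        if name.lower().startswith("readme") and os.path.isfile(path):
            with open(path, encoding="utf-8") as f:
                return f.read()
    return None


# On a cluster (assumption: /dbfs is mounted), the top-level README:
# print(read_readme("/dbfs/databricks-datasets"))
```

Returning None instead of raising lets you loop over many dataset directories and skip those without a README.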