Zip files

Hadoop does not support zip files as a compression codec. Text files in GZip, BZip2, and other supported compression formats can be configured to be decompressed automatically in Apache Spark, as long as the file has the right extension. Zip files are different: you must perform additional steps to read them.

The following notebooks show how to read zip files. After you download a zip file to a temp directory, you can invoke the Azure Databricks %sh magic command to unzip the file. For the sample file used in the notebooks, the tail step removes a comment line from the unzipped file.
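The unzip and tail steps can also be expressed in plain Python with the standard library, which may help when a %sh cell is not convenient. The following is a minimal sketch, not the notebooks' code: the function name `unzip_and_drop_first_line` and the sample file names are hypothetical, a small archive is built locally to stand in for the download, and the archive is assumed to hold a single UTF-8 text file whose first line is the comment line.

```python
import os
import tempfile
import zipfile


def unzip_and_drop_first_line(zip_path, out_dir):
    """Extract the first member of zip_path into out_dir and drop its first line.

    Returns the path of the cleaned file. Assumes (hypothetically) that the
    member is a UTF-8 text file whose first line is a comment line.
    """
    with zipfile.ZipFile(zip_path) as archive:
        member = archive.namelist()[0]
        extracted = archive.extract(member, out_dir)

    with open(extracted, encoding="utf-8") as src:
        lines = src.readlines()

    # Write every line except the first to a new file next to the extracted one.
    cleaned = os.path.join(out_dir, "cleaned_" + os.path.basename(extracted))
    with open(cleaned, "w", encoding="utf-8") as dst:
        dst.writelines(lines[1:])
    return cleaned


# Build a small sample archive in a temp directory (stands in for the download).
work_dir = tempfile.mkdtemp()
sample_zip = os.path.join(work_dir, "sample.zip")
with zipfile.ZipFile(sample_zip, "w") as archive:
    archive.writestr("sample.csv", "# comment line\nid,name\n1,a\n")

cleaned_path = unzip_and_drop_first_line(sample_zip, work_dir)
with open(cleaned_path, encoding="utf-8") as f:
    cleaned_text = f.read()
```

Reading the whole file into memory keeps the sketch short; for large files, streaming line by line would likely be a better fit.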

When you use %sh to operate on files, the results are stored in the directory /databricks/driver. Before you load the file using the Spark API, move the file to DBFS using Databricks Utilities.
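The move-then-load step might look like the sketch below. It is an illustration under assumptions rather than the notebooks' code: the helper name `move_to_dbfs_and_load`, the `dbfs:/tmp` target directory, and the CSV format with a header row are all hypothetical. The `dbutils` and `spark` objects are passed in explicitly; in a Databricks notebook they are already available as globals.

```python
def move_to_dbfs_and_load(dbutils, spark, file_name, dbfs_dir="dbfs:/tmp"):
    """Move a file from the driver's local disk to DBFS, then load it with Spark.

    file_name is assumed to sit in /databricks/driver, where %sh output lands.
    dbfs_dir is a hypothetical target directory chosen for this sketch.
    """
    source = "file:/databricks/driver/" + file_name
    target = dbfs_dir.rstrip("/") + "/" + file_name

    # Databricks Utilities: move the local file into DBFS.
    dbutils.fs.mv(source, target)

    # Spark API: read the moved file as CSV, treating the first row as a header.
    return spark.read.csv(target, header=True)


# In a notebook cell, usage could be:
# df = move_to_dbfs_and_load(dbutils, spark, "sample.csv")
```

The `file:` prefix tells Databricks Utilities that the source path is on the driver's local file system rather than in DBFS.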

Zip files Python notebook


Zip files Scala notebook
