FileStore

FileStore is a special folder within Databricks File System (DBFS) where you can save files and make them accessible to your web browser. You can use FileStore to:

  • Save files, such as images and libraries, that are accessible within HTML and JavaScript when you call displayHTML.
  • Save output files that you want to download to your local desktop.
  • Upload CSVs and other data files from your local desktop to process on Databricks.

When you use certain features, Azure Databricks puts files in the following folders under FileStore:

  • /FileStore/jars - contains libraries that you upload. If you delete files in this folder, libraries that reference these files in your workspace may no longer work.
  • /FileStore/tables - contains the files that you import using the UI. If you delete files in this folder, tables that you created from these files may no longer be accessible.
  • /FileStore/plots - contains images created in notebooks when you call display() on a Python or R plot object, such as a ggplot or matplotlib plot. If you delete files in this folder, you may have to regenerate those plots in the notebooks that reference them. See Matplotlib and ggplot2 for more information.
  • /FileStore/import-stage - contains temporary files created when you import notebooks or Databricks archive files. These temporary files disappear after the notebook import completes.

Save a file to FileStore

To save a file to FileStore, put it in the /FileStore directory in DBFS:

dbutils.fs.put("/FileStore/my-stuff/my-file.txt", "Contents of my file")

In the following, replace <databricks-instance> with the workspace URL of your Azure Databricks deployment.

Files stored in /FileStore are accessible in your web browser at https://<databricks-instance>/files/<path-to-file>?o=######. For example, the file you stored in /FileStore/my-stuff/my-file.txt is accessible at https://<databricks-instance>/files/my-stuff/my-file.txt?o=######, where the number after o= is the same as the one in your workspace URL.
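The path-to-URL mapping described above can be sketched as a small helper. This is only an illustration: the function name filestore_url and its parameters are hypothetical and not part of any Databricks API, and the instance name and o= value in the usage line are made-up placeholders.

```python
from urllib.parse import quote


def filestore_url(instance, dbfs_path, org_id=None):
    """Return the browser URL for a file stored under /FileStore.

    Hypothetical helper: instance is the workspace host name and org_id
    is the number that follows o= in the workspace URL.
    """
    # Accept both "dbfs:/FileStore/..." and "/FileStore/..." forms.
    if dbfs_path.startswith("dbfs:"):
        dbfs_path = dbfs_path[len("dbfs:"):]
    prefix = "/FileStore/"
    if not dbfs_path.startswith(prefix):
        raise ValueError("path must be under /FileStore")
    # /FileStore/<path-to-file> is served at /files/<path-to-file>.
    relative = dbfs_path[len(prefix):]
    url = "https://%s/files/%s" % (instance, quote(relative))
    if org_id is not None:
        url += "?o=%s" % org_id
    return url


# Placeholder instance and o= value, for illustration only.
example_url = filestore_url("example.test", "/FileStore/my-stuff/my-file.txt", "123")
print(example_url)
```

The helper percent-encodes the relative path, so file names containing spaces still produce a usable URL.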

Embed static images in notebooks

You can use the files/ location to embed static images into your notebooks:

displayHTML("<img src ='files/image.jpg'>")

Alternatively, use Markdown image import syntax:

%md
![my_test_image](files/image.jpg)

You can upload static images using the DBFS REST API and the requests Python HTTP library. In the following example:

  • Replace <databricks-instance> with the workspace URL of your Azure Databricks deployment.
  • Replace <token> with the value of your personal access token.
  • Replace <image-dir> with the location in FileStore where you want to upload the image files.

import requests
import json
import os
from base64 import b64encode

TOKEN = '<token>'
headers = {'Authorization': 'Bearer %s' % TOKEN}
url = "https://<databricks-instance>/api/2.0"
dbfs_dir = "dbfs:/FileStore/<image-dir>/"

def perform_query(path, headers, data=None):
  # POST a JSON payload to the given DBFS API endpoint and return the parsed response.
  session = requests.Session()
  resp = session.request('POST', url + path, data=json.dumps(data or {}), verify=True, headers=headers)
  return resp.json()

def mkdirs(path, headers):
  _data = {}
  _data['path'] = path
  return perform_query('/dbfs/mkdirs', headers=headers, data=_data)

def create(path, overwrite, headers):
  _data = {}
  _data['path'] = path
  _data['overwrite'] = overwrite
  return perform_query('/dbfs/create', headers=headers, data=_data)

def add_block(handle, data, headers):
  _data = {}
  _data['handle'] = handle
  _data['data'] = data
  return perform_query('/dbfs/add-block', headers=headers, data=_data)

def close(handle, headers):
  _data = {}
  _data['handle'] = handle
  return perform_query('/dbfs/close', headers=headers, data=_data)

def put_file(src_path, dbfs_path, overwrite, headers):
  handle = create(dbfs_path, overwrite, headers=headers)['handle']
  print("Putting file: " + dbfs_path)
  with open(src_path, 'rb') as local_file:
    while True:
      contents = local_file.read(2**20)
      if len(contents) == 0:
        break
      add_block(handle, b64encode(contents).decode(), headers=headers)
    close(handle, headers=headers)

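# Create the target directory, then upload every .png file in the current directory.
mkdirs(path=dbfs_dir, headers=headers)
files = [f for f in os.listdir('.') if os.path.isfile(f)]
for f in files:
  if f.endswith(".png"):
    target_path = dbfs_dir + f
    # put_file has no return statement, so resp is None when it finishes without raising.
    resp = put_file(src_path=f, dbfs_path=target_path, overwrite=True, headers=headers)
    if resp is None:
      print("Success")
    else:
      print(resp)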
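The upload loop in put_file reads the local file in fixed-size chunks and base64-encodes each chunk before sending it to add-block. A minimal standalone sketch of that chunking step, with no network calls, might look like the following. The generator name iter_base64_blocks is hypothetical, and the tiny block size in the usage lines is chosen only to make the output easy to read; the example above uses 2**20 bytes.

```python
import io
from base64 import b64encode


def iter_base64_blocks(fileobj, block_size=2**20):
    """Yield base64-encoded text blocks read from a binary file object.

    Hypothetical helper mirroring the read/encode loop in put_file.
    """
    while True:
        chunk = fileobj.read(block_size)
        if not chunk:
            # An empty read means end of file.
            break
        # The JSON payload needs text, so decode the base64 bytes to str.
        yield b64encode(chunk).decode()


# Eight bytes split into blocks of three bytes each.
blocks = list(iter_base64_blocks(io.BytesIO(b"abcdefgh"), block_size=3))
print(blocks)  # ['YWJj', 'ZGVm', 'Z2g=']
```

Each yielded string could then be passed as the data argument of add_block, with the handle returned by create.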

Scale static images

To scale the size of an image that you have saved to DBFS, copy the image to /FileStore and then resize it using image parameters in displayHTML:

dbutils.fs.cp('dbfs:/user/experimental/MyImage-1.png','dbfs:/FileStore/images/')
displayHTML('''<img src="files/images/MyImage-1.png" style="width:600px;height:600px;">''')
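If you scale several images, building the tag by hand becomes repetitive. The following is a rough sketch of a helper that assembles the same HTML string; the name filestore_img_tag and its parameters are hypothetical, and the result would still need to be passed to displayHTML in a notebook.

```python
from html import escape


def filestore_img_tag(relative_path, width_px=None, height_px=None):
    """Build an <img> tag for a file under /FileStore (hypothetical helper).

    relative_path is the path below /FileStore, for example
    "images/MyImage-1.png".
    """
    # Files under /FileStore are referenced through the files/ location.
    src = "files/" + relative_path.lstrip("/")
    styles = []
    if width_px is not None:
        styles.append("width:%dpx" % width_px)
    if height_px is not None:
        styles.append("height:%dpx" % height_px)
    tag = '<img src="%s"' % escape(src, quote=True)
    if styles:
        tag += ' style="%s;"' % ";".join(styles)
    return tag + ">"


html_tag = filestore_img_tag("images/MyImage-1.png", 600, 600)
print(html_tag)
```

In a notebook, displayHTML(html_tag) would then render the image at the requested size.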

Use a JavaScript library

This notebook shows how to use FileStore to contain a JavaScript library.

FileStore demo notebook

Get notebook