Use notebooks

A notebook is a collection of runnable cells (commands). When you use a notebook, you are primarily developing and running cells.

All notebook tasks are supported by UI actions, but you can also perform many tasks using keyboard shortcuts. Toggle the shortcut display by clicking the Keyboard icon or selecting ? > Shortcuts.

[Image: Keyboard shortcuts]

Develop notebooks

This section describes how to develop notebook cells and navigate around a notebook.


About notebooks

A notebook has a toolbar that lets you manage the notebook and perform actions within the notebook:

[Image: Notebook toolbar]

and one or more cells (or commands) that you can run:

[Image: Notebook cells]

At the far right of a cell, the cell actions menu contains three menus (Run, Dashboard, and Edit) and two actions (Hide and Delete).

Add a cell

To add a cell, mouse over a cell at the top or bottom and click the Add Cell icon, or access the notebook cell menu at the far right, click the down caret, and select Add Cell Above or Add Cell Below.

Delete a cell

Go to the cell actions menu at the far right and click Delete.

When you delete a cell, by default a delete confirmation dialog displays. To disable future confirmation dialogs, select the Do not show this again checkbox and click Confirm. You can also toggle the confirmation dialog setting with the Turn on command delete confirmation option in Account icon > User Settings > Notebook Settings.

To restore deleted cells, either select Edit > Undo Delete Cells or use the (Z) keyboard shortcut.

Cut a cell

Go to the cell actions menu at the far right, click the down caret, and select Cut Cell.

You can also use the (X) keyboard shortcut.

To restore cut cells, either select Edit > Undo Cut Cells or use the (Z) keyboard shortcut.

Select multiple cells or all cells

You can select adjacent notebook cells using Shift + Up or Shift + Down for the previous and next cell respectively. Multi-selected cells can be copied, cut, deleted, and pasted.

To select all cells, select Edit > Select All Cells or use the command mode shortcut Cmd+A.

Default language

The default language for each cell is shown in a (<language>) link next to the notebook name. In the following notebook, the default language is SQL.

[Image: Notebook default language]

To change the default language:

  1. Click the (<language>) link. The Change Default Language dialog displays.

    [Image: Notebook default language]

  2. Select the new language from the Default Language drop-down.

  3. Click Change.

  4. To ensure that existing commands continue to work, commands of the previous default language are automatically prefixed with a language magic command.
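Step 4 can be pictured with a short Python model. This is a hypothetical sketch, assuming cells are plain source strings and that any cell already starting with a % magic keeps its explicit language or command; `has_magic` and `change_default_language` are illustrative names, not Databricks APIs.

```python
from __future__ import annotations


def has_magic(source: str) -> bool:
    """True if the cell already starts with a magic command (%sql, %md, %run, ...)."""
    return source.lstrip().startswith("%")


def change_default_language(cells: list[str], old_language: str) -> list[str]:
    """Prefix cells that relied on the old default language with its magic command.

    Cells that already start with a magic keep their explicit language
    (or auxiliary command), so they are returned unchanged.
    """
    magic = f"%{old_language}"
    return [cell if has_magic(cell) else f"{magic}\n{cell}" for cell in cells]


if __name__ == "__main__":
    # Hypothetical notebook whose default language changes from SQL to Python.
    notebook = ["SELECT 1", "%python\nprint('hi')", "%md # Notes"]
    for cell in change_default_language(notebook, "sql"):
        print(cell, end="\n---\n")
```

Only the first cell changes: it becomes `%sql` followed by `SELECT 1`, so it still runs as SQL after Python becomes the default.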

Mix languages

You can override the default language by specifying the language magic command %<language> at the beginning of a cell. The supported magic commands are: %python, %r, %scala, and %sql.

Note

When you invoke a language magic command, the command is dispatched to the REPL in the execution context for the notebook. Variables defined in one language (and hence in the REPL for that language) are not available in the REPL of another language. REPLs can share state only through external resources such as files in DBFS or objects in object storage.
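A sketch of that pattern on the Python side: one REPL writes a small JSON file and another reads it back. The helper names are hypothetical, and a local temporary directory stands in for DBFS; on a cluster you might use a path under /dbfs instead, assuming your cluster exposes the DBFS FUSE mount. A Scala or R cell would read the same file with its own JSON library.

```python
import json
import os
import tempfile


def save_shared_state(path: str, state: dict) -> None:
    """Write a JSON-serializable dict so that another REPL can read it later."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f)


def load_shared_state(path: str) -> dict:
    """Read back the dict written by save_shared_state."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # A local temp directory stands in for a DBFS location (assumption).
    shared_path = os.path.join(tempfile.mkdtemp(), "shared_state.json")
    save_shared_state(shared_path, {"run_date": "2021-01-01", "threshold": 0.75})
    print(load_shared_state(shared_path))
```

JSON is chosen here only because every supported language can parse it; any format that both REPLs understand would work.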

Notebooks also support a few auxiliary magic commands:

  • %sh: Allows you to run shell code in your notebook. To fail the cell if the shell command has a non-zero exit status, add the -e option. This command runs only on the Apache Spark driver, and not the workers. To run a shell command on all nodes, use an init script.
  • %fs: Allows you to use dbutils filesystem commands. See dbutils.
  • %md: Allows you to include various types of documentation, including text, images, and mathematical formulas and equations. See the next section.
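The -e option of %sh makes the cell fail fast on a non-zero exit status. If you run shell commands from a Python cell instead, `subprocess.run(..., check=True)` from the Python standard library behaves similarly by raising `CalledProcessError`. This is an analogy, not how %sh is implemented, and `run_shell` is a hypothetical helper; like %sh, code in a Python cell runs only on the driver.

```python
import subprocess


def run_shell(command: str) -> str:
    """Run a shell command and return its stdout.

    check=True raises subprocess.CalledProcessError on a non-zero exit
    status, so the cell fails, similar to `%sh -e`.
    """
    result = subprocess.run(
        command, shell=True, check=True, capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    print(run_shell("echo hello"), end="")
    try:
        run_shell("exit 3")
    except subprocess.CalledProcessError as err:
        print(f"command failed with exit status {err.returncode}")
```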

Include documentation

To include documentation in a notebook, you can use the %md magic command to identify Markdown markup. The included Markdown markup is rendered into HTML. For example, this Markdown snippet contains markup for a level-one heading:

%md # Hello This is a Title

It is rendered as an HTML title:

[Image: Notebook HTML title]

Collapsible headings

Cells that appear after cells containing Markdown headings can be collapsed into the heading cell. The following image shows a level-one heading called Heading 1 with the following two cells collapsed into it.

[Image: Collapsed cells]

To expand and collapse headings, click + or -.

Also see Hide and show cell content.

You can link to other notebooks or folders in Markdown cells using relative paths. Specify the href attribute of an anchor tag as the relative path, starting with a $, and then follow the same pattern as in Unix file systems:

%md
<a href="$./myNotebook">Link to notebook in same folder as current notebook</a>
<a href="$../myFolder">Link to folder in parent folder of current notebook</a>
<a href="$./myFolder2/myNotebook2">Link to nested notebook</a>
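Relative links resolve like Unix paths, relative to the folder that contains the current notebook. The sketch below uses Python's `posixpath` to show where each href above would point; `resolve_notebook_link` is a hypothetical helper for illustration, and the workspace paths are made up.

```python
import posixpath


def resolve_notebook_link(current_notebook: str, href: str) -> str:
    """Resolve a "$"-prefixed relative href against the current notebook's folder."""
    if not href.startswith("$"):
        raise ValueError("relative notebook links start with '$'")
    base_dir = posixpath.dirname(current_notebook)
    # Drop the "$", join to the notebook's folder, then collapse "." and "..".
    return posixpath.normpath(posixpath.join(base_dir, href[1:]))


if __name__ == "__main__":
    current = "/Users/someone@example.com/project/current"
    for href in ("$./myNotebook", "$../myFolder", "$./myFolder2/myNotebook2"):
        print(href, "->", resolve_notebook_link(current, href))
```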

Display images

To display images stored in the FileStore, use the syntax:

%md
![test](files/image.png)

For example, suppose you have the Databricks logo image file in FileStore:

dbfs ls dbfs:/FileStore/
databricks-logo-mobile.png

When you include the following code in a Markdown cell:

[Image: Image in Markdown cell]

the image is rendered in the cell:

[Image: Rendered image]
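Files under dbfs:/FileStore/ are referenced from Markdown with the files/ prefix shown in the syntax above. A small sketch that builds such a Markdown cell for a FileStore file; `filestore_image_markdown` is a hypothetical helper and the alt text is arbitrary.

```python
FILESTORE_PREFIX = "dbfs:/FileStore/"


def filestore_image_markdown(dbfs_path: str, alt_text: str) -> str:
    """Build a %md cell that displays an image stored under dbfs:/FileStore/."""
    if not dbfs_path.startswith(FILESTORE_PREFIX):
        raise ValueError(f"expected a path under {FILESTORE_PREFIX}")
    relative = dbfs_path[len(FILESTORE_PREFIX):]
    return f"%md\n![{alt_text}](files/{relative})"


if __name__ == "__main__":
    print(filestore_image_markdown(
        "dbfs:/FileStore/databricks-logo-mobile.png", "Databricks logo"))
```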

Display mathematical equations

Notebooks support KaTeX for displaying mathematical formulas and equations. For example,

%md
\\(c = \\pm\\sqrt{a^2 + b^2} \\)

\\(A{_i}{_j}=B{_i}{_j}\\)

$$c = \\pm\\sqrt{a^2 + b^2}$$

\\[A{_i}{_j}=B{_i}{_j}\\]

renders as:

[Image: Rendered equation 1]

and

%md
\\( f(\beta)= -Y_t^T X_t \beta + \sum log( 1+{e}^{X_t\bullet\beta}) + \frac{1}{2}\delta^t S_t^{-1}\delta\\)

where \\(\delta=(\beta - \mu_{t-1})\\)

renders as:

[Image: Rendered equation 2]

Include HTML

You can include HTML in a notebook by using the function displayHTML. See HTML, D3, and SVG in notebooks for an example of how to do this.

Note

The displayHTML iframe is served from the domain databricksusercontent.com and the iframe sandbox includes the allow-same-origin attribute. databricksusercontent.com must be accessible from your browser. If it is currently blocked by your corporate network, IT will need to add it to an allowlist.
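displayHTML takes a string of HTML. A common pattern is to build that string in Python, escaping values so they render as text rather than markup. The helper below is a hypothetical sketch that uses only the standard library; in a notebook you would pass its result to displayHTML, which is predefined there and is not defined in this snippet.

```python
import html


def rows_to_html_table(header: list, rows: list) -> str:
    """Render a header and rows as an HTML table, escaping every cell value."""
    def cells(values, tag):
        return "".join(f"<{tag}>{html.escape(str(v))}</{tag}>" for v in values)

    head = f"<tr>{cells(header, 'th')}</tr>"
    body = "".join(f"<tr>{cells(row, 'td')}</tr>" for row in rows)
    return f"<table>{head}{body}</table>"


if __name__ == "__main__":
    markup = rows_to_html_table(["name", "score"], [["a<b", 1], ["c", 2]])
    print(markup)
    # In a Databricks notebook: displayHTML(markup)
```

Escaping matters because the value `a<b` would otherwise be parsed as the start of a tag.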

Command comments

You can have discussions with collaborators using command comments.

To toggle the Comments sidebar, click the Comments button at the top right of a notebook.

[Image: Toggle notebook comments]

To add a comment to a command:

  1. Highlight the command text and click the comment bubble:

    [Image: Open comments]

  2. Add your comment and click Comment.

    [Image: Add comment]

To edit, delete, or reply to a comment, click the comment and choose an action.

[Image: Edit comment]

Change cell display

There are three display options for notebooks:

  • Standard view: results are displayed immediately after code cells
  • Results only: only results are displayed
  • Side-by-side: code and results cells are displayed side by side, with results to the right

Go to the View menu to select your display option.

[Image: Side-by-side view]

Show line and command numbers

To show line numbers or command numbers, go to the View menu and select Show line numbers or Show command numbers. Once they’re displayed, you can hide them again from the same menu. You can also enable line numbers with the keyboard shortcut Control+L.

[Image: Show line or command numbers via the View menu]

[Image: Line and command numbers enabled in notebook]

If you enable line or command numbers, Databricks saves your preference and shows them in all of your other notebooks for that browser.

Command numbers above cells link to that specific command. If you click the command number for a cell, it updates your URL to be anchored to that command. If you want to link to a specific command in your notebook, right-click the command number and choose Copy Link Address.

Find and replace text

To find and replace text within a notebook, select File > Find and Replace.

[Image: Find and replace text]

The current match is highlighted in orange and all other matches are highlighted in yellow.

[Image: Matching text]

You can replace matches on an individual basis by clicking Replace.

You can switch between matches by clicking the Prev and Next buttons or by pressing Shift+Enter and Enter to go to the previous and next matches, respectively.

Close the find and replace tool by clicking the Delete icon or by pressing Esc.

Autocomplete

You can use Azure Databricks autocomplete features to automatically complete code segments as you enter them in cells. This reduces what you have to remember and minimizes the amount of typing you have to do. Azure Databricks supports two types of autocomplete in your notebook: local and server.

Local autocomplete completes words that exist in the notebook. Server autocomplete is more powerful because it accesses the cluster for defined types, classes, and objects, as well as SQL database and table names. To activate server autocomplete, you must attach your notebook to a cluster and run all cells that define completable objects.

Important

Server autocomplete in R notebooks is blocked during command execution.

You trigger autocomplete by pressing Tab after entering a completable object. For example, after you define and run the cells containing the definitions of MyClass and instance, the methods of instance are completable, and a list of valid completions displays when you press Tab.

[Image: Trigger autocomplete]
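The text refers to MyClass and instance without showing their definitions. A hypothetical pair like the following would serve: once this cell has run on the attached cluster, typing `instance.` in a later cell and pressing Tab should list `describe` and `scaled` among the completions. The class, method, and attribute names are illustrative only.

```python
class MyClass:
    """A small class whose methods become completable once this cell runs."""

    def __init__(self, value: int) -> None:
        self.value = value

    def describe(self) -> str:
        return f"MyClass(value={self.value})"

    def scaled(self, factor: int) -> "MyClass":
        return MyClass(self.value * factor)


instance = MyClass(3)
# In the next cell, type `instance.` and press Tab to see the completions.
print(instance.scaled(2).describe())  # MyClass(value=6)
```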

Type completion and SQL database and table name completion work in the same way.

[Image: Type completion and SQL completion]

Format SQL

Azure Databricks provides tools that allow you to format SQL code in notebook cells quickly and easily. These tools reduce the effort to keep your code formatted and help to enforce the same coding standards across your notebooks.

You can trigger the formatter in the following ways:

  • Single cells

    • Keyboard shortcut: Press Cmd+Shift+F.

    • Command context menu: Select Format SQL in the command context drop-down menu of a SQL cell. This item is visible only in SQL notebook cells and those with a %sql language magic.

      [Image: Formatting SQL from command context]

  • Multiple cells

    Select multiple SQL cells and then select Edit > Format SQL Cells. If you select cells of more than one language, only SQL cells are formatted. This includes those that use %sql.

    [Image: Formatting SQL from Edit menu]

Here’s the first cell in the preceding example after formatting:

[Image: After formatting SQL]

Run notebooks

This section describes how to run one or more notebook cells.


Requirements

The notebook must be attached to a cluster. If the cluster is not running, the cluster is started when you run one or more cells.

Run a cell

In the cell actions menu at the far right, click the Run icon and select Run Cell, or press Shift+Enter.

Important

The maximum size for a notebook cell, both contents and output, is 16 MB.

For example, try running this Python code snippet that references the predefined spark variable.

spark

Then, run some real code:

1+1 # => 2

Note

Notebooks have a number of default settings:

  • When you run a cell, the notebook automatically attaches to a running cluster without prompting.
  • When you press Shift+Enter, the notebook auto-scrolls to the next cell if the cell is not visible.

To change these settings, select Account icon > User Settings > Notebook Settings and configure the respective checkboxes.

Run all above or below

To run all cells before or after a cell, go to the cell actions menu at the far right, click the Run menu, and select Run All Above or Run All Below.

Run All Below includes the cell you are in. Run All Above does not.
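In terms of cell positions, the two commands select different slices of the notebook. A toy model, assuming cells are held in a Python list and `current` is the zero-based index of the cell you are in (both hypothetical):

```python
def run_all_above(cells: list, current: int) -> list:
    """Cells strictly before the current one; the current cell is excluded."""
    return cells[:current]


def run_all_below(cells: list, current: int) -> list:
    """The current cell and every cell after it."""
    return cells[current:]


if __name__ == "__main__":
    cells = ["cmd1", "cmd2", "cmd3", "cmd4"]
    print(run_all_above(cells, 2))  # ['cmd1', 'cmd2']
    print(run_all_below(cells, 2))  # ['cmd3', 'cmd4']
```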

Run all cells

To run all the cells in a notebook, select Run All in the notebook toolbar.

Important

Do not use Run All if steps for mount and unmount are in the same notebook. It could lead to a race condition and possibly corrupt the mount points.

View multiple outputs per cell

Python notebooks and %python cells in non-Python notebooks support multiple outputs per cell.

[Image: Multiple outputs in one cell]

This feature requires Databricks Runtime 7.1 or above and is disabled by default in Databricks Runtime 7.1. Enable it by setting spark.databricks.workspace.multipleResults.enabled true.

Python and Scala error highlighting

Python and Scala notebooks support error highlighting. That is, the line of code that is throwing the error is highlighted in the cell. Additionally, if the error output is a stack trace, the cell in which the error is thrown is displayed in the stack trace as a link to the cell. You can click this link to jump to the offending code.

[Image: Python error highlighting]

[Image: Scala error highlighting]

Notifications

Notifications alert you to certain events, such as which command is currently running during Run all cells and which commands are in error state. When your notebook is showing multiple error notifications, the first one will have a link that allows you to clear all notifications.

[Image: Notebook notifications]

Notebook notifications are enabled by default. You can disable them under Account icon > User Settings > Notebook Settings.

Databricks Advisor

Databricks Advisor automatically analyzes commands every time they are run and displays appropriate advice in the notebooks. The advice notices provide information that can assist you in improving the performance of workloads, reducing costs, and avoiding common mistakes.

View advice

A blue box with a lightbulb icon signals that advice is available for a command. The box displays the number of distinct pieces of advice.

[Image: Databricks advice]

Click the lightbulb to expand the box and view the advice. One or more pieces of advice will become visible.

[Image: View advice]

Click the Learn more link to view documentation providing more information related to the advice.

Click the Don’t show me this again link to hide the piece of advice. Advice of this type will no longer be displayed. This action can be reversed in Notebook Settings.

Click the lightbulb again to collapse the advice box.

Advice settings

Access the Notebook Settings page by selecting Account icon > User Settings > Notebook Settings or by clicking the gear icon in the expanded advice box.

[Image: Notebook settings]

Toggle the Turn on Databricks Advisor option to enable or disable advice.

The Reset hidden advice link is displayed if one or more types of advice are currently hidden. Click the link to make that advice type visible again.

Run a notebook from another notebook

You can run a notebook from another notebook by using the %run <notebook> magic command. This is roughly equivalent to a :load command in a Scala REPL on your local machine or an import statement in Python. All variables defined in <notebook> become available in your current notebook.

%run must be in a cell by itself, because it runs the entire notebook inline.

Note

You cannot use %run to run a Python file and import the entities defined in that file into a notebook. To import from a Python file, you must package the file into a Python library, create an Azure Databricks library from that Python library, and install the library into the cluster you use to run your notebook.

Example

Suppose you have notebookA and notebookB. notebookA contains a cell that has the following Python code:

x = 5

Even though you did not define x in notebookB, you can access x in notebookB after you run %run notebookA.

%run /Users/path/to/notebookA

print(x) # => 5

To specify a relative path, preface it with ./ or ../. For example, if notebookA and notebookB are in the same directory, you can alternatively run notebookA from notebookB using a relative path.

%run ./notebookA

print(x) # => 5

Similarly, to go up one directory and then into another:

%run ../someDirectory/notebookA

print(x) # => 5
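Outside Databricks, a rough plain-Python analogy to the %run semantics above (every top-level variable of notebookA becomes visible in the caller) is to execute another file with `runpy.run_path` from the standard library and copy its top-level names into the current namespace. This only illustrates the behavior; it is not how %run is implemented, and `run_inline` and the file name are hypothetical.

```python
import os
import runpy
import tempfile


def run_inline(path: str, namespace: dict) -> None:
    """Execute the file at `path` and merge its top-level names into `namespace`."""
    module_globals = runpy.run_path(path)
    # Skip dunder entries such as __name__ and __file__ that run_path adds.
    namespace.update(
        {k: v for k, v in module_globals.items() if not k.startswith("__")}
    )


if __name__ == "__main__":
    # Stand-in for notebookA, which contains `x = 5`.
    notebook_a = os.path.join(tempfile.mkdtemp(), "notebookA.py")
    with open(notebook_a, "w", encoding="utf-8") as f:
        f.write("x = 5\n")

    run_inline(notebook_a, globals())
    print(x)  # 5
```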

For more complex interactions between notebooks, see Notebook workflows.

Manage notebook state and results

After you attach a notebook to a cluster and run one or more cells, your notebook has state and displays results. This section describes how to manage notebook state and results.


Clear notebook state and results

To clear the notebook state and results, click Clear in the notebook toolbar and select the action:

[Image: Clear state and results]

Download results

By default, downloading results is enabled. To toggle this setting, see Manage the ability to download results from notebooks. If downloading results is disabled, the Download Result button is not visible.

Download a cell result

You can download a cell result that contains tabular output to your local machine. Click the Download Result button at the bottom of a cell.

[Image: Download cell results]

A CSV file named export.csv is downloaded to your default download directory.

Download full results

By default, Azure Databricks returns 1000 rows of a DataFrame. When there are more than 1000 rows, a down arrow is added to the Download Result button. To download all the results of a query:

  1. Click the down arrow next to Download Result and select Download full results.

    [Image: Download full results]

  2. Select Re-execute and download.

    [Image: Re-run and download results]

    After you download full results, a CSV file named export.csv is downloaded to your local machine and the /databricks-results folder has a generated folder containing the full query results.

    [Image: Downloaded results]

Hide and show cell content

Cell content consists of cell code and the result of running the cell. You can hide and show the cell code and result using the cell actions menu at the top right of the cell.

To hide cell code:

  • Click the down caret and select Hide Code

To hide and show the cell result, do any of the following:

  • Click the down caret and select Hide Result
  • Click Hide (the minimize icon)
  • Type Esc > Shift + o

To show hidden cell code or results, click the Show links:

[Image: Show hidden code and results]

See also Collapsible headings.

Notebook isolation

Notebook isolation refers to the visibility of variables and classes between notebooks. Azure Databricks supports two types of isolation:

  • Variable and class isolation
  • Spark session isolation

Note

Since all notebooks attached to the same cluster execute on the same cluster VMs, even with Spark session isolation enabled there is no guaranteed user isolation within a cluster.

Variable and class isolation

Variables and classes are available only in the current notebook. For example, two notebooks attached to the same cluster can define variables and classes with the same name, but these objects are distinct.

To define a class that is visible to all notebooks attached to the same cluster, define the class in a package cell. Then you can access the class by using its fully qualified name, which is the same as accessing a class in an attached Scala or Java library.
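A toy Python model of variable and class isolation: each notebook gets its own namespace, so the same name can hold different objects. Representing notebook namespaces as plain dicts is an illustrative assumption, not how Databricks stores notebook state, and `run_in_notebook` is a hypothetical helper.

```python
def run_in_notebook(namespace: dict, code: str) -> None:
    """Execute `code` inside one notebook's private namespace."""
    exec(code, namespace)


if __name__ == "__main__":
    notebook_1: dict = {}
    notebook_2: dict = {}
    run_in_notebook(notebook_1, "class Config:\n    env = 'dev'\nx = 1")
    run_in_notebook(notebook_2, "class Config:\n    env = 'prod'\nx = 2")

    # Same names, distinct objects.
    print(notebook_1["x"], notebook_2["x"])  # 1 2
    print(notebook_1["Config"] is notebook_2["Config"])  # False
    print(notebook_1["Config"].env, notebook_2["Config"].env)  # dev prod
```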

Spark session isolation

Every notebook attached to a cluster running Apache Spark 2.0.0 and above has a predefined variable called spark that represents a SparkSession. SparkSession is the entry point for using Spark APIs as well as setting runtime configurations.

Spark session isolation is enabled by default. You can also use global temporary views to share temporary views across notebooks. See Create View. To disable Spark session isolation, set spark.databricks.session.share to true in the Spark configuration.
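Global temporary views are registered in Spark's global_temp database (the default value of spark.sql.globalTempDatabase), so other notebooks attached to the same cluster read them by that qualified name even with session isolation enabled. A minimal PySpark sketch, assuming `df` is an existing DataFrame and `spark` is the notebook's predefined SparkSession; the helper and view names are hypothetical.

```python
GLOBAL_TEMP_DB = "global_temp"  # Spark's default for spark.sql.globalTempDatabase


def publish_global_view(df, view_name: str) -> str:
    """Register df as a global temporary view and return its qualified name."""
    df.createOrReplaceGlobalTempView(view_name)
    return f"{GLOBAL_TEMP_DB}.{view_name}"


def read_global_view(spark, view_name: str):
    """Read the view from any notebook attached to the same cluster."""
    return spark.table(f"{GLOBAL_TEMP_DB}.{view_name}")


# Notebook 1:  name = publish_global_view(spark.range(5), "shared_ids")
# Notebook 2:  read_global_view(spark, "shared_ids").show()
```

If your cluster overrides spark.sql.globalTempDatabase, adjust GLOBAL_TEMP_DB accordingly.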

Important

Setting spark.databricks.session.share to true breaks the monitoring used by both streaming notebook cells and streaming jobs. Specifically:

  • The graphs in streaming cells are not displayed.
  • Jobs do not block as long as a stream is running (they just finish “successfully”, stopping the stream).
  • Streams in jobs are not monitored for termination. Instead you must manually call awaitTermination().
  • Calling the display function on streaming DataFrames doesn’t work.

Cells that trigger commands in other languages (that is, cells using %scala, %python, %r, and %sql) and cells that include other notebooks (that is, cells using %run) are part of the current notebook. Thus, these cells are in the same session as other notebook cells. By contrast, a notebook workflow runs a notebook with an isolated SparkSession, which means temporary views defined in such a notebook are not visible in other notebooks.

Version control

Azure Databricks has basic version control for notebooks. You can perform the following actions on revisions: add comments, restore and delete revisions, and clear revision history.

To access notebook revisions, click Revision History at the top right of the notebook toolbar.


Add a comment

To add a comment to the latest revision:

  1. Click the revision.

  2. Click the Save now link.

    [Image: Save comment]

  3. In the Save Notebook Revision dialog, enter a comment.

  4. Click Save. The notebook revision is saved with the entered comment.

Restore a revision

To restore a revision:

  1. Click the revision.

  2. Click Restore this revision.

    [Image: Restore revision]

  3. Click Confirm. The selected revision becomes the latest revision of the notebook.

Delete a revision

To delete a notebook’s revision entry:

  1. Click the revision.

  2. Click the trash icon.

    [Image: Delete revision]

  3. Click Yes, erase. The selected revision is deleted from the notebook’s revision history.

Clear a revision history

To clear a notebook’s revision history:

  1. Select File > Clear Revision History.

  2. Click Yes, clear. The notebook revision history is cleared.

    Warning

    Once cleared, the revision history is not recoverable.

Git version control

Azure Databricks also integrates with these Git-based version control tools: