GitHub version control

This article describes how to set up version control for notebooks using GitHub through the UI. Although it covers only the UI workflow, you can also use the Databricks CLI or the Workspace API to import and export notebooks, and then manage notebook versions with GitHub tools.
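
As a rough illustration of the programmatic route mentioned above, the following sketch exports one notebook through the Workspace API so that the file can be committed with ordinary Git tooling. It assumes the `GET /api/2.0/workspace/export` endpoint and an Azure Databricks personal access token (not the GitHub token described later). The function names, host, and notebook path are placeholders chosen for this example.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_export_request(host, token, notebook_path, fmt="SOURCE"):
    """Build (but do not send) a GET request for the workspace export endpoint."""
    query = urllib.parse.urlencode({"path": notebook_path, "format": fmt})
    url = f"{host.rstrip('/')}/api/2.0/workspace/export?{query}"
    # Assumption: the workspace accepts a personal access token as a Bearer token.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def export_notebook(host, token, notebook_path, fmt="SOURCE"):
    """Send the request and return the notebook bytes.

    The export endpoint returns JSON whose "content" field is base64-encoded.
    """
    request = build_export_request(host, token, notebook_path, fmt)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return base64.b64decode(payload["content"])
```

Writing the returned bytes to a file inside a local clone, followed by `git add` and `git commit`, gives a result comparable to saving a revision from the UI.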

Enable and disable Git versioning

By default, version control is enabled. To toggle this setting, see Manage the ability to version notebooks in Git. If Git versioning is disabled, the Git Integration tab is not available in the User Settings screen.

Configure version control

Configuring version control involves creating access credentials in your version control provider and adding those credentials to Azure Databricks.

Get an access token

Go to GitHub and create a personal access token that allows access to your repositories:

  1. From GitHub, open the menu at the upper right, next to your Gravatar, and select Settings.

  2. Click Developer settings.

  3. Click the Personal access tokens tab.

  4. Click the Generate new token button.

  5. Enter a token description.

  6. Select the repo permission, and click the Generate token button.

    Generate GitHub token

  7. Copy the token to your clipboard. You enter this token in Azure Databricks in the next step.

See the GitHub documentation to learn more about how to create personal access tokens.
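
Before pasting the token into Azure Databricks, you may want to confirm that it works and carries the repo permission. The sketch below is one way to do that and is not part of the Databricks workflow. It assumes a classic personal access token, for which the GitHub REST API reports granted scopes in the `X-OAuth-Scopes` response header. The function names are made up for this example.

```python
import urllib.request


def build_token_check_request(token):
    """Build a request for the authenticated-user endpoint of the GitHub REST API."""
    return urllib.request.Request(
        "https://api.github.com/user",
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
        },
    )


def parse_scopes(header_value):
    """Split a comma-separated scope header into a clean list of scope names."""
    return [scope.strip() for scope in header_value.split(",") if scope.strip()]


def token_scopes(token):
    """Return the scopes GitHub reports for the token.

    urlopen raises urllib.error.HTTPError (status 401) if the token is rejected.
    """
    with urllib.request.urlopen(build_token_check_request(token)) as response:
        return parse_scopes(response.headers.get("X-OAuth-Scopes", ""))
```

If `"repo"` is missing from the returned list, regenerate the token with the repo permission selected.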

Save your access token to Azure Databricks

  1. Click the User icon at the top right of your screen and select User Settings.

    Account settings

  2. Click the Git Integration tab.

  3. If you have previously entered credentials, click the Change token or app password button.

  4. In the Git provider drop-down, select GitHub.

    Select GitHub Git provider

  5. Paste your token into the Token or app password field and click Save.

Work with notebook revisions

You work with notebook revisions in the History panel. Open the History panel by clicking Revision history at the top right of the notebook.

Revision history

Note

You cannot modify a notebook while the History panel is open.

Link a notebook to GitHub

  1. Open the History panel. The Git status bar displays Git: Not linked.

    Git status bar

  2. Click Git: Not linked.

    The Git Preferences dialog displays. The first time you open your notebook, the Status is Unlink, because the notebook is not in GitHub.

    Git preferences

  3. In the Status field, click Link.

  4. In the Link field, paste the URL of the GitHub repository.

  5. Click the Branch drop-down and select a branch or type the name of a new branch.

  6. In the Path in Git Repo field, specify where in the repository to store your file.

    Python notebooks have the suggested default file extension .py. If you use .ipynb, your notebook is saved in IPython notebook format. If the file already exists on GitHub, you can directly copy and paste the URL of the file.

  7. Click Save to finish linking your notebook. If this file did not previously exist, a prompt with the option Save this file to your GitHub repo displays.

  8. Type a message and click Save.
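
The effect of the file extension in step 6 can be mirrored in a script that exports notebooks outside the UI. The sketch below is a hypothetical helper, not Databricks code. It assumes the Workspace API export formats named `SOURCE` and `JUPYTER`, and it handles only the two extensions mentioned above.

```python
from pathlib import PurePosixPath

# Assumption: these are the Workspace API export format names that correspond
# to the two extensions discussed in step 6.
FORMAT_BY_EXTENSION = {
    ".py": "SOURCE",      # plain Python source file
    ".ipynb": "JUPYTER",  # IPython/Jupyter notebook format
}


def export_format_for_path(path_in_repo):
    """Pick an export format from the extension of the path in the Git repo."""
    suffix = PurePosixPath(path_in_repo).suffix.lower()
    try:
        return FORMAT_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"no export format known for extension {suffix!r}") from None
```

For example, `export_format_for_path("notebooks/etl.py")` selects the source format, while a path that ends in `.ipynb` selects the notebook format.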

Save a notebook to GitHub

Although the changes that you make to your notebook are saved automatically to the Azure Databricks revision history, they are not automatically saved to GitHub.

  1. Open the History panel.

    History panel

  2. Click Save Now to save your notebook to GitHub. The Save Notebook Revision dialog displays.

  3. Optionally, enter a message to describe your change.

  4. Make sure that Also commit to Git is selected.

    Save revision

  5. Click Save.

Revert or update a notebook to a version from GitHub

Once you link a notebook, Azure Databricks syncs your history with Git every time you re-open the History panel. Versions that sync to Git have commit hashes as part of the entry.

  1. Open the History panel.

    History panel

  2. Choose an entry in the History panel. Azure Databricks displays that version.

  3. Click Restore this version.

  4. Click Confirm to confirm that you want to restore that version.

Unlink a notebook

  1. Open the History panel.

  2. The Git status bar displays Git: Synced.

    Git status

  3. Click Git: Synced.

    Git preferences

  4. In the Git Preferences dialog, click Unlink.

  5. Click Save.

  6. Click Confirm to confirm that you want to unlink the notebook from version control.

Branch support

You can work on any branch of your repository and create new branches inside Azure Databricks.

Create a branch

  1. Open the History panel.

  2. Click the Git status bar to open the GitHub panel.

  3. Click the Branch drop-down.

  4. Enter a branch name.

    Create branch

  5. Select the Create Branch option at the bottom of the drop-down. The parent branch is indicated. You always branch from your currently selected branch.

Create a pull request

  1. Open the History panel.

  2. Click the Git status bar to open the GitHub panel.

    Git status

  3. Click Create PR. GitHub opens to a pull request page for the branch.
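
If you need to reach the same pull request page without the UI, GitHub's compare URL is one option. The sketch below builds such a URL. It is not a description of what the Create PR link does internally, and the function name is made up for this example. The `expand=1` query parameter asks GitHub to open the pull request form directly.

```python
from urllib.parse import quote


def pull_request_url(base_owner, repo, base_branch, head_owner, head_branch):
    """Build a GitHub compare URL that opens the new-pull-request form.

    The head is written as owner:branch so that it also works when the branch
    lives in a fork of the base repository.
    """
    base = quote(base_branch, safe="/")
    head = quote(f"{head_owner}:{head_branch}", safe="/:")
    return f"https://github.com/{base_owner}/{repo}/compare/{base}...{head}?expand=1"


# Example with the repository and fork names used in the rebase example in this article.
print(pull_request_url("databricks", "reference-apps", "master", "brkyvz", "my-branch"))
```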

Rebase a branch

You can also rebase your branch inside Azure Databricks. The Rebase link displays if new commits are available in the parent branch. Only rebasing on top of the default branch of the parent repository is supported.

Rebase

For example, assume that you are working on databricks/reference-apps. You fork it into your own account (for example, brkyvz) and start working on a branch called my-branch. If a new update is pushed to databricks:master, the Rebase button displays, and you can pull the changes into your branch brkyvz:my-branch.

Rebasing works a little differently in Azure Databricks. Assume the following branch structure:

Before rebase branch structure

After a rebase, the branch structure looks like this:

After rebase branch structure

What's different here is that commits C5 and C6 are not applied on top of C4. Instead, they appear as local changes in your notebook. Any merge conflict shows up as follows:

Merge conflict

You can then commit to GitHub once again using the Save Now button.

What happens if someone branched off from my branch that I just rebased?

If your branch (for example, branch-a) was the base for another branch (branch-b), and you rebase, you need not worry. Once a user also rebases branch-b, everything will work out. The best practice in this situation is to use separate branches for separate notebooks.

Best practices for code reviews

Azure Databricks supports Git branching.

  • You can link a notebook to your own fork and choose a branch.
  • We recommend using a separate branch for each notebook.
  • Once you are happy with your changes, you can use the Create PR link in the Git Preferences dialog to go to GitHub's pull request page.
  • The Create PR link displays only if you are not working on the default branch of the parent repository.

GitHub Enterprise

Important

Integration with GitHub Enterprise Server is not supported. However, you can use the Workspace API to programmatically create notebooks and manage the code base in GitHub Enterprise Server.
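
As a minimal sketch of the programmatic approach, the following code builds a request that creates (or overwrites) a notebook from source code checked out of a Git repository. It assumes the `POST /api/2.0/workspace/import` endpoint and an Azure Databricks personal access token. The function name, host, and paths are placeholders chosen for this example.

```python
import base64
import json
import urllib.request


def build_import_request(host, token, notebook_path, source_code, language="PYTHON"):
    """Build (but do not send) a POST request that imports source code as a notebook."""
    body = {
        "path": notebook_path,
        "format": "SOURCE",
        "language": language,
        # The import endpoint expects the file content base64-encoded.
        "content": base64.b64encode(source_code.encode("utf-8")).decode("ascii"),
        "overwrite": True,
    }
    return urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/workspace/import",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` sends it. A script run from a checkout of the GitHub Enterprise Server repository could call this once per notebook file.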

Troubleshooting

If you receive errors related to syncing GitHub history, verify the following:

  1. You have initialized the repository on GitHub, and it isn't empty. Open the URL that you entered and verify that it leads to your GitHub repository.
  2. Your personal access token is active.
  3. If the repository is private, you must have at least read-level permissions (through GitHub) on the repository.
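
Some of these checks can be scripted against the GitHub REST API. The sketch below is a hedged illustration. It assumes the `GET /repos/{owner}/{repo}` endpoint, whose response for an authenticated request includes a `permissions` object, and the helper names are invented for this example. It does not check whether the repository is empty.

```python
from urllib.parse import urlparse


def parse_github_repo_url(url):
    """Extract (owner, repo) from a URL such as https://github.com/owner/repo."""
    parsed = urlparse(url)
    if parsed.netloc.lower() != "github.com":
        raise ValueError(f"not a github.com URL: {url}")
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 2:
        raise ValueError(f"URL does not name a repository: {url}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


def repo_api_url(url):
    """Return the REST API URL that describes the repository."""
    owner, repo = parse_github_repo_url(url)
    return f"https://api.github.com/repos/{owner}/{repo}"


def diagnose(status, body):
    """Map an HTTP status and decoded JSON body to a short diagnosis."""
    if status == 401:
        return "token is invalid or expired"
    if status == 404:
        return "repository not found, or the token cannot read it"
    if status == 200:
        can_read = body.get("permissions", {}).get("pull", False)
        return "ok" if can_read else "no read (pull) permission"
    return f"unexpected status {status}"
```

Request `repo_api_url(...)` with your token in the `Authorization` header, then pass the status code and decoded JSON body to `diagnose`.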