Quickstart: Set up the Data Science Virtual Machine for Linux (Ubuntu)

Get up and running with the Ubuntu 18.04 Data Science Virtual Machine (DSVM).

Prerequisites

To create an Ubuntu 18.04 Data Science Virtual Machine, you must have an Azure subscription. You can try Azure for free. Note that Azure trial accounts do not support GPU-enabled virtual machine SKUs.

Create your Data Science Virtual Machine for Linux

Follow these steps to create an instance of the Ubuntu 18.04 Data Science Virtual Machine:

  1. Go to the Azure portal. You might be prompted to sign in to your Azure account if you're not already signed in.

  2. Find the virtual machine listing by searching for "data science virtual machine" and selecting "Data Science Virtual Machine - Ubuntu 18.04".

  3. On the next window, select Create.

  4. You should be redirected to the "Create a virtual machine" blade.

  5. Enter the following information to configure each step of the wizard:

    1. Basics:

      • Subscription: If you have more than one subscription, select the one on which the machine will be created and billed. You must have resource creation privileges for this subscription.

      • Resource group: Create a new group or use an existing one.

      • Virtual machine name: Enter the name of the virtual machine. This name is used in your Azure portal.

      • Region: Select the most appropriate datacenter. For the fastest network access, choose the datacenter that holds most of your data or is closest to your physical location. Learn more about Azure Regions.

      • Image: Leave the default value.

      • Size: This option should autopopulate with a size that is appropriate for general workloads. Read more about Linux VM sizes in Azure.

      • Authentication type: For quicker setup, select "Password."

        Note

        If you intend to use JupyterHub, make sure to select "Password," because JupyterHub is not configured to use SSH public keys.

      • Username: Enter the administrator username. You'll use this username to log in to your virtual machine. It need not be the same as your Azure username. Do not use capital letters.

        Important

        If you use capital letters in your username, JupyterHub will not work, and you'll encounter a 500 internal server error.

      • Password: Enter the password you'll use to log in to your virtual machine.

    2. Select Review + create.

    3. Review + create:

      • Verify that all the information you entered is correct.
      • Select Create.

    Provisioning should take about 5 minutes. The status is displayed in the Azure portal.
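Because a capital letter in the administrator username breaks JupyterHub later, it can help to check a candidate name before filling in the wizard. The following is a minimal sketch that checks only the one rule stated above. The function name and the sample usernames are hypothetical, and Azure applies further username restrictions that are not modeled here.

```python
def is_jupyterhub_safe_username(username: str) -> bool:
    """Return True if the name is non-empty and has no uppercase letters.

    Only the rule stated in this quickstart is checked; other Azure
    username restrictions are out of scope for this sketch.
    """
    return bool(username) and not any(ch.isupper() for ch in username)


# Hypothetical candidate names, used only for illustration.
for candidate in ("dsvmadmin", "DsvmAdmin"):
    verdict = "ok" if is_jupyterhub_safe_username(candidate) else "contains capital letters"
    print(f"{candidate}: {verdict}")
```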

How to access the Ubuntu Data Science Virtual Machine

You can access the Ubuntu DSVM in one of three ways:

  • SSH for terminal sessions
  • X2Go for graphical sessions
  • JupyterHub and JupyterLab for Jupyter notebooks

You can also attach a Data Science Virtual Machine to Azure Notebooks to run Jupyter notebooks on the VM and bypass the limitations of the free service tier.

SSH

If you configured your VM with SSH authentication, you can log on to the text shell interface using the account credentials that you created in the Basics section of step 5. On Windows, you can download an SSH client tool such as PuTTY. If you prefer a graphical desktop (X Window System), you can use X11 forwarding in PuTTY.

Note

In testing, the X2Go client performed better than X11 forwarding. We recommend using the X2Go client for a graphical desktop interface.
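On a client with the OpenSSH command-line tool (an assumption; the text above names PuTTY for Windows), the SSH login can be sketched as a small helper that assembles the command. The helper name, the username "dsvmadmin", and the documentation-range address 203.0.113.10 are placeholders, not values from this quickstart. The `-X` flag is OpenSSH's option for X11 forwarding.

```python
from typing import List


def build_ssh_command(username: str, host: str,
                      x11_forwarding: bool = False, port: int = 22) -> List[str]:
    """Assemble an OpenSSH client command line for the DSVM."""
    command = ["ssh", "-p", str(port)]
    if x11_forwarding:
        # -X asks the OpenSSH client to forward X11 (graphical) traffic.
        command.append("-X")
    command.append(f"{username}@{host}")
    return command


# Placeholder username and IP address, used only for illustration.
print(" ".join(build_ssh_command("dsvmadmin", "203.0.113.10", x11_forwarding=True)))
```

The resulting list can be passed to `subprocess.run` to open the session from a script.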

X2Go

The Linux VM is already provisioned with X2Go Server and ready to accept client connections. To connect to the Linux VM graphical desktop, complete the following procedure on your client:

  1. Download and install the X2Go client for your client platform from the X2Go site.

  2. Make note of the virtual machine's public IP address, which you can find in the Azure portal by opening the virtual machine you created.

    Figure: Ubuntu virtual machine IP address

  3. Run the X2Go client. If the "New Session" window doesn't open automatically, go to Session -> New Session.

  4. In the resulting configuration window, enter the following configuration parameters:

    • Session tab:
      • Host: Enter the IP address of your VM, which you noted earlier.
      • Login: Enter the username on the Linux VM.
      • SSH Port: Leave it at 22, the default value.
      • Session Type: Change the value to XFCE. Currently, the Linux VM supports only the XFCE desktop.
    • Media tab: You can turn off sound support and client printing if you don't need them.
    • Shared folders: Use this tab to add client machine directories that you want to mount on the VM.

    Figure: X2Go configuration

  5. Select OK.

  6. Click the box in the right pane of the X2Go window to bring up the login screen for your VM.

  7. Enter the password for your VM.

  8. Select OK.

  9. You might have to allow X2Go through your firewall to finish connecting.

  10. You should now see the graphical interface for your Ubuntu DSVM.
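If the X2Go session fails to connect, one quick check is whether the VM's SSH port (22, as configured in the Session tab) is reachable from the client. The following is a minimal sketch using Python's standard `socket` module; the function name and default timeout are illustrative choices, not part of X2Go or the DSVM.

```python
import socket


def ssh_port_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `ssh_port_reachable("203.0.113.10")` would test a VM at that placeholder address. A `False` result suggests checking the IP address or any firewall between the client and the VM before troubleshooting X2Go itself.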

JupyterHub and JupyterLab

The Ubuntu DSVM runs JupyterHub, a multiuser Jupyter server. To connect, take the following steps:

  1. Make note of the public IP address for your VM by searching for and selecting your VM in the Azure portal.

    Figure: Ubuntu machine IP address

  2. From your local machine, open a web browser and navigate to https://your-vm-ip:8000, replacing "your-vm-ip" with the IP address you noted earlier.

  3. Your browser will probably block the page at first and report a certificate error, because the DSVM secures the connection with a self-signed certificate. Most browsers let you click through after this warning. Many browsers continue to show some kind of visual warning about the certificate throughout your web session.

  4. Enter the username and password that you used to create the VM, and sign in.

    Figure: Enter Jupyter login

Note

If you receive a 500 error at this stage, you likely used capital letters in your username. This is a known interaction between JupyterHub and the PAMAuthenticator it uses.

  5. Browse the many sample notebooks that are available.
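The browser steps above can also be approximated from a script, for example to confirm that JupyterHub is answering on port 8000. The following is a rough sketch using only the Python standard library. The function names are hypothetical, and certificate verification is switched off solely because the DSVM presents a self-signed certificate; that setting should not be reused for other hosts.

```python
import ssl
import urllib.error
import urllib.request


def jupyterhub_url(vm_ip: str, port: int = 8000) -> str:
    """Build the JupyterHub address described in step 2."""
    return f"https://{vm_ip}:{port}"


def unverified_tls_context() -> ssl.SSLContext:
    """TLS context that accepts the DSVM's self-signed certificate."""
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before verify_mode
    context.verify_mode = ssl.CERT_NONE
    return context


def jupyterhub_status(vm_ip: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code that JupyterHub answers with."""
    try:
        with urllib.request.urlopen(jupyterhub_url(vm_ip), timeout=timeout,
                                    context=unverified_tls_context()) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx answers still prove the server is reachable.
        return error.code
```

Calling `jupyterhub_status` with your VM's IP address returns the status code; connection failures surface as `urllib.error.URLError`.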

JupyterLab, the next generation of Jupyter notebooks and JupyterHub, is also available. To access it, sign in to JupyterHub, and then browse to the URL https://your-vm-ip:8000/user/your-username/lab, replacing "your-vm-ip" with the VM's IP address and "your-username" with the username you chose when configuring the VM. Again, a certificate error might initially block you from accessing the site.
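As an illustration of the URL pattern above, the substitution can be written as a small helper. The function name, the username "dsvmadmin", and the address 203.0.113.10 are placeholders chosen for this sketch.

```python
from urllib.parse import quote


def jupyterlab_url(vm_ip: str, username: str, port: int = 8000) -> str:
    """Fill the VM address and username into the JupyterLab URL pattern."""
    # quote() percent-encodes any characters that are unsafe in a URL path.
    return f"https://{vm_ip}:{port}/user/{quote(username, safe='')}/lab"


# Placeholder values, used only for illustration.
print(jupyterlab_url("203.0.113.10", "dsvmadmin"))
# prints https://203.0.113.10:8000/user/dsvmadmin/lab
```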

You can set JupyterLab as the default notebook server by adding this line to /etc/jupyterhub/jupyterhub_config.py:

c.Spawner.default_url = '/lab'
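If you script this change, one cautious approach is to append the line only when it is not already present, so that repeated runs do not duplicate it. The following is a minimal sketch under that assumption; the helper name is hypothetical, and writing to a file under /etc typically requires root privileges.

```python
from pathlib import Path

DEFAULT_URL_LINE = "c.Spawner.default_url = '/lab'"


def ensure_jupyterlab_default(config_path: str) -> bool:
    """Append the default_url line if absent; return True if the file changed."""
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    if any(line.strip() == DEFAULT_URL_LINE for line in text.splitlines()):
        return False  # already configured, leave the file untouched
    # Make sure the new setting starts on its own line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    path.write_text(text + separator + DEFAULT_URL_LINE + "\n")
    return True
```

On the DSVM, the call would be `ensure_jupyterlab_default("/etc/jupyterhub/jupyterhub_config.py")`.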

Next steps

Here's how you can continue your learning and exploration:

  • The walkthrough "Data science on the Data Science Virtual Machine for Linux" shows you how to do several common data science tasks with the Linux DSVM provisioned here.
  • Explore the various data science tools on the DSVM by trying out the tools described in this article. You can also run dsvm-more-info in the shell on the virtual machine for a basic introduction and pointers to more information about the tools installed on the VM.
  • Learn how to systematically build analytical solutions using the Team Data Science Process.
  • Visit the Azure AI Gallery for machine learning and data analytics samples that use the Azure AI services.
  • Consult the appropriate reference documentation for this virtual machine.