快速入门:运行查询并将其可视化Quickstart: Run and visualize a query

本快速入门介绍如何基于 Databricks 数据集创建一个包含 1000 万人的表,查询该表来查找名为 Mary 的按出生年份分组的女性人数,并直观显示结果。This quickstart shows how to create a table of 10 million people from a Databricks dataset, query the table to find the number of women named Mary grouped by birth year, and visualize the result.

要求Requirements

SQL Analytics 管理员必须完成 SQL Analytics 管理员快速入门Your SQL Analytics admin must complete the SQL Analytics admin quickstart.

步骤 1:创建一个包含 1000 万人的表Step 1: Create a table of 10 million people

  1. 登录 SQL Analytics。Log in to SQL Analytics.

    如果既可访问 Azure Databricks 工作区又可访问 SQL Analytics,则可能需要从工作区切换到 SQL Analytics。If you have access to both Azure Databricks Workspace and SQL Analytics, you might need to switch from Workspace to SQL Analytics. 在边栏底部,单击At the bottom of the sidebar, click the 应用切换器图标 应用切换器图标,然后选择app switcher icon and select 应用切换器 - SQL Analytics

  2. 单击Click the 边栏中的 “模型”图标。icon in the sidebar.

  3. 单击“+新建查询”。Click + New Query.

    此时会显示查询编辑器。The query editor displays.

  4. 在“新建查询”下面的框中,单击 向下箭头图标 图标,然后选择“QS 终结点” 。In the box below New Query, click the Down Arrow  Icon icon and select QS Endpoint.

    查询编辑器Query editor

    第一次创建查询时,可用 SQL 终结点的列表按字母顺序显示。The first time you create a query the list of available SQL endpoints displays in alphabetical order. 后面创建查询时,将选择上次使用的终结点。The next time you create a query, the last used endpoint is selected.

  5. 在终结点下面的框中,单击 向下箭头图标 图标;如果未选中,请选择默认数据库。In the box below the endpoint, click the Down Arrow  Icon icon and if not selected, select the default database.

    默认数据库Default database

  6. 将以下内容粘贴到查询编辑器中:Paste the following in the query editor:

     CREATE TABLE default.people10m OPTIONS (PATH 'dbfs:/databricks-datasets/learning-spark-v2/people/people-10m.delta')
    

    此语句使用存储在 Databricks 数据集中的 Delta Lake 文件创建增量表。This statement creates a Delta table using a Delta Lake file stored in Databricks datasets.

  7. 按 Ctrl/Cmd + Enter 或单击“执行”按钮 。Press Ctrl/Cmd + Enter or click the Execute button. 此查询将返回 No data was returned.The query will return No data was returned.

  8. 若要刷新架构,请单击架构浏览器底部的To refresh the schema, click the 刷新架构 按钮。button at the bottom of the schema browser.

  9. 在数据库右侧的文本框中键入 peoType peo in the text box to the right of the database. 架构浏览器将显示新表。The schema browser displays the new table.

    架构浏览器Schema browser

步骤 2:查询该表Step 2: Query the table

  1. 粘贴查询名为 Mary 的女性人数的 SELECT 语句:Paste in a SELECT statement that queries the number of women named Mary:

     SELECT year(birthDate) as birthYear, count(*) AS total
     FROM default.people10m
     WHERE firstName = 'Mary' AND gender = 'F'
     GROUP BY birthYear
     ORDER BY birthYear
    
  2. 按 Ctrl/Cmd + Enter 或单击“执行”按钮 。Press Ctrl/Cmd + Enter or click the Execute button.

    执行查询Execute query

    “限制 1000”复选框已默认选中,以确保查询最多返回 1000 行。The Limit 1000 checkbox is selected by default to ensure that the query returns at most 1000 rows. 如果需要更多行,可取消选中此复选框,并在查询中指定 LIMIT 子句。If you want more rows, you can unselect the checkbox and specify a LIMIT clause in your query. 查询结果将显示在“表”选项卡中。The query result displays in the Table tab.

    查询结果Query result

步骤 3:创建可视化效果Step 3: Create a visualization

  1. 单击“+添加可视化效果”选项卡。Click the + Add Visualization tab.

    这会显示可视化效果编辑器。The visualization editor displays.

    可视化效果编辑器Visualization editor

  2. 在“X 列”下拉列表中,选择“出生年” 。In the X Column drop-down, select birthYear.

  3. 在“Y 列”下拉列表中,选择“总计” 。In the Y Column drop-down, select total.

  4. 单击“X 轴”选项卡。Click the X Axis tab.

  5. 在“名称”字段中,输入 Birth YearIn the Name field, enter Birth Year.

  6. 单击“Y 轴”选项卡。Click the Y Axis tab.

  7. 在“名称”字段中,输入 Number of MarysIn the Name field, enter Number of Marys.

  8. 单击“保存” 。Click Save.

    保存的图表显示在查询编辑器中。The saved chart displays in the query editor.

    Mary 人数图Marys chart