H3 快速入门 (Databricks SQL)

本页上的 H3 地理空间函数快速入门演示了以下内容:

  • 如何将地理位置数据集加载到 Unity Catalog 中。
  • 如何将纬度和经度列转换为 H3 单元格列。
  • 如何将邮政编码多边形或多多边形 WKT 列转换为 H3 单元格列。
  • 如何查询从拉瓜迪亚机场到曼哈顿金融区的上车和下车分析。
  • 如何在地图上呈现 H3 聚合计数。

示例笔记本和查询

准备 Unity Catalog 数据

在此笔记本中,我们执行以下操作:

  • 从 Databricks 文件系统设置公共出租车数据集。
  • 设置 NYC 邮政编码数据集。

准备 Unity Catalog 数据

获取笔记本

Databricks SQL 查询与 Databricks Runtime 11.3 LTS 及更高版本

查询 1:验证是否已设置基础数据。 请参阅笔记本

use catalog geospatial_docs;
use database nyc_taxi;
show tables;
-- Verify initial data is setup (see instructions in setup notebook)
-- select format_number(count(*),0) as count from yellow_trip;
-- select * from nyc_zipcode;

查询 2:H3 NYC 邮政编码 - 在分辨率 12 处应用 h3_polyfillash3

use catalog geospatial_docs;
use database nyc_taxi;
-- drop table if exists nyc_zipcode_h3_12;
create table if not exists nyc_zipcode_h3_12 as (
  select
    explode(h3_polyfillash3(geom_wkt, 12)) as cell,
    zipcode,
    po_name,
    county
  from
    nyc_zipcode
);
-- optional: zorder by `cell`
optimize nyc_zipcode_h3_12 zorder by (cell);
select
  *
from
  nyc_zipcode_h3_12;

查询 3:H3 出租车行程 - 在分辨率 12 处应用 h3_longlatash3

use catalog geospatial_docs;
use database nyc_taxi;
-- drop table if exists yellow_trip_h3_12;
create table if not exists yellow_trip_h3_12 as (
  select
    h3_longlatash3(pickup_longitude, pickup_latitude, 12) as pickup_cell,
    h3_longlatash3(dropoff_longitude, dropoff_latitude, 12) as dropoff_cell,
    *
  except
    (
      rate_code_id,
      store_and_fwd_flag
    )
  from
    yellow_trip
);
-- optional: zorder by `pickup_cell`
-- optimize yellow_trip_h3_12 zorder by (pickup_cell);
select
  *
from
  yellow_trip_h3_12
 where pickup_cell is not null;

查询 4:H3 LGA 上车人次 - 从拉瓜迪亚 (LGA) 上车 2500 万人次

use catalog geospatial_docs;
use database nyc_taxi;
create
or replace view lga_pickup_h3_12 as (
  select
    t.*
  except(cell),
    s.*
  from
    yellow_trip_h3_12 as s
    inner join nyc_zipcode_h3_12 as t on s.pickup_cell = t.cell
  where
    t.zipcode = '11371'
);
select
  format_number(count(*), 0) as count
from
  lga_pickup_h3_12;
-- select
  -- *
  -- from
  -- lga_pickup_h3_12;

查询 5:H3 金融区下车人次 - 金融区的总下车人次为 3400 万

use catalog geospatial_docs;
use database nyc_taxi;
create
or replace view fd_dropoff_h3_12 as (
  select
    t.*
  except(cell),
    s.*
  from
    yellow_trip_h3_12 as s
    inner join nyc_zipcode_h3_12 as t on s.dropoff_cell = t.cell
  where
    t.zipcode in ('10004', '10005', '10006', '10007', '10038')
);
select
  format_number(count(*), 0) as count
from
  fd_dropoff_h3_12;
-- select * from fd_dropoff_h3_12;

查询 6:H3 LGA-FD - 金融区中从 LGA 上车的下车人次为 82.7 万

use catalog geospatial_docs;
use database nyc_taxi;
create
or replace view lga_fd_dropoff_h3_12 as (
  select
    *
  from
    fd_dropoff_h3_12
  where
    pickup_cell in (
      select
        distinct pickup_cell
      from
        lga_pickup_h3_12
    )
);
select
  format_number(count(*), 0) as count
from
  lga_fd_dropoff_h3_12;
-- select * from lga_fd_dropoff_h3_12;

查询 7:按邮政编码计算的 LGA-FD - 按邮政编码和条形图计算金融区下车人次

use catalog geospatial_docs;
use database nyc_taxi;
select
  zipcode,
  count(*) as count
from
  lga_fd_dropoff_h3_12
group by
  zipcode
order by
  zipcode;

查询 8:按 H3 计算的 LGA-FD - 按 H3 单元格和地图标记可视化效果计算 FD 下车人次

use catalog geospatial_docs;
use database nyc_taxi;
select
  zipcode,
  dropoff_cell,
  h3_centerasgeojson(dropoff_cell) :coordinates [0] as dropoff_centroid_x,
  h3_centerasgeojson(dropoff_cell) :coordinates [1] as dropoff_centroid_y,
  format_number(count(*), 0) as count_disp,
  count(*) as `count`
from
  lga_fd_dropoff_h3_12
group by
  zipcode,
  dropoff_cell
order by
  zipcode,
  `count` DESC;

LGA-FD H3 Counts 1

LGA-FD H3 Counts 2

Databricks Runtime 11.2 及更高版本的笔记本

快速入门-Python:H3 NYC 出租车(拉瓜迪亚到曼哈顿)

获取笔记本

与 Databricks SQL 中的快速入门结构相同的快速入门结构,在 Notebooks + kepler.gl 中使用 Spark Python 绑定。

快速入门-Scala:H3 NYC 出租车(拉瓜迪亚到曼哈顿)

获取笔记本

与 Databricks SQL 中的快速入门结构相同的快速入门结构,通过 Python 单元格在 Notebooks + kepler.gl 中使用 Spark Scala 绑定。

快速入门-SQL:H3 NYC 出租车(拉瓜迪亚到曼哈顿)

获取笔记本

与 Databricks SQL 中的快速入门结构相同的快速入门结构,通过 Python 单元格在 Notebooks + kepler.gl 中使用 Spark SQL 绑定。