Index geospatial data with Azure Cosmos DB

Applies to: SQL API

We designed Azure Cosmos DB's database engine to be truly schema-agnostic and to provide first-class support for JSON. The write-optimized database engine of Azure Cosmos DB natively understands spatial data represented in the GeoJSON standard.

In a nutshell, the geometry is projected from geodetic coordinates onto a 2D plane and then divided progressively into cells using a quadtree. These cells are mapped to 1D based on the location of the cell within a Hilbert space-filling curve, which preserves the locality of points. Additionally, when location data is indexed, it goes through a process known as tessellation: all the cells that intersect a location are identified and stored as keys in the Azure Cosmos DB index. At query time, arguments such as Points and Polygons are also tessellated to extract the relevant cell ID ranges, which are then used to retrieve data from the index.
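As a rough illustration of the cell-to-1D mapping described above, the following Python sketch uses the textbook Hilbert curve index calculation on a fixed grid. It is not Azure Cosmos DB's actual implementation. The function names, the fixed 2^order by 2^order grid (standing in for a single quadtree level), and the use of raw longitude/latitude as the projected plane are all assumptions made for the example.

```python
def hilbert_index(order, x, y):
    """Return the position of grid cell (x, y) along a Hilbert curve.

    The grid has 2**order cells per side; x and y are integer cell coordinates.
    """
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def cell_for_point(px, py, extent, order):
    """Map a planar point to its grid cell (hypothetical helper).

    extent is (xmin, ymin, xmax, ymax) of the plane being subdivided.
    Points outside the extent are clamped to the nearest edge cell.
    """
    xmin, ymin, xmax, ymax = extent
    n = 1 << order
    col = int((px - xmin) / (xmax - xmin) * n)
    row = int((py - ymin) / (ymax - ymin) * n)
    return max(0, min(n - 1, col)), max(0, min(n - 1, row))


if __name__ == "__main__":
    # Assumption: treat longitude/latitude directly as planar x/y.
    extent = (-180.0, -90.0, 180.0, 90.0)
    for lon, lat in [(-122.12, 47.66), (-122.10, 47.67), (151.21, -33.87)]:
        cell = cell_for_point(lon, lat, extent, order=8)
        print((lon, lat), cell, hilbert_index(8, *cell))
```

Cells that are consecutive along the curve are always neighbors in the grid, which is why a range of 1D cell IDs corresponds to a compact region of the plane.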

If you specify an indexing policy that includes a spatial index for /* (all paths), then all data found within the container is indexed for efficient spatial queries.

Note

Azure Cosmos DB supports indexing of Points, LineStrings, Polygons, and MultiPolygons.

Modifying geospatial data type

In your container, the Geospatial Configuration specifies how the spatial data will be indexed. Specify one Geospatial Configuration per container: geography or geometry.

You can toggle between the geography and geometry spatial types in the Azure portal. Before switching to the geometry spatial type, make sure that you create a valid spatial geometry indexing policy with a bounding box.

Here's how to set the Geospatial Configuration in Data Explorer within the Azure portal:

[Image: Setting geospatial configuration]

You can also modify the geospatialConfig in the .NET SDK to adjust the Geospatial Configuration.

If not specified, the geospatialConfig defaults to the geography data type. When you modify the geospatialConfig, all existing geospatial data in the container is reindexed.

Here is an example that changes the geospatial data type to geometry by setting the geospatialConfig property and adding a boundingBox:

    //Retrieve the container's details
    ContainerResponse containerResponse = await client.GetContainer("db", "spatial").ReadContainerAsync();
    //Set GeospatialConfig to Geometry
    GeospatialConfig geospatialConfig = new GeospatialConfig(GeospatialType.Geometry);
    containerResponse.Resource.GeospatialConfig = geospatialConfig;
    // Add a spatial index including the required boundingBox
    SpatialPath spatialPath = new SpatialPath
    {
        Path = "/locations/*",
        BoundingBox = new BoundingBoxProperties()
        {
            Xmin = 0,
            Ymin = 0,
            Xmax = 10,
            Ymax = 10
        }
    };
    spatialPath.SpatialTypes.Add(SpatialType.Point);
    spatialPath.SpatialTypes.Add(SpatialType.LineString);
    spatialPath.SpatialTypes.Add(SpatialType.Polygon);
    spatialPath.SpatialTypes.Add(SpatialType.MultiPolygon);

    containerResponse.Resource.IndexingPolicy.SpatialIndexes.Add(spatialPath);

    // Update container with changes
    await client.GetContainer("db", "spatial").ReplaceContainerAsync(containerResponse.Resource);

Geography data indexing examples

The following JSON snippet shows an indexing policy with spatial indexing enabled for the geography data type. It is valid for spatial data with the geography data type and will index any GeoJSON Point, Polygon, MultiPolygon, or LineString found within documents for spatial querying. If you are modifying the indexing policy in the Azure portal, you can specify the following JSON as the indexing policy to enable spatial indexing on your container:

Container indexing policy JSON with geography spatial indexing

{
    "automatic": true,
    "indexingMode": "Consistent",
    "includedPaths": [
        {
            "path": "/*"
        }
    ],
    "spatialIndexes": [
        {
            "path": "/*",
            "types": [
                "Point",
                "Polygon",
                "MultiPolygon",
                "LineString"
            ]
        }
    ],
    "excludedPaths": []
}

Note

If the location GeoJSON value within the document is malformed or invalid, it is not indexed for spatial querying. You can validate location values using ST_ISVALID and ST_ISVALIDDETAILED.
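Before writing documents, it can help to catch obviously malformed values on the client. The Python sketch below is a minimal pre-check for GeoJSON Points under the geography data type only. It is not a reimplementation of ST_ISVALID: the function name is hypothetical, and the sketch assumes a two-element [longitude, latitude] position with the usual ranges of -180 to 180 and -90 to 90.

```python
def looks_like_valid_point(value):
    """Return True if value resembles a well-formed GeoJSON Point.

    Hypothetical client-side pre-check; the service-side ST_ISVALID
    function remains the authoritative test.
    """
    if not isinstance(value, dict) or value.get("type") != "Point":
        return False
    coords = value.get("coordinates")
    if not isinstance(coords, (list, tuple)) or len(coords) != 2:
        return False
    for c in coords:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(c, bool) or not isinstance(c, (int, float)):
            return False
    lon, lat = coords
    return -180 <= lon <= 180 and -90 <= lat <= 90


if __name__ == "__main__":
    print(looks_like_valid_point({"type": "Point", "coordinates": [-122.12, 47.66]}))
    # Latitude out of range, so this value would not be indexed.
    print(looks_like_valid_point({"type": "Point", "coordinates": [31.9, -132.8]}))
```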

You can also modify the indexing policy using the Azure CLI, PowerShell, or any SDK.

Geometry data indexing examples

With the geometry data type, as with the geography data type, you must specify the relevant paths and types to index. In addition, you must specify a boundingBox within the indexing policy to indicate the area to be indexed for that specific path. Each geospatial path requires its own boundingBox.

The bounding box consists of the following properties:

  • xmin: the minimum indexed x coordinate
  • ymin: the minimum indexed y coordinate
  • xmax: the maximum indexed x coordinate
  • ymax: the maximum indexed y coordinate

A bounding box is required because geometric data occupies a plane that can be infinite. Spatial indexes, however, require a finite space. For the geography data type, the Earth is the boundary and you do not need to set a bounding box.

Create a bounding box that contains all (or most) of your data. Only operations computed on objects that are entirely inside the bounding box can use the spatial index. Making the bounding box larger than necessary will negatively impact query performance.
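To check whether a given object would fall entirely inside a candidate bounding box, a small client-side helper can walk the GeoJSON coordinates. The Python sketch below is illustrative only: the helper names and the sample box are hypothetical, and treating points exactly on the boundary as inside is an assumption rather than documented behavior. Because the box is convex, a geometry with straight edges lies inside it whenever all of its vertices do.

```python
def positions(coords):
    """Yield every [x, y] position from nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from positions(part)


def entirely_inside(geometry, bbox):
    """Return True if every vertex of the geometry lies within bbox.

    bbox is a dict with xmin, ymin, xmax, and ymax keys.
    Assumption: boundary points count as inside.
    """
    return all(
        bbox["xmin"] <= x <= bbox["xmax"] and bbox["ymin"] <= y <= bbox["ymax"]
        for x, y, *_ in positions(geometry["coordinates"])
    )


if __name__ == "__main__":
    box = {"xmin": -10, "ymin": -20, "xmax": 10, "ymax": 20}  # sample values
    point = {"type": "Point", "coordinates": [5, 15]}
    polygon = {
        "type": "Polygon",
        "coordinates": [[[0, 0], [12, 0], [12, 5], [0, 5], [0, 0]]],
    }
    print(entirely_inside(point, box))    # every vertex is within the box
    print(entirely_inside(polygon, box))  # x = 12 exceeds xmax
```

Running a check like this over a sample of your data is one way to size the box so that it covers most objects without being larger than necessary.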

Here is an example indexing policy that indexes geometry data when geospatialConfig is set to geometry:

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*"
        }
    ],
    "excludedPaths": [
        {
            "path": "/\"_etag\"/?"
        }
    ],
    "spatialIndexes": [
        {
            "path": "/locations/*",
            "types": [
                "Point",
                "LineString",
                "Polygon",
                "MultiPolygon"
            ],
            "boundingBox": {
                "xmin": -10,
                "ymin": -20,
                "xmax": 10,
                "ymax": 20
            }
        }
    ]
}

The above indexing policy has a boundingBox that spans x coordinates from -10 to 10 and y coordinates from -20 to 20. A container with this indexing policy will index all Points, Polygons, MultiPolygons, and LineStrings that are entirely within this region.

Note

If you try to add an indexing policy with a boundingBox to a container with the geography data type, the operation fails. Modify the container's geospatialConfig to geometry before adding a boundingBox. You can add data and modify the remainder of your indexing policy (such as the paths and types) either before or after selecting the geospatial data type for the container.
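The rules above (every spatial path needs its own boundingBox under geometry, and no boundingBox is accepted under geography) can be checked locally before a policy is submitted. The following Python sketch is a hypothetical pre-flight check, not part of any SDK. The function name and message wording are assumptions; the policy shape follows the JSON examples in this article.

```python
REQUIRED_BOX_KEYS = ("xmin", "ymin", "xmax", "ymax")


def check_spatial_policy(geospatial_type, indexing_policy):
    """Return a list of problems found in the policy's spatialIndexes.

    geospatial_type is "geography" or "geometry".
    An empty list means none of the checked rules is violated.
    """
    problems = []
    for index in indexing_policy.get("spatialIndexes", []):
        path = index.get("path", "<missing path>")
        box = index.get("boundingBox")
        if geospatial_type == "geometry":
            if box is None:
                problems.append(f"{path}: geometry requires a boundingBox")
            else:
                missing = [k for k in REQUIRED_BOX_KEYS if k not in box]
                if missing:
                    problems.append(f"{path}: boundingBox is missing {missing}")
        elif geospatial_type == "geography" and box is not None:
            problems.append(f"{path}: boundingBox is not allowed with geography")
    return problems


if __name__ == "__main__":
    policy = {
        "spatialIndexes": [
            {
                "path": "/locations/*",
                "types": ["Point", "LineString", "Polygon", "MultiPolygon"],
                "boundingBox": {"xmin": -10, "ymin": -20, "xmax": 10, "ymax": 20},
            }
        ]
    }
    print(check_spatial_policy("geometry", policy))
    print(check_spatial_policy("geography", policy))
```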

Next steps

Now that you have learned how to get started with geospatial support in Azure Cosmos DB, next you can: