使用 Azure 认知搜索索引器连接到 Azure SQL 内容并为其编制索引Connect to and index Azure SQL content using an Azure Cognitive Search indexer

必须先使用数据填充 Azure 认知搜索索引,然后才能对其进行查询。Before you can query an Azure Cognitive Search index, you must populate it with your data. 如果数据驻留在 Azure SQL 数据库或 SQL 托管实例中,则 Azure SQL 数据库的 Azure 认知搜索索引器(简称“Azure SQL 索引器”)可自动执行索引编制过程,这意味着需要编写的代码更少且需要考虑的基础结构更少。If the data lives in Azure SQL Database or SQL Managed Instance, an Azure Cognitive Search indexer for Azure SQL Database (or Azure SQL indexer for short) can automate the indexing process, which means less code to write and less infrastructure to care about.

本文不但介绍了使用索引器的机制,而且还介绍了仅适用于 Azure SQL 数据库或 SQL 托管实例的功能(例如,集成的更改跟踪)。This article covers the mechanics of using indexers, but also describes features only available with Azure SQL Database or SQL Managed Instance (for example, integrated change tracking).

除了 Azure SQL 数据库和 SQL 托管实例之外,Azure 认知搜索还针对 Azure Cosmos DBAzure Blob 存储Azure 表存储提供了索引器。In addition to Azure SQL Database and SQL Managed Instance, Azure Cognitive Search provides indexers for Azure Cosmos DB, Azure Blob storage, and Azure table storage. 若要请求对其他数据源的支持,请在 Azure 认知搜索反馈论坛上提供反馈。To request support for other data sources, provide your feedback on the Azure Cognitive Search feedback forum.

索引器和数据源Indexers and data sources

数据源指定要编制索引的数据、用于访问数据的凭据和有效标识数据更改(新行、修改的行或删除的行)的策略。A data source specifies which data to index, credentials for data access, and policies that efficiently identify changes in the data (new, modified, or deleted rows). 它定义为独立的资源,以便可供多个索引器使用。It's defined as an independent resource so that it can be used by multiple indexers.

索引器是将单个数据源连接到目标搜索索引的资源。An indexer is a resource that connects a single data source with a targeted search index. 可通过以下方式使用索引器:An indexer is used in the following ways:

  • 执行数据的一次性复制以填充索引。Perform a one-time copy of the data to populate an index.
  • 按计划使用数据源中的更改更新索引。Update an index with changes in the data source on a schedule.
  • 按需运行以根据需要更新索引。Run on-demand to update an index as needed.

单个索引器只能使用一个表或视图,但是,如果希望填充多个搜索索引,可以创建多个索引器。A single indexer can only consume one table or view, but you can create multiple indexers if you want to populate multiple search indexes. 有关概念的详细信息,请参阅索引器操作:典型工作流For more information on concepts, see Indexer Operations: Typical workflow.

可使用以下内容设置和配置 Azure SQL 索引器:You can set up and configure an Azure SQL indexer using:

在本文中,我们将使用 REST API 创建索引器数据源In this article, we'll use the REST API to create indexers and data sources.

何时使用 Azure SQL 索引器When to use Azure SQL Indexer

根据与数据相关的多个因素,可能适合也可能不适合使用 Azure SQL 索引器。Depending on several factors relating to your data, the use of Azure SQL indexer may or may not be appropriate. 如果数据符合以下要求,可以使用 Azure SQL 索引器。If your data fits the following requirements, you can use Azure SQL indexer.

条件Criteria 详细信息Details
数据来自单个表或视图Data originates from a single table or view 如果数据分散在多个表中,可以创建数据的单一视图。If the data is scattered across multiple tables, you can create a single view of the data. 但是,如果使用视图,则无法使用 SQL Server 集成的更改检测来使用增量更改刷新索引。However, if you use a view, you won’t be able to use SQL Server integrated change detection to refresh an index with incremental changes. 有关详细信息,请参阅下文中的捕获更改和删除的行For more information, see Capturing Changed and Deleted Rows below.
数据类型是兼容的Data types are compatible Azure 认知搜索索引中支持大多数但并非全部 SQL 类型。Most but not all the SQL types are supported in an Azure Cognitive Search index. 有关列表,请参阅映射数据类型For a list, see Mapping data types.
不需要进行实时数据同步Real-time data synchronization is not required 索引器最多每五分钟可以为表重新编制索引。An indexer can reindex your table at most every five minutes. 如果数据频繁更改并且所做更改需要在数秒或数分钟内反映在索引中,建议使用 REST API.NET SDK 来直接推送更新的行。If your data changes frequently, and the changes need to be reflected in the index within seconds or single minutes, we recommend using the REST API or .NET SDK to push updated rows directly.
可以进行增量索引编制Incremental indexing is possible 如果具有大型数据集并打算按计划运行索引器,则 Azure 认知搜索必须能够有效地标识新的、更改的或删除的行。If you have a large data set and plan to run the indexer on a schedule, Azure Cognitive Search must be able to efficiently identify new, changed, or deleted rows. 只有按需(而非按计划)编制索引时或者为少于 100,000 行的数据编制索引时,才允许非增量索引编制。Non-incremental indexing is only allowed if you're indexing on demand (not on schedule), or indexing fewer than 100,000 rows. 有关详细信息,请参阅下文中的捕获更改和删除的行For more information, see Capturing Changed and Deleted Rows below.

备注

Azure 认知搜索仅支持 SQL Server 身份验证。Azure Cognitive Search supports SQL Server authentication only. 如果需要支持 Azure Active Directory 密码身份验证,请为此 UserVoice 建议投票。If you require support for Azure Active Directory Password authentication, please vote for this UserVoice suggestion.

创建 Azure SQL 索引器Create an Azure SQL Indexer

  1. 创建数据源:Create the data source:

     POST https://myservice.search.azure.cn/datasources?api-version=2020-06-30
     Content-Type: application/json
     api-key: admin-key
    
     {
         "name" : "myazuresqldatasource",
         "type" : "azuresql",
         "credentials" : { "connectionString" : "Server=tcp:<your server>.database.chinacloudapi.cn,1433;Database=<your database>;User ID=<your user name>;Password=<your password>;Trusted_Connection=False;Encrypt=True;Connection Timeout=30;" },
         "container" : { "name" : "name of the table or view that you want to index" }
     }
    

    可从 Azure 门户获取连接字符串;请使用 ADO.NET connection string 选项。You can get the connection string from the Azure portal; use the ADO.NET connection string option.

  2. 创建目标 Azure 认知搜索索引(如果还没有)。Create the target Azure Cognitive Search index if you don't have one already. 可以使用门户创建索引 API 创建索引。You can create an index using the portal or the Create Index API. 确保目标索引的架构与源表的架构兼容 - 请参阅 SQL 和 Azure 认知搜索数据类型之间的映射Ensure that the schema of your target index is compatible with the schema of the source table - see mapping between SQL and Azure Cognitive search data types.

  3. 通过为索引器命名并引用数据源和目标索引创建索引器:Create the indexer by giving it a name and referencing the data source and target index:

     POST https://myservice.search.azure.cn/indexers?api-version=2020-06-30
     Content-Type: application/json
     api-key: admin-key
    
     {
         "name" : "myindexer",
         "dataSourceName" : "myazuresqldatasource",
         "targetIndexName" : "target index name"
     }
    

通过此方式创建的索引器不包含计划。An indexer created in this way doesn’t have a schedule. 它会在创建后自动运行一次。It automatically runs once when it’s created. 可使用运行索引器请求随时再次运行:You can run it again at any time using a run indexer request:

    POST https://myservice.search.azure.cn/indexers/myindexer/run?api-version=2020-06-30
    api-key: admin-key

可自定义索引器行为的几个方面,例如批大小和可在索引器执行失败前跳过的文档数。You can customize several aspects of indexer behavior, such as batch size and how many documents can be skipped before an indexer execution fails. 有关详细信息,请参阅创建索引器 APIFor more information, see Create Indexer API.

可能需要允许 Azure 服务连接到数据库。You may need to allow Azure services to connect to your database. 有关如何执行该操作的说明,请参阅从 Azure 连接See Connecting From Azure for instructions on how to do that.

若要监视索引器状态和执行历史记录(已编制索引的项目数、失败数等),请使用索引器状态请求:To monitor the indexer status and execution history (number of items indexed, failures, etc.), use an indexer status request:

    GET https://myservice.search.azure.cn/indexers/myindexer/status?api-version=2020-06-30
    api-key: admin-key

响应应类似于以下形式:The response should look similar to the following:

    {
        "\@odata.context":"https://myservice.search.azure.cn/$metadata#Microsoft.Azure.Search.V2015_02_28.IndexerExecutionInfo",
        "status":"running",
        "lastResult": {
            "status":"success",
            "errorMessage":null,
            "startTime":"2015-02-21T00:23:24.957Z",
            "endTime":"2015-02-21T00:36:47.752Z",
            "errors":[],
            "itemsProcessed":1599501,
            "itemsFailed":0,
            "initialTrackingState":null,
            "finalTrackingState":null
        },
        "executionHistory":
        [
            {
                "status":"success",
                "errorMessage":null,
                "startTime":"2015-02-21T00:23:24.957Z",
                "endTime":"2015-02-21T00:36:47.752Z",
                "errors":[],
                "itemsProcessed":1599501,
                "itemsFailed":0,
                "initialTrackingState":null,
                "finalTrackingState":null
            },
            ... earlier history items
        ]
    }

执行历史记录包含最多 50 个最近完成的执行,它们按反向时间顺序排序(以便最新执行出现在响应中的第一个)。Execution history contains up to 50 of the most recently completed executions, which are sorted in the reverse chronological order (so that the latest execution comes first in the response). 有关响应的其他信息可在获取索引器状态中找到Additional information about the response can be found in Get Indexer Status

按计划运行索引器Run indexers on a schedule

还可以排列索引器,以按计划定期运行。You can also arrange the indexer to run periodically on a schedule. 若要执行此操作,在创建或更新索引器时添加计划属性。To do this, add the schedule property when creating or updating the indexer. 下面的示例显示了用于更新索引器的 PUT 请求:The example below shows a PUT request to update the indexer:

    PUT https://myservice.search.azure.cn/indexers/myindexer?api-version=2020-06-30
    Content-Type: application/json
    api-key: admin-key

    {
        "dataSourceName" : "myazuresqldatasource",
        "targetIndexName" : "target index name",
        "schedule" : { "interval" : "PT10M", "startTime" : "2015-01-01T00:00:00Z" }
    }

间隔参数是必需的。The interval parameter is required. 间隔是指开始两个连续的索引器执行之间的时间。The interval refers to the time between the start of two consecutive indexer executions. 允许的最小间隔为 5 分钟;最长为一天。The smallest allowed interval is 5 minutes; the longest is one day. 必须将其格式化为 XSD“dayTimeDuration”值(ISO 8601 持续时间值的受限子集)。It must be formatted as an XSD "dayTimeDuration" value (a restricted subset of an ISO 8601 duration value). 它的模式为: P(nD)(T(nH)(nM))The pattern for this is: P(nD)(T(nH)(nM)). 示例:PT15M 为每隔 15 分钟,PT2H 为每隔 2 小时。Examples: PT15M for every 15 minutes, PT2H for every 2 hours.

若要详细了解如何定义索引器计划,请参阅如何为 Azure 认知搜索计划索引器For more information about defining indexer schedules see How to schedule indexers for Azure Cognitive Search.

捕获新的、更改的和删除的行Capture new, changed, and deleted rows

Azure 认知搜索使用 “增量索引编制”来避免索引器每次运行时都必须为整个表或视图重新编制索引。Azure Cognitive Search uses incremental indexing to avoid having to reindex the entire table or view every time an indexer runs. Azure 认知搜索提供了两个更改检测策略来支持增量索引编制。Azure Cognitive Search provides two change detection policies to support incremental indexing.

SQL 集成的更改跟踪策略SQL Integrated Change Tracking Policy

如果 SQL 数据库支持更改跟踪,我们建议使用 SQL 集成的更改跟踪策略If your SQL database supports change tracking, we recommend using SQL Integrated Change Tracking Policy. 这是最有效的策略。This is the most efficient policy. 此外,它允许 Azure 认知搜索标识删除的行,无需向表中添加显式“软删除”列。In addition, it allows Azure Cognitive Search to identify deleted rows without you having to add an explicit "soft delete" column to your table.

要求Requirements

  • 数据库版本要求:Database version requirements:
    • SQL Server 2012 SP3 及更高版本,如果使用的是 Azure VM 上的 SQL Server。SQL Server 2012 SP3 and later, if you're using SQL Server on Azure VMs.
    • Azure SQL 数据库或 SQL 托管实例。Azure SQL Database or SQL Managed Instance.
  • 只有表(无视图)。Tables only (no views).
  • 在数据库上,为表启用更改跟踪On the database, enable change tracking for the table.
  • 表上没有组合主键(包含多个列的主键)。No composite primary key (a primary key containing more than one column) on the table.

使用情况Usage

若要使用此策略,按如下所示创建或更新数据源:To use this policy, create or update your data source like this:

    {
        "name" : "myazuresqldatasource",
        "type" : "azuresql",
        "credentials" : { "connectionString" : "connection string" },
        "container" : { "name" : "table or view name" },
        "dataChangeDetectionPolicy" : {
           "@odata.type" : "#Microsoft.Azure.Search.SqlIntegratedChangeTrackingPolicy"
      }
    }

当使用 SQL 集成的更改跟踪策略时,不指定单独的数据删除检测策略 - 此策略具有对标识删除的行的内置支持。When using SQL integrated change tracking policy, do not specify a separate data deletion detection policy - this policy has built-in support for identifying deleted rows. 但是,对于要“自动”检测的删除项,搜索索引中的文档键必须与 SQL 表中的主键相同。However, for the deletes to be detected "automagically", the document key in your search index must be the same as the primary key in the SQL table.

备注

使用 TRUNCATE TABLE 从 SQL 表中删除大量行时,需要重置索引器才能重置更改跟踪状态,从而选取行删除项。When using TRUNCATE TABLE to remove a large number of rows from a SQL table, the indexer needs to be reset to reset the change tracking state to pick up row deletions.

高使用标记更改检测策略High Water Mark Change Detection policy

此更改检测策略依赖于对版本或行的上次更新时间进行捕获的一个“高使用标记”列。This change detection policy relies on a "high water mark" column capturing the version or time when a row was last updated. 如果在使用视图,则必须使用高使用标记策略。If you're using a view, you must use a high water mark policy. 高使用标记列必须满足以下要求。The high water mark column must meet the following requirements.

要求Requirements

  • 所有插入都为列指定一个值。All inserts specify a value for the column.
  • 对某个项目的所有更新也会更改该列的值。All updates to an item also change the value of the column.
  • 此列的值随每次插入或更新而增加。The value of this column increases with each insert or update.
  • 具有以下 WHERE 和 ORDER BY 子句的查询可以高效执行:WHERE [High Water Mark Column] > [Current High Water Mark Value] ORDER BY [High Water Mark Column]Queries with the following WHERE and ORDER BY clauses can be executed efficiently: WHERE [High Water Mark Column] > [Current High Water Mark Value] ORDER BY [High Water Mark Column]

重要

强烈建议为高使用标记列使用 rowversion 数据类型。We strongly recommend using the rowversion data type for the high water mark column. 如果使用其他任何数据类型,则当存在与索引器查询并发执行的事务时,不能保证更改跟踪捕获所有更改。If any other data type is used, change tracking is not guaranteed to capture all changes in the presence of transactions executing concurrently with an indexer query. 在具有只读副本的配置中使用 rowversion 时,必须将索引器指向主副本。When using rowversion in a configuration with read-only replicas, you must point the indexer at the primary replica. 只有主副本可以用于数据同步方案。Only a primary replica can be used for data sync scenarios.

使用情况Usage

若要使用高使用标记策略,请按如下所示创建或更新数据源:To use a high water mark policy, create or update your data source like this:

    {
        "name" : "myazuresqldatasource",
        "type" : "azuresql",
        "credentials" : { "connectionString" : "connection string" },
        "container" : { "name" : "table or view name" },
        "dataChangeDetectionPolicy" : {
           "@odata.type" : "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
           "highWaterMarkColumnName" : "[a rowversion or last_updated column name]"
      }
    }

警告

如果源表在高使用标记列上没有索引,SQL 索引器使用的查询可能会超时。特别是,当表中包含多个行时,ORDER BY [High Water Mark Column] 子句需要索引才能有效运行。If the source table does not have an index on the high water mark column, queries used by the SQL indexer may time out. In particular, the ORDER BY [High Water Mark Column] clause requires an index to run efficiently when the table contains many rows.

convertHighWaterMarkToRowVersionconvertHighWaterMarkToRowVersion

如果对高使用标记列使用 rowversion 数据类型,请考虑使用 convertHighWaterMarkToRowVersion 索引器配置设置。If you're using a rowversion data type for the high water mark column, consider using the convertHighWaterMarkToRowVersion indexer configuration setting. convertHighWaterMarkToRowVersion 执行两项操作:convertHighWaterMarkToRowVersion does two things:

  • 在索引器 sql 查询中,对高使用标记列使用 rowversion 数据类型。Use the rowversion data type for the high water mark column in the indexer sql query. 使用正确的数据类型可提高索引器查询性能。Using the correct data type improves indexer query performance.
  • 在索引器查询运行之前从 rowversion 值中减去 1。Subtract 1 from the rowversion value before the indexer query runs. 具有一对多联接的视图可能包含具有重复 rowversion 值的行。Views with 1 to many joins may have rows with duplicate rowversion values. 减 1 可确保索引器查询不会错过这些行。Subtracting 1 ensures the indexer query doesn't miss these rows.

若要启用此功能,请使用以下配置创建或更新索引器:To enable this feature, create or update the indexer with the following configuration:

    {
      ... other indexer definition properties
     "parameters" : {
            "configuration" : { "convertHighWaterMarkToRowVersion" : true } }
    }

queryTimeoutqueryTimeout

如果遇到超时错误,可以使用 queryTimeout 索引器配置设置,查询超时值设置为高于 5 分钟超时的默认值。If you encounter timeout errors, you can use the queryTimeout indexer configuration setting to set the query timeout to a value higher than the default 5-minute timeout. 例如,要将超时设置为 10 分钟,请使用以下配置创建或更新索引器:For example, to set the timeout to 10 minutes, create or update the indexer with the following configuration:

    {
      ... other indexer definition properties
     "parameters" : {
            "configuration" : { "queryTimeout" : "00:10:00" } }
    }

disableOrderByHighWaterMarkColumndisableOrderByHighWaterMarkColumn

也可以禁用 ORDER BY [High Water Mark Column] 子句。You can also disable the ORDER BY [High Water Mark Column] clause. 但是,不建议这样做,因为如果索引器执行由于错误而中断,索引器在将来运行时必须重新处理所有行,即使索引器在中断时就处理了几乎所有行也是如此。However, this is not recommended because if the indexer execution is interrupted by an error, the indexer has to re-process all rows if it runs later - even if the indexer has already processed almost all the rows by the time it was interrupted. 若要禁用 ORDER BY 子句,使用索引器定义中的 disableOrderByHighWaterMarkColumn 设置:To disable the ORDER BY clause, use the disableOrderByHighWaterMarkColumn setting in the indexer definition:

    {
     ... other indexer definition properties
     "parameters" : {
            "configuration" : { "disableOrderByHighWaterMarkColumn" : true } }
    }

软删除列删除检测策略Soft Delete Column Deletion Detection policy

从源表中删除行时,可能还希望从搜索索引中删除这些行。When rows are deleted from the source table, you probably want to delete those rows from the search index as well. 如果使用 SQL 集成的更改跟踪策略,此操作会自动完成。If you use the SQL integrated change tracking policy, this is taken care of for you. 但是,高使用标记更改跟踪策略不会帮助你处理删除的行。However, the high water mark change tracking policy doesn’t help you with deleted rows. 怎么办?What to do?

如果以物理方式从表中删除行,Azure 认知搜索无法推断出不再存在的记录是否存在。If the rows are physically removed from the table, Azure Cognitive Search has no way to infer the presence of records that no longer exist. 但是,可使用“软删除”技术以逻辑方式删除行,无需从表中删除它们。However, you can use the “soft-delete” technique to logically delete rows without removing them from the table. 将列添加到表或视图,并使用该列将行标记为已删除。Add a column to your table or view and mark rows as deleted using that column.

使用软删除技术时,可在创建或更新数据源时,按如下方式指定软删除策略:When using the soft-delete technique, you can specify the soft delete policy as follows when creating or updating the data source:

    {
        …,
        "dataDeletionDetectionPolicy" : {
           "@odata.type" : "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
           "softDeleteColumnName" : "[a column name]",
           "softDeleteMarkerValue" : "[the value that indicates that a row is deleted]"
        }
    }

SoftDeleteMarkerValue 必须是字符串 - 使用实际值的字符串表示形式。The softDeleteMarkerValue must be a string – use the string representation of your actual value. 例如,如果有一个整数列(使用值 1 标记删除的行),则使用 "1"For example, if you have an integer column where deleted rows are marked with the value 1, use "1". 如果有一个 BIT 列(使用布尔值 true 标记删除的行),请使用字符串文本 Truetrue(不区分大小写)。If you have a BIT column where deleted rows are marked with the Boolean true value, use the string literal True or true, the case doesn't matter.

SQL 数据类型与 Azure 认知搜索数据类型之间的映射Mapping between SQL and Azure Cognitive Search data types

SQL 数据类型SQL data type 允许的目标索引字段类型Allowed target index field types 注释Notes
bitbit Edm.Boolean、Edm.StringEdm.Boolean, Edm.String
int、smallint、tinyintint, smallint, tinyint Edm.Int32、Edm.Int64、Edm.StringEdm.Int32, Edm.Int64, Edm.String
bigintbigint Edm.Int64、Edm.StringEdm.Int64, Edm.String
real、floatreal, float Edm.Double、Edm.StringEdm.Double, Edm.String
smallmoney、money decimal numericsmallmoney, money decimal numeric Edm.StringEdm.String Azure 认知搜索不支持将十进制类型转换为 Edm.Double,因为这可能会损失精度Azure Cognitive Search does not support converting decimal types into Edm.Double because this would lose precision
char、nchar、varchar、nvarcharchar, nchar, varchar, nvarchar Edm.StringEdm.String
集合 (Edm.String)Collection(Edm.String)
如果 SQL 字符串表示 JSON 字符串数组,该字符串可用于填充 Collection(Edm.String) 字段:["red", "white", "blue"]A SQL string can be used to populate a Collection(Edm.String) field if the string represents a JSON array of strings: ["red", "white", "blue"]
smalldatetime、datetime、datetime2、date、datetimeoffsetsmalldatetime, datetime, datetime2, date, datetimeoffset Edm.DateTimeOffset、Edm.StringEdm.DateTimeOffset, Edm.String
uniqueidentiferuniqueidentifer Edm.StringEdm.String
geographygeography Edm.GeographyPointEdm.GeographyPoint 仅支持具有 SRID 4326(这是默认值)的类型 POINT 的地理实例Only geography instances of type POINT with SRID 4326 (which is the default) are supported
rowversionrowversion 空值N/A 行版本列不能存储在搜索索引中,但可用于更改跟踪Row-version columns cannot be stored in the search index, but they can be used for change tracking
time、timespan、binary、varbinary、image、xml、geometry、CLR 类型time, timespan, binary, varbinary, image, xml, geometry, CLR types 空值N/A 不支持Not supported

配置设置Configuration Settings

SQL 索引器公开多个配置设置:SQL indexer exposes several configuration settings:

设置Setting 数据类型Data type 目的Purpose 默认值Default value
queryTimeoutqueryTimeout stringstring 设置 SQL 查询执行的超时Sets the timeout for SQL query execution 5 分钟(“00:05:00”)5 minutes ("00:05:00")
disableOrderByHighWaterMarkColumndisableOrderByHighWaterMarkColumn boolbool 导致高使用标记策略使用的 SQL 查询省略 ORDER BY 子句。Causes the SQL query used by the high water mark policy to omit the ORDER BY clause. 请参阅高使用标记策略See High Water Mark policy falsefalse

在索引器定义的 parameters.configuration 对象中使用这些设置。These settings are used in the parameters.configuration object in the indexer definition. 例如,要将查询超时设置为 10 分钟,请使用以下配置创建或更新索引器:For example, to set the query timeout to 10 minutes, create or update the indexer with the following configuration:

    {
      ... other indexer definition properties
     "parameters" : {
            "configuration" : { "queryTimeout" : "00:10:00" } }
    }

常见问题FAQ

问:是否可以将 Azure SQL 索引器与在 Azure 中 IaaS VM 上运行的 SQL 数据库配合使用?Q: Can I use Azure SQL indexer with SQL databases running on IaaS VMs in Azure?

是的。Yes. 但是,需要允许搜索服务连接到数据库。However, you need to allow your search service to connect to your database. 有关详细信息,请参阅配置从 Azure 认知搜索索引器到 Azure VM 上 SQL Server 的连接For more information, see Configure a connection from an Azure Cognitive Search indexer to SQL Server on an Azure VM.

问:是否可以将 Azure SQL 索引器与本地运行的 SQL 数据库配合使用?Q: Can I use Azure SQL indexer with SQL databases running on-premises?

无法直接配合使用。Not directly. 我们不建议使用也不支持直接连接,因为这样做需要使用 Internet 流量打开数据库。We do not recommend or support a direct connection, as doing so would require you to open your databases to Internet traffic.

问:是否可以将 Azure SQL 索引器与在 Azure 上 IaaS 中运行的非 SQL Server 数据库配合使用?Q: Can I use Azure SQL indexer with databases other than SQL Server running in IaaS on Azure?

否。No. 我们不支持此方案,因为我们尚未使用除 SQL Server 以外的任何数据库测试该索引器。We don’t support this scenario, because we haven’t tested the indexer with any databases other than SQL Server.

问:是否可以创建多个按计划运行的索引器?Q: Can I create multiple indexers running on a schedule?

是的。Yes. 但是,一次只能在一个节点上运行一个索引器。However, only one indexer can be running on one node at one time. 如果需要同时运行多个索引器,请考虑将搜索服务扩展到多个搜索单位。If you need multiple indexers running concurrently, consider scaling up your search service to more than one search unit.

问:运行索引器是否会影响我的查询工作负荷?Q: Does running an indexer affect my query workload?

是的。Yes. 索引器在搜索服务中的一个节点上运行,该节点的资源在编制查询流量索引并进行处理和其他 API 请求之间共享。Indexer runs on one of the nodes in your search service, and that node’s resources are shared between indexing and serving query traffic and other API requests. 如果运行密集型编制索引和查询工作负荷,并频繁遇到 503 错误或响应时间增加,请考虑纵向扩展搜索服务If you run intensive indexing and query workloads and encounter a high rate of 503 errors or increasing response times, consider scaling up your search service.

问:是否可以将故障转移群集中的次要副本用作数据源?Q: Can I use a secondary replica in a failover cluster as a data source?

视情况而定。It depends. 对于表或视图的完整索引编制,可以使用辅助副本。For full indexing of a table or view, you can use a secondary replica.

对于增量索引编制,Azure 认知搜索支持两个更改检测策略:SQL 集成的更改跟踪策略和高使用标记策略。For incremental indexing, Azure Cognitive Search supports two change detection policies: SQL integrated change tracking and High Water Mark.

在只读副本上,SQL 数据库不支持集成的更改跟踪。On read-only replicas, SQL Database does not support integrated change tracking. 因此,必须使用高使用标记策略。Therefore, you must use High Water Mark policy.

我们的标准建议是为高使用标记列使用 rowversion 数据类型。Our standard recommendation is to use the rowversion data type for the high water mark column. 但是,rowversion 的使用依赖于 MIN_ACTIVE_ROWVERSION 函数,该函数在只读副本上不受支持。However, using rowversion relies on the MIN_ACTIVE_ROWVERSION function, which is not supported on read-only replicas. 因此,如果使用 rowversion,必须将索引器指向主副本。Therefore, you must point the indexer to a primary replica if you are using rowversion.

如果尝试在只读副本上使用 rowversion,则会看到以下错误:If you attempt to use rowversion on a read-only replica, you will see the following error:

“辅助(只读)可用性副本不支持使用 rowversion 列进行更改跟踪。"Using a rowversion column for change tracking is not supported on secondary (read-only) availability replicas. 请更新数据源并指定与主可用性副本的连接。当前数据库的 'Updateability' 属性为 'READ_ONLY'”。Please update the datasource and specify a connection to the primary availability replica.Current database 'Updateability' property is 'READ_ONLY'".

问:是否可以使用替代的非 rowversion 列进行高使用标记更改跟踪?Q: Can I use an alternative, non-rowversion column for high water mark change tracking?

不建议这样做。It's not recommended. 只有 rowversion 能够实现可靠的数据同步。Only rowversion allows for reliable data synchronization. 不过,取决于你的应用程序逻辑,如果满足以下条件,则可能也很可靠:However, depending on your application logic, it may be safe if:

  • 你可以确保当索引器运行时在编制索引的表上没有未完成的事务(例如,所有表更新都按计划作为批处理进行,并且 Azure 认知搜索索引器计划设置为避免与表更新计划重叠)。You can ensure that when the indexer runs, there are no outstanding transactions on the table that’s being indexed (for example, all table updates happen as a batch on a schedule, and the Azure Cognitive Search indexer schedule is set to avoid overlapping with the table update schedule).

  • 你定期执行完整重新索引来补充任何缺少的行。You periodically do a full reindex to pick up any missed rows.