How to use batching to improve SQL Database application performance

Batching operations to Azure SQL Database significantly improves the performance and scalability of your applications. To help you understand the benefits, the first part of this article covers sample test results that compare sequential and batched requests to a SQL Database. The remainder of the article shows the techniques, scenarios, and considerations to help you use batching successfully in your Azure applications.

Why is batching important for SQL Database

Batching calls to a remote service is a well-known strategy for increasing performance and scalability. There are fixed processing costs for any interaction with a remote service, such as serialization, network transfer, and deserialization. Packaging many separate transactions into a single batch minimizes these costs.

In this article, we examine various SQL Database batching strategies and scenarios. Although these strategies are also important for on-premises applications that use SQL Server, there are several reasons for highlighting the use of batching with SQL Database:

  • There is potentially greater network latency in accessing SQL Database, especially if you are accessing SQL Database from outside the same Azure datacenter.
  • The multitenant nature of SQL Database means that the efficiency of the data access layer correlates to the overall scalability of the database. SQL Database must prevent any single tenant/user from monopolizing database resources to the detriment of other tenants. In response to usage in excess of predefined quotas, SQL Database can reduce throughput or respond with throttling exceptions. Efficiencies, such as batching, enable you to do more work on SQL Database before reaching these limits.
  • Batching is also effective for architectures that use multiple databases (sharding). The efficiency of your interaction with each database unit is still a key factor in your overall scalability.

One of the benefits of using SQL Database is that you don't have to manage the servers that host the database. However, this managed infrastructure also means that you have to think differently about database optimizations. You can no longer look to improve the database hardware or network infrastructure; Azure controls those environments. The main area that you can control is how your application interacts with SQL Database. Batching is one of these optimizations.

The first part of this article examines various batching techniques for .NET applications that use SQL Database. The last two sections cover batching guidelines and scenarios.

Batching strategies

Note about timing results in this article

Note

Results are not benchmarks but are meant to show relative performance. Timings are based on an average of at least 10 test runs. Operations are inserts into an empty table. These tests were measured pre-V12, and they do not necessarily correspond to the throughput that you might experience in a V12 database using the new DTU service tiers or vCore service tiers. The relative benefit of the batching technique should be similar.

Transactions

It might seem strange to begin a review of batching by discussing transactions. But the use of client-side transactions has a subtle server-side batching effect that improves performance. And transactions can be added with only a few lines of code, so they provide a fast way to improve the performance of sequential operations.

Consider the following C# code that contains a sequence of insert and update operations on a simple table.

List<string> dbOperations = new List<string>();
dbOperations.Add("update MyTable set mytext = 'updated text' where id = 1");
dbOperations.Add("update MyTable set mytext = 'updated text' where id = 2");
dbOperations.Add("update MyTable set mytext = 'updated text' where id = 3");
dbOperations.Add("insert MyTable values ('new value',1)");
dbOperations.Add("insert MyTable values ('new value',2)");
dbOperations.Add("insert MyTable values ('new value',3)");

The following ADO.NET code sequentially performs these operations.

using (SqlConnection conn = new SqlConnection(CloudConfigurationManager.GetSetting("Sql.ConnectionString")))
{
    conn.Open();

    foreach(string commandString in dbOperations)
    {
        SqlCommand cmd = new SqlCommand(commandString, conn);
        cmd.ExecuteNonQuery();
    }
}

The best way to optimize this code is to implement some form of client-side batching of these calls. But there is a simple way to increase the performance of this code: wrap the sequence of calls in a transaction. Here is the same code using a transaction.

using (SqlConnection conn = new SqlConnection(CloudConfigurationManager.GetSetting("Sql.ConnectionString")))
{
    conn.Open();
    SqlTransaction transaction = conn.BeginTransaction();

    foreach (string commandString in dbOperations)
    {
        SqlCommand cmd = new SqlCommand(commandString, conn, transaction);
        cmd.ExecuteNonQuery();
    }

    transaction.Commit();
}

Transactions are actually used in both of these examples. In the first example, each individual call is an implicit transaction. In the second example, an explicit transaction wraps all of the calls. Per the documentation for the write-ahead transaction log, log records are flushed to disk when the transaction commits. So by including more calls in a transaction, the write to the transaction log can be delayed until the transaction is committed. In effect, you are enabling batching for the writes to the server's transaction log.

The following tables show some ad hoc testing results, in milliseconds, for the same sequential inserts with and without transactions. For more perspective, the first set of tests ran remotely from a laptop to a database in Azure. The second set of tests ran from a cloud service and database that both resided within the same Azure datacenter (China North).

On-premises to Azure:

Operations    No Transaction (ms)    Transaction (ms)
1             130                    402
10            1208                   1226
100           12662                  10395
1000          128852                 102917

Azure to Azure (same datacenter):

Operations    No Transaction (ms)    Transaction (ms)
1             21                     26
10            220                    56
100           2145                   341
1000          21479                  2756

Note

Results are not benchmarks. See the note about timing results in this article.

Based on the previous test results, wrapping a single operation in a transaction actually decreases performance. But as you increase the number of operations within a single transaction, the performance improvement becomes more marked. The performance difference is also more noticeable when all operations occur within the Azure datacenter. The increased latency of using SQL Database from outside the Azure datacenter overshadows the performance gain of using transactions.

Although the use of transactions can increase performance, continue to observe best practices for transactions and connections. Keep the transaction as short as possible, and close the database connection after the work completes. The using statement in the previous example ensures that the connection is closed when the enclosing code block completes.

The previous example demonstrates that you can add a local transaction to any ADO.NET code with two lines. Transactions offer a quick way to improve the performance of code that makes sequential insert, update, and delete operations. However, for the fastest performance, consider changing the code further to take advantage of client-side batching, such as table-valued parameters.

For more information about transactions in ADO.NET, see Local Transactions in ADO.NET.

Table-valued parameters

Table-valued parameters support user-defined table types as parameters in Transact-SQL statements, stored procedures, and functions. This client-side batching technique allows you to send multiple rows of data within the table-valued parameter. To use table-valued parameters, first define a table type. The following Transact-SQL statement creates a table type named MyTableType.

    CREATE TYPE MyTableType AS TABLE 
    ( mytext TEXT,
      num INT );

In code, you create a DataTable with the same column names and types as the table type. Pass this DataTable in a parameter in a text query or stored procedure call. The following example shows this technique:

using (SqlConnection connection = new SqlConnection(CloudConfigurationManager.GetSetting("Sql.ConnectionString")))
{
    connection.Open();

    DataTable table = new DataTable();
    // Add columns and rows. The following is a simple example.
    table.Columns.Add("mytext", typeof(string));
    table.Columns.Add("num", typeof(int));
    for (var i = 0; i < 10; i++)
    {
        table.Rows.Add(DateTime.Now.ToString(), DateTime.Now.Millisecond);
    }

    SqlCommand cmd = new SqlCommand(
        "INSERT INTO MyTable(mytext, num) SELECT mytext, num FROM @TestTvp",
        connection);

    cmd.Parameters.Add(
        new SqlParameter()
        {
            ParameterName = "@TestTvp",
            SqlDbType = SqlDbType.Structured,
            TypeName = "MyTableType",
            Value = table,
        });

    cmd.ExecuteNonQuery();
}

In the previous example, the SqlCommand object inserts rows from a table-valued parameter, @TestTvp. The previously created DataTable object is assigned to this parameter with the SqlCommand.Parameters.Add method. Batching the inserts in one call significantly increases the performance over sequential inserts.

To improve the previous example further, use a stored procedure instead of a text-based command. The following Transact-SQL command creates a stored procedure that takes the MyTableType table-valued parameter.

CREATE PROCEDURE [dbo].[sp_InsertRows] 
@TestTvp as MyTableType READONLY
AS
BEGIN
INSERT INTO MyTable(mytext, num) 
SELECT mytext, num FROM @TestTvp
END
GO

Then change the SqlCommand object declaration in the previous code example to the following.

SqlCommand cmd = new SqlCommand("sp_InsertRows", connection);
cmd.CommandType = CommandType.StoredProcedure;

In most cases, table-valued parameters have equivalent or better performance than other batching techniques. Table-valued parameters are often preferable, because they are more flexible than other options. For example, other techniques, such as SQL bulk copy, only permit the insertion of new rows. But with table-valued parameters, you can use logic in the stored procedure to determine which rows are updates and which are inserts. The table type can also be modified to contain an "Operation" column that indicates whether the specified row should be inserted, updated, or deleted.
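As a minimal sketch of that "Operation" column idea, the client side can carry a routing code in an extra DataTable column that a stored procedure could then use to decide between INSERT, UPDATE, and DELETE. The column name and the "I"/"U"/"D" codes here are illustrative assumptions, not part of the article's schema.

```csharp
using System;
using System.Data;

public static class UpsertBatchBuilder
{
    // Builds a DataTable whose "Operation" column tells a (hypothetical)
    // stored procedure how to route each row: "I" = insert, "U" = update,
    // "D" = delete. Column names and codes are illustrative assumptions.
    public static DataTable Build()
    {
        DataTable table = new DataTable();
        table.Columns.Add("Operation", typeof(string));
        table.Columns.Add("mytext", typeof(string));
        table.Columns.Add("num", typeof(int));

        table.Rows.Add("I", "new row", 1);     // insert a new row
        table.Rows.Add("U", "changed row", 2); // update an existing row
        table.Rows.Add("D", null, 3);          // delete by key
        return table;
    }
}
```

The stored procedure would filter the table-valued parameter by the Operation column before applying each statement.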

The following table shows ad hoc test results for the use of table-valued parameters, in milliseconds.

Operations    On-premises to Azure (ms)    Azure same datacenter (ms)
1             124                          32
10            131                          25
100           338                          51
1000          2615                         382
10000         23830                        3586

Note

Results are not benchmarks. See the note about timing results in this article.

The performance gain from batching is immediately apparent. In the previous sequential test, 1000 operations took 129 seconds outside the datacenter and 21 seconds from within the datacenter. But with table-valued parameters, 1000 operations take only 2.6 seconds outside the datacenter and 0.4 seconds within the datacenter.

For more information on table-valued parameters, see Table-Valued Parameters.

SQL bulk copy

SQL bulk copy is another way to insert large amounts of data into a target database. .NET applications can use the SqlBulkCopy class to perform bulk insert operations. SqlBulkCopy is similar in function to the command-line tool, Bcp.exe, or the Transact-SQL statement, BULK INSERT. The following code example shows how to bulk copy the rows in the source DataTable, table, to the destination table in SQL Server, MyTable.

using (SqlConnection connection = new SqlConnection(CloudConfigurationManager.GetSetting("Sql.ConnectionString")))
{
    connection.Open();

    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "MyTable";
        bulkCopy.ColumnMappings.Add("mytext", "mytext");
        bulkCopy.ColumnMappings.Add("num", "num");
        bulkCopy.WriteToServer(table);
    }
}

There are some cases where bulk copy is preferred over table-valued parameters. See the comparison table of table-valued parameters versus BULK INSERT operations in the article Table-Valued Parameters.

The following ad hoc test results show the performance of batching with SqlBulkCopy, in milliseconds.

Operations    On-premises to Azure (ms)    Azure same datacenter (ms)
1             433                          57
10            441                          32
100           636                          53
1000          2535                         341
10000         21605                        2737

Note

Results are not benchmarks. See the note about timing results in this article.

In smaller batch sizes, the use of table-valued parameters outperformed the SqlBulkCopy class. However, SqlBulkCopy performed 12-31% faster than table-valued parameters for the tests of 1,000 and 10,000 rows. Like table-valued parameters, SqlBulkCopy is a good option for batched inserts, especially when compared to the performance of non-batched operations.

For more information on bulk copy in ADO.NET, see Bulk Copy Operations in SQL Server.

Multiple-row parameterized INSERT statements

One alternative for small batches is to construct a large parameterized INSERT statement that inserts multiple rows. The following code example demonstrates this technique.

using (SqlConnection connection = new SqlConnection(CloudConfigurationManager.GetSetting("Sql.ConnectionString")))
{
    connection.Open();

    string insertCommand = "INSERT INTO [MyTable] ( mytext, num ) " +
        "VALUES (@p1, @p2), (@p3, @p4), (@p5, @p6), (@p7, @p8), (@p9, @p10)";

    SqlCommand cmd = new SqlCommand(insertCommand, connection);

    for (int i = 1; i <= 10; i += 2)
    {
        cmd.Parameters.Add(new SqlParameter("@p" + i.ToString(), "test"));
        cmd.Parameters.Add(new SqlParameter("@p" + (i+1).ToString(), i));
    }

    cmd.ExecuteNonQuery();
}

This example is meant to show the basic concept. A more realistic scenario would loop through the required entities to construct the query string and the command parameters simultaneously. You are limited to a total of 2100 query parameters, which limits the total number of rows that can be processed in this manner.
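A minimal sketch of that more realistic approach, assuming the same two-column MyTable schema as the example above. Only the statement-building logic is shown; adding the matching SqlParameter values would happen in the same loop, and the guard keeps the statement under the 2100-parameter limit.

```csharp
using System;
using System.Text;

public static class MultiRowInsertBuilder
{
    // Builds the text of a parameterized multi-row INSERT for a table with
    // two columns (mytext, num). Guards against the 2100-parameter limit.
    public static string BuildInsertText(int rowCount)
    {
        const int parametersPerRow = 2;
        if (rowCount < 1 || rowCount * parametersPerRow > 2100)
            throw new ArgumentOutOfRangeException(nameof(rowCount));

        StringBuilder sb = new StringBuilder("INSERT INTO [MyTable] ( mytext, num ) VALUES ");
        for (int row = 0; row < rowCount; row++)
        {
            int p = row * parametersPerRow + 1; // first parameter index for this row
            if (row > 0) sb.Append(", ");
            sb.Append("(@p").Append(p).Append(", @p").Append(p + 1).Append(")");
        }
        return sb.ToString();
    }
}
```

In the same loop, the caller would add @p1, @p2, ... values to the SqlCommand before executing it.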

The following ad hoc test results show the performance of this type of insert statement, in milliseconds.

Operations    Table-valued parameters (ms)    Single-statement INSERT (ms)
1             32                              20
10            30                              25
100           33                              51

Note

Results are not benchmarks. See the note about timing results in this article.

This approach can be slightly faster for batches of fewer than 100 rows. Although the improvement is small, this technique is another option that might work well in your specific application scenario.

DataAdapter

The DataAdapter class allows you to modify a DataSet object and then submit the changes as INSERT, UPDATE, and DELETE operations. If you are using the DataAdapter in this manner, it is important to note that separate calls are made for each distinct operation. To improve performance, set the UpdateBatchSize property to the number of operations that should be batched at a time. For more information, see Performing Batch Operations Using DataAdapters.
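A hedged sketch of that configuration, reusing the MyTable schema and connection string from the earlier examples; the SELECT text and the batch size of 100 are illustrative choices, not values from the article. It cannot run without a reachable database, so treat it as a pattern rather than a drop-in implementation.

```csharp
using System.Data;
using System.Data.SqlClient;

public static class DataAdapterBatching
{
    // Submits the pending changes in changedTable in batches of up to 100
    // operations per round trip instead of one call per row.
    public static void SubmitChanges(DataTable changedTable, string connectionString)
    {
        using (SqlConnection connection = new SqlConnection(connectionString))
        using (SqlDataAdapter adapter = new SqlDataAdapter("SELECT mytext, num FROM MyTable", connection))
        {
            SqlCommandBuilder builder = new SqlCommandBuilder(adapter);
            adapter.InsertCommand = builder.GetInsertCommand();
            adapter.UpdateCommand = builder.GetUpdateCommand();
            adapter.DeleteCommand = builder.GetDeleteCommand();

            // Batched execution requires commands that do not return row state.
            adapter.InsertCommand.UpdatedRowSource = UpdateRowSource.None;
            adapter.UpdateCommand.UpdatedRowSource = UpdateRowSource.None;
            adapter.DeleteCommand.UpdatedRowSource = UpdateRowSource.None;

            adapter.UpdateBatchSize = 100; // number of operations per batch
            adapter.Update(changedTable);
        }
    }
}
```

Setting UpdateBatchSize to 0 tells the adapter to use the largest batch size the server can handle; a value of 1 disables batching.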

Entity Framework

Entity Framework does not currently support batching. Different developers in the community have attempted to demonstrate workarounds, such as overriding the SaveChanges method. But the solutions are typically complex and customized to the application and data model. The Entity Framework codeplex project currently has a discussion page on this feature request. To view this discussion, see Design Meeting Notes - August 2, 2012.

XML

For completeness, we feel that it is important to talk about XML as a batching strategy. However, the use of XML has no advantages over other methods and several disadvantages. The approach is similar to table-valued parameters, but an XML file or string is passed to a stored procedure instead of a user-defined table. The stored procedure parses the commands in the XML.

There are several disadvantages to this approach:

  • Working with XML can be cumbersome and error prone.
  • Parsing the XML on the database can be CPU-intensive.
  • In most cases, this method is slower than table-valued parameters.

For these reasons, the use of XML for batch queries is not recommended.

Batching considerations

The following sections provide more guidance for the use of batching in SQL Database applications.

Tradeoffs

Depending on your architecture, batching can involve a tradeoff between performance and resiliency. For example, consider the scenario where your role unexpectedly goes down. If you lose one row of data, the impact is smaller than the impact of losing a large batch of unsubmitted rows. There is a greater risk when you buffer rows before sending them to the database in a specified time window.

Because of this tradeoff, evaluate the type of operations that you batch. Batch more aggressively (larger batches and longer time windows) with data that is less critical.

Batch size

In our tests, there was typically no advantage to breaking large batches into smaller chunks. In fact, this subdivision often resulted in slower performance than submitting a single large batch. For example, consider a scenario where you want to insert 1000 rows. The following table shows how long it takes to use table-valued parameters to insert 1000 rows when divided into smaller batches.

Batch size    Iterations    Table-valued parameters (ms)
1000          1             347
500           2             355
100           10            465
50            20            630

Note

Results are not benchmarks. See the note about timing results in this article.

You can see that the best performance for 1000 rows is to submit them all at once. In other tests (not shown here), there was a small performance gain in breaking a 10000-row batch into two batches of 5000. But the table schema for these tests is relatively simple, so you should perform tests on your specific data and batch sizes to verify these findings.

Another factor to consider is that if the total batch becomes too large, SQL Database might throttle and refuse to commit the batch. For the best results, test your specific scenario to determine if there is an ideal batch size. Make the batch size configurable at runtime to enable quick adjustments based on performance or errors.
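A minimal sketch of one way to make the batch size adjustable at runtime: a generic splitter that takes the size as a parameter, so the value can come from configuration rather than being compiled in. The splitting logic itself is generic and not tied to any particular schema.

```csharp
using System;
using System.Collections.Generic;

public static class BatchSplitter
{
    // Splits work items into batches of a size supplied at runtime (for
    // example, read from configuration), so the batch size can be tuned in
    // response to throttling or performance measurements without redeploying.
    public static List<List<T>> Split<T>(IEnumerable<T> items, int batchSize)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));

        List<List<T>> batches = new List<List<T>>();
        List<T> current = new List<T>(batchSize);
        foreach (T item in items)
        {
            current.Add(item);
            if (current.Count == batchSize)
            {
                batches.Add(current);
                current = new List<T>(batchSize);
            }
        }
        if (current.Count > 0) batches.Add(current); // final partial batch
        return batches;
    }
}
```

Each resulting batch would then be submitted with one of the batching techniques above, such as a table-valued parameter.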

Finally, balance the size of the batch with the risks associated with batching. If there are transient errors or the role fails, consider the consequences of retrying the operation or of losing the data in the batch.

Parallel processing

What if you took the approach of reducing the batch size but used multiple threads to execute the work? Again, our tests showed that several smaller multithreaded batches typically performed worse than a single larger batch. The following test attempts to insert 1000 rows in one or more parallel batches. This test shows how more simultaneous batches actually decreased performance.

Batch size [Iterations]    Two threads (ms)    Four threads (ms)    Six threads (ms)
1000 [1]                   277                 315                  266
500 [2]                    548                 278                  256
250 [4]                    405                 329                  265
100 [10]                   488                 439                  391

Note

Results are not benchmarks. See the note about timing results in this article.

There are several potential reasons for the degradation in performance due to parallelism:

  • There are multiple simultaneous network calls instead of one.
  • Multiple operations against a single table can result in contention and blocking.
  • There are overheads associated with multithreading.
  • The expense of opening multiple connections outweighs the benefit of parallel processing.

If you target different tables or databases, it is possible to see some performance gain with this strategy. Database sharding or federations would be a scenario for this approach. Sharding uses multiple databases and routes different data to each database. If each small batch is going to a different database, then performing the operations in parallel can be more efficient. However, the performance gain is not significant enough to use as the basis for a decision to use database sharding in your solution.

In some designs, parallel execution of smaller batches can result in improved throughput of requests in a system under load. In this case, even though it is quicker to process a single larger batch, processing multiple batches in parallel might be more efficient.

If you do use parallel execution, consider controlling the maximum number of worker threads. A smaller number might result in less contention and a faster execution time. Also, consider the additional load that this places on the target database, both in connections and transactions.
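A minimal sketch of capping the number of worker threads with Parallel.ForEach and MaxDegreeOfParallelism. The per-batch action here just counts rows; in a real application it would be the database submit call (an assumption, not the article's code).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelBatchRunner
{
    // Runs one action per batch while capping the number of concurrent
    // workers. The action below (summing row counts) stands in for the
    // real database submit in your application.
    public static int Run(IEnumerable<int[]> batches, int maxWorkers)
    {
        int rowsProcessed = 0;
        Parallel.ForEach(
            batches,
            new ParallelOptions { MaxDegreeOfParallelism = maxWorkers },
            batch => Interlocked.Add(ref rowsProcessed, batch.Length));
        return rowsProcessed;
    }
}
```

Tuning maxWorkers lets you trade parallel throughput against the contention and connection load described above.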

Typical guidance on database performance also affects batching. For example, insert performance is reduced for tables that have a large primary key or many nonclustered indexes.

If table-valued parameters use a stored procedure, you can use the command SET NOCOUNT ON at the beginning of the procedure. This statement suppresses the return of the count of the affected rows in the procedure. However, in our tests, the use of SET NOCOUNT ON either had no effect or decreased performance. The test stored procedure was simple, with a single INSERT command from the table-valued parameter. It is possible that more complex stored procedures would benefit from this statement. But don't assume that adding SET NOCOUNT ON to your stored procedure automatically improves performance. To understand the effect, test your stored procedure with and without the SET NOCOUNT ON statement.

Batching scenarios

The following sections describe how to use table-valued parameters in three application scenarios. The first scenario shows how buffering and batching can work together. The second scenario improves performance by performing master-detail operations in a single stored procedure call. The final scenario shows how to use table-valued parameters in an "UPSERT" operation.

Buffering

Although there are some scenarios that are obvious candidates for batching, there are many scenarios that could take advantage of batching through delayed processing. However, delayed processing also carries a greater risk that the data is lost in the event of an unexpected failure. It is important to understand this risk and consider the consequences.

For example, consider a web application that tracks the navigation history of each user. On each page request, the application could make a database call to record the user's page view. But higher performance and scalability can be achieved by buffering the users' navigation activities and then sending this data to the database in batches. You can trigger the database update by elapsed time and/or buffer size. For example, a rule could specify that the batch should be processed after 20 seconds or when the buffer reaches 1000 items.

The following code example uses Reactive Extensions (Rx) to process buffered events raised by a monitoring class. When the buffer fills or a timeout is reached, the batch of user data is sent to the database with a table-valued parameter.

The following NavHistoryData class models the user navigation details. It contains basic information such as the user identifier, the URL accessed, and the access time.

public class NavHistoryData
{
    public NavHistoryData(int userId, string url, DateTime accessTime)
    { UserId = userId; URL = url; AccessTime = accessTime; }
    public int UserId { get; set; }
    public string URL { get; set; }
    public DateTime AccessTime { get; set; }
}

The NavHistoryDataMonitor class is responsible for buffering the user navigation data and sending it to the database. It contains a method, RecordUserNavigationEntry, which responds by raising an OnAdded event. The following code shows the constructor logic that uses Rx to create an observable collection based on the event. It then subscribes to this observable collection with the Buffer method. The overload specifies that the buffer should be sent every 20 seconds or 1000 entries.

public NavHistoryDataMonitor()
{
    var observableData =
        Observable.FromEventPattern<NavHistoryDataEventArgs>(this, "OnAdded");

    observableData.Buffer(TimeSpan.FromSeconds(20), 1000).Subscribe(Handler);
}

The handler converts all of the buffered items into a table-valued type and then passes this type to a stored procedure that processes the batch. The following code shows the complete definition for both the NavHistoryDataEventArgs and the NavHistoryDataMonitor classes.

public class NavHistoryDataEventArgs : System.EventArgs
{
    public NavHistoryDataEventArgs(NavHistoryData data) { Data = data; }
    public NavHistoryData Data { get; set; }
}

public class NavHistoryDataMonitor
{
    public event EventHandler<NavHistoryDataEventArgs> OnAdded;

    public NavHistoryDataMonitor()
    {
        var observableData =
            Observable.FromEventPattern<NavHistoryDataEventArgs>(this, "OnAdded");

        observableData.Buffer(TimeSpan.FromSeconds(20), 1000).Subscribe(Handler);
    }

    public void RecordUserNavigationEntry(NavHistoryData data)
    {
        if (OnAdded != null)
            OnAdded(this, new NavHistoryDataEventArgs(data));
    }

    protected void Handler(IList<EventPattern<NavHistoryDataEventArgs>> items)
    {
        DataTable navHistoryBatch = new DataTable("NavigationHistoryBatch");
        navHistoryBatch.Columns.Add("UserId", typeof(int));
        navHistoryBatch.Columns.Add("URL", typeof(string));
        navHistoryBatch.Columns.Add("AccessTime", typeof(DateTime));
        foreach (EventPattern<NavHistoryDataEventArgs> item in items)
        {
            NavHistoryData data = item.EventArgs.Data;
            navHistoryBatch.Rows.Add(data.UserId, data.URL, data.AccessTime);
        }

        using (SqlConnection connection = new SqlConnection(CloudConfigurationManager.GetSetting("Sql.ConnectionString")))
        {
            connection.Open();

            SqlCommand cmd = new SqlCommand("sp_RecordUserNavigation", connection);
            cmd.CommandType = CommandType.StoredProcedure;

            cmd.Parameters.Add(
                new SqlParameter()
                {
                    ParameterName = "@NavHistoryBatch",
                    SqlDbType = SqlDbType.Structured,
                    TypeName = "NavigationHistoryTableType",
                    Value = navHistoryBatch,
                });

            cmd.ExecuteNonQuery();
        }
    }
}

To use this buffering class, the application creates a static NavHistoryDataMonitor object. Each time a user accesses a page, the application calls the NavHistoryDataMonitor.RecordUserNavigationEntry method. The buffering logic takes care of sending these entries to the database in batches.
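As a minimal sketch of that wiring (the hosting class, method name, and the shape of NavHistoryData are assumptions for illustration, not from the article; only the UserId, URL, and AccessTime members are taken from the handler code above):

```csharp
// Illustrative sketch only: NavHistoryDataMonitor comes from the definition
// above; the NavHistoryData object initializer assumes settable properties.
public static class NavHistoryTracking
{
    // A single shared monitor, so entries from all requests are buffered
    // into the same 20-second / 1000-entry batches.
    private static readonly NavHistoryDataMonitor Monitor =
        new NavHistoryDataMonitor();

    // Called from a page request handler. Returns immediately; the
    // database write happens later, when the Rx buffer flushes.
    public static void TrackPageView(int userId, string url)
    {
        Monitor.RecordUserNavigationEntry(new NavHistoryData
        {
            UserId = userId,
            URL = url,
            AccessTime = DateTime.UtcNow
        });
    }
}
```

Because the buffer is shared process-wide, the per-request cost is only an in-memory append; the database sees at most one call per flush interval.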

Master detail

Table-valued parameters are useful for simple INSERT scenarios. However, batching inserts that involve more than one table can be more challenging. The "master/detail" scenario is a good example. The master table identifies the primary entity. One or more detail tables store more data about the entity. In this scenario, foreign key relationships enforce the relationship of details to a unique master entity. Consider a simplified version of a PurchaseOrder table and its associated OrderDetail table. The following Transact-SQL creates the PurchaseOrder table with four columns: OrderID, OrderDate, CustomerID, and Status.

CREATE TABLE [dbo].[PurchaseOrder](
[OrderID] [int] IDENTITY(1,1) NOT NULL,
[OrderDate] [datetime] NOT NULL,
[CustomerID] [int] NOT NULL,
[Status] [nvarchar](50) NOT NULL,
CONSTRAINT [PrimaryKey_PurchaseOrder] 
PRIMARY KEY CLUSTERED ( [OrderID] ASC ))

Each order contains one or more product purchases. This information is captured in the PurchaseOrderDetail table. The following Transact-SQL creates the PurchaseOrderDetail table with five columns: OrderID, OrderDetailID, ProductID, UnitPrice, and OrderQty.

CREATE TABLE [dbo].[PurchaseOrderDetail](
[OrderID] [int] NOT NULL,
[OrderDetailID] [int] IDENTITY(1,1) NOT NULL,
[ProductID] [int] NOT NULL,
[UnitPrice] [money] NULL,
[OrderQty] [smallint] NULL,
CONSTRAINT [PrimaryKey_PurchaseOrderDetail] PRIMARY KEY CLUSTERED 
( [OrderID] ASC, [OrderDetailID] ASC ))

The OrderID column in the PurchaseOrderDetail table must reference an order from the PurchaseOrder table. The following definition of a foreign key enforces this constraint.

ALTER TABLE [dbo].[PurchaseOrderDetail]  WITH CHECK ADD 
CONSTRAINT [FK_OrderID_PurchaseOrder] FOREIGN KEY([OrderID])
REFERENCES [dbo].[PurchaseOrder] ([OrderID])

In order to use table-valued parameters, you must have one user-defined table type for each target table.

CREATE TYPE PurchaseOrderTableType AS TABLE 
( OrderID INT,
    OrderDate DATETIME,
    CustomerID INT,
    Status NVARCHAR(50) );
GO

CREATE TYPE PurchaseOrderDetailTableType AS TABLE 
( OrderID INT,
    ProductID INT,
    UnitPrice MONEY,
    OrderQty SMALLINT );
GO

Then define a stored procedure that accepts tables of these types. This procedure allows an application to locally batch a set of orders and order details in a single call. The following Transact-SQL provides the complete stored procedure declaration for this purchase order example.

CREATE PROCEDURE sp_InsertOrdersBatch (
@orders as PurchaseOrderTableType READONLY,
@details as PurchaseOrderDetailTableType READONLY )
AS
SET NOCOUNT ON;

-- Table that connects the order identifiers in the @orders
-- table with the actual order identifiers in the PurchaseOrder table
DECLARE @IdentityLink AS TABLE ( 
SubmittedKey int, 
ActualKey int, 
RowNumber int identity(1,1)
);

-- Add new orders to the PurchaseOrder table, storing the actual
-- order identifiers in the @IdentityLink table   
INSERT INTO PurchaseOrder ([OrderDate], [CustomerID], [Status])
OUTPUT inserted.OrderID INTO @IdentityLink (ActualKey)
SELECT [OrderDate], [CustomerID], [Status] FROM @orders ORDER BY OrderID;

-- Match the passed-in order identifiers with the actual identifiers
-- and complete the @IdentityLink table for use with inserting the details
WITH OrderedRows As (
SELECT OrderID, ROW_NUMBER () OVER (ORDER BY OrderID) As RowNumber 
FROM @orders
)
UPDATE @IdentityLink SET SubmittedKey = M.OrderID
FROM @IdentityLink L JOIN OrderedRows M ON L.RowNumber = M.RowNumber;

-- Insert the order details into the PurchaseOrderDetail table, 
-- using the actual order identifiers of the master table, PurchaseOrder
INSERT INTO PurchaseOrderDetail (
[OrderID],
[ProductID],
[UnitPrice],
[OrderQty] )
SELECT L.ActualKey, D.ProductID, D.UnitPrice, D.OrderQty
FROM @details D
JOIN @IdentityLink L ON L.SubmittedKey = D.OrderID;
GO

In this example, the locally defined @IdentityLink table stores the actual OrderID values from the newly inserted rows. These order identifiers are different from the temporary OrderID values in the @orders and @details table-valued parameters. The @IdentityLink table therefore maps the OrderID values submitted in the @orders parameter to the actual OrderID values generated for the new rows in the PurchaseOrder table. After this step, the @IdentityLink table can facilitate inserting the order details with the actual OrderID that satisfies the foreign key constraint.

This stored procedure can be used from code or from other Transact-SQL calls. See the table-valued parameters section of this paper for a code example. The following Transact-SQL shows how to call sp_InsertOrdersBatch.

declare @orders as PurchaseOrderTableType
declare @details as PurchaseOrderDetailTableType

INSERT @orders 
([OrderID], [OrderDate], [CustomerID], [Status])
VALUES(1, '1/1/2013', 1125, 'Complete'),
(2, '1/13/2013', 348, 'Processing'),
(3, '1/12/2013', 2504, 'Shipped')

INSERT @details
([OrderID], [ProductID], [UnitPrice], [OrderQty])
VALUES(1, 10, $11.50, 1),
(1, 12, $1.58, 1),
(2, 23, $2.57, 2),
(3, 4, $10.00, 1)

exec sp_InsertOrdersBatch @orders, @details

This solution allows each batch to use a set of OrderID values that begin at 1. These temporary OrderID values describe the relationships in the batch, but the actual OrderID values are determined at the time of the insert operation. You can run the same statements in the previous example repeatedly and generate unique orders in the database. For this reason, consider adding more code or database logic that prevents duplicate orders when using this batching technique.
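One illustrative guard (an assumption for this sample schema, not part of the article) is a unique constraint on a natural key, so that resubmitting the same batch fails rather than silently creating duplicate orders:

```sql
-- Hypothetical example: treat (CustomerID, OrderDate) as a natural key for
-- this sample. In a real schema you would pick whatever uniquely identifies
-- an order, such as a client-generated idempotency token in its own column.
ALTER TABLE [dbo].[PurchaseOrder]
ADD CONSTRAINT [UQ_PurchaseOrder_CustomerID_OrderDate]
UNIQUE ([CustomerID], [OrderDate]);
```

With a guard like this in place, the calling code should also be prepared to catch the constraint-violation error and treat it as "already processed" rather than as a failure.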

This example demonstrates that even more complex database operations, such as master-detail operations, can be batched using table-valued parameters.

UPSERT

Another batching scenario involves simultaneously updating existing rows and inserting new rows. This operation is sometimes referred to as an "UPSERT" (update + insert) operation. Rather than making separate calls to INSERT and UPDATE, the MERGE statement is best suited to this task: it can perform both insert and update operations in a single call.

Table-valued parameters can be used with the MERGE statement to perform updates and inserts. For example, consider a simplified Employee table that contains the following columns: EmployeeID, FirstName, LastName, and SocialSecurityNumber:

CREATE TABLE [dbo].[Employee](
[EmployeeID] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [nvarchar](50) NOT NULL,
[LastName] [nvarchar](50) NOT NULL,
[SocialSecurityNumber] [nvarchar](50) NOT NULL,
CONSTRAINT [PrimaryKey_Employee] PRIMARY KEY CLUSTERED 
([EmployeeID] ASC ))

In this example, you can use the fact that the SocialSecurityNumber is unique to perform a MERGE of multiple employees. First, create the user-defined table type:

CREATE TYPE EmployeeTableType AS TABLE 
( Employee_ID INT,
    FirstName NVARCHAR(50),
    LastName NVARCHAR(50),
    SocialSecurityNumber NVARCHAR(50) );
GO

Next, create a stored procedure or write code that uses the MERGE statement to perform the update and insert. The following example uses the MERGE statement on a table-valued parameter, @employees, of type EmployeeTableType. The contents of the @employees table are not shown here.

MERGE Employee AS target
USING (SELECT [FirstName], [LastName], [SocialSecurityNumber] FROM @employees) 
AS source ([FirstName], [LastName], [SocialSecurityNumber])
ON (target.[SocialSecurityNumber] = source.[SocialSecurityNumber])
WHEN MATCHED THEN 
UPDATE SET
target.FirstName = source.FirstName, 
target.LastName = source.LastName
WHEN NOT MATCHED THEN
    INSERT ([FirstName], [LastName], [SocialSecurityNumber])
    VALUES (source.[FirstName], source.[LastName], source.[SocialSecurityNumber]);

For more information, see the documentation and examples for the MERGE statement. Although the same work could be performed in a multiple-step stored procedure call with separate INSERT and UPDATE operations, the MERGE statement is more efficient. Database code can also construct Transact-SQL calls that use the MERGE statement directly, without requiring two database calls for INSERT and UPDATE.
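To expose the MERGE as a batch entry point in the same style as sp_InsertOrdersBatch, it can be wrapped in a stored procedure that accepts the table-valued parameter. This is a sketch; the procedure name sp_UpsertEmployeesBatch is an assumption, not from the article:

```sql
-- Sketch: wraps the MERGE above so applications can upsert a whole batch
-- of employees in one call, using the EmployeeTableType defined earlier.
CREATE PROCEDURE sp_UpsertEmployeesBatch (
@employees as EmployeeTableType READONLY )
AS
SET NOCOUNT ON;

MERGE Employee AS target
USING (SELECT [FirstName], [LastName], [SocialSecurityNumber] FROM @employees)
AS source ([FirstName], [LastName], [SocialSecurityNumber])
ON (target.[SocialSecurityNumber] = source.[SocialSecurityNumber])
WHEN MATCHED THEN
UPDATE SET
target.FirstName = source.FirstName,
target.LastName = source.LastName
WHEN NOT MATCHED THEN
    INSERT ([FirstName], [LastName], [SocialSecurityNumber])
    VALUES (source.[FirstName], source.[LastName], source.[SocialSecurityNumber]);
GO
```

An application would then pass a single structured parameter of type EmployeeTableType, exactly as shown for the purchase order example.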

Recommendation summary

The following list provides a summary of the batching recommendations discussed in this article:

  • Use buffering and batching to increase the performance and scalability of SQL Database applications.
  • Understand the tradeoffs between batching/buffering and resiliency. During a role failure, the risk of losing an unprocessed batch of business-critical data might outweigh the performance benefit of batching.
  • Attempt to keep all calls to the database within a single datacenter to reduce latency.
  • If you choose a single batching technique, table-valued parameters offer the best performance and flexibility.
  • For the fastest insert performance, follow these general guidelines, but test your scenario:
    • For < 100 rows, use a single parameterized INSERT command.
    • For < 1000 rows, use table-valued parameters.
    • For >= 1000 rows, use SqlBulkCopy.
  • For update and delete operations, use table-valued parameters with stored procedure logic that determines the correct operation on each row in the table parameter.
  • Batch size guidelines:
    • Use the largest batch sizes that make sense for your application and business requirements.
    • Balance the performance gain of large batches against the risks of temporary or catastrophic failures. What is the consequence of retries or loss of the data in the batch?
    • Test the largest batch size to verify that SQL Database does not reject it.
    • Create configuration settings that control batching, such as the batch size or the buffering time window. These settings provide flexibility: you can change the batching behavior in production without redeploying the cloud service.
  • Avoid parallel execution of batches that operate on a single table in one database. If you do choose to divide a single batch across multiple worker threads, run tests to determine the ideal number of threads. Beyond some threshold, more threads decrease performance rather than increase it.
  • Consider buffering on size and time as a way of implementing batching for more scenarios.
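To illustrate the SqlBulkCopy guideline above for 1000 or more rows, here is a minimal C# sketch. The destination table name is an assumption, and the connection-string lookup reuses the pattern from the navigation-history handler earlier in this article:

```csharp
// Sketch: bulk-load a large DataTable with SqlBulkCopy. Assumes a
// dbo.NavigationHistory table whose columns match the DataTable's columns.
static void BulkInsertNavigationHistory(DataTable navHistoryBatch)
{
    using (SqlConnection connection = new SqlConnection(
        CloudConfigurationManager.GetSetting("Sql.ConnectionString")))
    {
        connection.Open();

        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
        {
            bulkCopy.DestinationTableName = "dbo.NavigationHistory";

            // Send rows to the server in chunks rather than holding the
            // entire batch in a single server-side operation.
            bulkCopy.BatchSize = 1000;
            bulkCopy.WriteToServer(navHistoryBatch);
        }
    }
}
```

SqlBulkCopy bypasses much of the per-statement overhead of individual INSERT commands, which is why it wins at large row counts even though it is less flexible than table-valued parameters.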

Next steps

This article focused on how database design and coding techniques related to batching can improve your application performance and scalability. But batching is just one factor in your overall strategy. For more ways to improve performance and scalability, see Azure SQL Database performance guidance for single databases and Price and performance considerations for an elastic pool.