Azure Cosmos DB Gremlin server response headers

This article covers the headers that the Cosmos DB Gremlin server returns to the caller when it executes a request. These headers are useful for troubleshooting request performance, building applications that integrate natively with the Cosmos DB service, and simplifying customer support.

Keep in mind that by taking a dependency on these headers, you limit the portability of your application to other Gremlin implementations. In return, you gain tighter integration with Cosmos DB Gremlin. These headers are not part of the TinkerPop standard.

Headers

| Header | Type | Sample value | When included | Explanation |
| --- | --- | --- | --- | --- |
| x-ms-request-charge | double | 11.3243 | Success and Failure | Amount of collection or database throughput consumed, in request units (RU/s or RUs), for a partial response message. This header is present in every continuation for requests that have multiple chunks. It reflects the charge of a particular response chunk. This header matches the total cost of the traversal only for requests that consist of a single response chunk. For the majority of complex traversals, this value represents a partial cost. |
| x-ms-total-request-charge | double | 423.987 | Success and Failure | Amount of collection or database throughput consumed, in request units (RU/s or RUs), for the entire request. This header is present in every continuation for requests that have multiple chunks. It indicates the cumulative charge since the beginning of the request. The value of this header in the last chunk indicates the complete request charge. |
| x-ms-server-time-ms | double | 13.75 | Success and Failure | This header is included for latency troubleshooting purposes. It indicates the amount of time, in milliseconds, that the Cosmos DB Gremlin server took to execute and produce a partial response message. By comparing the value of this header to the overall request latency, applications can calculate the network latency overhead. |
| x-ms-total-server-time-ms | double | 130.512 | Success and Failure | Total time, in milliseconds, that the Cosmos DB Gremlin server took to execute the entire traversal. This header is included in every partial response. It represents the cumulative execution time since the start of the request. The last response indicates the total execution time. This header is useful for determining whether the client or the server is the source of latency. You can compare the traversal execution time measured on the client to the value of this header. |
| x-ms-status-code | long | 200 | Success and Failure | Indicates the internal reason for request completion or termination. Applications are advised to check the value of this header and take corrective action. |
| x-ms-substatus-code | long | 1003 | Failure Only | Cosmos DB is a multi-model database built on top of a unified storage layer. This header contains additional insight into the failure reason when a failure occurs within the lower layers of the high-availability stack. Applications are advised to store this header and provide it when contacting Cosmos DB customer support. Cosmos DB engineers can use the value of this header for quick troubleshooting. |
| x-ms-retry-after-ms | string (TimeSpan) | "00:00:03.9500000" | Failure Only | This header is a string representation of a .NET TimeSpan type. It is included only in responses to requests that failed because provisioned throughput was exhausted. The application should resubmit the traversal after the indicated period of time. |
| x-ms-activity-id | string (Guid) | "A9218E01-3A3A-4716-9636-5BD86B056613" | Success and Failure | Contains a unique server-side identifier of a request. The server assigns each request a unique identifier for tracking purposes. Applications should log the activity identifiers returned by the server for requests that customers may want to contact customer support about. Cosmos DB support personnel can find specific requests by these identifiers in Cosmos DB service telemetry. |
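The latency comparison described for x-ms-server-time-ms and x-ms-total-server-time-ms can be sketched as follows. This is a minimal illustration rather than part of any driver: the class name ResponseHeaderMetrics and both method names are hypothetical, and the client-side elapsed time is assumed to be measured by the application itself (for example with System.nanoTime()). The attribute map is assumed to come from the last response chunk, where the header holds the total execution time.

```java
import java.util.Map;

// Hypothetical helper, not part of the Gremlin driver.
final class ResponseHeaderMetrics {

    // Reads a numeric status attribute, tolerating Integer, Long, or Double values.
    // Returns the fallback when the attribute is missing or not numeric.
    static double numericAttribute(Map<String, Object> attributes, String name, double fallback) {
        Object value = attributes.get(name);
        return (value instanceof Number) ? ((Number) value).doubleValue() : fallback;
    }

    // Estimates the time spent outside the server (network transfer, client-side work),
    // in milliseconds, by subtracting the server-reported total from the client-measured latency.
    static double networkOverheadMs(double clientElapsedMs, Map<String, Object> lastChunkAttributes) {
        double serverMs = numericAttribute(lastChunkAttributes, "x-ms-total-server-time-ms", 0.0);
        return Math.max(0.0, clientElapsedMs - serverMs);
    }
}
```

A consistently large difference suggests that latency originates on the network or the client rather than in traversal execution on the server.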

Status codes

The most common status codes returned by the server are listed below.

| Status | Explanation |
| --- | --- |
| 401 | Error message "Unauthorized: Invalid credentials provided" is returned when the authentication password doesn't match the Cosmos DB account key. Navigate to your Cosmos DB Gremlin account in the Azure portal and confirm that the key is correct. |
| 404 | Returned for concurrent operations that attempt to delete and update the same edge or vertex simultaneously. Error message "Owner resource does not exist" indicates that the database or collection specified in the connection parameters, in /dbs/<database name>/colls/<collection or graph name> format, is incorrect. |
| 408 | "Server timeout" indicates that the traversal took more than 30 seconds and was canceled by the server. Optimize your traversals to run quickly by filtering vertices or edges on every hop of the traversal to narrow the search scope. |
| 409 | "Conflicting request to resource has been attempted. Retry to avoid conflicts." This usually happens when a vertex or an edge with the same identifier already exists in the graph. |
| 412 | Status code is complemented with error message "PreconditionFailedException": One of the specified pre-condition is not met. This error indicates an optimistic concurrency control violation between reading an edge or vertex and writing it back to the store after modification. It most commonly occurs during property modification, for example g.V('identifier').property('name','value'). The Gremlin engine reads the vertex, modifies it, and writes it back. If another traversal running in parallel tries to write the same vertex or edge, one of them receives this error. The application should submit the traversal to the server again. |
| 429 | The request was throttled and should be retried after the interval given in x-ms-retry-after-ms. |
| 500 | An error message that contains "NotFoundException: Entity with the specified id does not exist in the system." indicates that a database and/or collection was re-created with the same name. This error disappears within 5 minutes as the change propagates and invalidates caches in different Cosmos DB components. To avoid this issue, use unique database and collection names every time. |
| 1000 | Returned when the server successfully parsed a message but wasn't able to execute it. It usually indicates a problem with the query. |
| 1001 | Returned when the server completes traversal execution but fails to serialize the response back to the client. This error can happen when the traversal generates a complex result that is too large or does not conform to the TinkerPop protocol specification. The application should simplify the traversal when it encounters this error. |
| 1003 | "Query exceeded memory limit. Bytes Consumed: XXX, Max: YYY" is returned when the traversal exceeds the allowed memory limit. The memory limit is 2 GB per traversal. |
| 1004 | Indicates a malformed graph request. A request can be malformed when it fails deserialization, when a non-value type is deserialized as a value type, or when an unsupported Gremlin operation is requested. The application should not retry the request because it will not succeed. |
| 1007 | Usually returned with error message "Could not process request. Underlying connection has been closed.". This situation can happen if the client driver attempts to use a connection that is being closed by the server. The application should retry the traversal on a different connection. |
| 1008 | The Cosmos DB Gremlin server can terminate connections to rebalance traffic in the cluster. Client drivers should handle this situation and use only live connections to send requests to the server. Occasionally, client drivers may not detect that a connection was closed. When the application encounters the error "Connection is too busy. Please retry after sometime or open more connections.", it should retry the traversal on a different connection. |
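One way an application might turn the guidance in this table into code is a small classification function. The sketch below is an assumption-laden reading of the table, not an official policy: the class GremlinRetryPolicy and the Action names are hypothetical, and any code for which the table does not explicitly recommend a retry (including 409, 500, and 1004) is conservatively mapped to DO_NOT_RETRY.

```java
// Hypothetical helper, not part of the Gremlin driver.
final class GremlinRetryPolicy {

    enum Action { RETRY_NOW, RETRY_AFTER_DELAY, RETRY_ON_NEW_CONNECTION, DO_NOT_RETRY }

    // Maps a value of x-ms-status-code to a suggested follow-up action.
    static Action classify(int statusCode) {
        switch (statusCode) {
            case 412:   // optimistic concurrency violation: submit the traversal again
                return Action.RETRY_NOW;
            case 429:   // throttled: wait for the interval in x-ms-retry-after-ms
                return Action.RETRY_AFTER_DELAY;
            case 1007:  // underlying connection has been closed
            case 1008:  // connection terminated or too busy
                return Action.RETRY_ON_NEW_CONNECTION;
            default:    // for example 1004 (malformed request), which will not succeed on retry
                return Action.DO_NOT_RETRY;
        }
    }
}
```

Applications with different requirements may choose to handle some of the default cases differently, for example by retrying a 500 NotFoundException after the cache propagation window has passed.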

Samples

A sample client application based on Gremlin.Net that reads status attributes:

// Following example reads a status code and total request charge from server response attributes.
// Variable "server" is assumed to be assigned to an instance of a GremlinServer that is connected to Cosmos DB account.
using (GremlinClient client = new GremlinClient(server, new GraphSON2Reader(), new GraphSON2Writer(), GremlinClient.GraphSON2MimeType))
{
  ResultSet<dynamic> responseResultSet = await GremlinClientExtensions.SubmitAsync<dynamic>(client, requestScript: "g.V().count()");
  long statusCode = (long)responseResultSet.StatusAttributes["x-ms-status-code"];
  double totalRequestCharge = (double)responseResultSet.StatusAttributes["x-ms-total-request-charge"];

  // Status code and request charge are logged into application telemetry.
}

An example that demonstrates how to read status attributes from the Gremlin Java client:

// Requires java.time.Duration, java.time.LocalTime, java.util.List, java.util.Map,
// and the TinkerPop driver classes Result, ResultSet, and ResponseException.
try {
  ResultSet resultSet = this.client.submit("g.addV().property('id', '13')");
  List<Result> results = resultSet.all().get();

  // Process and consume results

} catch (ResponseException re) {
  // Check for known errors that need to be retried or skipped
  if (re.getStatusAttributes().isPresent()) {
    Map<String, Object> attributes = re.getStatusAttributes().get();
    // Read the status code as a Number so that Integer and Long values are both handled
    int statusCode = ((Number) attributes.getOrDefault("x-ms-status-code", -1)).intValue();

    // Now we can check for specific conditions
    if (statusCode == 409) {
      // Handle conflicting writes
    }

    // Check if we need to delay retry
    if (attributes.containsKey("x-ms-retry-after-ms")) {
      // Read the value of the attribute as is
      String retryAfterTimeSpan = (String) attributes.get("x-ms-retry-after-ms");

      // Convert the value into an actionable duration
      LocalTime localTime = LocalTime.parse(retryAfterTimeSpan);
      Duration duration = Duration.between(LocalTime.MIN, localTime);

      // Perform a retry after "duration" interval of time has elapsed
    }
  }
}
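The Java sample converts x-ms-retry-after-ms with LocalTime.parse, which accepts only time-of-day values below 24 hours. As a hedged alternative, the sketch below parses the constant .NET TimeSpan format, [d.]hh:mm:ss[.fffffff], directly into a Duration. The class RetryAfterParser is hypothetical, and negative TimeSpan values are assumed not to occur in this header.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Gremlin driver.
final class RetryAfterParser {

    // Optional day count, then hours:minutes:seconds, then an optional fraction of up to 7 digits.
    private static final Pattern TIME_SPAN =
        Pattern.compile("(?:(\\d+)\\.)?(\\d{1,2}):(\\d{2}):(\\d{2})(?:\\.(\\d{1,7}))?");

    static Duration parse(String timeSpan) {
        Matcher m = TIME_SPAN.matcher(timeSpan.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized TimeSpan: " + timeSpan);
        }
        long days = m.group(1) == null ? 0 : Long.parseLong(m.group(1));
        long hours = Long.parseLong(m.group(2));
        long minutes = Long.parseLong(m.group(3));
        long seconds = Long.parseLong(m.group(4));
        // Right-pad the fraction to 9 digits so that it can be read as nanoseconds.
        String fraction = m.group(5) == null ? "" : m.group(5);
        long nanos = Long.parseLong((fraction + "000000000").substring(0, 9));
        return Duration.ofDays(days).plusHours(hours).plusMinutes(minutes)
                       .plusSeconds(seconds).plusNanos(nanos);
    }
}
```

For the sample header value "00:00:03.9500000", RetryAfterParser.parse(...).toMillis() yields 3950, which can be passed to a sleep or a scheduler before the traversal is resubmitted.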

Next steps