Frequently asked questions about the Cassandra API in Azure Cosmos DB

Applies to: Cassandra API

This article describes the functionality differences between Apache Cassandra and the Cassandra API in Azure Cosmos DB. It also answers frequently asked questions about the Cassandra API in Azure Cosmos DB.

Key differences between Apache Cassandra and the Cassandra API

  • Apache Cassandra recommends a 100-MB limit on the size of a partition (the data stored under one partition key). The Cassandra API for Azure Cosmos DB allows up to 20 GB per partition.
  • Apache Cassandra allows you to disable durable commits. You can skip writing to the commit log and go directly to the memtables. This can lead to data loss if the node goes down before memtables are flushed to SSTables on disk. Azure Cosmos DB always does durable commits to help prevent data loss.
  • Apache Cassandra can see diminished performance if the workload involves many replacements or deletions. The reason is tombstones, which the read workload needs to skip over to fetch the latest data. The Cassandra API won't see diminished read performance when the workload has many replacements or deletions.
  • During scenarios of high replacement workloads, compaction needs to run to merge SSTables on disk. (A merge is needed because Apache Cassandra's writes are append only. Multiple updates are stored as individual SSTable entries that need to be periodically merged.) This situation can also lead to lowered read performance during compaction. This performance impact doesn't happen in the Cassandra API because the API doesn't implement compaction.
  • Setting a replication factor of 1 is possible with Apache Cassandra. However, it leads to low availability if the only node with the data goes down. This is not an issue with the Cassandra API for Azure Cosmos DB because there is always a replication factor of 4 (quorum of 3).
  • Adding or removing nodes in Apache Cassandra requires manual intervention, along with high CPU usage on the new node while existing nodes move some of their token ranges to the new node. The same is true when you decommission an existing node. The Cassandra API, however, scales out without any issues observed in the service or application.
  • There is no need to set num_tokens on each node in the cluster as in Apache Cassandra. Azure Cosmos DB fully manages nodes and token ranges.
  • The Cassandra API is fully managed. You don't need the nodetool commands, such as repair and decommission, that are used in Apache Cassandra.

Other frequently asked questions

What protocol version does the Cassandra API support?

The Cassandra API for Azure Cosmos DB supports CQL version 3.x. Its CQL compatibility is based on the public Apache Cassandra GitHub repository. If you have feedback about supporting other protocols, let us know via UserVoice feedback or send email to askcosmosdbcassandra@microsoft.com.

Why is choosing throughput for a table a requirement?

Azure Cosmos DB sets the default throughput for your container based on where you create the table from: the Azure portal or CQL.

Azure Cosmos DB provides guarantees for performance and latency, with upper bounds on operations. These guarantees are possible when the engine can enforce governance on the tenant's operations. Setting throughput ensures that you get the guaranteed throughput and latency, because the platform reserves this capacity and guarantees operation success. You can elastically change throughput to benefit from the seasonality of your application and save costs.

The throughput concept is explained in the Request Units in Azure Cosmos DB article. The throughput for a table is equally distributed across the underlying physical partitions.
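
As a rough illustration of the even split described above, the following Python sketch computes the share of a table's throughput that each physical partition receives. The function name and the sample figures (10,000 RU/s over 5 partitions) are hypothetical; the service chooses the actual number of physical partitions.

```python
def per_partition_throughput(table_ru_per_sec: float, physical_partitions: int) -> float:
    """Return the RU/s available to each physical partition under an even split."""
    if physical_partitions < 1:
        raise ValueError("physical_partitions must be at least 1")
    return table_ru_per_sec / physical_partitions


# Hypothetical table: 10,000 RU/s provisioned, spread over 5 physical partitions.
share = per_partition_throughput(10_000, 5)
print(share)  # 2000.0
```

Under these assumptions, a workload that sends most requests to one partition can use only about 2,000 RU/s of the 10,000 RU/s provisioned, which is why an even key distribution matters.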

What is the throughput of a table that's created through CQL?

Azure Cosmos DB uses Request Units per second (RU/s) as the currency for throughput. Tables created through CQL have 400 RU/s by default. You can change the throughput from the Azure portal. The following examples set the throughput when the table is created.

CQL

CREATE TABLE keyspaceName.tablename (user_id int PRIMARY KEY, lastname text) WITH cosmosdb_provisioned_throughput=1200

.NET

using System.Collections.Generic;
using System.Text;
using Cassandra;
// Throughput (RU/s) to provision for the new table.
int provisionedThroughput = 400;
var simpleStatement = new SimpleStatement($"CREATE TABLE {keyspaceName}.{tableName} (user_id int PRIMARY KEY, lastname text)");
// Pass the throughput to the service in the statement's custom payload.
var outgoingPayload = new Dictionary<string, byte[]>();
outgoingPayload["cosmosdb_provisioned_throughput"] = Encoding.UTF8.GetBytes(provisionedThroughput.ToString());
simpleStatement.SetOutgoingPayload(outgoingPayload);

What happens when throughput is used up?

Azure Cosmos DB provides guarantees for performance and latency, with upper bounds on operations. These guarantees are possible when the engine can enforce governance on the tenant's operations. Setting throughput ensures that you get the guaranteed throughput and latency, because the platform reserves this capacity and guarantees operation success.

When you go over this capacity, you get the following error message, which indicates that your capacity was used up:

0x1001 Overloaded: the request can't be processed because "Request Rate is large"

It's essential to see which operations (and their volume) cause this issue. Metrics in the Azure portal show when consumed capacity goes over the provisioned capacity. Then you need to ensure that capacity is consumed nearly equally across all underlying partitions. If you see that one partition is consuming most of the throughput, your workload is skewed.

Metrics are available that show you how throughput is used over hours, over days, and per seven days, across partitions or in aggregate. For more information, see Monitoring and debugging with metrics in Azure Cosmos DB.

Diagnostic logs are explained in the Azure Cosmos DB diagnostic logging article.
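
The skew check described in this answer can be sketched in Python as follows. This is a minimal sketch, assuming you have already copied per-partition consumption figures out of the portal metrics into a dictionary. The partition names, the sample numbers, and the 50 percent threshold used to mean "most of the throughput" are all assumptions, not values defined by the service.

```python
from typing import Dict, Tuple


def hottest_partition(consumed_ru: Dict[str, float]) -> Tuple[str, float]:
    """Return (partition id, share of total consumption) for the busiest partition."""
    total = sum(consumed_ru.values())
    if total <= 0:
        raise ValueError("no consumption recorded")
    partition = max(consumed_ru, key=consumed_ru.get)
    return partition, consumed_ru[partition] / total


def is_skewed(consumed_ru: Dict[str, float], threshold: float = 0.5) -> bool:
    """Treat the workload as skewed when one partition exceeds the threshold share.

    The 0.5 default is an illustrative assumption, not a documented limit.
    """
    _, share = hottest_partition(consumed_ru)
    return share > threshold


# Hypothetical consumption figures (RU) per physical partition.
sample = {"p0": 900.0, "p1": 50.0, "p2": 50.0}
print(hottest_partition(sample))  # ('p0', 0.9)
print(is_skewed(sample))          # True
```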

Does the primary key map to the partition key concept of Azure Cosmos DB?

Yes, the partition key is used to place the entity in the right location. In Azure Cosmos DB, it's used to find the right logical partition, which is stored on a physical partition. The partitioning concept is explained in the Partition and scale in Azure Cosmos DB article. The essential takeaway here is that a logical partition shouldn't go over the 20-GB limit.

What happens when I get a notification that a partition is full?

Azure Cosmos DB is a system based on service-level agreements (SLAs). It provides unlimited scale, with guarantees for latency, throughput, availability, and consistency. This unlimited storage is based on horizontal scale-out of data, using partitioning as the key concept. The partitioning concept is explained in the Partition and scale in Azure Cosmos DB article.

You should adhere to the 20-GB limit on the data stored per logical partition. To ensure that your application scales well, we recommend that you not create a hot partition by storing all information in one partition and querying it. This error can occur only if your data is skewed: that is, you have a lot of data for one partition key (more than 20 GB). You can find the distribution of data by using the storage portal. The way to fix this error is to re-create the table and choose a more granular primary key (partition key), which allows better distribution of data.
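
To make the effect of a more granular key concrete, the Python sketch below groups the same rows by two candidate partition keys and reports which logical partitions would exceed 20 GB. The row layout (device ID, day, size in GB), the sample data, and the function names are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

LOGICAL_PARTITION_LIMIT_GB = 20.0  # limit stated in this article

Row = Tuple[str, str, float]  # (device_id, day, size_gb), a hypothetical layout


def partition_sizes(rows: Iterable[Row], key_fn: Callable[[Row], Hashable]) -> Dict[Hashable, float]:
    """Sum the stored size per logical partition for a candidate partition key."""
    sizes: Dict[Hashable, float] = defaultdict(float)
    for row in rows:
        sizes[key_fn(row)] += row[2]
    return dict(sizes)


def oversized(sizes: Dict[Hashable, float]) -> List[Hashable]:
    """Return the partition keys whose data exceeds the 20-GB limit."""
    return sorted(k for k, v in sizes.items() if v > LOGICAL_PARTITION_LIMIT_GB)


rows = [
    ("sensor-a", "2021-01-01", 12.0),
    ("sensor-a", "2021-01-02", 11.0),
    ("sensor-b", "2021-01-01", 3.0),
]

by_device = partition_sizes(rows, lambda r: r[0])               # coarse key
by_device_day = partition_sizes(rows, lambda r: (r[0], r[1]))   # more granular key

print(oversized(by_device))      # ['sensor-a']
print(oversized(by_device_day))  # []
```

In this made-up data set, keying on the device alone puts 23 GB under one key, while keying on device and day keeps every logical partition under the limit.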

Can I use the Cassandra API as a key-value store with millions or billions of partition keys?

Azure Cosmos DB can store unlimited data by scaling out the storage. This storage is independent of the throughput. Yes, you can always use the Cassandra API just to store and retrieve keys and values by specifying the right primary/partition key. These individual keys get their own logical partition and sit atop a physical partition without issues.

Can I create more than one table with the Cassandra API?

Yes, it's possible to create more than one table with the Cassandra API. Each of those tables is treated as a unit for throughput and storage.

Can I create more than one table in succession?

Azure Cosmos DB is a resource-governed system for both data and control plane activities. Containers, like collections and tables, are runtime entities that are provisioned for a given throughput capacity. The creation of these containers in quick succession isn't an expected activity and might be throttled. If you have tests that drop or create tables immediately, try to space them out.
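
One way to space out table creation in a test suite is a small helper that pauses between control-plane actions. The Python sketch below is only an illustration: the helper name, the 5-second default delay, and the idea of passing each DDL call as a callable are assumptions, and the article doesn't state how long the pause needs to be.

```python
import time
from typing import Callable, Iterable


def run_spaced(
    actions: Iterable[Callable[[], None]],
    delay_seconds: float = 5.0,  # illustrative default, not a documented value
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run each action in order, pausing between consecutive actions.

    Each action would typically wrap one CREATE TABLE or DROP TABLE call
    made through your Cassandra driver session. Returns the number run.
    """
    count = 0
    for action in actions:
        if count:  # pause before every action except the first
            sleep(delay_seconds)
        action()
        count += 1
    return count
```

The sleep parameter is injectable so that tests of the helper itself can record the pauses instead of waiting.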

What is the maximum number of tables that I can create?

There's no physical limit on the number of tables. If you need to create a large number of tables (where the total steady size goes over 10 TB of data), rather than the usual tens or hundreds, send email to askcosmosdbcassandra@microsoft.com.

What is the maximum number of keyspaces that I can create?

There's no physical limit on the number of keyspaces because they're metadata containers. If you have a large number of keyspaces, send email to askcosmosdbcassandra@microsoft.com.

Can I bring in a lot of data after starting from a normal table?

Yes. Assuming uniformly distributed partitions, the storage capacity is automatically managed and increases as you push in more data. So you can confidently import as much data as you need without managing and provisioning nodes. But if you're anticipating a lot of immediate data growth, it makes more sense to provision the anticipated throughput up front rather than starting lower and increasing it immediately.

Can I use YAML file settings to configure API behavior?

The Cassandra API for Azure Cosmos DB provides protocol-level compatibility for executing operations. It hides away the complexity of management, monitoring, and configuration. As a developer or user, you don't need to worry about availability, tombstones, key cache, row cache, bloom filter, and a multitude of other settings. The Cassandra API focuses on providing the read and write performance that you need without the overhead of configuration and management.

Will the Cassandra API support node addition, cluster status, and node status commands?

The Cassandra API simplifies capacity planning and responding to the elasticity demands for throughput and storage. With Azure Cosmos DB, you provision the throughput that you need. Then you can scale it up and down any number of times through the day, without worrying about adding, deleting, or managing nodes. You don't need to use tools for node and cluster management.

What happens with various configuration settings for keyspace creation like simple/network?

Azure Cosmos DB provides multiple-region distribution out of the box for availability and low-latency reasons. You don't need to set up replicas or other things. Writes are always durably quorum committed in any region where you write, while providing performance guarantees.

What happens with various settings for table metadata?

Azure Cosmos DB provides performance guarantees for reads, writes, and throughput. So you don't need to worry about tuning configuration settings or accidentally changing them. Those settings include bloom filter, caching, read repair chance, gc_grace, compression, and memtable_flush_period.

Is time-to-live supported for Cassandra tables?

Yes, TTL is supported.

How can I monitor infrastructure along with throughput?

Azure Cosmos DB is a platform service that helps you increase productivity and not worry about managing and monitoring infrastructure. For example, you don't need to monitor node status, replica status, gc, and OS parameters with various tools, as you did earlier. You just need to watch the throughput that's available in portal metrics to see if you're getting throttled, and then increase or decrease that throughput.

Which client SDKs can work with the Cassandra API?

The Apache Cassandra client drivers that use CQLv3 were used for client programs. If you use other drivers or if you're facing issues, contact Azure support.

Are composite partition keys supported?

Yes, you can use regular syntax to create composite partition keys.
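
For reference, the regular CQL syntax wraps the partition key columns in an inner pair of parentheses inside PRIMARY KEY. The Python sketch below builds such a statement as a string. The helper, keyspace, table, and column names are hypothetical, and the helper does no identifier validation, so treat it as an illustration of the syntax rather than production code.

```python
from typing import Dict, Sequence


def create_table_cql(
    keyspace: str,
    table: str,
    columns: Dict[str, str],
    partition_key: Sequence[str],
    clustering: Sequence[str] = (),
) -> str:
    """Build a CREATE TABLE statement with a composite partition key.

    Illustration only: identifiers are inserted as-is, without validation.
    """
    cols = ", ".join(f"{name} {cql_type}" for name, cql_type in columns.items())
    # The inner parentheses mark the columns that together form the partition key.
    key_parts = ["(" + ", ".join(partition_key) + ")", *clustering]
    return f"CREATE TABLE {keyspace}.{table} ({cols}, PRIMARY KEY ({', '.join(key_parts)}))"


statement = create_table_cql(
    "ks",
    "readings",
    {"device_id": "text", "day": "text", "ts": "timestamp", "value": "double"},
    partition_key=["device_id", "day"],
    clustering=["ts"],
)
print(statement)
```

Here device_id and day together form the partition key, and ts is a clustering column within each partition.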

Can I use sstableloader for data loading?

No, sstableloader isn't supported.

Can I pair an on-premises Apache Cassandra cluster with the Cassandra API?

At present, Azure Cosmos DB has an optimized experience for a cloud environment without the overhead of operations. If you require pairing, contact Azure support with a description of your scenario. We're working on an offering to help pair an on-premises or cloud Cassandra cluster with the Cassandra API for Azure Cosmos DB.

Does the Cassandra API provide full backups?

Azure Cosmos DB provides two free full backups, taken at four-hour intervals, across all APIs. So you don't need to set up a backup schedule.

If you want to modify retention and frequency, send email to askcosmosdbcassandra@microsoft.com or raise a support case. Information about backup capability is provided in the Automatic online backup and restore with Azure Cosmos DB article.

How does the Cassandra API account handle failover if a region goes down?

The Cassandra API builds on the multiple-region distribution platform of Azure Cosmos DB. To ensure that your application can tolerate datacenter downtime, enable at least one more region for the account in the Azure portal. For more information, see High availability with Azure Cosmos DB.

You can add as many regions as you want for the account and control where it can fail over to by providing a failover priority. To use the database in a failover region, you need to provide an application there too. When you do so, your customers won't experience downtime.

Does the Cassandra API index all attributes of an entity by default?

No. The Cassandra API supports secondary indexes, which behave in a similar way to those in Apache Cassandra. The API does not index every attribute by default.

Can I use the new Cassandra API SDK locally with the emulator?

Yes, this is supported. You can find details on how to enable this in the Use the Azure Cosmos Emulator for local development and testing article.

How can I migrate data from Apache Cassandra clusters to Azure Cosmos DB?

You can read about migration options in the Migrate your data to Cassandra API account in Azure Cosmos DB tutorial.

Where can I give feedback on Cassandra API features?

Provide feedback via UserVoice feedback.

Next steps