monotonically_increasing_id

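Generates monotonically increasing 64-bit integers. The generated IDs are guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits. It assumes that the DataFrame has fewer than 1 billion partitions and that each partition has fewer than 8 billion records.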
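To make the bit layout concrete, here is a minimal pure-Python sketch of the arithmetic described above. It is not Spark's own code. It packs a partition index and a per-partition record number into one ID and unpacks it again. The helper names `encode_id` and `decode_id` are hypothetical. Because Python integers are unbounded, the sketch does not model the limits of Spark's signed 64-bit column type.

```python
from typing import Tuple

# Hypothetical helpers illustrating the documented layout:
# upper bits = partition index, lower 33 bits = record number within the partition.
RECORD_BITS = 33
RECORD_MASK = (1 << RECORD_BITS) - 1  # 8589934591, all 33 low bits set


def encode_id(partition_index: int, record_number: int) -> int:
    """Pack (partition_index, record_number) into one ID (sketch only)."""
    if partition_index < 0:
        raise ValueError("partition index must be non-negative")
    if not 0 <= record_number <= RECORD_MASK:
        raise ValueError("record number does not fit in 33 bits")
    return (partition_index << RECORD_BITS) | record_number


def decode_id(generated_id: int) -> Tuple[int, int]:
    """Split an ID back into (partition_index, record_number)."""
    return generated_id >> RECORD_BITS, generated_id & RECORD_MASK


print(encode_id(1, 0))        # 8589934592, the first ID in partition 1
print(decode_id(8589934596))  # (1, 4)
```

Decoding this way recovers which partition a row was in when the ID was generated. That is only meaningful for that particular run, because the partitioning can differ between runs.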

Syntax

from pyspark.sql import functions as sf

sf.monotonically_increasing_id()

Returns

pyspark.sql.Column: a column of unique, monotonically increasing 64-bit integer IDs.

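Notes

The function is non-deterministic because its result depends on partition IDs.

As an example, consider a DataFrame with two partitions, each with 3 records. This expression would return the following IDs: 0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.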
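The following pure-Python sketch (not part of the PySpark API) reproduces the two-partition example and shows why the function is non-deterministic: the same six records receive different IDs depending on how they are split across partitions. The function name `simulate_ids` is hypothetical. It assumes records are numbered from 0 within each partition and that partitions are indexed in order.

```python
from typing import List


def simulate_ids(partition_sizes: List[int]) -> List[int]:
    """Return the IDs the documented layout would assign, partition by partition."""
    ids = []
    for partition_index, size in enumerate(partition_sizes):
        for record_number in range(size):
            # Partition index in the upper bits, record number in the lower 33 bits.
            ids.append((partition_index << 33) + record_number)
    return ids


print(simulate_ids([3, 3]))  # [0, 1, 2, 8589934592, 8589934593, 8589934594]
print(simulate_ids([6]))     # [0, 1, 2, 3, 4, 5]
```

In both cases the IDs are unique and increasing. With two partitions, the jump at the partition boundary is why they are not consecutive.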

Examples

Example 1: Generate monotonically increasing IDs

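from pyspark.sql import SparkSession
from pyspark.sql import functions as sf
spark = SparkSession.builder.getOrCreate()
# ids 0..9 (end exclusive), step 1, split into 2 partitions
spark.range(0, 10, 1, 2).select(
    "*",
    sf.spark_partition_id(),
    sf.monotonically_increasing_id()).show()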

+---+--------------------+-----------------------------+
| id|SPARK_PARTITION_ID()|monotonically_increasing_id()|
+---+--------------------+-----------------------------+
|  0|                   0|                            0|
|  1|                   0|                            1|
|  2|                   0|                            2|
|  3|                   0|                            3|
|  4|                   0|                            4|
|  5|                   1|                   8589934592|
|  6|                   1|                   8589934593|
|  7|                   1|                   8589934594|
|  8|                   1|                   8589934595|
|  9|                   1|                   8589934596|
+---+--------------------+-----------------------------+

Last updated on 2026-04-20
