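Generate unique increasing numeric values

Use `zipWithIndex()` in a Resilient Distributed Dataset (RDD)

The `zipWithIndex()` function is only available on RDDs; you cannot call it directly on a DataFrame. Convert the DataFrame to an RDD, apply `zipWithIndex()`, and then convert the result back to a DataFrame.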

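``````
from pyspark.sql.functions import col

df = spark.createDataFrame(
    [('Alice', '10'), ('Susan', '12')],
    ['Name', 'Age']
)

# zipWithIndex() pairs each Row with its index; toDF() names the two columns _1 and _2
df1 = df.rdd.zipWithIndex().toDF()
# Expand the original columns out of the struct _1 and rename the index column
df2 = df1.select(col("_1.*"), col("_2").alias('increasing_id'))
df2.show()
``````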

``````
+-----+---+-------------+
| Name|Age|increasing_id|
+-----+---+-------------+
|Alice| 10|            0|
|Susan| 12|            1|
+-----+---+-------------+
``````
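According to the Spark API documentation, `zipWithIndex()` orders indexes first by partition index and then by the position of each item within its partition, and it has to run a Spark job when the RDD has more than one partition. The following is a minimal plain-Python sketch of that numbering scheme, not Spark's actual implementation. The function name `zip_with_index`, the three-partition layout, and the extra `Bob` row are assumptions made only for illustration.

```python
from itertools import accumulate


def zip_with_index(partitions):
    """Plain-Python model of zipWithIndex(): number by partition, then by position."""
    # A partition's starting offset is the total size of all partitions before it.
    # Computing these sizes is why Spark needs an extra job for multi-partition RDDs.
    sizes = [len(part) for part in partitions]
    offsets = [0] + list(accumulate(sizes))[:-1]
    return [
        [(item, offset + i) for i, item in enumerate(part)]
        for part, offset in zip(partitions, offsets)
    ]


# Hypothetical data spread over three partitions; the middle partition is empty.
partitions = [[("Alice", "10")], [], [("Susan", "12"), ("Bob", "15")]]
indexed = zip_with_index(partitions)
print(indexed)
```

Unlike the IDs produced by `monotonically_increasing_id()`, the resulting indexes are consecutive across partitions: empty partitions leave no gaps.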

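Use `monotonically_increasing_id()` for unique, but not consecutive, numbers

The `monotonically_increasing_id()` function generates monotonically increasing 64-bit integers. The values are guaranteed to be unique and increasing, but they are not consecutive.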

``````
from pyspark.sql.functions import monotonically_increasing_id

# Add a unique, increasing (but not consecutive) 64-bit ID to every row
df_with_increasing_id = df.withColumn("monotonically_increasing_id", monotonically_increasing_id())
df_with_increasing_id.show()
``````

``````
+-----+---+---------------------------+
| Name|Age|monotonically_increasing_id|
+-----+---+---------------------------+
|Alice| 10|                 8589934592|
|Susan| 12|                25769803776|
+-----+---+---------------------------+
``````
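The large gaps between these values follow from how the IDs are built. The Spark API documentation describes the current implementation as placing the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits. This layout is an implementation detail and may change. Assuming it holds, a small plain-Python sketch can decode the two values shown above; the helper name `split_id` is hypothetical.

```python
RECORD_BITS = 33  # assumed layout: the lower 33 bits hold the record number


def split_id(value):
    """Split an ID into (partition_id, record_number) under the assumed layout."""
    partition_id = value >> RECORD_BITS
    record_number = value & ((1 << RECORD_BITS) - 1)
    return partition_id, record_number


# The two IDs from the output above
for name, value in [("Alice", 8589934592), ("Susan", 25769803776)]:
    print(name, split_id(value))
# Alice (1, 0)
# Susan (3, 0)
```

Under this layout, each row is the first record (record number 0) of its partition, which is why the IDs differ by multiples of 2^33 instead of by 1.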

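Combine `monotonically_increasing_id()` with `row_number()` for two columns

The `row_number()` function generates consecutive numbers. Ordering a window by the `monotonically_increasing_id` column lets `row_number()` assign consecutive values in that order. Because this window has no partitioning, Spark moves all rows to a single partition to compute it, which can be slow on large datasets.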

``````
from pyspark.sql.functions import col, row_number
from pyspark.sql.window import Window

# Order by the unique ID so that row_number() assigns 1, 2, 3, ... in that order
window = Window.orderBy(col('monotonically_increasing_id'))
df_with_consecutive_increasing_id = df_with_increasing_id.withColumn('increasing_id', row_number().over(window))
df_with_consecutive_increasing_id.show()
``````

``````
+-----+---+---------------------------+-------------+
| Name|Age|monotonically_increasing_id|increasing_id|
+-----+---+---------------------------+-------------+
|Alice| 10|                 8589934592|            1|
|Susan| 12|                25769803776|            2|
+-----+---+---------------------------+-------------+
``````

To continue numbering from an existing maximum value, add that value to the consecutive ID:

``````
from pyspark.sql.functions import col, lit

# Offset the consecutive IDs so they continue after the previous maximum
previous_max_value = 1000
df_with_consecutive_increasing_id.withColumn("consecutive_increase", col("increasing_id") + lit(previous_max_value)).show()
``````

``````
+-----+---+---------------------------+-------------+--------------------+
| Name|Age|monotonically_increasing_id|increasing_id|consecutive_increase|
+-----+---+---------------------------+-------------+--------------------+
|Alice| 10|                 8589934592|            1|                1001|
|Susan| 12|                25769803776|            2|                1002|
+-----+---+---------------------------+-------------+--------------------+
``````