median

Returns the median of the values in a group.

Syntax

from pyspark.sql import functions as sf

sf.median(col)

Parameters

Parameter    Type                                  Description
col          pyspark.sql.Column or column name     The target column to compute the median of.

Returns

pyspark.sql.Column: the median of the values in a group.

Examples

from pyspark.sql import functions as sf
df = spark.createDataFrame([
    ("Java", 2012, 20000), ("dotNET", 2012, 5000),
    ("Java", 2012, 22000), ("dotNET", 2012, 10000),
    ("dotNET", 2013, 48000), ("Java", 2013, 30000)],
    schema=("course", "year", "earnings"))
df.groupby("course").agg(sf.median("earnings")).show()
+------+----------------+
|course|median(earnings)|
+------+----------------+
|  Java|         22000.0|
|dotNET|         10000.0|
+------+----------------+
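As a sanity check on the output above, the same per-group medians can be reproduced without Spark. The following is a minimal sketch in plain Python, using the standard library's statistics.median on the same rows. The names rows, earnings_by_course, and medians are illustrative and are not part of the PySpark API.

```python
from collections import defaultdict
from statistics import median

# Same rows as the DataFrame in the example: (course, year, earnings).
rows = [
    ("Java", 2012, 20000), ("dotNET", 2012, 5000),
    ("Java", 2012, 22000), ("dotNET", 2012, 10000),
    ("dotNET", 2013, 48000), ("Java", 2013, 30000),
]

# Group the earnings by course, mirroring groupby("course").
earnings_by_course = defaultdict(list)
for course, _year, earnings in rows:
    earnings_by_course[course].append(earnings)

# Take the median of each group; cast to float to match the displayed output.
medians = {
    course: float(median(values))
    for course, values in earnings_by_course.items()
}
print(medians)  # {'Java': 22000.0, 'dotNET': 10000.0}
```

Each course has three rows here, so the median is simply the middle value once the earnings are sorted: 22000 for Java (20000, 22000, 30000) and 10000 for dotNET (5000, 10000, 48000).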