Returns the estimated number of unique values given the binary representation of a Datasketches HllSketch.
Syntax
```python
from pyspark.sql import functions as sf
sf.hll_sketch_estimate(col)
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| `col` | `pyspark.sql.Column` or `str` | The HLL sketch binary representation. |
Returns
pyspark.sql.Column: The estimated number of unique values for the HllSketch.
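The result is an estimate rather than an exact count because an HLL sketch keeps only a small array of registers, not the values themselves. As a rough illustration of how a count can be recovered from such registers, the pure-Python sketch below implements the classic HyperLogLog estimator (Flajolet et al.) with a linear-counting correction for small cardinalities. It is an assumption-laden teaching aid: `toy_hll_estimate` and `lg_k` are hypothetical names, the BLAKE2b hash is an arbitrary choice, and DataSketches' HllSketch uses its own hashing, binary layout, and estimators, so this code cannot read the binary sketches that Spark produces. `lg_k` plays a role similar to `lgConfigK` (the `12` in the example's output column name): the sketch has `2**lg_k` registers.

```python
import hashlib
import math


def _hash64(value) -> int:
    """Map a value to a 64-bit integer (BLAKE2b chosen arbitrarily for this sketch)."""
    digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def toy_hll_estimate(values, lg_k: int = 12) -> float:
    """Hypothetical helper: classic HyperLogLog estimate over an iterable of values.

    Illustration only; this is NOT the DataSketches HllSketch algorithm or format.
    """
    m = 1 << lg_k                      # number of registers
    registers = [0] * m
    tail_bits = 64 - lg_k
    for v in values:
        h = _hash64(v)
        idx = h >> tail_bits           # top lg_k bits select a register
        rest = h & ((1 << tail_bits) - 1)
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = tail_bits - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    alpha = 0.7213 / (1 + 1.079 / m)   # bias-correction constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        # Small-range correction: linear counting on the empty registers.
        return m * math.log(m / zeros)
    return raw


print(round(toy_hll_estimate([1, 2, 2, 3])))
print(round(toy_hll_estimate(range(100_000))))
```

Duplicates never change a register (the same value always hashes to the same register and rank), which is why repeated values do not inflate the estimate.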
Examples
Example 1: Estimate unique values from an HLL sketch
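```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as sf
spark = SparkSession.builder.getOrCreate()
# A single INT column named "value"; 2 appears twice, so there are 3 unique values.
df = spark.createDataFrame([1, 2, 2, 3], "INT")
df.agg(sf.hll_sketch_estimate(sf.hll_sketch_agg("value"))).show()
```

```
+----------------------------------------------+
|hll_sketch_estimate(hll_sketch_agg(value, 12))|
+----------------------------------------------+
|                                             3|
+----------------------------------------------+
```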