Returns the estimated number of unique values, given the binary representation of an Apache DataSketches Theta sketch.
Syntax
```python
from pyspark.sql import functions as sf

sf.theta_sketch_estimate(col)
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| `col` | `pyspark.sql.Column` or str | The binary representation of a Theta sketch. |
Returns
pyspark.sql.Column: The estimated number of unique values for the Theta Sketch.
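Theta sketches build on the K-Minimum-Values (KMV) idea. Each value is hashed to a pseudo-uniform number in [0, 1), only the smallest hashes (those below a threshold theta) are kept, and the distinct count is estimated from how densely those hashes fill [0, theta). The toy class below illustrates that estimator. It is a simplified sketch, not the DataSketches implementation, and it cannot read the binary format that `theta_sketch_estimate` expects. The class name `KMVSketch`, the `blake2b`-based hash, and the `(k - 1) / theta` estimator are choices made only for this illustration.

```python
import hashlib
import heapq


def _normalized_hash(value) -> float:
    # Map a value to a pseudo-uniform float in [0, 1) using a 64-bit hash.
    digest = hashlib.blake2b(repr(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64


class KMVSketch:
    """Toy K-Minimum-Values sketch (illustrative only, not the DataSketches format).

    Keeps the k smallest distinct hashes seen so far.
    """

    def __init__(self, k: int = 4096):
        self.k = k
        self._heap = []        # negated hashes, so heap[0] holds the largest kept hash
        self._members = set()  # kept hashes, used to ignore duplicates

    def update(self, value) -> None:
        h = _normalized_hash(value)
        if h in self._members:
            return  # duplicate values never change the sketch
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Replace the largest kept hash with the new, smaller one.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self) -> float:
        if len(self._heap) < self.k:
            # Exact mode: fewer than k distinct hashes, so just count them.
            return float(len(self._heap))
        theta = -self._heap[0]  # the k-th smallest hash acts as the threshold
        return (self.k - 1) / theta


sketch = KMVSketch(k=4096)
for v in [1, 2, 2, 3]:
    sketch.update(v)
print(sketch.estimate())  # 3.0
```

With only three distinct values the toy sketch never fills up, so it reports the exact count. Past k distinct values it switches to the `(k - 1) / theta` estimate.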
Examples
Example 1: Estimate unique values from Theta sketch
```python
from pyspark.sql import functions as sf

# Assumes an active SparkSession named `spark`.
# A DataFrame built from the atomic "INT" schema has one column named "value".
df = spark.createDataFrame([1, 2, 2, 3], "INT")
df.agg(sf.theta_sketch_estimate(sf.theta_sketch_agg("value"))).show()
```

```
+--------------------------------------------------+
|theta_sketch_estimate(theta_sketch_agg(value, 12))|
+--------------------------------------------------+
|                                                 3|
+--------------------------------------------------+
```
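The `12` in the output column name is the sketch's log2 nominal-entries setting, which appears to be the default that `theta_sketch_agg` applies when none is given. Under the usual DataSketches conventions, that means k = 2^12 = 4096 nominal entries. Four rows with three distinct values therefore stay in exact mode, and the estimate is exactly 3. The helper names below are hypothetical. The `1/sqrt(k)` relative standard error is a commonly quoted rule of thumb for Theta sketches in estimation mode, not a guarantee documented for this function.

```python
import math


def nominal_entries(lg_nom_entries: int = 12) -> int:
    # Hypothetical helper: k = 2 ** lg_nom_entries (12 matches the column name above).
    return 1 << lg_nom_entries


def approx_relative_std_error(lg_nom_entries: int = 12) -> float:
    # Hypothetical helper: rule-of-thumb RSE of about 1 / sqrt(k) in estimation mode.
    return 1 / math.sqrt(nominal_entries(lg_nom_entries))


print(nominal_entries())             # 4096
print(approx_relative_std_error())   # 0.015625
```

Larger settings trade memory for accuracy. Each increment of the log2 setting doubles k and shrinks the rule-of-thumb error by a factor of about sqrt(2).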