bitmap_or_agg
aggregate function
Applies to: Databricks SQL Databricks Runtime 13.3 LTS and above
Returns the bitwise OR
of all BINARY
input values in the group.
The function is commonly used to count the number of distinct integral numbers in combination with the bitmap_bucket_number(), bitmap_count(), bitmap_bit_position(), and bitmap_construct_agg() functions.
To aggregate bitmaps in form of integral numerics, use the bit_or() aggregate function.
To aggregate bit positions into a BINARY
bitmap, use bitmap_construct_agg().
Syntax
bitmap_or_agg(expr) [FILTER ( WHERE cond ) ]
This function can also be invoked as a window function using the OVER
clause.
Arguments
expr
: ABINARY
of length<= 4096
representing a bitmap. Azure Databricks truncates longer values to 4096.cond
: An optional boolean expression filtering the rows used for aggregation.
Returns
A BINARY
of length 4096.
Examples
> SELECT hex(trim(TRAILING X'00' FROM bitmap_or_agg(val))) FROM VALUES(x'10'), (x'02') AS T(val);
12
-- Count the number of distinct values across two tables
> SELECT sum(num_distinct) AS num_distinct
FROM (SELECT bucket, bitmap_count(bitmap_or_agg(num_distinct)) AS num_distinct
FROM ((SELECT bitmap_bucket_number(val) AS bucket,
bitmap_construct_agg(bitmap_bit_position(val)) AS num_distinct
FROM VALUES(1), (2), (1), (-1), (5), (0), (5) AS t(val)
GROUP BY ALL)
UNION ALL
(SELECT bitmap_bucket_number(val) AS bucket,
bitmap_construct_agg(bitmap_bit_position(val)) AS num_distinct
FROM VALUES(3), (1), (-1), (6), (5), (1), (5), (8) AS t(val)
GROUP BY ALL))
GROUP BY ALL);
8