MISSING_AGGREGATION error class
The non-aggregating expression <expression>
is based on columns which are not participating in the GROUP BY clause.
Add the columns or the expression to the GROUP BY, aggregate the expression, or use <expressionAnyValue>
if you do not care which of the values within a group is returned.
Parameters
- expression: Non aggregating, non grouping expression in the
SELECT
list. - expressionAnyValue:
expression
wrapped in an any_value() aggregate function.
Explanation
Within the context of a query with a GROUP BY clause, the local column-references in the SELECT list must be:
- Consumed as an argument to an aggregate function, or
- Part of an expression which matches an expression on the
GROUP BY
clause.
A local column reference is a column that has been resolved to a table-reference in the query's FROM clause.
In other words: Column-references must either be part of the grouping keys, or they must be part of the aggregation.
Azure Databricks matches expressions on best effort:
For example it will recognize: SELECT c1 + 5 FROM T GROUP BY 5 + c1
as mathing expressions.
But SELECT c1 FROM T GROUP BY c1 + 5
is not a match.
Mitigation
The mitigation of the error depends on the cause:
Did you miss a grouping column?
Add
expression
, or the relevant subexpression ofexpression
to theGROUP BY
clause.Is the column reference part of a
GROUP BY
expression which differs fromepression
?Match the expression in the
SELECT
list or simplify theGROUP BY
expression.Are you missing the aggregation?
Wrap the column reference with an aggregate function. If you only want a representative value from the group, you can use any_value(epression).
Examples
-- Sample data
> CREATE OR REPLACE TEMPORARY VIEW tasks(name, firstname, task, cost) AS
VALUES ('Smith' , 'Sam' , 'UNPIVOT', 10),
('Smith' , 'Sam' , 'LATERAL', 5),
('Shuster', 'Sally' , 'DELETE' , 7),
('Shuster', 'Sally' , 'GRANT' , 8);
-- `name` and `firstname` are part of teh group by coumns, but incomplete
> SELECT name, firstname, sum(cost) FROM tasks GROUP BY firstname || ' ' || name;
[MISSING_AGGREGATION] The expression "name" is neither present in the group by, nor is it an aggregate function.
-- Match the GROUP BY expression
> SELECT firstname || ' ' || name, sum(cost) FROM tasks GROUP BY firstname || ' ' || name;
Sam Smith 15
Sally Shuster 15
-- Break up the GROUP BY expression
> SELECT firstname, name, sum(cost) FROM tasks GROUP BY firstname, name;
Sam Smith 15
Sally Shuster 15
-- Missing grouping column
> SELECT name, firstname, sum(cost) FROM tasks GROUP BY name;
[MISSING_AGGREGATION] The expression "firstname" is neither present in the group by, nor is it an aggregate function.
-- Add the grouping column
> SELECT firstname, name, sum(cost) FROM tasks GROUP BY firstname, name;
Sam Smith 15
Sally Shuster 15
-- Missing aggregate
> SELECT firstname, name, sum(cost), task FROM tasks GROUP BY firstname, name;
[MISSING_AGGREGATION] The expression "task" is neither present in the group by, nor is it an aggregate function.
-- Add an aggregate
> SELECT firstname, name, sum(cost), array_agg(task) FROM tasks GROUP BY firstname, name;
Sam Smith 15 ["UNPIVOT","LATERAL"]
Sally Shuster 15 ["DELETE","GRANT"]
-- Return any task
> SELECT firstname, name, sum(cost), any_value(task) FROM tasks GROUP BY firstname, name;
Sam Smith 15 LATERAL
Sally Shuster 15 DELETE