MISSING_AGGREGATION error class

SQLSTATE: 42803

The non-aggregating expression <expression> is based on columns which are not participating in the GROUP BY clause.

Add the columns or the expression to the GROUP BY, aggregate the expression, or use <expressionAnyValue> if you do not care which of the values within a group is returned.

Parameters

  • expression: Non aggregating, non grouping expression in the SELECT list.
  • expressionAnyValue: expression wrapped in an any_value() aggregate function.

Explanation

Within the context of a query with a GROUP BY clause, the local column-references in the SELECT list must be:

  • Consumed as an argument to an aggregate function, or
  • Part of an expression which matches an expression on the GROUP BY clause.

A local column reference is a column that has been resolved to a table-reference in the query's FROM clause.

In other words: Column-references must either be part of the grouping keys, or they must be part of the aggregation.

Azure Databricks matches expressions on best effort: For example it will recognize: SELECT c1 + 5 FROM T GROUP BY 5 + c1 as mathing expressions. But SELECT c1 FROM T GROUP BY c1 + 5 is not a match.

Mitigation

The mitigation of the error depends on the cause:

  • Did you miss a grouping column?

    Add expression, or the relevant subexpression of expression to the GROUP BY clause.

  • Is the column reference part of a GROUP BY expression which differs from epression?

    Match the expression in the SELECT list or simplify the GROUP BY expression.

  • Are you missing the aggregation?

    Wrap the column reference with an aggregate function. If you only want a representative value from the group, you can use any_value(epression).

Examples

-- Sample data
> CREATE OR REPLACE TEMPORARY VIEW tasks(name, firstname, task, cost) AS
   VALUES ('Smith'  , 'Sam'   , 'UNPIVOT', 10),
          ('Smith'  , 'Sam'   , 'LATERAL',  5),
          ('Shuster', 'Sally' , 'DELETE' ,  7),
          ('Shuster', 'Sally' , 'GRANT'  ,  8);

-- `name` and `firstname` are part of teh group by coumns, but incomplete
> SELECT name, firstname, sum(cost) FROM tasks GROUP BY firstname || ' ' || name;
 [MISSING_AGGREGATION] The expression "name" is neither present in the group by, nor is it an aggregate function.

-- Match the GROUP BY expression
> SELECT firstname || ' ' || name, sum(cost) FROM tasks GROUP BY firstname || ' ' || name;
 Sam Smith      15
 Sally Shuster  15

-- Break up the GROUP BY expression
> SELECT firstname, name, sum(cost) FROM tasks GROUP BY firstname, name;
 Sam    Smith   15
 Sally  Shuster 15

-- Missing grouping column
> SELECT name, firstname, sum(cost) FROM tasks GROUP BY name;
 [MISSING_AGGREGATION] The expression "firstname" is neither present in the group by, nor is it an aggregate function.

-- Add the grouping column
> SELECT firstname, name, sum(cost) FROM tasks GROUP BY firstname, name;
 Sam    Smith   15
 Sally  Shuster 15

-- Missing aggregate
> SELECT firstname, name, sum(cost), task FROM tasks GROUP BY firstname, name;
 [MISSING_AGGREGATION] The expression "task" is neither present in the group by, nor is it an aggregate function.

-- Add an aggregate
> SELECT firstname, name, sum(cost), array_agg(task) FROM tasks GROUP BY firstname, name;
 Sam    Smith   15  ["UNPIVOT","LATERAL"]
 Sally  Shuster 15  ["DELETE","GRANT"]

-- Return any task
> SELECT firstname, name, sum(cost), any_value(task) FROM tasks GROUP BY firstname, name;
 Sam    Smith   15  LATERAL
 Sally  Shuster 15  DELETE