SELECTSELECT

从一个或多个表中检索结果集。Retrieves result sets from one or more tables.

语法Syntax

[ WITH with_query [ , ... ] ]
select_statement [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select_statement, ... ]
    [ ORDER BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ... ] } ]
    [ SORT BY { expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ... ] } ]
    [ CLUSTER BY { expression [ , ... ] } ]
    [ DISTRIBUTE BY { expression [, ... ] } ]
    [ WINDOW { named_window [ , WINDOW named_window, ... ] } ]
    [ LIMIT { ALL | expression } ]

select_statement 被定义为While select_statement is defined as

SELECT [ hints , ... ] [ ALL | DISTINCT ] { named_expression [ , ... ] }
    FROM { from_item [ , ... ] }
    [ PIVOT clause ]
    [ LATERAL VIEW clause ] [ ... ]
    [ WHERE boolean_expression ]
    [ GROUP BY expression [ , ... ] ]
    [ HAVING boolean_expression ]

参数Parameters

  • with_querywith_query

    主查询块前面的一个或多个通用表表达式。稍后可在 FROM 子句中引用这些表表达式。One or more common table expressions before the main query block.These table expressions can be referenced later in the FROM clause. 若要在 FROM 子句中找出重复的子查询块,这很有用,它还能提高查询的可读性。This is useful to abstract out repeated subquery blocks in the FROM clause and improves readability of the query.

  • 提示hints

    提示可帮助 Spark 优化器作出更好的计划决策。Hints help the Spark optimizer make better planning decisions. Spark 支持使用会影响联接策略的选择和数据的重新分区的提示。Spark supports hints that influence selection of join strategies and repartitioning of the data.

  • ALLALL

    选择关系中的所有匹配行。Select all matching rows from the relation. 默认情况下启用。Enabled by default.

  • DISTINCTDISTINCT

    删除结果中的重复项后,从关系中选择所有匹配的行。Select all matching rows from the relation after removing duplicates in results.

  • named_expressionnamed_expression

    具有指定名称的表达式。An expression with an assigned name. 表示列表达式。Denotes a column expression.

    语法: expression [AS] [alias]Syntax: expression [AS] [alias]

  • from_itemfrom_item

    查询的输入源。A source of input for the query. 下列类型作之一:One of the following:

  • PIVOTPIVOT

    用于数据透视;可根据特定的列值获取聚合值。Used for data perspective; you can get the aggregated values based on specific column value.

  • LATERAL VIEWLATERAL VIEW

    与生成器函数(例如 EXPLODE)结合使用,将生成包含一个或多个行的虚拟表。Used in conjunction with generator functions such as EXPLODE, which generates a virtual table containing one or more rows. LATERAL VIEW 将行应用于每个原始输出行。LATERAL VIEW applies the rows to each original output row.

  • WHEREWHERE

    根据提供的谓词筛选 FROM 子句的结果。Filters the result of the FROM clause based on the supplied predicates.

  • GROUP BYGROUP BY

    用于对行进行分组的表达式。The expressions that are used to group the rows. 它与聚合函数(MINMAXCOUNTSUMAVG)结合使用,基于分组表达式和每个组中的聚合值对行进行分组。This is used in conjunction with aggregate functions (MIN, MAX, COUNT, SUM, AVG) to group rows based on the grouping expressions and aggregate values in each group. FILTER 子句附加到聚合函数时,只会将匹配的行传递给该函数。When a FILTER clause is attached to an aggregate function, only the matching rows are passed to that function.

  • HAVINGHAVING

    用于筛选由 GROUP BY 生成的行的谓词。The predicates by which the rows produced by GROUP BY are filtered. HAVING 子句用于在执行分组后对行进行筛选。The HAVING clause is used to filter rows after the grouping is performed. 如果指定的 HAVING 不带 GROUP BY,则表示不带分组表达式的 GROUP BY(全局聚合)。If you specify HAVING without GROUP BY, it indicates a GROUP BY without grouping expressions (global aggregate).

  • ORDER BYORDER BY

    查询的完整结果集的行顺序。An ordering of the rows of the complete result set of the query. 跨分区对输出行进行排序。The output rows are ordered across the partitions. 此参数与 SORT BYCLUSTER BYDISTRIBUTE BY 互斥,不能一起指定。This parameter is mutually exclusive with SORT BY, CLUSTER BY, and DISTRIBUTE BY and cannot be specified together.

  • SORT BYSORT BY

    用于对每个分区中的行进行排序的排序依据。An ordering by which the rows are ordered within each partition. 此参数与 ORDER BYCLUSTER BY 互斥,不能一起指定。This parameter is mutually exclusive with ORDER BY and CLUSTER BY and cannot be specified together.

  • CLUSTER BYCLUSTER BY

    一组表达式,用于对行进行重新分区和排序。A set of expressions that is used to repartition and sort the rows. 使用此子句与同时使用 DISTRIBUTE BYSORT BY 的效果相同。Using this clause has the same effect of using DISTRIBUTE BY and SORT BY together.

  • DISTRIBUTE BYDISTRIBUTE BY

    一组表达式,作为对结果行进行重新分区的依据。A set of expressions by which the result rows are repartitioned. 此参数与 ORDER BYCLUSTER BY 互斥,不能一起指定。This parameter is mutually exclusive with ORDER BY and CLUSTER BY and cannot be specified together.

  • LIMITLIMIT

    语句或子查询可返回的最大行数。The maximum number of rows that can be returned by a statement or subquery. 此子句通常与 ORDER BY 结合使用来生成确定的结果。This clause is mostly used in the conjunction with ORDER BY to produce a deterministic result.

  • boolean_expressionboolean_expression

    计算得出 Boolean 结果类型的任何表达式。Any expression that evaluates to a result type Boolean. 可使用逻辑运算符(ANDOR)将两个或多个表达式组合在一起。You can combine two or more expressions together using the logical operators ( AND, OR ).

  • expressionexpression

    计算得出值的一个或多个值、运算符和 SQL 函数的组合。A combination of one or more values, operators, and SQL functions that evaluates to a value.

  • named_windownamed_window

    一个或多个源窗口规范的别名。Aliases for one or more source window specifications. 可在查询的窗口定义中引用源窗口规范。The source window specifications can be referenced in the window definitions in the query.