Beam Calcite SQL lexical structure
A Beam Calcite SQL statements are comprised of a series of tokens. Tokens include identifiers, quoted identifiers, literals, keywords, operators, and special characters. Tokens can be separated by whitespace (space, backspace, tab, newline) or comments.
Identifiers
Identifiers are names that are associated with columns, tables, and other database objects.
Identifiers must begin with a letter or an underscore. Subsequent
characters can be letters, numbers, or underscores. Quoted identifiers
are identifiers enclosed by backtick (`
) characters and can contain
any character, such as spaces or symbols. However, quoted identifiers
cannot be empty. Reserved Keywords can only be used
as identifiers if enclosed by backticks.
Syntax (defined here as a regular expression):
[A-Za-z_][A-Za-z_0-9]*
Examples:
Customers5
_dataField1
ADGROUP
Invalid examples:
5Customers
_dataField!
GROUP
5Customers
begins with a number, not a letter or underscore.
_dataField!
contains the special character “!” which is not a letter,
number, or underscore. GROUP
is a reserved keyword, and therefore
cannot be used as an identifier without being enclosed by backtick
characters.
Both identifiers and quoted identifiers are case insensitive, with some nuances. See Case Sensitivity for further details.
Quoted identifiers have the same escape sequences as string literals, defined below.
Literals
A literal represents a constant value of a built-in data type. Some, but not all, data types can be expressed as literals.
String Literals
Both string and bytes literals must be quoted with single
('
) quotation mark.
Quoted literals:
Literal | Examples | Description |
---|---|---|
Quoted string |
| Quoted strings enclosed by single (' ) quotes can contain unescaped double (" ) quotes.Two quotation marks ( '' ) is the escape sequence.Quoted strings can contain newlines. |
Integer Literals
Integer literals are either a sequence of decimal digits (0 through 9).
Integers can be prefixed by “+
” or “-
” to represent positive and
negative values, respectively.
Examples:
123
-123
An integer literal is interpreted as an BIGINT
.
Floating Point Literals
Syntax options:
[+-]DIGITS.[DIGITS][e[+-]DIGITS]
[DIGITS].DIGITS[e[+-]DIGITS]
DIGITSe[+-]DIGITS
DIGITS
represents one or more decimal numbers (0 through 9) and e
represents the exponent marker (e or E).
Examples:
123.456e-67
.1E4
58.
4e2
Numeric literals that contain either a decimal point or an exponent marker are presumed to be type double.
Implicit coercion of floating point literals to float type is possible if the value is within the valid float range.
There is no literal representation of NaN or infinity.
Array Literals
Array literals are a comma-separated lists of elements enclosed in
square brackets prefixed with the ARRAY
keyword.
Examples:
ARRAY[1, 2, 3]
ARRAY['x', 'y', 'xy']
Struct Literals
Syntax:
(elem[, elem...])
where elem
is an element in the struct. elem
must be a literal data
type, not an expression or column name.
The output type is an anonymous struct type (structs are not named types) with anonymous fields with types matching the types of the input expressions.
Example | Output Type |
---|---|
(1, 2, 3) | STRUCT<BIGINT,BIGINT,BIGINT> |
(1, 'abc') | STRUCT<BIGINT,STRING> |
Date Literals
Syntax:
DATE 'YYYY-M[M]-D[D]'
Date literals contain the DATE
keyword followed by a string literal
that conforms to the canonical date format, enclosed in single quotation
marks. Date literals support a range between the years 1 and 9999,
inclusive. Dates outside of this range are invalid.
For example, the following date literal represents September 27, 2014:
DATE '2014-09-27'
String literals in canonical date format also implicitly coerce to DATE type when used where a DATE-type expression is expected. For example, in the query
SELECT * FROM foo WHERE date_col = "2014-09-27"
the string literal "2014-09-27"
will be coerced to a date literal.
Time Literals
Syntax:
TIME '[H]H:[M]M:[S]S[.DDDDDD]]'
TIME literals contain the TIME
keyword and a string literal that
conforms to the canonical time format, enclosed in single quotation
marks.
For example, the following time represents 12:30 p.m.:
TIME '12:30:00.45'
Timestamp literals
Syntax:
TIMESTAMP 'YYYY-[M]M-[D]D [[H]H:[M]M:[S]S[.DDDDDD]]'
Timestamp literals contain the TIMESTAMP
keyword and a string literal
that conforms to the canonical timestamp format, enclosed in single
quotation marks.
Timestamp literals support a range between the years 1 and 9999, inclusive. Timestamps outside of this range are invalid.
For example, the following timestamp represents 12:30 p.m. on September 27, 2014:
TIMESTAMP '2014-09-27 12:30:00.45'
Case Sensitivity
Beam SQL follows these rules for case sensitivity:
Category | Case Sensitive? | Notes |
---|---|---|
Keywords | No | |
Function names | No | |
Table names | Yes | |
Column names | Yes | |
String values | Yes | |
String comparisons | Yes | |
Aliases within a query | No | |
Regular expression matching | See Notes | Regular expression matching is case sensitive by default, unless the expression itself specifies that it should be case insensitive. |
LIKE matching | Yes |
Reserved Keywords
Keywords are a group of tokens that have special meaning in the Beam SQL language, and have the following characteristics:
- Keywords cannot be used as identifiers unless enclosed by backtick (`) characters.
- Keywords are case insensitive.
Beam SQL has the following reserved keywords.
A ABS ABSOLUTE ACTION ADA ADD ADMIN AFTER ALL ALLOCATE ALLOW ALTER ALWAYS AND ANY APPLY ARE ARRAY ARRAY_MAX_CARDINALITY AS ASC ASENSITIVE ASSERTION ASSIGNMENT ASYMMETRIC AT ATOMIC ATTRIBUTE ATTRIBUTES AUTHORIZATION AVG BEFORE BEGIN BEGIN_FRAME BEGIN_PARTITION BERNOULLI BETWEEN BIGINT BINARY BIT BLOB BOOLEAN BOTH BREADTH BY C CALL CALLED CARDINALITY CASCADE CASCADED CASE CAST CATALOG CATALOG_NAME CEIL CEILING CENTURY CHAIN CHAR CHAR_LENGTH CHARACTER CHARACTER_LENGTH CHARACTER_SET_CATALOG CHARACTER_SET_NAME CHARACTER_SET_SCHEMA CHARACTERISTICS CHARACTERS CHECK CLASSIFIER CLASS_ORIGIN CLOB CLOSE COALESCE COBOL COLLATE COLLATION COLLATION_CATALOG COLLATION_NAME COLLATION_SCHEMA COLLECT COLUMN COLUMN_NAME COMMAND_FUNCTION COMMAND_FUNCTION_CODE COMMENT COMMIT COMMITTED CONDITION CONDITION_NUMBER CONNECT CONNECTION CONNECTION_NAME CONSTRAINT CONSTRAINT_CATALOG CONSTRAINT_NAME CONSTRAINT_SCHEMA CONSTRAINTS CONSTRUCTOR CONTAINS CONTINUE CONVERT CORR CORRESPONDING COUNT COVAR_POP COVAR_SAMP CREATE CROSS CUBE CUME_DIST CURRENT CURRENT_CATALOG CURRENT_DATE CURRENT_DEFAULT_TRANSFORM_GROUP CURRENT_PATH CURRENT_ROLE CURRENT_ROW CURRENT_SCHEMA CURRENT_TIME CURRENT_TIMESTAMP CURRENT_TRANSFORM_GROUP_FOR_TYPE CURRENT_USER CURSOR CURSOR_NAME CYCLE DATA DATABASE DATE DATETIME_INTERVAL_CODE DATETIME_INTERVAL_PRECISION DAY DEALLOCATE DEC DECADE DECIMAL DECLARE DEFAULT DEFAULTS DEFERRABLE DEFERRED DEFINE DEFINED DEFINER DEGREE DELETE DENSE_RANK DEPTH DEREF DERIVED DESC DESCRIBE DESCRIPTION DESCRIPTOR DETERMINISTIC DIAGNOSTICS DISALLOW DISCONNECT DISPATCH DISTINCT DOMAIN DOUBLE DOW DOY DROP DYNAMIC DYNAMIC_FUNCTION DYNAMIC_FUNCTION_CODE EACH ELEMENT ELSE EMPTY END END-EXEC END_FRAME END_PARTITION EPOCH EQUALS ESCAPE EVERY EXCEPT EXCEPTION EXCLUDE EXCLUDING EXEC EXECUTE EXISTS EXP EXPLAIN EXTEND EXTERNAL EXTRACT FALSE FETCH FILTER FINAL FIRST FIRST_VALUE FLOAT FLOOR FOLLOWING FOR FOREIGN FORTRAN FOUND FRAC_SECOND FRAME_ROW FREE FROM FULL FUNCTION FUSION G GENERAL GENERATED GEOMETRY GET GLOBAL GO GOTO GRANT GRANTED GROUP GROUPING GROUPS HAVING HIERARCHY HOLD HOUR IDENTITY IF IMMEDIATE IMMEDIATELY IMPLEMENTATION IMPORT IN INCLUDING INCREMENT INDICATOR INITIAL INITIALLY INNER INOUT INPUT INSENSITIVE INSERT INSTANCE INSTANTIABLE INT INTEGER INTERSECT INTERSECTION INTERVAL INTO INVOKER IS ISOLATION JAVA JOIN JSON K KEY KEY_MEMBER KEY_TYPE LABEL LAG LANGUAGE LARGE LAST LAST_VALUE LATERAL LEAD LEADING LEFT LENGTH LEVEL LIBRARY LIKE LIKE_REGEX LIMIT LN LOCAL LOCALTIME LOCALTIMESTAMP LOCATION LOCATOR LOWER M MAP MATCH MATCHED MATCHES MATCH_NUMBER MATCH_RECOGNIZE MAX MAXVALUE MEASURES MEMBER MERGE MESSAGE_LENGTH MESSAGE_OCTET_LENGTH MESSAGE_TEXT METHOD MICROSECOND MILLENNIUM MIN MINUTE MINVALUE MOD MODIFIES MODULE MONTH MORE MULTISET MUMPS NAME NAMES NATIONAL NATURAL NCHAR NCLOB NESTING | NEW NEXT NO NONE NORMALIZE NORMALIZED NOT NTH_VALUE NTILE NULL NULLABLE NULLIF NULLS NUMBER NUMERIC OBJECT OCCURRENCES_REGEX OCTET_LENGTH OCTETS OF OFFSET OLD OMIT ON ONE ONLY OPEN OPTION OPTIONS OR ORDER ORDERING ORDINALITY OTHERS OUT OUTER OUTPUT OVER OVERLAPS OVERLAY OVERRIDING PAD PARAMETER PARAMETER_MODE PARAMETER_NAME PARAMETER_ORDINAL_POSITION PARAMETER_SPECIFIC_CATALOG PARAMETER_SPECIFIC_NAME PARAMETER_SPECIFIC_SCHEMA PARTIAL PARTITION PASCAL PASSTHROUGH PAST PATH PATTERN PER PERCENT PERCENTILE_CONT PERCENTILE_DISC PERCENT_RANK PERIOD PERMUTE PLACING PLAN PLI PORTION POSITION POSITION_REGEX POWER PRECEDES PRECEDING PRECISION PREPARE PRESERVE PREV PRIMARY PRIOR PRIVILEGES PROCEDURE PUBLIC QUARTER RANGE RANK READ READS REAL RECURSIVE REF REFERENCES REFERENCING REGR_AVGX REGR_AVGY REGR_COUNT REGR_INTERCEPT REGR_R2 REGR_SLOPE REGR_SXX REGR_SXY REGR_SYY RELATIVE RELEASE REPEATABLE REPLACE RESET RESTART RESTRICT RESULT RETURN RETURNED_CARDINALITY RETURNED_LENGTH RETURNED_OCTET_LENGTH RETURNED_SQLSTATE RETURNS REVOKE RIGHT ROLE ROLLBACK ROLLUP ROUTINE ROUTINE_CATALOG ROUTINE_NAME ROUTINE_SCHEMA ROW ROW_COUNT ROW_NUMBER ROWS RUNNING SAVEPOINT SCALE SCHEMA SCHEMA_NAME SCOPE SCOPE_CATALOGS SCOPE_NAME SCOPE_SCHEMA SCROLL SEARCH SECOND SECTION SECURITY SEEK SELECT SELF SENSITIVE SEQUENCE SERIALIZABLE SERVER SERVER_NAME SESSION SESSION_USER SET SETS MINUS SHOW SIMILAR SIMPLE SIZE SKIP SMALLINT SOME SOURCE SPACE SPECIFIC SPECIFIC_NAME SPECIFICTYPE SQL SQLEXCEPTION SQLSTATE SQLWARNING SQL_BIGINT SQL_BINARY SQL_BIT SQL_BLOB SQL_BOOLEAN SQL_CHAR SQL_CLOB SQL_DATE SQL_DECIMAL SQL_DOUBLE SQL_FLOAT SQL_INTEGER SQL_INTERVAL_DAY SQL_INTERVAL_DAY_TO_HOUR SQL_INTERVAL_DAY_TO_MINUTE SQL_INTERVAL_DAY_TO_SECOND SQL_INTERVAL_HOUR SQL_INTERVAL_HOUR_TO_MINUTE SQL_INTERVAL_HOUR_TO_SECOND SQL_INTERVAL_MINUTE SQL_INTERVAL_MINUTE_TO_SECOND SQL_INTERVAL_MONTH SQL_INTERVAL_SECOND SQL_INTERVAL_YEAR SQL_INTERVAL_YEAR_TO_MONTH SQL_LONGVARBINARY SQL_LONGVARCHAR SQL_LONGVARNCHAR SQL_NCHAR SQL_NCLOB SQL_NUMERIC SQL_NVARCHAR SQL_REAL SQL_SMALLINT SQL_TIME SQL_TIMESTAMP SQL_TINYINT SQL_TSI_DAY SQL_TSI_FRAC_SECOND SQL_TSI_HOUR SQL_TSI_MICROSECOND SQL_TSI_MINUTE SQL_TSI_MONTH SQL_TSI_QUARTER SQL_TSI_SECOND SQL_TSI_WEEK SQL_TSI_YEAR SQL_VARBINARY SQL_VARCHAR SQRT START STATE STATEMENT STATIC STDDEV_POP STDDEV_SAMP STREAM STRUCTURE STYLE SUBCLASS_ORIGIN SUBMULTISET SUBSET SUBSTITUTE SUBSTRING SUBSTRING_REGEX SUCCEEDS SUM SYMMETRIC SYSTEM SYSTEM_TIME SYSTEM_USER TABLE TABLE_NAME TABLESAMPLE TBLPROPERTIES TEMPORARY THEN TIES TIME TIMESTAMP TIMESTAMPADD TIMESTAMPDIFF TIMEZONE_HOUR TIMEZONE_MINUTE TINYINT TO TOP_LEVEL_COUNT TRAILING TRANSACTION TRANSACTIONS_ACTIVE TRANSACTIONS_COMMITTED TRANSACTIONS_ROLLED_BACK TRANSFORM TRANSFORMS TRANSLATE TRANSLATE_REGEX TRANSLATION TREAT TRIGGER TRIGGER_CATALOG TRIGGER_NAME TRIGGER_SCHEMA TRIM TRIM_ARRAY TRUE TRUNCATE TYPE UESCAPE UNBOUNDED UNCOMMITTED UNDER UNION UNIQUE UNKNOWN UNNAMED UNNEST UPDATE UPPER UPSERT USAGE USER USER_DEFINED_TYPE_CATALOG USER_DEFINED_TYPE_CODE USER_DEFINED_TYPE_NAME USER_DEFINED_TYPE_SCHEMA USING VALUE VALUES VALUE_OF VAR_POP VAR_SAMP VARBINARY VARCHAR VARYING VERSION VERSIONING VIEW WEEK WHEN WHENEVER WHERE WIDTH_BUCKET WINDOW WITH WITHIN WITHOUT WORK WRAPPER WRITE XML YEAR ZONE |
Terminating Semicolons
Statements can optionally use a terminating semicolon (;
) in the
context of a query string submitted through an Application Programming
Interface (API). Some interactive tools require statements to have a
terminating semicolon. In a request containing multiple statements,
statements must be separated by semicolons, but the semicolon is
optional for the final statement.
Comments
Comments are sequences of characters that are ignored by the parser. Beam SQL supports the following types of comments.
Single line comments
Single line comments are supported by prepending --
before the comment.
Examples
SELECT x FROM T; --x is a field and T is a table
Comment includes all characters from the ‘--
’ sequence to the end of
the line. You can optionally add a space after the ‘--
’.
Multiline comments
Multiline comments are supported by enclosing the comment using
/* <comment> */
.
Example:
SELECT x FROM T /* x is a field and T is a table */
WHERE x = 3;
Invalid example:
SELECT x FROM T /* comment starts here
/* comment ends on this line */
this line is not considered a comment */
WHERE x = 3;
Comment includes all characters, including newlines, enclosed by the
first occurrence of ‘/*
’ and the first subsequent occurrence of
‘*/
’. Nested comments are not supported. The second example contains a
nested comment that renders the query invalid.
Portions of this page are modifications based on work created and shared by Google and used according to terms described in the Creative Commons 3.0 Attribution License.