Beam SQL Walkthrough

This page illustrates the usage of Beam SQL with example code.

Beam Schemas and Rows

A SQL query can only be applied to a PCollection<T> where T has a schema registered, or a PCollection<Row>. See the schema documentation in the Beam Programming Guide for details on registering a schema for a type T.

If you don’t have an existing type T, a PCollection<Row> can be obtained multiple ways, for example:

Once you have a PCollection<Row> in hand, you may use SqlTransform to apply SQL queries to it.

SqlTransform

SqlTransform.query(queryString) method is the only API to create a PTransform from a string representation of the SQL query. You can apply this PTransform to either a single PCollection or a PCollectionTuple which holds multiple PCollections:

BeamSqlExample in the code repository shows basic usage of both APIs.