apache_beam.runners.interactive.sql.sql_chain module

Module for tracking a chain of beam_sql magics applied.

For internal use only; no backwards-compatibility guarantees.
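
For context, the nodes tracked by this module are created when a user runs the beam_sql magic in an interactive (IPython/JupyterLab) notebook against an InteractiveRunner pipeline. Below is a minimal sketch of such a session; the variable names and data are illustrative, and the -o output-name flag is an assumption based on the interactive Beam documentation:

    import typing

    import apache_beam as beam
    from apache_beam.runners.interactive import interactive_beam as ib
    from apache_beam.runners.interactive.interactive_runner import InteractiveRunner

    class Record(typing.NamedTuple):
        id: int
        name: str

    p = beam.Pipeline(InteractiveRunner())
    pcoll = p | beam.Create([Record(1, 'a'), Record(2, 'b')]).with_output_types(Record)
    ib.watch(locals())  # make `p` and `pcoll` visible to the beam_sql magic

    # In a separate notebook cell; each such cell appends one SqlNode to the chain:
    # %%beam_sql -o filtered
    # SELECT id, name FROM pcoll WHERE id > 1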

class apache_beam.runners.interactive.sql.sql_chain.SqlNode(output_name: str, source: Pipeline | Set[str], query: str, schemas: Set[Any] | None = None, evaluated: Set[Pipeline] | None = None, next: SqlNode | None = None, execution_count: int = 0)[source]

Bases: object

Each SqlNode represents one application of a beam_sql magic.

output_name

the unique, watched name of the beam_sql output; it can be used as an identifier.

Type:

str

source

the inputs consumed by this node. This can be either a pipeline or a set of PCollections represented by the names of the watched variables that hold them. When it is a pipeline, the node computes from raw values in the query, so the output can be consumed by any SqlNode in any SqlChain.

Type:

apache_beam.pipeline.Pipeline | Set[str]

query

the SQL query applied by this node.

Type:

str

schemas

the schemas (NamedTuple classes) used by this node.

Type:

Set[Any]

evaluated

the pipelines this node has been evaluated for.

Type:

Set[apache_beam.pipeline.Pipeline]

next

the next SqlNode applied chronologically.

Type:

apache_beam.runners.interactive.sql.sql_chain.SqlNode | None

execution_count

the execution count when running in an IPython environment.

Type:

int

output_name: str
source: Pipeline | Set[str]
query: str
schemas: Set[Any] = None
evaluated: Set[Pipeline] = None
next: SqlNode | None = None
execution_count: int = 0
to_pipeline(pipeline: Pipeline | None) → Pipeline[source]

Converts the chain into an executable pipeline.
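
A minimal sketch of constructing a node and converting it (for illustration only; this class is internal and is normally built by the beam_sql magic, and the names below are assumptions):

    import apache_beam as beam
    from apache_beam.runners.interactive.sql.sql_chain import SqlNode

    user_pipeline = beam.Pipeline()

    # source is a pipeline, so the query computes from raw values only and the
    # output can be consumed by any SqlNode in any SqlChain.
    node = SqlNode(
        output_name='constants',
        source=user_pipeline,
        query='SELECT 1 AS id')

    # Converts the chain into an executable pipeline.
    runnable = node.to_pipeline(user_pipeline)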

class apache_beam.runners.interactive.sql.sql_chain.SchemaLoadedSqlTransform(output_name, query, schemas, execution_count)[source]

Bases: PTransform

PTransform that loads schema before executing SQL.

When submitting a pipeline to a remote runner for execution, schemas defined in the main module are not available without save_main_session. However, save_main_session might fail when the session contains anything unpicklable. This transform makes sure that only the schemas that are needed are pickled locally and restored later on the workers.

expand(source)[source]

Applies the SQL transform. If a PCollection uses a schema defined in the main session, an additional DoFn is used to restore that schema on the workers.
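
As a sketch of the idea (the schema, names, and query are assumptions; applying the transform also requires the Beam SQL expansion service, a Java dependency, so this is not runnable in a bare Python environment), a schema class defined in the main module is handed to the transform explicitly so that only that class is pickled alongside it:

    import typing

    import apache_beam as beam
    from apache_beam.runners.interactive.sql.sql_chain import SchemaLoadedSqlTransform

    class Quote(typing.NamedTuple):  # schema defined in the main module
        ticker: str
        price: float

    p = beam.Pipeline()
    quotes = p | beam.Create(
        [Quote('GOOG', 100.0), Quote('AAPL', 50.0)]).with_output_types(Quote)

    # Only the Quote class is pickled with the transform and restored on the
    # workers, avoiding save_main_session for everything else in the session.
    # A single unnamed input follows SqlTransform's PCOLLECTION table-name
    # convention (an assumption in this sketch).
    filtered = quotes | SchemaLoadedSqlTransform(
        output_name='filtered',
        query='SELECT ticker, price FROM PCOLLECTION WHERE price > 60.0',
        schemas={Quote},
        execution_count=1)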

class apache_beam.runners.interactive.sql.sql_chain.SqlChain(nodes: Dict[str, SqlNode] | None = None, root: SqlNode | None = None, current: SqlNode | None = None, user_pipeline: Pipeline | None = None)[source]

Bases: object

A chain of SqlNodes.

nodes

all nodes, keyed by their output_names.

Type:

Dict[str, apache_beam.runners.interactive.sql.sql_chain.SqlNode]

root

the first SqlNode applied chronologically.

Type:

apache_beam.runners.interactive.sql.sql_chain.SqlNode | None

current

the last node applied.

Type:

apache_beam.runners.interactive.sql.sql_chain.SqlNode | None

user_pipeline

the user-defined pipeline this chain originates from. If None, the whole chain just computes from raw values in queries. Otherwise, at least some of the nodes in the chain have queried against PCollections.

Type:

apache_beam.pipeline.Pipeline | None

nodes: Dict[str, SqlNode] = None
root: SqlNode | None = None
current: SqlNode | None = None
user_pipeline: Pipeline | None = None
to_pipeline() → Pipeline[source]

Converts the chain into a Beam pipeline.

append(node: SqlNode) → SqlChain[source]

Appends a node to the chain.

get(output_name: str) → SqlNode | None[source]

Gets a node from the chain based on the given output_name.
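
A minimal sketch of the chain bookkeeping (names and queries are illustrative; in practice the chain is maintained by the beam_sql magic rather than built by hand):

    import apache_beam as beam
    from apache_beam.runners.interactive.sql.sql_chain import SqlChain, SqlNode

    user_pipeline = beam.Pipeline()
    chain = SqlChain(user_pipeline=user_pipeline)

    # Each beam_sql magic would append one node; the first appended node becomes
    # chain.root and the most recently appended one is chain.current.
    chain.append(SqlNode(
        output_name='filtered',
        source={'pcoll'},
        query='SELECT id FROM pcoll WHERE id > 1'))
    chain.append(SqlNode(
        output_name='doubled',
        source={'filtered'},
        query='SELECT id * 2 AS id FROM filtered'))

    node = chain.get('doubled')  # look up a node by its output_name
    # runnable = chain.to_pipeline()  # converts the whole chain into a Beam
    # pipeline; requires the watched PCollection 'pcoll' to actually exist.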