apache_beam.runners.runner module¶
PipelineRunner, an abstract base runner object.
- class apache_beam.runners.runner.PipelineRunner[source]¶
Bases:
object
A runner of a pipeline object.
The base runner provides a run() method for visiting every node in the pipeline’s DAG and executing the transforms computing the PValue in the node.
A custom runner will typically provide implementations for some of the transform methods (ParDo, GroupByKey, Create, etc.). It may also provide a new implementation for clear_pvalue(), which is used to wipe out materialized values in order to reduce footprint.
- run(transform: PTransform, options: PipelineOptions | None = None) PipelineResult [source]¶
Run the given transform or callable with this runner.
Blocks until the pipeline is complete. See also PipelineRunner.run_async.
- run_async(transform: PTransform, options: PipelineOptions | None = None) PipelineResult [source]¶
Run the given transform or callable with this runner.
May return immediately, executing the pipeline in the background. The returned result object can be queried for progress, and wait_until_finish may be called to block until completion.
- run_portable_pipeline(pipeline: Pipeline, options: PipelineOptions) PipelineResult [source]¶
Execute the entire pipeline.
Runners should override this method.
- default_environment(options: PipelineOptions) Environment [source]¶
Returns the default environment that should be used for this runner.
Runners may override this method to provide alternative environments.
- run_pipeline(pipeline: Pipeline, options: PipelineOptions) PipelineResult [source]¶
Execute the entire pipeline or the sub-DAG reachable from a node.
- apply(transform: PTransform, input: pvalue.PValue | None, options: PipelineOptions)[source]¶
- class apache_beam.runners.runner.PipelineState[source]¶
Bases:
object
State of the Pipeline, as returned by
PipelineResult.state
.This is meant to be the union of all the states any runner can put a pipeline in. Currently, it represents the values of the dataflow API JobState enum.
- UNKNOWN = 'UNKNOWN'¶
- STARTING = 'STARTING'¶
- STOPPED = 'STOPPED'¶
- RUNNING = 'RUNNING'¶
- DONE = 'DONE'¶
- FAILED = 'FAILED'¶
- CANCELLED = 'CANCELLED'¶
- UPDATED = 'UPDATED'¶
- DRAINING = 'DRAINING'¶
- DRAINED = 'DRAINED'¶
- PENDING = 'PENDING'¶
- CANCELLING = 'CANCELLING'¶
- RESOURCE_CLEANING_UP = 'RESOURCE_CLEANING_UP'¶
- UNRECOGNIZED = 'UNRECOGNIZED'¶
- class apache_beam.runners.runner.PipelineResult(state)[source]¶
Bases:
object
A
PipelineResult
provides access to info about a pipeline.- property state¶
Return the current state of the pipeline execution.
- wait_until_finish(duration=None)[source]¶
Waits until the pipeline finishes and returns the final status.
- Parameters:
duration (int) – The time to wait (in milliseconds) for job to finish. If it is set to
None
, it will wait indefinitely until the job is finished.- Raises:
IOError – If there is a persistent problem getting job information.
NotImplementedError – If the runner does not support this operation.
- Returns:
The final state of the pipeline, or
None
on timeout.
- cancel()[source]¶
Cancels the pipeline execution.
- Raises:
IOError – If there is a persistent problem getting job information.
NotImplementedError – If the runner does not support this operation.
- Returns:
The final state of the pipeline.
- metrics()[source]¶
Returns
MetricResults
object to query metrics from the runner.- Raises:
NotImplementedError – If the runner does not support this operation.