apache_beam.runners.interactive.display.interactive_pipeline_graph module

Helper to render pipeline graph in IPython when running interactively.

This module is experimental. No backwards-compatibility guarantees.

apache_beam.runners.interactive.display.interactive_pipeline_graph.nice_str(o)[source]
apache_beam.runners.interactive.display.interactive_pipeline_graph.format_sample(contents, count=1000)[source]
class apache_beam.runners.interactive.display.interactive_pipeline_graph.InteractivePipelineGraph(pipeline, required_transforms=None, referenced_pcollections=None, cached_pcollections=None)[source]

Bases: PipelineGraph

Creates the DOT representation of an interactive pipeline. Thread-safe.

Constructor of PipelineGraph.

Parameters:
  • pipeline – (Pipeline proto) or (Pipeline) pipeline to be rendered.

  • required_transforms – (list/set of str) ID of top level PTransforms that lead to visible results.

  • referenced_pcollections – (list/set of str) ID of PCollections that are referenced by top level PTransforms executed (i.e. required_transforms)

  • cached_pcollections – (set of str) a set of PCollection IDs of those whose cached results are used in the execution.

update_pcollection_stats(pcollection_stats)[source]

Updates PCollection stats.

Parameters:

pcollection_stats – (dict of dict) maps PCollection IDs to informations. In particular, we only care about the field ‘sample’ which should be a the PCollection result in as a list.