apache_beam.runners.interactive.utils module

Utilities to be used in Interactive Beam.

apache_beam.runners.interactive.utils.to_element_list(reader: Generator[Union[beam_runner_api_pb2.TestStreamPayload.Event, WindowedValueHolder]], coder: Coder, include_window_info: bool, n: int = None, include_time_events: bool = False) → List[WindowedValue]

Returns an iterator that properly decodes the elements from the reader.

apache_beam.runners.interactive.utils.elements_to_df(elements: List[WindowedValue], include_window_info: bool = False, element_type: Any = None) → DataFrame

Parses the given elements into a DataFrame.

If the elements are a list of WindowedValues, then it will break out the elements into their own DataFrame and return it. If include_window_info is True, then it will concatenate the windowing information onto the elements DataFrame.

apache_beam.runners.interactive.utils.register_ipython_log_handler() → None

Adds the IPython handler to a dummy parent logger (named ‘apache_beam.runners.interactive’) of all interactive modules’ loggers, so that when running in a notebook (is_in_notebook), log records are displayed as HTML in the frontend.

class apache_beam.runners.interactive.utils.IPythonLogHandler(level=0)

Bases: Handler

A logging handler to display logs as HTML in IPython backed frontends.

Initializes the instance - basically setting the formatter to None and the filter list to empty.

log_template = '\n            <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css" integrity="sha384-Vkoo8x4CGsO3+Hhxv8T/Q5PaXtkKtu6ug5TOeNV6gBiFeWPGFN9MuhOf23Q9Ifjh" crossorigin="anonymous">\n            <div class="alert alert-{level}">{msg}</div>'
logging_to_alert_level_map = {0: 'light', 10: 'dark', 20: 'info', 30: 'warning', 40: 'danger', 50: 'danger'}
emit(record)
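
The relationship between logging_to_alert_level_map and log_template can be illustrated with a small standard-library sketch. This is not the Beam implementation: HtmlCollectingHandler is a hypothetical class that stores the rendered HTML in a list instead of displaying it, the template is shortened (the Bootstrap stylesheet link is omitted), and the fallback to 'light' for unknown level numbers is an assumption.

```python
import html
import logging

# Values copied from the class attributes documented above.
LOGGING_TO_ALERT_LEVEL_MAP = {
    0: 'light', 10: 'dark', 20: 'info', 30: 'warning', 40: 'danger', 50: 'danger'
}
# Shortened template: the Bootstrap <link> tag is left out for brevity.
LOG_TEMPLATE = '<div class="alert alert-{level}">{msg}</div>'


class HtmlCollectingHandler(logging.Handler):
    """Hypothetical handler that collects rendered HTML instead of displaying it."""

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.rendered = []

    def emit(self, record):
        # Assumption: level numbers missing from the map use the 'light' style.
        alert = LOGGING_TO_ALERT_LEVEL_MAP.get(record.levelno, 'light')
        # Escape the message so it cannot inject markup into the alert box.
        msg = html.escape(record.getMessage())
        self.rendered.append(LOG_TEMPLATE.format(level=alert, msg=msg))


logger = logging.getLogger('example.interactive')  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
handler = HtmlCollectingHandler()
logger.addHandler(handler)
logger.warning('cache is %s', 'stale')
```

A WARNING record (level 30) is rendered with the alert-warning class, while ERROR and CRITICAL records both map to alert-danger.
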
apache_beam.runners.interactive.utils.obfuscate(*inputs: Any) → str

Obfuscates any inputs into a hexadecimal string.
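
One way to turn arbitrary inputs into a hexadecimal string is to hash their string forms. The sketch below only illustrates that idea: obfuscate_sketch is a hypothetical name, and the choice of SHA-256 and of '_' as the join character are assumptions, not a statement of what the library does.

```python
import hashlib
from typing import Any


def obfuscate_sketch(*inputs: Any) -> str:
    """Hypothetical stand-in: hashes the str() of all inputs into hex digits."""
    # Assumption: inputs are joined with '_' before hashing.
    joined = '_'.join(str(value) for value in inputs)
    return hashlib.sha256(joined.encode('utf-8')).hexdigest()


token = obfuscate_sketch('my_pipeline', 42, ('a', 'b'))
```

The result is deterministic for equal inputs, so it can serve as a stable identifier that does not expose the original values directly.
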

class apache_beam.runners.interactive.utils.ProgressIndicator(enter_text: str, exit_text: str)

Bases: object

An indicator visualizing code execution in progress.

spinner_template = '\n            <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css" integrity="sha384-Vkoo8x4CGsO3+Hhxv8T/Q5PaXtkKtu6ug5TOeNV6gBiFeWPGFN9MuhOf23Q9Ifjh" crossorigin="anonymous">\n            <div id="{id}">\n              <div class="spinner-border text-info" role="status"></div>\n              <span class="text-info">{text}</span>\n            </div>\n            '
spinner_removal_template = '\n            $("#{id}").remove();'
apache_beam.runners.interactive.utils.progress_indicated(func: Callable[..., Any]) → Callable[..., Any]

A decorator that executes the given function within a unique progress indicator used as a context manager.
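
The decorator-plus-context-manager pattern described here can be sketched with the standard library alone. TextProgressIndicator and progress_indicated_sketch are hypothetical names; the sketch records text in a list instead of rendering a spinner, and the enter and exit messages are assumptions.

```python
import functools
from typing import Any, Callable, List

events: List[str] = []  # stands in for the notebook display in this sketch


class TextProgressIndicator:
    """Hypothetical indicator: records text on enter and on exit."""

    def __init__(self, enter_text: str, exit_text: str):
        self._enter_text = enter_text
        self._exit_text = exit_text

    def __enter__(self):
        events.append(self._enter_text)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        events.append(self._exit_text)
        return False  # never swallow exceptions raised by the wrapped code


def progress_indicated_sketch(func: Callable[..., Any]) -> Callable[..., Any]:
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # A fresh indicator is created for every call.
        with TextProgressIndicator(f'Processing {func.__name__}...', 'Done.'):
            return func(*args, **kwargs)
    return wrapper


@progress_indicated_sketch
def add(a, b):
    return a + b


result = add(1, 2)
```

Because the indicator is entered and exited around each call, the exit text is recorded even when the wrapped function raises.
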

apache_beam.runners.interactive.utils.as_json(func: Callable[..., Any]) → Callable[..., str]

A decorator that converts Python objects returned by callables to JSON strings.

The decorated function should always return an object serializable by json.dumps. If the object cannot be serialized, the str() of the original object is returned instead.
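
The fallback behavior can be illustrated with a minimal sketch. as_json_sketch is a hypothetical name, and catching TypeError and ValueError is an assumption about which json.dumps failures should trigger the str() fallback.

```python
import functools
import json
from typing import Any, Callable


def as_json_sketch(func: Callable[..., Any]) -> Callable[..., str]:
    @functools.wraps(func)
    def wrapper(*args, **kwargs) -> str:
        value = func(*args, **kwargs)
        try:
            return json.dumps(value)
        except (TypeError, ValueError):
            # Not JSON serializable: fall back to the plain string form.
            return str(value)
    return wrapper


@as_json_sketch
def serializable():
    return {'a': [1, 2]}


@as_json_sketch
def not_serializable():
    return {1}  # a set cannot be encoded by json.dumps


good = serializable()
fallback = not_serializable()
```

Here good is the JSON text for the dictionary, while fallback is simply the str() of the set.
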

apache_beam.runners.interactive.utils.deferred_df_to_pcollection(df)

apache_beam.runners.interactive.utils.pcoll_by_name() → Dict[str, PCollection]

Finds all PCollections by their variable names defined in the notebook.

apache_beam.runners.interactive.utils.find_pcoll_name(pcoll: PCollection) → str

Finds the variable name of a PCollection defined by the user.

Returns None if not assigned to any variable.
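
Looking up a variable name for an object amounts to an identity search over a namespace. The sketch below shows that idea on a plain dictionary; find_name_sketch and the namespace argument are hypothetical, and how the library collects the notebook's variables is not shown.

```python
from typing import Any, Dict, Optional


def find_name_sketch(target: Any, namespace: Dict[str, Any]) -> Optional[str]:
    """Hypothetical helper: returns the first name bound to `target`, or None."""
    for name, value in namespace.items():
        # Compare by identity, not equality, so equal-looking objects don't match.
        if value is target:
            return name
    return None


words = object()      # stands in for a PCollection assigned to a variable
anonymous = object()  # stands in for a PCollection with no variable
namespace = {'count': 3, 'words': words}

found = find_name_sketch(words, namespace)
missing = find_name_sketch(anonymous, namespace)
```

In this sketch found is 'words' and missing is None, mirroring the documented return values.
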

apache_beam.runners.interactive.utils.cacheables() → Dict[CacheKey, Cacheable]

Finds all Cacheables with their CacheKeys.

apache_beam.runners.interactive.utils.watch_sources(pipeline)

Watches the unbounded sources in the pipeline.

Sources can output to a PCollection without a user variable reference. In this case the source is not cached. We still want to cache the data, so we synthetically create a variable referencing the intermediate PCollection.

apache_beam.runners.interactive.utils.has_unbounded_sources(pipeline)

Checks if a given pipeline has recordable sources.

apache_beam.runners.interactive.utils.unbounded_sources(pipeline)

Returns a pipeline’s recordable sources.

apache_beam.runners.interactive.utils.create_var_in_main(name: str, value: Any, watch: bool = True) → Tuple[str, Any]

Declares a variable in the main module.

Parameters:
  • name – the variable name in the main module.

  • value – the value of the variable.

  • watch – whether to watch it in the interactive environment.

Returns:

A 2-entry tuple of the variable name and value.
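
Declaring a variable in the main module can be done by setting an attribute on the `__main__` module object. The following is a minimal sketch of that technique only: create_var_in_main_sketch is a hypothetical name, and the watch parameter is left out because it depends on the interactive environment.

```python
import importlib
from typing import Any, Tuple


def create_var_in_main_sketch(name: str, value: Any) -> Tuple[str, Any]:
    """Hypothetical stand-in that omits the `watch` behavior."""
    # Setting an attribute on the __main__ module defines a global there.
    setattr(importlib.import_module('__main__'), name, value)
    return name, value


result = create_var_in_main_sketch('synthetic_pcoll_var', 42)
```

After the call, code running in the main module can refer to synthetic_pcoll_var as an ordinary global.
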

apache_beam.runners.interactive.utils.assert_bucket_exists(bucket_name: str) → None

Asserts that the GCS bucket named bucket_name exists.

Logs an error and raises a ValueError if the bucket does not exist.

Logs a warning if the bucket cannot be verified to exist.
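
The three outcomes described above (bucket exists, bucket missing, existence unknown) can be sketched without any cloud dependency. Everything here is hypothetical: the lookup callable stands in for a real storage client, LookupUnavailable stands in for whatever error signals that the check could not be completed, and the log and error messages are invented for illustration.

```python
import logging
from typing import Callable

_LOGGER = logging.getLogger('example.bucket_check')  # hypothetical logger name


class LookupUnavailable(Exception):
    """Hypothetical error: the existence check itself could not be completed."""


def assert_bucket_exists_sketch(
        bucket_name: str, lookup: Callable[[str], bool]) -> None:
    try:
        exists = lookup(bucket_name)
    except LookupUnavailable as e:
        # Existence is unknown: warn, but do not fail.
        _LOGGER.warning('Could not verify bucket %s: %s', bucket_name, e)
        return
    if not exists:
        _LOGGER.error('Bucket %s does not exist.', bucket_name)
        raise ValueError(f'Bucket {bucket_name} does not exist.')


def _no_access(name: str) -> bool:
    raise LookupUnavailable('permission denied')


outcomes = {}
assert_bucket_exists_sketch('present', lambda name: True)
outcomes['present'] = 'ok'
try:
    assert_bucket_exists_sketch('missing', lambda name: False)
except ValueError as e:
    outcomes['missing'] = str(e)
assert_bucket_exists_sketch('private', _no_access)
outcomes['private'] = 'unverified'
```

Only the definite "does not exist" answer raises; an inconclusive check is reported as a warning and execution continues.
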

apache_beam.runners.interactive.utils.detect_pipeline_runner(pipeline)