apache_beam.runners.interactive.dataproc.dataproc_cluster_manager module

class apache_beam.runners.interactive.dataproc.dataproc_cluster_manager.DataprocClusterManager(cluster_metadata: ClusterMetadata)[source]

Bases: object

Self-contained cluster manager that controls the lifecyle of a Dataproc cluster connected by one or more pipelines under Interactive Beam.

Initializes the DataprocClusterManager with properties required to interface with the Dataproc ClusterControllerClient.

stage_init_action() str[source]

Stages the initialization action script to GCS cache root to set up Dataproc clusters.

Returns the staged gcs file path.

create_cluster(cluster: dict) None[source]

Attempts to create a cluster using attributes that were initialized with the DataprocClusterManager instance.

Parameters:

cluster – Dictionary representing Dataproc cluster. Read more about the schema for clusters here: https://cloud.google.com/python/docs/reference/dataproc/latest/google.cloud.dataproc_v1.types.Cluster

Calls _create_cluster with a configuration that enables FlinkRunner.

cleanup() None[source]

Deletes the cluster that uses the attributes initialized with the DataprocClusterManager instance.

get_cluster_details() Cluster[source]

Gets the Dataproc_v1 Cluster object for the current cluster manager.

wait_for_cluster_to_provision() None[source]
get_staging_location() str[source]

Gets the staging bucket of an existing Dataproc cluster.

parse_master_url_and_dashboard(line: str) Tuple[str, str][source]

Parses the master_url and YARN application_id of the Flink process from an input line. The line containing both the master_url and application id is always formatted as such: {text} Found Web Interface {master_url} of application ‘{application_id}’.n

Truncated example where ‘…’ represents additional text between segments: … google-dataproc-startup[000]: … activate-component-flink[0000]: …org.apache.flink.yarn.YarnClusterDescriptor… [] - Found Web Interface example-master-url:50000 of application ‘application_123456789000_0001’.

Returns the flink_master_url and dashboard link as a tuple.

get_master_url_and_dashboard() Tuple[str | None, str | None][source]

Returns the master_url of the current cluster.

cleanup_staging_files() None[source]