apache_beam.runners.interactive.dataproc.dataproc_cluster_manager module¶
- class apache_beam.runners.interactive.dataproc.dataproc_cluster_manager.DataprocClusterManager(cluster_metadata: ClusterMetadata)[source]¶
Bases:
object
Self-contained cluster manager that controls the lifecyle of a Dataproc cluster connected by one or more pipelines under Interactive Beam.
Initializes the DataprocClusterManager with properties required to interface with the Dataproc ClusterControllerClient.
- stage_init_action() str [source]¶
Stages the initialization action script to GCS cache root to set up Dataproc clusters.
Returns the staged gcs file path.
- create_cluster(cluster: dict) None [source]¶
Attempts to create a cluster using attributes that were initialized with the DataprocClusterManager instance.
- Parameters:
cluster – Dictionary representing Dataproc cluster. Read more about the schema for clusters here: https://cloud.google.com/python/docs/reference/dataproc/latest/google.cloud.dataproc_v1.types.Cluster
- create_flink_cluster() None [source]¶
Calls _create_cluster with a configuration that enables FlinkRunner.
- cleanup() None [source]¶
Deletes the cluster that uses the attributes initialized with the DataprocClusterManager instance.
- get_cluster_details() Cluster [source]¶
Gets the Dataproc_v1 Cluster object for the current cluster manager.
- parse_master_url_and_dashboard(line: str) Tuple[str, str] [source]¶
Parses the master_url and YARN application_id of the Flink process from an input line. The line containing both the master_url and application id is always formatted as such: {text} Found Web Interface {master_url} of application ‘{application_id}’.n
Truncated example where ‘…’ represents additional text between segments: … google-dataproc-startup[000]: … activate-component-flink[0000]: …org.apache.flink.yarn.YarnClusterDescriptor… [] - Found Web Interface example-master-url:50000 of application ‘application_123456789000_0001’.
Returns the flink_master_url and dashboard link as a tuple.