apache_beam.options.pipeline_options_validator module

Pipeline options validator.

For internal use only; no backwards-compatibility guarantees.

class apache_beam.options.pipeline_options_validator.PipelineOptionsValidator(options, runner)[source]

Bases: object

Validates PipelineOptions.

Goes through a list of known PipelineOptions subclasses and calls:

validate(validator)

if one is implemented. Aggregates the validation errors from all subclasses and returns them as a single list.

OPTIONS = [<class 'apache_beam.options.pipeline_options.DebugOptions'>, <class 'apache_beam.options.pipeline_options.GoogleCloudOptions'>, <class 'apache_beam.options.pipeline_options.PortableOptions'>, <class 'apache_beam.options.pipeline_options.SetupOptions'>, <class 'apache_beam.options.pipeline_options.StandardOptions'>, <class 'apache_beam.options.pipeline_options.TestOptions'>, <class 'apache_beam.options.pipeline_options.TypeOptions'>, <class 'apache_beam.options.pipeline_options.WorkerOptions'>]
REQUIRED_ENVIRONMENT_OPTIONS = {'DOCKER': [], 'EXTERNAL': ['external_service_address'], 'LOOPBACK': [], 'PROCESS': ['process_command']}
OPTIONAL_ENVIRONMENT_OPTIONS = {'DOCKER': ['docker_container_image'], 'EXTERNAL': [], 'LOOPBACK': [], 'PROCESS': ['process_variables']}
ERR_MISSING_OPTION = 'Missing required option: %s.'
ERR_MISSING_GCS_PATH = 'Missing GCS path option: %s.'
ERR_INVALID_GCS_PATH = 'Invalid GCS path (%s), given for the option: %s.'
ERR_INVALID_GCS_BUCKET = 'Invalid GCS bucket (%s), given for the option: %s. See https://developers.google.com/storage/docs/bucketnaming for more details.'
ERR_INVALID_GCS_OBJECT = 'Invalid GCS object (%s), given for the option: %s.'
ERR_INVALID_JOB_NAME = 'Invalid job_name (%s); the name must consist of only the characters [-a-z0-9], starting with a letter and ending with a letter or number'
ERR_INVALID_PROJECT_NUMBER = 'Invalid Project ID (%s). Please make sure you specified the Project ID, not project number.'
ERR_INVALID_PROJECT_ID = 'Invalid Project ID (%s). Please make sure you specified the Project ID, not project description.'
ERR_INVALID_ENDPOINT = 'Invalid url (%s) for dataflow endpoint. Please provide a valid url.'
ERR_INVALID_NOT_POSITIVE = 'Invalid value (%s) for option: %s. Value needs to be positive.'
ERR_INVALID_TEST_MATCHER_TYPE = 'Invalid value (%s) for option: %s. Please extend your matcher object from hamcrest.core.base_matcher.BaseMatcher.'
ERR_INVALID_TEST_MATCHER_UNPICKLABLE = 'Invalid value (%s) for option: %s. Please make sure the test matcher is unpicklable.'
ERR_INVALID_TRANSFORM_NAME_MAPPING = 'Invalid transform name mapping format. Please make sure the mapping is string key-value pairs. Invalid pair: (%s:%s)'
ERR_INVALID_ENVIRONMENT = 'Option %s is not compatible with environment type %s.'
ERR_ENVIRONMENT_CONFIG = 'Option environment_config is incompatible with option(s) %s.'
ERR_MISSING_REQUIRED_ENVIRONMENT_OPTION = 'Option %s is required for environment type %s.'
ERR_NUM_WORKERS_TOO_HIGH = 'num_workers (%s) cannot exceed max_num_workers (%s)'
ERR_REPEATABLE_OPTIONS_NOT_SET_AS_LIST = '(%s) is a string. Programmatically set PipelineOptions like (%s) options need to be specified as a list.'
GCS_URI = '(?P<SCHEME>[^:]+)://(?P<BUCKET>[^/]+)(/(?P<OBJECT>.*))?'
GCS_BUCKET = '^[a-z0-9][-_a-z0-9.]+[a-z0-9]$'
GCS_SCHEME = 'gs'
JOB_PATTERN = '[a-z]([-a-z0-9]*[a-z0-9])?'
PROJECT_ID_PATTERN = '[a-z][-a-z0-9:.]+[a-z0-9]'
PROJECT_NUMBER_PATTERN = '[0-9]*'
validate()[source]

Calls validate on subclasses and returns a list of errors.

Calls the validate method on each subclass, accumulates the returned lists of errors, and returns the aggregate list.

Returns:

Aggregate list of errors after calling all possible validate methods.
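
The aggregation described above can be sketched without Beam installed. This is a minimal illustration, not the library's implementation: the three option classes and the aggregate_errors helper are hypothetical names introduced here, and the only behavior taken from the text is "call validate(validator) where it exists and collect the returned errors".

```python
# Hypothetical stand-ins for PipelineOptions subclasses (not Beam classes).
class HypotheticalOptionsA:
    def validate(self, validator):
        # Reports one problem, using the ERR_MISSING_OPTION template text.
        return ['Missing required option: %s.' % 'project']


class HypotheticalOptionsB:
    # Implements no validate method, so it is skipped.
    pass


class HypotheticalOptionsC:
    def validate(self, validator):
        # Nothing to report.
        return []


def aggregate_errors(option_views, validator=None):
    """Call validate(validator) on each view that implements it; merge results."""
    errors = []
    for view in option_views:
        validate = getattr(view, 'validate', None)
        if callable(validate):
            errors.extend(validate(validator))
    return errors


errors = aggregate_errors(
    [HypotheticalOptionsA(), HypotheticalOptionsB(), HypotheticalOptionsC()])
```

An empty result means no view reported a problem; callers would typically treat a non-empty list as a reason to stop before submitting the pipeline.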

is_service_runner()[source]

Returns True if the pipeline will execute on the Google Cloud Dataflow service.

is_full_string_match(pattern, string)[source]

Returns True if the pattern matches the whole string.
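
A whole-string match differs from the default prefix match of re.match. The sketch below uses re.fullmatch as one way to get that behavior; whether the library uses this call internally is an assumption. The two patterns are copied from the class attributes listed above.

```python
import re

# Patterns copied from the class attributes documented above.
JOB_PATTERN = '[a-z]([-a-z0-9]*[a-z0-9])?'
PROJECT_NUMBER_PATTERN = '[0-9]*'


def is_full_string_match(pattern, string):
    """Return True only if the pattern matches the entire string."""
    return re.fullmatch(pattern, string) is not None


# A prefix match accepts 'job-' because the leading 'job' satisfies the pattern.
prefix_ok = re.match(JOB_PATTERN, 'job-') is not None

# A full match rejects it because the name ends with a hyphen.
full_ok = is_full_string_match(JOB_PATTERN, 'job-')
```

With this helper, a project value made up only of digits matches PROJECT_NUMBER_PATTERN in full, which is the situation ERR_INVALID_PROJECT_NUMBER describes.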

validate_gcs_path(view, arg_name)[source]

Validates a GCS path against gs://bucket/object URI format.
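
As a rough illustration of the gs://bucket/object check, the sketch below applies the GCS_URI, GCS_BUCKET and GCS_SCHEME attributes listed above. The helper name check_gcs_path is hypothetical, and the sketch covers only the scheme and bucket checks; the real method may perform additional checks (for example on the object part).

```python
import re

# Values copied from the class attributes documented above.
GCS_URI = '(?P<SCHEME>[^:]+)://(?P<BUCKET>[^/]+)(/(?P<OBJECT>.*))?'
GCS_BUCKET = '^[a-z0-9][-_a-z0-9.]+[a-z0-9]$'
GCS_SCHEME = 'gs'
ERR_INVALID_GCS_PATH = 'Invalid GCS path (%s), given for the option: %s.'
ERR_INVALID_GCS_BUCKET = (
    'Invalid GCS bucket (%s), given for the option: %s. See '
    'https://developers.google.com/storage/docs/bucketnaming '
    'for more details.')


def check_gcs_path(path, arg_name):
    """Hypothetical helper: return a list of error strings for a GCS path."""
    match = re.fullmatch(GCS_URI, path)
    # The path must parse as scheme://bucket[/object] with the 'gs' scheme.
    if match is None or match.group('SCHEME') != GCS_SCHEME:
        return [ERR_INVALID_GCS_PATH % (path, arg_name)]
    # The bucket name must satisfy the bucket naming pattern.
    if re.match(GCS_BUCKET, match.group('BUCKET')) is None:
        return [ERR_INVALID_GCS_BUCKET % (path, arg_name)]
    return []


ok = check_gcs_path('gs://my-bucket/tmp', 'temp_location')
no_scheme = check_gcs_path('my-bucket/tmp', 'temp_location')
bad_bucket = check_gcs_path('gs://My_Bucket/tmp', 'temp_location')
```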

validate_cloud_options(view)[source]

Validates job_name and project arguments.

validate_sdk_container_image_options(view)[source]
validate_container_prebuilding_options(view)[source]
validate_num_workers(view)[source]

Validates that the number of Dataflow workers is valid.

validate_worker_region_zone(view)[source]

Validates that the Dataflow worker region and zone arguments are consistent.

validate_optional_argument_positive(view, arg_name)[source]

Validates that an optional argument (if set) has a positive value.
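
The positivity check, and one way validate_num_workers could build on it, can be sketched as follows. The helper names and the use of SimpleNamespace as a stand-in for an options view are assumptions made for illustration; the error texts are the ERR_INVALID_NOT_POSITIVE and ERR_NUM_WORKERS_TOO_HIGH templates listed above.

```python
from types import SimpleNamespace

ERR_INVALID_NOT_POSITIVE = (
    'Invalid value (%s) for option: %s. Value needs to be positive.')
ERR_NUM_WORKERS_TOO_HIGH = 'num_workers (%s) cannot exceed max_num_workers (%s)'


def check_optional_positive(view, arg_name):
    """Hypothetical helper: an unset (None) option passes; a set one must be > 0."""
    value = getattr(view, arg_name, None)
    if value is not None and int(value) <= 0:
        return [ERR_INVALID_NOT_POSITIVE % (value, arg_name)]
    return []


def check_num_workers(view):
    """Hypothetical helper: both counts positive, and num_workers within the cap."""
    errors = check_optional_positive(view, 'num_workers')
    errors.extend(check_optional_positive(view, 'max_num_workers'))
    num = getattr(view, 'num_workers', None)
    cap = getattr(view, 'max_num_workers', None)
    if num is not None and cap is not None and num > cap:
        errors.append(ERR_NUM_WORKERS_TOO_HIGH % (num, cap))
    return errors


# SimpleNamespace stands in for a WorkerOptions view in this sketch.
too_high = check_num_workers(SimpleNamespace(num_workers=5, max_num_workers=3))
not_positive = check_num_workers(SimpleNamespace(num_workers=0, max_num_workers=None))
unset = check_num_workers(SimpleNamespace(num_workers=None, max_num_workers=None))
```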

validate_test_matcher(view, arg_name)[source]

Validates the on_success_matcher argument, if set.

Checks that on_success_matcher can be unpickled and is an instance of hamcrest.core.base_matcher.BaseMatcher.

validate_environment_options(view)[source]

Validates portable environment options.
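
The REQUIRED_ENVIRONMENT_OPTIONS and OPTIONAL_ENVIRONMENT_OPTIONS tables listed above suggest a check along the following lines. This is a simplified sketch under stated assumptions: check_environment_options is a hypothetical helper, the supplied options are passed as a plain dict rather than read from a PortableOptions view, and the real method may apply further rules (for example around environment_config).

```python
# Tables copied from the class attributes documented above.
REQUIRED_ENVIRONMENT_OPTIONS = {
    'DOCKER': [],
    'EXTERNAL': ['external_service_address'],
    'LOOPBACK': [],
    'PROCESS': ['process_command'],
}
OPTIONAL_ENVIRONMENT_OPTIONS = {
    'DOCKER': ['docker_container_image'],
    'EXTERNAL': [],
    'LOOPBACK': [],
    'PROCESS': ['process_variables'],
}
ERR_INVALID_ENVIRONMENT = 'Option %s is not compatible with environment type %s.'
ERR_MISSING_REQUIRED_ENVIRONMENT_OPTION = (
    'Option %s is required for environment type %s.')


def check_environment_options(environment_type, supplied):
    """Hypothetical helper. Assumes environment_type is one of the four keys."""
    required = REQUIRED_ENVIRONMENT_OPTIONS[environment_type]
    allowed = set(required) | set(OPTIONAL_ENVIRONMENT_OPTIONS[environment_type])
    errors = []
    # Options that belong to a different environment type are rejected.
    for name in sorted(supplied):
        if name not in allowed:
            errors.append(ERR_INVALID_ENVIRONMENT % (name, environment_type))
    # Every required option for this environment type must be present.
    for name in required:
        if name not in supplied:
            errors.append(
                ERR_MISSING_REQUIRED_ENVIRONMENT_OPTION % (name, environment_type))
    return errors


process_ok = check_environment_options('PROCESS', {'process_command': '/bin/worker'})
external_missing = check_environment_options('EXTERNAL', {})
docker_wrong = check_environment_options('DOCKER', {'process_command': '/bin/worker'})
```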

validate_repeatable_argument_passed_as_list(view, arg_name)[source]

Validates that repeatable PipelineOptions such as dataflow_service_options or experiments are specified as a list when set programmatically. On the command line a repeatable option is given once per value and collected into a list, so this check keeps users from inadvertently passing a single string in code.
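
The reason a bare string is a problem is that code iterating over a repeatable option would walk the string character by character. The sketch below shows that pitfall and a check in the spirit of this method. The helper name, the SimpleNamespace stand-in for an options view, and the order in which the two placeholders of ERR_REPEATABLE_OPTIONS_NOT_SET_AS_LIST are filled are all assumptions.

```python
from types import SimpleNamespace

ERR_REPEATABLE_OPTIONS_NOT_SET_AS_LIST = (
    '(%s) is a string. Programmatically set PipelineOptions like (%s) '
    'options need to be specified as a list.')


def check_repeatable_is_list(view, arg_name):
    """Hypothetical helper: flag a repeatable option that was set as a string."""
    value = getattr(view, arg_name, None)
    if isinstance(value, str):
        # Assumed placeholder order: offending value first, option name second.
        return [ERR_REPEATABLE_OPTIONS_NOT_SET_AS_LIST % (value, arg_name)]
    return []


# Iterating a string yields single characters, not the intended single entry.
as_string = [item for item in 'abc']
as_list = [item for item in ['abc']]

bad = check_repeatable_is_list(
    SimpleNamespace(experiments='use_runner_v2'), 'experiments')
good = check_repeatable_is_list(
    SimpleNamespace(experiments=['use_runner_v2']), 'experiments')
```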

validate_endpoint_url(endpoint_url)[source]