apache_beam.ml.gcp.visionml module

A connector for sending API requests to the GCP Vision API.

class apache_beam.ml.gcp.visionml.AnnotateImage(features, retry=None, timeout=120, max_batch_size=None, min_batch_size=None, client_options=None, context_side_input=None, metadata=None)[source]

Bases: PTransform

A PTransform for annotating images using the GCP Vision API. ref: https://cloud.google.com/vision/docs/

Batches elements together using the util.BatchElements PTransform and sends each batch to the GCP Vision API. Each element is a Union[str, bytes]: either a URI (e.g. a GCS URI) or base64-encoded image data as bytes. Optionally accepts an AsDict side input that maps each image to an image context.
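The batching limit described above can be illustrated with a small standalone sketch. It is not the Beam implementation: util.BatchElements chooses batch sizes adaptively between the minimum and maximum, whereas the hypothetical helper fixed_size_batches below only shows the upper bound of 5 images per request. The bucket name is a placeholder.

```python
from typing import Iterable, Iterator, List, Union

# Mirrors the MAX_BATCH_SIZE class constant documented for AnnotateImage.
MAX_BATCH_SIZE = 5

# An element is either a URI string or image data as bytes.
ImageRef = Union[str, bytes]


def fixed_size_batches(
    elements: Iterable[ImageRef],
    max_batch_size: int = MAX_BATCH_SIZE,
) -> Iterator[List[ImageRef]]:
    """Yield lists of at most max_batch_size elements, in input order."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    batch: List[ImageRef] = []
    for element in elements:
        batch.append(element)
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


# Hypothetical URIs, used only to show the grouping.
uris = [f"gs://example-bucket/image-{i}.jpg" for i in range(12)]
print([len(b) for b in fixed_size_batches(uris)])  # [5, 5, 2]
```

Each yielded list corresponds to the images that would travel in one request to the Vision API.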

Parameters:

features – (List[vision.Feature]) Required. The Vision API features to detect.
retry – (google.api_core.retry.Retry) Optional. A retry object used to retry requests. If None is specified (default), requests will not be retried.
timeout – (float) Optional. The time in seconds to wait for the response from the Vision API. Default is 120.
max_batch_size – (int) Optional. Maximum number of images to batch in the same request to the Vision API. If None (the default), MAX_BATCH_SIZE is used, which is 5 (also the Vision API maximum). This parameter is primarily intended for testing.
min_batch_size – (int) Optional. Minimum number of images to batch in the same request to the Vision API. Default is None. This parameter is primarily intended for testing.
client_options – (Union[dict, google.api_core.client_options.ClientOptions]) Optional. Client options used to set user options on the client. API Endpoint should be set through client_options.

context_side_input – (beam.pvalue.AsDict) Optional. An AsDict of a PCollection, passed to the _ImageAnnotateFn as the image context mapping. It contains additional image context and/or feature-specific parameters. Example usage:

import apache_beam as beam
from apache_beam.ml.gcp import visionml
from google.cloud import vision

# p is a beam.Pipeline; features is a list of vision.Feature.
# Each value is either a dict or a vision.ImageContext.
image_contexts = [
  ('gs://cloud-samples-data/vision/ocr/sign.jpg', vision.ImageContext()),
  ('gs://cloud-samples-data/vision/ocr/sign.jpg', vision.ImageContext()),
]

context_side_input = (
  p
  | "Image contexts" >> beam.Create(image_contexts)
)

visionml.AnnotateImage(
  features,
  context_side_input=beam.pvalue.AsDict(context_side_input))

metadata – (Optional[Sequence[Tuple[str, str]]]) Optional. Additional metadata that is provided to the method.
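A minimal sketch of how the parameters above might be combined in a pipeline. It assumes apache-beam[gcp] and google-cloud-vision 2.x are installed and that GCP credentials are available; the function name run, the chosen feature, and the timeout and max_batch_size values are illustrative only. Imports are deferred into the function so the snippet can be loaded without those packages. The printed elements are assumed to be the Vision API batch responses, one per request.

```python
# Sample image taken from the example usage in this reference.
IMAGE_URIS = ["gs://cloud-samples-data/vision/ocr/sign.jpg"]


def run(pipeline_args=None):
    """Build and run a small pipeline that annotates IMAGE_URIS."""
    # Deferred imports: these need apache-beam[gcp] and google-cloud-vision.
    import apache_beam as beam
    from apache_beam.ml.gcp import visionml
    from apache_beam.options.pipeline_options import PipelineOptions
    from google.api_core import retry as api_retry
    from google.cloud import vision

    # Required: the Vision API features to detect.
    features = [vision.Feature(type_=vision.Feature.Type.TEXT_DETECTION)]

    with beam.Pipeline(options=PipelineOptions(pipeline_args)) as p:
        _ = (
            p
            | "Image URIs" >> beam.Create(IMAGE_URIS)
            | "Annotate" >> visionml.AnnotateImage(
                features,
                retry=api_retry.Retry(),  # retry with the library defaults
                timeout=60,               # seconds to wait for a response
                max_batch_size=2,         # must not exceed MAX_BATCH_SIZE
            )
            | "Print" >> beam.Map(print)
        )


if __name__ == "__main__":
    run()
```

Passing retry explicitly matters because, as documented above, requests are not retried when retry is None.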

MAX_BATCH_SIZE = 5

MIN_BATCH_SIZE = 1

expand(pvalue)[source]

class apache_beam.ml.gcp.visionml.AnnotateImageWithContext(features, retry=None, timeout=120, max_batch_size=None, min_batch_size=None, client_options=None, metadata=None)[source]

Bases: AnnotateImage

A PTransform for annotating images using the GCP Vision API. ref: https://cloud.google.com/vision/docs/

Batches elements together using the util.BatchElements PTransform and sends each batch to the GCP Vision API.

Element is a tuple of:

(Union[str, bytes], Optional[vision.ImageContext])

where the former is either a URI (e.g. a GCS URI) or base64-encoded image data as bytes.
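The element shape described above can be produced with a small helper. This is a sketch: pair_with_context and annotate_with_context are hypothetical names, not part of the module, and the second URI is a placeholder. In a real pipeline the context values would be vision.ImageContext instances; the helper treats them as opaque objects so it runs without the Vision client installed.

```python
from typing import Any, Iterable, List, Mapping, Optional, Tuple, Union

# An image is either a URI string or image data as bytes.
ImageRef = Union[str, bytes]


def pair_with_context(
    images: Iterable[ImageRef],
    contexts: Mapping[ImageRef, Any],
) -> List[Tuple[ImageRef, Optional[Any]]]:
    """Return (image, context) tuples, using None when no context is known."""
    return [(image, contexts.get(image)) for image in images]


def annotate_with_context(p, elements, features):
    """Apply AnnotateImageWithContext to the prepared tuples.

    Deferred imports: requires apache-beam[gcp] and GCP credentials.
    """
    import apache_beam as beam
    from apache_beam.ml.gcp import visionml

    return (
        p
        | "Image tuples" >> beam.Create(elements)
        | "Annotate" >> visionml.AnnotateImageWithContext(features)
    )


# Placeholder standing in for a vision.ImageContext instance.
sign_context = object()
elements = pair_with_context(
    [
        "gs://cloud-samples-data/vision/ocr/sign.jpg",
        "gs://example-bucket/no-context.jpg",  # hypothetical URI
    ],
    {"gs://cloud-samples-data/vision/ocr/sign.jpg": sign_context},
)
print([context is not None for _, context in elements])  # [True, False]
```

Unlike AnnotateImage with context_side_input, the context here travels with each element, so no side input is needed.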

Parameters:

features – (List[vision.Feature]) Required. The Vision API features to detect.
retry – (google.api_core.retry.Retry) Optional. A retry object used to retry requests. If None is specified (default), requests will not be retried.
timeout – (float) Optional. The time in seconds to wait for the response from the Vision API. Default is 120.
max_batch_size – (int) Optional. Maximum number of images to batch in the same request to the Vision API. If None (the default), MAX_BATCH_SIZE is used, which is 5 (also the Vision API maximum). This parameter is primarily intended for testing.
min_batch_size – (int) Optional. Minimum number of images to batch in the same request to the Vision API. Default is None. This parameter is primarily intended for testing.
client_options – (Union[dict, google.api_core.client_options.ClientOptions]) Optional. Client options used to set user options on the client. API Endpoint should be set through client_options.
metadata – (Optional[Sequence[Tuple[str, str]]]) Optional. Additional metadata that is provided to the method.

expand(pvalue)[source]