apache_beam.ml.gcp.cloud_dlp module

PTransforms that implement Google Cloud Data Loss Prevention functionality.

class apache_beam.ml.gcp.cloud_dlp.MaskDetectedDetails(project=None, deidentification_template_name=None, deidentification_config=None, inspection_template_name=None, inspection_config=None, timeout=None)[source]

Bases: PTransform

Scrubs sensitive information detected in text. The PTransform returns a PCollection of str.

Example usage:

pipeline | MaskDetectedDetails(project='example-gcp-project',
  deidentification_config={
      'info_type_transformations': {
          'transformations': [{
              'primitive_transformation': {
                  'character_mask_config': {
                      'masking_character': '#'
                  }
              }
          }]
      }
  }, inspection_config={'info_types': [{'name': 'EMAIL_ADDRESS'}]})

Initializes a MaskDetectedDetails transform.

Parameters:
  • project (str) – Optional. GCP project name in which inspection will be performed.

  • deidentification_template_name (str) – Either this or deidentification_config is required. Name of the de-identification template to apply to sensitive information detected in text.

  • deidentification_config (Union[dict, google.cloud.dlp_v2.types.DeidentifyConfig]) – Configuration for the de-identification of the content item. If both template name and config are supplied, config takes precedence.

  • inspection_template_name (str) – Either this or inspection_config is required. Name of the inspection template used to detect sensitive data in text.

  • inspection_config (Union[dict, google.cloud.dlp_v2.types.InspectConfig]) – Configuration for the inspector used to detect sensitive data in text. If both template name and config are supplied, config takes precedence.

  • timeout (float) – Optional. The amount of time, in seconds, to wait for the request to complete.
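The "either a template name or a config" rule in the parameter list above can be checked before the transform is constructed. The following is a minimal sketch under that reading of the documentation, not Beam's own validation code; `check_dlp_args` is a hypothetical helper introduced only for illustration.

```python
def check_dlp_args(deidentification_template_name=None,
                   deidentification_config=None,
                   inspection_template_name=None,
                   inspection_config=None):
    """Hypothetical pre-flight check mirroring the documented parameter rules.

    Reports, for each setting, whether the config or the template would be
    used according to the documentation. Not part of apache_beam.
    """
    def pick(config, template_name, label):
        # Documented rule: config takes precedence when both are supplied.
        if config is not None:
            return 'config'
        if template_name is not None:
            return 'template'
        raise ValueError(
            'Either %s_config or %s_template_name is required.'
            % (label, label))

    return {
        'deidentification': pick(
            deidentification_config,
            deidentification_template_name,
            'deidentification'),
        'inspection': pick(
            inspection_config, inspection_template_name, 'inspection'),
    }
```

Calling `check_dlp_args(deidentification_template_name='my-template', inspection_config={'info_types': [{'name': 'EMAIL_ADDRESS'}]})` (where `'my-template'` is a placeholder) reports that the template is used for de-identification and the config for inspection; omitting both values for either setting raises `ValueError`.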

expand(pcoll)[source]

class apache_beam.ml.gcp.cloud_dlp.InspectForDetails(project=None, inspection_template_name=None, inspection_config=None, timeout=None)[source]

Bases: PTransform

Inspects input text for sensitive information. The PTransform returns a PCollection of List[google.cloud.dlp_v2.proto.dlp_pb2.Finding].

Example usage:

pipeline | InspectForDetails(project='example-gcp-project',
          inspection_config={'info_types': [{'name': 'EMAIL_ADDRESS'}]})

Initializes an InspectForDetails transform.

Parameters:
  • project (str) – Optional. GCP project name in which inspection will be performed.

  • inspection_template_name (str) – Either this or inspection_config is required. Name of the inspection template used to detect sensitive data in text.

  • inspection_config (Union[dict, google.cloud.dlp_v2.types.InspectConfig]) – Configuration for the inspector used to detect sensitive data in text. If both template name and config are supplied, config takes precedence.

  • timeout (float) – Optional. The amount of time, in seconds, to wait for the request to complete.
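Because each output element is a list of findings for one input string, a downstream step usually reduces that list to something smaller. The sketch below shows one way to do so, assuming each finding exposes `info_type.name` as in the Cloud DLP Finding message. `count_info_types` and `run_inspection` are illustrative names, the sample text is made up, and `run_inspection` needs the Beam GCP extras plus valid credentials, so it is defined here but not called.

```python
from collections import Counter


def count_info_types(findings):
    """Counts findings per info type name for one inspected element."""
    return dict(Counter(f.info_type.name for f in findings))


def run_inspection(project):
    """Builds and runs a small pipeline.

    Assumes apache-beam[gcp] is installed and GCP credentials are available.
    """
    # Imported lazily so count_info_types stays usable without Beam installed.
    import apache_beam as beam
    from apache_beam.ml.gcp.cloud_dlp import InspectForDetails

    with beam.Pipeline() as pipeline:
        _ = (
            pipeline
            | beam.Create(['Contact: someone@example.com'])
            | InspectForDetails(
                project=project,
                inspection_config={
                    'info_types': [{'name': 'EMAIL_ADDRESS'}]
                })
            # Reduce each list of findings to a per-type count.
            | beam.Map(count_info_types)
            | beam.Map(print))
```

Keeping the reduction in a plain function makes it easy to test with stand-in objects, without calling the DLP service.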

expand(pcoll)[source]