apache_beam.ml.transforms.handlers module
- class apache_beam.ml.transforms.handlers.TFTProcessHandler(*, artifact_location: str, transforms: Sequence[TFTOperation] | None = None, artifact_mode: str = 'produce')
Bases: ProcessHandler[NamedTuple | Row | Dict[str, str | float | int | bytes | ndarray], Row | Dict[str, ndarray]]
A handler class for processing data with TensorFlow Transform (TFT) operations.
- get_raw_data_feature_spec(input_types: Dict[str, type]) → Dict[str, tensorflow.io.VarLenFeature]
Return the raw-data feature spec from which the DatasetMetadata object used with tft_beam.AnalyzeAndTransformDataset is built.
- Parameters:
input_types – A dictionary of column names and types.
- Returns:
A dictionary mapping each column name to a tensorflow.io.VarLenFeature.
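The mapping from column types to a feature spec can be illustrated with a minimal sketch. This is not Beam's implementation: `VarLenFeatureSketch`, `_DTYPE_NAMES`, and `sketch_feature_spec` are hypothetical names, the dtype table is an assumption, and a plain class stands in for tensorflow.io.VarLenFeature so the example runs without TensorFlow installed.

```python
from typing import Dict, List, get_args, get_origin


class VarLenFeatureSketch:
    """Hypothetical stand-in for tensorflow.io.VarLenFeature (records dtype only)."""

    def __init__(self, dtype: str):
        self.dtype = dtype


# Assumed mapping from Python element types to TensorFlow dtype names.
_DTYPE_NAMES = {int: "int64", float: "float32", str: "string", bytes: "string"}


def sketch_feature_spec(input_types: Dict[str, type]) -> Dict[str, VarLenFeatureSketch]:
    """Build one variable-length feature entry per input column."""
    spec = {}
    for column, typ in input_types.items():
        # For container hints such as List[float], use the element type.
        if get_origin(typ) is not None:
            typ = get_args(typ)[0]
        if typ not in _DTYPE_NAMES:
            raise TypeError(f"Unsupported type {typ!r} for column {column!r}")
        spec[column] = VarLenFeatureSketch(_DTYPE_NAMES[typ])
    return spec


spec = sketch_feature_spec({"age": int, "scores": List[float], "name": str})
```

Every column becomes a variable-length feature in this sketch, mirroring the VarLenFeature values in the return type above.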
- convert_raw_data_feature_spec_to_dataset_metadata(raw_data_feature_spec) → tensorflow_transform.tf_metadata.dataset_metadata.DatasetMetadata
- get_raw_data_metadata(input_types: Dict[str, type]) → tensorflow_transform.tf_metadata.dataset_metadata.DatasetMetadata
- write_transform_artifacts(transform_fn, location)
Write transform artifacts to the given location.
- Parameters:
transform_fn – A transform_fn object.
location – A location to write the artifacts.
- Returns:
A PCollection of WriteTransformFn writing a TF transform graph.
- process_data_fn(inputs: Dict[str, tensorflow_transform.common_types.ConsistentTensorType]) → Dict[str, tensorflow_transform.common_types.ConsistentTensorType]
This method is used in the AnalyzeAndTransformDataset step. It applies the transforms to the inputs in sequential order, each transform acting on the columns specified for it.
- Parameters:
inputs – A dictionary of column names and data.
- Returns:
A dictionary of column names and transformed data.
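The sequential, per-column application described for process_data_fn can be sketched in plain Python. The transform classes and `sketch_process_data_fn` below are hypothetical and operate on lists rather than tensors; they only illustrate the ordering, not the actual TFT operations.

```python
from typing import Dict, List, Sequence


class ScaleByMaxSketch:
    """Hypothetical transform: divide each value in a column by the column maximum."""

    def __init__(self, columns: Sequence[str]):
        self.columns = list(columns)

    def apply(self, data: List[float], output_column: str) -> Dict[str, List[float]]:
        top = max(data)
        return {output_column: [v / top for v in data]}


class AddOneSketch:
    """Hypothetical transform: add 1 to each value in a column."""

    def __init__(self, columns: Sequence[str]):
        self.columns = list(columns)

    def apply(self, data: List[float], output_column: str) -> Dict[str, List[float]]:
        return {output_column: [v + 1 for v in data]}


def sketch_process_data_fn(inputs: Dict[str, List[float]], transforms) -> Dict[str, List[float]]:
    """Apply each transform in order, only to the columns it names."""
    outputs = dict(inputs)
    for transform in transforms:
        for column in transform.columns:
            # Later transforms see the output of earlier ones on the same column.
            outputs.update(transform.apply(outputs[column], column))
    return outputs


result = sketch_process_data_fn(
    {"x": [1.0, 2.0, 4.0], "y": [10.0, 20.0]},
    [ScaleByMaxSketch(["x"]), AddOneSketch(["x"])],
)
```

Because the second transform reads the first transform's output, the order of the transforms sequence matters; columns not named by any transform pass through unchanged in this sketch.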
- expand(raw_data: PCollection[NamedTuple | Row | Dict[str, str | float | int | bytes | ndarray]]) → PCollection[Row | Dict[str, ndarray]]
This method uses tensorflow_transform's Analyze step to produce the artifacts and its Transform step to apply the transforms to the data. It also computes the dataset metadata required by the tft AnalyzeDataset/TransformDataset step.
Artifacts are produced only if artifact_mode is set to 'produce'. If artifact_mode is set to 'consume', the artifacts are read from artifact_location, where they were stored by an earlier run in produce mode.
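The difference between the two artifact modes can be sketched without Beam or TFT. In this hypothetical example, `sketch_expand` and the `scale_artifact.json` file name are assumptions made for illustration; the "artifact" is just a column maximum stored as JSON, standing in for the TF transform graph.

```python
import json
import os
import tempfile
from typing import List

ARTIFACT_FILE = "scale_artifact.json"  # hypothetical artifact file name


def sketch_expand(rows: List[float], artifact_location: str,
                  artifact_mode: str = "produce") -> List[float]:
    path = os.path.join(artifact_location, ARTIFACT_FILE)
    if artifact_mode == "produce":
        # Analyze: compute statistics over the data and store them as an artifact.
        artifact = {"max": max(rows)}
        with open(path, "w") as f:
            json.dump(artifact, f)
    elif artifact_mode == "consume":
        # Skip analysis and reuse the artifact written by an earlier produce run.
        with open(path) as f:
            artifact = json.load(f)
    else:
        raise ValueError(f"Unknown artifact_mode: {artifact_mode!r}")
    # Transform: apply the stored statistics to the data.
    return [v / artifact["max"] for v in rows]


with tempfile.TemporaryDirectory() as location:
    train = sketch_expand([1.0, 2.0, 4.0], location, "produce")
    serve = sketch_expand([1.0, 8.0], location, "consume")
```

In consume mode the new data is scaled with the maximum seen during the produce run (4.0 here), not with its own maximum, which is why the same artifact_location must be used for both runs.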