apache_beam.transforms.managed module
Managed Transforms.
This module builds and instantiates turnkey transforms that can be managed by the underlying runner. This means the runner can upgrade the transform to a more optimal/updated version without requiring the user to do anything. It may also replace the transform with something entirely different if it chooses to. By default, however, the specified transform will remain unchanged.
Using Managed Transforms
Managed turnkey transforms have a defined configuration and can be built using
an inline dict
like so:
results = p | beam.managed.Read(
beam.managed.ICEBERG,
config={"table": "foo",
"catalog_name": "bar",
"catalog_properties": {
"warehouse": "path/to/warehouse",
"catalog-impl": "org.apache.my.CatalogImpl"}})
A YAML configuration file can also be used to build a Managed transform. Say we have the following config.yaml file:
topic: "foo"
bootstrap_servers: "localhost:1234"
format: "AVRO"
Simply provide the location to the file like so:
input_rows = p | beam.Create(...)
input_rows | beam.managed.Write(
beam.managed.KAFKA,
config_url="path/to/config.yaml")
Available transforms
Available transforms are:
Kafka Read and Write
Iceberg Read and Write
Note: inputs and outputs need to be PCollection(s) of Beam
apache_beam.pvalue.Row
elements.
Note: Today, all managed transforms are essentially cross-language transforms, and Java’s ManagedSchemaTransform is used under the hood.