apache_beam.transforms.managed module

Managed Transforms.

This module builds and instantiates turnkey transforms that can be managed by the underlying runner. This means the runner can upgrade the transform to a more optimal/updated version without requiring the user to do anything. It may also replace the transform with something entirely different if it chooses to. By default, however, the specified transform will remain unchanged.

Using Managed Transforms

Managed turnkey transforms have a defined configuration and can be built using an inline dict like so:

results = p | beam.managed.Read(
                  beam.managed.ICEBERG,
                  config={"table": "foo",
                          "catalog_name": "bar",
                          "catalog_properties": {
                              "warehouse": "path/to/warehouse",
                              "catalog-impl": "org.apache.my.CatalogImpl"}})

A YAML configuration file can also be used to build a Managed transform. Say we have the following config.yaml file:

topic: "foo"
bootstrap_servers: "localhost:1234"
format: "AVRO"

Simply provide the location to the file like so:

input_rows = p | beam.Create(...)
input_rows | beam.managed.Write(
                  beam.managed.KAFKA,
                  config_url="path/to/config.yaml")

Available transforms

Available transforms are:

  • Kafka Read and Write

  • Iceberg Read and Write

Note: inputs and outputs need to be PCollection(s) of Beam apache_beam.pvalue.Row elements.

Note: Today, all managed transforms are essentially cross-language transforms, and Java’s ManagedSchemaTransform is used under the hood.

class apache_beam.transforms.managed.Read(source: str, config: Dict[str, Any] | None = None, config_url: str | None = None, expansion_service=None)[source]

Bases: PTransform

Read using Managed Transforms

expand(input)[source]
default_label() str[source]
class apache_beam.transforms.managed.Write(sink: str, config: Dict[str, Any] | None = None, config_url: str | None = None, expansion_service=None)[source]

Bases: PTransform

Write using Managed Transforms

expand(input)[source]
default_label() str[source]