apache_beam.transforms.enrichment module
- class apache_beam.transforms.enrichment.EnrichmentSourceHandler[source]
Bases:
Caller
[InputT
,OutputT
]Wrapper class for apache_beam.io.requestresponse.Caller.
Ensure that the implementation of
__call__
method returns a tuple of beam.Row objects.- get_cache_key(request: InputT) str [source]
Returns the request to be cached. This is how the response will be looked up in the cache as well.
Implement this method to provide the key for the cache. By default, the entire request is stored as the cache key.
For example, in BigTableEnrichmentHandler, the row key for the element is returned here.
- class apache_beam.transforms.enrichment.Enrichment(source_handler: ~apache_beam.transforms.enrichment.EnrichmentSourceHandler, join_fn: ~typing.Callable[[~typing.Dict[str, ~typing.Any], ~typing.Dict[str, ~typing.Any]], ~apache_beam.pvalue.Row] = <function cross_join>, timeout: float | None = 30, repeater: ~apache_beam.io.requestresponse.Repeater = <apache_beam.io.requestresponse.ExponentialBackOffRepeater object>, throttler: ~apache_beam.io.requestresponse.PreCallThrottler = <apache_beam.io.requestresponse.DefaultThrottler object>)[source]
Bases:
PTransform
[PCollection
[InputT
],PCollection
[OutputT
]]A
apache_beam.transforms.enrichment.Enrichment
transform to enrich elements in a PCollection.Uses the
apache_beam.transforms.enrichment.EnrichmentSourceHandler
to enrich elements by joining the metadata from external source.Processes an input
PCollection
of beam.Row by applying aapache_beam.transforms.enrichment.EnrichmentSourceHandler
to each element and returning the enrichedPCollection
.- Parameters:
source_handler – Handles source lookup and metadata retrieval. Implements the
apache_beam.transforms.enrichment.EnrichmentSourceHandler
join_fn – A lambda function to join original element with lookup metadata. Defaults to CROSS_JOIN.
timeout – (Optional) timeout for source requests. Defaults to 30 seconds.
repeater – provides method to repeat failed requests to API due to service errors. Defaults to
apache_beam.io.requestresponse.ExponentialBackOffRepeater
to repeat requests with exponential backoff.throttler – provides methods to pre-throttle a request. Defaults to
apache_beam.io.requestresponse.DefaultThrottler
for client-side adaptive throttling usingapache_beam.io.components.adaptive_throttler.AdaptiveThrottler
.
- expand(input_row: PCollection[InputT]) PCollection[OutputT] [source]
- with_redis_cache(host: str, port: int, time_to_live: int | timedelta = 86400, *, request_coder: Coder | None = None, response_coder: Coder | None = None, **kwargs)[source]
Configure the Redis cache to use with enrichment transform.
- Parameters:
host (str) – The hostname or IP address of the Redis server.
port (int) – The port number of the Redis server.
time_to_live – (Union[int, timedelta]) The time-to-live (TTL) for records stored in Redis. Provide an integer (in seconds) or a datetime.timedelta object.
request_coder – (Optional[coders.Coder]) coder for requests stored in Redis.
response_coder – (Optional[coders.Coder]) coder for decoding responses received from Redis.
kwargs – Optional additional keyword arguments that are required to connect to your redis server. Same as redis.Redis().