CombinePerKey
Pydoc |
Combines all elements for each key in a collection.
See more information in the Beam Programming Guide.
Examples
In the following examples, we create a pipeline with a PCollection of produce, where each element is a key-value pair.
Then, we apply CombinePerKey in multiple ways to combine all the values that share a key.
CombinePerKey accepts a function that takes an iterable of values as input and combines them into a single value for each key.
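To make "combines them for each key" concrete, here is a plain-Python model of the result that CombinePerKey computes. The helper combine_per_key_model and the sample pairs are hypothetical. This is not how Beam executes the transform: Beam may combine subsets of a key's values first (for example, per bundle) and then combine those partial results. For that reason, the function you pass should give the same answer regardless of how the values are split and ordered.

```python
from collections import defaultdict


def combine_per_key_model(pairs, combine):
    """Plain-Python model of CombinePerKey's output (hypothetical helper)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Apply the combining function to all values that share a key.
    return {key: combine(values) for key, values in groups.items()}


pairs = [('carrot', 3), ('carrot', 2), ('eggplant', 1),
         ('tomato', 4), ('tomato', 5), ('tomato', 3)]
print(combine_per_key_model(pairs, sum))
# {'carrot': 5, 'eggplant': 1, 'tomato': 12}

# Combining partial results must match combining everything at once.
assert sum([sum([4, 5]), sum([3])]) == sum([4, 5, 3])
```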
Example 1: Combining with a predefined function
We use the function sum, which takes an iterable of numbers and adds them together.
Example 2: Combining with a function
We define a function saturated_sum, which takes an iterable of numbers and adds them together, capping the result at a predefined maximum number.
Example 3: Combining with a lambda function
We can also use lambda functions to simplify Example 2.
Example 4: Combining with multiple arguments
You can pass functions that take additional arguments to CombinePerKey.
Pass the extra arguments to CombinePerKey after the function, and they are forwarded to the function as additional positional or keyword arguments.
In this example, the lambda function takes values and max_value as arguments.
Example 5: Combining with a CombineFn
The most general and flexible way to combine elements is with a class that inherits from CombineFn and implements the following methods:
CombineFn.create_accumulator(): Creates an empty accumulator. For example, an empty accumulator for a sum would be 0, while an empty accumulator for a product (multiplication) would be 1.
CombineFn.add_input(): Called once per element. Takes an accumulator and an input element, combines them, and returns the updated accumulator.
CombineFn.merge_accumulators(): Multiple accumulators may be processed in parallel, so this method merges them into a single accumulator.
CombineFn.extract_output(): Performs any final calculations on the merged accumulator and returns the result.
Related transforms
You can use the following combiner transforms:
See also GroupBy, which allows you to combine more than one field at once.
Last updated on 2025/10/26