ParDo

A transform for generic parallel processing. A ParDo transform considers each element in the input PCollection, performs some processing function (your user code) on that element, and emits zero or more elements to an output PCollection.

See more information in the Beam Programming Guide.

Examples

In the following examples, we explore how to create custom DoFns and access the timestamp and windowing information.

Example 1: ParDo with a simple DoFn

The following example defines a simple DoFn class called SplitWords which stores the delimiter as an object field. The process method is called once per element, and it can yield zero or more output elements.

Example 2: ParDo with timestamp and window information

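In this example, we add extra parameters to the process method whose default values tell the runner what to bind at runtime: a parameter defaulting to beam.DoFn.TimestampParam receives the element's timestamp, and a parameter defaulting to beam.DoFn.WindowParam receives the window the element belongs to.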

Example 3: ParDo with DoFn methods

A DoFn can be customized with a number of methods that can help create more complex behaviors. You can customize what a worker does when it starts and shuts down with setup and teardown. You can also customize what to do when a bundle of elements starts and finishes with start_bundle and finish_bundle.

Known issues:
