apache_beam.coders.coders module¶
Collection of useful coders.
Only those coders listed in __all__ are part of the public API of this module.
## On usage of pickle, dill and pickler in Beam
In Beam, we generally we use pickle for pipeline elements and dill for more complex types, like user functions.
pickler is Beam’s own wrapping of dill + compression + error handling. It serves also as an API to mask the actual encoding layer (so we can change it from dill if necessary).
We created _MemoizingPickleCoder to improve performance when serializing complex user types for the execution of SDF. Specifically to address BEAM-12781, where many identical BoundedSource instances are being encoded.
- class apache_beam.coders.coders.Coder[source]¶
Bases:
object
Base class for coders.
- is_deterministic() bool [source]¶
Whether this coder is guaranteed to encode values deterministically.
A deterministic coder is required for key coders in GroupByKey operations to produce consistent results.
For example, note that the default coder, the PickleCoder, is not deterministic: the ordering of picked entries in maps may vary across executions since there is no defined order, and such a coder is not in general suitable for usage as a key coder in GroupByKey operations, since each instance of the same key may be encoded differently.
- Returns:
Whether coder is deterministic.
- as_deterministic_coder(step_label, error_message=None)[source]¶
Returns a deterministic version of self, if possible.
Otherwise raises a value error.
- estimate_size(value)[source]¶
Estimates the encoded size of the given value, in bytes.
Dataflow estimates the encoded size of a PCollection processed in a pipeline step by using the estimated size of a random sample of elements in that PCollection.
The default implementation encodes the given value and returns its byte size. If a coder can provide a fast estimate of the encoded size of a value (e.g., if the encoding has a fixed size), it can provide its estimate here to improve performance.
- Parameters:
value – the value whose encoded size is to be estimated.
- Returns:
The estimated encoded size of the given value.
- get_impl()[source]¶
For internal use only; no backwards-compatibility guarantees.
Returns the CoderImpl backing this Coder.
- classmethod register_urn(urn, parameter_type)[source]¶
- classmethod register_urn(urn, parameter_type, fn)
Registers a urn with a constructor.
For example, if ‘beam:fn:foo’ had parameter type FooPayload, one could write RunnerApiFn.register_urn(‘bean:fn:foo’, FooPayload, foo_from_proto) where foo_from_proto took as arguments a FooPayload and a PipelineContext. This function can also be used as a decorator rather than passing the callable in as the final parameter.
A corresponding to_runner_api_parameter method would be expected that returns the tuple (‘beam:fn:foo’, FooPayload)
- to_runner_api(context: PipelineContext) beam_runner_api_pb2.Coder [source]¶
- class apache_beam.coders.coders.AvroGenericCoder(schema)[source]¶
Bases:
FastCoder
A coder used for AvroRecord values.
- class apache_beam.coders.coders.BooleanCoder[source]¶
Bases:
FastCoder
- to_runner_api_parameter(unused_context)¶
- class apache_beam.coders.coders.BytesCoder[source]¶
Bases:
FastCoder
Byte string coder.
- to_runner_api_parameter(unused_context)¶
- class apache_beam.coders.coders.DillCoder[source]¶
Bases:
_PickleCoderBase
Coder using dill’s pickle functionality.
- class apache_beam.coders.coders.FastPrimitivesCoder(fallback_coder: Coder = PickleCoder)[source]¶
Bases:
FastCoder
Encodes simple primitives (e.g. str, int) efficiently.
For unknown types, falls back to another coder (e.g. PickleCoder).
- class apache_beam.coders.coders.FloatCoder[source]¶
Bases:
FastCoder
A coder used for double-precision floating-point values.
Note that the name “FloatCoder” is in reference to Python’s
float
built-in which is generally implemented using C doubles. SeeSinglePrecisionFloatCoder
for a single-precision version of this coder.- to_runner_api_parameter(unused_context)¶
- class apache_beam.coders.coders.IterableCoder(elem_coder: Coder)[source]¶
Bases:
ListLikeCoder
Coder of iterables of homogeneous objects.
- to_runner_api_parameter(unused_context)¶
- class apache_beam.coders.coders.ListCoder(elem_coder: Coder)[source]¶
Bases:
ListLikeCoder
Coder of Python lists.
- class apache_beam.coders.coders.MapCoder(key_coder: Coder, value_coder: Coder)[source]¶
Bases:
FastCoder
- class apache_beam.coders.coders.NullableCoder(value_coder: Coder)[source]¶
Bases:
FastCoder
- to_runner_api_parameter(unused_context)¶
- class apache_beam.coders.coders.PickleCoder[source]¶
Bases:
_PickleCoderBase
Coder using Python’s pickle functionality.
- class apache_beam.coders.coders.ProtoCoder(proto_message_type: Type[Message])[source]¶
Bases:
FastCoder
A Coder for Google Protocol Buffers.
It supports both Protocol Buffers syntax versions 2 and 3. However, the runtime version of the python protobuf library must exactly match the version of the protoc compiler what was used to generate the protobuf messages.
ProtoCoder is registered in the global CoderRegistry as the default coder for any protobuf Message object.
- class apache_beam.coders.coders.ProtoPlusCoder(proto_plus_message_type: Type[Message])[source]¶
Bases:
FastCoder
A Coder for Google Protocol Buffers wrapped using the proto-plus library.
ProtoPlusCoder is registered in the global CoderRegistry as the default coder for any proto.Message object.
- class apache_beam.coders.coders.ShardedKeyCoder(key_coder: Coder)[source]¶
Bases:
FastCoder
A coder for sharded key.
- to_runner_api_parameter(unused_context)¶
- class apache_beam.coders.coders.SinglePrecisionFloatCoder[source]¶
Bases:
FastCoder
A coder used for single-precision floating-point values.
- class apache_beam.coders.coders.SingletonCoder(value)[source]¶
Bases:
FastCoder
A coder that always encodes exactly one value.
- class apache_beam.coders.coders.StrUtf8Coder[source]¶
Bases:
Coder
A coder used for reading and writing strings as UTF-8.
- to_runner_api_parameter(unused_context)¶
- class apache_beam.coders.coders.TimestampCoder[source]¶
Bases:
FastCoder
A coder used for timeutil.Timestamp values.
- class apache_beam.coders.coders.TupleCoder(components: Iterable[Coder])[source]¶
Bases:
FastCoder
Coder of tuple objects.
- classmethod from_type_hint(typehint: typehints.TupleConstraint, registry: CoderRegistry) TupleCoder [source]¶
- class apache_beam.coders.coders.TupleSequenceCoder(elem_coder: Coder)[source]¶
Bases:
FastCoder
Coder of homogeneous tuple objects.
- classmethod from_type_hint(typehint: Any, registry: CoderRegistry) TupleSequenceCoder [source]¶
- class apache_beam.coders.coders.VarIntCoder[source]¶
Bases:
FastCoder
Variable-length integer coder.
- to_runner_api_parameter(unused_context)¶
- class apache_beam.coders.coders.WindowedValueCoder(wrapped_value_coder: Coder, window_coder: Coder | None = None)[source]¶
Bases:
FastCoder
Coder for windowed values.
- to_runner_api_parameter(unused_context)¶
- class apache_beam.coders.coders.ParamWindowedValueCoder(payload, components)[source]¶
Bases:
WindowedValueCoder
A coder used for parameterized windowed values.