apache_beam.typehints.schemas module¶
Support for mapping python types to proto Schemas and back again.
Imposes a mapping between common Python types and Beam portable schemas (https://s.apache.org/beam-schemas):
Python Schema
np.int8 <-----> BYTE
np.int16 <-----> INT16
np.int32 <-----> INT32
np.int64 <-----> INT64
int ------> INT64
np.float32 <-----> FLOAT
np.float64 <-----> DOUBLE
float ------> DOUBLE
bool <-----> BOOLEAN
str <-----> STRING
bytes <-----> BYTES
ByteString ------> BYTES
Timestamp <-----> LogicalType(urn="beam:logical_type:micros_instant:v1")
Decimal <-----> LogicalType(urn="beam:logical_type:fixed_decimal:v1")
Mapping <-----> MapType
Sequence <-----> ArrayType
NamedTuple <-----> RowType
beam.Row ------> RowType
One direction mapping of Python types from Beam portable schemas:
- bytes
<—— LogicalType(urn=”beam:logical_type:fixed_bytes:v1”) <—— LogicalType(urn=”beam:logical_type:var_bytes:v1”)
- str
<—— LogicalType(urn=”beam:logical_type:fixed_char:v1”) <—— LogicalType(urn=”beam:logical_type:var_char:v1”)
- Timestamp
<—— LogicalType(urn=”beam:logical_type:millis_instant:v1”)
Note that some of these mappings are provided as conveniences,
but they are lossy and will not survive a roundtrip from python to Beam schemas
and back. For example, the Python type int
will map to INT64
in
Beam schemas but converting that back to a Python type will yield
np.int64
.
nullable=True
on a Beam FieldType
is represented in Python by
wrapping the type in Optional
.
This module is intended for internal use only. Nothing defined here provides any backwards-compatibility guarantee.
- apache_beam.typehints.schemas.named_fields_to_schema(names_and_types: ~typing.Dict[str, type] | ~typing.Sequence[~typing.Tuple[str, type]], schema_id: str | None = None, schema_options: ~typing.Sequence[~typing.Tuple[str, ~typing.Any]] | None = None, field_options: ~typing.Dict[str, ~typing.Sequence[~typing.Tuple[str, ~typing.Any]]] | None = None, schema_registry: ~apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>)[source]¶
- apache_beam.typehints.schemas.typing_to_runner_api(type_: type, schema_registry: ~apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) FieldType [source]¶
- apache_beam.typehints.schemas.typing_from_runner_api(fieldtype_proto: ~org.apache.beam.model.pipeline.v1.schema_pb2.FieldType, schema_registry: ~apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) type [source]¶
- apache_beam.typehints.schemas.value_to_runner_api(type_proto: ~org.apache.beam.model.pipeline.v1.schema_pb2.FieldType, value, schema_registry: ~apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) FieldValue [source]¶
- apache_beam.typehints.schemas.value_from_runner_api(type_proto: ~org.apache.beam.model.pipeline.v1.schema_pb2.FieldType, value_proto: ~org.apache.beam.model.pipeline.v1.schema_pb2.FieldValue, schema_registry: ~apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) FieldValue [source]¶
- apache_beam.typehints.schemas.option_to_runner_api(option: ~typing.Tuple[str, ~typing.Any], schema_registry: ~apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) Option [source]¶
- apache_beam.typehints.schemas.option_from_runner_api(option_proto: ~org.apache.beam.model.pipeline.v1.schema_pb2.Option, schema_registry: ~apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) Tuple[str, Any] [source]¶
- apache_beam.typehints.schemas.schema_field(name: str, field_type: FieldType | type, description: str | None = None) Field [source]¶
- class apache_beam.typehints.schemas.SchemaTranslation(schema_registry: ~apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>)[source]¶
Bases:
object
- atomic_value_from_runner_api(atomic_type: <google.protobuf.internal.enum_type_wrapper.EnumTypeWrapper object at 0x7fb71b1df700>, atomic_value: ~org.apache.beam.model.pipeline.v1.schema_pb2.AtomicTypeValue)[source]¶
- apache_beam.typehints.schemas.named_tuple_from_schema(schema, schema_registry: ~apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) type [source]¶
- apache_beam.typehints.schemas.named_tuple_to_schema(named_tuple, schema_registry: ~apache_beam.typehints.schema_registry.SchemaTypeRegistry = <apache_beam.typehints.schema_registry.SchemaTypeRegistry object>) Schema [source]¶
- apache_beam.typehints.schemas.schema_from_element_type(element_type: type) Schema [source]¶
Get a schema for the given PCollection element_type.
Returns schema as a list of (name, python_type) tuples
- apache_beam.typehints.schemas.named_fields_from_element_type(element_type: type) List[Tuple[str, type]] [source]¶
- apache_beam.typehints.schemas.union_schema_type(element_types)[source]¶
Returns a schema whose fields are the union of each corresponding field.
element_types must be a set of schema-aware types whose fields have the same naming and ordering.
- class apache_beam.typehints.schemas.LogicalType[source]¶
Bases:
Generic
[LanguageT
,RepresentationT
,ArgT
]- classmethod language_type() type [source]¶
Return the language type this LogicalType encodes.
The returned type should match LanguageT
- classmethod representation_type() type [source]¶
Return the type of the representation this LogicalType uses to encode the language type.
The returned type should match RepresentationT
- classmethod argument_type() type [source]¶
Return the type of the argument used for variations of this LogicalType.
The returned type should match ArgT
- to_representation_type(value: LanguageT) RepresentationT [source]¶
Convert an instance of LanguageT to RepresentationT.
- to_language_type(value: RepresentationT) LanguageT [source]¶
Convert an instance of RepresentationT to LanguageT.
- classmethod register_logical_type(logical_type_cls)[source]¶
Register an implementation of LogicalType.
- classmethod from_typing(typ: type) LogicalType [source]¶
Construct an instance of a registered LogicalType implementation given a typing.
Raises ValueError if no registered LogicalType implementation can encode the given typing.
- classmethod from_runner_api(logical_type_proto: LogicalType) LogicalType [source]¶
Construct an instance of a registered LogicalType implementation given a proto LogicalType.
Raises ValueError if no LogicalType registered for the given URN.
- class apache_beam.typehints.schemas.NoArgumentLogicalType[source]¶
Bases:
LogicalType
[LanguageT
,RepresentationT
,None
]
- class apache_beam.typehints.schemas.PassThroughLogicalType[source]¶
Bases:
LogicalType
[LanguageT
,LanguageT
,ArgT
]A base class for LogicalTypes that use the same type as the underlying representation type.
- class apache_beam.typehints.schemas.MicrosInstantRepresentation(seconds, micros)¶
Bases:
tuple
Create new instance of MicrosInstantRepresentation(seconds, micros)
- micros: int64¶
Alias for field number 1
- seconds: int64¶
Alias for field number 0
- class apache_beam.typehints.schemas.MillisInstant[source]¶
Bases:
NoArgumentLogicalType
[Timestamp
,int64
]Millisecond-precision instant logical type handles values consistent with that encoded by
InstantCoder
in the Java SDK.This class handles
apache_beam.utils.timestamp.Timestamp
language type asMicrosInstant
, but it only provides millisecond precision, because it is aimed to handle data encoded by Java sdk’s InstantCoder which has same precision level.Timestamp is handled by MicrosInstant by default. In some scenario, such as read from cross-language transform with rows containing InstantCoder encoded timestamps, one may need to override the mapping of Timetamp to MillisInstant. To do this, re-register this class with
register_logical_type()
.
- class apache_beam.typehints.schemas.MicrosInstant[source]¶
Bases:
NoArgumentLogicalType
[Timestamp
,MicrosInstantRepresentation
]Microsecond-precision instant logical type that handles
Timestamp
.- to_representation_type(value: Timestamp) MicrosInstantRepresentation [source]¶
- to_language_type(value: MicrosInstantRepresentation) Timestamp [source]¶
- class apache_beam.typehints.schemas.PythonCallable[source]¶
Bases:
NoArgumentLogicalType
[PythonCallableWithSource
,str
]A logical type for PythonCallableSource objects.
- to_representation_type(value: PythonCallableWithSource) str [source]¶
- to_language_type(value: str) PythonCallableWithSource [source]¶
- class apache_beam.typehints.schemas.FixedPrecisionDecimalArgumentRepresentation(precision, scale)¶
Bases:
tuple
Create new instance of FixedPrecisionDecimalArgumentRepresentation(precision, scale)
- precision: int32¶
Alias for field number 0
- scale: int32¶
Alias for field number 1
- class apache_beam.typehints.schemas.DecimalLogicalType[source]¶
Bases:
NoArgumentLogicalType
[Decimal
,bytes
]A logical type for decimal objects handling values consistent with that encoded by
BigDecimalCoder
in the Java SDK.
- class apache_beam.typehints.schemas.FixedPrecisionDecimalLogicalType(precision=-1, scale=0)[source]¶
Bases:
LogicalType
[Decimal
,DecimalLogicalType
,FixedPrecisionDecimalArgumentRepresentation
]A wrapper of DecimalLogicalType that contains the precision value.
- class apache_beam.typehints.schemas.FixedBytes(length: int32)[source]¶
Bases:
PassThroughLogicalType
[bytes
,int32
]A logical type for fixed-length bytes.
- class apache_beam.typehints.schemas.VariableBytes(max_length: int32 = 2147483647)[source]¶
Bases:
PassThroughLogicalType
[bytes
,int32
]A logical type for variable-length bytes with specified maximum length.
- class apache_beam.typehints.schemas.FixedString(length: int32)[source]¶
Bases:
PassThroughLogicalType
[str
,int32
]A logical type for fixed-length string.
- class apache_beam.typehints.schemas.VariableString(max_length: int32 = 2147483647)[source]¶
Bases:
PassThroughLogicalType
[str
,int32
]A logical type for variable-length string with specified maximum length.