apache_beam.dataframe.schemas module
Utilities for relating schema-aware PCollections and DataFrame transforms.
The utilities here enforce the type mapping defined in apache_beam.typehints.pandas_type_compatibility.
- class apache_beam.dataframe.schemas.BatchRowsAsDataFrame(*args, proxy=None, **kwargs)
Bases: PTransform
A transform that batches schema-aware PCollection elements into DataFrames.
Batching parameters are inherited from BatchElements.
- apache_beam.dataframe.schemas.generate_proxy(element_type: type) → DataFrame
Generate a proxy pandas object for the given PCollection element_type.
Currently only supports generating a DataFrame proxy from a schema-aware PCollection or a Series proxy from a primitively typed PCollection.
- class apache_beam.dataframe.schemas.UnbatchPandas(proxy, include_indexes=False)
Bases: PTransform
A transform that explodes a PCollection of DataFrames or Series. A DataFrame is converted to a schema-aware PCollection, while a Series is converted to its underlying type.
- Parameters:
include_indexes – (optional, default: False) When unbatching a DataFrame with include_indexes=True, attempt to include the index columns in the output schema. Raises an error if any index level is unnamed (name=None), or if any index name is not unique among all column and index names.
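The naming rule for include_indexes=True can be sketched in plain Python. This is not Beam's implementation: check_index_names is a hypothetical helper, and placing index fields before column fields in the result is an assumption made for illustration.

```python
from collections import Counter


def check_index_names(index_names, column_names):
    """Hypothetical helper mirroring the rule described above.

    Raises ValueError if any index level is unnamed, or if an index name
    collides with another index name or with a column name.
    """
    if any(name is None for name in index_names):
        raise ValueError("every index level must be named")
    counts = Counter(list(index_names) + list(column_names))
    clashes = [name for name in index_names if counts[name] > 1]
    if clashes:
        raise ValueError(f"index names are not unique: {clashes}")
    # Assumption: index fields come first in the output schema.
    return list(index_names) + list(column_names)


print(check_index_names(["order_id"], ["price", "quantity"]))
```

A DataFrame with a default, unnamed index fails this check, so the index needs a name (and a name distinct from every column) before it can become a schema field.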
- apache_beam.dataframe.schemas.element_type_from_dataframe(proxy: DataFrame, include_indexes: bool = False) → type
Generate an element_type for an element-wise PCollection from a proxy pandas object. Currently only supports generating the element_type of a schema-aware PCollection from a proxy DataFrame.