apache_beam.dataframe.partitionings module
- class apache_beam.dataframe.partitionings.Partitioning[source]
Bases: object
A class representing a (consistent) partitioning of dataframe objects.
- is_subpartitioning_of(other: Partitioning) -> bool [source]
Returns whether self is a sub-partition of other.
Specifically, returns whether something partitioned by self is necessarily also partitioned by other.
- class apache_beam.dataframe.partitionings.Index(levels=None)[source]
Bases: Partitioning
A partitioning by index (either fully or partially).
If the set of “levels” of the index to consider is not specified, the entire index is used.
Partitionings form a partial order, given by
Singleton() < Index([i]) < Index([i, j]) < … < Index() < Arbitrary()
The ordering is implemented via the is_subpartitioning_of method: in the chain above, each partitioning is a subpartitioning of every partitioning to its left.
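The chain above can be illustrated with a small standalone model. This is only a sketch of the documented ordering, not Beam's implementation: the names ToySingleton, ToyIndex, ToyArbitrary and the free function is_subpartitioning_of are hypothetical stand-ins, and levels are assumed to be plain integers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToySingleton:
    """Hypothetical stand-in for Singleton(): everything in one partition."""


@dataclass(frozen=True)
class ToyArbitrary:
    """Hypothetical stand-in for Arbitrary(): no partitioning guarantee."""


@dataclass(frozen=True)
class ToyIndex:
    """Hypothetical stand-in for Index(levels); None means the entire index."""
    levels: Optional[Tuple[int, ...]] = None


def is_subpartitioning_of(left, right) -> bool:
    """True when `left` sits at or to the right of `right` in the documented chain."""
    if isinstance(right, ToySingleton):
        return True   # everything is a subpartitioning of Singleton()
    if isinstance(left, ToyArbitrary):
        return True   # Arbitrary() is the rightmost element
    if isinstance(left, ToySingleton) or isinstance(right, ToyArbitrary):
        return False
    # Both arguments are ToyIndex from here on.
    if left.levels is None:
        return True   # Index() is to the right of every Index([...])
    if right.levels is None:
        return False
    # Index([i, j]) is a subpartitioning of Index([i]): right's levels are a subset.
    return set(right.levels) <= set(left.levels)
```

In this model ToyIndex((0,)) and ToyIndex((1,)) are incomparable in both directions, which is why the relation is a partial order rather than a total one.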
- class apache_beam.dataframe.partitionings.Singleton(reason=None)[source]
Bases: Partitioning
A partitioning of all the data into a single partition.
- property reason
- class apache_beam.dataframe.partitionings.JoinIndex(ancestor=None)[source]
Bases: Partitioning
A partitioning that lets two frames be joined. This can be either a hash partitioning on the full index or a common ancestor with no intervening re-indexing or re-partitioning.
It fits into the partial ordering as
Index() < JoinIndex(x) < JoinIndex() < Arbitrary()
with JoinIndex(x) and JoinIndex(y) being incomparable for nontrivial x != y.
Expressions that want to make use of this partitioning should simply declare a requirement of JoinIndex().
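To make the incomparability concrete, the documented ordering can be written down as a small graph and queried. This sketch does not use Beam at all: the string labels, the COVERS set, and the helpers less_than and comparable are hypothetical names, with x and y standing for two distinct nontrivial ancestors.

```python
# Each pair (smaller, larger) is one "<" step taken from the ordering above.
COVERS = {
    ("Index()", "JoinIndex(x)"),
    ("Index()", "JoinIndex(y)"),
    ("JoinIndex(x)", "JoinIndex()"),
    ("JoinIndex(y)", "JoinIndex()"),
    ("JoinIndex()", "Arbitrary()"),
}


def less_than(a: str, b: str) -> bool:
    """True if b can be reached from a by following one or more "<" steps."""
    stack = [a]
    seen = set()
    while stack:
        node = stack.pop()
        for lo, hi in COVERS:
            if lo == node and hi not in seen:
                if hi == b:
                    return True
                seen.add(hi)
                stack.append(hi)
    return False


def comparable(a: str, b: str) -> bool:
    """Two elements are comparable if they are equal or ordered either way."""
    return a == b or less_than(a, b) or less_than(b, a)
```

Here comparable("JoinIndex(x)", "JoinIndex(y)") is false, while both are below "JoinIndex()", which matches the description of JoinIndex() as the requirement to declare.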