apache_beam.dataframe.partitionings module

class apache_beam.dataframe.partitionings.Partitioning[source]

Bases: object

A class representing a (consistent) partitioning of dataframe objects.

is_subpartitioning_of(other: Partitioning) bool[source]

Returns whether self is a sub-partition of other.

Specifically, returns whether something partitioned by self is necissarily also partitioned by other.

partition_fn(df: Frame, num_partitions: int) Iterable[Tuple[Any, Frame]][source]

A callable that actually performs the partitioning of a Frame df.

This will be invoked via a FlatMap in conjunction with a GroupKey to achieve the desired partitioning.

test_partition_fn(df)[source]
class apache_beam.dataframe.partitionings.Index(levels=None)[source]

Bases: Partitioning

A partitioning by index (either fully or partially).

If the set of “levels” of the index to consider is not specified, the entire index is used.

These form a partial order, given by

Singleton() < Index([i]) < Index([i, j]) < … < Index() < Arbitrary()

The ordering is implemented via the is_subpartitioning_of method, where the examples on the right are subpartitionings of the examples on the left above.

is_subpartitioning_of(other)[source]
partition_fn(df, num_partitions)[source]
check(dfs)[source]
class apache_beam.dataframe.partitionings.Singleton(reason=None)[source]

Bases: Partitioning

A partitioning of all the data into a single partition.

property reason
is_subpartitioning_of(other)[source]
partition_fn(df, num_partitions)[source]
check(dfs)[source]
class apache_beam.dataframe.partitionings.JoinIndex(ancestor=None)[source]

Bases: Partitioning

A partitioning that lets two frames be joined. This can either be a hash partitioning on the full index, or a common ancestor with no intervening re-indexing/re-partitioning.

It fits into the partial ordering as

Index() < JoinIndex(x) < JoinIndex() < Arbitrary()

with

JoinIndex(x) and JoinIndex(y)

being incomparable for nontrivial x != y.

Expressions desiring to make use of this index should simply declare a requirement of JoinIndex().

is_subpartitioning_of(other)[source]
test_partition_fn(df)[source]
check(dfs)[source]
class apache_beam.dataframe.partitionings.Arbitrary[source]

Bases: Partitioning

A partitioning imposing no constraints on the actual partitioning.

is_subpartitioning_of(other)[source]
test_partition_fn(df)[source]
check(dfs)[source]