apache_beam.transforms.combiners module

A library of basic combiner PTransform subclasses.

class apache_beam.transforms.combiners.Count[source]

Bases: object

Combiners for counting elements.

class Globally(has_defaults=True)[source]

Bases: CombinerWithoutDefaults

combiners.Count.Globally counts the total number of elements.

expand(pcoll)[source]
class PerKey(label: str | None = None)[source]

Bases: PTransform

combiners.Count.PerKey counts how many elements each unique key has.

expand(pcoll)[source]
class PerElement(label: str | None = None)[source]

Bases: PTransform

combiners.Count.PerElement counts how many times each element occurs.

expand(pcoll)[source]
class apache_beam.transforms.combiners.Mean[source]

Bases: object

Combiners for computing arithmetic means of elements.

class Globally(has_defaults=True)[source]

Bases: CombinerWithoutDefaults

combiners.Mean.Globally computes the arithmetic mean of the elements.

expand(pcoll)[source]
class PerKey(label: str | None = None)[source]

Bases: PTransform

combiners.Mean.PerKey finds the means of the values for each key.

expand(pcoll)[source]
class apache_beam.transforms.combiners.Sample[source]

Bases: object

Combiners for sampling n elements without replacement.

class FixedSizeGlobally(n)[source]

Bases: CombinerWithoutDefaults

Sample n elements from the input PCollection without replacement.

expand(pcoll)[source]
display_data()[source]
default_label()[source]
class FixedSizePerKey(n)[source]

Bases: PTransform

Sample n elements associated with each key without replacement.

expand(pcoll)[source]
display_data()[source]
default_label()[source]
class apache_beam.transforms.combiners.Top[source]

Bases: object

Combiners for obtaining extremal elements.

class Of(n, key=None, reverse=False)[source]

Bases: CombinerWithoutDefaults

Returns the n greatest elements in the PCollection.

This transform will retrieve the n greatest elements in the PCollection to which it is applied, where “greatest” is determined by a function supplied as the key or reverse arguments.

Creates a global Top operation.

The arguments ‘key’ and ‘reverse’ may be passed as keyword arguments, and have the same meaning as for Python’s sort functions.

Parameters:
  • n – number of elements to extract from pcoll.

  • key – (optional) a mapping of elements to a comparable key, similar to the key argument of Python’s sorting methods.

  • reverse – (optional) whether to order things smallest to largest, rather than largest to smallest

default_label()[source]
expand(pcoll)[source]
class PerKey(n, key=None, reverse=False)[source]

Bases: PTransform

Identifies the N greatest elements associated with each key.

This transform will produce a PCollection mapping unique keys in the input PCollection to the n greatest elements with which they are associated, where “greatest” is determined by a function supplied as the key or reverse arguments.

Creates a per-key Top operation.

The arguments ‘key’ and ‘reverse’ may be passed as keyword arguments, and have the same meaning as for Python’s sort functions.

Parameters:
  • n – number of elements to extract from pcoll.

  • key – (optional) a mapping of elements to a comparable key, similar to the key argument of Python’s sorting methods.

  • reverse – (optional) whether to order things smallest to largest, rather than largest to smallest

default_label()[source]
expand(pcoll)[source]

Expands the transform.

Raises TypeCheckError: If the output type of the input PCollection is not compatible with Tuple[A, B].

Parameters:

pcoll – PCollection to process

Returns:

the PCollection containing the result.

static Largest(n, has_defaults=True, key=None)[source]

Obtain a list of the greatest N elements in a PCollection.

static Smallest(n, has_defaults=True, key=None)[source]

Obtain a list of the least N elements in a PCollection.

static LargestPerKey(n, key=None)[source]

Identifies the N greatest elements associated with each key.

static SmallestPerKey(n, *, key=None)[source]

Identifies the N least elements associated with each key.

class apache_beam.transforms.combiners.ToDict(has_defaults=True)[source]

Bases: CombinerWithoutDefaults

A global CombineFn that condenses a PCollection into a single dict.

PCollections should consist of 2-tuples, notionally (key, value) pairs. If multiple values are associated with the same key, only one of the values will be present in the resulting dict.

expand(pcoll)[source]
class apache_beam.transforms.combiners.ToList(has_defaults=True)[source]

Bases: CombinerWithoutDefaults

A global CombineFn that condenses a PCollection into a single list.

expand(pcoll)[source]
class apache_beam.transforms.combiners.ToSet(has_defaults=True)[source]

Bases: CombinerWithoutDefaults

A global CombineFn that condenses a PCollection into a set.

expand(pcoll)[source]
class apache_beam.transforms.combiners.Latest[source]

Bases: object

Combiners for computing the latest element

class Globally(has_defaults=True)[source]

Bases: CombinerWithoutDefaults

Compute the element with the latest timestamp from a PCollection.

static add_timestamp(element, timestamp=TimestampParam)[source]
expand(pcoll)[source]
class PerKey(label: str | None = None)[source]

Bases: PTransform

Compute elements with the latest timestamp for each key from a keyed PCollection

static add_timestamp(element, timestamp=TimestampParam)[source]
expand(pcoll)[source]
class apache_beam.transforms.combiners.CountCombineFn(*unused_args, **unused_kwargs)[source]

Bases: CombineFn

CombineFn for computing PCollection size.

create_accumulator()[source]
add_input(accumulator, element)[source]
add_inputs(accumulator, elements)[source]
merge_accumulators(accumulators)[source]
extract_output(accumulator)[source]
class apache_beam.transforms.combiners.MeanCombineFn(*unused_args, **unused_kwargs)[source]

Bases: CombineFn

CombineFn for computing an arithmetic mean.

create_accumulator()[source]
add_input(sum_count, element)[source]
merge_accumulators(accumulators)[source]
extract_output(sum_count)[source]
for_input_type(input_type)[source]
class apache_beam.transforms.combiners.SampleCombineFn(n)[source]

Bases: CombineFn

CombineFn for all Sample transforms.

setup()[source]
create_accumulator()[source]
add_input(heap, element)[source]
merge_accumulators(heaps)[source]
compact(heap)[source]
extract_output(heap)[source]
teardown()[source]
class apache_beam.transforms.combiners.TopCombineFn(n, key=None, reverse=False)[source]

Bases: CombineFn

CombineFn doing the combining for all of the Top transforms.

This CombineFn uses a key or reverse operator to rank the elements.

Parameters:
  • key – (optional) a mapping of elements to a comparable key, similar to the key argument of Python’s sorting methods.

  • reverse – (optional) whether to order things smallest to largest, rather than largest to smallest

display_data()[source]
create_accumulator(*args, **kwargs)[source]
add_input(accumulator, element, *args, **kwargs)[source]
merge_accumulators(accumulators, *args, **kwargs)[source]
compact(accumulator, *args, **kwargs)[source]
extract_output(accumulator, *args, **kwargs)[source]
class apache_beam.transforms.combiners.ToDictCombineFn(*unused_args, **unused_kwargs)[source]

Bases: CombineFn

CombineFn for to_dict.

create_accumulator()[source]
add_input(accumulator, element)[source]
merge_accumulators(accumulators)[source]
extract_output(accumulator)[source]
class apache_beam.transforms.combiners.ToListCombineFn(*unused_args, **unused_kwargs)[source]

Bases: CombineFn

CombineFn for to_list.

create_accumulator()[source]
add_input(accumulator, element)[source]
merge_accumulators(accumulators)[source]
extract_output(accumulator)[source]
class apache_beam.transforms.combiners.ToSetCombineFn(*unused_args, **unused_kwargs)[source]

Bases: CombineFn

CombineFn for ToSet.

create_accumulator()[source]
add_input(accumulator, element)[source]
merge_accumulators(accumulators)[source]
extract_output(accumulator)[source]
class apache_beam.transforms.combiners.LatestCombineFn(*unused_args, **unused_kwargs)[source]

Bases: CombineFn

CombineFn to get the element with the latest timestamp from a PCollection.

create_accumulator()[source]
add_input(accumulator, element)[source]
merge_accumulators(accumulators)[source]
extract_output(accumulator)[source]