apache_beam.transforms.window module
Windowing concepts.
A WindowInto transform logically divides up or groups the elements of a PCollection into finite windows according to a windowing function (derived from WindowFn).
The output of WindowInto contains the same elements as the input, but each element has been logically assigned to one or more windows. Subsequent GroupByKey transforms, including those inside a composite transform, group by the combination of key and window.
Windowing a PCollection allows chunks of it to be processed individually, before the entire PCollection is available. This is especially important for PCollections of unbounded size: because more data is continually arriving, the full PCollection is never available at once. For PCollections of bounded size (conventional batch mode), all data is implicitly in a single window by default (see GlobalWindows) unless WindowInto is applied.
For example, a simple form of windowing divides up the data into fixed-width time intervals, using FixedWindows.
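As a rough illustration of grouping by the combination of key and window, the sketch below uses plain Python rather than the Beam API. The helper names (fixed_window, group_by_key_and_window) and the sample events are hypothetical; the code only mimics how a fixed-width window assignment followed by a grouping step partitions elements.

```python
from collections import defaultdict

def fixed_window(timestamp, size):
    """Return the (start, end) interval of width `size` containing `timestamp`."""
    start = timestamp - timestamp % size
    return (start, start + size)

def group_by_key_and_window(events, size):
    """Group (key, timestamp, value) triples by the pair (key, window)."""
    groups = defaultdict(list)
    for key, timestamp, value in events:
        groups[(key, fixed_window(timestamp, size))].append(value)
    return dict(groups)

# Hypothetical sample data: (key, timestamp in seconds, value).
events = [("a", 10, 1), ("a", 70, 2), ("a", 20, 3), ("b", 15, 4)]
grouped = group_by_key_and_window(events, size=60)
```

With 60-second windows, the two values for key "a" at timestamps 10 and 20 share a group, while the value at timestamp 70 falls into the next window and is grouped separately.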
Seconds are used as the time unit for the built-in windowing primitives here. Integer or floating point seconds can be passed to these primitives.
Internally, seconds, with microsecond granularity, are stored as timeutil.Timestamp and timeutil.Duration objects. This is done to avoid precision errors that would occur with floating point representations.
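The precision concern can be seen with a small stdlib-only sketch. The to_micros helper is hypothetical and is not part of Beam; it simply converts seconds to integer microseconds to show why integer storage avoids the rounding drift of floating-point sums.

```python
def to_micros(seconds):
    """Convert int or float seconds to integer microseconds (hypothetical helper)."""
    return int(round(seconds * 1_000_000))

# Adding float seconds accumulates binary rounding error.
float_sum = 0.1 + 0.2
float_equal = (float_sum == 0.3)

# Adding integer microseconds is exact.
micros_sum = to_micros(0.1) + to_micros(0.2)
micros_equal = (micros_sum == to_micros(0.3))
```

Here float_equal is False while micros_equal is True, which is the kind of discrepancy integer microsecond storage is meant to avoid.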
Custom windowing function classes can be created by subclassing WindowFn.
- class apache_beam.transforms.window.TimestampCombiner[source]
Bases:
object
Determines how output timestamps of grouping operations are assigned.
- OUTPUT_AT_EOW = 1
- OUTPUT_AT_EARLIEST = 3
- OUTPUT_AT_LATEST = 2
- OUTPUT_AT_EARLIEST_TRANSFORMED = 'OUTPUT_AT_EARLIEST_TRANSFORMED'
- class apache_beam.transforms.window.WindowFn[source]
Bases:
RunnerApiFn
An abstract windowing function defining a basic assign and merge.
- class AssignContext(timestamp: int | float | Timestamp, element: Any | None = None, window: BoundedWindow | None = None)[source]
Bases:
object
Context passed to WindowFn.assign().
- abstract assign(assign_context: AssignContext) Iterable[BoundedWindow] [source]
Associates windows with an element.
- Parameters:
assign_context – Instance of AssignContext.
- Returns:
An iterable of BoundedWindow.
- class MergeContext(windows: Iterable[BoundedWindow])[source]
Bases:
object
Context passed to WindowFn.merge() to perform merging, if any.
- merge(to_be_merged: Iterable[BoundedWindow], merge_result: BoundedWindow) None [source]
- abstract merge(merge_context: MergeContext) None [source]
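Determines which windows, if any, should be merged. Each merge is reported by calling merge_context.merge() with the windows to be merged and the resulting window; nothing is returned.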
- get_transformed_output_time(window: BoundedWindow, input_timestamp: Timestamp) Timestamp [source]
Given input time and output window, returns output time for window.
If TimestampCombiner.OUTPUT_AT_EARLIEST_TRANSFORMED is used in the Windowing, the output timestamp for the given window will be the earliest of the timestamps returned by get_transformed_output_time() for elements of the window.
- Parameters:
window – Output window of element.
input_timestamp – Input timestamp of element as a timeutil.Timestamp object.
- Returns:
Transformed timestamp.
- to_runner_api_parameter(context)
- class apache_beam.transforms.window.BoundedWindow(end: int | float | Timestamp)[source]
Bases:
object
A window for timestamps in range (-infinity, end).
- end
End of window.
- class apache_beam.transforms.window.IntervalWindow[source]
Bases:
_IntervalWindowBase, BoundedWindow
A window for timestamps in range [start, end).
- start
Start of window as seconds since Unix epoch.
- end
End of window as seconds since Unix epoch.
- intersects(other: IntervalWindow) bool [source]
- union(other: IntervalWindow) IntervalWindow [source]
- class apache_beam.transforms.window.TimestampedValue(value: V, timestamp: int | float | Timestamp)[source]
Bases:
Generic[V]
A timestamped value having a value and a timestamp.
- value
The underlying value.
- timestamp
Timestamp associated with the value as seconds since Unix epoch.
- class apache_beam.transforms.window.GlobalWindow[source]
Bases:
BoundedWindow
The default window into which all data is placed (via GlobalWindows).
- class apache_beam.transforms.window.NonMergingWindowFn[source]
Bases:
WindowFn
- merge(merge_context: MergeContext) None [source]
- class apache_beam.transforms.window.GlobalWindows[source]
Bases:
NonMergingWindowFn
A windowing function that assigns everything to one global window.
- classmethod windowed_batch(batch: Any, timestamp: Timestamp = Timestamp(-9223372036854.775000), pane_info: PaneInfo = PaneInfo(first: True, last: True, timing: UNKNOWN, index: 0, nonspeculative_index: 0)) WindowedBatch [source]
- classmethod windowed_value(value: Any, timestamp: Timestamp = Timestamp(-9223372036854.775000), pane_info: PaneInfo = PaneInfo(first: True, last: True, timing: UNKNOWN, index: 0, nonspeculative_index: 0)) WindowedValue [source]
- assign(assign_context: AssignContext) List[GlobalWindow] [source]
- static from_runner_api_parameter(unused_fn_parameter, unused_context) GlobalWindows [source]
- class apache_beam.transforms.window.FixedWindows(size: int | float | Duration, offset: int | float | Timestamp = 0)[source]
Bases:
NonMergingWindowFn
A windowing function that assigns each element to one time interval.
The attributes size and offset determine in what time interval a timestamp will be slotted. The time intervals have the following formula: [N * size + offset, (N + 1) * size + offset)
- size
Size of the window as seconds.
- offset
Offset of this window as seconds. Windows start at t=N * size + offset where t=0 is the Unix epoch. The offset must be a value in range [0, size). If it is not, it will be normalized to this range.
Initialize a FixedWindows function for a given size and offset.
- Parameters:
- assign(context: AssignContext) List[IntervalWindow] [source]
- static from_runner_api_parameter(fn_parameter, unused_context) FixedWindows [source]
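The interval formula [N * size + offset, (N + 1) * size + offset) can be worked through with a short plain-Python sketch. The function name fixed_window_bounds is hypothetical and the code is an illustration of the arithmetic under that formula, not Beam's implementation.

```python
def fixed_window_bounds(timestamp, size, offset=0):
    """Return (start, end) of the fixed window that contains `timestamp`.

    Illustrative only: implements [N * size + offset, (N + 1) * size + offset).
    """
    offset = offset % size  # normalize the offset into [0, size)
    start = timestamp - (timestamp - offset) % size
    return (start, start + size)

no_offset = fixed_window_bounds(125, size=60)              # N = 2
with_offset = fixed_window_bounds(125, size=60, offset=10)  # N = 1
normalized = fixed_window_bounds(125, size=60, offset=70)   # offset 70 acts as 10
```

With size 60 and no offset, timestamp 125 lands in [120, 180). An offset of 10 shifts the boundaries so that the same timestamp lands in [70, 130), and an out-of-range offset of 70 gives the same result after normalization.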
- class apache_beam.transforms.window.SlidingWindows(size: int | float | Duration, period: int | float | Duration, offset: int | float | Timestamp = 0)[source]
Bases:
NonMergingWindowFn
A windowing function that assigns each element to a set of sliding windows.
The attributes size, period, and offset determine the time intervals to which a timestamp is assigned. A timestamp can fall into more than one interval. The time intervals have the following formula: [N * period + offset, N * period + offset + size)
- size
Size of the window as seconds.
- period
Period of the windows as seconds.
- offset
Offset of this window as seconds since Unix epoch. Windows start at t=N * period + offset where t=0 is the epoch. The offset must be a value in range [0, period). If it is not, it will be normalized to this range.
- assign(context: AssignContext) List[IntervalWindow] [source]
- static from_runner_api_parameter(fn_parameter, unused_context) SlidingWindows [source]
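To make the sliding-window formula concrete, the following plain-Python sketch enumerates every interval [N * period + offset, N * period + offset + size) that contains a given timestamp. The function name sliding_window_bounds is hypothetical, and this is an illustration of the formula rather than Beam's own code.

```python
def sliding_window_bounds(timestamp, size, period, offset=0):
    """Return all (start, end) sliding windows containing `timestamp`.

    Illustrative only. Windows are listed from the latest start to the earliest.
    """
    offset = offset % period  # normalize the offset into [0, period)
    # Latest window start that is <= timestamp.
    start = timestamp - (timestamp - offset) % period
    windows = []
    # A window [start, start + size) contains timestamp while start > timestamp - size.
    while start > timestamp - size:
        windows.append((start, start + size))
        start -= period
    return windows

overlapping = sliding_window_bounds(7, size=10, period=5)
single = sliding_window_bounds(9, size=7, period=5)
```

With size 10 and period 5, timestamp 7 belongs to both [5, 15) and [0, 10). When size is not a multiple of period, some timestamps belong to fewer windows than others: with size 7 and period 5, timestamp 9 belongs only to [5, 12).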
- class apache_beam.transforms.window.Sessions(gap_size: int | float | Duration)[source]
Bases:
WindowFn
A windowing function that groups elements into sessions.
A session is defined as a series of consecutive events in which each event is separated from the next by less than the specified gap size.
- gap_size
Size of the gap between windows as floating-point seconds.
- assign(context: AssignContext) List[IntervalWindow] [source]
- merge(merge_context: MergeContext) None [source]
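One way to picture session windowing is as an assign step followed by a merge step: each element first gets a window of length gap_size starting at its timestamp, and overlapping windows are then merged. The plain-Python sketch below follows that idea. The function name merge_sessions is hypothetical, and treating windows that merely touch as separate sessions is an assumption of this sketch, not a statement about Beam's implementation.

```python
def merge_sessions(timestamps, gap_size):
    """Return merged (start, end) session windows for the given timestamps.

    Illustrative only: each timestamp t starts as the window [t, t + gap_size),
    and windows that overlap are merged into one.
    """
    proto_windows = sorted((t, t + gap_size) for t in timestamps)
    sessions = []
    for start, end in proto_windows:
        if sessions and start < sessions[-1][1]:
            # Overlaps the current session: extend its end if needed.
            prev_start, prev_end = sessions[-1]
            sessions[-1] = (prev_start, max(prev_end, end))
        else:
            # No overlap: a new session begins.
            sessions.append((start, end))
    return sessions

sessions = merge_sessions([1, 3, 20], gap_size=5)
touching = merge_sessions([0, 5], gap_size=5)
```

Events at timestamps 1 and 3 are less than 5 seconds apart, so they share the session [1, 8). The event at 20 starts a new session [20, 25).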