Additional common features not yet part of the Beam model

Drain
Google Cloud Dataflow: Partially. Dataflow has a native drain operation, but it does not work in the presence of event-time timer loops. A final implementation is pending support in the Beam model.
Apache Flink: Partially. Flink supports taking a "savepoint" of the pipeline and shutting the pipeline down after the savepoint completes.
No status listed: Apache Spark (RDD/DStream based), Apache Spark Structured Streaming (Dataset based), Apache Samza, Apache Nemo, Hazelcast Jet, Twister2, Python Direct FnRunner, Go Direct Runner.

Checkpoint
Google Cloud Dataflow: No.
Apache Flink: Partially. Flink has a native savepoint capability.
Apache Spark (RDD/DStream based): Partially. Spark has a native savepoint capability.
Apache Spark Structured Streaming (Dataset based): No. Not implemented.
Apache Samza: Partially. Samza has a native checkpoint capability.
No status listed: Apache Nemo, Hazelcast Jet, Twister2, Python Direct FnRunner, Go Direct Runner.

Key-ordered delivery
Google Cloud Dataflow: Partially. Dataflow uses different shuffling algorithms for batch and streaming. It guarantees key-ordered delivery in streaming, but not in batch.
Apache Flink: Partially. Flink may use different shuffling algorithms for batch and streaming. It guarantees key-ordered delivery in streaming, but not in batch.
Apache Spark (RDD/DStream based): Unverified.
Apache Spark Structured Streaming (Dataset based): Unverified.
Apache Samza: Partially. Samza may use different shuffling algorithms for batch and streaming. It guarantees key-ordered delivery in streaming, but not in batch.
Apache Nemo: Unverified.
Hazelcast Jet: Unverified.
Twister2: Unverified.
No status listed: Python Direct FnRunner, Go Direct Runner.
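As a rough illustration of what "key-ordered delivery" means, the Python sketch below checks whether a received stream preserves, for every key, the order in which that key's values were sent, while letting elements of different keys interleave freely. The names `is_key_ordered`, `per_key_sequence`, and `fifo_shuffle` are hypothetical. `fifo_shuffle` is a toy hash-partitioned shuffle with one FIFO queue per partition and a single sender, assumed here for illustration; it is not how Dataflow, Flink, or Samza implement shuffling.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, List, Tuple

# Hypothetical element shape: (key, value) pairs emitted by a single sender.
Element = Tuple[Hashable, int]


def per_key_sequence(elements: List[Element]) -> Dict[Hashable, List[int]]:
    """Collect each key's values in the order they appear in `elements`."""
    by_key: Dict[Hashable, List[int]] = defaultdict(list)
    for key, value in elements:
        by_key[key].append(value)
    return dict(by_key)


def is_key_ordered(sent: List[Element], received: List[Element]) -> bool:
    """True if every key's values arrive in the same relative order they were
    sent (and no value is lost or duplicated). Elements of different keys may
    interleave arbitrarily."""
    return per_key_sequence(sent) == per_key_sequence(received)


def fifo_shuffle(elements: List[Element], num_partitions: int) -> List[Element]:
    """Toy streaming-style shuffle: route each key to one FIFO queue by hash,
    then emit from the queues round-robin. A key always lands in the same
    queue and each queue is FIFO, so per-key order survives even though the
    global order changes."""
    queues = [deque() for _ in range(num_partitions)]
    for key, value in elements:
        queues[hash(key) % num_partitions].append((key, value))
    out: List[Element] = []
    while any(queues):
        for queue in queues:
            if queue:
                out.append(queue.popleft())
    return out


sent = [("a", 1), ("b", 1), ("a", 2), ("b", 2), ("a", 3)]
print(is_key_ordered(sent, fifo_shuffle(sent, num_partitions=2)))  # True
# Swapping two "a" values breaks key order, even though the same elements arrive.
print(is_key_ordered(sent, [("a", 2), ("a", 1), ("b", 1), ("b", 2), ("a", 3)]))  # False
```

A shuffle that does not keep one ordered path per key gives no such promise; per the entries above, Dataflow, Flink, and Samza guarantee the property only in streaming.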
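The Dataflow drain entry above mentions event-time timer loops. The Python sketch below is a toy model of one way such a loop can keep a drain from finishing. It assumes, for illustration only and not as a description of Dataflow's internals, that draining stops reading input and advances the watermark to infinity so that every pending timer fires. Under that assumption, a timer callback that re-arms itself keeps creating new timers that are immediately eligible, so the drain never quiesces. `ToyTimerQueue`, `drain`, `one_shot`, and `looping` are hypothetical names, and the `max_firings` cap exists only so the looping case terminates.

```python
import heapq
from typing import Callable, List

INFINITY = float("inf")


class ToyTimerQueue:
    """Hypothetical event-time timer queue; not any Beam runner's implementation."""

    def __init__(self) -> None:
        self._timers: List[float] = []  # min-heap of pending timer timestamps
        self.fired: List[float] = []    # timestamps in the order they fired

    def set_timer(self, timestamp: float) -> None:
        heapq.heappush(self._timers, timestamp)

    def advance_watermark(self, watermark: float,
                          on_timer: Callable[["ToyTimerQueue", float], None],
                          max_firings: int = 1000) -> bool:
        """Fire every timer at or before `watermark`, including timers set by
        callbacks while firing. Returns True once no eligible timers remain,
        or False if `max_firings` is reached first."""
        firings = 0
        while self._timers and self._timers[0] <= watermark:
            if firings >= max_firings:
                return False
            timestamp = heapq.heappop(self._timers)
            self.fired.append(timestamp)
            on_timer(self, timestamp)
            firings += 1
        return True


def one_shot(queue: ToyTimerQueue, timestamp: float) -> None:
    """Timer callback that does its work and sets no new timer."""


def looping(queue: ToyTimerQueue, timestamp: float) -> None:
    """Timer callback that re-arms itself one time unit later (a timer loop)."""
    queue.set_timer(timestamp + 1)


def drain(queue: ToyTimerQueue, on_timer, max_firings: int = 1000) -> bool:
    """Toy drain under the stated assumption: input reading is not modeled;
    the watermark jumps to infinity so every pending timer fires."""
    return queue.advance_watermark(INFINITY, on_timer, max_firings)


finite = ToyTimerQueue()
finite.set_timer(5)
finite.set_timer(10)
print(drain(finite, one_shot), finite.fired)  # True [5, 10]

loop = ToyTimerQueue()
loop.set_timer(5)
print(drain(loop, looping, max_firings=100), len(loop.fired))  # False 100
```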


Last updated on 2024/03/28
