Apache Beam Java SDK quickstart
This quickstart shows you how to run an example pipeline written with the Apache Beam Java SDK, using the Direct Runner. The Direct Runner executes pipelines locally on your machine.
If you’re interested in contributing to the Apache Beam Java codebase, see the Contribution Guide.
On this page:
Set up your development environment
Use sdkman
to install the Java Development Kit (JDK).
You can use either Gradle or Apache Maven to run this quickstart:
Clone the GitHub repository
Clone or download the
apache/beam-starter-java GitHub
repository and change into the beam-starter-java
directory.
Run the quickstart
Gradle: To run the quickstart with Gradle, run the following command:
Maven: To run the quickstart with Maven, run the following command:
The output is similar to the following:
The lines might appear in a different order.
Explore the code
The main code file for this quickstart is App.java (GitHub). The code performs the following steps:
- Create a Beam pipeline.
- Create an initial
PCollection
. - Apply a transform to the
PCollection
. - Run the pipeline, using the Direct Runner.
Create a pipeline
The code first creates a Pipeline
object. The Pipeline
object builds up the
graph of transformations to be executed.
var options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
var pipeline = Pipeline.create(options);
The PipelineOptions
object lets you set various options for the pipeline. The
fromArgs
method shown in this example parses command-line arguments, which
lets you set pipeline options through the command line.
Create an initial PCollection
The PCollection
abstraction represents a potentially distributed,
multi-element data set. A Beam pipeline needs a source of data to populate an
initial PCollection
. The source can be bounded (with a known, fixed size) or
unbounded (with unlimited size).
This example uses the
Create.of
method to create a PCollection
from an in-memory array of strings. The
resulting PCollection
contains the strings “Hello”, “World!”, and a
user-provided input string.
return pipeline
.apply("Create elements", Create.of(Arrays.asList("Hello", "World!", inputText)))
Apply a transform to the PCollection
Transforms can change, filter, group, analyze, or otherwise process the
elements in a PCollection
. This example uses the
MapElements
transform, which maps the elements of a collection into a new collection:
.apply("Print elements",
MapElements.into(TypeDescriptors.strings()).via(x -> {
System.out.println(x);
return x;
}));
where
into
specifies the data type for the elements in the output collection.via
defines a mapping function that is called on each element of the input collection to create the output collection.
In this example, the mapping function is a lambda that just returns the
original value. It also prints the value to System.out
as a side effect.
Run the pipeline
The code shown in the previous sections defines a pipeline, but does not process any data yet. To process data, you run the pipeline:
pipeline.run().waitUntilFinish();
A Beam runner runs a Beam pipeline on a specific platform. This example uses the Direct Runner, which is the default runner if you don’t specify one. The Direct Runner runs the pipeline locally on your machine. It is meant for testing and development, rather than being optimized for efficiency. For more information, see Using the Direct Runner.
For production workloads, you typically use a distributed runner that runs the pipeline on a big data processing system such as Apache Flink, Apache Spark, or Google Cloud Dataflow. These systems support massively parallel processing.
Next Steps
- Learn more about the Beam SDK for Java and look through the Java SDK API reference.
- Take a self-paced tour through our Learning Resources.
- Dive in to some of our favorite Videos and Podcasts.
- Join the Beam users@ mailing list.
Please don’t hesitate to reach out if you encounter any issues!
Last updated on 2024/11/14
Have you found everything you were looking for?
Was it all useful and clear? Is there anything that you would like to change? Let us know!