Distinct
Produces a collection containing distinct elements of the input collection.
On some data sets, it might be more efficient to compute an approximate
answer using ApproximateUnique, which can also estimate the number of
distinct values for each key.
Examples
Example 1: Find the distinct elements from a PCollection of String.
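// Sample input: "hi" and "sue" each appear twice.
static final String[] WORDS_ARRAY = new String[]{
    "hi", "hi", "sue",
    "sue", "bob"
};
static final List<String> WORDS = Arrays.asList(WORDS_ARRAY);

// The coder is set on the Create transform before it is applied.
PCollection<String> input =
    pipeline.apply(Create.of(WORDS).withCoder(StringUtf8Coder.of()));
// Contains "hi", "sue" and "bob" once each; output order is not guaranteed.
PCollection<String> distinctWords = input.apply(Distinct.create());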
Example 2: Find the distinct elements from a PCollection of Integer.
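A minimal sketch of this example, following the same pattern as Example 1. The class name DistinctIntegersExample, the helper distinctNumbers, the sample values, and the choice of VarIntCoder are assumptions made for illustration, and running it assumes a runner such as the direct runner is on the classpath.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.VarIntCoder;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Distinct;
import org.apache.beam.sdk.values.PCollection;

public class DistinctIntegersExample {
  // Hypothetical sample data, chosen only for illustration.
  static final List<Integer> NUMBERS = Arrays.asList(1, 1, 2, 3, 3, 3);

  // Hypothetical helper: creates the input on the given pipeline and
  // returns the output of the Distinct step.
  static PCollection<Integer> distinctNumbers(Pipeline pipeline, List<Integer> numbers) {
    PCollection<Integer> input =
        pipeline.apply(Create.of(numbers).withCoder(VarIntCoder.of()));
    // Each distinct value is emitted once; output order is not guaranteed.
    return input.apply(Distinct.<Integer>create());
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    distinctNumbers(pipeline, NUMBERS);
    pipeline.run().waitUntilFinish();
  }
}
```

With the sample input above, the resulting PCollection holds 1, 2 and 3, each exactly once.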
Related transforms
- Count counts the number of elements within each aggregation.
- ApproximateUnique estimates the number of distinct elements in a collection.
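The related transforms can be combined with Distinct to count distinct elements. The sketch below contrasts an exact count (Distinct followed by Count.globally) with an estimate from ApproximateUnique.globally. The class and method names and the sample size of 1000 are assumptions chosen for illustration, not recommended settings.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.ApproximateUnique;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Distinct;
import org.apache.beam.sdk.values.PCollection;

public class DistinctCountExample {
  // Illustrative sample size for the estimate (an assumption, not a tuned value).
  static final int SAMPLE_SIZE = 1000;

  // Exact: remove duplicates first, then count the remaining elements.
  static PCollection<Long> exactDistinctCount(PCollection<String> words) {
    return words
        .apply(Distinct.<String>create())
        .apply(Count.<String>globally());
  }

  // Approximate: estimate the number of distinct elements from a sample.
  static PCollection<Long> approximateDistinctCount(PCollection<String> words) {
    return words.apply(ApproximateUnique.<String>globally(SAMPLE_SIZE));
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    PCollection<String> words =
        pipeline.apply(Create.of("hi", "hi", "sue", "sue", "bob"));
    exactDistinctCount(words);
    approximateDistinctCount(words);
    pipeline.run().waitUntilFinish();
  }
}
```

The exact variant has to deduplicate every element, while the approximate variant trades accuracy for a smaller intermediate state, which is the trade-off described at the top of this page.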
Last updated on 2025/01/20