Distinct


Produces a collection containing distinct elements of the input collection.

If you only need the number of distinct elements rather than the elements themselves, it can be more efficient on some data sets to compute an approximate count using ApproximateUnique, which can also estimate the number of distinct values for each key.

Examples

Example 1: Find the distinct elements of a PCollection of Strings.

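import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Distinct;
import org.apache.beam.sdk.values.PCollection;

static final String[] WORDS_ARRAY = new String[] {"hi", "hi", "sue", "sue", "bob"};
static final List<String> WORDS = Arrays.asList(WORDS_ARRAY);

// Set the coder on the Create transform itself, then apply it to the pipeline.
PCollection<String> input =
    pipeline.apply(Create.of(WORDS).withCoder(StringUtf8Coder.of()));

// Each distinct word appears exactly once in the output.
PCollection<String> distinctWords = input.apply(Distinct.create());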
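
To make the expected result concrete, here is a minimal plain-Java model of what Example 1 computes. It does not use Beam: the `DistinctModel` class and its `distinct` helper are hypothetical names introduced only for illustration, and a `LinkedHashSet` stands in for the distributed deduplication that the transform performs.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, Beam-free model of the result of Distinct on a small input.
class DistinctModel {

    // Returns each element of the input once. LinkedHashSet keeps first-seen
    // order here, which is a property of this sketch only, not of a PCollection.
    static <T> Set<T> distinct(List<T> input) {
        return new LinkedHashSet<>(input);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("hi", "hi", "sue", "sue", "bob");
        Set<String> distinctWords = distinct(words);
        System.out.println(distinctWords); // prints [hi, sue, bob]
    }
}
```

Unlike this sketch, a PCollection carries no ordering guarantee, so when checking the output of the pipeline it is safer to compare the elements as a set than as an ordered list.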

Example 2: Find the distinct elements of a PCollection of Integers.