HllCount
Estimates the number of distinct elements in a data stream using the HyperLogLog++ (HLL++) algorithm. The following transforms create sketches, merge them, and extract estimates from them:
- HllCount.Init aggregates inputs into HLL++ sketches.
- HllCount.MergePartial merges HLL++ sketches into a new sketch.
- HllCount.Extract extracts the estimated count of distinct elements from HLL++ sketches.
You can read more about what a sketch is at https://github.com/google/zetasketch.
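To give a rough feel for what a sketch is, the toy class below mimics the three roles described above (initialize, merge, extract) on a plain HyperLogLog register array, using only the Java standard library. It is a simplified illustration, not HLL++ and not the ZetaSketch format; the class name `ToyHll`, its methods, the fixed precision, and the hash function are all assumptions made for this example.

```java
class ToyHll {
  static final int P = 12;      // illustrative precision: 2^12 registers
  static final int M = 1 << P;

  // SplitMix64-style mixer, used here as a simple stand-in 64-bit hash.
  static long hash(long x) {
    x += 0x9E3779B97F4A7C15L;
    x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
    x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
    return x ^ (x >>> 31);
  }

  // "Init": aggregate raw values into a register array (the sketch).
  static byte[] init(long[] values) {
    byte[] reg = new byte[M];
    for (long v : values) {
      long h = hash(v);
      int idx = (int) (h >>> (64 - P));   // top P bits select a register
      long rest = h << P;                 // remaining bits determine the rank
      int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - P) + 1;
      reg[idx] = (byte) Math.max(reg[idx], rank);
    }
    return reg;
  }

  // "MergePartial": the register-wise maximum summarizes the union of both inputs.
  static byte[] merge(byte[] a, byte[] b) {
    byte[] out = new byte[M];
    for (int i = 0; i < M; i++) {
      out[i] = (byte) Math.max(a[i], b[i]);
    }
    return out;
  }

  // "Extract": turn the registers into an estimated distinct count.
  static long extract(byte[] reg) {
    double sum = 0;
    int zeros = 0;
    for (byte r : reg) {
      sum += Math.pow(2, -r);
      if (r == 0) {
        zeros++;
      }
    }
    double alpha = 0.7213 / (1 + 1.079 / M);
    double estimate = alpha * M * M / sum;
    if (estimate <= 2.5 * M && zeros > 0) {
      estimate = M * Math.log((double) M / zeros); // small-range (linear counting) correction
    }
    return Math.round(estimate);
  }

  public static void main(String[] args) {
    long[] a = {1, 2, 3, 4};
    long[] b = {3, 4, 5, 6};
    // The merged sketch estimates the distinct count of the union of a and b.
    System.out.println(extract(merge(init(a), init(b))));
  }
}
```

The point of the design is that a sketch is small, has a fixed size, and can be merged without access to the original elements, which is why the transforms on this page can pass sketches around as `byte[]` values.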
Examples
Example 1: creates a long-type sketch for a PCollection<Long> with a custom precision:
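A minimal sketch of what this could look like, assuming the Beam Java SDK, its ZetaSketch extension, and a runner such as the direct runner are on the classpath. The class name `LongSketchExample`, the helper `sketchOf`, the input values, and the precision value 15 are illustrative assumptions.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.zetasketch.HllCount;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

class LongSketchExample {
  // Aggregates every element of the input into a single long-type HLL++ sketch.
  static PCollection<byte[]> sketchOf(PCollection<Long> input, int precision) {
    return input.apply(HllCount.Init.forLongs().withPrecision(precision).globally());
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    // Illustrative in-memory input; a real pipeline would read from a source.
    PCollection<Long> input = pipeline.apply(Create.of(1L, 2L, 2L, 3L));
    PCollection<byte[]> sketch = sketchOf(input, 15); // 15 is an illustrative precision
    pipeline.run().waitUntilFinish();
  }
}
```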
Example 2: creates a bytes-type sketch for a PCollection<KV<String, byte[]>>:
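One way this might be written is shown below, under the same classpath assumptions (Beam Java SDK, ZetaSketch extension, a runner). The class name `BytesSketchPerKeyExample`, the helper `sketchPerKey`, the keys, and the byte values are illustrative assumptions; the coder is set explicitly so the example does not rely on coder inference.

```java
import java.nio.charset.StandardCharsets;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.ByteArrayCoder;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.extensions.zetasketch.HllCount;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

class BytesSketchPerKeyExample {
  // Builds one bytes-type HLL++ sketch per key from the byte[] values of that key.
  static PCollection<KV<String, byte[]>> sketchPerKey(PCollection<KV<String, byte[]>> input) {
    return input.apply(HllCount.Init.forBytes().perKey());
  }

  static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    // Illustrative input: two keys, with a repeated value under "k1".
    PCollection<KV<String, byte[]>> input =
        pipeline.apply(
            Create.of(KV.of("k1", bytes("a")), KV.of("k1", bytes("a")), KV.of("k2", bytes("b")))
                .withCoder(KvCoder.of(StringUtf8Coder.of(), ByteArrayCoder.of())));
    PCollection<KV<String, byte[]>> sketches = sketchPerKey(input);
    pipeline.run().waitUntilFinish();
  }
}
```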
Example 3: merges existing sketches in a PCollection<byte[]> into a new sketch,
which summarizes the union of the inputs that were aggregated in the merged sketches:
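The merge step could be sketched as follows, again assuming the Beam Java SDK, the ZetaSketch extension, and a runner on the classpath. The class name `MergeSketchesExample`, the helper `mergeAll`, the step names, and the two small inputs used to produce sketches are illustrative assumptions.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.zetasketch.HllCount;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;

class MergeSketchesExample {
  // Merges all sketches in the input into one sketch covering the union of their inputs.
  static PCollection<byte[]> mergeAll(PCollection<byte[]> sketches) {
    return sketches.apply(HllCount.MergePartial.globally());
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    // Two illustrative sketches built from overlapping inputs.
    PCollection<byte[]> sketchA =
        pipeline
            .apply("CreateA", Create.of(1L, 2L, 3L))
            .apply("SketchA", HllCount.Init.forLongs().globally());
    PCollection<byte[]> sketchB =
        pipeline
            .apply("CreateB", Create.of(3L, 4L, 5L))
            .apply("SketchB", HllCount.Init.forLongs().globally());
    // Put both sketches into a single PCollection<byte[]> before merging.
    PCollection<byte[]> sketches =
        PCollectionList.of(sketchA).and(sketchB).apply(Flatten.pCollections());
    PCollection<byte[]> mergedSketch = mergeAll(sketches);
    pipeline.run().waitUntilFinish();
  }
}
```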
Example 4: estimates the count of distinct elements in a PCollection<String>:
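A possible version chains sketch creation and extraction, so the sketch itself is only an intermediate value. This assumes the Beam Java SDK, the ZetaSketch extension, and a runner on the classpath; the class name `CountDistinctStringsExample`, the helper `countDistinct`, and the input strings are illustrative assumptions.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.zetasketch.HllCount;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

class CountDistinctStringsExample {
  // Builds a string-type sketch over the input, then extracts the estimated distinct count.
  static PCollection<Long> countDistinct(PCollection<String> input) {
    return input
        .apply(HllCount.Init.forStrings().globally())
        .apply(HllCount.Extract.globally());
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    // Illustrative input with one repeated element.
    PCollection<String> input = pipeline.apply(Create.of("apple", "banana", "apple"));
    PCollection<Long> estimate = countDistinct(input);
    pipeline.run().waitUntilFinish();
  }
}
```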
Example 5: extracts the distinct-count estimate from an existing sketch:
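A minimal sketch, assuming the Beam Java SDK, the ZetaSketch extension, and a runner on the classpath. The class name `ExtractFromSketchExample` and the helper `estimate` are illustrative assumptions, and the "existing" sketch is produced inline here only so the example is self-contained; in practice it might be read from storage.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.zetasketch.HllCount;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

class ExtractFromSketchExample {
  // Extracts the estimated distinct count from each sketch in the input.
  static PCollection<Long> estimate(PCollection<byte[]> sketch) {
    return sketch.apply(HllCount.Extract.globally());
  }

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create();
    // Stand-in for an existing sketch: built from a small illustrative input.
    PCollection<byte[]> sketch =
        pipeline.apply(Create.of(1, 2, 2, 3)).apply(HllCount.Init.forIntegers().globally());
    PCollection<Long> countDistinct = estimate(sketch);
    pipeline.run().waitUntilFinish();
  }
}
```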
Related transforms
- ApproximateUnique estimates the number of distinct elements, or of distinct values per key in key-value pairs. It does not expose sketches and is less accurate than HllCount.
Last updated on 2025/10/26

