SparkReceiver IO
SparkReceiverIO is a transform for reading data from an Apache Spark Receiver as an unbounded source.
Spark Receivers support
SparkReceiverIO currently supports Apache Spark Receiver.
Requirements for Spark Receiver:
- Version of Spark should be 2.4.*.
- Spark Receivershould support work with offsets.
- Spark Receivershould implement HasOffset interface.
- Records should have the numeric field that represents record offset.
For more details please see SparkReceiverIO readme.
Streaming reading using SparkReceiverIO
In order to read from Spark Receiver you will need to pass:
- getOffsetFn, which is- SerializableFunctionthat defines how to get- Longrecord offset from a record.
- receiverBuilder, which is needed for building instances of- Spark Receiverthat use Apache Beam mechanisms instead of Spark environment.
You can easily create receiverBuilder object by passing the following parameters:
- Class of your Spark Receiver.
- Constructor arguments needed to create an instance of your Spark Receiver.
For example:
//In this example, MyReceiver accepts a MyConfig object as its only constructor parameter.
MyConfig myPluginConfig = new MyConfig(authToken, apiServerUrl);
Object[] myConstructorArgs = new Object[] {myConfig};
ReceiverBuilder<String, MyReceiver<String>> myReceiverBuilder =
  new ReceiverBuilder<>(MyReceiver.class)
    .withConstructorArgs(myConstructorArgs);Then you will be able to pass this receiverBuilder object to SparkReceiverIO.
For example:
Read data with optional parameters
Optionally you can pass the following optional parameters:
- pullFrequencySec, which is delay in seconds between polling for new records updates.
- startOffset, which is inclusive start offset from which the reading should be started.
- timestampFn, which is a- SerializableFunctionthat defines how to get an- Instant timestampfrom a record.
For example:
Examples for specific Spark Receiver
CDAP Hubspot Receiver
ReceiverBuilder<String, HubspotReceiver<String>> hubspotReceiverBuilder =
  new ReceiverBuilder<>(HubspotReceiver.class)
    .withConstructorArgs(hubspotConfig);
SparkReceiverIO.Read<String> readTransform =
  SparkReceiverIO.<String>read()
    .withGetOffsetFn(GetOffsetUtils.getOffsetFnForHubspot())
    .withSparkReceiverBuilder(hubspotReceiverBuilder)
p.apply("readFromHubspotReceiver", readTransform);Last updated on 2025/10/31
Have you found everything you were looking for?
Was it all useful and clear? Is there anything that you would like to change? Let us know!
 

 
