SparkReceiver IO
SparkReceiverIO is a transform for reading data from an Apache Spark Receiver as an unbounded source.
Spark Receivers support
SparkReceiverIO currently supports Apache Spark Receiver.
Requirements for Spark Receiver:
- The Spark version should be 2.4.*.
- Spark Receiver should support working with offsets.
- Spark Receiver should implement the HasOffset interface.
- Records should have a numeric field that represents the record offset.
For more details, see the SparkReceiverIO readme.
Streaming reading using SparkReceiverIO
To read from a Spark Receiver, you need to pass:
- getOffsetFn, a SerializableFunction that defines how to get the Long record offset from a record.
- receiverBuilder, which is needed for building instances of the Spark Receiver that use Apache Beam mechanisms instead of the Spark environment.
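As a rough illustration of what a getOffsetFn does, the sketch below extracts a numeric offset from a string record. The record layout (a JSON-like string with an "offset" field), the class name OffsetFnSketch, and the use of java.util.function.Function in place of Beam's SerializableFunction are all assumptions made so the example runs with the JDK alone. It is not part of SparkReceiverIO.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OffsetFnSketch {
  // Assumed record layout: a JSON-like string containing a numeric "offset" field.
  private static final Pattern OFFSET_FIELD = Pattern.compile("\"offset\"\\s*:\\s*(\\d+)");

  // Plain-Java stand-in for a getOffsetFn: maps a record to its Long offset.
  public static final Function<String, Long> GET_OFFSET_FN =
      record -> {
        Matcher matcher = OFFSET_FIELD.matcher(record);
        if (!matcher.find()) {
          throw new IllegalArgumentException("Record has no numeric offset: " + record);
        }
        return Long.parseLong(matcher.group(1));
      };

  public static void main(String[] args) {
    String record = "{\"offset\": 42, \"payload\": \"hello\"}";
    System.out.println(GET_OFFSET_FN.apply(record)); // prints 42
  }
}
```

In a real pipeline the same logic would be written as a SerializableFunction and handed to the read transform through withGetOffsetFn.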
You can create a receiverBuilder object by passing the following parameters:
- The class of your Spark Receiver.
- The constructor arguments needed to create an instance of your Spark Receiver.
For example:
// In this example, MyReceiver accepts a MyConfig object as its only constructor parameter.
MyConfig myConfig = new MyConfig(authToken, apiServerUrl);
// The constructor arguments are passed in the order the MyReceiver constructor expects.
Object[] myConstructorArgs = new Object[] {myConfig};
ReceiverBuilder<String, MyReceiver<String>> myReceiverBuilder =
    new ReceiverBuilder<>(MyReceiver.class)
        .withConstructorArgs(myConstructorArgs);
You can then pass this receiverBuilder object to SparkReceiverIO through withSparkReceiverBuilder, as the CDAP Hubspot Receiver example below does.
Read data with optional parameters
You can also pass the following optional parameters:
- pullFrequencySec, the delay in seconds between polls for new records.
- startOffset, the inclusive offset from which reading should start.
- timestampFn, a SerializableFunction that defines how to get an Instant timestamp from a record.
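To make the inclusive meaning of startOffset concrete, the sketch below models a receiver's records as bare offsets and keeps those at or after the start offset. The class and method names (StartOffsetSketch, readFrom) are hypothetical, and the filtering is only a conceptual stand-in that runs with the JDK alone. It does not reflect how SparkReceiverIO is implemented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StartOffsetSketch {
  // Keeps only the offsets at or after startOffset, since the start offset is inclusive.
  public static List<Long> readFrom(List<Long> offsets, long startOffset) {
    return offsets.stream()
        .filter(offset -> offset >= startOffset)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Long> available = Arrays.asList(0L, 1L, 2L, 3L, 4L);
    System.out.println(readFrom(available, 2L)); // prints [2, 3, 4]
  }
}
```

With a start offset of 2, the record at offset 2 itself is read, which is what "inclusive" means here.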
Examples for specific Spark Receiver
CDAP Hubspot Receiver
// Build the receiver from its class and its constructor argument (the Hubspot config).
ReceiverBuilder<String, HubspotReceiver<String>> hubspotReceiverBuilder =
    new ReceiverBuilder<>(HubspotReceiver.class)
        .withConstructorArgs(hubspotConfig);
SparkReceiverIO.Read<String> readTransform =
    SparkReceiverIO.<String>read()
        .withGetOffsetFn(GetOffsetUtils.getOffsetFnForHubspot())
        .withSparkReceiverBuilder(hubspotReceiverBuilder);
// Apply the read transform to the pipeline p.
p.apply("readFromHubspotReceiver", readTransform);
Last updated on 2025/01/20