I am working on a Beam application that uses KafkaIO as an input
KafkaIO.<Long, GenericRecord>read()
.withBootstrapServers("bootstrapServers")
.withTopic("topicName")
.withConsumerConfigUpdates(confs)
.withKeyDeserializer(LongDeserializer.class)
.withValueDeserializer((Deserializer.class)
.commitOffsetsInFinalize()
.withoutMetadata();
I am trying to understand how exactly the commitOffsetsInFinalize()
works.
How can the streaming job be finalized?
The last step in the pipeline is a custom DoFn that writes the messages to DynamoDb
. Is there any way to manually call some finalize()
method there, so that the offsets are committed after each successful execution of the DoFn
?
Also I am having hard time understanding whats the relation between the checkpoints and the finalization ? If no checkpoint is enabled on the pipeline, will I still be able to finalize and get the commitOffsetsInFinalize()
to work?
p.s The way the pipeline is right now, even with the commitOffsetsInFinalize()
each message that is read, regardless whether there is a failure downstream is being committed, hence causing a data lose.
Thank you!