Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: proposal to implement a subscription factory with relative offset. #886

Closed

Conversation

jewertow
Copy link
Contributor

@jewertow jewertow commented Sep 2, 2019

Purpose

Introduction of a subscription factory with relative offset definitions.

References

See #538

Changes

  • Add RelativeOffsetSubscriptions.assignmentForLastN

Background Context

There has been a need to query certain Kafka topics with relative offsets, for example last offset, or last offset minus 100, or last offset minus 10 minutes.

@lightbend-cla-validator
Copy link

Hi @jewertow,

Thank you for your contribution! We really value the time you've taken to put this together.

Before we proceed with reviewing this pull request, please sign the Lightbend Contributors License Agreement:

http://www.lightbend.com/contribute/cla

@jewertow jewertow marked this pull request as ready for review September 2, 2019 18:55
@jewertow
Copy link
Contributor Author

jewertow commented Sep 2, 2019

@ennru I will be grateful if you tell me if this is a right way.

@ennru
Copy link
Member

ennru commented Sep 3, 2019

Hi @jewertow
Thank you for contributing to Alpakka Kafka!
The implementation needs to get more involved as this so that it can be sent created without a reference to the actor. It would introduce new sub-classes to SubscriptionRequest that are sent to the actor which internally does the math.

@jewertow
Copy link
Contributor Author

jewertow commented Sep 29, 2019

I will continue working on this PR when I finish #900.

@jewertow jewertow force-pushed the wip/538-relative-offset-definitions branch from c4e764c to 1becd6a Compare November 27, 2019 22:46
@jewertow
Copy link
Contributor Author

@ennru
For now I have broaden my knowledge about the project so I prepared new proposal which is only a rough idea of target implementation. Please check it out and give me a feedback.

@ennru
Copy link
Member

ennru commented Dec 2, 2019

Thank you for getting back to this.

I think it would be interesting to explore how the new PartitionAssignmentHandler can be used to implement any kind of non-standard seeking. The idea would be to provide a handler which would react on assignments and apply custom seeking via the provided RestrictedConsumer.

import scala.jdk.CollectionConverters._

object RelativeOffsetAssignmentHandler {
def apply(backOffset: Long): RelativeOffsetAssignmentHandler =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seeks all assigned partitions to offset = endOffset - backOffset.

def apply(backOffset: Long): RelativeOffsetAssignmentHandler =
new RelativeOffsetAssignmentHandler(None, Some(backOffset))

def apply(topicPartition: TopicPartition, backOffset: Long): RelativeOffsetAssignmentHandler =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seeks given partition to offset = end offset - backOffset and all others are assigned to beginning offset.

def apply(topicPartition: TopicPartition, backOffset: Long): RelativeOffsetAssignmentHandler =
new RelativeOffsetAssignmentHandler(Some(Map(topicPartition -> backOffset)))

def apply(topicPartitionBackOffset: Map[TopicPartition, Long]): RelativeOffsetAssignmentHandler =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It allows to seek multiple partitions to offset = end offset - specified offset. All others assigned partitions are seeking to beginning offset.

def apply(topicPartitionBackOffset: Map[TopicPartition, Long]): RelativeOffsetAssignmentHandler =
new RelativeOffsetAssignmentHandler(Some(topicPartitionBackOffset))

def apply(tps: Set[TopicPartition], backOffset: Long): RelativeOffsetAssignmentHandler = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is intended to seek back all the given partitions with the same offset. All others assigned partitions are seeking to beginning offset.

@jewertow
Copy link
Contributor Author

jewertow commented Jan 27, 2020

@ennru
I couldn't work on it recently, but I get back to the issue and I proposed new implementation based on PartitionAssignmentHandler. Please check it out.

@jewertow
Copy link
Contributor Author

jewertow commented Feb 7, 2020

@seglo what do you think about it?

@ennru
Copy link
Member

ennru commented Feb 9, 2020

I would expect this relative offset seeking is interesting only when a partition is assigned the first time after an Alpakka Kafka consumer is started.
With what you suggest right now it would be applied even when there is a regular rebalance which "brings back" a partition to the consumer.

@jewertow
Copy link
Contributor Author

I forgot about rebalances. I will investigate it again.

Copy link
Contributor

@seglo seglo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far. I think it would be nice for this feature to include support for a time offset as Martynas originally suggested in #538. i.e. "I want to resume streaming from 10 minutes in the past"

There has been a need to query certain Kafka topics with relative offsets, for example last offset, or last offset minus 100, or last offset minus 10 minutes.

This can be done by looking up the offset for wallclock time - time interval to go back. Then you can lookup for the earliest offset(s) around that time with KafkaConsumer.offsetForTimes.

https://kafka.apache.org/24/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#offsetsForTimes-java.util.Map-java.time.Duration-

topicPartitionsBackOffset match {
case Some(partitionOffset) =>
val tps = partitionOffset.keys.filter(assignedTps.contains)
consumer.endOffsets(tps.toSet.asJava).asScala.foreach { tpOffset =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be better expressed with pattern matching on the tuple, and perhaps a for expression.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e.

for {
  (partition, offset) <- consumer.endOffsets(tps.toSet.asJava).asScala
} {
  // ...
}

Comment on lines +46 to +47
backOffset.foreach { offset =>
assignedTps.foreach { tp =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A for expression would be nice here to reduce the nesting. i.e.

for (offset <- backOffset; tp <- assignedTps) {
  ...

@jewertow jewertow closed this Feb 23, 2020
@seglo
Copy link
Contributor

seglo commented Feb 24, 2020

@jewertow Just curious why you closed the PR. Are you still interested in pursuing this?

@jewertow
Copy link
Contributor Author

jewertow commented Feb 24, 2020

@seglo Yes, I want to implement it in the near future, but first I want to finish few other tasks in other projects that I opened. Open but inactive PR is annoying and disrupts the status of work in the project.

@seglo
Copy link
Contributor

seglo commented Feb 24, 2020

Alright. Hope to see you back soon :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants