Add points in the middle frames in OnlineTAPIR #114

Open
PinxueGuo opened this issue Aug 13, 2024 · 3 comments

Comments

@PinxueGuo

Thank you for the nice work!

May I ask whether OnlineTAPIR (torch_causal_tapir_demo.ipynb) can add query points in intermediate frames, not only in frame 0, the way StandardTAPIR (torch_tapir_demo.ipynb) does? I'm not quite sure how to construct causal_state.

@yangyi02
Collaborator

For tracking points from another query frame, you can extract the query features from your query frame, then run the OnlineTAPIR inference pipeline.

i.e. query_features = online_model_init(frames[None, query_frame_index:query_frame_index + 1], query_points[None])

OnlineTAPIR is meant to run autoregressively from the beginning of the video, so there is no need to change the causal_state initialization.
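To make the slicing in the snippet above concrete, here is a minimal sketch using only NumPy and a dummy video. It does not call the TAPIR model: `single_frame_clip` and `group_queries_by_frame` are hypothetical helper names, and the `(t, y, x)` layout of each query point is an assumption about the notebook's convention. Each resulting clip and group would then be passed to `online_model_init` as in the line above.

```python
import numpy as np

def single_frame_clip(frames, query_frame_index):
    """Return the query frame as a batch of one single-frame clip: (1, 1, H, W, C)."""
    return frames[None, query_frame_index:query_frame_index + 1]

def group_queries_by_frame(query_points):
    """Group query points by their frame index (assumed layout: t, y, x)."""
    frame_ids = query_points[:, 0].astype(int)
    return {int(t): query_points[frame_ids == t] for t in np.unique(frame_ids)}

# Dummy data: a 12-frame video, two queries on frame 0 and one on frame 9.
frames = np.zeros((12, 64, 64, 3), dtype=np.uint8)
query_points = np.array([[0, 10.0, 20.0], [0, 30.0, 40.0], [9, 5.0, 6.0]])

groups = group_queries_by_frame(query_points)
clip = single_frame_clip(frames, 9)
```

Whether the time coordinate of the query points should stay absolute or be made relative to the one-frame clip depends on how the notebook samples features, so it is worth checking there before reusing this.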

@PinxueGuo
Author

Thank you for your reply.

Do you mean that the presence of new query points in subsequent frames should be determined during the causal state initialization at the beginning of the video? However, in my setting, it’s impossible to determine at the outset whether new query points will be added in later frames.

For instance, I might select 3 query points in the first frame, but by the 10th frame, I might want to add 2 more query points starting from that frame. So at the 11th frame there should be 5 tracking points, 3 of which start at frame 1, and 2 of which start at frame 10.

Does this mean that if I want to add 2 new points in the 10th frame, I need to discard the query features of the 1st frame and the historical clues from frames 1-9, and instead, re-extract the query features based on the 3 predicted points in the 10th frame (which could introduce errors)? Additionally, since the number of points has increased from 3 to 5, would the shape of the causal state also need to be adjusted by reinitializing it with the 5 points?

@yangyi02
Collaborator

yangyi02 commented Nov 30, 2024

Hi @PinxueGuo

We just updated our Colab notebooks to allow selecting points in the middle frames in Online TAPIR. Please give it a try at your convenience.

Note that this is not the same as your request to add points while tracking is already in progress, but it lets you select points in the middle frames before the model starts tracking.

Regarding your last question, here are two brief answers:

  1. If you select all query points at once (no matter which frame they are in), the causal state will be initialized for all of them. Online tracking will start from the first frame even when some middle-frame points are not visible yet. The causal state is updated at every frame. We can do this because the query features are extracted from the middle frame before tracking starts, so it is a bit like knowing the query features from the future. However, tracking is still conducted in a causal/online fashion.
  2. If you have already started tracking a few query points and then add more query points partway through tracking, you can simply initialize the causal states for the new query points while maintaining the existing ones. You don't need to start from the first frame for the new query points. The reason is that TAPIR/TAPNet tracks points independently; there is no correlation between different query points. Our live demo demonstrates this.
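The second answer amounts to per-point bookkeeping, which can be sketched roughly as follows. This is only an illustration with NumPy stand-ins, not the actual TAPIR causal state: the dictionary with a single `memory` entry, the `(batch, num_points, ...)` layout, and the helpers `init_point_state` and `append_points` are all assumptions made for the example.

```python
import numpy as np

POINT_AXIS = 1  # assumption: every array is laid out as (batch, num_points, ...)

def init_point_state(num_points, state_dim=4):
    """Hypothetical per-point state; fresh points start with an all-zero history."""
    return {"memory": np.zeros((1, num_points, state_dim), dtype=np.float32)}

def append_points(features, state, new_features, new_state):
    """Add new points by concatenating along the point axis; old entries are untouched."""
    merged_features = np.concatenate([features, new_features], axis=POINT_AXIS)
    merged_state = {
        key: np.concatenate([state[key], new_state[key]], axis=POINT_AXIS)
        for key in state
    }
    return merged_features, merged_state

# Three points tracked since the first frame; pretend nine frames updated their state.
features = np.ones((1, 3, 8), dtype=np.float32)
state = init_point_state(3)
state["memory"] += 9.0

# Two more points are added at the tenth frame, each with a fresh state.
new_features = np.full((1, 2, 8), 2.0, dtype=np.float32)
features, state = append_points(features, state, new_features, init_point_state(2))
```

Because the points are tracked independently, the first three rows keep their accumulated history while the two new rows start empty. The real causal state in the notebook is likely a nested structure, so the same concatenation would have to be applied to each array inside it.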

Thanks for your question.
