Fix potential off-by-one error in attention mask generation #76

Draft
wants to merge 1 commit into
base: main

@dibyaghosh commented Apr 15, 2024

See #75

TL;DR: There is a small bug in the attention masking code. It should not affect anyone using the released model or training their own models in practice (unless you are using a special attention mask scheme), and we will fix it in an upcoming update.

The issue: If you have multiple timestep groups, the bug causes the first token of the second group to be misclassified as belonging to the first group (similarly, the first token of the third group is misclassified as belonging to the second group, and so on). If your model relies on different timestep groups not being able to attend to each other (a fairly non-standard case), this could cause unintended information leakage.
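To make the off-by-one concrete, here is a minimal sketch of how this kind of misclassification can arise when a flat token index is mapped to a group via cumulative group sizes. This is an illustration only, not the repository's code: the function names, the group sizes, and the use of bisect are all assumptions chosen for the example.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def group_index_buggy(i, tokens_per_group):
    """Hypothetical lookup with an off-by-one at group boundaries."""
    # Exclusive end index of each group, e.g. [3, 2, 4] -> [3, 5, 9].
    ends = list(accumulate(tokens_per_group))
    # bisect_left treats i == ends[k] as still inside group k, which is
    # wrong: ends[k] is the index of the FIRST token of group k + 1.
    return bisect_left(ends, i)


def group_index_fixed(i, tokens_per_group):
    """Hypothetical corrected lookup."""
    ends = list(accumulate(tokens_per_group))
    # bisect_right pushes i == ends[k] into group k + 1.
    return bisect_right(ends, i)


tokens_per_group = [3, 2, 4]  # illustrative sizes only
num_tokens = sum(tokens_per_group)

buggy = [group_index_buggy(i, tokens_per_group) for i in range(num_tokens)]
fixed = [group_index_fixed(i, tokens_per_group) for i in range(num_tokens)]

print(buggy)  # [0, 0, 0, 0, 1, 1, 2, 2, 2]
print(fixed)  # [0, 0, 0, 1, 1, 2, 2, 2, 2]
```

In the buggy output, token 3 (the first token of the second group) lands in group 0 and token 5 (the first token of the third group) lands in group 1, which matches the pattern described above. Any attention rule keyed on group membership would then treat those boundary tokens as part of the preceding group.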

For most people (anyone using the released model checkpoints or our pretraining config), this should not affect any of your use cases. You might see unexpected behavior if you set readouts to a non-standard value in:

readouts: Optional[Sequence[str]] = None,

WenchangGaoT pushed a commit to WenchangGaoT/octo1 that referenced this pull request May 10, 2024
…idation

Make Validation Metrics more meaningful on RTX