Compression Task Bug #9

Open
xumingyu2021 opened this issue Dec 11, 2024 · 1 comment

Comments
@xumingyu2021

When I run the following command:

python -m train --task compression --layers mh-attention swiglu mh-attention swiglu

Then this error occurs:

ValueError: Expected input batch_size (163840) to match target batch_size (16384).

@mcleish7

Hi,

I also have this issue and traced it to position_ids not being set when it is passed into .encode here, so position_ids is still None when it comes back from .encode.
That None is then passed on to decode, where position_ids is used to index into dec_position_embeds here. Indexing with None only adds an extra dimension, so all max_length position embeddings are taken, whereas I think only seq_length of them are supposed to be selected. That would also explain the batch-size mismatch above: after flattening for the loss, the logits cover max_length positions per example while the targets only cover seq_length.
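
To illustrate, here is a small standalone sketch (not the repository's code; the tensor name and sizes are just placeholders) showing what indexing the position embeddings with None does compared to indexing with explicit position ids:

    import torch

    # Illustrative sizes only: max_length is the model's maximum, seq_len the actual input length.
    seq_len, max_length, hidden = 4, 10, 8
    dec_position_embeds = torch.randn(max_length, hidden)

    # Intended behaviour: one embedding per input position.
    position_ids = torch.arange(seq_len)
    wanted = dec_position_embeds[position_ids]  # shape (seq_len, hidden) -> (4, 8)

    # With position_ids = None: None indexing only inserts a new axis,
    # so all max_length rows are kept and the decoder sees too many positions.
    broken = dec_position_embeds[None]          # shape (1, max_length, hidden) -> (1, 10, 8)

    print(wanted.shape, broken.shape)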
A simple fix is to add a one-liner to the forward method that overwrites position_ids, like this:

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        encoding, _ = self.encode(input_ids)
        # temp fix: one position id per input token, created on the same device as the input
        position_ids = torch.arange(0, input_ids.shape[1], device=input_ids.device)
        token_logits = self.decode(encoding, position_ids)
        return token_logits

I don't know if this is the correct fix, but it makes the task run and seems consistent with the rest of the code. Would love to know if it is correct @athms :)
