I also have this issue and traced it to position_ids not being set when passed into .encode here. This means position_ids is None when returned from .encode.
None is then passed to decode, and position_ids is used to select from dec_position_embeds here, so all (max_length) position embeddings are taken, with an extra dimension added because of the None. However, I think only sequence-length-many embeddings are supposed to be taken.
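For reference, indexing a tensor with None in PyTorch keeps everything and inserts a new leading dimension, which is exactly the behavior described above (the shapes below are made up for illustration, not the model's real dimensions):

```python
import torch

dec_position_embeds = torch.randn(512, 64)   # (max_length, embed_dim) — illustrative shapes
position_ids = None

# Indexing with None takes all max_length embeddings and adds a leading dim:
print(dec_position_embeds[position_ids].shape)   # torch.Size([1, 512, 64])

# Indexing with explicit ids takes only seq_len embeddings:
position_ids = torch.arange(10)
print(dec_position_embeds[position_ids].shape)   # torch.Size([10, 64])
```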
A simple fix for this is to add a one-liner to the forward method that overwrites position_ids, like this:
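Roughly something along these lines (a sketch only; the signature and names such as `input_ids` are assumptions, not necessarily the repository's actual code):

```python
import torch

def forward(self, input_ids, position_ids=None, **kwargs):
    # Sketch: default position_ids to 0..seq_len-1 when the caller did not
    # provide them, so decode() never receives None.
    if position_ids is None:
        position_ids = torch.arange(input_ids.size(1), device=input_ids.device)
    # ... rest of forward unchanged ...
```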
I don't know if this is the correct fix, but it seems to make it work and seems logical from reading the code. Would love to know if it is correct @athms :)
When I run the following cmd:
python -m train --task compression --layers mh-attention swiglu mh-attention swiglu
Then this error occurs:
ValueError: Expected input batch_size (163840) to match target batch_size (16384).