I also have this issue and traced it to position_ids not being set when passed into .encode here. This means position_ids is None when returned from .encode.
None is then passed to decode, and position_ids is used to select from dec_position_embeds here, so all (max_length) position embeddings are taken, with an extra dimension added because of the None. However, I think only sequence-length-many embeddings are supposed to be taken.
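For reference, indexing a tensor with None in PyTorch keeps everything and inserts a new leading dimension, which is exactly the behavior described above (the shapes below are made up for illustration, not the model's real dimensions):

```python
import torch

dec_position_embeds = torch.randn(512, 64)   # (max_length, embed_dim) — illustrative shapes
position_ids = None

# Indexing with None takes all max_length embeddings and adds a leading dim:
print(dec_position_embeds[position_ids].shape)   # torch.Size([1, 512, 64])

# Indexing with explicit ids takes only seq_len embeddings:
position_ids = torch.arange(10)
print(dec_position_embeds[position_ids].shape)   # torch.Size([10, 64])
```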
A simple fix for this is to add a one-liner to the forward method that overwrites position_ids, like this:
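Roughly something along these lines (a sketch only; the signature and names such as `input_ids` are assumptions, not necessarily the repository's actual code):

```python
import torch

def forward(self, input_ids, position_ids=None, **kwargs):
    # Sketch: default position_ids to 0..seq_len-1 when the caller did not
    # provide them, so decode() never receives None.
    if position_ids is None:
        position_ids = torch.arange(input_ids.size(1), device=input_ids.device)
    # ... rest of forward unchanged ...
```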
I don't know if this is the correct fix, but it seems to make it work and seems logical from reading the code. Would love to know if it is correct @athms :)
When I run the following cmd:
python -m train --task compression --layers mh-attention swiglu mh-attention swiglu
Then this error occurs:
ValueError: Expected input batch_size (163840) to match target batch_size (16384).