Hi there,

Thank you for publishing this interesting work.

The one-word summary's embedding seems meaningful, as suggested by your Figure 1, which shows that the text/image/interleaved embedding lies close to the relevant word.

I wonder how you plotted Figure 1, specifically what its inputs are. From your code, the text/image/interleaved embedding refers to the last hidden state generated with the one-word summary prompt. But how do you obtain the corresponding embeddings for the single words (e.g. 'Dog', 'Cat') in this case? I assume it can't simply be the word embedding. Or is the figure purely illustrative?

Best
The single words refer to the embeddings from the LM head. Figure 1 is drawn from the similarity matrix between the summary embeddings and these single-word embeddings (those similarities correspond to the next-token probabilities for these words).
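A minimal sketch of how such a similarity matrix could be computed, using random tensors in place of a real model. All names here (`summary_embeddings`, `lm_head`, the dimensions) are illustrative assumptions, not the repository's actual code: the idea is that each summary embedding (the last hidden state) is compared against the LM-head rows for the chosen words, and the resulting scores play the role of next-token logits/probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, num_words = 16, 4  # illustrative sizes

# Stand-in for the LM head's output embeddings, one row per single word
# (e.g. 'Dog', 'Cat', ...); in practice these would be rows of the model's
# unembedding matrix indexed by the words' token ids.
lm_head = rng.normal(size=(num_words, hidden_dim))

# Stand-in for the last hidden states produced by the one-word summary
# prompt for three inputs (text / image / interleaved).
summary_embeddings = rng.normal(size=(3, hidden_dim))

# Dot products with the LM-head rows are the logits for those words;
# a softmax over them gives the (restricted) next-token probabilities.
logits = summary_embeddings @ lm_head.T            # shape (3, num_words)
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)

print(probs.shape)  # one probability row per input, one column per word
```

A figure like Figure 1 could then be drawn from `probs` (or from cosine similarities of the same two matrices) as a heatmap or by projecting both sets of vectors into 2D.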