Hi there,

Thank you for publishing this interesting work.

The one-word summary's embedding seems meaningful, as suggested by your Figure 1, which shows that the text/image/interleaved embedding lies close to the relevant word.

I wonder how you plotted Figure 1, specifically what its inputs are. From your code, the text/image/interleaved embedding refers to the last hidden state generated with the one-word summary prompt. But how do you obtain the corresponding embeddings for the single words (e.g. 'Dog', 'Cat') in this case? I assume it can't simply be the word embedding. Or is the figure purely illustrative?

Best
The single words refer to the embeddings from the LM head. Figure 1 is drawn from the similarity matrix between the summary embeddings and these single-word embeddings (those similarities correspond to the next-token probabilities for these words).
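A minimal sketch of how such a similarity matrix could be computed, using random tensors in place of a real model. All names here (`summary_embeddings`, `lm_head`, the dimensions) are illustrative assumptions, not the repository's actual code: the idea is that each summary embedding (the last hidden state) is compared against the LM-head rows for the chosen words, and the resulting scores play the role of next-token logits/probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, num_words = 16, 4  # illustrative sizes

# Stand-in for the LM head's output embeddings, one row per single word
# (e.g. 'Dog', 'Cat', ...); in practice these would be rows of the model's
# unembedding matrix indexed by the words' token ids.
lm_head = rng.normal(size=(num_words, hidden_dim))

# Stand-in for the last hidden states produced by the one-word summary
# prompt for three inputs (text / image / interleaved).
summary_embeddings = rng.normal(size=(3, hidden_dim))

# Dot products with the LM-head rows are the logits for those words;
# a softmax over them gives the (restricted) next-token probabilities.
logits = summary_embeddings @ lm_head.T            # shape (3, num_words)
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)

print(probs.shape)  # one probability row per input, one column per word
```

A figure like Figure 1 could then be drawn from `probs` (or from cosine similarities of the same two matrices) as a heatmap or by projecting both sets of vectors into 2D.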