Question about evaluate_posterior function #171

Hannibal046 · 2024-12-29T07:50:27Z

Hi team,

Thanks for the great work!

I have a question about the evaluate_posterior function, especially about this line:

Line 387 in 35c78f6

qx = 1.0

Although it looks like a stricter version, it doesn't match the original speculative sampling method. Could you explain why?

Liyuhui-12 · 2024-12-30T15:44:05Z

Because top-k sampling is used in the draft phase, when a previous draft token is rejected, the sampling probability of the current draft token is 1. This still satisfies the theoretical guarantees of speculative sampling, so it remains lossless while being more efficient. See also #50, which includes a validation that the output distribution is unchanged.
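To make the losslessness argument concrete, here is a minimal sketch of accept/reject verification with `qx = 1.0`. It is not the repository's `evaluate_posterior`; the function names `verify_candidates` and `exact_output_distribution` are hypothetical, and it assumes the draft candidates are distinct tokens chosen deterministically, so each one is treated as a point-mass proposal. With a point-mass proposal, the usual residual `max(0, p - q)` reduces to zeroing out the rejected token and renormalizing.

```python
import random


def verify_candidates(p, candidates, rng=random):
    """Draw one token index by accept/reject over deterministic candidates.

    p          : target-model probabilities over the vocabulary (sums to 1)
    candidates : distinct draft token indices, tried in order
    """
    p = list(p)
    for x in candidates:
        qx = 1.0  # assumption: point-mass proposal for a deterministic candidate
        if rng.random() < p[x] / qx:
            return x  # accepted with probability min(1, p[x] / qx) = p[x]
        # Rejected: residual max(0, p - q) removes x, then renormalize.
        p[x] = 0.0
        total = sum(p)
        p = [v / total for v in p]
    # Every candidate was rejected: sample from the remaining residual.
    return rng.choices(range(len(p)), weights=p, k=1)[0]


def exact_output_distribution(p, candidates):
    """Compute the exact output distribution of verify_candidates (no sampling)."""
    p = list(p)
    out = [0.0] * len(p)
    reach = 1.0  # probability that all earlier candidates were rejected
    for x in candidates:
        out[x] += reach * p[x]
        reach *= 1.0 - p[x]
        if reach == 0.0:
            return out
        p[x] = 0.0
        total = sum(p)
        p = [v / total for v in p]
    for i, v in enumerate(p):
        out[i] += reach * v
    return out


target = [0.5, 0.3, 0.15, 0.05]
recovered = exact_output_distribution(target, candidates=[0, 1])
```

Under these assumptions, `recovered` matches `target` up to floating-point error: the first candidate is emitted with probability `p[0]`, and each later candidate's renormalized probability is exactly offset by the probability of reaching it.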

Hannibal046 · 2024-12-31T03:08:15Z

Thank you for the detailed explanation! Since the speculative decoding method in this implementation differs from the approach outlined in the paper, would it be possible for you to help me correct the following algorithm diagram? Do I understand the current implementation correctly?

This would help clarify the modifications and make it easier for everyone to understand the design choices.

Thanks in advance for your assistance!
