Question about the parameter delta #33

zzzack66 · 2024-11-16T12:00:58Z

Thanks for your implementation of mamba-minimal. What a great job!
I'm confused about the dimensions of the parameter delta. I understand that delta is used to discretize A and B in the SSM. What I don't understand is why delta is first shaped as (b, l, dt_rank) and then projected to (b, l, d_inner), as in Algorithm 2 of the Mamba paper. Why does delta need to be shaped as (b, l, dt_rank) first and then passed through τ_Δ(Parameter + s_Δ(x))?
(delta, B, C) = x_dbl.split(split_size=[self.args.dt_rank, n, n], dim=-1) # delta: (b, l, dt_rank). B, C: (b, l, n)
delta = F.softplus(self.dt_proj(delta)) # (b, l, d_in)
Can you explain the reason for this operation in the code? I'm looking forward to your reply.
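
To make the shapes concrete, here is a minimal standalone sketch of the two steps. It assumes PyTorch; the sizes are made up for illustration, and the definitions of `x_proj` and `dt_proj` are assumptions chosen to match the shape comments above rather than copied from the repository. It only reproduces the shapes and compares the parameter count of the two-step path with a hypothetical single direct projection to `d_inner`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes (assumptions, not taken from any real config).
b, l, d_inner, dt_rank, n = 2, 5, 8, 2, 4

# Layer names mirror the snippet above; their definitions are assumed here.
x_proj = nn.Linear(d_inner, dt_rank + 2 * n, bias=False)
dt_proj = nn.Linear(dt_rank, d_inner, bias=True)

x = torch.randn(b, l, d_inner)
x_dbl = x_proj(x)  # (b, l, dt_rank + 2n)

# Step 1: the input-dependent part of delta lives in a small dt_rank space.
delta, B, C = x_dbl.split(split_size=[dt_rank, n, n], dim=-1)
low_rank_shape = tuple(delta.shape)  # (b, l, dt_rank)

# Step 2: project up to d_inner (the bias is a learned per-channel offset),
# then apply softplus so every step size is positive.
delta = F.softplus(dt_proj(delta))
full_shape = tuple(delta.shape)  # (b, l, d_inner)

# Parameters spent on delta: two-step path vs. a hypothetical direct projection.
two_step_params = d_inner * dt_rank + dt_rank * d_inner + d_inner
direct_params = d_inner * d_inner + d_inner

print(low_rank_shape, full_shape)
print(two_step_params, direct_params)
```

With these sizes the two-step path uses fewer parameters than a direct `d_inner`-to-`d_inner` projection, and the gap grows as `d_inner` grows relative to `dt_rank`. Whether that is the full motivation is exactly what I'd like confirmed.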

fenglinzhu123 · 2024-12-13T14:19:34Z

I have the same question. Please explain it to me if you have an answer or any advice. Thank you very much.
