Question about the parameter delta #33

Open
zzzack66 opened this issue Nov 16, 2024 · 1 comment

Comments

@zzzack66

Thanks for your implementation of mamba-minimal. What a great job!
I'm confused about the dimensions of the parameter delta. I understand that delta is used to discretize A and B in the SSM. However, I don't understand why delta is first produced with shape (b, l, dt_rank) and then projected to (b, l, d_inner), as in Algorithm 2 of the Mamba paper. Why do we need to shape delta as (b, l, dt_rank) first and then apply Δ = τ_Δ(Parameter + s_Δ(x))?
(delta, B, C) = x_dbl.split(split_size=[self.args.dt_rank, n, n], dim=-1) # delta: (b, l, dt_rank). B, C: (b, l, n)
delta = F.softplus(self.dt_proj(delta)) # (b, l, d_in)
Can you explain the reason for this operation in the code? I'm looking forward to your reply.
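
To make the question concrete, here is a minimal sketch of what the two-step path costs compared with a single full-width projection. It reads the dt_rank step as a low-rank factorization, which is only one possible interpretation and not a confirmed answer from the maintainers. The sizes (d_model = 768, expand = 2) and the rule dt_rank = ceil(d_model / 16) are assumptions chosen for illustration, and the helper names `delta_param_counts` and `softplus` are hypothetical, not part of mamba-minimal.

```python
import math


def delta_param_counts(d_inner: int, dt_rank: int) -> tuple[int, int]:
    """Weight counts (biases excluded) for mapping width d_inner to a delta of width d_inner."""
    # Two-step path: d_inner -> dt_rank (the delta slice of x_proj),
    # then dt_rank -> d_inner (dt_proj).
    low_rank = d_inner * dt_rank + dt_rank * d_inner
    # Hypothetical single full-rank projection d_inner -> d_inner, for comparison only.
    full = d_inner * d_inner
    return low_rank, full


def softplus(z: float) -> float:
    """Numerically stable softplus, log(1 + exp(z)); the result is always positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Assumed example sizes (not taken from this issue).
d_model = 768
d_inner = 2 * d_model              # 1536
dt_rank = math.ceil(d_model / 16)  # 48

low_rank, full = delta_param_counts(d_inner, dt_rank)
print(low_rank, full)  # 147456 2359296
```

Under these assumed sizes the two-step path uses 16 times fewer weights while still giving each of the d_inner channels its own step size, and softplus keeps that step size positive.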

@fenglinzhu123

I have the same question. Please let me know if you have an answer or any advice. Thank you very much.
