Thanks for your implementation of mamba-minimal. What a great job!

I'm really confused about the dimensionality of the parameter delta. I understand that delta is used for the discretization of A and B in the SSM. However, I don't understand why delta is first shaped as (b, l, dt_rank) and only then projected to (b, l, d_inner), as in Algorithm 2 of the Mamba paper, where Δ = τ_Δ(Parameter + s_Δ(x)). Why do we need to shape delta as (b, l, dt_rank) first?
```python
(delta, B, C) = x_dbl.split(split_size=[self.args.dt_rank, n, n], dim=-1)  # delta: (b, l, dt_rank); B, C: (b, l, n)
delta = F.softplus(self.dt_proj(delta))  # (b, l, d_in)
```
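For reference, here is a minimal standalone sketch of the shape flow I'm asking about. The layer names `x_proj` and `dt_proj` follow mamba-minimal, but the dimensions (b=2, l=4, d_inner=8, dt_rank=1, n=16) are made up purely for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical dimensions chosen for illustration; not the paper's defaults
b, l, d_inner, dt_rank, n = 2, 4, 8, 1, 16

# Mirrors the two layers in mamba-minimal: x_proj jointly produces
# (delta, B, C); dt_proj lifts the low-rank delta back up to d_inner
x_proj = nn.Linear(d_inner, dt_rank + 2 * n, bias=False)
dt_proj = nn.Linear(dt_rank, d_inner, bias=True)

x = torch.randn(b, l, d_inner)
x_dbl = x_proj(x)                                   # (b, l, dt_rank + 2n)
delta, B, C = x_dbl.split([dt_rank, n, n], dim=-1)  # delta: (b, l, dt_rank)

# tau_Delta = softplus keeps the step size positive for discretization
delta = F.softplus(dt_proj(delta))                  # (b, l, d_inner)
print(delta.shape)  # torch.Size([2, 4, 8])
```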
Can you explain the reason for this operation in the code? I'm looking forward to your reply.