Commit

Avoid a contiguous call in the quantized phi 3 model. (huggingface#2209)
* Simplify the KvCache api.

* Avoid a contiguous call in the quantized phi3 model.
LaurentMazare authored May 23, 2024
1 parent 45e235a commit d54e02d
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion candle-transformers/src/models/quantized_phi3.rs
@@ -146,7 +146,7 @@ impl LayerWeights {
             };
             let att = candle_nn::ops::softmax_last_dim(&att)?;
             // Convert to contiguous as matmul doesn't support strided vs for now.
-            att.matmul(&v.contiguous()?)?
+            att.matmul(&v)?
         };
         let y = y.transpose(1, 2)?.reshape(&[b_sz, seq_len, n_embd])?;
         let y = self.attn_output.forward(&y)?;
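
The hunk above drops the `contiguous()` copy of `v` before `att.matmul`. The commit message does not say why the copy is no longer needed, so the following is only a dependency-free sketch of the general idea: a matrix product can read its right-hand operand through strides and give the same result as multiplying by a contiguous copy. It does not use candle, and `View2`, `matmul_strided`, and `demo` are hypothetical names introduced here for illustration; this is not how candle implements `matmul`.

```rust
/// Hypothetical 2-D view over a flat buffer (illustration only, not a candle type).
/// Element (r, c) lives at `offset + r * strides.0 + c * strides.1`.
struct View2<'a> {
    data: &'a [f32],
    shape: (usize, usize),
    strides: (usize, usize),
    offset: usize,
}

impl<'a> View2<'a> {
    /// Read one element through the strides.
    fn get(&self, r: usize, c: usize) -> f32 {
        self.data[self.offset + r * self.strides.0 + c * self.strides.1]
    }

    /// Copy the view into a fresh row-major buffer.
    /// This is the kind of extra copy that a `contiguous()` call implies.
    fn to_contiguous(&self) -> Vec<f32> {
        let (rows, cols) = self.shape;
        let mut out = Vec::with_capacity(rows * cols);
        for r in 0..rows {
            for c in 0..cols {
                out.push(self.get(r, c));
            }
        }
        out
    }
}

/// Naive (m x k) * (k x n) product that reads both operands through their
/// strides and writes a row-major m x n result.
fn matmul_strided(a: &View2<'_>, b: &View2<'_>) -> Vec<f32> {
    let (m, k) = a.shape;
    let (k2, n) = b.shape;
    assert_eq!(k, k2, "inner dimensions must match");
    let mut out = vec![0.0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a.get(i, p) * b.get(p, j);
            }
            out[i * n + j] = acc;
        }
    }
    out
}

/// Multiply attention-like weights by a transposed (non-contiguous) `v`,
/// once directly through strides and once via a contiguous copy.
fn demo() -> (Vec<f32>, Vec<f32>) {
    // 2 x 2 weights whose rows sum to 1, as after a softmax.
    let att_data = [1.0f32, 0.0, 0.5, 0.5];
    let att = View2 { data: &att_data, shape: (2, 2), strides: (2, 1), offset: 0 };

    // Buffer stored as 3 x 2 row-major, viewed transposed as 2 x 3.
    let v_data = [1.0f32, 2.0, 3.0, 4.0, 5.0, 6.0];
    let v = View2 { data: &v_data, shape: (2, 3), strides: (1, 2), offset: 0 };

    // Path 1: no copy, read `v` through its strides.
    let strided = matmul_strided(&att, &v);

    // Path 2: copy `v` to row-major first, then multiply.
    let v_copy = v.to_contiguous();
    let v_contig = View2 { data: &v_copy, shape: (2, 3), strides: (3, 1), offset: 0 };
    let copied = matmul_strided(&att, &v_contig);

    (strided, copied)
}
```

Both paths in `demo` return the same values; the second one just pays for an additional allocation and copy of `v`. Whether a real backend can skip the copy depends on which stride layouts its matmul kernels accept.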