Skip to content

Latest commit

 

History

History
7 lines (5 loc) · 268 Bytes

v0.13.1.md

File metadata and controls

7 lines (5 loc) · 268 Bytes

v0.13.1 (2024-07-10)

Fixed and Improvements

  • Bump llama.cpp version to b3334, supporting Deepseek V2 series models.
  • Turn on fast attention for Qwen2-1.5B model to fix the quantization error.
  • Properly set number of GPU layers (to zero) when device is CPU.