v2.0.0
What's Changed
- engine: stop and release the model when the engine is released, and remove deprecated lock
- sampling: generate_op heavily modified, remove dependency on global tensors
- prefix cache: bug fixes, improve eviction performance
- json mode: update lmfe-cpp patch, add process_logits, sampling with top_k and top_p
- span-attention: move span_attn decoderReshape to init
- lora: add docs, fix typo
- ubuntu: add Ubuntu Dockerfile, fix install directory error
- bugfix: fix multi-batch rep_penalty bug
Full Changelog: v1.3.0...v2.0.0