v3.19.0
Release date: 2023-08-31 22:36:37
Latest OpenNMT/CTranslate2 release: v4.4.0 (2024-09-09 17:21:54)
Changes
- Binary wheels for Python 3.7 are no longer built
New features
- Build wheels for Python 3.12
- Update the Transformers converter to support more model architectures:
  - Falcon-RW
  - DistilBERT
  - Llama with linear RoPE scaling (e.g. Vicuna v1.5)
  - Llama with a non-default RoPE base period (e.g. CodeLlama)
- Accept the token type IDs as inputs for encoder models
- Add property `GenerationStepResult.hypothesis_id` to identify the different hypotheses when running random sampling with `num_hypotheses` > 1
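
As an illustration of why a hypothesis identifier helps: when several hypotheses are sampled for the same input, step results for different hypotheses can arrive interleaved, so a consumer needs both `batch_id` and `hypothesis_id` to reassemble each sequence. The sketch below does not call CTranslate2. `StepResult` is a hypothetical stand-in that only mirrors the field names mentioned in these notes, and `group_tokens` is an illustrative helper, not a library function.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StepResult:
    """Hypothetical stand-in for a generation step result (not the library class)."""
    batch_id: int
    hypothesis_id: int
    token: str


def group_tokens(steps):
    """Collect streamed tokens per (batch_id, hypothesis_id) pair."""
    grouped = defaultdict(list)
    for step in steps:
        grouped[(step.batch_id, step.hypothesis_id)].append(step.token)
    return dict(grouped)


# Simulated interleaved stream: one batch entry, two sampled hypotheses.
stream = [
    StepResult(0, 0, "Hello"),
    StepResult(0, 1, "Hi"),
    StepResult(0, 0, "world"),
    StepResult(0, 1, "there"),
]
grouped = group_tokens(stream)
for key, tokens in sorted(grouped.items()):
    print(key, tokens)
```

With a real model, the same grouping logic would presumably live in the function passed as the generation callback, keyed on the attributes of the step result it receives.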
Fixes and improvements
- Improve performance of 8-bit models on CPU:
  - Vectorize the GEMM output dequantization
  - Fuse the GEMM output dequantization with bias and activation
- Allow inputs shorter than 30 seconds in Whisper methods
- Fix incorrect `batch_id` values passed to the callback function
- Fix a shape error in models using both MQA and relative positions
- Fix compilation error related to AVX512 when using GCC 7
- Call `.detach()` on PyTorch tensors before getting the NumPy array in converters
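
The 8-bit CPU item above mentions fusing the GEMM output dequantization with bias and activation. The pure-Python sketch below only illustrates the idea and is not the library's kernel. It assumes int32 accumulators, one quantization scale per input row and one per output column, and ReLU as the activation; all function and variable names are hypothetical. The unfused version walks the output three times, while the fused version produces the same values in a single pass.

```python
def dequantize_unfused(acc, a_scales, b_scales, bias):
    """Three separate passes: dequantize, add bias, apply ReLU."""
    rows, cols = len(acc), len(acc[0])
    out = [[acc[i][j] / (a_scales[i] * b_scales[j]) for j in range(cols)]
           for i in range(rows)]
    out = [[out[i][j] + bias[j] for j in range(cols)] for i in range(rows)]
    return [[max(0.0, v) for v in row] for row in out]


def dequantize_fused(acc, a_scales, b_scales, bias):
    """Single pass over the int32 accumulators."""
    out = []
    for i, row in enumerate(acc):
        out_row = []
        for j, value in enumerate(row):
            # Dequantize, add bias and apply ReLU while the value is at hand.
            y = value / (a_scales[i] * b_scales[j]) + bias[j]
            out_row.append(max(0.0, y))
        out.append(out_row)
    return out


# Toy int32 GEMM output with assumed per-row and per-column scales.
acc = [[1270, -2540], [635, 1905]]
a_scales = [127.0, 63.5]
b_scales = [10.0, 20.0]
bias = [0.5, -0.25]

fused = dequantize_fused(acc, a_scales, b_scales, bias)
unfused = dequantize_unfused(acc, a_scales, b_scales, bias)
print(fused)
```

In an optimized kernel the benefit of fusing would come from touching each output element once instead of several times, which reduces memory traffic.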