v4.2.0
Released: 2024-04-10 19:41:42
Latest OpenNMT/CTranslate2 release: v4.4.0 (2024-09-09 17:21:54)
New features
- Support Flash Attention (#1651)
- Implement GEMM for the FLOAT32 compute type with the RUY backend (#1598)
- Conv1D quantization on CPU only (the DNNL and CUDA backends are not supported) (#1601)
Fixes and improvements
- Fix a bug in tensor parallelism (#1643)
- Use BestSampler when temperature is 0 (#1659)
- Fix a bug affecting Gemma models (#1660)
- Optimize loading/unloading time for Translator with cache (#1645)
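
The temperature fix (#1659) can be illustrated with a small sketch. Temperature sampling divides the logits by the temperature, which is undefined at 0; the limit as the temperature approaches 0 is simply picking the highest-scoring token. The code below is not CTranslate2's implementation: `sample_token` and its parameters are hypothetical names, and it assumes that the "best" strategy means greedy (argmax) selection.

```python
import math
import random


def sample_token(logits, temperature, rng=random):
    """Return a token index chosen from raw logits (illustrative only)."""
    if temperature == 0:
        # Assumption: temperature 0 falls back to greedy selection,
        # avoiding a division by zero in the scaling step below.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Scale the logits, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]

    # Draw one index in proportion to the softmax weights.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.1, 2.5, -1.0]
greedy = sample_token(logits, temperature=0)      # deterministic
sampled = sample_token(logits, temperature=1.0)   # random
```

With a temperature of 0 the result is deterministic, so repeated calls return the same index; any positive temperature keeps the draw random.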