v4.2.0
Release date: 2024-04-10 19:41:42
Latest OpenNMT/CTranslate2 release: v4.2.1 (2024-04-24 18:04:01)
New features
- Support Flash Attention (#1651)
- Implement GEMM for the FLOAT32 compute type with the RUY backend (#1598)
- Conv1D quantization on CPU only (the DNNL and CUDA backends are not supported) (#1601)
Fixes and improvements
- Fix a bug in tensor parallelism (#1643)
- Use BestSampler when temperature is 0 (#1659)
- Fix a bug affecting Gemma models (#1660)
- Optimize loading/unloading time for Translator with cache (#1645)
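
For context on the sampling change (#1659): temperature scaling divides the logits by the temperature, which is undefined at 0, so falling back to picking the highest-scoring token is the usual convention. The sketch below illustrates that idea in plain Python. It is not CTranslate2's implementation; `sample_token` is a hypothetical helper, and it assumes that BestSampler corresponds to greedy (argmax) selection.

```python
import math
import random


def sample_token(logits, temperature=1.0, rng=None):
    """Pick a token index from raw logits (illustrative helper, not a CTranslate2 API)."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")

    if temperature == 0:
        # Assumed behaviour of a "best" sampler: skip random sampling
        # and return the index of the highest logit (first one on ties).
        return max(range(len(logits)), key=logits.__getitem__)

    rng = rng or random
    # Scale the logits by the temperature, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


if __name__ == "__main__":
    logits = [0.1, 2.5, -1.0]
    print(sample_token(logits, temperature=0))    # deterministic choice
    print(sample_token(logits, temperature=1.0))  # random choice
```

With a temperature of 0 the result is deterministic, which matches what callers typically expect when they ask for no randomness.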