v0.0.21
Release date: 2023-08-18 22:34:52
Latest released version of facebookresearch/xformers: v0.0.28.post1 (2024-09-13 23:52:20)
[0.0.21] - 2023-08-18
Improved
- fMHA: Updated flash-attention to v2, with massive performance improvements for both the forward and backward passes. This implementation is now used by default when available
Bug fixes
- fMHA/cutlass: Fix potential race condition in the FW/BW passes
- fMHA/cutlass: Fix `attn_bias` stride overflow for very long sequences (>32k)
- `LowerTriangularMask` is now backward compatible with older xformers versions
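Not part of the release notes, but as an illustration of what `LowerTriangularMask` denotes: it represents a causal attention bias, which corresponds to a dense additive bias that is 0 at or below the diagonal and -inf above it. A minimal sketch in plain PyTorch (not the xformers implementation):

```python
import torch

q_len, kv_len = 4, 4

# Illustration only: mark positions strictly above the diagonal (future
# key positions) and fill them with -inf so attention cannot look ahead.
future = torch.triu(torch.ones(q_len, kv_len, dtype=torch.bool), diagonal=1)
causal_bias = torch.zeros(q_len, kv_len).masked_fill(future, float("-inf"))

print(causal_bias)
```

In practice you would pass `LowerTriangularMask()` itself as `attn_bias` rather than materializing a dense tensor like this, which is what lets the kernels skip the masked region entirely.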
Breaking changes
- `memory_efficient_attention` now expects the `attn_bias` argument to have a head dimension
- `memory_efficient_attention` no longer broadcasts the batch/head dimensions of `attn_bias`. Please use `.expand` if you need to broadcast the bias
- Remove `causal_diagonal` argument from `BlockDiagonalCausalWithOffsetPaddedKeysMask`
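A minimal sketch of adapting a dense bias to the new `attn_bias` contract (the shapes here are invented for illustration): the bias must now carry an explicit head dimension, and any batch/head broadcasting must be materialized with `.expand` before calling `memory_efficient_attention`:

```python
import torch

batch, heads, q_len, kv_len = 2, 8, 16, 16

# A bias computed per batch element, without a head dimension (old-style shape).
bias = torch.randn(batch, q_len, kv_len)

# The bias now needs a head dimension, and memory_efficient_attention no longer
# broadcasts batch/head dims implicitly: expand explicitly. expand returns a
# view over the same storage, so no data is copied.
attn_bias = bias.unsqueeze(1).expand(batch, heads, q_len, kv_len)

print(tuple(attn_bias.shape))  # (2, 8, 16, 16)
```

The resulting tensor is then passed as the `attn_bias` argument; since `.expand` is a view, the broadcast costs no extra memory.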
Added
- Binary wheels on pypi/conda now contain H100 kernels
- fMHA: Added backend specialized for decoding that does not use TensorCores - useful when not using multiquery
NOTE: Binary wheels are now provided only for PyTorch 2 with CUDA 11.8. It is still possible to use xFormers with older versions of PyTorch by building from source or using conda.