0.0.2
Release date: 2023-07-06 22:09:52
Latest release of kyegomez/LongNet: 0.4.8 (2023-08-11 03:04:14)
Changelog
- **Flash Multi-head Attention Integration**
  - Initially, the code used `torch.nn.MultiheadAttention`. It has been changed to use `FlashMultiHeadAttention` from the `flash_attn` library, which is expected to improve the efficiency of the model's attention mechanism.
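
  Conceptually, the change is a drop-in swap of the attention module, as in the sketch below. The `flash_attn` import path, the `FlashMHA` class name, and its constructor arguments are assumptions here (they vary between `flash_attn` releases); this is not the repository's exact code.

  ```python
  import torch
  import torch.nn as nn

  # Assumed import: the module path and class name differ across flash_attn
  # releases, so treat this line as illustrative.
  from flash_attn.flash_attention import FlashMHA


  class AttentionBlock(nn.Module):
      """Hypothetical block illustrating the swap; not LongNet's actual code."""

      def __init__(self, d_model: int = 512, num_heads: int = 8):
          super().__init__()
          # Before: standard PyTorch attention
          # self.attention = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
          # After: flash-attention-backed module (arguments are assumptions)
          self.attention = FlashMHA(
              embed_dim=d_model,
              num_heads=num_heads,
              device="cuda",
              dtype=torch.float16,
          )

      def forward(self, x: torch.Tensor) -> torch.Tensor:
          out = self.attention(x)
          # FlashMHA-style modules commonly return (output, attention_weights)
          return out[0] if isinstance(out, tuple) else out
  ```
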
- **GPU Support**
  - All computations are now performed on a GPU device specified at the beginning of the script. This significantly improves the model's speed and efficiency.
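
  The device handling follows the standard PyTorch pattern sketched below; the model and tensors are placeholders rather than the repository's code.

  ```python
  import torch
  import torch.nn as nn

  # Define the target device once at the top of the script; fall back to CPU
  # if no GPU is available.
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

  # Move the model and its inputs onto that device so every operation runs there.
  model = nn.Linear(512, 512).to(device)        # placeholder for the attention model
  x = torch.randn(2, 128, 512, device=device)   # example input batch
  y = model(x)                                  # computed entirely on `device`
  ```
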
- **Use of 16-bit Floating Point Precision**
  - The data type for computations was changed from the default 32-bit floating point to 16-bit floating point (`torch.float16`). This reduces memory usage and improves the speed of computations on modern GPUs.
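
  A minimal sketch of the precision change, assuming a CUDA device is available; the tensor shapes are illustrative.

  ```python
  import torch

  # PyTorch's default dtype is float32 (32-bit); float16 halves the memory per
  # element and enables faster tensor-core kernels on modern GPUs.
  q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
  k = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)

  scores = torch.matmul(q, k.transpose(-2, -1)) / (64 ** 0.5)
  print(scores.dtype)  # torch.float16
  ```
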
- **Added Dropout**
  - A dropout layer has been added after the attention operation in the `DilatedAttention` class. Dropout is a regularization technique that prevents overfitting by randomly setting a fraction of input units to 0 at each update during training. The changes were made in the following lines of code:

  ```python
  # Initialize the dropout layer in the constructor
  self.dropout = nn.Dropout(dropout)

  # Apply dropout after performing attention in the forward function
  attn_output = self.dropout(attn_output)
  ```
- **Added Unit Tests and Benchmarks**
  - Unit tests and benchmarking code have been added to ensure the correctness and efficiency of the `DilatedAttention` class.
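
  A sketch of what such a test and benchmark could look like; the import path and the `DilatedAttention` constructor arguments are assumptions about the API rather than the repository's actual test code.

  ```python
  import time

  import torch

  # Hypothetical import path; the real module layout may differ.
  from long_net.attention import DilatedAttention


  def test_output_shape():
      # Correctness: attention should preserve the (batch, seq_len, dim) shape.
      attn = DilatedAttention(d_model=512, num_heads=8).cuda().half()
      x = torch.randn(2, 1024, 512, device="cuda", dtype=torch.float16)
      out = attn(x)
      assert out.shape == x.shape


  def benchmark(seq_len: int = 8192, runs: int = 10) -> float:
      # Efficiency: average wall-clock seconds per forward pass at a given length.
      attn = DilatedAttention(d_model=512, num_heads=8).cuda().half()
      x = torch.randn(1, seq_len, 512, device="cuda", dtype=torch.float16)
      torch.cuda.synchronize()
      start = time.time()
      for _ in range(runs):
          attn(x)
      torch.cuda.synchronize()
      return (time.time() - start) / runs
  ```
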
- **Documentation and Example Updates**
  - Updated the documentation and usage examples for the `DilatedAttention` class to reflect the changes above.
- **Twitter Thread**
  - Created a Twitter thread in the style of Richard Feynman to promote the project and its new updates.