0.0.2
Release date: 2023-07-06 22:09:52
Latest release of kyegomez/LongNet: 0.4.8 (2023-08-11 03:04:14)
Changelog
- **Flash Multi-head Attention Integration**
  - Initially, the code used `torch.nn.MultiheadAttention`. It has been changed to use `FlashMultiHeadAttention` from the `flash_attn` library, which is expected to improve the efficiency of the model's attention mechanism.
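
  Conceptually, the change is a drop-in swap of the attention module, as in the sketch below. The `flash_attn` import path, the `FlashMHA` class name, and its constructor arguments are assumptions here (they vary between `flash_attn` releases); this is not the repository's exact code.

  ```python
  import torch
  import torch.nn as nn

  # Assumed import: the module path and class name differ across flash_attn
  # releases, so treat this line as illustrative.
  from flash_attn.flash_attention import FlashMHA


  class AttentionBlock(nn.Module):
      """Hypothetical block illustrating the swap; not LongNet's actual code."""

      def __init__(self, d_model: int = 512, num_heads: int = 8):
          super().__init__()
          # Before: standard PyTorch attention
          # self.attention = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
          # After: flash-attention-backed module (arguments are assumptions)
          self.attention = FlashMHA(
              embed_dim=d_model,
              num_heads=num_heads,
              device="cuda",
              dtype=torch.float16,
          )

      def forward(self, x: torch.Tensor) -> torch.Tensor:
          out = self.attention(x)
          # FlashMHA-style modules commonly return (output, attention_weights)
          return out[0] if isinstance(out, tuple) else out
  ```
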
- **GPU Support**
  - All computations are now performed on a GPU device specified at the beginning of the script. This significantly improves the model's speed and efficiency.
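
  The device handling follows the standard PyTorch pattern sketched below; the model and tensors are placeholders rather than the repository's code.

  ```python
  import torch
  import torch.nn as nn

  # Define the target device once at the top of the script; fall back to CPU
  # if no GPU is available.
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

  # Move the model and its inputs onto that device so every operation runs there.
  model = nn.Linear(512, 512).to(device)        # placeholder for the attention model
  x = torch.randn(2, 128, 512, device=device)   # example input batch
  y = model(x)                                  # computed entirely on `device`
  ```
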
- **Use of 16-bit Floating Point Precision**
  - The data type for computations was changed from the default 32-bit floating point to 16-bit floating point (`torch.float16`). This reduces memory usage and improves the speed of computations on modern GPUs.
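
  A minimal sketch of the precision change, assuming a CUDA device is available; the tensor shapes are illustrative.

  ```python
  import torch

  # PyTorch's default dtype is float32 (32-bit); float16 halves the memory per
  # element and enables faster tensor-core kernels on modern GPUs.
  q = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)
  k = torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.float16)

  scores = torch.matmul(q, k.transpose(-2, -1)) / (64 ** 0.5)
  print(scores.dtype)  # torch.float16
  ```
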
- **Added Dropout**
  - A dropout layer has been added after the attention operation in the `DilatedAttention` class. Dropout is a regularization technique that prevents overfitting by randomly setting a fraction of input units to 0 at each update during training. The changes were made in the following lines of code:

  ```python
  # Initialize the dropout layer in the constructor
  self.dropout = nn.Dropout(dropout)

  # Apply dropout after performing attention in the forward function
  attn_output = self.dropout(attn_output)
  ```
- **Added Unit Tests and Benchmarks**
  - Unit tests and benchmarking code have been added to ensure the correctness and efficiency of the `DilatedAttention` class.
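
  A sketch of what such a test and benchmark could look like; the import path and the `DilatedAttention` constructor arguments are assumptions about the API rather than the repository's actual test code.

  ```python
  import time

  import torch

  # Hypothetical import path; the real module layout may differ.
  from long_net.attention import DilatedAttention


  def test_output_shape():
      # Correctness: attention should preserve the (batch, seq_len, dim) shape.
      attn = DilatedAttention(d_model=512, num_heads=8).cuda().half()
      x = torch.randn(2, 1024, 512, device="cuda", dtype=torch.float16)
      out = attn(x)
      assert out.shape == x.shape


  def benchmark(seq_len: int = 8192, runs: int = 10) -> float:
      # Efficiency: average wall-clock seconds per forward pass at a given length.
      attn = DilatedAttention(d_model=512, num_heads=8).cuda().half()
      x = torch.randn(1, seq_len, 512, device="cuda", dtype=torch.float16)
      torch.cuda.synchronize()
      start = time.time()
      for _ in range(runs):
          attn(x)
      torch.cuda.synchronize()
      return (time.time() - start) / runs
  ```
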
- **Documentation and Example Updates**
  - Updated the documentation and usage examples for the `DilatedAttention` class to reflect the changes above.
- **Twitter Thread**
  - Created a Twitter thread in the style of Richard Feynman to promote the project and its new updates.