0.0.2

kyegomez/LongNet

Release date: 2023-07-06 22:09:52

Latest release of kyegomez/LongNet: 0.4.8 (2023-08-11 03:04:14)

Changelog

  1. Flash Multi-head Attention Integration

    • The code previously used torch.nn.MultiheadAttention; it now uses FlashMultiHeadAttention from the flash_attn library, which is expected to make the model's attention mechanism more efficient. A sketch of the swap is shown below.
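
    As an illustration, a swap of this kind might look like the following minimal sketch. It uses the functional flash_attn_func API from flash-attn 2.x as a stand-in, since the exact signature of the FlashMultiHeadAttention wrapper is not shown in this release:

    import torch
    from flash_attn import flash_attn_func  # assumes flash-attn 2.x is installed

    # flash-attn requires fp16/bf16 tensors on a CUDA device,
    # shaped (batch, seqlen, num_heads, head_dim)
    q = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
    k = torch.randn_like(q)
    v = torch.randn_like(q)

    # Fused attention kernel in place of torch.nn.MultiheadAttention
    out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True)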
  2. GPU Support

    • All computations now run on a GPU device specified once at the beginning of the script, which substantially speeds up training and inference compared with running on the CPU. See the sketch below.
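
    Device selection of this kind is typically done once at the top of the script, along the lines of this sketch (variable names here are illustrative, not the project's):

    import torch
    import torch.nn as nn

    # Choose the GPU once and reuse the handle everywhere
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    # The module and its inputs must live on the same device
    layer = nn.MultiheadAttention(embed_dim=512, num_heads=8).to(device)
    x = torch.randn(1024, 2, 512, device=device)  # (seq, batch, embed)
    out, _ = layer(x, x, x)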
  3. Use of 16-bit Floating Point Precision

    • The data type for computations was changed from the default 32-bit floating point to 16-bit floating point (torch.float16), reducing memory usage and speeding up computation on modern GPUs.
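
    In practice the switch amounts to creating (or casting) tensors and modules as torch.float16; a minimal sketch:

    import torch

    # float16 halves memory per element (2 bytes instead of 4) and
    # enables faster tensor-core math on modern CUDA GPUs
    x32 = torch.randn(2, 1024, 512)                # default torch.float32
    x16 = x32.to(torch.float16)                    # equivalently: x32.half()
    print(x32.element_size(), x16.element_size())  # prints: 4 2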
  4. Added Dropout

    • A dropout layer has been added after the attention operation in the DilatedAttention class. Dropout is a regularization technique that helps prevent overfitting by randomly zeroing a fraction of the input units at each update during training.

    The changes were made in the following lines of code:

    # Initialize dropout layer in the constructor
    self.dropout = nn.Dropout(dropout)
    
    # Apply dropout after performing attention in the forward function
    attn_output = self.dropout(attn_output)
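
    For context, the two lines above sit roughly as follows inside a module (a hypothetical stand-in, not the project's actual DilatedAttention):

    import torch
    import torch.nn as nn

    class AttentionWithDropout(nn.Module):  # hypothetical stand-in module
        def __init__(self, embed_dim, num_heads, dropout=0.1):
            super().__init__()
            self.attn = nn.MultiheadAttention(embed_dim, num_heads)
            # Initialize dropout layer in the constructor
            self.dropout = nn.Dropout(dropout)

        def forward(self, x):
            attn_output, _ = self.attn(x, x, x)
            # Apply dropout after performing attention
            return self.dropout(attn_output)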
    
  5. Added Unit Tests and Benchmarks

    • Unit tests and benchmarking code have been added to verify the correctness of the DilatedAttention class and to measure its performance.
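
    As an example of the kind of check such tests perform, a minimal shape test might look like this (hypothetical; the release's actual test suite is not reproduced here):

    import unittest
    import torch
    import torch.nn as nn

    class TestAttentionOutputShape(unittest.TestCase):  # hypothetical test
        def test_output_matches_input_shape(self):
            attn = nn.MultiheadAttention(embed_dim=512, num_heads=8)
            x = torch.randn(128, 2, 512)  # (seq, batch, embed)
            out, _ = attn(x, x, x)
            self.assertEqual(out.shape, x.shape)

    if __name__ == "__main__":
        unittest.main()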
  6. Documentation and Example Updates

    • Updated the documentation and usage examples for the DilatedAttention class to reflect the above changes.
  7. Twitter Thread

    • Created a Twitter thread in the style of Richard Feynman to promote the project and its new updates.
