AdityaNG/kan-gpt

Fork: 54 Star: 705 (updated 2024-12-01 09:36:31)

license: MIT

Language: Python

The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling

Latest release: 1.2.0 (2024-09-10 02:48:34)

KAN-GPT

Install it from PyPI

pip install kan_gpt

Citation

If you find our work useful, please cite us!

@misc{GANESH2024KANGPT,
  author       = {Aditya Nalgunda Ganesh},
  title        = {KAN-GPT: The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling},
  year         = {2024},
  month        = {May},
  note         = {Release 1.0.0, 9th May 2024},
  url          = {https://github.com/AdityaNG/kan-gpt/}
}

Usage

Refer to KAN_GPT.ipynb and kan_gpt/prompt.py for usage examples. The following outlines how to use the model:

import torch
from kan_gpt.model import GPT
from transformers import GPT2Tokenizer

# Configure a GPT-2 sized model
model_config = GPT.get_default_config()
model_config.model_type = "gpt2"
model_config.vocab_size = 50257
model_config.block_size = 1024
model = GPT(model_config)

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

prompt = "Bangalore is often described as the "

# Encode the prompt into GPT-2 token ids
prompt_encoded = tokenizer.encode(
  text=prompt, add_special_tokens=False
)

# Add a batch dimension: shape (1, num_tokens)
x = torch.tensor(prompt_encoded).unsqueeze(0)

model.eval()
y = model.generate(x, 50)  # sample 50 tokens

# Decode the first (and only) sequence in the batch
result = tokenizer.decode(y[0])

print(result)

# Bangalore is often described as the Silicon Valley of India.
# The city has witnessed rapid growth in the past two decades.....
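
For readers wondering what a call such as model.generate(x, 50) does, here is a minimal, dependency-free sketch of an autoregressive decoding loop. It assumes the usual minGPT-style behaviour (crop the context to block_size, predict the next token, append it, repeat); it is not taken from the kan_gpt source. The names generate, toy_logits and VOCAB_SIZE are hypothetical and exist only for this illustration.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(
    next_token_logits: Callable[[List[int]], List[float]],
    tokens: List[int],
    max_new_tokens: int,
    block_size: int,
    do_sample: bool = False,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Append max_new_tokens token ids to tokens, one at a time."""
    rng = rng or random.Random(0)
    out = list(tokens)
    for _ in range(max_new_tokens):
        # The model only sees the most recent block_size tokens.
        context = out[-block_size:]
        probs = softmax(next_token_logits(context))
        if do_sample:
            # Draw the next token from the predicted distribution.
            next_id = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        else:
            # Greedy decoding: take the most likely token.
            next_id = max(range(len(probs)), key=probs.__getitem__)
        out.append(next_id)
    return out


# Hypothetical stand-in for a trained model (assumption for this sketch):
# it always favours the token id (last token + 1) modulo the vocabulary size.
VOCAB_SIZE = 8


def toy_logits(context: List[int]) -> List[float]:
    target = (context[-1] + 1) % VOCAB_SIZE
    return [5.0 if i == target else 0.0 for i in range(VOCAB_SIZE)]


print(generate(toy_logits, [3], max_new_tokens=5, block_size=4))
# prints [3, 4, 5, 6, 7, 0]
```

With a real model, next_token_logits would be a forward pass returning the logits for the last position, and the token ids would be decoded back to text by the tokenizer.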

Setup for Development

# Download Repo
git clone https://github.com/AdityaNG/kan-gpt
cd kan-gpt
git pull

# Download Dataset
python3 -m kan_gpt.download_dataset --dataset tinyshakespeare
python3 -m kan_gpt.download_dataset --dataset mnist
python3 -m kan_gpt.download_dataset --dataset webtext

# Install dependencies for development
pip install -r requirements.txt
pip install -e .

Train

Use the following dummy runs to make sure everything is working as expected:

WANDB_MODE=offline CUDA_VISIBLE_DEVICES="" python3 -m kan_gpt.train --architecture MLP --batch_size 1 --dummy_dataset --device cpu --max_iters 200
WANDB_MODE=offline CUDA_VISIBLE_DEVICES="" python3 -m kan_gpt.train --architecture KAN --batch_size 1 --dummy_dataset --device cpu --max_iters 200
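
If you prefer to drive these checks from Python (for example in a CI job), the sketch below builds the same command lines and environment. The helper names smoke_test_command and smoke_test_env are hypothetical; the flags are the ones shown above. The environment uses CUDA_VISIBLE_DEVICES (plural), which is the variable name the CUDA runtime reads. Nothing is launched unless you uncomment the subprocess call, which requires kan_gpt to be installed.

```python
import os
import subprocess  # only needed if you uncomment the launch line below
import sys
from typing import Dict, List


def smoke_test_command(architecture: str, max_iters: int = 200) -> List[str]:
    """Build the argv for a CPU-only dummy training run."""
    return [
        sys.executable, "-m", "kan_gpt.train",
        "--architecture", architecture,
        "--batch_size", "1",
        "--dummy_dataset",
        "--device", "cpu",
        "--max_iters", str(max_iters),
    ]


def smoke_test_env() -> Dict[str, str]:
    """Copy the current environment, disable W&B syncing and hide all GPUs."""
    env = dict(os.environ)
    env["WANDB_MODE"] = "offline"
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


for arch in ("MLP", "KAN"):
    # Print the module invocation without the interpreter path.
    print(" ".join(smoke_test_command(arch)[1:]))
    # To actually run the check:
    # subprocess.run(smoke_test_command(arch), env=smoke_test_env(), check=True)
```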

Then run the training script:

python -m kan_gpt.train

Prompt

You can prompt the model to produce text as follows:

python -m kan_gpt.prompt --prompt "Bangalore is often described as the " --model_path (checkpoint)

Results

We trained KAN-GPT and an equivalent MLP-GPT model on the Tiny Shakespeare dataset and compared them. In this comparison, KAN-GPT performs slightly better than MLP-GPT. Further experiments are planned to investigate this in more depth. The results are shown below:

[Figure: Metrics]

TODOs

Integrate minGPT and pykan
Dataset downloading script for WebText
PyTorch Dataset parser for WebText
PyTorch Dataset parser for tinyshakespeare
Mini training POC for KAN-GPT
- Integrate KAN training logic from KAN.train_kan
- Train a dummy batch w/o any memory issues
Mini training POC for MLP-GPT
Train MLP-GPT on the webtext dataset as a baseline
Train KAN-GPT on the webtext dataset as a baseline
Metrics comparing KAN-GPT and MLP-GPT
Auto Save checkpoints
Auto Save checkpoints to W&B
Auto Download model weights from git / huggingface
W&B hyperparam sweep script
Script to load checkpoint in interactive mode
Reduce requirements.txt constraints
Define pydantic model for training and sweep args
Pruning the package, get rid of unused code
Training script to PyTorch Lightning
Documentation: mkdocs gh-deploy
Integrate with efficient-kan
Test Cases
- KAN: Forward-Backward test
- GPT: Forward-Backward test
- KAN_GPT: Forward-Backward test
- EFFICIENT_KAN: Forward-Backward test