GPT_Playground

This project implements a toy example of GPT-2 with additional bells and whistles such as Mixture-of-Experts and MAMBA blocks. It uses Karpathy's nanoGPT repository as its starting point.
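For orientation, the sketch below shows what a nanoGPT-style pre-norm transformer block looks like. It is a minimal illustration, not the exact code in this repo; the module and parameter names (n_embd, n_head) are assumptions chosen to match common nanoGPT conventions.

```python
import torch.nn as nn
import torch.nn.functional as F


class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)
        self.proj = nn.Linear(n_embd, n_embd)

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # reshape to (B, n_head, T, head_dim) for multi-head attention
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # causal mask: each position attends only to earlier positions
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)


class Block(nn.Module):
    """Pre-norm transformer block: attention sublayer followed by an MLP sublayer."""

    def __init__(self, n_embd: int, n_head: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))
        x = x + self.mlp(self.ln2(x))
        return x
```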

Training data

To train the GPT-variant language models, this repo uses the Harry Potter books as a corpus. The dataset is already preprocessed and can be found on Kaggle.
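A minimal sketch of how such a text corpus is typically turned into training tokens in the nanoGPT style is shown below. The file name harry_potter.txt and the output file names are hypothetical placeholders, not paths defined by this repo.

```python
import numpy as np
import tiktoken

# Hypothetical file name; point this at the preprocessed text dump from Kaggle.
with open("harry_potter.txt", "r", encoding="utf-8") as f:
    text = f.read()

enc = tiktoken.get_encoding("gpt2")       # GPT-2 BPE tokenizer
ids = enc.encode_ordinary(text)           # tokenize without special tokens

# 90/10 train/val split, stored as uint16 arrays (nanoGPT-style .bin files)
split = int(0.9 * len(ids))
np.array(ids[:split], dtype=np.uint16).tofile("train.bin")
np.array(ids[split:], dtype=np.uint16).tofile("val.bin")
```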

ToDo:
[ ] Implement MoE blocks to convert the standard GPT into a sparse MoE based language model (see the sketch after this list)
[ ] Implement a MAMBA block as an alternative to the regular transformer block
[ ] Implement an evaluation mechanism (perplexity)
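For the first ToDo item, a common design is to swap the transformer block's MLP for a set of expert MLPs behind a top-k router. The sketch below is one possible implementation under that assumption; the class name, expert count, and top_k default are illustrative, not the repo's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    """Drop-in replacement for the transformer MLP: a router picks the
    top-k expert MLPs per token and mixes their outputs."""

    def __init__(self, n_embd: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(n_embd, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(n_embd, 4 * n_embd),
                nn.GELU(),
                nn.Linear(4 * n_embd, n_embd),
            )
            for _ in range(num_experts)
        )

    def forward(self, x):
        B, T, C = x.shape
        flat = x.view(-1, C)                            # (B*T, C)
        logits = self.router(flat)                      # (B*T, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            # tokens for which expert e appears among the top-k choices
            token_mask, slot = (idx == e).nonzero(as_tuple=True)
            if token_mask.numel() == 0:
                continue
            out[token_mask] += weights[token_mask, slot].unsqueeze(-1) * expert(flat[token_mask])
        return out.view(B, T, C)
```

For the evaluation ToDo, the usual recipe is to report perplexity as the exponential of the mean token-level cross-entropy loss over a held-out split.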
