gpt.c

Implementing a decoder-only, GPT-style transformer in C

Note: unfinished; under active development

Demo

The computational graph can also be plotted with Graphviz, since the whole graph lives in the slots array.
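A minimal sketch of such an exporter, assuming a hypothetical `Slot` struct with an op label and input indices (the repo's actual slot layout may differ):

```c
#include <stdio.h>

/* Hypothetical slot layout -- the real struct in gpt.c may differ. */
typedef struct {
    const char *op;   /* e.g. "matmul", "gelu", "add" */
    int inputs[2];    /* slot indices of the operands, -1 if unused */
} Slot;

/* Emit the graph in Graphviz DOT format: one node per slot,
   one edge per input link. */
void dump_dot(const Slot *slots, int n_slots, FILE *out) {
    fprintf(out, "digraph G {\n");
    for (int i = 0; i < n_slots; i++) {
        fprintf(out, "  n%d [label=\"%s\"];\n", i, slots[i].op);
        for (int j = 0; j < 2; j++)
            if (slots[i].inputs[j] >= 0)
                fprintf(out, "  n%d -> n%d;\n", slots[i].inputs[j], i);
    }
    fprintf(out, "}\n");
}
```

The resulting file renders with `dot -Tpng graph.dot -o graph.png`.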

How to Run

The dataset is included in the repo itself.

```sh
gcc gpt.c; ./a.out
```

(On Linux you may need to append `-lm` so the math library gets linked.)

Development

  • Currently it's very slow; the codebase needs a CUDA update. The last training run is in assets/train.log
  • Loss graph visualisation (loss plot image)

Tasks:

  • Implement matrix operations
  • Build a basic feed-forward neural network
  • Develop backpropagation
  • Gradient descent
  • Implement ReLU and Softmax
  • MSE loss function
  • XOR Test
  • Add memory management (object tracking, cleanup) via a slot system in which objects occupy a limited pool of slots; see the sketch after this list
  • Construct forward and backward pass logic
  • MNIST Test
  • Implement Batching (major speedups)
  • Implement GELU and Leaky ReLU (both done as part of testing)
  • Implement iterative, stack-based backward pass (brought little benefit, so removed)
  • Test the MLP with character prediction (issue encountered: network stability)
  • Tinystories Test
  • Implement n dimensional tensors
  • Implement Self-Attention Mechanism
  • Build a tokenization system (BPE)
  • Stack Transformer blocks (works by repetition of layers)
  • Multi-Head Attention
  • Positional encoding (see the sinusoidal sketch after this list)
  • Learnable embeddings (multiplying a one-hot input matrix X by the weight matrix is equivalent to an embedding lookup)
  • Adam optimizer
  • Add dropout
  • Learn CUDA
  • Add back the seq_len param to attention and the FFN
  • Add causal masking in attention (see the sketch after this list)
  • Residual connections
  • LayerNorms
  • Handle resume/restart of training
  • Allow inference from a saved file
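A minimal sketch of the slot system mentioned in the list above (all names here are assumptions, not the repo's actual API): objects live in a fixed-size pool, so cleanup is a single sweep over the array instead of per-object bookkeeping.

```c
#include <stdlib.h>

#define MAX_SLOTS 4096  /* assumed pool size */

typedef struct {
    void *data;   /* owned allocation, NULL while the slot is free */
    int in_use;
} ObjSlot;

static ObjSlot pool[MAX_SLOTS];

/* Claim a free slot for a new object; returns its index, or -1 if the
   pool is exhausted or the allocation fails. */
int slot_alloc(size_t nbytes) {
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (!pool[i].in_use) {
            pool[i].data = malloc(nbytes);
            if (!pool[i].data) return -1;
            pool[i].in_use = 1;
            return i;
        }
    }
    return -1;
}

/* Release every live object in one pass, e.g. between training steps. */
void slots_cleanup(void) {
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (pool[i].in_use) {
            free(pool[i].data);
            pool[i].data = NULL;
            pool[i].in_use = 0;
        }
    }
}
```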
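For positional encoding, the classic sinusoidal form is sketched below for reference; the repo may use learned positions instead, so treat this as illustrative only.

```c
#include <math.h>

/* Fill pe (seq_len x d_model, row-major) with sinusoidal encodings:
   PE(pos, 2i)   = sin(pos / 10000^(2i/d_model))
   PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)) */
void sinusoidal_pe(float *pe, int seq_len, int d_model) {
    for (int pos = 0; pos < seq_len; pos++) {
        for (int i = 0; i < d_model; i += 2) {
            double freq = pow(10000.0, -(double)i / d_model);
            pe[pos * d_model + i] = (float)sin(pos * freq);
            if (i + 1 < d_model)
                pe[pos * d_model + i + 1] = (float)cos(pos * freq);
        }
    }
}
```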
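And the causal mask in its usual form (names illustrative, not the repo's): before the softmax, every score whose key position lies ahead of its query position is set to -INFINITY so it receives zero probability.

```c
#include <math.h>

/* scores: seq_len x seq_len attention logits, row-major
   (row = query position, column = key position). */
void apply_causal_mask(float *scores, int seq_len) {
    for (int q = 0; q < seq_len; q++)
        for (int k = q + 1; k < seq_len; k++)
            scores[q * seq_len + k] = -INFINITY;  /* mask future keys */
}
```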

Issues encountered:

  • The build-model function is messy and can be simplified with a matrix abstraction; without it, the remaining features would be hard to implement correctly. It's also a good point to learn CUDA and implement the matmuls there
  • Dropout at rate 0 is not behaving like the identity, which means something is wrong in its implementation (see the sketch after this list)
  • Too much object reallocation; the design needs to change
  • Gradients are not converging properly
  • MNIST Test failed because of memory leaks.
  • Slow network convergence for large MLP
  • Network facing a vanishing-gradient issue
  • Vanishing gradients after adding attention
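For the dropout issue above, a minimal inverted-dropout sketch (assumed interface, not the repo's code) shows the property that should hold: at rate p = 0 the function is the identity, which is exactly what the issue reports as broken.

```c
#include <stdlib.h>

/* Inverted dropout: zero each activation with probability p and scale
   the survivors by 1/(1-p) so the expected activation is unchanged.
   At p == 0 nothing is dropped and x must pass through untouched. */
void dropout_forward(float *x, int n, float p) {
    if (p <= 0.0f) return;  /* identity at rate 0 */
    float scale = 1.0f / (1.0f - p);
    for (int i = 0; i < n; i++) {
        if ((float)rand() / RAND_MAX < p)
            x[i] = 0.0f;
        else
            x[i] *= scale;
    }
}
```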
