Implementing a decoder-only GPT-style transformer in C
Note: unfinished, still under development
The computational graph can also be plotted using Graphviz (since it all lives in the slots array).
The dataset is included in the repo itself.
Build and run: gcc gpt.c; ./a.out
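
Because the whole graph lives in a flat slots array, dumping it to Graphviz DOT format is just a loop over the slots. Below is a minimal sketch of the idea; the `Slot` struct, its fields, and `dump_dot` are hypothetical names for illustration and will differ from the actual layout in gpt.c.

```c
/* Sketch: dump a slot-based computational graph to Graphviz DOT.
 * The Slot struct and field names here are assumptions, not the gpt.c layout. */
#include <stdio.h>

typedef struct {
    const char *op;   /* e.g. "matmul", "add", "relu" */
    int inputs[2];    /* indices of parent slots, -1 if unused */
} Slot;

static void dump_dot(const Slot *slots, int n, FILE *out) {
    fprintf(out, "digraph G {\n");
    for (int i = 0; i < n; i++) {
        fprintf(out, "  n%d [label=\"%d: %s\"];\n", i, i, slots[i].op);
        for (int j = 0; j < 2; j++)
            if (slots[i].inputs[j] >= 0)
                fprintf(out, "  n%d -> n%d;\n", slots[i].inputs[j], i);
    }
    fprintf(out, "}\n");
}

int main(void) {
    /* tiny example graph: out = relu(input @ weight) */
    Slot slots[] = {
        {"input",  {-1, -1}},
        {"weight", {-1, -1}},
        {"matmul", { 0,  1}},
        {"relu",   { 2, -1}},
    };
    dump_dot(slots, (int)(sizeof slots / sizeof slots[0]), stdout);
    /* render with: ./a.out > graph.dot && dot -Tpng graph.dot -o graph.png */
    return 0;
}
```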
- Currently it's very slow; the codebase needs to be updated with CUDA. The last training run is in assets/train.log
- Loss graph visualisation: loss curve plot
- Implement matrix operations
- Build a basic feed-forward neural network
- Develop backpropagation
- Gradient descent
- Implement ReLU and Softmax
- Loss function MSE
- XOR Test
- Add memory management (object tracking, cleanup; slot system where objects occupy limited slots)
- Construct forward and backward pass logic
- MNIST Test
- Implement Batching (major speedups)
- Implemented GELU and Leaky ReLU (all done as part of testing; see the activation/loss sketch after this list)
- Implement iterative stack-based backward pass (didn't provide much benefit, so removed)
- Test the MLP with character prediction (issue encountered: network stability)
- Tinystories Test
- Implement n-dimensional tensors
- Implement Self-Attention Mechanism
- Build a tokenization system (BPE)
- Stack Transformer blocks (works by repetition of layers)
- Multi-Head Attention
- Positional encoding
- Learnable embeddings (one-hot vector × embedding matrix = embedding lookup; see the embedding sketch after this list)
- Adam optimizer (see the Adam sketch after this list)
- add dropout
- LEARN CUDA
- add back seq_len param to attention and ffn
- Add causal masking in attention (see the attention sketch after this list)
- Residual connections
- Layer norms (see the layer-norm sketch after this list)
- handle resume/restart of training
- allow inference from saved file
- The build-model function is messy and can be simplified with a matrix abstraction; otherwise the rest of the features would be hard to implement correctly. This is also a good point to learn CUDA and implement the matmuls.
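
A few of the building blocks ticked off above (ReLU, Leaky ReLU, the tanh approximation of GELU, softmax, MSE) are small enough to sketch standalone. This is an illustrative version, not the exact code in gpt.c; compile with -lm.

```c
/* Standalone sketch of common activations and the MSE loss. Illustrative only. */
#include <math.h>
#include <stdio.h>

static float relu(float x)       { return x > 0.0f ? x : 0.0f; }
static float leaky_relu(float x) { return x > 0.0f ? x : 0.01f * x; }

/* GELU, tanh approximation: 0.5*x*(1 + tanh(sqrt(2/pi)*(x + 0.044715*x^3))) */
static float gelu(float x) {
    const float c = 0.7978845608f; /* sqrt(2/pi) */
    return 0.5f * x * (1.0f + tanhf(c * (x + 0.044715f * x * x * x)));
}

/* Numerically stable softmax over a vector of logits. */
static void softmax(const float *z, float *p, int n) {
    float maxz = z[0], sum = 0.0f;
    for (int i = 1; i < n; i++) if (z[i] > maxz) maxz = z[i];
    for (int i = 0; i < n; i++) { p[i] = expf(z[i] - maxz); sum += p[i]; }
    for (int i = 0; i < n; i++) p[i] /= sum;
}

/* Mean squared error and its gradient w.r.t. the predictions. */
static float mse(const float *pred, const float *target, float *grad, int n) {
    float loss = 0.0f;
    for (int i = 0; i < n; i++) {
        float d = pred[i] - target[i];
        loss += d * d;
        grad[i] = 2.0f * d / (float)n;
    }
    return loss / (float)n;
}

int main(void) {
    float logits[3] = {1.0f, 2.0f, 0.5f}, probs[3], grad[3];
    float pred[3] = {0.2f, -1.0f, 0.7f}, target[3] = {0.0f, -1.0f, 1.0f};
    softmax(logits, probs, 3);
    printf("relu(-2)=%.2f leaky(-2)=%.3f gelu(1)=%.3f\n",
           relu(-2.0f), leaky_relu(-2.0f), gelu(1.0f));
    printf("softmax[1]=%.3f mse=%.3f dpred0=%.3f\n",
           probs[1], mse(pred, target, grad, 3), grad[0]);
    return 0;
}
```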
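
On the embedding item: multiplying a one-hot row vector by the embedding matrix is the same as picking out one row, so the lookup can be written directly, with a learned positional embedding added per position. A sketch under assumed shapes and names, not the actual gpt.c layout:

```c
/* Token embedding as a row lookup (equivalent to one-hot x matrix), plus a
 * learned positional embedding added per position. Names/shapes are assumptions. */
#include <stdio.h>

/* tok_emb: [vocab x d], pos_emb: [max_seq x d], out: [seq_len x d] */
static void embed(const int *tokens, int seq_len, int d,
                  const float *tok_emb, const float *pos_emb, float *out) {
    for (int t = 0; t < seq_len; t++)
        for (int i = 0; i < d; i++)
            out[t*d + i] = tok_emb[tokens[t]*d + i] + pos_emb[t*d + i];
}

int main(void) {
    /* vocab = 3, d = 2, seq_len = 2 */
    float tok_emb[6] = {0.1f, 0.2f,  0.3f, 0.4f,  0.5f, 0.6f};
    float pos_emb[4] = {0.01f, 0.02f, 0.03f, 0.04f};
    int tokens[2] = {2, 0};
    float out[4];
    embed(tokens, 2, 2, tok_emb, pos_emb, out);
    printf("%.2f %.2f | %.2f %.2f\n", out[0], out[1], out[2], out[3]);
    return 0;
}
```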
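
The attention and masking items combine into scaled dot-product attention where each position only attends to itself and earlier positions. Below is a single-head, unbatched sketch with illustrative shapes; the multi-head, batched version in the repo is necessarily larger. Compile with -lm.

```c
/* Single-head scaled dot-product attention with a causal mask over
 * row-major [seq_len x d] Q, K, V. Illustrative sketch only. */
#include <math.h>
#include <stdio.h>

static void causal_attention(const float *Q, const float *K, const float *V,
                             float *out, int seq_len, int d) {
    float scale = 1.0f / sqrtf((float)d);
    float scores[seq_len];                      /* C99 VLA; row i uses entries 0..i */
    for (int i = 0; i < seq_len; i++) {
        float maxs = -1e30f, sum = 0.0f;
        /* scores of query i against keys 0..i; future positions are masked out */
        for (int j = 0; j <= i; j++) {
            float s = 0.0f;
            for (int k = 0; k < d; k++) s += Q[i*d + k] * K[j*d + k];
            scores[j] = s * scale;
            if (scores[j] > maxs) maxs = scores[j];
        }
        /* softmax over the unmasked scores */
        for (int j = 0; j <= i; j++) { scores[j] = expf(scores[j] - maxs); sum += scores[j]; }
        /* weighted sum of the value vectors */
        for (int k = 0; k < d; k++) {
            float acc = 0.0f;
            for (int j = 0; j <= i; j++) acc += (scores[j] / sum) * V[j*d + k];
            out[i*d + k] = acc;
        }
    }
}

int main(void) {
    float Q[4] = {1, 0, 0, 1}, K[4] = {1, 0, 0, 1}, V[4] = {1, 2, 3, 4}, out[4];
    causal_attention(Q, K, V, out, 2, 2);
    printf("row0: %.3f %.3f  row1: %.3f %.3f\n", out[0], out[1], out[2], out[3]);
    return 0;
}
```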
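
The Adam optimizer item boils down to keeping two moment buffers per parameter tensor and applying bias-corrected updates. A sketch with the usual default hyperparameters; the buffer names and calling convention are assumptions, not the gpt.c API.

```c
/* Adam update for one parameter tensor. m and v are persistent, zero-initialised
 * moment buffers; t is the 1-based step count. Sketch only. */
#include <math.h>
#include <stdio.h>

static void adam_step(float *w, const float *grad, float *m, float *v,
                      int n, int t, float lr) {
    const float b1 = 0.9f, b2 = 0.999f, eps = 1e-8f;
    float c1 = 1.0f - powf(b1, (float)t);   /* bias corrections */
    float c2 = 1.0f - powf(b2, (float)t);
    for (int i = 0; i < n; i++) {
        m[i] = b1 * m[i] + (1.0f - b1) * grad[i];
        v[i] = b2 * v[i] + (1.0f - b2) * grad[i] * grad[i];
        w[i] -= lr * (m[i] / c1) / (sqrtf(v[i] / c2) + eps);
    }
}

int main(void) {
    float w[2] = {1.0f, -1.0f}, g[2] = {0.5f, -0.25f}, m[2] = {0}, v[2] = {0};
    adam_step(w, g, m, v, 2, 1, 1e-3f);
    printf("w = %.6f %.6f\n", w[0], w[1]);
    return 0;
}
```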
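
LayerNorm normalises each row of an activation matrix to zero mean and unit variance, then applies a learnable gain and bias. A minimal sketch, assuming row-major [rows x d] storage:

```c
/* LayerNorm over the last dimension of a [rows x d] activation. Sketch only. */
#include <math.h>
#include <stdio.h>

static void layernorm(const float *x, float *y, const float *gamma,
                      const float *beta, int rows, int d) {
    const float eps = 1e-5f;
    for (int r = 0; r < rows; r++) {
        const float *row = x + r * d;
        float mean = 0.0f, var = 0.0f;
        for (int i = 0; i < d; i++) mean += row[i];
        mean /= d;
        for (int i = 0; i < d; i++) { float c = row[i] - mean; var += c * c; }
        var /= d;
        float inv = 1.0f / sqrtf(var + eps);
        for (int i = 0; i < d; i++)
            y[r*d + i] = gamma[i] * (row[i] - mean) * inv + beta[i];
    }
}

int main(void) {
    float x[4] = {1, 2, 3, 4}, y[4], g[2] = {1, 1}, b[2] = {0, 0};
    layernorm(x, y, g, b, 2, 2);
    printf("%.3f %.3f | %.3f %.3f\n", y[0], y[1], y[2], y[3]);
    return 0;
}
```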
Issues encountered during development:
- Dropout at 0 is not behaving correctly, which means something is wrong in its implementation
- Too much object reallocation; the design needs to change
- Gradients are not converging properly
- MNIST test failed because of memory leaks
- Slow network convergence for a large MLP
- Network facing a vanishing-gradient issue
- Vanishing gradients after adding attention
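
On the first issue above: with inverted dropout, surviving activations are scaled by 1/(1-p) at train time, so p = 0 must degenerate to the identity and inference needs no rescaling. A reference sketch of that behaviour (not the gpt.c code) that may help while debugging:

```c
/* Inverted dropout sketch: p == 0 is a no-op, survivors are scaled by 1/(1-p). */
#include <stdio.h>
#include <stdlib.h>

static void dropout(float *x, int n, float p, int training) {
    if (!training || p <= 0.0f) return;        /* p == 0 must leave x untouched */
    float scale = 1.0f / (1.0f - p);
    for (int i = 0; i < n; i++) {
        float u = (float)rand() / (float)RAND_MAX;   /* unseeded here for brevity */
        x[i] = (u < p) ? 0.0f : x[i] * scale;
    }
}

int main(void) {
    float a[4] = {1, 2, 3, 4};
    dropout(a, 4, 0.0f, 1);                    /* identity */
    printf("%.1f %.1f %.1f %.1f\n", a[0], a[1], a[2], a[3]);
    dropout(a, 4, 0.5f, 1);                    /* randomly zeroes and rescales */
    printf("%.1f %.1f %.1f %.1f\n", a[0], a[1], a[2], a[3]);
    return 0;
}
```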