This is a reimplementation of GPT2 (small) inference from scratch with no runtime dependencies (seriously, check package.json). The implementation was designed as an educational exercise and is heavily based on the wonderful picoGPT repository.
Because there are no accelerated math libraries under the hood (read: there's no JavaScript equivalent of NumPy), this implementation is potato-slow: on the order of a couple of tokens per minute on a MacBook Air M2.
```
(base) ben@Bens-MacBook-Air transformers % yarn start
yarn run v1.22.19
warning package.json: No license field
$ ts-node main.ts
loading gpt
done
Block done
Chosen token: [ 'peanut butter and', ' jelly' ]
...
Block done
Chosen token: [
  'peanut butter and jelly.\n' +
    '\n' +
    "I'm not sure if I'm going to be able to get this recipe to work for me, but I'm going to try it",
  '.'
]
```
- Clone the repository (the weights are included, about 500 MB, so it may take a while)
- Run `node main.js` (no `yarn install` needed unless you want to edit the code)
The novel part of this codebase is that it demonstrates type-safe tensor operations. Said another way: you no longer have to run your code to track and verify the shape of your tensors!
In the screenshot above, I'm multiplying a 3x4 matrix by a 4x5 matrix. The type system concludes (correctly) that the output matrix is 3x5.
But what happens if I screw up and try to multiply two matrices that don't share an inner dimension?
As you can see, TypeScript detects this and reports that the types 4 and 5 differ. This way you catch the error the instant you make it.
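The screenshots aren't reproduced here, but the idea fits in a few lines. Below is a minimal, hypothetical sketch (the names `Tensor`, `tensor`, `matmul` and `NoInferDim` are illustrative, not PotatoGPT's actual API) in which a tensor carries its shape as a tuple type and `matmul` requires the inner dimensions to match. One assumption worth flagging: if `K` were inferred from both arguments, TypeScript may infer the union `4 | 5` and accept the mismatch, so this sketch infers `K` from the first argument only, using a common deferred-indexed-access trick (TypeScript 5.4+ also ships a built-in `NoInfer` for this purpose).

```typescript
// Hypothetical shape-typed tensor: the shape lives in the type parameter,
// the values in a flat row-major Float32Array.
type Tensor<Shape extends readonly number[]> = {
  shape: Shape;
  data: Float32Array;
};

// Blocks inference through this position (a widely used pre-5.4 trick), so a
// type parameter wrapped in it is inferred from other arguments only.
type NoInferDim<T> = [T][T extends unknown ? 0 : never];

function tensor<Shape extends readonly number[]>(shape: Shape, values?: number[]): Tensor<Shape> {
  const size = shape.reduce((acc, d) => acc * d, 1);
  const data = new Float32Array(size);
  if (values) data.set(values); // throws RangeError if values has more than `size` entries
  return { shape, data };
}

// (R x K) @ (K x C) -> (R x C). K is inferred from `a` alone; `b` must then match it.
function matmul<R extends number, K extends number, C extends number>(
  a: Tensor<readonly [R, K]>,
  b: Tensor<readonly [NoInferDim<K>, C]>,
): Tensor<readonly [R, C]> {
  const [rows, inner] = a.shape;
  const cols = b.shape[1];
  const out = new Float32Array(rows * cols);
  for (let i = 0; i < rows; i++) {
    for (let j = 0; j < cols; j++) {
      let sum = 0;
      for (let k = 0; k < inner; k++) sum += a.data[i * inner + k] * b.data[k * cols + j];
      out[i * cols + j] = sum;
    }
  }
  return { shape: [rows, cols] as const, data: out };
}

const a = tensor([3, 4] as const);
const b = tensor([4, 5] as const);
const c = matmul(a, b); // c: Tensor<readonly [3, 5]>

// const bad = matmul(a, tensor([5, 5] as const));
// ^ compile error: the second argument's shape starts with 5, but K was inferred as 4
```

The `as const` assertions matter: without them the shapes are inferred as plain `number[]`, and `matmul` would reject them because `number[]` is not a two-element tuple.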
Another challenge is that a tensor might be dynamically sized (e.g. its size depends on the length of the input sequence). Typing this dimension as `number` would defeat the type checking described above, because one `number` is indistinguishable from another `number`. The solution is to use branded types, like so:
In this example, the first dimension of `tensorA` is only known at runtime. But because we can tag it as `Var<'Sequence Length'>`, all future uses typecheck just as they did above: if you tried to intermingle differently branded numbers, TypeScript would yell.
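A minimal sketch of branded dimensions, for illustration only. `Var` mirrors the name used above, but its definition here (`number & { readonly __brand: Label }`) is an assumption, as are the hypothetical helpers `asVar`, `zeros` and `add`. The brand exists only at compile time, so two dimensions that are both 5 at runtime still cannot be mixed if their labels differ.

```typescript
// Hypothetical brand: a runtime number tagged with a compile-time label.
type Var<Label extends string> = number & { readonly __brand: Label };

// The label exists only in the type system; at runtime this is the same number.
function asVar<Label extends string>(n: number): Var<Label> {
  return n as Var<Label>;
}

type Tensor<Shape extends readonly number[]> = { shape: Shape; data: Float32Array };

function zeros<Shape extends readonly number[]>(shape: Shape): Tensor<Shape> {
  const size = shape.reduce((acc, d) => acc * d, 1);
  return { shape, data: new Float32Array(size) };
}

// Element-wise addition: both operands must have the same shape type.
function add<Shape extends readonly number[]>(a: Tensor<Shape>, b: Tensor<Shape>): Tensor<Shape> {
  const out = new Float32Array(a.data.length);
  for (let i = 0; i < out.length; i++) out[i] = a.data[i] + b.data[i];
  return { shape: a.shape, data: out };
}

// In a real program the length would come from runtime input (e.g. the prompt's token count).
const seqLen = asVar<"Sequence Length">(5);
const x = zeros([seqLen, 768] as const); // Tensor<readonly [Var<"Sequence Length">, 768]>
const y = zeros([seqLen, 768] as const);
const sum = add(x, y); // OK: identical shape types

const vocab = asVar<"Vocab Size">(5);
const z = zeros([vocab, 768] as const);
// add(x, z); // compile error: the labels differ, even though both dimensions are 5 at runtime
```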
Note that I've only implemented the bare minimum of tensor math needed to get GPT2 to work, a tiny fraction of what you would get out of something like NumPy. Notably, broadcasting is not implemented.
PotatoGPT uses fully type-safe tensors everywhere.
Many years ago, I wrote an autodifferentiating compiler in Clojure, so I have a reasonable idea of how training works. The primary point of this exercise was to prove to myself that I understand the architecture of GPTs.
The weights were generated using the Python notebook found in this repository. I spent a while trying to figure out whether I could read the raw TensorFlow checkpoints, but doing so without third-party libraries is tough. TensorFlow stores weights in a vaguely LevelDB-like SSTable format "for efficiency", but after reading a bunch of enterprise-grade Google C++, I decided to just write each tensor to disk in a structured format. This ended up being smaller than the "optimized" TensorFlow checkpoint format and way faster to load than anything else I tried ¯\_(ツ)_/¯
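The exact on-disk layout isn't described here, so the following is only a sketch under an assumed, hypothetical layout: each tensor stored as a flat run of little-endian float32 values, with its shape known from elsewhere. It shows how such a file could be decoded with nothing but built-ins (`DataView`, `Float32Array`), in keeping with the no-dependencies constraint. The function name `decodeTensor` and the example path are hypothetical.

```typescript
// Assumed (hypothetical) layout: a flat run of little-endian float32 values,
// with the tensor's shape supplied by the caller.
function decodeTensor(bytes: Uint8Array, shape: readonly number[]): Float32Array {
  const count = shape.reduce((acc, d) => acc * d, 1);
  if (bytes.byteLength !== count * 4) {
    throw new Error(`expected ${count * 4} bytes for shape [${shape.join(", ")}], got ${bytes.byteLength}`);
  }
  // DataView reads with an explicit byte order, independent of the host CPU.
  // Unlike a Float32Array view, it also accepts a byteOffset that isn't a multiple of 4.
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(count);
  for (let i = 0; i < count; i++) {
    out[i] = view.getFloat32(i * 4, true); // true = little-endian
  }
  return out;
}

// Example (Node.js, hypothetical path):
//   import { readFileSync } from "fs";
//   const wte = decodeTensor(readFileSync("weights/wte.bin"), [50257, 768]);
```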
I'm not sure! Python appears to have generic types, but without conditional types it is tricky to express any meaningful dynamic behavior at the type level. I would love to be proven wrong here.
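To make the point about conditional types concrete, here is a small TypeScript sketch of the kind of type-level computation they enable. `MatMulShape` is a hypothetical name for illustration, not part of PotatoGPT: it computes the result shape of a matrix multiplication, or `never` when the inner dimensions disagree.

```typescript
// A type-level function: given two 2-D shapes, compute the matmul result
// shape, or `never` when the inner dimensions disagree.
type MatMulShape<A extends readonly number[], B extends readonly number[]> =
  A extends readonly [infer R, infer K]
    ? B extends readonly [K, infer C]
      ? readonly [R, C]
      : never
    : never;

// Evaluated entirely at compile time:
type Ok = MatMulShape<readonly [3, 4], readonly [4, 5]>; // readonly [3, 5]
type Bad = MatMulShape<readonly [3, 4], readonly [5, 5]>; // never

const okShape: Ok = [3, 5];
// const badShape: Bad = [3, 5]; // compile error: nothing is assignable to never
```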