I learned how to build a GPT architecture from Karpathy's video https://www.youtube.com/watch?v=kCc8FmEb1nY and wanted to apply what I learned to some small projects, so I decided to build a Lu Xun-style GPT. After searching the internet, I found that the existing AI applications related to Lu Xun are mostly based on prompt engineering or supervised fine-tuning. As a result, I built this repo and the Lu Xun-style GPT model.
In the first part, I describe my process of building LuXun-Style-GPT:
- I collected Lu Xun's widely circulated articles, reviews, and letters from GitHub. Thanks to the contributor who corrected all of Lu Xun's texts. The GitHub repo can be found at https://github.com/PzzAg6/Xun-Lu-s-article-collection. A sketch of how the corpus might be prepared is shown after this list.
- I built a decoder-only Transformer architecture, based on *Attention Is All You Need* and *Improving Language Understanding by Generative Pre-Training* (the GPT-1 paper); see the model sketch below.
- I tuned the hyperparameters and trained the model; a minimal training-loop sketch follows at the end of this list.
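
As a concrete illustration of the data step, here is a minimal sketch of how the corpus could be encoded, using the character-level tokenization from Karpathy's video. The file name `luxun_corpus.txt` and the 90/10 train/validation split are my own assumptions for the example, not details taken from this repo:

```python
import torch

# Assumed layout: the collected texts are concatenated into one UTF-8 file.
with open('luxun_corpus.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Character-level vocabulary: every distinct character in the corpus
# (Chinese characters, punctuation, whitespace) becomes one token.
chars = sorted(set(text))
vocab_size = len(chars)
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}

encode = lambda s: [stoi[c] for c in s]             # string -> token ids
decode = lambda ids: ''.join(itos[i] for i in ids)  # token ids -> string

# Encode the whole corpus and hold out the last 10% for validation.
data = torch.tensor(encode(text), dtype=torch.long)
n = int(0.9 * len(data))
train_data, val_data = data[:n], data[n:]
```

Character-level modeling keeps the pipeline simple and avoids a separate tokenizer; for a Chinese corpus the vocabulary stays manageable (typically a few thousand distinct characters).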
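For the architecture step, here is a compact sketch of a decoder-only Transformer in PyTorch, in the pre-norm style used in Karpathy's video. The layer sizes (`n_embd=384`, `n_head=6`, `n_layer=6`, `block_size=256`) are placeholder values, not the hyperparameters actually used for this model:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal mask, as in the GPT decoder."""
    def __init__(self, n_embd, n_head, block_size, dropout=0.1):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)
        self.proj = nn.Linear(n_embd, n_embd)
        self.dropout = nn.Dropout(dropout)
        # Lower-triangular mask so each position attends only to the past.
        self.register_buffer('mask', torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # Reshape to (B, n_head, T, head_dim) for per-head attention.
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:T, :T] == 0, float('-inf'))
        att = self.dropout(F.softmax(att, dim=-1))
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)

class Block(nn.Module):
    """One decoder block: attention then MLP, each with a residual connection."""
    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = CausalSelfAttention(n_embd, n_head, block_size)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))
        x = x + self.mlp(self.ln2(x))
        return x

class GPT(nn.Module):
    """Decoder-only Transformer: token + position embeddings, N blocks, LM head."""
    def __init__(self, vocab_size, n_embd=384, n_head=6, n_layer=6, block_size=256):
        super().__init__()
        self.block_size = block_size
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)
        self.blocks = nn.Sequential(*[Block(n_embd, n_head, block_size) for _ in range(n_layer)])
        self.ln_f = nn.LayerNorm(n_embd)
        self.head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx, targets=None):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        logits = self.head(self.ln_f(self.blocks(x)))
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.view(B * T, -1), targets.view(B * T))
        return logits, loss
```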
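Finally, for the training step, a minimal training-loop sketch that continues from the two snippets above. The batch size, learning rate, and iteration count are illustrative defaults in the spirit of the video, not the values I actually tuned:

```python
# Illustrative hyperparameters; the values tuned for this repo may differ.
batch_size, block_size = 64, 256
max_iters, lr = 5000, 3e-4
device = 'cuda' if torch.cuda.is_available() else 'cpu'

def get_batch(split):
    # Sample random contiguous chunks; targets are inputs shifted by one character.
    data = train_data if split == 'train' else val_data
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + block_size + 1] for i in ix])
    return x.to(device), y.to(device)

model = GPT(vocab_size, block_size=block_size).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

for it in range(max_iters):
    xb, yb = get_batch('train')
    _, loss = model(xb, yb)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    if it % 500 == 0:
        print(f"iter {it}: train loss {loss.item():.4f}")
```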