This repository contains an educational project aimed at introducing high school students 🎓 to the concept 💡 of neural networks through the use of "supercazzola" 😄 lines from Mario Monicelli's movies, "Amici Miei" (My Friends) 🎥.
The llama-supercazzola project offers students 🎓 a unique opportunity to explore the workings of neural networks and language models 🤖. Using a dataset containing lines of gibberish from the "Amici Miei" movies, the project allows students 🎓 to understand how a neural network can be trained to generate text in response to specific inputs. The project leverages the llama.cpp framework for both training and inference on llama models.
- Language model training and inference using the llama.cpp framework.
- Generation of short "supercazzola" responses to any question 🎉.
- Expansion of knowledge on neural networks and natural language 🌟🌟🌟🌟🌟.
- Convenient scripts for training and inference: training.sh and inference.sh.
- A C++ compiler and development environment
- The llama.cpp framework and its dependencies (included as a git submodule)
- Clone the repository, including its submodules, and enter the llama-supercazzola directory:
$ git clone --recurse-submodules git@github.com:buzzobuono/llama-supercazzola.git
$ cd llama-supercazzola
- Build the llama.cpp framework for the training and inference tasks:
$ cd llama.cpp
$ make
$ cd ..
- Execute training.sh 🚀 to train the model:
$ ./training.sh
Note: Please be aware that intermediate training checkpoints of the network might occupy a significant amount of disk space. It is advisable to monitor and manage the checkpoints regularly to avoid running out of available disk space. 💾💾💾
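As a minimal sketch of such housekeeping, assuming the checkpoints are written under models/ with the ggml-supercazzola-*.gguf naming used in the inference examples below, you could check their total size and prune all but the three most recent ones like this:
$ du -sh models/   # total disk usage of saved checkpoints
$ ls -t models/ggml-supercazzola-*.gguf | tail -n +4 | xargs -r rm --   # keep only the three newest checkpoints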
- Test your work by executing inference.sh 🚀 for an example "supercazzola" response:
$ ./inference.sh # inference on LATEST checkpoint
...
Antani un po' della sbrindolona come se fosse Antani come facevo?
Ho provato con
llama_print_timings: load time = 13,65 ms
llama_print_timings: sample time = 3,99 ms / 20 runs ( 0,20 ms per token, 5013,79 tokens per second)
llama_print_timings: prompt eval time = 7,19 ms / 7 tokens ( 1,03 ms per token, 973,44 tokens per second)
llama_print_timings: eval time = 79,87 ms / 19 runs ( 4,20 ms per token, 237,88 tokens per second)
llama_print_timings: total time = 98,41 ms / 26 tokens
Log end
or, for a custom prompt:
$ ./inference.sh "Mi scusi?! Ma lei a clacsonato?" # interence on LATEST checkpoint with custom prompt
...
Mi scusi?! Ma lei a clacsonato?
No, volevo dire dei tre telefoni qual è quello col prefisso
llama_print_timings: load time = 13,23 ms
llama_print_timings: sample time = 3,99 ms / 20 runs ( 0,20 ms per token, 5012,53 tokens per second)
llama_print_timings: prompt eval time = 8,52 ms / 15 tokens ( 0,57 ms per token, 1760,77 tokens per second)
llama_print_timings: eval time = 86,27 ms / 19 runs ( 4,54 ms per token, 220,24 tokens per second)
llama_print_timings: total time = 105,97 ms / 34 tokens
Log end
You can also try intermediate checkpoints to better understand how the network progressively improves its performance.
$ ./inference.sh "Mi scusi?! Ma lei a clacsonato?" models/ggml-supercazzola-512x16-f32-100.gguf # interence on a previous checkpoint
...
Mi scusi?! Ma lei a clacsonato? fosse Ant,olaistra c siapp come’ per ancheani? cheNo t laataoco
llama_print_timings: load time = 20,78 ms
llama_print_timings: sample time = 4,10 ms / 20 runs ( 0,20 ms per token, 4880,43 tokens per second)
llama_print_timings: prompt eval time = 11,57 ms / 15 tokens ( 0,77 ms per token, 1296,46 tokens per second)
llama_print_timings: eval time = 80,77 ms / 19 runs ( 4,25 ms per token, 235,23 tokens per second)
llama_print_timings: total time = 103,60 ms / 34 tokens
Log end
- You can also run inference on the model with the llama.cpp server:
TBD
and test the network with this simple REST call:
TBD
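While the exact commands above are still to be defined, here is a rough sketch of what this could look like, assuming the stock llama.cpp server binary built by the make step above (listening on port 8080 by default) and its /completion endpoint, with the checkpoint path taken from the previous example:
$ ./llama.cpp/server -m models/ggml-supercazzola-512x16-f32-100.gguf   # start the llama.cpp HTTP server
$ curl -s http://localhost:8080/completion \
    -H "Content-Type: application/json" \
    -d '{"prompt": "Mi scusi?! Ma lei a clacsonato?", "n_predict": 20}'
The response should come back as JSON, with the generated "supercazzola" in its content field.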
This project was conceived and developed with love by [buzzobuono] for educational purposes.
This project owes its feasibility and functionality to the existing llama.cpp repository by ggerganov, which provides essential capabilities for quantized llama neural network training and inference in C++.
We welcome contributions and improvements from anyone interested. Feel free to contribute to the project if you have any ideas.
This project is released under the [MIT License]. Refer to the LICENSE file for more information.