Code for Attention-Gate.
This is an early-stage implementation; further improvements will follow in future updates.
Before training, replace modeling_llama.py in the transformers library with modeling_llama_with_ag.py from this repo.
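A minimal sketch of that replacement step, assuming transformers is pip-installed in the active environment and that modeling_llama_with_ag.py sits in the repository root (the repo-relative path is an assumption):

```python
# Sketch: overwrite the installed modeling_llama.py with the gated variant.
# Assumes transformers is installed in the current Python environment and that
# modeling_llama_with_ag.py is in the repository root (adjust the path if not).
import shutil

import transformers.models.llama.modeling_llama as modeling_llama

# Path of the installed file, e.g.
# .../site-packages/transformers/models/llama/modeling_llama.py
target = modeling_llama.__file__
shutil.copyfile("modeling_llama_with_ag.py", target)
print(f"Replaced {target}")
```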
Continual pre-training uses the redpajama_samples.jsonl dataset.
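A minimal loading sketch using the Hugging Face datasets library, assuming each line of redpajama_samples.jsonl is a JSON object with a "text" field (the field name is an assumption; adjust it to the actual schema):

```python
# Sketch: load the continual pre-training corpus from the JSON Lines file.
# Each line is assumed to be a JSON object with a "text" field.
from datasets import load_dataset

dataset = load_dataset("json", data_files="redpajama_samples.jsonl", split="train")
print(dataset[0]["text"][:200])  # peek at the first sample
```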