This directory contains the core implementation of the Custom Language Model training pipeline.
Main fine-tuning script that orchestrates the training process.
- prepare_model_and_tokenizer(): Initializes the model and tokenizer
- load_and_process_dataset(): Handles dataset preparation
- train_model(): Manages the training process
- fine_tune(): Main entry point for fine-tuning
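The call order among these four functions can be sketched as below. This is a hypothetical skeleton, not the script's actual code. The signatures, the placeholder dict objects, and the whitespace "tokenization" are all assumptions made so the sketch runs without any libraries. The real functions presumably call into Hugging Face libraries instead.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical stubs: only the order of calls mirrors the description above.

def prepare_model_and_tokenizer(model_name: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    # Placeholder objects standing in for a real model and tokenizer.
    model = {"name": model_name, "trained_steps": 0}
    tokenizer = {"name": model_name}
    return model, tokenizer

def load_and_process_dataset(texts: List[str], tokenizer: Dict[str, Any]) -> List[List[str]]:
    # Placeholder processing: drop empty texts and split on whitespace.
    return [t.split() for t in texts if t.strip()]

def train_model(model: Dict[str, Any], dataset: List[List[str]], num_epochs: int) -> Dict[str, Any]:
    # Placeholder training loop: count one "step" per example per epoch.
    for _ in range(num_epochs):
        for _example in dataset:
            model["trained_steps"] += 1
    return model

def fine_tune(model_name: str, texts: List[str], num_epochs: int = 1):
    # Entry point: set up, prepare data, train, then hand back both artifacts.
    model, tokenizer = prepare_model_and_tokenizer(model_name)
    dataset = load_and_process_dataset(texts, tokenizer)
    model = train_model(model, dataset, num_epochs)
    return model, tokenizer
```

Calling fine_tune("base", ["hello world", "", "fine tuning"], num_epochs=2) trains on the two non-empty texts for two epochs, so the placeholder step counter ends at 4.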
Entry point of the application that coordinates:
- Model fine-tuning
- Model saving
- Hugging Face Hub upload
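A minimal sketch of how an entry point might sequence these three stages, assuming each stage is passed in as a callable. The main signature and the rule that upload is skipped when no repo_id is given are hypothetical design choices, not taken from the source. Saving has to come before uploading because the upload pushes the saved directory.

```python
from typing import Any, Callable, List, Optional

def main(
    fine_tune: Callable[[], Any],
    save: Callable[[Any, str], str],
    upload: Callable[[str, str], None],
    output_dir: str,
    repo_id: Optional[str] = None,
) -> List[str]:
    """Run fine-tuning, saving, and (optionally) uploading; return the stages executed."""
    steps: List[str] = []
    artifacts = fine_tune()                   # 1. fine-tune the model
    steps.append("fine_tune")
    saved_dir = save(artifacts, output_dir)   # 2. write model and tokenizer to disk
    steps.append("save")
    if repo_id:                               # 3. push the saved files (assumed optional)
        upload(saved_dir, repo_id)
        steps.append("upload")
    return steps
```

Injecting the stages as callables keeps the entry point testable without loading a model or touching the network.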
Utilities for Hugging Face integration:
- upload.py: Handles model upload to the Hugging Face Hub
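An upload helper of this kind could be built on the huggingface_hub client. The sketch below uses HfApi.create_repo and HfApi.upload_folder from that library. The function name upload_to_hub, the private-by-default choice, and the commit message are assumptions, and the actual upload.py may work differently. Authentication is expected to come from a prior Hugging Face login or the HF_TOKEN environment variable.

```python
from typing import Any, Optional

def upload_to_hub(folder_path: str, repo_id: str, private: bool = True, api: Optional[Any] = None) -> str:
    """Create the Hub repo if needed, upload every file in folder_path, and return the repo URL."""
    if api is None:
        # Imported lazily so the helper can also be exercised with a stand-in API object.
        from huggingface_hub import HfApi
        api = HfApi()
    # exist_ok=True makes repeated uploads to the same repo succeed.
    api.create_repo(repo_id=repo_id, private=private, exist_ok=True)
    api.upload_folder(
        folder_path=folder_path,
        repo_id=repo_id,
        repo_type="model",
        commit_message="Upload fine-tuned model",
    )
    return f"https://huggingface.co/{repo_id}"
```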
Model-related implementations:
- build_model.py: Custom transformer model architecture
- save_model.py: Model and tokenizer saving utilities
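The custom architecture in build_model.py is not described here. As a hedged illustration of how a decoder-only design translates into model size, the sketch below counts parameters for an assumed GPT-2-style block. That block has learned position embeddings, two LayerNorms per layer, a fused QKV projection, a 4x MLP, biases throughout, and an output head tied to the token embedding. TransformerConfig and count_parameters are hypothetical names. The number of attention heads does not appear in the count because the heads only partition d_model.

```python
from dataclasses import dataclass

@dataclass
class TransformerConfig:
    # Hypothetical config; field names are illustrative, not from build_model.py.
    vocab_size: int
    max_positions: int
    d_model: int
    n_layers: int
    mlp_ratio: int = 4

def count_parameters(cfg: TransformerConfig) -> int:
    d = cfg.d_model
    d_ff = cfg.mlp_ratio * d
    # Token embedding plus learned position embedding.
    embeddings = cfg.vocab_size * d + cfg.max_positions * d
    layer_norm = 2 * d  # scale and bias
    # Fused QKV projection plus output projection, each with a bias.
    attention = (d * 3 * d + 3 * d) + (d * d + d)
    # Up-projection and down-projection, each with a bias.
    mlp = (d * d_ff + d_ff) + (d_ff * d + d)
    per_layer = 2 * layer_norm + attention + mlp
    final_norm = layer_norm
    # The LM head is assumed tied to the token embedding, so it adds nothing.
    return embeddings + cfg.n_layers * per_layer + final_norm
```

With GPT-2 small's dimensions (vocab_size=50257, max_positions=1024, d_model=768, n_layers=12), this returns 124,439,808, the commonly cited size of that model.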
Training-related utilities:
- dataset_preparation.py: Dataset loading and processing
- trainer.py: Training configuration
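One common processing step when preparing text for causal language-model training is packing tokenized examples into fixed-length blocks. Whether dataset_preparation.py does this is not stated, so treat the sketch below as an assumption. It follows the pattern used in the Hugging Face causal-LM examples: concatenate, drop the incomplete tail, split into blocks, and copy input_ids into labels. Hugging Face causal-LM models shift labels internally. The name group_texts and the block size are illustrative.

```python
from itertools import chain
from typing import Dict, List

def group_texts(examples: Dict[str, List[List[int]]], block_size: int) -> Dict[str, List[List[int]]]:
    # Concatenate each tokenized column end to end.
    concatenated = {k: list(chain.from_iterable(v)) for k, v in examples.items()}
    total_length = len(concatenated["input_ids"])
    # Drop the remainder that would not fill a complete block.
    total_length = (total_length // block_size) * block_size
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    # For causal LM training the labels are a copy of the inputs.
    result["labels"] = [list(block) for block in result["input_ids"]]
    return result
```

With the datasets library, a function like this is typically applied through Dataset.map(..., batched=True), so that tokens from neighbouring examples can be packed into the same block.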