Welcome to the repository for my thesis project! This work focuses on developing, deploying, and evaluating large language models on Text-to-SQL. Below, you'll find a breakdown of the folder structure and what each file and directory is used for.
-
convert-to-pdf/
convert_to_pdf.ipynb
: A handy notebook that converts.py
and.ipynb
files into PDFs.
-
deployment/
prompts.txt
: This file contains all the prompts I used during the deployment and evaluation phases.streamlit-openai-chat.py
: This script sets up a Streamlit app that lets you chat with the language model in real time, using the LiteLLM Proxy server.
-
evaluation-pipeline/
clean_and_commented_json.py
: This script evaluates various language models on a Text-to-SQL task using DSPy. It configures the models, handles dataset splitting, runs evaluations with different metrics, and logs everything to an Excel file. It also includes some optimization techniques like LabeledFewShot and BootstrapFewShotWithRandomSearch.clean_and_commented_json_openai.py
: Same as the previous script, but this one’s specifically for working with OpenAI.evaluation/
evaluation.ipynb
: A notebook that dives into analyzing thelog_evaluations
from the evaluation pipeline.figures/
: A place to store all the figures generated during evaluation.latex/
: Likely for any LaTeX files I might need for creating report-ready figures or tables.log_evaluations.xlsx
: An Excel file that tracks logs for metrics like Match, Correctness, and Execution during evaluation.model_sizes.xlsx
: Lists the sizes of the different models I used.
logs/
: A directory where all the log files from the evaluation process are kept.optimized_programs/
: Contains the optimized programs generated by DSPy.utils_evaluate.py
: A script that adapts DSPy’s evaluation process to handle multiple metrics at once.
-
litellm-docker/
config_base.yaml
: The config file for the Student Models.config_evaluator.yaml
: The config file for the Judge Model.docker-compose-eval.yml
: Docker Compose setup for running LiteLLM for the Judge Model.docker-compose_base.yml
: Docker Compose setup for running LiteLLM for the Student Models.
-
phoenix-docker/
docker-compose.yml
: Docker Compose file for deploying the Phoenix logging framework.
-
unsloth_tunes/
Alpaca_+_Llama_3_8b_full_example_sql_edit_checkpoint.ipynb
: Fine-tunes the Llama-8b-text model with thegretelai/synthetic_text_to_sql
dataset.Llama_3_8b_chat_template_Unsloth_2x_faster_finetuning_edit.ipynb
: Fine-tunes the Llama-8b-instruct model using the same dataset.Phi_3_Medium_4K_Instruct_Unsloth_2x_faster_finetuning.ipynb
: Fine-tunes the Phi-3-medium-4k-instruct model, again with thegretelai/synthetic_text_to_sql
dataset.req_conda_unsloth.txt
: A list of the Conda environment requirements for the fine-tuning process.restore_artifacts.ipynb
: A notebook to restore artifacts that were backed up by WANDB.