Skip to content

tevfikcagridural/RAG_base

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Retrieval-Augmented Generation (RAG) File Structure Blueprint

Inspired from cookie-cutter-data-science.

This work suggests an overview of how to structure a directory for a Retrieval-Augmented Generation (RAG) system. The proposed structure is designed to organize the various components and workflows typically involved in experiments and development of the system.

Project Structure

rag-app/
├── data/                           # Only some samples. **It is recommended to separate data stores/databases from codebase**
│   ├── raw/                        # Raw, unprocessed data files
│   ├── evaluation                  # Evaluation data files
│   ├── processed/                  # Data files after processing
│   └── external/                   # External data sources
├── models/                         # For using local models 
│   ├── embeddings/                 # Pretrained/fine-tuned embedding model(s)
│   └── llm/                        # Pretrained/fine-tuned language model(s)
├── prompts/            
│   ├── prompt1.txt                 # Sample prompt file
│   ├── prompt2.txt                 # Another sample prompt file
├── reports/
│   ├── development_report.md       # Reports on experiments and/or analyses
│   └── figures/                    # Figures for reports
├── src/                            # Main source code of the application.
│   ├── data/
│   │   └── data_preparation.py     # Data preparation scripts
│   ├── models/
│   │   ├── embedding_models.py     # Embedding model(s) handling
│   │   └── llm_models.py           # Language model(s) handling
│   ├── services/
│   │   └── inference_service.py    # Data loading, indexing, and storing embeddings to vector database.
│   │   └── ingestion_service.py    # Retrieval and generation.
│   │   └── monitoring_service.py   # Monitoring service of the RAG system
│   └── app/                        # Main application entry point
│       └── main.py                 
├── tests/
│   ├── __init__.py
│   ├── test_data_preparation.py    # Unit tests for data preparation
│   ├── test_embedding_model.py     # Unit tests for embedding model
│   ├── test_llm_model.py           # Unit tests for language model
│   └── test_rag_service.py         # Unit tests for RAG services
├── notebooks/
│   └── data_exploration.ipynb      # Jupyter notebook for data exploration and analysis
│   ├── experiments.ipynb           # Jupyter notebook for experiments on different strategies
├── docker/                         # For containerized deployment 
│   ├── Dockerfile                  # Dockerfile for building the image
│   └── docker-compose.yml          # Docker Compose file for multi-container setups
├── .env                            # Environment variables
├── README.md                       # Project README file
├── requirements.txt                # Python dependencies

About

RAG System File Structure Blueprint

Topics

Resources

License

Stars

Watchers

Forks

Languages