This project focuses on generating question-answer pairs from text data using a language model (TheBloke/Mistral-7B-Instruct-v0.1-GGUF). It involves merging data from multiple CSV files, preprocessing the text, and leveraging an LLM to automatically create relevant question-answer pairs.
Wrote scripts to merge data from news articles and reviews stored in CSV files into a single dataset.
Implemented a pipeline to generate question-answer pairs from the merged text data using the Mistral-7B large language model.
Documented installation, usage, and project structure for user clarity.
clone the repository: git clone https://github.com/Iqra9999/Question-Answer-Generation-Pipeline-with-LLM
cd qa-generation
install the necessary packages: pip install -r requirements.txt
You can easily run the script with your own CSV files containing text data. Ensure your CSV files follow a similar format to the provided examples.
After merging your CSV files, you can generate question-answer pairs using the Mistral 7-B language model.
If your CSV file contains multiple columns and you want to generate question-answer pairs from specific columns, you can preprocess your data accordingly.
The project is structured as follows:
This file contains the script for merging multiple CSV files, preprocessing them, and generating question-answer pairs from text data.
Lists the required Python packages for easy installation.
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
This project is licensed under the MIT License. See the LICENSE file for details.