Answering UPSC current affairs questions using a RAG-based LLM pipeline.
The UPSC Civil Services Examination is one of the world's toughest examinations. Its first stage, the "Prelims", asks many questions related to news articles from the previous year. Millions of students prepare and appear for it.
In this project, we try to answer Prelims-type questions using newspaper and conceptual articles sourced from the internet. For each question, a RAG-based implementation first finds the relevant documents and then passes them as context to an LLM, which answers the question.
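The retrieve-then-answer flow described above can be sketched in a few lines. This is only an illustration, not the project's retriever: a toy bag-of-words cosine similarity stands in for the embedding search a real RAG pipeline would use, the sample documents are placeholders, and the function names (`retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (toy scoring)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine the retrieved documents and the question into one prompt."""
    joined = "\n\n".join(context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {query}"
    )


# Placeholder documents, invented purely for illustration.
docs = [
    "Sample news article: areas affected by Cyclone Remal and relief work.",
    "Sample news article: monetary policy and the repo rate.",
    "Sample concept note: how tropical cyclones form and are named.",
]
question = "Which areas are affected by Cyclone Remal?"
top_docs = retrieve(question, docs, k=2)
prompt = build_prompt(question, top_docs)  # this string would be sent to the LLM
```

In the actual setup, the retrieval and the LLM call are handled by the framework; the sketch only shows how the two stages fit together.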
First, we tried LangChain with Cohere's LLM on a static dataset that we had compiled in a Google Sheet. This was implemented in a Google Colab notebook (.ipynb).
Then we used Pathway's streaming framework to process data as it is scraped from the internet. This is implemented using their Docker app, and we built a Docker Compose layer on top of it to run our data generators.
- Get an OpenAI API key and set it as OPENAI_API_KEY inside demo-question-answering/.env
- Install Docker and Docker Compose
- Run docker compose up and wait for the containers to spin up and start listening on port 8000.
- To see statistics:
curl -X 'POST' 'http://localhost:8000/v1/statistics'
- To find documents:
curl -X 'POST' \
'http://localhost:8000/v1/retrieve' \
-H 'accept: */*' \
-H 'Content-Type: application/json' \
-d '{
"query": "Which areas are affected by Cyclone Remal?",
"k": 4
}'
- To ask questions:
curl -X 'POST' \
'http://localhost:8000/v1/pw_ai_answer' \
-H 'accept: */*' \
-H 'Content-Type: application/json' \
-d '{
"prompt": "Which areas are affected by Cyclone Remal?"
}'
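The same calls can be made from Python. The sketch below uses only the standard library and mirrors the curl commands above; the endpoint paths and payload fields come from those commands, while `BASE_URL`, `build_request`, and `call` are hypothetical names, and it assumes the server replies with JSON.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the app is reachable locally on port 8000.
BASE_URL = "http://localhost:8000"


def build_request(endpoint: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a POST request for one of the HTTP endpoints."""
    body = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        url=f"{BASE_URL}{endpoint}",
        data=body,
        headers={"accept": "*/*", "Content-Type": "application/json"},
        method="POST",
    )


def call(endpoint: str, payload: Optional[dict] = None):
    """Send the request and decode the reply (assumed to be JSON)."""
    with urllib.request.urlopen(build_request(endpoint, payload), timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


question = "Which areas are affected by Cyclone Remal?"
stats_request = build_request("/v1/statistics")
retrieve_request = build_request("/v1/retrieve", {"query": question, "k": 4})
answer_request = build_request("/v1/pw_ai_answer", {"prompt": question})

# With the containers running, an answer could be fetched with:
# answer = call("/v1/pw_ai_answer", {"prompt": question})
```

Building the request separately from sending it keeps the payloads easy to inspect without a running server.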
We used the demo-question-answering example from Pathway's LLM-App examples and configured it to use two data streams read from folders. We use OpenAI's API.
- Source: Coaching. These are free UPSC current-affairs materials uploaded daily on the website of a popular coaching institute (Vajiram and Ravi). They cover concepts relevant to current events, which helps in answering slightly conceptual questions.
- Source: News. We scrape newspaper articles from The Hindu's ePaper and save them to a folder. The articles are fetched from its internal CDN, which gives us high-quality HTML content and complete, factual coverage of each event.
Both these data sources are built as microservices with Docker, and their output directories are mounted as data source folders for the Pathway Docker App.
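As a rough illustration of what such a data generator does, the sketch below writes each scraped article into an output folder that could be mounted as a data source. It is not the project's scraper: `save_article`, the hash-based file naming, and the example URL are all hypothetical, and the scraping step itself is omitted.

```python
import hashlib
import tempfile
from pathlib import Path


def save_article(out_dir: Path, url: str, html: str) -> Path:
    """Write one scraped article into out_dir and return the file path.

    The file name is derived from the article URL, so scraping the same
    URL again overwrites the old file instead of creating a duplicate.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + ".html"
    target = out_dir / name
    target.write_text(html, encoding="utf-8")
    return target


# Demonstration in a temporary directory; a real service would write to
# the directory mounted into the question-answering container.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "news"
    first = save_article(out, "https://example.com/article-1", "<p>v1</p>")
    second = save_article(out, "https://example.com/article-1", "<p>v2</p>")
    saved_files = sorted(p.name for p in out.iterdir())
    latest = second.read_text(encoding="utf-8")
```

Deriving the file name from the URL is one simple way to keep repeated scraper runs from feeding duplicate documents into the index.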
The entire setup can be run with just docker compose up, after which the built-in Pathway HTTP API serves statistics and inference.