forked from ananyaem/UPSC-LLM

Answering UPSC current affairs questions using RAG LLM on LangChain.

Reve75/UPSC-LLM

 
 


UPSC-LLM

Answering UPSC current affairs questions using RAG LLM.

The UPSC Civil Services Examination is one of the world's toughest examinations. Its first stage, the "Prelims", asks many questions related to news articles from the previous year. Millions of students prepare for and appear in it.

In this project, we try to answer Prelims-style questions using newspaper and conceptual articles sourced from the internet. For each question, a RAG-based implementation first finds the relevant documents and then passes them as context to an LLM, which answers the query.
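The retrieve-then-answer flow can be illustrated with a minimal, dependency-free sketch. It is not the project's implementation: the bag-of-words cosine retriever stands in for a real embedding index, and the function names and prompt wording are assumptions made for illustration. The final LLM call is left out; the sketch stops at building the prompt that would be sent.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question (toy retriever)."""
    q_vec = Counter(tokenize(question))
    scored = [(cosine_similarity(q_vec, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(question, context_docs):
    """Assemble the retrieved documents and the question into one LLM prompt."""
    context = "\n\n".join(context_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

In a real pipeline, `retrieve` would be replaced by a vector-store lookup and the string returned by `build_prompt` would be passed to the LLM.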

First, we tried LangChain with a Cohere LLM on a static dataset that we had compiled in a Google Sheet. This was implemented in a Google Colab .ipynb notebook.

Then we used Pathway's streaming framework to process data as it is scraped from the internet. This is implemented using their Docker app, with a Docker Compose layer on top to run our data generators.

Run

  • Get an OpenAI API key and set it as OPENAI_API_KEY in demo-question-answering/.env
  • Install Docker and Docker Compose
  • Run
docker compose up

and wait for the containers to spin up and start listening on port 8000.

  • To see statistics:
curl -X 'POST' 'http://localhost:8000/v1/statistics'
  • To find documents:
curl -X 'POST' \
  'http://localhost:8000/v1/retrieve' \
  -H 'accept: */*' \
  -H 'Content-Type: application/json' \
  -d '{
  "query": "Which areas are affected by Cyclone Remal?",
  "k": 4
}'
  • To ask questions:
curl -X 'POST' \
  'http://localhost:8000/v1/pw_ai_answer' \
  -H 'accept: */*' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt": "Which areas are affected by Cyclone Remal?"
}'
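The same three calls can be made from Python. The sketch below uses only the standard library and mirrors the curl commands above. The helper names are hypothetical. The base URL assumes the default port 8000 on the local machine, and the responses are assumed to be JSON.

```python
import json
import urllib.request

# Assumption: the containers listen on the default port from the steps above.
BASE_URL = "http://localhost:8000"


def build_request(path, payload=None):
    """Build a POST request for one of the endpoints listed above."""
    headers = {"accept": "*/*"}
    body = None
    if payload is not None:
        # Only send a JSON body (and its content type) when there is a payload.
        body = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=body, headers=headers, method="POST"
    )


def post_json(path, payload=None, timeout=60):
    """Send the request and decode the response, assuming it is JSON."""
    request = build_request(path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def ask_all(question, k=4):
    """Call the statistics, retrieval, and answer endpoints in turn."""
    return {
        "statistics": post_json("/v1/statistics"),
        "documents": post_json("/v1/retrieve", {"query": question, "k": k}),
        "answer": post_json("/v1/pw_ai_answer", {"prompt": question}),
    }
```

Once the containers are up, `ask_all("Which areas are affected by Cyclone Remal?")` would issue the three requests shown above in order.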

How this works

We took the demo-question-answering example from Pathway's LLM-App examples and configured it to read 2 data streams from folders. It uses OpenAI's API.

  1. Source: Coaching. These are free UPSC current-affairs materials published daily on the website of a popular coaching institute (Vajiram and Ravi). They cover concepts relevant to current events, which helps in answering slightly conceptual questions.

  2. Source: News. We scrape newspaper articles from The Hindu's ePaper and save them to a folder. The articles are fetched from its internal CDN, which gives us high-quality HTML content. This provides factual, complete news coverage of each event.

Both these data sources are built as microservices with Docker, and their output directories are mounted as data source folders for the Pathway Docker App.
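The folder hand-off between a data generator and the Pathway app might look like the sketch below. It is not the repository's scraper code: the function name, the URL-hash file naming, and the skip-if-present rule are illustrative assumptions. It shows one way a generator could write each article exactly once into its mounted output directory.

```python
import hashlib
from pathlib import Path


def save_article(output_dir, url, html):
    """Save one article under a name derived from its URL.

    Returns the new file's path, or None if the article was already saved,
    so re-running the scraper does not create duplicate files.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: a short hash of the source URL.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + ".html"
    path = out / name
    if path.exists():
        return None
    path.write_text(html, encoding="utf-8")
    return path
```

Because the output directory is mounted into the Pathway container, each newly written file becomes available to the app as a new document in that data stream.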

The entire setup can be started with a single docker compose up; the built-in Pathway HTTP API then serves statistics and inference.

