"Ask the Paper" is an intuitive platform designed for organizations, teams, and researchers to analyze and query their research materials. The project transforms interaction with text documents by enabling users to ask direct questions and receive insightful answers based on the content of those documents.
- Features
- Project Roles
- Functionality
- User Interface
- Technical Architecture
- Technologies Used
- Installation
- Usage
- Contributing
- License
- Document upload and processing for indexed text chunking.
- Semantic query handling using Amazon Titan text embeddings and FAISS similarity search.
- Conversational user experience for easy interaction with research materials.
The system comprises two primary roles:
The Admin is responsible for managing the research materials uploaded to the platform with the following functions:
- Uploading Documents: Upload research papers or text materials in PDF format.
- Data Processing: Process uploaded PDFs to extract text and divide it into manageable chunks.
- Storage Management: Save the indexed chunks as FAISS and PKL files for efficient retrieval, and store those files in Amazon S3.
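The data processing step above can be illustrated with a minimal sketch of overlapping text chunking. It is a plain-Python stand-in, not the project's actual code, which likely relies on a LangChain text splitter. The function name `chunk_text` and the default `chunk_size` and `overlap` values are assumptions chosen for illustration.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, which tends to help retrieval at the cost of some duplicated text.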
Users interact with the system through a user-friendly interface:
- File Access: Check the S3 bucket for FAISS and PKL file availability.
- Questioning Interface: Ask questions related to the content of the research papers.
- Conversational Experience: Engage in a natural conversation to retrieve information.
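The file access check above could look roughly like the sketch below. It assumes an S3 client created with Boto3 (`boto3.client("s3")`) and uses its `list_objects_v2` call. The function name `index_files_available` and the key names in the usage note are hypothetical, since the README does not say how the index files are named.

```python
from typing import Any, Dict, Iterable


def index_files_available(s3_client: Any, bucket: str, keys: Iterable[str]) -> Dict[str, bool]:
    """Return, for each expected object key, whether it exists in the bucket."""
    status: Dict[str, bool] = {}
    for key in keys:
        # list_objects_v2 returns matching objects under "Contents";
        # that entry is absent when no object matches the prefix.
        response = s3_client.list_objects_v2(Bucket=bucket, Prefix=key)
        found = response.get("Contents", [])
        # A prefix match is not enough: require the exact key.
        status[key] = any(obj["Key"] == key for obj in found)
    return status
```

With real credentials this might be called as `index_files_available(boto3.client("s3"), "my-bucket", ["index.faiss", "index.pkl"])`, where the bucket and key names are placeholders. The application can then refuse to answer questions until both files are present.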
- Document Upload and Processing: Upload research papers for conversion into indexed text chunks.
- Query Handling and Retrieval: Embed each query with the Amazon Titan text embedding model and use FAISS to retrieve the most similar text chunks quickly.
- Interactive User Interface: A web interface built using Streamlit for document upload and real-time responses.
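To make the retrieval step above concrete, here is a brute-force sketch of similarity search over embedded chunks. It is not FAISS and not the project's code: FAISS builds an optimized index, while this version simply scores every chunk with cosine similarity. The names `cosine_similarity` and `top_k_chunks` are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    chunks: Sequence[str],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k chunks most similar to the query, best match first."""
    scored = [
        (chunk, cosine_similarity(query_vec, vec))
        for chunk, vec in zip(chunks, chunk_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The retrieved chunks would then be passed to a language model as context for answering the user's question.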
The user interface is designed with simplicity and functionality:
- Upload Section: An area for admins to upload PDF files.
- Query Input Field: Users can type their questions for easy interaction.
- Response Display: Clear presentation of answers for quick understanding.
The architecture combines AWS services with machine learning techniques for document processing, storage, and retrieval: uploaded documents and index files live in Amazon S3, Amazon Bedrock provides the embedding model, and FAISS handles similarity search.
- Amazon S3: For storing uploaded documents and processed files.
- Amazon Bedrock: Access to pre-trained language models for text processing.
- FAISS: Efficient similarity search among embedded text chunks.
- LangChain: Document handling and text chunking.
- Boto3: AWS SDK for Python for interactions with AWS services.
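As a rough sketch of how Boto3 and Amazon Bedrock fit together, the function below requests an embedding from a Titan text embedding model. It assumes a Bedrock runtime client created with `boto3.client("bedrock-runtime")` and the model ID `amazon.titan-embed-text-v1`. The README names neither the exact model ID nor the call path, so both are assumptions, and the project may instead go through a LangChain wrapper.

```python
import json
from typing import Any, List

# Assumed model ID; the README only says "Amazon Titan text embedding model".
TITAN_MODEL_ID = "amazon.titan-embed-text-v1"


def embed_text(bedrock_client: Any, text: str, model_id: str = TITAN_MODEL_ID) -> List[float]:
    """Return the embedding vector for `text` from a Titan embedding model."""
    response = bedrock_client.invoke_model(
        modelId=model_id,
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream containing a JSON document.
    payload = json.loads(response["body"].read())
    return payload["embedding"]
```

Passing the client in as an argument keeps the function easy to test with a stub and leaves region and credential handling to the caller.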
To set up the project, follow these steps:
- Clone the repository:
    git clone https://github.com/nimish-nimishmittal/ask-the-paper.git
    cd ask-the-paper
- Install dependencies:
    pip install -r requirements.txt
- Configure your AWS credentials: Ensure you have your AWS credentials configured. You can do this using the AWS CLI:
    aws configure
- Start the application:
    streamlit run app.py
- Access the web interface through your browser (usually at http://localhost:8501).
- Admins can upload documents, and users can ask questions through the interface.
Contributions are welcome! If you have suggestions for improvements or want to report a bug, please open an issue or submit a pull request.
This project is licensed under the MIT License. See the LICENSE file for more details.
For further information, please refer to the documentation or contact the project maintainers.