SORA is a research automation system that streamlines the collection, organization, and analysis of academic papers. It integrates with Zotero for reference management and with Obsidian for note-taking and knowledge management.
Automated Paper Collection
- ArXiv integration for AI/ML research papers
- Zotero integration for reference management
- Automatic PDF downloads
- Deduplication and metadata extraction
Intelligent Organization
- Automated categorization and tagging
- Smart folder structure
- Citation network mapping
- Metadata extraction and organization
Obsidian Integration
- Automatic note generation
- Knowledge graph creation
- Citation management
- Research workflow automation
- Clone the repository:
git clone https://github.com/Xcellect/SORA.git
cd SORA
- Create and activate a virtual environment using UV:
uv venv .sora
source .sora/bin/activate # On Unix/macOS
or
.sora\Scripts\activate # On Windows
- Install dependencies:
uv pip install -r requirements.txt
- Configure environment variables:
cp .env.example .env
- Edit .env with your Zotero credentials
- Collect papers from ArXiv:
python main.py --get 5 --source arxiv
- Collect papers from Zotero:
python main.py --get 5 --source zotero
- Organize collected papers and generate notes:
python main.py --organize
- Organize papers from a specific source:
python main.py --organize-only --source zotero
- View database contents:
python main.py --view
- Export database to CSV:
python main.py --export
- Sync database with PDF files:
python main.py --sync
- Clear collected papers and reset database:
python main.py --flush
- Clear paper metadata and Obsidian notes:
python main.py --flush-org
- --force: Overwrite existing papers and notes
- --source: Specify source (arxiv or zotero)
- --organize-only: Run organization without collection
- --get N: Collect N papers per category
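The flags above can be modeled with a small `argparse` parser. This is a minimal sketch, not SORA's actual parser: `build_parser` is a hypothetical name, and the types and defaults are assumptions inferred from the flag descriptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not SORA's real code)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--get", type=int, metavar="N",
                        help="collect N papers per category")
    parser.add_argument("--source", choices=["arxiv", "zotero"],
                        help="source to collect from or organize")
    parser.add_argument("--organize", action="store_true",
                        help="organize papers and generate notes")
    parser.add_argument("--organize-only", action="store_true",
                        help="run organization without collection")
    parser.add_argument("--force", action="store_true",
                        help="overwrite existing papers and notes")
    return parser


# Equivalent to: python main.py --get 5 --source arxiv --organize
args = build_parser().parse_args(["--get", "5", "--source", "arxiv", "--organize"])
print(args.get, args.source, args.organize, args.organize_only)
# prints: 5 arxiv True False
```

Because `--organize` is registered as its own option, it is matched exactly and is not confused with `--organize-only`.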
- Collect and organize new papers:
python main.py --get 5 --source arxiv --organize
- Update existing paper organization:
python main.py --organize-only --source zotero
- Fresh start with new papers:
python main.py --flush
python main.py --get 10 --source arxiv --organize
- Export database after collection:
python main.py --get 5 --source zotero
python main.py --export
Each organized paper includes:
- PDF file in year-based directory
- Detailed metadata JSON with analysis
- Obsidian note with:
  - Paper metadata and URL
  - Research context
  - Key methods
  - Technical contributions
  - Implementation details
  - Research impact
  - Document structure
  - Figures and tables summary
  - Auto-generated tags
sora/
├── config/                 # Configuration settings
├── features/
│   ├── collection/         # Paper collection functionality
│   ├── organization/       # Organization and analysis
│   └── shared/             # Shared utilities and models
├── notes/                  # Automated Obsidian notes by LLMs
│   ├── Research Papers.md  # Index file
│   └── paper-title-author-year.md  # Individual paper notes
├── notebooks/              # For advanced analysis on ipynb
├── data/                   # Database and exported data
├── papers/                 # Papers organized by publication year
│   ├── by_year/
│   │   ├── 2024/
│   │   ├── 2023/
│   │   └── ...
│   ├── metadata/           # JSON metadata for further analysis
│   └── pdf/                # Original PDFs
└── tests/                  # Test suite
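To make the layout concrete, here is a minimal sketch of how a note filename and a year-based PDF directory could be derived from paper metadata. The helper names (`slugify`, `note_path`, `pdf_dir`) and the exact slug rules are assumptions for illustration; the `paper-title-author-year.md` pattern and the `papers/by_year/` location come from the tree above, but SORA's real naming logic may differ.

```python
import re
from pathlib import Path


def slugify(text: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def note_path(title: str, first_author: str, year: int,
              root: Path = Path("notes")) -> Path:
    """Assumed naming scheme: <title>-<author>-<year>.md inside notes/."""
    return root / f"{slugify(title)}-{slugify(first_author)}-{year}.md"


def pdf_dir(year: int, root: Path = Path("papers")) -> Path:
    """Assumed location of PDFs for a given publication year."""
    return root / "by_year" / str(year)


print(note_path("Attention Is All You Need", "Vaswani", 2017).as_posix())
# prints: notes/attention-is-all-you-need-vaswani-2017.md
print(pdf_dir(2024).as_posix())
# prints: papers/by_year/2024
```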
Default categories (can be modified in config/settings.py):
- cs.AI (Artificial Intelligence)
- cs.LG (Machine Learning)
- cs.NE (Neural Computing)
- stat.ML (Statistics/Machine Learning)
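As an illustration of what "N papers per category" means for the categories above, the sketch below builds one arXiv API query URL per category. It is an assumption-laden example: SORA lists the `arxiv` package as a dependency and may not build URLs by hand, and `DEFAULT_CATEGORIES` and `build_query_url` are hypothetical names, not the identifiers used in config/settings.py.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

# Hypothetical name; mirrors the default categories listed above.
DEFAULT_CATEGORIES = ["cs.AI", "cs.LG", "cs.NE", "stat.ML"]


def build_query_url(category: str, max_results: int) -> str:
    """Build an arXiv API URL for the newest submissions in one category."""
    params = {
        "search_query": f"cat:{category}",
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


# Roughly what `--get 5 --source arxiv` implies: one request of 5 per category.
urls = [build_query_url(category, 5) for category in DEFAULT_CATEGORIES]
for url in urls:
    print(url)
```

With four default categories, `--get 5` would therefore request up to 20 papers in total before deduplication.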
Required environment variables:
ZOTERO_LIBRARY_ID
ZOTERO_API_KEY
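A quick way to catch a missing credential before a Zotero run is to check both variables up front. This is a minimal sketch, not part of SORA: `missing_env_vars` is a hypothetical helper, and it assumes the values from .env have already been loaded into the process environment.

```python
import os
from collections.abc import Mapping

REQUIRED_VARS = ("ZOTERO_LIBRARY_ID", "ZOTERO_API_KEY")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment:
print(missing_env_vars({"ZOTERO_LIBRARY_ID": "1234567"}))
# prints: ['ZOTERO_API_KEY']
```

Calling `missing_env_vars()` with no arguments checks the real environment; an empty list means both credentials are present.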
Run the test suite:
pytest tests/
This project uses ruff for linting:
ruff check .
- Python ≥ 3.10
- Key packages:
  - arxiv
  - pyzotero
  - sqlalchemy
  - aiohttp
  - spacy
  - scikit-learn
  - networkx
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- ArXiv API for providing access to research papers
- Zotero for reference management capabilities
- Obsidian for knowledge management features
Aishik S. - @xcellect