SORA (Superintelligent Obsidian Research Automation)

SORA is an advanced research automation system that streamlines the collection, organization, and analysis of academic papers. It integrates with Zotero and Obsidian to create a seamless research workflow.

Features

Automated Paper Collection
- ArXiv integration for AI/ML research papers
- Zotero integration for reference management
- Automatic PDF downloads
- Deduplication and metadata extraction

Intelligent Organization
- Automated categorization and tagging
- Smart folder structure
- Citation network mapping
- Metadata extraction and organization

Obsidian Integration
- Automatic note generation
- Knowledge graph creation
- Citation management
- Research workflow automation

Installation

Clone the repository:

git clone https://github.com/Xcellect/SORA.git
cd SORA

Create and activate a virtual environment using uv:

uv venv .sora
source .sora/bin/activate  # On Unix/macOS

or

.sora\Scripts\activate  # On Windows

Install dependencies:

uv pip install -r requirements.txt

Configure environment variables:

cp .env.example .env

Edit .env with your Zotero credentials

Usage

Basic Commands

Collect papers from ArXiv:

python main.py --get 5 --source arxiv

Collect papers from Zotero:

python main.py --get 5 --source zotero

Organize collected papers and generate notes:

python main.py --organize

Organize papers from a specific source:

python main.py --organize-only --source zotero

Database Management

View database contents:

python main.py --view

Export database to CSV:

python main.py --export

Sync database with PDF files:

python main.py --sync

Cleanup Operations

Clear collected papers and reset database:

python main.py --flush

Clear paper metadata and Obsidian notes:

python main.py --flush-org

Additional Options

--force: Overwrite existing papers and notes
--source: Specify source (arxiv or zotero)
--organize-only: Run organization without collection
--get N: Collect N papers per category
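
The flags listed above could be wired together with argparse roughly as follows. This is a minimal sketch, not the contents of main.py: the `build_parser` name, the defaults, and the help strings are assumptions, and only the flag names are taken from this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented in this README."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--get", type=int, metavar="N",
                        help="collect N papers per category")
    parser.add_argument("--source", choices=["arxiv", "zotero"],
                        help="paper source")
    parser.add_argument("--organize", action="store_true",
                        help="organize collected papers and generate notes")
    parser.add_argument("--organize-only", action="store_true",
                        help="run organization without collection")
    parser.add_argument("--force", action="store_true",
                        help="overwrite existing papers and notes")
    return parser


# Parse an explicit argument list so the example runs without a shell.
args = build_parser().parse_args(["--get", "5", "--source", "arxiv", "--organize"])
print(args.get, args.source, args.organize, args.organize_only, args.force)
# 5 arxiv True False False
```

Restricting `--source` with `choices` makes a typo such as `--source arxive` fail at parse time rather than partway through a collection run.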

Example Workflows

Collect and organize new papers:

python main.py --get 5 --source arxiv --organize

Update existing paper organization:

python main.py --organize-only --source zotero

Fresh start with new papers:

python main.py --flush
python main.py --get 10 --source arxiv --organize

Export database after collection:

python main.py --get 5 --source zotero
python main.py --export

Generated Content

Each organized paper includes:

- PDF file in a year-based directory
- Detailed metadata JSON with analysis
- Obsidian note with:
  - Paper metadata and URL
  - Research context
  - Key methods
  - Technical contributions
  - Implementation details
  - Research impact
  - Document structure
  - Figures and tables summary
  - Auto-generated tags
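
To make the shape of a generated note concrete, here is a minimal sketch of rendering one from a metadata dictionary. The field names (`title`, `authors`, `key_methods`, `tags`, and so on), the section layout, and the helper names are assumptions for illustration; SORA's actual note template may differ. The sample values are placeholders, not a real paper.

```python
def slugify(text: str) -> str:
    """Lowercase the text, drop punctuation, and join words with hyphens."""
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    return "-".join(cleaned.split())


def render_note(meta: dict) -> str:
    """Render paper metadata as Markdown for Obsidian (hypothetical layout)."""
    tags = " ".join(f"#{slugify(tag)}" for tag in meta["tags"])
    lines = [
        f"# {meta['title']}",
        "",
        f"- Authors: {', '.join(meta['authors'])}",
        f"- Year: {meta['year']}",
        f"- URL: {meta['url']}",
        "",
        "## Key methods",
        *[f"- {method}" for method in meta["key_methods"]],
        "",
        f"Tags: {tags}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder metadata, purely illustrative.
example = {
    "title": "Example Paper Title",
    "authors": ["A. Author", "B. Author"],
    "year": 2024,
    "url": "https://arxiv.org/abs/0000.00000",
    "key_methods": ["method one", "method two"],
    "tags": ["Graph Neural Networks", "cs.LG"],
}
print(render_note(example))
```

Tags are slugified because Obsidian tags cannot contain spaces.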

Project Structure

sora/
├── config/             # Configuration settings
├── features/
│   ├── collection/     # Paper collection functionality
│   ├── organization/   # Organization and analysis
│   └── shared/         # Shared utilities and models
├── notes/              # Obsidian notes generated by LLMs
│   ├── Research Papers.md          # Index file
│   └── paper-title-author-year.md  # Individual paper notes
├── notebooks/          # Jupyter notebooks for advanced analysis
├── data/               # Database and exported data
├── papers/
│   ├── by_year/        # Papers organized by publication year
│   │   ├── 2024/
│   │   ├── 2023/
│   │   └── ...
│   ├── metadata/       # JSON metadata for further analysis
│   └── pdf/            # Original PDFs
└── tests/              # Test suite
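
As a rough illustration of how the layout above maps to file paths, the sketch below computes where one paper's files might live. The `paper_paths` helper and the choice to key every file on a shared `paper-title-author-year` slug are assumptions; the README does not specify how SORA names the PDF or metadata files.

```python
from pathlib import Path


def paper_paths(root: Path, slug: str, year: int) -> dict[str, Path]:
    """Return hypothetical locations for one paper under the project layout."""
    return {
        "pdf": root / "papers" / "by_year" / str(year) / f"{slug}.pdf",
        "metadata": root / "papers" / "metadata" / f"{slug}.json",
        "note": root / "notes" / f"{slug}.md",
    }


paths = paper_paths(Path("sora"), "paper-title-author-2024", 2024)
for kind, path in paths.items():
    # as_posix() keeps the printed form identical across operating systems.
    print(kind, path.as_posix())
```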

Configuration

ArXiv Categories

Default categories (can be modified in config/settings.py):

cs.AI (Artificial Intelligence)
cs.LG (Machine Learning)
cs.NE (Neural and Evolutionary Computing)
stat.ML (Statistics/Machine Learning)
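
For reference, a per-category request against the public arXiv API could be built as sketched below. The `ARXIV_CATEGORIES` name and the `category_query_url` helper are hypothetical and may not match what config/settings.py actually defines; the endpoint and the `search_query`, `sortBy`, `sortOrder`, and `max_results` parameters are those of the arXiv API itself.

```python
from urllib.parse import urlencode

# Hypothetical settings constant; the real name in config/settings.py may differ.
ARXIV_CATEGORIES = {
    "cs.AI": "Artificial Intelligence",
    "cs.LG": "Machine Learning",
    "cs.NE": "Neural and Evolutionary Computing",
    "stat.ML": "Machine Learning (Statistics)",
}

ARXIV_API_URL = "http://export.arxiv.org/api/query"


def category_query_url(category: str, max_results: int) -> str:
    """Build an arXiv API URL for the newest papers in one category."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API_URL}?{urlencode(params)}"


# One URL per category, mirroring "--get N: Collect N papers per category".
for category in ARXIV_CATEGORIES:
    print(category_query_url(category, max_results=5))
```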

Zotero Integration

Required environment variables:

ZOTERO_LIBRARY_ID
ZOTERO_API_KEY
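
A small startup check can catch a missing credential before any Zotero request is made. The sketch below is an assumption about how such a check might look, not SORA's own code: `load_zotero_config` is a hypothetical helper, and it assumes the values from .env have already been loaded into the process environment.

```python
import os
from collections.abc import Mapping

REQUIRED_VARS = ("ZOTERO_LIBRARY_ID", "ZOTERO_API_KEY")


def load_zotero_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the Zotero settings, raising if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}


# Placeholder values passed explicitly so the example runs anywhere.
config = load_zotero_config(
    {"ZOTERO_LIBRARY_ID": "0000000", "ZOTERO_API_KEY": "placeholder-key"}
)
print(sorted(config))
# ['ZOTERO_API_KEY', 'ZOTERO_LIBRARY_ID']
```

Accepting the environment as a parameter keeps the check testable without modifying `os.environ`.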

Development

Running Tests

pytest tests/

Code Style

This project uses ruff for linting:

ruff check .

Dependencies

Python ≥ 3.10
Key packages:
- arxiv
- pyzotero
- sqlalchemy
- aiohttp
- spacy
- scikit-learn
- networkx

Contributing

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

ArXiv API for providing access to research papers
Zotero for reference management capabilities
Obsidian for knowledge management features

Contact

Aishik S. - @xcellect
