A simple Flask-based web GUI for local LLM inference, using Ollama for model serving. This project is currently in the Alpha phase and open to contributions. Created by @qusaismael.
- Features
- System Requirements & Recommendations
- Installation
- Usage
- Security Notice
- Troubleshooting
- Project Status
- Roadmap
- Contributing
- License
- Acknowledgments
- References
## Features

- Multiple Model Support: Easily switch between different local LLM models (e.g., `deepseek-r1`, `qwen2.5`, `codellama`, etc.)
- Streaming Responses: See tokens appear in real time using server-sent events (SSE); a minimal sketch of the idea follows this list
- Markdown and Code Block Rendering: Code blocks with syntax highlighting and copy-to-clipboard
- Raw Output Toggle: Debug with raw text output visibility
- Cross-Platform: Works on Windows, Linux, and macOS
- Keyboard Shortcuts:
  - `Shift+Enter`: New line
  - `Enter`: Send message
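To give a sense of how the streaming feature works, here is a minimal, illustrative Flask SSE sketch. It is not the project's actual `app.py`: the `/stream` endpoint name and request fields are assumptions, and it relies on Ollama's default HTTP API at `localhost:11434`.

```python
# Minimal SSE streaming sketch (illustrative only, not the project's app.py).
import json

import requests
from flask import Flask, Response, request

app = Flask(__name__)

@app.route("/stream", methods=["POST"])  # hypothetical endpoint name
def stream():
    prompt = request.json["prompt"]

    def generate():
        # Ollama's /api/generate streams newline-delimited JSON objects.
        with requests.post(
            "http://localhost:11434/api/generate",
            json={"model": "deepseek-r1", "prompt": prompt},
            stream=True,
        ) as r:
            for line in r.iter_lines():
                if line:
                    token = json.loads(line).get("response", "")
                    yield f"data: {json.dumps(token)}\n\n"  # SSE frame format

    return Response(generate(), mimetype="text/event-stream")
```

On the browser side, the frontend reads each `data:` frame from the response stream and appends tokens to the chat view as they arrive.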
## System Requirements & Recommendations

- Python 3.7+: Required for Flask compatibility
- pip/venv: For dependency management and environment isolation
- ollama: Installation required. Verify installation: `ollama --version`
- Hardware:
  - Minimum: 8GB RAM (for smaller models)
  - Recommended: 16GB+ RAM + NVIDIA GPU (for larger models)
  - Disk Space: 10GB+ for model storage
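If you want to sanity-check these prerequisites before installing, a short script along the following lines works (a rough sketch; it only checks the Python version and that the `ollama` CLI is on `PATH`):

```python
# Prerequisite check (sketch): Python version and Ollama CLI availability.
import shutil
import subprocess
import sys

if sys.version_info < (3, 7):
    sys.exit("Python 3.7+ is required for Flask compatibility")

if shutil.which("ollama") is None:
    sys.exit("ollama not found on PATH; install it before continuing")

# Equivalent to running `ollama --version` manually.
version = subprocess.run(["ollama", "--version"], capture_output=True, text=True)
print(version.stdout.strip() or version.stderr.strip())
```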
## Installation

1. Clone Repository

   ```bash
   git clone https://github.com/qusaismael/localllm.git
   cd localllm
   ```

2. Set Up Virtual Environment

   ```bash
   # Linux/macOS
   python3 -m venv venv
   source venv/bin/activate

   # Windows
   python -m venv venv
   venv\Scripts\activate
   ```

3. Install Dependencies

   ```bash
   pip install -r requirements.txt
   ```

4. Configure Ollama

   - Ensure Ollama is running:

     ```bash
     ollama serve
     ```

   - Download models first:

     ```bash
     ollama pull deepseek-r1:14b
     ```

5. Start Server

   ```bash
   python app.py
   ```

   Access at `http://localhost:5000`
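If the UI comes up but no models appear, you can confirm that Ollama's API is reachable and that at least one model has been pulled. The check below assumes Ollama's default address (`http://localhost:11434`); adjust it if your setup differs.

```python
# Optional check (sketch): list models known to the local Ollama server.
import requests

resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("Available models:", models if models else "none (run `ollama pull <model-name>` first)")
```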
## Usage

1. First-Time Setup
   - Select a model from the available options
   - If models aren't listed, ensure they're downloaded via Ollama

2. Basic Operations
   - Type a prompt and press Enter to send
   - Toggle raw output for debugging
   - Copy code blocks with one click
## Security Notice

- Default binding: `0.0.0.0` (accessible on your network)
- Not recommended for public internet exposure
- No authentication layer implemented
- Use firewall rules to restrict access if needed
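If you only need access from the same machine, one option is to bind the app to the loopback interface instead. Assuming `app.py` starts Flask with the standard `app.run(...)` call (an assumption, not confirmed here), the change looks roughly like this:

```python
# Sketch of a possible change in app.py: serve on loopback only.
if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)  # local-only, instead of host="0.0.0.0"
```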
## Troubleshooting

Common Issues:

- "Model not found" error

  ```bash
  ollama pull <model-name>
  ```

- Port conflict: Modify the `PORT` variable in `app.py` (see the example after this list)

- Slow responses
  - Try smaller models first
  - Check system resource usage
  - Ensure GPU acceleration is enabled if available

- Windows path issues: Update `OLLAMA_PATH` in `app.py` to your installation path
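For reference, editing the two settings mentioned above might look like the following; the values are examples only, so check `app.py` for the actual defaults on your system.

```python
# Illustrative edits in app.py (example values, not the shipped defaults).
PORT = 5001  # switch to a free port if 5000 is already taken
OLLAMA_PATH = r"C:\Users\<you>\AppData\Local\Programs\Ollama\ollama.exe"  # point at your Ollama install
```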
## Project Status

Alpha Release. Current version: 0.1.0

Known Limitations:
- No conversation history
- Basic error handling
- Limited model configuration
## Roadmap

- Conversation history support
- Model download UI
- Docker support
- System resource monitoring
## Contributing

Welcome! Please follow these steps:

1. Fork the repository
2. Create a feature branch
3. Submit a PR with a description

Development Setup:

```bash
pip install -r requirements-dev.txt
pre-commit install
```

Guidelines:
- Follow PEP 8 style
- Add tests for new features
- Update documentation accordingly
## License

MIT License. See LICENSE for details.
## Acknowledgments

Created by @qusaismael

Open Source • Contributions Welcome!