This project is inspired by the groundbreaking work of Kimi k1.5: Scaling Reinforcement Learning with LLMs.
TITAN-RL is a distributed reinforcement learning framework that separates policy rollout, experience storage, and training into independent microservices. This design enables flexible scaling and efficient resource utilization.
```mermaid
graph TD
    A[Training Service] -->|Policy Updates| B[Rollout Service]
    B -->|Experiences| C[Buffer Service]
    C -->|Sample Batch| A
    B -->|Interact| D[Environment]
    D -->|State/Reward| B
    A -->|Metrics| E[Weights & Biases]

    style A fill:#f96,stroke:#333,stroke-width:2px
    style B fill:#9cf,stroke:#333,stroke-width:2px
    style C fill:#9f9,stroke:#333,stroke-width:2px
    style D fill:#fc9,stroke:#333,stroke-width:2px
    style E fill:#c9f,stroke:#333,stroke-width:2px
```
The system consists of three main components:

- **Rollout Service**: Handles environment interactions and experience collection (see the collection-loop sketch below)
  - Multiple workers can run in parallel
  - Implements epsilon-greedy exploration
  - Computes trajectory-level statistics
- **Buffer Service**: Manages experience replay storage (see the buffer sketch further below)
  - Thread-safe experience storage
  - Efficient batch sampling
  - Configurable buffer size
- **Training Service**: Coordinates the training process
  - Implements the DQN algorithm
  - Manages policy updates
  - Tracks training metrics via wandb
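For intuition, the Rollout Service's epsilon-greedy collection loop might look roughly like the sketch below. This is an illustration only: it runs the environment locally, omits the gRPC layer and trajectory statistics, and the `collect_episode` name is an assumption rather than part of the TITAN-RL API.

```python
# Illustrative epsilon-greedy collection loop, shown without the gRPC layer.
# Function and variable names are assumptions, not TITAN-RL's rollout worker API.
import random

import gymnasium as gym
import torch


def collect_episode(policy, env_name="CartPole-v1", epsilon=0.1):
    """Run one episode and return (state, action, reward, next_state, done) tuples."""
    env = gym.make(env_name)
    state, _ = env.reset()
    transitions, done = [], False

    while not done:
        if random.random() < epsilon:
            action = env.action_space.sample()            # explore
        else:
            with torch.no_grad():                          # exploit the current policy
                q_values = policy(torch.as_tensor(state, dtype=torch.float32))
                action = int(q_values.argmax().item())

        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        transitions.append((state, action, reward, next_state, done))
        state = next_state

    env.close()
    return transitions
```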
The components communicate via gRPC, allowing for distributed deployment and easy scaling.
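On the other end of that channel, the Buffer Service keeps a bounded, thread-safe replay store that the trainer samples batches from. A minimal sketch of that idea (class and method names are assumptions, not the actual service implementation):

```python
# Minimal thread-safe replay buffer sketch: bounded storage plus uniform batch
# sampling. Names and structure are illustrative, not TITAN-RL's Buffer Service.
import random
import threading
from collections import deque


class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self._buffer = deque(maxlen=capacity)  # oldest experiences are evicted automatically
        self._lock = threading.Lock()          # guards concurrent adds from rollout workers

    def add(self, transition):
        with self._lock:
            self._buffer.append(transition)

    def sample(self, batch_size):
        with self._lock:
            return random.sample(self._buffer, batch_size)

    def __len__(self):
        with self._lock:
            return len(self._buffer)
```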
Key features:

- Distributed architecture with gRPC communication
- Modular design for easy algorithm implementation
- Built-in support for Weights & Biases logging
- Configurable via YAML files
- Visualization tools for trained agents
- Multi-threaded experience collection and training
```bash
# Clone the repository
git clone https://github.com/yourusername/titan-rl.git
cd titan-rl

# Install dependencies
pip install -r requirements.txt

# Generate protobuf code
bash generate_proto.sh
```
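`generate_proto.sh` is assumed to wrap the standard gRPC code generator; an equivalent call from Python looks roughly like this, with `protos/titan.proto` as a placeholder for the repository's actual `.proto` files:

```python
# Equivalent of generate_proto.sh run from Python, assuming grpcio-tools is installed.
# "protos/titan.proto" is a placeholder path, not necessarily the real file name.
from grpc_tools import protoc

protoc.main([
    "grpc_tools.protoc",
    "-Iprotos",                # directory containing the .proto definitions (assumed)
    "--python_out=.",          # generated message classes
    "--grpc_python_out=.",     # generated service stubs
    "protos/titan.proto",      # placeholder path
])
```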
To run the system:

- Start the services:

  ```bash
  bash run.sh
  ```

  This will:
  - Launch the rollout server
  - Start the buffer server
  - Begin the main training loop

- Monitor training:
  - Visit your Weights & Biases dashboard to view training metrics
  - Check terminal output for episode rewards and training status

- Visualize a trained agent (sketched below):

  ```bash
  python visualize.py --model_path checkpoint.pth
  ```
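Under the hood, a visualization script of this kind typically loads the checkpoint and steps the environment with rendering enabled. A rough sketch (the checkpoint layout and greedy action selection are assumptions, not necessarily what `visualize.py` does):

```python
# Hypothetical sketch of a visualization loop. The checkpoint format (a raw
# state_dict) and greedy action selection are assumptions; visualize.py may differ.
import gymnasium as gym
import torch

from models.simple_policy import SimplePolicy  # path taken from the example config

policy = SimplePolicy(state_dim=4, action_dim=2)
policy.load_state_dict(torch.load("checkpoint.pth", map_location="cpu"))
policy.eval()

env = gym.make("CartPole-v1", render_mode="human")
state, _ = env.reset()
done = False
while not done:
    with torch.no_grad():
        action = int(policy(torch.as_tensor(state, dtype=torch.float32)).argmax().item())
    state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
env.close()
```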
The system is configured via YAML files in the `config` directory. Example configuration:
```yaml
rollout_server:
  num_workers: 1
  env_name: "CartPole-v1"
  epsilon: 0.1

policy:
  path: "models.simple_policy.SimplePolicy"
  kwargs:
    state_dim: 4
    action_dim: 2

train:
  lr: 0.001
  gamma: 0.99
  batch_size: 64
  min_buffer_size: 1000
```
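The `policy.path` entry is a dotted import path and `kwargs` are passed to that class's constructor. A sketch of how such a config might be consumed (the loading code is an assumption, not TITAN-RL's internals):

```python
# Sketch of consuming the example config: load the YAML, import the policy class
# from its dotted path, and instantiate it with the configured kwargs. This is an
# assumption about how the config is used, not TITAN-RL's actual loader.
import importlib

import yaml

with open("config/config.yaml") as f:       # file name is a placeholder
    cfg = yaml.safe_load(f)

module_path, class_name = cfg["policy"]["path"].rsplit(".", 1)
policy_cls = getattr(importlib.import_module(module_path), class_name)
policy = policy_cls(**cfg["policy"]["kwargs"])   # e.g. SimplePolicy(state_dim=4, action_dim=2)

gamma = cfg["train"]["gamma"]               # discount factor used by the trainer
batch_size = cfg["train"]["batch_size"]     # minibatch size sampled from the buffer
```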
To extend TITAN-RL:

- **Adding New Algorithms**:
  - Create a new trainer class in `trainers/`
  - Implement the required interface methods
  - Update the configuration to use the new trainer
- **Custom Environments**:
  - TITAN-RL supports any Gymnasium-compatible environment
  - Update the environment name in the configuration
- **Custom Policies** (see the sketch below):
  - Add new policy classes in `models/`
  - Implement the required PyTorch interface
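As an illustration of the last point, a minimal Q-network in `models/` could look like the sketch below; it mirrors the `SimplePolicy` referenced in the example configuration, but the exact interface TITAN-RL expects should be checked against the existing classes. The file and class names here are hypothetical.

```python
# models/my_policy.py -- a minimal custom Q-network sketch (hypothetical file).
# Check the existing classes in models/ for any extra methods TITAN-RL expects.
import torch
import torch.nn as nn


class MyPolicy(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),  # one Q-value per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)
```

Point the configuration at the new class with `path: "models.my_policy.MyPolicy"` and matching `kwargs`.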
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with PyTorch and Gymnasium
- Monitoring with Weights & Biases
- Distributed communication via gRPC