Dojo Subnet

Website · Docs · Whitepaper · HuggingFace · Getting Started · Twitter

Table of Contents

Introduction
- Benefits to participants contributing through Dojo
Prerequisites
- Validator
  - Required Software
  - System Requirements
- Miner
  - Required Software
  - System Requirements
Getting Started
- For Miners
- For Validators
Auto-updater
For Dojo developers
- Dataset Extraction
License

Introduction

The development of open-source AI models is often hindered by the lack of high-quality human-generated datasets. Closed-source AI developers, aiming to reduce data collection costs, have created significant social and economic equity challenges, with workers being paid less than $2 per hour for mentally and emotionally taxing tasks. The benefits of these models have been concentrated among a select few, exacerbating inequalities among contributors.

Enter Tensorplex Dojo Subnet — an open platform designed to crowdsource high-quality human-generated datasets. Powered by Bittensor, the Dojo Subnet addresses these challenges by allowing anyone to earn TAO by labeling data or providing human-preference data. This approach democratizes the collection of human preference data, addressing existing equity issues and paving the way for more inclusive and ethical AI development.

Key Features

To ensure the quality and integrity of the data collected, Dojo introduces several novel features:

Synthetic Task Generation: Unique tasks are generated by state-of-the-art Large Language Models (LLMs) to collect human feedback data, which can be used to improve open-source models.
Synthetic Ground Truth Validation Mechanism: Validators can synthetically generate partial ground truths, allowing them to determine the quality of responses provided by individual participants.
Obfuscation: Techniques to prevent sybil attacks and ensure contributions are genuinely human.

Use Cases

The Dojo Subnet offers multiple use cases:

Synthetically Generated Tasks: These tasks can bootstrap the human participant pool and can be used for model training or fine-tuning from the outset.
Cross-subnet Validation: Validators can use responses to rate the quality of outputs across other Bittensor subnets, thereby incentivizing miners to improve their performance.
External Data Acquisition: Entities outside the Bittensor ecosystem can tap into the subnet to acquire high-quality human-generated data.

By creating an open platform for gathering human-generated datasets, Tensorplex Dojo Subnet aims to solve the challenges of quality control, human verification, and sybil attack prevention while promoting a more equitable distribution of benefits in AI development.

Benefits to participants contributing through Dojo

Open platform: Anyone capable can contribute, ensuring broad participation and diverse data collection.
Flexible work environment: Participants enjoy the freedom to work on tasks at their convenience from any location.
Quick payment: Rewards are streamed consistently to participants, as long as they complete sufficient tasks within a stipulated deadline and have them accepted by the subnet.

Prerequisites

Validator

Required Software

pm2
docker
GNU make
OpenRouter API key

System Requirements

4 cores
16 GB RAM
2 TB SSD

Miner

Required Software

pm2
docker
GNU make

System Requirements

2 cores
8 GB RAM
32 GB SSD, or 1 TB SSD if decentralised
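
The core and RAM minimums listed above can be checked before installing anything. The following is a minimal sketch, not part of the Dojo tooling: `DOJO_ROLE` and `meets_minimum` are hypothetical names chosen for this example, the probes assume a Linux host with `/proc/meminfo`, and disk size is not checked.

```shell
#!/bin/sh
# Return 0 if the observed value ($1) meets the minimum ($2), non-zero otherwise.
meets_minimum() {
  [ "$1" -ge "$2" ]
}

# Minimums copied from the lists above. DOJO_ROLE is a hypothetical variable
# used only by this sketch; it defaults to "miner".
role="${DOJO_ROLE:-miner}"
case "$role" in
  validator) min_cores=4; min_ram_gb=16 ;;
  *)         min_cores=2; min_ram_gb=8 ;;
esac

# Linux-oriented probes; other systems need different commands.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 0)
ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
# Round up to whole GiB, since MemTotal is slightly below the installed amount.
ram_gb=$(( (${ram_kb:-0} + 1048575) / 1048576 ))

if meets_minimum "$cores" "$min_cores" && meets_minimum "$ram_gb" "$min_ram_gb"; then
  echo "$role: $cores cores / $ram_gb GB RAM meets the listed minimums"
else
  echo "$role: $cores cores / $ram_gb GB RAM is below the listed minimums" >&2
fi
```

Run it with `DOJO_ROLE=validator` to check against the validator minimums instead.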

Getting Started

Important

This setup guide uses specific tools to ensure a smooth installation process:

fnm for managing Node.js & npm versions (required for PM2)
Docker and Docker Compose
Conda, for running the auto-updater for validators or miners. Conda is recommended, but you may use any Python environment manager of your choice.

Please ensure these prerequisites are installed on your system before proceeding with the installation steps. Both validators and miners need them.
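
A quick way to see which of these tools are still missing is to probe `PATH` for each one. This is a small sketch under assumptions: `missing_tools` is a hypothetical helper, and the tool list (fnm, docker, make) is taken from this guide and should be adjusted to your role. Conda is left out because it is only recommended.

```shell
#!/bin/sh
# Print each named tool that is not found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools named in this guide; edit the list as needed.
missing=$(missing_tools fnm docker make)
if [ -n "$missing" ]; then
  printf 'Missing prerequisites:\n%s\n' "$missing" >&2
else
  echo "All listed prerequisites found."
fi
```

The check only confirms that a command with that name exists; it does not verify versions or that the Docker daemon is running.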

Clone the project, set up and configure python virtual environment

# In this guide, we will utilize the ~/opt directory as our preferred location.
# Create it first if it does not exist yet.
mkdir -p ~/opt && cd ~/opt

# Clone the project
git clone https://github.com/tensorplex-labs/dojo.git
cd dojo/

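# set up a conda env using miniconda, and follow the setup
# (the target directory must exist before wget can write into it)
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
# make conda available in the current shell, then verify the installation
source ~/miniconda3/bin/activate
conda info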
# create python env and install dependencies
conda create -n dojo_py311 python=3.11
conda activate dojo_py311
make install

Install PM2. One way is through fnm.

# for linux, a convenience script is available
./dojo/scripts/setup/install_pm2.sh

# for mac/linux (if you do not trust the bash script)
curl -fsSL https://fnm.vercel.app/install | bash
# for windows, choose 1 of the following,
# based on https://github.com/Schniz/fnm?#manually
cargo install fnm
choco install fnm
scoop install fnm
winget install Schniz.fnm

# run any post-install shell setup scripts
# based on https://github.com/Schniz/fnm?#shell-setup

# assuming we are using zsh
echo 'eval "$(fnm env --use-on-cd --shell zsh)"' >> ~/.zshrc
# you can tell what shell you're using by running:
echo $0

# verify fnm installation
fnm --version

# get npm & node, and verify npm installation
fnm install lts/iron && npm --version

# install pm2 and verify installation
npm install -g pm2 && pm2 --version
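
The shell-setup step above assumes zsh. As a sketch for readers on bash, the helper below prints which rc file to edit and which line to append, based on the shell name that `echo $0` reports. `fnm_shell_setup` is a hypothetical name, and only zsh and bash are covered; for other shells, refer to the fnm shell-setup documentation linked above.

```shell
#!/bin/sh
# Print "<rc file>|<line to append>" for a shell name such as zsh, -bash or /bin/zsh.
fnm_shell_setup() {
  name=${1##*/}   # strip any leading path, e.g. /bin/zsh -> zsh
  name=${name#-}  # login shells report themselves as -zsh or -bash
  case "$name" in
    zsh|bash)
      # The format string is single-quoted, so $(...) is printed literally.
      printf '%s|eval "$(fnm env --use-on-cd --shell %s)"\n' "$HOME/.${name}rc" "$name"
      ;;
    *)
      echo "unsupported shell: $name (see the fnm shell-setup docs)" >&2
      return 1
      ;;
  esac
}

# Show the suggestion for the current login shell without modifying any file.
fnm_shell_setup "${SHELL:-bash}" || true
```

The helper only prints the suggestion, so you can review the line before appending it to your rc file.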

Install Docker & Docker Compose

For Docker installation, see https://docs.docker.com/engine/install/ for instructions

For Docker Compose installation, see https://docs.docker.com/compose/install/linux for instructions

# For linux, a convenience script is available
./dojo/scripts/setup/install_docker.sh

# Verify both docker and docker compose are installed
docker --version
docker compose version

# Validators, install docker loki plugin
docker plugin install grafana/loki-docker-driver:3.3.2-amd64 --alias loki --grant-all-permissions

Start local subtensor node (optional)

The included subtensor service exposes only port 30333 (p2p) to the public. Ports 9933 and 9944 are accessible only inside the Docker network. Feel free to change the configuration if required.

# Mainnet
make subtensor-mainnet

# Testnet
make subtensor-testnet

Create your wallets if they aren't created yet

# run btcli
make btcli
# create your wallets
btcli wallet new_coldkey
btcli wallet new_hotkey

Get some TAO and ensure you have enough TAO to cover the registration cost

# for Testnet
# If using local subtensor, use ws://mainnet-lite:9944 (mainnet) or ws://testnet-lite:9944 (testnet)
btcli s list --subtensor.network test

# Output from the `btcli s list ...` command
NETUID    N    MAX_N   EMISSION  TEMPO  RECYCLE        POW       SUDO
 0      128   128.00   0.00%     10    τ1.00000     10.00 M     5C4hrfjw9DjXZTzV3MwzrrAr9P1MJhSrvWGWqi1eSuyUpnhM
...
 98     17    256.00   0.00%     360   τ0.00001  18446744.07 T  5GTAfh3YTcokxWdGc3DgLV5y3hHB4Bs5PQGqw9fEn1WrcwWP
...

Note

The "RECYCLE" column shows the subnet registration cost.
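
To read the registration cost for a single subnet out of that listing, the table can be filtered by NETUID. This is a sketch that assumes the column layout shown in the sample output above, with RECYCLE as the sixth whitespace-separated field; other btcli versions may print a different layout. `recycle_cost` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the RECYCLE value (without the τ prefix) for one netuid.
# Reads `btcli s list` output on stdin; assumes RECYCLE is the 6th column.
recycle_cost() {
  awk -v id="$1" '$1 == id { print $6; exit }' | sed 's/^τ//'
}

# Demo using the sample rows from this guide.
recycle_cost 98 <<'EOF'
NETUID    N    MAX_N   EMISSION  TEMPO  RECYCLE        POW       SUDO
 0      128   128.00   0.00%     10    τ1.00000     10.00 M     5C4hrfjw9DjXZTzV3MwzrrAr9P1MJhSrvWGWqi1eSuyUpnhM
 98     17    256.00   0.00%     360   τ0.00001  18446744.07 T  5GTAfh3YTcokxWdGc3DgLV5y3hHB4Bs5PQGqw9fEn1WrcwWP
EOF
# prints 0.00001
```

Against a live network, the same helper can be fed from the command shown earlier, for example `btcli s list --subtensor.network test | recycle_cost 98`.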

Register to our subnet

# run the dockerized btcli
make btcli
# register your wallet to our subnet
# If using local subtensor, use ws://mainnet-lite:9944 (mainnet) or ws://testnet-lite:9944 (testnet)

# Mainnet
btcli s register --wallet.name coldkey --wallet.hotkey hotkey --netuid 52 --subtensor.network finney
# Testnet
btcli s register --wallet.name coldkey --wallet.hotkey hotkey --netuid 98 --subtensor.network test
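
The two register commands differ only in the netuid and the network name. The sketch below captures that mapping and prints the command as a dry run instead of executing it. `dojo_network_args` and `print_register_cmd` are hypothetical helper names; the netuids, network names and flags are the ones used in the commands above.

```shell
#!/bin/sh
# Print "<netuid> <subtensor.network value>" for a network name.
dojo_network_args() {
  case "$1" in
    mainnet) echo "52 finney" ;;
    testnet) echo "98 test" ;;
    *) echo "unknown network: $1" >&2; return 1 ;;
  esac
}

# Dry run: print the register command rather than running it.
# Usage: print_register_cmd <mainnet|testnet> <coldkey name> <hotkey name>
print_register_cmd() {
  args=$(dojo_network_args "$1") || return 1
  netuid=${args%% *}    # text before the first space
  network=${args#* }    # text after the first space
  echo "btcli s register --wallet.name $2 --wallet.hotkey $3 --netuid $netuid --subtensor.network $network"
}

print_register_cmd testnet coldkey hotkey
# prints: btcli s register --wallet.name coldkey --wallet.hotkey hotkey --netuid 98 --subtensor.network test
```

Printing the command first makes it easy to confirm the netuid and network before spending TAO on registration.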

For Miners

For Validators

Auto-updater

Warning

Stop the pm2 process while you are modifying the validator or miner code. Otherwise the auto-updater will stash your changes before pulling from the remote origin, which can unexpectedly revert your code.

To start the auto-updater for validators or miners (strongly recommended):

Run the command inside the Python environment. If you haven't configured the Python environment yet, see Step 1 of Getting Started.

# activate python env
conda activate dojo_py311

# validator
pm2 start auto_update.py --name auto-update-validator --interpreter $(which python3) -- --service validator

# miner
pm2 start auto_update.py --name auto-update-miner-centralised --interpreter $(which python3) -- --service miner
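
Because the auto-updater stashes local changes, it can help to check for modified tracked files before starting it. The following is a sketch, not part of `auto_update.py`: `has_local_changes` is a hypothetical helper, and it assumes that only modifications to tracked files are at risk, which may not match exactly what the auto-updater stashes.

```shell
#!/bin/sh
# Return 0 if the git repository at $1 has modified tracked files.
# A path that is not a git repository is reported as having no changes.
has_local_changes() {
  [ -n "$(git -C "$1" status --porcelain --untracked-files=no 2>/dev/null)" ]
}

# Check the repository given as the first argument, or the current directory.
repo="${1:-.}"
if has_local_changes "$repo"; then
  echo "Uncommitted changes in $repo: commit or back them up before starting the auto-updater." >&2
else
  echo "No modified tracked files in $repo."
fi
```

Run it from the cloned `dojo/` directory before the `pm2 start` commands above.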

For Dojo developers

You most likely won't be running a dockerized version of the subnet code as you ship. Use the following guide to get up and running.

Get uv, miniconda, or whichever environment backend you prefer. Here, we'll assume you're using uv.

curl -LsSf https://astral.sh/uv/install.sh | sh

Make sure you have a python version >=3.10

uv python list

Create a virtualenv

# i'm using 3.11 here, but you may use any >=3.10 version
uv venv dojo_venv --python=$(uv python find 3.11)

Activate virtualenv

# follows python-venv syntax
source dojo_venv/bin/activate

Install our dependencies

# install dev dependencies
make install-dev
# install test dependencies
make install-test

Dataset Extraction

The dataset is delivered in several chunks, because MAX_CHUNK_SIZE_MB is currently set to 50 MB on the dataset service due to limitations on the load balancer. Use the following commands to download the chunks and combine them into a single dataset file:

aws s3 cp s3://amzn-s3-demo-bucket1/ <PATH_ON_LOCAL> --recursive --exclude "*" --include "hotkey_<vali_hotkey>_dataset_20250212*.jsonl"
cd <PATH_ON_LOCAL>
# merge all chunks into a single dataset file; the glob matches only the chunk
# files, so the combined file is not read back in if the command is re-run
cat hotkey_<vali_hotkey>_dataset_20250212*.jsonl > hotkey_<vali_hotkey>_dataset_combined.jsonl
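
Plain concatenation fuses two records onto one line if a chunk does not end with a newline. The sketch below merges chunks one by one and adds the missing newline where needed. `merge_chunks` is a hypothetical helper, not part of the Dojo tooling, and it assumes that the shell's sorted glob order is the intended chunk order, which the source does not state.

```shell
#!/bin/sh
# Merge chunk files into one JSONL file, keeping one record per line.
# Usage: merge_chunks <dir> <chunk_prefix> <output_file>
# The output file should not itself match <dir>/<chunk_prefix>*.jsonl.
merge_chunks() {
  dir=$1; prefix=$2; out=$3
  # Expand the chunk list; if nothing matches, the pattern stays literal.
  set -- "$dir"/"$prefix"*.jsonl
  [ -e "$1" ] || { echo "no chunks matching $prefix*.jsonl in $dir" >&2; return 1; }
  : > "$out"    # start from an empty output file
  for chunk in "$@"; do
    cat "$chunk" >> "$out"
    # Command substitution strips a trailing newline, so a non-empty result
    # means the chunk's last byte is not a newline: add one.
    [ -z "$(tail -c 1 "$chunk")" ] || echo >> "$out"
  done
}

# Example (hypothetical paths):
# merge_chunks ./chunks "hotkey_abc_dataset_20250212" ./combined.jsonl
```

Comparing `wc -l` on the combined file with the number of records you expect is a cheap sanity check after merging.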

License

This repository is licensed under the MIT License.

# The MIT License (MIT)
# Copyright © 2023 Yuma Rao

# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software,
# and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in all copies or substantial portions of
# the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO
# THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.

