This repository hosts the source code for the Healthi subnet, which operates atop the Bittensor network. The primary goal of this subnet is to utilize AI models for predictive diagnostics based on electronic health records (EHRs).
In the rapidly advancing field of healthcare technology, AI integration is transforming preventive medicine, particularly through predictive diagnostics. The increasing availability of patient data, especially EHRs, presents a significant opportunity to leverage AI for predicting health outcomes. This subnet on the Bittensor network rewards miners based on their AI models' performance in clinical prediction tasks, such as disease forecasting using EHRs. Our network aims to utilize these high-performing AI models developed by miners to improve patient outcomes, enhance healthcare delivery, and promote personalized clinical risk management.
This repository requires Python 3.10 or higher and Ubuntu 22.04/Debian 12.
We recommend creating a Python virtual environment (venv); using conda may cause errors.
Installation (omit the first line if Bittensor is already installed):
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/opentensor/bittensor/master/scripts/install.sh)"
$ sudo apt update && sudo apt install jq npm python3.10-dev python3.10-venv git && sudo npm install pm2 -g && pm2 update
$ git clone https://github.com/Healthi-Labs/healthi-subnet.git
$ cd healthi-subnet
$ python3 -m venv .venv
If you are not familiar with Bittensor, first create a wallet (a coldkey and a hotkey) and then register your hotkey on the subnet:
Mainnet:
$ btcli subnet register --netuid 34 --wallet.name {cold_wallet_name} --wallet.hotkey {hot_wallet_name}
Testnet:
$ btcli subnet register --netuid 133 --wallet.name {cold_wallet_name} --wallet.hotkey {hot_wallet_name} --subtensor.network test
Note
Validators must be able to connect to your miner over the internet, so the port you pass with --axon.port must be reachable on the virtual machine from the internet. To do this, either open the port in the firewall or configure port forwarding.
If you want to run on testnet, set netuid to 133, and add --validator_min_stake 0.0 --subtensor.network test.
Run miner:
$ cd healthi-subnet
$ source .venv/bin/activate
$ bash scripts/run.sh \
--name healthi_miner \
--install_only 0 \
--max_memory_restart 10G \
--branch main \
--netuid 34 \
--profile miner \
--wallet.name {cold_wallet_name} \
--wallet.hotkey {hot_wallet_name} \
--axon.port 12345
Run validator (the validator updates automatically):
$ cd healthi-subnet
$ source .venv/bin/activate
$ bash scripts/run.sh \
--name healthi_validator \
--install_only 0 \
--max_memory_restart 5G \
--netuid 34 \
--profile validator \
--wallet.name {cold_wallet_name} \
--wallet.hotkey {hot_wallet_name}
To verify that your miner is responding to queries from validators, run the following command. If you started the miner recently, allow a few minutes first.
$ grep "SUCCESS" ~/.pm2/logs/healthi-miner-out.log
How does rewarding work?
Miners are rewarded based on the accuracy of their predictions for future health conditions derived from analyses of electronic health record (EHR) sequences. The top 20% of miners receive significantly higher rewards than the rest.
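This README does not state the exact scoring formula. The sketch below only illustrates the general idea of a tiered reward: miners are ranked by an accuracy score, and the top 20% get a boosted share before the weights are normalized. The function name `assign_weights`, the `boost` multiplier of 4.0, and the assumption that scores lie in [0, 1] are all hypothetical. None of them are taken from the Healthi validator code.

```python
def assign_weights(
    scores: dict[str, float], top_percent: int = 20, boost: float = 4.0
) -> dict[str, float]:
    """Turn per-miner accuracy scores (assumed in [0, 1]) into normalized weights.

    Hypothetical sketch: `boost` is an illustrative multiplier for the top tier,
    not a value from the Healthi validator.
    """
    if not scores:
        return {}
    # Rank miners from best to worst score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Size of the top tier; integer arithmetic avoids float rounding, minimum 1.
    n_top = max(1, len(ranked) * top_percent // 100)
    top = set(ranked[:n_top])
    # Top-tier miners get their score multiplied by `boost`; others keep theirs.
    raw = {uid: scores[uid] * (boost if uid in top else 1.0) for uid in ranked}
    total = sum(raw.values())
    if total == 0:
        # No miner scored above zero, so nobody receives weight.
        return {uid: 0.0 for uid in ranked}
    return {uid: w / total for uid, w in raw.items()}


weights = assign_weights({"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6, "e": 0.5})
```

With five miners, only "a" falls in the top 20%. Its raw score is multiplied by 4 before normalization, so it ends up with a disproportionately large share of the total weight.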
What is the expected data input and output as a miner?
As a miner, your input will be a sequence of electronic health records (EHRs) encoded with International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10) codes, written without the decimal point (e.g. D693 for D69.3). In the following example, the patient visited the hospital twice and received two diagnoses at each visit:
Example input:
[['D693', 'I10'], ['Z966', 'A047']]
The current disease prediction task is to estimate the probability of developing each of the following 14 diseases within one year. The output should be an array or list of 14 probabilities, in the order listed below:
- Hypertension
- Diabetes
- Asthma
- Chronic Obstructive Pulmonary Disease
- Atrial Fibrillation
- Coronary Heart Disease
- Stroke
- Anxiety and Depression
- Dementia
- Myocardial Infarction
- Chronic Kidney Disease
- Thyroid Disorder
- Heart Failure
- Cancer
Example output:
[0.0027342219837009907, 0.012263162061572075, 0.01795087940990925, 0.016055596992373466, 0.010267915204167366, 0.0002267731324536726, 0.02317667566239834, 0.39082783460617065, 0.017462262883782387, 0.033581722527742386, 0.014757075347006321, 0.03425902500748634, 0.015123098157346249, 0.028889883309602737]
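The values in the example output do not sum to 1, so each entry appears to be an independent per-disease probability rather than a distribution over the 14 diseases. The sketch below shows one way to check the input shape and to assemble the output in the required order. The helper names `validate_input` and `to_output` are hypothetical. It also assumes a plain Python list is what your miner returns; check the repository for the actual request and response fields.

```python
# Disease order copied from the list above; index i of the output is disease i.
DISEASES = [
    "Hypertension",
    "Diabetes",
    "Asthma",
    "Chronic Obstructive Pulmonary Disease",
    "Atrial Fibrillation",
    "Coronary Heart Disease",
    "Stroke",
    "Anxiety and Depression",
    "Dementia",
    "Myocardial Infarction",
    "Chronic Kidney Disease",
    "Thyroid Disorder",
    "Heart Failure",
    "Cancer",
]


def validate_input(visits: list[list[str]]) -> None:
    """Check that the input is a list of visits, each a list of ICD-10 code strings."""
    if not isinstance(visits, list) or not all(isinstance(v, list) for v in visits):
        raise ValueError("expected a list of visits, each a list of codes")
    for visit in visits:
        for code in visit:
            if not isinstance(code, str) or not code:
                raise ValueError(f"invalid ICD-10 code: {code!r}")


def to_output(risks: dict[str, float]) -> list[float]:
    """Arrange per-disease probabilities into the ordered 14-element output list."""
    missing = [d for d in DISEASES if d not in risks]
    if missing:
        raise ValueError(f"missing probabilities for: {missing}")
    out = [float(risks[d]) for d in DISEASES]
    for name, p in zip(DISEASES, out):
        # Each entry is checked on its own; entries are not required to sum to 1.
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name}: probability {p} is outside [0, 1]")
    return out
```

Keying the probabilities by disease name, instead of building the list by hand, makes it harder to return values in the wrong order.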
Compute Requirements
The computational requirements for participating as a miner or validator in our subnet are minimal. Our subnet does not require a GPU and runs well on a virtual private server (VPS) with 4 virtual CPUs and 16 GB of RAM. Miners may use GPUs, but higher rewards in our subnet depend more on building better predictive models than on computational power.
Where does the data come from, and how do we prevent data exploitation?
Our data originates from authentic inpatient records, which are anonymized using Generative Adversarial Networks (GANs) to preserve the original data distributions while ensuring patient confidentiality. To prevent data exploitation and enhance security, our API continuously generates unique, synthetic electronic health record sequences for validators, protecting against replay attacks.
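Replay protection works because every query is new. If a sequence were ever repeated, a miner could answer it from a cache of earlier queries instead of running a model. The sketch below illustrates only that uniqueness property. It fingerprints each sequence and refuses to issue one that has been seen before. `fingerprint` and `UniqueSequenceGuard` are hypothetical names; this is not how the Healthi data API is implemented.

```python
import hashlib
import json


def fingerprint(visits: list[list[str]]) -> str:
    """SHA-256 of a canonical JSON encoding; visit order and code order both matter."""
    canonical = json.dumps(visits, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class UniqueSequenceGuard:
    """Hypothetical helper that refuses to hand out the same EHR sequence twice."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def try_issue(self, visits: list[list[str]]) -> bool:
        """Return True and record the sequence if it is new, False if it repeats."""
        fp = fingerprint(visits)
        if fp in self._seen:
            return False
        self._seen.add(fp)
        return True
```

A generator built this way would keep sampling until `try_issue` returns True. Storing fixed-size hashes instead of full sequences keeps memory use small as the number of issued queries grows.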
How can I train my model?
Our base model is a small Transformer with a custom tokenizer for electronic health record (EHR) data. We recommend that miners train their models using our tokenizer. Training data is available at this link. Miners are also encouraged to train on EHR data from their own sources.
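The project's own tokenizer is not described here. As a rough illustration of how visit sequences can be turned into Transformer input, the sketch below uses a simple hypothetical scheme. Each ICD-10 code becomes one token, a `[SEP]` token closes each visit, a `[CLS]` token starts the sequence, and the result is padded to a fixed length. The class name `VisitTokenizer` and the special-token set are assumptions. If you follow the recommendation above, use the Healthi tokenizer instead.

```python
from collections import Counter

# Hypothetical special tokens; ids 0-3 are reserved for them.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]"]


class VisitTokenizer:
    """Hypothetical EHR tokenizer: one token per ICD-10 code, [SEP] after each visit."""

    def __init__(self, min_count: int = 1) -> None:
        self.min_count = min_count
        self.vocab: dict[str, int] = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}

    def fit(self, patients: list[list[list[str]]]) -> "VisitTokenizer":
        """Build the vocabulary from training patients (each a list of visits)."""
        counts = Counter(code for visits in patients for visit in visits for code in visit)
        # Sorted insertion gives a deterministic code-to-id mapping.
        for code, n in sorted(counts.items()):
            if n >= self.min_count and code not in self.vocab:
                self.vocab[code] = len(self.vocab)
        return self

    def encode(self, visits: list[list[str]], max_len: int = 64) -> list[int]:
        """Map one patient's visits to a fixed-length list of token ids."""
        tokens = ["[CLS]"]
        for visit in visits:
            tokens.extend(visit)
            tokens.append("[SEP]")
        unk = self.vocab["[UNK]"]
        # Unseen codes map to [UNK]; long sequences are truncated to max_len.
        ids = [self.vocab.get(t, unk) for t in tokens][:max_len]
        # Right-pad with [PAD] so sequences can be batched.
        return ids + [self.vocab["[PAD]"]] * (max_len - len(ids))


tok = VisitTokenizer().fit([[["D693", "I10"], ["Z966", "A047"]]])
ids = tok.encode([["D693", "I10"], ["Z966", "A047"]], max_len=8)
```

Marking visit boundaries keeps the record's time structure in the flat token sequence, so the model can tell two codes from one visit apart from the same codes spread over two visits.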
Validator minimum staking requirements
Validators need to stake at least 10,000 TAO on mainnet to query our data API. Testnet validators do not have access to the data API, but they can still obtain data locally for testing. Locally obtained data carries significantly less weight than data from the API.
How is the synthetic EHR data generator built?
The synthetic EHR data generator is built using the method described in a publication, which can be found here. We used datasets from both the US and the UK: the US hospital data comes from MIMIC-IV, and the UK data comes from THIN. Because health data is sensitive, models trained on the real EHR data cannot be made publicly available. This restriction prevents malicious parties from trying to reconstruct the original dataset, much as people attempt to reverse-engineer the training sets of large language models. As an example, you can find our data generator trained on synthetic EHR data here.