
Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data (CHIL 2024)

Large language models (LLMs) are capable of many natural language tasks, yet they are far from perfect. In health applications, grounding and interpreting domain-specific and non-linguistic data is important. This paper investigates the capacity of LLMs to make inferences about health based on contextual information (e.g. user demographics, health knowledge) and physiological data (e.g. resting heart rate, sleep minutes). We present a comprehensive evaluation of 12 publicly accessible state-of-the-art LLMs with prompting and fine-tuning techniques on four public health datasets (PMData, LifeSnaps, GLOBEM, and AW_FB). Our experiments cover 10 consumer health prediction tasks in mental health, activity, metabolic, and sleep assessment. Our fine-tuned model, HealthAlpaca, exhibits performance comparable to much larger models (GPT-3.5, GPT-4, and Gemini-Pro), achieving the best or second-best performance on 7 out of 10 tasks. Ablation studies highlight the effectiveness of context enhancement strategies: notably, context enhancement can yield up to 23.8% improvement in performance. Constructing contextually rich prompts (combining user context, health knowledge, and temporal information) yields synergistic improvements, and the inclusion of health knowledge context in particular significantly enhances overall performance.
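As a minimal illustration of the contextually rich prompts described above (a sketch only; the function name, field names, and example values are our own assumptions, not the paper's code):

# Illustrative sketch: assembling user context, health knowledge, and
# temporal information with wearable sensor data into a single prompt.
def build_prompt(user_context, health_knowledge, temporal_info, sensor_data, question):
    # Each context type becomes one labeled line of the prompt.
    return "\n".join([
        f"User context: {user_context}",          # demographics
        f"Health knowledge: {health_knowledge}",  # domain facts relevant to the target
        f"Temporal context: {temporal_info}",     # recent trends in the signal
        f"Sensor data: {sensor_data}",            # wearable features
        f"Question: {question}",
    ])

print(build_prompt(
    user_context="45-year-old male",
    health_knowledge="A typical adult resting heart rate is 60-100 bpm.",
    temporal_info="Resting heart rate over the past 7 days: 62, 61, 63, 60, 64, 62, 61 bpm.",
    sensor_data="resting_heart_rate=62, sleep_minutes=410",
    question="Estimate this user's stress level on a scale of 1-5.",
))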


Quick Start

Create a new virtual environment, e.g. with conda:

~$ conda create -n healthllm "python>=3.9"

Activate the environment:

~$ conda activate healthllm

Install the required packages:

~$ pip install -r requirements.txt

Datasets

  1. PMData: https://datasets.simula.no/pmdata/
  2. LifeSnaps: https://github.com/Datalab-AUTH/LifeSnaps-EDA
  3. GLOBEM: https://physionet.org/content/globem/1.1/
  4. AW_FB: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ZS2Z2J

Fine-tune

~$ bash finetune.sh

Tip

Feel free to change the base model (--model) from medalpaca/medalpaca-{7,13}b to llama-{7,13}b-hf. If you need to change the training details, please refer to ./medalpaca/train.py
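For example, a run with the 7B base model might look like this (a sketch; it assumes finetune.sh forwards extra arguments to ./medalpaca/train.py, which is where the remaining training options live):

~$ bash finetune.sh --model medalpaca/medalpaca-7b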


Inference

~$ python3 inference.py --model {gpt-3.5, gpt-4, gemini-pro, healthalpaca}
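For example, to run inference with the fine-tuned HealthAlpaca model:

~$ python3 inference.py --model healthalpaca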

If our work is helpful to you, please cite our paper as:

@misc{kim2024healthllm,
      title={Health-LLM: Large Language Models for Health Prediction via Wearable Sensor Data}, 
      author={Yubin Kim and Xuhai Xu and Daniel McDuff and Cynthia Breazeal and Hae Won Park},
      year={2024},
      eprint={2401.06866},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
