
TOMG-Bench

[arXiv](https://arxiv.org/abs/2412.14642) · Hugging Face Datasets · Papers with Code

Text-based Open Molecule Generation Benchmark

Authors: Jiatong Li*, Junxian Li*, Yunqing Liu, Dongzhan Zhou, and Qing Li (* Equal Contribution)

Introduction

In this paper, we propose the Text-based Open Molecule Generation Benchmark (TOMG-Bench), the first benchmark to evaluate the open-domain molecule generation capability of LLMs. TOMG-Bench encompasses three major tasks: molecule editing (MolEdit), molecule optimization (MolOpt), and customized molecule generation (MolCustom). Each task contains three subtasks, and each subtask comprises 5,000 test samples. Given the inherent complexity of open molecule generation, we have also developed an automated evaluation system that measures both the quality and the accuracy of the generated molecules. Our comprehensive benchmarking of 25 LLMs reveals the current limitations of, and potential directions for, text-guided molecule discovery. Furthermore, with the assistance of OpenMolIns, a specialized instruction-tuning dataset proposed for the challenges raised by TOMG-Bench, Llama-3.1-8B can outperform all open-source general LLMs, even surpassing GPT-3.5-turbo by 46.5% on TOMG-Bench.

Leaderboard

| Model | #Parameters | A̅cc (%) | wA̅cc (%) |
| --- | --- | --- | --- |
| Claude-3.5 (Anthropic, 2024b) | - | 51.10 | 35.92 |
| Gemini-1.5-pro (DeepMind, 2024) | - | 52.25 | 34.80 |
| GPT-4-turbo (Achiam et al., 2023) | - | 50.74 | 34.23 |
| GPT-4o (Achiam et al., 2023) | - | 49.08 | 32.29 |
| Claude-3 (Anthropic, 2024a) | - | 46.14 | 30.47 |
| OpenMolIns-large (Llama-3.1-8B) | 8B | 43.10 | 27.22 |
| OpenMolIns-xlarge (Galactica-125M) | 125M | 44.48 | 25.73 |
| Llama3-70B-Instruct (Int4) (Dubey et al., 2024) | 70B | 38.54 | 23.93 |
| OpenMolIns-large (Galactica-125M) | 125M | 39.28 | 23.42 |
| OpenMolIns-medium (Galactica-125M) | 125M | 34.54 | 19.89 |
| GPT-3.5-turbo (Achiam et al., 2023) | - | 28.93 | 18.58 |
| OpenMolIns-small (Galactica-125M) | 125M | 24.17 | 15.18 |
| Llama3.1-8B-Instruct (Dubey et al., 2024) | 8B | 26.26 | 14.09 |
| Llama3-8B-Instruct (Dubey et al., 2024) | 8B | 26.40 | 13.75 |
| chatglm-9B (GLM et al., 2024) | 9B | 18.50 | 13.13(7) |
| OpenMolIns-light (Galactica-125M) | 125M | 20.95 | 13.13(6) |
| OpenMolIns-large (Llama3.2-1B) | 1B | 14.11 | 8.10 |
| yi-1.5-9B (Young et al., 2024) | 9B | 14.10 | 7.32 |
| Mistral-7B-Instruct-v0.2 (Jiang et al., 2023) | 7B | 11.17 | 4.81 |
| BioT5-base (Pei et al., 2023) | 250M | 24.19 | 4.21 |
| MolT5-large (Edwards et al., 2022) | 780M | 23.11 | 2.89 |
| Llama-3.2-1B-Instruct (Dubey et al., 2024) | 1B | 3.95 | 1.99 |
| MolT5-base (Edwards et al., 2022) | 250M | 11.11 | 1.30(0) |
| MolT5-small (Edwards et al., 2022) | 80M | 11.55 | 1.29(9) |
| Qwen2-7B-Instruct (Yang et al., 2024) | 7B | 0.18 | 0.15 |

Parenthesized digits show the third decimal place, used to break ties at two decimals.

Pipeline

*(Figure: TOMG-Bench pipeline overview.)*

Dataset Categorization

This repository contains the code for TOMG-Bench, a benchmark for evaluating LLMs' performance on text-based open molecule generation tasks. The benchmark consists of three main tasks, each divided into three subtasks, with 5,000 test samples per subtask. An illustrative (hypothetical) sample is sketched below the list.

  - MolEdit: AddComponent, DelComponent, SubComponent
  - MolOpt: LogP, MR, QED
  - MolCustom: AtomNum, BondNum, FunctionalGroup
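For illustration, a MolEdit-style test case pairs a natural-language instruction with an open-ended SMILES answer. The snippet below is a hypothetical sketch; the field names and instruction wording are not the dataset's actual schema:

```python
# Hypothetical illustration of a TOMG-Bench-style sample (not the actual
# dataset schema; field names and instruction wording are made up here).
sample = {
    "task": "MolEdit",
    "subtask": "AddComponent",
    "instruction": "Add a hydroxyl group to the molecule CCO.",
}
# Because generation is open-ended, any valid SMILES satisfying the
# instruction counts, e.g. "OCCO" (ethylene glycol) for this example.
```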

Metrics

We adopt different evaluation metrics for different tasks. The evaluation metrics for each subtask are described in the corresponding subtask's README file.

The leaderboard is based on the weighted average accuracy metric, which is discussed in our paper.
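As a rough sketch of how the two leaderboard columns relate, the snippet below aggregates per-subtask scores into an unweighted and a weighted average. The exact weighting scheme is defined in the paper; the per-subtask quality weights here are placeholders, not the paper's values:

```python
# Sketch of leaderboard aggregation over the nine subtasks, assuming:
#   acc[s]    - success rate of subtask s
#   weight[s] - per-subtask quality weight in [0, 1] from the automated
#               evaluation system (placeholder; see the paper for the
#               actual definition of the weighting)

def mean_accuracy(acc: dict[str, float]) -> float:
    """Unweighted macro-average over subtasks (the Acc column)."""
    return sum(acc.values()) / len(acc)

def weighted_mean_accuracy(acc: dict[str, float],
                           weight: dict[str, float]) -> float:
    """Quality-weighted macro-average (the wAcc column), assuming each
    subtask's accuracy is scaled by its quality weight."""
    return sum(acc[s] * weight[s] for s in acc) / len(acc)
```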

Usage

  1. To query proprietary models, please refer to `query_openai`.
  2. To evaluate an open-source general LLM, please refer to `run_query_vllm`.
  3. To evaluate a ChEBI-20 fine-tuned LLM, please refer to `run_query_biot5` and `run_query_molt5`.
  4. To train on our OpenMolIns dataset, please refer to `train`.
  5. To evaluate your own model on our benchmark, please refer to `run_query_template` (a minimal validity-check sketch follows this list).
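The automated evaluation ultimately hinges on checking the generated SMILES strings. As a minimal sketch, assuming RDKit (a common cheminformatics library) is available, a validity filter could look like the following; the benchmark's own checkers additionally verify the task-specific requirements:

```python
# Minimal sketch of a SMILES validity filter using RDKit (pip install rdkit).
# This only checks parseability; TOMG-Bench's evaluation also verifies that
# each molecule satisfies the task-specific instruction.
from rdkit import Chem

def is_valid_smiles(smiles: str) -> bool:
    """Return True if the string parses into an RDKit molecule."""
    return Chem.MolFromSmiles(smiles) is not None

generations = ["OCCO", "C1=CC=CC=C1", "not-a-molecule"]
valid = [s for s in generations if is_valid_smiles(s)]
print(f"{len(valid)}/{len(generations)} valid")  # -> 2/3 valid
```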

Submit Your Model

If your model achieves strong performance on our benchmark and you would like it added to the leaderboard, please send us your results (including the raw output files) via email. We will update the leaderboard once we have verified the results.

Reference

@misc{li2024tomgbenchevaluatingllmstextbased,
      title={TOMG-Bench: Evaluating LLMs on Text-based Open Molecule Generation}, 
      author={Jiatong Li and Junxian Li and Yunqing Liu and Dongzhan Zhou and Qing Li},
      year={2024},
      eprint={2412.14642},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.14642}, 
}
