Companion Proceedings of the ACM Web Conference 2024 (WWW '24 Companion)
Over the past decades, text classification has undergone remarkable evolution across diverse domains. Despite these advancements, most existing model-centric methods in text classification cannot generalize well on class-imbalanced datasets that contain highly similar textual information. Instead of developing new model architectures, data-centric approaches enhance performance by manipulating the data structure. In this study, we investigate robust data-centric approaches for text classification on our collected dataset, the metadata of survey papers about Large Language Models (LLMs). In our experiments, we explore four paradigms and observe that leveraging arXiv's co-category information on graphs classifies the text data more robustly than the other three paradigms: conventional machine-learning algorithms, fine-tuning of pre-trained language models, and zero-shot/few-shot classification using LLMs.
We first collected the metadata of 112 literature reviews about Large Language Models (LLMs).
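As a rough illustration of how arXiv's co-category information can be turned into a graph, the sketch below connects two papers whenever their category lists overlap (a minimal Python/DGL example; paper_categories and its category lists are hypothetical placeholders, not the repository's actual data or code):

import itertools
import dgl
import torch

# Hypothetical input: arXiv category lists per collected survey paper.
paper_categories = {
    0: ["cs.CL", "cs.AI"],
    1: ["cs.CL"],
    2: ["cs.LG", "cs.AI"],
}

# Add an undirected edge whenever two papers share an arXiv category.
src, dst = [], []
for u, v in itertools.combinations(paper_categories, 2):
    if set(paper_categories[u]) & set(paper_categories[v]):
        src += [u, v]
        dst += [v, u]

co_graph = dgl.graph((torch.tensor(src), torch.tensor(dst)),
                     num_nodes=len(paper_categories))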
Linux or macOS
CPU or NVIDIA GPU + CUDA cuDNN
Python 3.11
pytorch, dgl, transformers, numpy, scipy, sklearn, yaml
git clone https://github.com/junzhuang-code/DCGSL.git
cd DCGSL
For pip users, please type the command:
pip install -r requirements.txt
For Conda users, you may create a new Conda environment using:
conda env create -f environment.yml
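After installation, a quick sanity check that the core dependencies import correctly (optional; not part of the repository's scripts):

import torch, dgl, transformers

# Confirm the core dependencies resolve and report CUDA availability.
print(torch.__version__, dgl.__version__, transformers.__version__)
print("CUDA available:", torch.cuda.is_available())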
data: contains graph data, survey data, and the text corpus;
data_collection: contains scripts for scraping data and processing the text corpus;
baselines: source code of the GNNs and LMs;
config: the config files.
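Assuming the config files are in YAML (consistent with the yaml dependency above), they can be loaded with PyYAML; a minimal sketch, where the file name gnn.yaml is illustrative and may not match the actual files in config:

import yaml

# Load an experiment configuration; the path below is illustrative.
with open("config/gnn.yaml") as f:
    cfg = yaml.safe_load(f)
print(cfg)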
Data Collection: Collect raw data from arXiv.
Data Construction: Generate co-graphs and text graphs.
Data Evaluation: Validate the dataset via GNNs and LMs.
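For the Data Evaluation step, a generic two-layer GraphConv classifier in DGL gives a flavor of how a GNN validates the graph data. This is a self-contained toy sketch using a random graph, random features, and four hypothetical classes, not the repository's actual model or training code:

import dgl
import torch
import torch.nn.functional as F
from dgl.nn import GraphConv

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.conv1 = GraphConv(in_dim, hid_dim)
        self.conv2 = GraphConv(hid_dim, n_classes)

    def forward(self, g, x):
        h = F.relu(self.conv1(g, x))
        return self.conv2(g, h)

# Toy stand-in for the co-category graph: 112 nodes (one per survey)
# with random text-embedding features and 4 hypothetical classes.
g = dgl.add_self_loop(dgl.rand_graph(112, 500))  # self-loops avoid 0-in-degree nodes
feat = torch.randn(112, 128)
label = torch.randint(0, 4, (112,))

model = GCN(128, 64, 4)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for epoch in range(100):
    loss = F.cross_entropy(model(g, feat), label)
    opt.zero_grad()
    loss.backward()
    opt.step()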