GitHub - WeSeewy/Chinese-Clickbait: [CSCWD'23] Detecting Clickbait in Chinese Social Media by Prompt Learning

Part-of-speech Enhanced Prompt Learning for clickbait detection

📝 Table of Contents

About
Getting Started
Usage
Authors
Contributing

🧐 About

This project implements a method for detecting Chinese clickbait (标题党) news: given a Chinese text, it determines whether the text is clickbait. Examples of clickbait and non-clickbait news are shown below:

Example	Label
当女生问“你在干啥”，这句回答100%勾起她的兴趣 When a girl asks "what are you doing", this reply is 100% guaranteed to pique her interest.	clickbait
北京发布做好复工复产疫情防控常态化工作通告 Beijing government issued the Notice on Normalized Covid-19 Prevention and Control Measures for Resumption of Work.	non-clickbait

The main idea is described in the research paper Detecting Clickbait in Chinese Social Media by Prompt Learning, which has been accepted for publication at CSCWD'23.

🏁 Getting Started

A few preparation steps are required before the project can be used.

Prerequisites

This project depends on the following frameworks:

PyTorch - deep learning framework, version 1.13.0; see its guide for installation
LTP - Chinese NLP toolkit, version 4.2.11; see its guide for installation
Transformers - pretrained language model framework, version 4.24.0; see its guide for installation
OpenPrompt - prompt learning framework, version 1.0.1; see its guide for installation

Other dependency packages are also required. If you are using Python 3.10.8, you can install them directly with the following command:

pip install -r requirements.txt

If you are using another Python version, modify requirements.txt to make it compatible.
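Before installing the rest of requirements.txt, it can help to confirm that the four core frameworks are at the versions listed above. The script below is a sketch and is not part of the repository; it assumes the PyPI distribution names `torch`, `ltp`, `transformers` and `openprompt`, and it only reports mismatches rather than fixing them.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Versions listed under Prerequisites. The keys are assumed PyPI
# distribution names for PyTorch, LTP, Transformers and OpenPrompt.
EXPECTED = {
    "torch": "1.13.0",
    "ltp": "4.2.11",
    "transformers": "4.24.0",
    "openprompt": "1.0.1",
}


def find_mismatches(expected):
    """Return {name: (installed_or_None, expected)} for every package that
    is missing or installed at a different version."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            # Drop a local build suffix such as "+cu117" before comparing.
            installed = version(name).split("+", 1)[0]
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches


if __name__ == "__main__":
    if platform.python_version() != "3.10.8":
        print(f"Note: requirements.txt targets Python 3.10.8, found {platform.python_version()}")
    for name, (installed, wanted) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

Stripping the local suffix means a CUDA build such as `1.13.0+cu117` is treated as matching `1.13.0`.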

Dataset

This project uses the WCD dataset for training and testing. Download the all_labeled.csv file and copy it into the /data folder. To store the dataset elsewhere, modify the DatasetPath variable in main.py.
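A quick pre-flight check can confirm that all_labeled.csv is where training expects it before launching main.py. This sketch is not part of the repository: the path data/all_labeled.csv and the UTF-8 encoding are assumptions, and it deliberately makes no assumption about the column names.

```python
import csv
from pathlib import Path

# Assumed default location based on the instructions above; the real
# value of DatasetPath in main.py may differ.
DatasetPath = Path("data") / "all_labeled.csv"


def inspect_dataset(path):
    """Return (header, number_of_data_rows) for a CSV file, or raise
    FileNotFoundError with a hint if the file is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; download all_labeled.csv or update DatasetPath in main.py"
        )
    # utf-8-sig also accepts files saved with a byte-order mark (assumed encoding).
    with path.open(newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        n_rows = sum(1 for _ in reader)
    return header, n_rows


if __name__ == "__main__":
    try:
        print(inspect_dataset(DatasetPath))
    except FileNotFoundError as err:
        print(err)
```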

🎈 Usage

Train

After completing the preparations, the following command can be used for training:

python main.py -s 0.01 0.5

The two values passed to -s mean that 1% of the dataset is used as the training set and 50% of the dataset as the test set.
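The idea behind `-s 0.01 0.5` can be illustrated with a hypothetical helper that shuffles the data once and takes two disjoint slices. This is a sketch under stated assumptions, not the splitting code in main.py: the fixed seed and the disjoint-slice strategy are assumptions.

```python
import random


def split_by_fraction(samples, train_frac, test_frac, seed=42):
    """Hypothetical illustration of -s TRAIN_FRAC TEST_FRAC: shuffle once,
    then take train_frac for training and a disjoint test_frac for testing.
    The seed and slicing strategy are assumptions."""
    if train_frac < 0 or test_frac < 0 or train_frac + test_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle
    n_train = int(round(len(shuffled) * train_frac))
    n_test = int(round(len(shuffled) * test_frac))
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    return train, test
```

With 1,000 samples and fractions 0.01 and 0.5, this yields 10 training and 500 test samples; the remaining 490 are unused.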

The following command can be used to train models in extremely few-shot scenarios:

python main.py -f 16

The value passed to -f means that only 16 clickbait and 16 non-clickbait samples are used for training.
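Few-shot sampling as described for `-f 16` amounts to drawing the same number of examples from each class. The sketch below assumes samples are `(text, label)` pairs and uses a fixed seed; both are illustrative assumptions rather than the repository's data format.

```python
import random
from collections import defaultdict


def sample_few_shot(samples, k, seed=42):
    """Pick k examples per label (e.g. k=16 gives 16 clickbait and
    16 non-clickbait samples). `samples` is an iterable of (text, label)
    pairs; this format and the seed are assumptions."""
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    rng = random.Random(seed)
    picked = []
    for label in sorted(by_label):  # sorted for a deterministic order
        group = by_label[label]
        if len(group) < k:
            raise ValueError(f"label {label!r} has only {len(group)} samples, need {k}")
        picked.extend(rng.sample(group, k))
    return picked
```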

The command also supports the following options:

-bs: set the training batch size
-lr: set the training learning rate
-ep: set the number of training epochs
-m: set the mode; only pepl and base are supported, where pepl is our method and base is the baseline
-plm: set the pretrained language model (PLM) architecture; bert, roberta, ernie, and Erlangshen are supported

An example training command with all options set is as follows:

python main.py -s 0.01 0.5 -bs 8 -lr 5e-5 -ep 3 -m pepl -plm bert
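The training options can be mirrored with Python's standard argparse module. This is a sketch, not the parser in main.py: the types (int for -f, -bs and -ep; float for -s and -lr) are assumptions, and no defaults are set because the README does not state them.

```python
import argparse


def build_parser():
    """Sketch mirroring the documented main.py options. Types and help
    strings are assumptions; the real parser may differ."""
    p = argparse.ArgumentParser(description="PEPL clickbait detection (training)")
    p.add_argument("-s", nargs=2, type=float, metavar=("TRAIN_FRAC", "TEST_FRAC"),
                   help="fractions of the dataset used for training and testing")
    p.add_argument("-f", type=int, metavar="K",
                   help="few-shot: K clickbait and K non-clickbait training samples")
    p.add_argument("-bs", type=int, help="training batch size")
    p.add_argument("-lr", type=float, help="training learning rate")
    p.add_argument("-ep", type=int, help="number of training epochs")
    p.add_argument("-m", choices=["pepl", "base"],
                   help="pepl (proposed method) or base (baseline)")
    p.add_argument("-plm", choices=["bert", "roberta", "ernie", "Erlangshen"],
                   help="pretrained language model architecture")
    return p


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Parsing the full example command yields `s=[0.01, 0.5]`, `bs=8`, `lr=5e-05`, `ep=3`, `m='pepl'` and `plm='bert'`; an unsupported value for -m or -plm makes argparse exit with an error message.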

Predict

After training, models are stored in the /checkpoints folder. To run prediction, execute the following command; the items.txt and news.txt files in the /data folder contain the sample inputs to be predicted.

python use.py -p ./checkpoints/base_bert_fc_2.pt -m base -plm bert

The parameters are as follows:

-p: set the path of the stored model, e.g. base_bert_fc_2.pt
-m: set the mode; only pepl and base are supported
-plm: set the PLM architecture; bert, roberta, ernie, and Erlangshen are supported

Note: -m and -plm must match the settings used during training.
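Because a mismatch between the checkpoint and -m/-plm is easy to make, a hypothetical heuristic check can compare the flags against the checkpoint file name. It assumes checkpoints are named `<mode>_<plm>_...pt`, a pattern only inferred from the example base_bert_fc_2.pt; use.py does not necessarily perform such a check.

```python
def checkpoint_matches(checkpoint_path, mode, plm):
    """Hypothetical heuristic: True if the checkpoint file name starts with
    '<mode>_<plm>_'. The naming pattern is an assumption inferred from
    base_bert_fc_2.pt."""
    # Accept both Windows (.\checkpoints\x.pt) and POSIX (./checkpoints/x.pt) separators.
    filename = checkpoint_path.replace("\\", "/").rsplit("/", 1)[-1]
    return filename.startswith(f"{mode}_{plm}_")
```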

✍️ Authors

@WeSeewy - Idea & Develop
@caomingpei - Test & Doc

See also the list of contributors who participated in this project.

⛏️ Contributing

If you find a problem in this repo, please open an issue.

If you want to contribute to this repo, please fork it and open a pull request.

Commit messages should follow this convention:

[!TYPE:] message

[!TYPE:] is one of the following types:

!F: add new functionality
!B: fix a bug
!D: update the documentation
!S: change the code style
!R: refactor the code
!O: optimize performance
!T: add tests
!C: chores, such as updating dependencies
!A: archive related files

Example:

!D: configuring the git style

This example means the commit is of type !D (a documentation update), and its purpose is configuring the Git style.
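The commit convention is regular enough to check mechanically. The sketch below is not part of the repository; it validates only the first line of a message and assumes at least one space after the colon, as in the example.

```python
import re

# Allowed type letters, taken from the list of commit types.
COMMIT_TYPES = {
    "F": "new functionality", "B": "bug fix", "D": "documentation",
    "S": "code style", "R": "refactor", "O": "performance",
    "T": "tests", "C": "dependency chores", "A": "archive",
}
# "!X:" followed by whitespace and a non-empty description.
_PATTERN = re.compile(r"^!([A-Z]):\s+(\S.*)$")


def parse_commit_subject(message):
    """Return (type_letter, description) if the first line follows the
    '!TYPE: message' convention, otherwise None."""
    first_line = message.splitlines()[0] if message else ""
    m = _PATTERN.match(first_line)
    if m is None or m.group(1) not in COMMIT_TYPES:
        return None
    return m.group(1), m.group(2)
```

To enforce the convention locally, a function like this could be called from a Git commit-msg hook, which Git runs with the path of the file holding the proposed message; a non-zero exit status aborts the commit.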

