This repository was created as a submission for the IIIT Delhi MIDAS lab task.
ToDo List of Experiments:
- bert-base NLI-M task
- roberta-base NLI-M task
- bert-large NLI-M task (currently running)
- roberta-large NLI-M task (currently running)
- electra-base NLI-M task
- bert-base NLI-B task
- roberta-base NLI-B task
- bert-large NLI-B task
- roberta-large NLI-B task
- electra-base NLI-B task
The dataset can be found in the `data` folder of the repository and contains the following files:
```
data
+-- sentihood-train.json
+-- sentihood-dev.json
+-- sentihood-test.json
```
The files contain the following number of examples:
File | Examples |
---|---|
Train | 2977 |
Dev | 747 |
Test | 1491 |
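For reference, here is a minimal sketch of loading one of these splits, assuming each file is a plain JSON list whose entries follow the example format shown at the end of this README (the path matches the layout above):

```python
import json

# Load the training split; the path assumes the repository layout shown above.
with open("data/sentihood-train.json", encoding="utf-8") as f:
    train = json.load(f)

print(len(train))        # expected: 2977
print(train[0]["text"])  # each entry carries an 'id', a 'text' and a list of 'opinions'
```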
The approach taken here is based entirely on [1], which frames aspect-based sentiment analysis (ABSA) as either a QA task or an NLI task. Given the constraints of time and compute, the NLI formulation was chosen as the main focus, specifically the NLI-M task (the multi-class sentence-pair variant of the NLI formulation). A concrete sketch of this framing is given below.
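To make the NLI-M framing concrete, the sketch below shows one way to turn each (target, aspect) pair into an auxiliary hypothesis sentence paired with the review text, labelled Positive/Negative/None as in [1]. The aspect list and the hypothesis template here are illustrative assumptions, not necessarily the ones used in the notebooks.

```python
# Sketch of NLI-M sentence-pair construction; aspect list and hypothesis
# template are illustrative assumptions, not taken from the notebooks.
ASPECTS = ["general", "price", "safety", "transit-location"]
LABELS = {"None": 0, "Positive": 1, "Negative": 2}

def build_nli_m_pairs(example):
    """Yield (premise, hypothesis, label_id) triples for one SentiHood example."""
    text = example["text"]
    # SentiHood targets are the masked location mentions present in the sentence.
    targets = [t for t in ("LOCATION1", "LOCATION2") if t in text]
    gold = {(op["target_entity"], op["aspect"]): op["sentiment"]
            for op in example["opinions"]}
    for target in targets:
        for aspect in ASPECTS:
            hypothesis = f"{target} - {aspect}"      # auxiliary pseudo-sentence
            label = gold.get((target, aspect), "None")
            yield text, hypothesis, LABELS[label]
```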
The models were trained with the following HF Trainer arguments in a Google Colab notebook (present in the notebooks folder). Note that the HF Trainer uses the AdamW optimizer by default.
```python
TrainingArguments(
    output_dir = f"/content/drive/MyDrive/SentiHood/models/{model_name}",
    overwrite_output_dir = True,
    num_train_epochs = 5,
    evaluation_strategy = "steps",
    eval_steps = 500,
    logging_steps = 500,
    per_device_train_batch_size = 8,
    per_device_eval_batch_size = 8,
    learning_rate = 5e-5,
    warmup_ratio = 0.1,
    weight_decay = 0.01,
    save_strategy = 'epoch',
    seed = 42,
)
```
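For context, here is a minimal sketch of how these arguments could be wired into the HF Trainer; the helper name, `num_labels=3` (None/Positive/Negative), and the expectation of already tokenized sentence-pair datasets are assumptions:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer

def make_trainer(training_args, train_dataset, dev_dataset, model_name="roberta-base"):
    """Build a sentence-pair classifier and Trainer for the NLI-M setup.

    `train_dataset` / `dev_dataset` are assumed to be already tokenized
    premise/hypothesis pairs with integer labels (None/Positive/Negative).
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
    return Trainer(
        model=model,
        args=training_args,          # the TrainingArguments shown above
        train_dataset=train_dataset,
        eval_dataset=dev_dataset,
        tokenizer=tokenizer,
    )
```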
Model Name | Task Type | Test-Accuracy | Test-Recall | Test-Precision | Test-F1 |
---|---|---|---|---|---|
bert-base-uncased | NLI-M | 0.9823 | 0.8411 | 0.8734 | 0.8567 |
roberta-base | NLI-M | 0.9830 | 0.8532 | 0.8630 | 0.8578 |
roberta-large (early-stopped; 3 epochs) | NLI-M | 0.9854 | 0.8731 | 0.8963 | 0.8844 |
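As a reference point for the table above, here is a sketch of a Trainer-compatible `compute_metrics` callback using scikit-learn; the macro-averaging choice is an assumption and may differ from the exact aggregation used in the notebooks:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Trainer metrics callback; macro averaging is an assumed choice."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="macro", zero_division=0)
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision,
            "recall": recall,
            "f1": f1}
```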
The final model chosen for submission is **roberta-large**.
Since training these transformer models requires a proper GPU, the code was written and executed entirely in Google Colab notebooks, which can be found under the notebooks folder.
The submissions for each model are contained in the submissions folder. Each submission is a .jsonl file with 1491 examples (one dictionary per example), structured as follows:
```python
[
  {'id': 61,
   'opinions': [{'aspect': 'live',
                 'sentiment': 'Positive',
                 'target_entity': 'LOCATION1'}],
   'text': 'I really like LOCATION1 I lived there for 3 months and had a lot of fun',
   'model_pred': [{'aspect': 'live',
                   'sentiment': 'Positive',
                   'target_entity': 'LOCATION1'}]},
  ...
]
```
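Below is a small sketch of reading a submission file and checking how many examples have model predictions that exactly match the gold opinions; the filename is a placeholder, and the file is assumed to be standard JSON Lines with one dictionary per line:

```python
import json

# Placeholder path; actual submission files live under the submissions folder.
path = "submissions/roberta-large.jsonl"

with open(path, encoding="utf-8") as f:
    records = [json.loads(line) for line in f]

def opinion_set(opinions):
    """Normalise a list of opinion dicts into a comparable set of tuples."""
    return {(o["target_entity"], o["aspect"], o["sentiment"]) for o in opinions}

exact_matches = sum(opinion_set(r["opinions"]) == opinion_set(r["model_pred"])
                    for r in records)
print(f"{exact_matches}/{len(records)} examples with exactly matching opinion sets")
```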
[1] Sun, C., Huang, L., & Qiu, X. (2019). Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence. NAACL 2019.
[3] Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL 2019.