Course project for STAT 154 at UC Berkeley
"Trade-off" explores a lightweight BERT model for Chinese Question Answering (QA), demonstrating its effectiveness and efficiency compared to Large Language Models (LLMs).
- DRCD: Delta Reading Comprehension Dataset.
- ODSQA: Open-Domain Spoken Question Answering Dataset.
- BERT and its variants (ALBERT, RoBERTa).
- Comparative analysis with larger LLMs (Qwen-7B, Baichuan 2).
Fine-tuning BERT for extractive QA, focusing on preprocessing (mapping character-level answer spans to token positions), training, and postprocessing (decoding start/end logits back into answer text); see the sketch below.
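A minimal sketch of this fine-tuning pipeline using Hugging Face `transformers` on SQuAD-style DRCD records. The checkpoint name, file path, and hyperparameters are illustrative assumptions, not the project's actual settings:

```python
# Sketch of extractive-QA fine-tuning; checkpoint, data path, and
# hyperparameters below are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForQuestionAnswering,
                          TrainingArguments, Trainer, default_data_collator)

checkpoint = "hfl/chinese-roberta-wwm-ext"  # hypothetical Chinese RoBERTa choice
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)

# Assumes DRCD has been flattened into SQuAD-style JSON records with
# "question", "context", and "answers" fields.
raw = load_dataset("json", data_files={"train": "drcd_train.json"})

def preprocess(examples):
    # Tokenize question/context pairs and map each character-level answer
    # span to token-level start/end positions via the offset mapping.
    enc = tokenizer(examples["question"], examples["context"],
                    truncation="only_second", max_length=384,
                    return_offsets_mapping=True, padding="max_length")
    starts, ends = [], []
    for i, offsets in enumerate(enc["offset_mapping"]):
        ans = examples["answers"][i]
        s_char = ans["answer_start"][0]
        e_char = s_char + len(ans["text"][0])
        seq_ids = enc.sequence_ids(i)
        s_tok = e_tok = 0  # default to [CLS] if the answer was truncated away
        for idx, (span, sid) in enumerate(zip(offsets, seq_ids)):
            if sid != 1:          # only look at context tokens
                continue
            if span[0] <= s_char < span[1]:
                s_tok = idx
            if span[0] < e_char <= span[1]:
                e_tok = idx
        starts.append(s_tok)
        ends.append(e_tok)
    enc["start_positions"] = starts
    enc["end_positions"] = ends
    enc.pop("offset_mapping")
    return enc

train = raw["train"].map(preprocess, batched=True,
                         remove_columns=raw["train"].column_names)

args = TrainingArguments(output_dir="qa-model", learning_rate=3e-5,
                         num_train_epochs=2, per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=train,
        data_collator=default_data_collator).train()
```

At inference time, postprocessing reverses this mapping: the model's start/end logits are decoded back to character spans in the context through the offset mapping, keeping the highest-scoring pair with start ≤ end.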
BERT variants, particularly RoBERTa, achieved high accuracy, outperforming some larger LLMs on specific tasks.
Expanding the dataset, diversifying QA tasks, and adjusting language settings for a more comprehensive analysis.