This project is the first attempt at political bias classification of German news.
We crawled our data from various German news sites using the news-please library. We then cleaned the data manually and labeled it using Medienkompass. The data is organised as a HuggingFace nlp library dataset.
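The labeling step can be pictured as a lookup from a news outlet to a bias label. The sketch below is only an illustration, assuming labels are assigned per outlet; the domains, label names, and the helpers outlet_domain and label_for_url are hypothetical placeholders and do not reproduce the actual Medienkompass ratings or this project's code.

```python
from urllib.parse import urlparse

# Hypothetical outlet-to-label table: the domains and label names are
# placeholders, NOT the actual Medienkompass ratings.
OUTLET_LABELS = {
    "outlet-a.example": "label_a",
    "outlet-b.example": "label_b",
}


def outlet_domain(url):
    """Return the host of a URL, lower-cased and without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def label_for_url(url, outlet_labels=OUTLET_LABELS):
    """Look up the outlet-level label for an article URL, or None if unknown."""
    return outlet_labels.get(outlet_domain(url))
```

Articles whose outlet is missing from the table get None, so they can be filtered out or reviewed by hand.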
Due to copyright issues we cannot publish the data, but we provide the list of URLs you can use to build the dataset on your own. To download all the data, run:
from newsplease import NewsPlease
NewsPlease.from_file('data/urls.txt')
Then run (under development):
python preprocess.py -data_folder='path/to/your/downloaded/data'
Our system fine-tunes German BERT from the HuggingFace Transformers library as its pre-trained model.
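As a rough illustration of this setup, the sketch below loads a German BERT checkpoint with a sequence-classification head. It is not the project's train.py: the checkpoint name "bert-base-german-cased", the label names, and the helpers build_label_maps and load_model_and_tokenizer are assumptions made for the example.

```python
# Assumption: "bert-base-german-cased" is the German BERT checkpoint meant above.
MODEL_NAME = "bert-base-german-cased"

# Hypothetical label set; replace it with the labels used in your dataset.
LABELS = ["label_a", "label_b", "label_c"]


def build_label_maps(labels):
    """Build the id -> label and label -> id mappings stored in the model config."""
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return id2label, label2id


def load_model_and_tokenizer(model_name=MODEL_NAME, labels=LABELS):
    """Load the pre-trained encoder with a classification head on top."""
    # Imported lazily so build_label_maps works without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = build_label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return model, tokenizer
```

Calling load_model_and_tokenizer() downloads the checkpoint on first use. The classification head is newly initialised, so the model has to be fine-tuned before its predictions are meaningful.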
To train the model, run:
python train.py -data_folder="data" -model_folder="model" -batch_size=8 -num_epochs=2
To test the model, run:
python test.py -data_folder="data" -model_folder="model"
The web demo will be released soon.