This file explains how to reproduce the research carried out for my Master's thesis, titled "Refugees Welcome? A comparative sentiment analysis of tweets in Germany surrounding inflows of Syrians and Ukrainians".
The project aims to collect and analyze immigration-related tweets published in Germany in English and German during two time frames:
- The Syrian refugee inflow
- The Ukrainian refugee inflow
All steps are documented in the following notebooks:
- 01_Data-Collection_limited_Syrian.ipynb
- 01_Data-Collection_limited_Ukrainian.ipynb
- 02_Pre-processing_limited_merged.ipynb
- 03_Sentiment-Analysis_limited_merged.ipynb
- 04_Data-Preparation_limited_merged.ipynb
- 05_Exploration-and-Visualization_limited_merged.ipynb
- 06_Regression_limited_merged.ipynb
The required Python packages are listed in requirements.txt.
This project was created with:
- Python 3.9.13
Notebook 03_Sentiment-Analysis_limited_merged.ipynb was run in:
- Google Colab
All other notebooks were run in:
- JupyterLab 3.4.8
01_Data-Collection_limited_Syrian.ipynb documents the collection of tweets from the first time frame, the Syrian refugee inflow. Both English- and German-language tweets are collected.
01_Data-Collection_limited_Ukrainian.ipynb documents the collection of tweets from the second time frame, the Ukrainian refugee inflow. Both English- and German-language tweets are collected.
In 02_Pre-processing_limited_merged.ipynb, all collected tweets are pre-processed for sentiment analysis.
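The exact pre-processing steps are defined in the notebook itself. As a rough illustration only, the sketch below shows the kind of cleaning that is commonly applied to tweets before sentiment analysis. The function name `clean_tweet` and the specific rules (dropping URLs and @-mentions, keeping hashtag words) are assumptions made for this example, not a description of what the notebook does.

```python
import re

# Hypothetical cleaning rules, chosen for illustration only.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Remove URLs and @-mentions, keep hashtag words, collapse whitespace."""
    text = URL_RE.sub(" ", text)       # drop links
    text = MENTION_RE.sub(" ", text)   # drop user mentions
    text = text.replace("#", "")       # keep the hashtag word, drop the symbol
    return WHITESPACE_RE.sub(" ", text).strip()


example = "@tagesschau Refugees welcome! #Flüchtlinge https://t.co/abc123"
print(clean_tweet(example))  # Refugees welcome! Flüchtlinge
```

Casing and punctuation are left untouched here because many sentiment models rely on them; whether that matches the notebook is not something this sketch claims.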
Sentiment analysis is carried out in 03_Sentiment-Analysis_limited_merged.ipynb. NOTE: This notebook was created and run in Google Colab, so it is recommended that you run it in Colab rather than Jupyter.
In 04_Data-Preparation_limited_merged.ipynb, the data is prepared for exploration and the subsequent regression.
In 05_Exploration-and-Visualization_limited_merged.ipynb, the data is explored and visualized.
The final step of the study is carried out in 06_Regression_limited_merged.ipynb. A logistic regression is fitted to predict sentiment from the inflow (Syrian or Ukrainian), with two control variables: (1) change in foreign share of the total population and (2) GDP volume growth.
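To make the model structure concrete, here is a minimal, dependency-free sketch of a logistic regression with one inflow dummy and two controls. It is not the code from the notebook: the data are synthetic, the binary coding of sentiment (1 = positive, 0 = otherwise) is an assumption, and the helper names `fit_logistic` and `predict_proba` are hypothetical. The notebook may well use a statistics library instead of hand-written gradient descent.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, x):
    """P(sentiment = 1 | x); w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit weights by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Synthetic toy data (NOT the thesis data). Columns:
# inflow dummy (1 = Ukrainian, 0 = Syrian),
# change in foreign share, GDP volume growth (both standardized here).
random.seed(0)
X, y = [], []
for _ in range(400):
    ukrainian = random.randint(0, 1)
    foreign_share_change = random.gauss(0.0, 1.0)
    gdp_growth = random.gauss(0.0, 1.0)
    true_logit = -0.5 + 1.5 * ukrainian  # arbitrary effect for the toy data
    label = 1 if random.random() < sigmoid(true_logit) else 0
    X.append([ukrainian, foreign_share_change, gdp_growth])
    y.append(label)

weights = fit_logistic(X, y)
print("intercept, inflow, foreign share, GDP growth:", weights)
```

In this setup the coefficient on the inflow dummy is the quantity of interest: it gives the change in the log-odds of the sentiment outcome for one inflow relative to the other, holding the two controls fixed.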
Several adjustments were made based on the suggestions from my supervisor.