This package contains all the code and data used for reproducing the analyses underlying the articles published in VilaWeb's "La dada d'en Joe Brew" column. It is publicly available for the purposes of reproducibility and transparency.
To install this package, run the following from within the R console:

```r
if(!require(devtools)) install.packages("devtools")
devtools::install_github('joebrew/vilaweb')
```

Note that `install_github()` is called with the `devtools::` prefix: if `devtools` was only just installed by the first line, it will not yet be attached.
To reproduce the entire package, raw data must first be downloaded from several sources.
Download the "Barómetro Mensual - 2000-2018 all data" file from http://analisis.cis.es/fid/fidHistorico.jsp into `data-raw/cis/fichero_integrado`. This requires an account. After logging in, go to the "Ficheros Integrados de Datos" page and download the full data file. It will be saved as `FID_637_06bcee7b-ea6b-4f41-a74e-37a941519966.zip` (or similar, depending on the download date). Extract the data as is in the folder in which it was downloaded.
For month-specific files, download from the CIS page into `data-raw/cis/monthly`, placing each file into a folder named in the format `YYYY-MM`. Extract the file there. The final path of each monthly survey will resemble `data-raw/cis/monthly/2018-10/MD3226/3226.sav`.
Download the "Matriu de dades fusionada a partir de 2014 (presidencial)" file from http://ceo.gencat.cat/ca/barometre/matrius-fusionada-BOP/ into `data-raw/ceo`; it will be saved as `2014_Microdades_anonimitzades_fusio_cine_pres.rar`. Extract the data as is in the folder in which it was downloaded.
Download the "Sondeig d'opinió Catalunya 2018" data from https://www.icps.cat/recerca/sondeigs-i-dades/sondeigs/sondeigs-d-opinio-catalunya into `data-raw/icps`. This requires creating an account and password. Download both the 2017 and 2018 data into the `data` sub-folder.
Download the following file into `data-raw/sentiment`: http://www.saifmohammad.com/WebDocs/NRC-Emotion-Lexicon-v0.92-InManyLanguages-web.xlsx
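The directory scaffold implied by the download steps above can be created in advance. A minimal sketch (the `2018-10` month folder is just an example, and placing the ICPS `data` sub-folder directly under `data-raw/icps` is an assumption):

```shell
# Create the raw-data directory scaffold described above.
# Add one folder per CIS survey month, named YYYY-MM; 2018-10 is an example.
mkdir -p data-raw/cis/fichero_integrado
mkdir -p data-raw/cis/monthly/2018-10
mkdir -p data-raw/ceo
mkdir -p data-raw/icps/data
mkdir -p data-raw/sentiment
```

With the scaffold in place, each downloaded file can be saved and extracted directly into its target folder.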
After downloading the above data, go into the `data-raw` directory and run `Rscript create_data_files.R` from the command line.
Many of the analyses in this package are based on data from Twitter. Data are retrieved from Twitter using the Python `twint` package (install as per the instructions at https://github.com/twintproject/twint). Data are stored locally in a PostgreSQL database. What follows are instructions for building the database so as to reproduce analyses with Twitter data.
- Create a psql database named `twitter`.
- Within `twitter`, create a table called `twitter`.
- Go through the code in `analyses/set_up_database/set_up_database.R` to get the database set up.
- Within R, run the `update_database()` function, specifying the accounts via the `people` argument.
- To keep the database updated, re-run the `update_database()` function periodically.
- Generate periodic data dumps for back-up purposes: `pg_dump twitter > backup.sql`
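The back-up step can be made date-stamped so that successive dumps do not overwrite one another. A minimal sketch (the filename scheme is an assumption; the `pg_dump` line is commented out so it only runs on a machine where the database exists):

```shell
# Write each dump to a date-stamped file so older back-ups are retained.
# The filename scheme below is an assumption, not part of the original instructions.
BACKUP_FILE="twitter_backup_$(date +%Y-%m-%d).sql"
# Uncomment on a machine with the 'twitter' database available:
# pg_dump twitter > "$BACKUP_FILE"
echo "$BACKUP_FILE"
```

Old dumps can then be pruned or archived on whatever schedule suits the available disk space.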
Having done the above, run `Rscript build_package.R` from within the main directory to compile the package.