- Lecturer: Peeter Tinits ([email protected]), Tallinn University, University of Tartu
- Co-lecturer: Artyoms Šela ([email protected]), University of Tartu
- Date: 28.08.
- Room: Lossi 3-406
The increasing availability of textual data gives new opportunities for humanities and social sciences that we are only beginning to explore. The nature of the data can vary widely, ranging from old digitized newspapers to Twitter or forum posts that are born and live digitally. Provided that we can access the data, they allow quite diverse questions to be answered.
In this 5-hour tutorial, we will learn the basics of text mining in R following the tidyverse principles. R is a computing environment for statistical analysis and graphics in which analyses can easily be reproduced later, both by yourself and by other researchers. Tidyverse is an opinionated set of packages that aim to make R code easy to read and learn.
Using reproducible analyses allows humanities and social sciences to increase transparency in the research process and makes it easier to collaborate and to build on earlier research. Movements among researchers have shown the benefits of Open Science and Open Research Practices for our scientific knowledge about the world.
What exactly: We will use the tidytext and ggplot2 packages to make simple visualizations of texts. We will compare word frequencies, do simple sentiment analysis, and find keywords in texts. We will explore these on novels, dramas, and/or song lyrics in English. Exploring your own texts is a possibility. The tutorial aims to give you the basic techniques that would help you get started on a research project of your own.
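To give a flavour of the word-frequency part, here is a minimal sketch of what such an analysis might look like. It is not the workshop material itself: the two toy texts and the names `texts`, `words`, `word_counts`, and `p` are made up for illustration, and it assumes the dplyr, tidytext, and ggplot2 packages are installed.

```r
library(dplyr)     # data manipulation (also provides tibble())
library(tidytext)  # unnest_tokens() and the stop_words dataset
library(ggplot2)   # plotting

# Toy input: two short made-up texts (illustrative only, not workshop data)
texts <- tibble(
  title = c("text_a", "text_b"),
  text  = c("The cat sat on the mat. The cat slept.",
            "A dog chased the cat. The dog barked.")
)

# Tidy text format: one row per word; unnest_tokens() lowercases by default
words <- texts %>%
  unnest_tokens(word, text)

# Drop common English stop words, then count each word within each text
word_counts <- words %>%
  anti_join(stop_words, by = "word") %>%
  count(title, word, sort = TRUE)

# Horizontal bar chart of word counts, one panel per text
p <- ggplot(word_counts, aes(x = n, y = reorder(word, n))) +
  geom_col() +
  facet_wrap(~ title, scales = "free_y") +
  labs(x = "Count", y = NULL)

print(p)
```

The same pattern (tokenize, filter, count, plot) should carry over to longer texts such as novels or song lyrics; only the input table changes.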
Requirements: Knowing R helps, but is not obligatory. Starting with tidyverse you may get a biased view of R, but the tutorial ought to be understandable with no prior experience in scripting.
The lessons will take place in a computer class with the required software installed. If using your own computer, install R (https://www.r-project.org) and RStudio (https://www.rstudio.com) beforehand.
NB! It is highly recommended for everyone who has no previous experience with R to participate in the workshop "First steps in R" on August 26!
- Silge, Julia, and Robinson, David (2017) Text Mining with R. A tidy approach. O'Reilly Media.
- Grolemund, Garrett, and Wickham, Hadley (2017) R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly Media.
Peeter Tinits is a would-be digital humanist and an open science aficionado with an interest in historical texts. In his research he has combined textual and non-textual data to study topics like the standardization of spelling norms, the structure of film production crews and writing techniques in Wikipedia. He is a firm believer that anyone can learn to code, and the humanities have a lot to gain from adopting reproducible research practices.
He is currently finishing his PhD at Tallinn University and has started to work at the University of Tartu on text mining historical newspapers to track large societal transitions.