Text mining in R and reproducible research - Peeter Tinits

Lecturer: Peeter Tinits (peeter.tinits@ut.ee), Tallinn University, University of Tartu
Co-lecturer: Artjoms Šela (artjoms.sela@ut.ee), University of Tartu
Date: 28.08.
Room: Lossi 3-406

Description

The increasing availability of textual data gives the humanities and social sciences new opportunities that we are only beginning to explore. The data can vary considerably, ranging from old digitized newspapers to Twitter or forum posts that are born and live digitally. Provided that we can access the data, they allow quite diverse questions to be answered.

In this 5-hour tutorial, we will learn the basics of text mining in R following the tidyverse principles. R is a computing environment for statistical analysis and graphics in which analyses can easily be reproduced later, both by you and by other researchers. The tidyverse is an opinionated set of packages that aims to make R code easy to read and learn.

Using reproducible analyses allows the humanities and social sciences to increase transparency in the research process, and makes it easier to collaborate and to build on earlier research. Movements among researchers have shown the benefits of Open Science and Open Research Practices for our scientific knowledge about the world.

What exactly: We will use the tidytext and ggplot2 packages to make simple visualizations of texts. We will compare word frequencies, do simple sentiment analysis, and find keywords in texts. We will explore these on novels, dramas, and/or song lyrics in English. Exploring your own texts is a possibility. The tutorial aims to teach the basic techniques that will help you get started on a research project of your own.
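To give a flavour of these steps, here is a minimal sketch in R. It is not the workshop's actual material: the two short texts and their titles are made up for illustration, and the choice of the "bing" sentiment lexicon and of tf-idf as the keyword measure are assumptions. It assumes the dplyr, tidytext, and ggplot2 packages are installed.

```r
library(dplyr)
library(tidytext)
library(ggplot2)

# Two made-up snippets standing in for real texts (hypothetical data)
texts <- tibble(
  title = c("song_a", "song_b"),
  text  = c("The sun is bright and the day is happy",
            "The night is dark and the rain is sad")
)

# Tokenize: one word per row (unnest_tokens lowercases by default)
words <- texts %>%
  unnest_tokens(word, text)

# Word frequencies within each text
word_counts <- words %>%
  count(title, word, sort = TRUE)

# Keywords via tf-idf: words frequent in one text but rare across texts
keywords <- word_counts %>%
  bind_tf_idf(word, title, n) %>%
  arrange(desc(tf_idf))

# Simple sentiment analysis: match words against the "bing" lexicon
# (an assumed choice; other lexicons exist) and count matches per text
sentiment <- words %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(title, sentiment)

# A simple visualization: bar chart of word counts, one panel per text
p <- ggplot(word_counts, aes(x = n, y = word)) +
  geom_col() +
  facet_wrap(~ title, scales = "free_y")
```

Printing `keywords` or `sentiment` shows the resulting tables, and `print(p)` draws the chart. With real texts you would typically also remove very common words first, for example with `anti_join(stop_words, by = "word")`.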

Requirements: Knowing R helps, but is not required. Starting with the tidyverse may give you a biased view of R, but the tutorial should be understandable with no prior experience in scripting.

The lessons will take place in a computer lab with the required software installed. If using your own computer, install R (https://www.r-project.org) and RStudio (https://www.rstudio.com) beforehand.

NB! It is highly recommended for everyone who has no previous experience with R to participate in the workshop "First steps in R" on August 26!

Materials

Silge, Julia, and Robinson, David (2017) Text Mining with R. A tidy approach. O'Reilly Media.
Grolemund, Garrett, and Wickham, Hadley (2017) R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly Media.

About the instructor

Peeter Tinits is a would-be digital humanist and an open science aficionado with an interest in historical texts. In his research he has combined textual and non-textual data to study topics like the standardization of spelling norms, the structure of film production crews and writing techniques in Wikipedia. He is a firm believer that anyone can learn to code, and the humanities have a lot to gain from adopting reproducible research practices.

He is currently finishing his PhD at Tallinn University and has started working at the University of Tartu on text mining historical newspapers to track large societal transitions.
