GitHub - pgervila/Explore_bilingualism_in_cities: Explore linguistic choice through social media in large European cities where more than one language is spoken

Quantifying language choice with Twitter

The project aims to quantify language choice in bilingual environments using evidence gathered from Twitter users.

It contains:

Four Python classes:
- StreamTweetData: class with methods to access the Twitter API and stream tweets from followers of specified accounts linked to a country or city.
- ProcessTweetData: class with methods to post-process tweets by language and run statistical analysis on the data.
- PlotTweetData: class with methods to visualize the results produced by the ProcessTweetData class.
- InterCityComparison: class to visualize and compare processed data from different cities or countries.

It is also possible to perform a linguistic random walk through a city, jumping between the networks of random residents of a given country or city, and to draw linguistic conclusions from it.
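
As a rough illustration of the random-walk idea, the sketch below hops between follower networks held in a plain dictionary. The function name `linguistic_random_walk`, the `networks` mapping and the toy account names are assumptions made for this example and are not part of the repository's API; real code would fetch each network from the Twitter API instead of a dictionary.

```python
import random


def linguistic_random_walk(networks, start, n_steps, seed=None):
    """Return the accounts visited by a simple random walk.

    networks: hypothetical dict mapping an account name to the list of
    accounts in its network (a stand-in for data fetched from the API).
    """
    rng = random.Random(seed)
    path = [start]
    current = start
    for _ in range(n_steps):
        neighbours = networks.get(current, [])
        if not neighbours:
            break  # dead end: no network to jump into
        current = rng.choice(neighbours)
        path.append(current)
    return path


# Toy data, invented for illustration only
toy_networks = {
    "resident_a": ["resident_b", "resident_c"],
    "resident_b": ["resident_a", "resident_c"],
    "resident_c": ["resident_a"],
}
walk = linguistic_random_walk(toy_networks, "resident_a", n_steps=5, seed=0)
print(walk)
```

The languages used by the accounts along `walk` could then be tallied to estimate language choice along the path.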

Steps to get started:

To use the code, users need a Twitter account and access to the Twitter API, which requires OAuth settings and an access token: consumer_key, consumer_secret, access_token and access_token_secret. These are generated automatically once registration on the Twitter API is completed.

Keys and tokens must be stored as Python variables in a file called twitter_pwd.py. A dummy file with fake keys and tokens is provided.
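
A minimal sketch of what such a file might look like is shown below. The variable names follow the four credentials listed above; whether the repository's dummy file uses exactly these names is an assumption, so check the provided file before relying on them. All values are placeholders.

```python
# twitter_pwd.py: placeholder values only, replace with your own credentials.
# Variable names are assumed to match the four credentials named in this README.
consumer_key = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_CONSUMER_SECRET"
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"
```

Since this file holds secrets, it is usually wise to keep the real version out of version control.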

Relevant accounts for several cities are provided as class attributes of the main class. These cities are Barcelona, Brussels, Kiev and Riga.

The StreamTweetData class must be initialized with both a file and a country or city. If the specified city is not among the hard-coded ones, a list of corresponding root accounts must also be provided.

- Retrieve and save a number of followers from a given account using the method 'get_account_network'. Specify whether followers must be city-wide or country-wide residents. Repeat for all desired accounts.
- Retrieve and save a specified number of (re)tweets from each follower using 'get_tweets_from_followers'. Only tweets whose cleaned text is long enough for reliable language detection are kept, and only if the detected language matches the language given in the tweet metadata returned by the API.
- Compute and store a list of all unique followers for a given city or country using the method 'filter_root_accs_unique_followers'.
- Initialize the ProcessTweetData class to post-process the tweet data, and create a pandas DataFrame that summarizes all information from the tweets using the method 'process_data'.
- Initialize PlotTweetData and use its plot methods to visualize statistics per root account.
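
The tweet-filtering rule described in the steps above (keep a tweet only if its cleaned text is long enough and the detected language agrees with the API metadata) can be sketched roughly as follows. This is not the repository's implementation: the names `clean_text`, `keep_tweet`, the `MIN_CLEAN_LENGTH` threshold and the cleaning regex are all assumptions, and the language detector is passed in as a plain callable so that any detection library can be plugged in.

```python
import re

# Hypothetical threshold; the value used by the project is not stated here
MIN_CLEAN_LENGTH = 20

# URLs, @mentions and #hashtags carry little signal for language detection
_NOISE = re.compile(r"https?://\S+|[@#]\w+")


def clean_text(text):
    """Strip URLs, @mentions and #hashtags, then collapse whitespace."""
    return " ".join(_NOISE.sub(" ", text).split())


def keep_tweet(text, metadata_lang, detect_lang, min_length=MIN_CLEAN_LENGTH):
    """Return True if the tweet passes both the length and the language check.

    detect_lang: any callable mapping a string to a language code.
    """
    cleaned = clean_text(text)
    if len(cleaned) < min_length:
        return False  # too short for reliable detection
    return detect_lang(cleaned) == metadata_lang


def toy_detector(text):
    # Stand-in for a real language-detection library, for illustration only
    return "ca" if "bon dia" in text.lower() else "es"


tweets = [
    ("Bon dia a tothom, avui fa molt de sol a Barcelona! https://t.co/xyz", "ca"),
    ("@amic hola", "es"),  # too short once cleaned
    ("Bon dia a tothom des de la platja #estiu", "es"),  # language mismatch
]
kept = [text for text, lang in tweets if keep_tweet(text, lang, toy_detector)]
print(kept)
```

Passing the detector as an argument keeps the filter easy to test and avoids tying the sketch to one specific detection library.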

Where to get help with this project

Contact me at [email protected] with any questions about this repo. See also my blog for more detailed descriptions.

