U.S. County level word and topic loading derived from a 10% Twitter sample from 2009-2015.

rezanazari/county_tweet_lexical_bank

County Tweet Lexical Bank

County-level word and topic loadings derived from a 10% Twitter sample from 2009-2015.

Read the full publication here.

Data

The data are available both as CSV files and as a MySQL dump.

1grams

  • group_id: County FIPS code
  • feat: 1gram
  • value: Number of times the 1gram was used by the county
  • group_norm: Average number of times the feature was used by the county (value / number of users in county)
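
As a rough illustration of the column layout above, the sketch below parses rows with Python's standard `csv` module. It assumes the CSV export has a header row with the four column names listed here, which this README does not confirm; the sample rows and the helper name `load_feature_rows` are made up for the example. FIPS codes are kept as zero-padded strings because codes such as `01001` lose their leading zero when read as integers.

```python
import csv
import io

# Hypothetical sample rows in the documented column layout. The values are
# invented for illustration and are not taken from the dataset.
SAMPLE_CSV = """group_id,feat,value,group_norm
01001,hello,10,0.5
1001,world,4,0.2
36061,hello,30,0.3
"""


def load_feature_rows(handle):
    """Parse feature rows, keeping group_id as a 5-character FIPS string."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append({
            # Restore a leading zero if an earlier tool stripped it.
            "group_id": record["group_id"].strip().zfill(5),
            "feat": record["feat"],
            "value": int(record["value"]),
            "group_norm": float(record["group_norm"]),
        })
    return rows


rows = load_feature_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["group_id"], row["feat"], row["value"], row["group_norm"])
```

To read a real file, pass an open file handle (for example `open(path, newline="", encoding="utf-8")`) in place of the `io.StringIO` object.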

Facebook Topics

Facebook topics are available here.

  • group_id: County FIPS code
  • feat: Topic id
  • value: Number of times a word in the topic was used by the county
  • group_norm: Relative frequency of topic use by county
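
One way to work with the topic table is to regroup its rows into one vector of topic loadings per county. The sketch below does this with plain dictionaries; the rows, the topic ids, and the helper names `topic_vectors` and `top_topic` are hypothetical and only mirror the columns described above.

```python
from collections import defaultdict

# Hypothetical (group_id, feat, value, group_norm) rows. The numbers are
# invented for illustration and are not taken from the dataset.
TOPIC_ROWS = [
    ("01001", "12", 40, 0.004),
    ("01001", "344", 90, 0.009),
    ("36061", "12", 700, 0.007),
]


def topic_vectors(rows):
    """Map each county FIPS code to a {topic id: group_norm} dictionary."""
    vectors = defaultdict(dict)
    for group_id, feat, _value, group_norm in rows:
        vectors[group_id][feat] = group_norm
    return dict(vectors)


def top_topic(vector):
    """Return the topic id with the largest group_norm in one county."""
    return max(vector, key=vector.get)


vectors = topic_vectors(TOPIC_ROWS)
for county, vector in sorted(vectors.items()):
    print(county, top_topic(vector))
```

The same structure works for the 1gram table, since both tables share the `group_id`, `feat`, `value`, `group_norm` layout.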

Citation

Please cite the following paper if you use this data.

@inproceedings{giorgi2018remarkable,
    title={The Remarkable Benefit of User-Level Aggregation for Lexical-based Population-Level Predictions}, 
    author={Giorgi, Salvatore and Preotiuc-Pietro, Daniel and Buffone, Anneke and Rieman, Daniel and Ungar, Lyle H. and Schwartz, H. Andrew}, 
    booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing}, 
    year={2018}
}

License

Licensed under the GNU General Public License v3 (GPLv3).
