County-level word and topic loadings derived from a 10% Twitter sample from 2009-2015.
Read the full publication here.
Available both as CSV and as a MySQL dump.
group_id: County FIPS code
feat: 1gram
value: Number of times the 1gram was used by the county
group_norm: Average number of times the feature was used by the county (value / number of users in county)
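As a rough illustration of how the four columns relate, the sketch below parses a few rows and recovers the user count implied by the definition of group_norm. It assumes the CSV has a header row using the column names listed here and that FIPS codes may have lost their leading zeros; both are assumptions, and the sample rows are invented, not taken from the data.

```python
import csv
import io

# Invented sample rows for illustration only; these are not real counts.
# Assumption: the CSV has a header row with these four column names.
SAMPLE_CSV = """group_id,feat,value,group_norm
1001,coffee,120,0.5
1001,rain,60,0.25
36061,coffee,900,0.75
"""

def load_features(handle):
    """Read feature rows, keeping FIPS codes as zero-padded 5-character strings."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "group_id": row["group_id"].zfill(5),  # "1001" -> "01001"
            "feat": row["feat"],
            "value": int(row["value"]),
            "group_norm": float(row["group_norm"]),
        })
    return rows

rows = load_features(io.StringIO(SAMPLE_CSV))

# Per the definition above, group_norm = value / users, so users = value / group_norm.
implied_users = {
    (r["group_id"], r["feat"]): r["value"] / r["group_norm"] for r in rows
}
print(implied_users)
```

Treating group_id as a string matters when joining against other county-level tables, since numeric parsing drops the leading zero of FIPS codes for states such as Alabama (01).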
Facebook topics are available here.
group_id: County FIPS code
feat: Topic id
value: Number of times a word in the topic was used by the county
group_norm: Relative frequency of topic use by county
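One possible use of the topic table is ranking topics within each county. The minimal sketch below does this with group_norm rather than value, since raw counts scale with how much a county tweets. The rows, topic ids, and the helper name top_topics are all hypothetical.

```python
from collections import defaultdict

# Invented rows in (group_id, feat, value, group_norm) order; not real data.
topic_rows = [
    ("01001", "12", 340, 0.004),
    ("01001", "87", 510, 0.006),
    ("36061", "12", 9000, 0.009),
    ("36061", "87", 2000, 0.002),
]

def top_topics(rows, k=1):
    """Return the k topic ids with the highest group_norm for each county."""
    by_county = defaultdict(list)
    for group_id, feat, _value, group_norm in rows:
        by_county[group_id].append((group_norm, feat))
    return {
        county: [feat for _, feat in sorted(pairs, reverse=True)[:k]]
        for county, pairs in by_county.items()
    }

ranking = top_topics(topic_rows)
print(ranking)
```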
Please cite the following paper if you use this data.
@inproceedings{giorgi2018remarkable,
title={The Remarkable Benefit of User-Level Aggregation for Lexical-based Population-Level Predictions},
author={Giorgi, Salvatore and Preotiuc-Pietro, Daniel and Buffone, Anneke and Rieman, Daniel and Ungar, Lyle H. and Schwartz, H. Andrew},
booktitle={Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing},
year={2018}
}
Licensed under the GNU General Public License v3 (GPLv3).