******************************************************
* SemEval-2016 Task 4: Sentiment Analysis on Twitter *
*                                                    *
*                TRAINING + DEV DATA                 *
*                                                    *
*       http://alt.qcri.org/semeval2016/task4/       *
*                                                    *
******************************************************
TRAINING + DEV dataset for SemEval-2016 Task 4
Version 1.0: October 15, 2015
Task organizers:
* Preslav Nakov, Qatar Computing Research Institute, HBKU
* Alan Ritter, The Ohio State University
* Sara Rosenthal, Columbia University
* Fabrizio Sebastiani, Qatar Computing Research Institute, HBKU
* Veselin Stoyanov, Facebook
NOTES
1. Please note that by downloading the Twitter data you agree to abide by the Twitter terms of service (https://twitter.com/tos), and in particular you agree not to redistribute the data and to delete tweets that are marked deleted in the future.
2. The distribution consists of a set of Twitter status IDs with annotations for Subtasks A, B, C, D, and E: topic polarity and trends toward a topic. There are exactly 100 tweets provided per topic and a total of 100 topics. You should use the downloading script to obtain the corresponding tweets: https://github.com/aritter/twitter_download
3. The "neutral" label in the annotations stands for objective_OR_neutral.
FILES
data/train/src/100_topics_100_tweets.topic-two-point.subtask-BD.train.txt -- training input for subtasks B and D
data/train/src/100_topics_100_tweets.topic-five-point.subtask-CE.train.txt -- training input for subtasks C and E
data/dev/src/100_topics_100_tweets.topic-two-point.subtask-BD.dev.txt -- dev input for subtasks B and D
data/dev/src/100_topics_100_tweets.topic-five-point.subtask-CE.dev.txt -- dev input for subtasks C and E
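The four file names above follow one naming pattern, so a script can build them instead of hard-coding each path. The sketch below is only an illustration: dataset_path and SUBTASK_GROUPS are hypothetical helper names, not part of the distribution, and the pattern is inferred from the four entries listed here.

```python
from pathlib import PurePosixPath

# Subtask pair -> scale name, as they appear in the listed file names.
# SUBTASK_GROUPS and dataset_path are hypothetical names for illustration.
SUBTASK_GROUPS = {"BD": "two-point", "CE": "five-point"}


def dataset_path(split, group):
    """Return the relative path of one distribution file.

    split is "train" or "dev"; group is "BD" or "CE".
    """
    if split not in ("train", "dev"):
        raise ValueError(f"unknown split: {split!r}")
    if group not in SUBTASK_GROUPS:
        raise ValueError(f"unknown subtask group: {group!r}")
    scale = SUBTASK_GROUPS[group]
    name = f"100_topics_100_tweets.topic-{scale}.subtask-{group}.{split}.txt"
    return PurePosixPath("data") / split / "src" / name


print(dataset_path("train", "BD"))
```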
INPUT DATA FORMAT
-----------------------SUBTASK A-----------------------------------------
The format for the training/dev file is as follows:
id<TAB>label
where "label" can be 'positive', 'neutral' or 'negative'.
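A minimal sketch of reading this two-column format, assuming the file contains only the id and label columns described above. The function name read_subtask_a and the sample ids are made up for illustration.

```python
import io

VALID_LABELS_A = {"positive", "neutral", "negative"}


def read_subtask_a(lines):
    """Parse id<TAB>label lines into a list of (tweet_id, label) pairs.

    read_subtask_a is a hypothetical helper, not part of the distribution.
    """
    pairs = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 tab-separated fields")
        tweet_id, label = fields
        if label not in VALID_LABELS_A:
            raise ValueError(f"line {line_no}: unexpected label {label!r}")
        pairs.append((tweet_id, label))
    return pairs


# Made-up ids, used only to show the call.
sample_a = io.StringIO("100001\tpositive\n100002\tneutral\n")
pairs_a = read_subtask_a(sample_a)
```

Status ids are kept as strings because they are only used as keys when matching downloaded tweets.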
-----------------------SUBTASKS B,D--------------------------------------
** Task we might deal with.
The format for the training/dev file is as follows:
topic<TAB>id<TAB>label
where "label" can be 'positive' or 'negative' (note: no 'neutral'!).
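A sketch of reading the three-column format and counting labels per topic, which is one simple way to look at how opinion toward a topic is distributed. It assumes the file has exactly the three columns described above. The helper names, topics, and ids are made up for illustration.

```python
from collections import Counter, defaultdict

VALID_LABELS_BD = {"positive", "negative"}


def read_subtask_bd(lines):
    """Parse topic<TAB>id<TAB>label lines into (topic, tweet_id, label) tuples.

    read_subtask_bd is a hypothetical helper, not part of the distribution.
    """
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 tab-separated fields")
        topic, tweet_id, label = fields
        if label not in VALID_LABELS_BD:
            # 'neutral' is rejected here: the two-point scale has no such label.
            raise ValueError(f"line {line_no}: unexpected label {label!r}")
        records.append((topic, tweet_id, label))
    return records


def label_counts_by_topic(records):
    """Return {topic: Counter({label: count})}."""
    counts = defaultdict(Counter)
    for topic, _tweet_id, label in records:
        counts[topic][label] += 1
    return counts


# Made-up topics and ids, used only to show the calls.
sample_bd = [
    "some topic\t200001\tpositive\n",
    "some topic\t200002\tnegative\n",
    "some topic\t200003\tpositive\n",
    "other topic\t200004\tnegative\n",
]
counts_bd = label_counts_by_topic(read_subtask_bd(sample_bd))
```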
-----------------------SUBTASKS C,E--------------------------------------
* Task we are dealing with.
The format for the training/dev file is as follows:
topic<TAB>id<TAB>label
where "label" can be -2, -1, 0, 1, or 2,
corresponding to "strongly negative", "negative", "neutral", "positive", and "strongly positive".
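A sketch of reading the five-point format, assuming the file has exactly the three columns described above. Labels are converted to integers and checked against the range -2..2. The helper name, topics, and ids are made up for illustration.

```python
from collections import Counter


def read_subtask_ce(lines):
    """Parse topic<TAB>id<TAB>label lines; label becomes an int in -2..2.

    read_subtask_ce is a hypothetical helper, not part of the distribution.
    """
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 tab-separated fields")
        topic, tweet_id, raw_label = fields
        try:
            label = int(raw_label)
        except ValueError:
            raise ValueError(f"line {line_no}: label {raw_label!r} is not an integer") from None
        if not -2 <= label <= 2:
            raise ValueError(f"line {line_no}: label {label} is outside -2..2")
        records.append((topic, tweet_id, label))
    return records


# Made-up topics and ids, used only to show the calls.
sample_ce = [
    "some topic\t300001\t-2\n",
    "some topic\t300002\t0\n",
    "some topic\t300003\t2\n",
    "other topic\t300004\t2\n",
]
records_ce = read_subtask_ce(sample_ce)
histogram_ce = Counter(label for _topic, _tweet_id, label in records_ce)
```

Keeping the labels as integers preserves their order, which matters on an ordinal scale where -2 and 2 are further apart than -1 and 1.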
LICENSE
The accompanying dataset is released under a Creative Commons Attribution 3.0 Unported License
(http://creativecommons.org/licenses/by/3.0/).
CITATION
You can cite the following paper when referring to the dataset:
@InProceedings{Rosenthal-EtAl:2015:SemEval,
author = {Sara Rosenthal and Alan Ritter and Veselin Stoyanov and Svetlana Kiritchenko and Saif Mohammad and Preslav Nakov},
title = {SemEval-2015 Task 10: Sentiment Analysis in Twitter},
booktitle = {Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)},
year = {2015},
publisher = {Association for Computational Linguistics},
}
USEFUL LINKS:
Google group: [email protected]
SemEval-2016 Task 4 website: http://alt.qcri.org/semeval2016/task4/
SemEval-2016 website: http://alt.qcri.org/semeval2016/