This folder contains the raw data and metadata of EmoBank. In particular, it contains:

raw.csv
: The raw textual data.

meta.tsv
: The source and genre metadata.

reader.csv
: The gold ratings from the reader perspective.

writer.csv
: The gold ratings from the writer perspective.

emobank.csv
: Weighted average of reader and writer annotations. Use this file by default.
EmoBank comes with annotations from two perspectives (reader and writer). For most use cases, however, this distinction is not relevant. In these cases, I advise using the combination of both annotation perspectives to increase reliability. These combined ratings are stored for convenience in emobank.csv and can be loaded like this:
import pandas as pd
eb = pd.read_csv("emobank.csv", index_col=0)
The columns V, A and D represent Valence (negative vs. positive), Arousal (calm vs. excited), and Dominance (being controlled vs. being in control). Each of these takes numeric values in the range [1, 5]. Please refer to the paper for further details.
print(eb.shape)
eb.head()
(10062, 4)
id | V | A | D | text |
---|---|---|---|---|
110CYL068_1036_1079 | 3.00 | 3.00 | 3.20 | Remember what she said in my last letter? " |
110CYL068_1079_1110 | 2.80 | 3.10 | 2.80 | If I wasn't working here. |
110CYL068_1127_1130 | 3.00 | 3.00 | 3.00 | .." |
110CYL068_1137_1188 | 3.44 | 3.00 | 3.22 | Goodwill helps people get off of public assist... |
110CYL068_1189_1328 | 3.55 | 3.27 | 3.46 | Sherry learned through our Future Works class ... |
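As a quick sanity check on the [1, 5] range mentioned above, here is a minimal sketch. It builds a toy frame from the first rows shown in the table so that it runs without the corpus files; `sample` and `in_rating_range` are hypothetical names introduced for illustration and are not part of EmoBank.

```python
import pandas as pd

# Toy frame built from the first rows shown above; with the real corpus,
# pass the `eb` frame loaded from emobank.csv instead.
sample = pd.DataFrame(
    {
        "V": [3.00, 2.80, 3.00],
        "A": [3.00, 3.10, 3.00],
        "D": [3.20, 2.80, 3.00],
    },
    index=pd.Index(
        ["110CYL068_1036_1079", "110CYL068_1079_1110", "110CYL068_1127_1130"],
        name="id",
    ),
)


def in_rating_range(frame, low=1.0, high=5.0):
    """Return True if every V, A and D value lies in [low, high]."""
    ratings = frame[["V", "A", "D"]]
    return bool(((ratings >= low) & (ratings <= high)).all().all())


print(in_rating_range(sample))
```

With the full corpus, calling `in_rating_range(eb)` should perform the same check on all sentences.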
Print the most extreme sentences in each of the three dimensions.
for d in ["V", "A", "D"]:
    # idxmin/idxmax return the index label (the sentence id), which is what .loc expects
    print("Min {}:\n{}".format(d, eb.loc[eb[d].idxmin()]))
    print()
    print("Max {}:\n{}".format(d, eb.loc[eb[d].idxmax()]))
    print()
    print()
Min V:
V 1.2
A 4.2
D 3.8
text "Fuck you"
Name: A_defense_of_Michael_Moore_12034_12044, dtype: object
Max V:
V 4.6
A 4.3
D 3.7
text lol Wonderful Simply Superb!
Name: vampires_4446_4474, dtype: object
Min A:
V 3.1
A 1.8
D 3.1
text I was feeling calm and private that night.
Name: Nathans_Bylichka_2070_2112, dtype: object
Max A:
V 4.3
A 4.4
D 3.4
text "My God, yes, yes, yes!"
Name: captured_moments_28728_28752, dtype: object
Min D:
V 2
A 3
D 1.78
text I shivered as I walked past the pale man’s bla...
Name: Nathans_Bylichka_40025_40116, dtype: object
Max D:
V 1.7
A 3.9
D 4.2
text “NO”
Name: defenders5_3431_3435, dtype: object
If you want to work with either the reader or the writer set of annotations individually, here is how to access those.
raw = pd.read_csv("raw.csv", index_col=0)
raw.head()
id | text |
---|---|
Acephalous-Cant-believe_4_47 | I can't believe I wrote all that last year. |
Acephalous-Cant-believe_83_354 | Because I've been grading all damn day and am ... |
Acephalous-Cant-believe_355_499 | However, when I started looking through my arc... |
Acephalous-Cant-believe_500_515 | What do I mean? |
Acephalous-Cant-believe_517_626 | The posts I consider foundational to my curren... |
reader = pd.read_csv("reader.csv", index_col=0)
reader.head()
id | V | A | D | stdV | stdA | stdD | N |
---|---|---|---|---|---|---|---|
110CYL068_1036_1079 | 3.0 | 3.20 | 3.00 | 0.00 | 0.40 | 0.00 | 5 |
110CYL068_1079_1110 | 2.6 | 3.00 | 2.60 | 0.49 | 0.63 | 0.49 | 5 |
110CYL068_1110_1127 | 2.0 | 2.33 | 2.33 | 1.41 | 0.47 | 0.47 | 3 |
110CYL068_1127_1130 | 3.0 | 3.00 | 3.00 | 0.00 | 0.00 | 0.00 | 2 |
110CYL068_1137_1188 | 3.6 | 3.00 | 3.40 | 0.80 | 0.63 | 0.49 | 5 |
writer = pd.read_csv("writer.csv", index_col=0)
writer.head()
id | V | A | D | stdV | stdA | stdD | N |
---|---|---|---|---|---|---|---|
110CYL068_1036_1079 | 3.00 | 2.8 | 3.4 | 0.00 | 0.98 | 0.49 | 5 |
110CYL068_1079_1110 | 3.00 | 3.2 | 3.0 | 0.00 | 0.40 | 0.00 | 5 |
110CYL068_1127_1130 | 3.00 | 3.0 | 3.0 | 0.00 | 0.00 | 0.00 | 5 |
110CYL068_1137_1188 | 3.25 | 3.0 | 3.0 | 0.43 | 0.71 | 0.00 | 4 |
110CYL068_1189_1328 | 3.40 | 3.4 | 3.2 | 0.49 | 0.49 | 0.40 | 5 |
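Note that the two tables above do not list exactly the same ids. To compare both perspectives for the same sentences, one option is an inner join on the index. The following is only a sketch on toy data copied from the V columns shown above; `reader_sample`, `writer_sample` and `both` are hypothetical names, and with the real corpus you would join the `reader` and `writer` frames directly.

```python
import pandas as pd

# Valence values copied from the reader.csv and writer.csv previews above.
reader_sample = pd.DataFrame(
    {"V": [3.0, 2.6, 2.0, 3.0, 3.6]},
    index=pd.Index(
        [
            "110CYL068_1036_1079",
            "110CYL068_1079_1110",
            "110CYL068_1110_1127",
            "110CYL068_1127_1130",
            "110CYL068_1137_1188",
        ],
        name="id",
    ),
)
writer_sample = pd.DataFrame(
    {"V": [3.00, 3.00, 3.00, 3.25, 3.40]},
    index=pd.Index(
        [
            "110CYL068_1036_1079",
            "110CYL068_1079_1110",
            "110CYL068_1127_1130",
            "110CYL068_1137_1188",
            "110CYL068_1189_1328",
        ],
        name="id",
    ),
)

# Keep only sentences rated from both perspectives; suffixes disambiguate columns.
both = reader_sample.join(
    writer_sample, how="inner", lsuffix="_reader", rsuffix="_writer"
)
# Positive values mean the reader rating is higher than the writer rating.
both["V_diff"] = (both["V_reader"] - both["V_writer"]).round(2)
print(both)
```

Of the six distinct ids in this toy data, only four appear in both frames, so `both` has four rows.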
This code was used to generate emobank.csv.
from pathlib import Path
import pandas as pd

def load_emobank(path):
    """
    path..........The path to this folder.
    """
    path = Path(path)
    raw = pd.read_csv(path / "raw.csv", index_col=0)
    writer = pd.read_csv(path / "writer.csv", index_col=0)
    reader = pd.read_csv(path / "reader.csv", index_col=0)
    # keep only the ids that were annotated from both perspectives
    common = sorted(set(writer.index).intersection(reader.index))
    # redefine reader, writer as arrays; N becomes a column vector for broadcasting
    N_reader = reader.loc[common, "N"].values.reshape((len(common), 1))
    N_writer = writer.loc[common, "N"].values.reshape((len(common), 1))
    reader = reader.loc[common, ["V", "A", "D"]].values
    writer = writer.loc[common, ["V", "A", "D"]].values
    # compute weighted average of annotations, weighting each perspective by N
    combined = ((reader * N_reader) + (writer * N_writer)) / (N_reader + N_writer)
    combined = pd.DataFrame(columns=["V", "A", "D"], data=combined, index=common).round(2)
    combined["text"] = raw.loc[common, "text"]
    combined.index.rename("id", inplace=True)
    assert combined.shape == (10062, 4)
    return combined
import csv
eb = load_emobank(".") # This assumes that /.../EmoBank/corpus is your working directory.
# Otherwise make sure to insert the correct path to /.../EmoBank/corpus between the quotes.
eb.to_csv("emobank.csv", quoting = csv.QUOTE_NONNUMERIC)
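To make the weighting concrete, here is a small worked example for a single sentence, 110CYL068_1137_1188, using the values shown in the reader and writer tables above. It is a plain-Python sketch of the same formula, not the code that produced the file; `weighted_mean` is a hypothetical helper, and I am assuming that N is the number of individual ratings behind each mean.

```python
def weighted_mean(reader_value, n_reader, writer_value, n_writer):
    """Average two mean ratings, weighting each by its N (assumed rating count)."""
    total = reader_value * n_reader + writer_value * n_writer
    return round(total / (n_reader + n_writer), 2)


# Reader: V=3.6, A=3.0, D=3.4 with N=5; writer: V=3.25, A=3.0, D=3.0 with N=4.
v = weighted_mean(3.6, 5, 3.25, 4)  # (18 + 13) / 9
a = weighted_mean(3.0, 5, 3.0, 4)   # (15 + 12) / 9
d = weighted_mean(3.4, 5, 3.0, 4)   # (17 + 12) / 9
print(v, a, d)  # 3.44 3.0 3.22
```

These values agree with the row for the same id in the emobank.csv preview (3.44, 3.00, 3.22).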