0 Folder Content

This folder contains the raw data and metadata of EmoBank. In particular, it contains:

  • raw.csv: The raw textual data.
  • meta.tsv: The source and genre metadata.
  • reader.csv: The gold ratings from the reader perspective.
  • writer.csv: The gold ratings from the writer perspective.
  • emobank.csv: Weighted average of reader and writer annotations. Use this file by default.

1 Loading EmoBank

EmoBank comes with annotations from two perspectives (reader and writer). For most use cases, however, this distinction may not be relevant. In these cases, I would advise using the combination of both annotation perspectives to increase reliability. These combined ratings are stored for convenience in emobank.csv and can be loaded like this:

import pandas as pd
eb = pd.read_csv("emobank.csv", index_col=0)

1.1 Data Format

The columns V, A and D represent Valence (negative vs. positive), Arousal (calm vs. excited), and Dominance (being controlled vs. being in control). Each of them takes numeric values in the range [1, 5]. Please refer to the paper for further details.

print(eb.shape)
eb.head()
(10062, 4)
id                   V     A     D     text
110CYL068_1036_1079  3.00  3.00  3.20  Remember what she said in my last letter? "
110CYL068_1079_1110  2.80  3.10  2.80  If I wasn't working here.
110CYL068_1127_1130  3.00  3.00  3.00  .."
110CYL068_1137_1188  3.44  3.00  3.22  Goodwill helps people get off of public assist...
110CYL068_1189_1328  3.55  3.27  3.46  Sherry learned through our Future Works class ...
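
The stated value range can be checked directly. The following is a minimal sketch and not part of the original notebook: the helper name check_vad_range is hypothetical, the demo values are made up, and it only assumes a DataFrame with the columns V, A and D as described above.

```python
import pandas as pd

def check_vad_range(df, low=1.0, high=5.0, dims=("V", "A", "D")):
    """Return the rows in which any of the given dimensions lies outside [low, high]."""
    values = df[list(dims)]
    out_of_range = (values < low) | (values > high)
    return df[out_of_range.any(axis=1)]

# Small illustrative frame (made-up values, not taken from EmoBank)
demo = pd.DataFrame(
    {"V": [3.0, 5.5], "A": [3.0, 2.0], "D": [3.2, 0.5], "text": ["ok", "bad"]},
    index=["row_ok", "row_bad"],
)
violations = check_vad_range(demo)
print(violations.index.tolist())  # ['row_bad']
```

With the corpus loaded as eb, check_vad_range(eb) would be expected to return an empty DataFrame if every rating lies in [1, 5].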

1.2 Quick sanity check

Print the most extreme sentences in each of the three dimensions.

for d in ["V", "A", "D"]:
    # idxmin/idxmax return the index label of the extreme value, as required by .loc
    print("Min {}:\n{}".format(d, eb.loc[eb[d].idxmin()]))
    print()
    print("Max {}:\n{}".format(d, eb.loc[eb[d].idxmax()]))
    print()
    print()
Min V:
V              1.2
A              4.2
D              3.8
text    "Fuck you"
Name: A_defense_of_Michael_Moore_12034_12044, dtype: object

Max V:
V                                4.6
A                                4.3
D                                3.7
text    lol Wonderful Simply Superb!
Name: vampires_4446_4474, dtype: object


Min A:
V                                              3.1
A                                              1.8
D                                              3.1
text    I was feeling calm and private that night.
Name: Nathans_Bylichka_2070_2112, dtype: object

Max A:
V                            4.3
A                            4.4
D                            3.4
text    "My God, yes, yes, yes!"
Name: captured_moments_28728_28752, dtype: object


Min D:
V                                                       2
A                                                       3
D                                                    1.78
text    I shivered as I walked past the pale man’s bla...
Name: Nathans_Bylichka_40025_40116, dtype: object

Max D:
V        1.7
A        3.9
D        4.2
text    “NO”
Name: defenders5_3431_3435, dtype: object
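
To inspect more than a single extreme per dimension, the same check can be extended to the k lowest and highest rows. This is only a sketch: the helper name extremes and the demo frame are hypothetical, and it relies on the pandas methods DataFrame.nsmallest and DataFrame.nlargest.

```python
import pandas as pd

def extremes(df, dim, k=3):
    """Return the k lowest and the k highest rows of df with respect to column dim."""
    lowest = df.nsmallest(k, dim)
    highest = df.nlargest(k, dim)
    return lowest, highest

# Small illustrative frame (made-up ids and texts, not taken from EmoBank)
demo = pd.DataFrame(
    {"V": [1.2, 3.0, 4.6, 2.5], "text": ["a", "b", "c", "d"]},
    index=["s1", "s2", "s3", "s4"],
)
lowest, highest = extremes(demo, "V", k=2)
print(lowest.index.tolist())   # ['s1', 's4']
print(highest.index.tolist())  # ['s3', 's2']
```

On the corpus itself, a call such as extremes(eb, "A", k=5) would list the five calmest and the five most excited sentences.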

1.3 Loading Individual Parts

If you want to work with either the reader or the writer set of annotations individually, here is how to access those.

1.3.1 Raw Text

raw = pd.read_csv("raw.csv", index_col=0)
raw.head()
id                               text
Acephalous-Cant-believe_4_47     I can't believe I wrote all that last year.
Acephalous-Cant-believe_83_354   Because I've been grading all damn day and am ...
Acephalous-Cant-believe_355_499  However, when I started looking through my arc...
Acephalous-Cant-believe_500_515  What do I mean?
Acephalous-Cant-believe_517_626  The posts I consider foundational to my curren...

1.3.2 Reader Perspective Annotations

reader = pd.read_csv("reader.csv", index_col=0)
reader.head()
id                   V    A     D     stdV  stdA  stdD  N
110CYL068_1036_1079  3.0  3.20  3.00  0.00  0.40  0.00  5
110CYL068_1079_1110  2.6  3.00  2.60  0.49  0.63  0.49  5
110CYL068_1110_1127  2.0  2.33  2.33  1.41  0.47  0.47  3
110CYL068_1127_1130  3.0  3.00  3.00  0.00  0.00  0.00  2
110CYL068_1137_1188  3.6  3.00  3.40  0.80  0.63  0.49  5

1.3.3 Writer Perspective Annotations

writer = pd.read_csv("writer.csv", index_col=0)
writer.head()
id                   V     A    D    stdV  stdA  stdD  N
110CYL068_1036_1079  3.00  2.8  3.4  0.00  0.98  0.49  5
110CYL068_1079_1110  3.00  3.2  3.0  0.00  0.40  0.00  5
110CYL068_1127_1130  3.00  3.0  3.0  0.00  0.00  0.00  5
110CYL068_1137_1188  3.25  3.0  3.0  0.43  0.71  0.00  4
110CYL068_1189_1328  3.40  3.4  3.2  0.49  0.49  0.40  5
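
Once both perspectives are loaded, they can be aligned on the ids they share and compared. The sketch below is an illustration rather than part of the original notebook: the helper name perspective_gap is hypothetical, and the demo rows are copied from the reader.csv and writer.csv previews in this section.

```python
import pandas as pd

def perspective_gap(reader, writer, dims=("V", "A", "D")):
    """Reader minus writer ratings, restricted to the ids present in both frames."""
    common = reader.index.intersection(writer.index)
    cols = list(dims)
    return reader.loc[common, cols] - writer.loc[common, cols]

# Rows copied from the previews; the third reader id has no writer counterpart there
reader_demo = pd.DataFrame(
    {"V": [3.0, 2.6, 2.0], "A": [3.2, 3.0, 2.33], "D": [3.0, 2.6, 2.33]},
    index=["110CYL068_1036_1079", "110CYL068_1079_1110", "110CYL068_1110_1127"],
)
writer_demo = pd.DataFrame(
    {"V": [3.0, 3.0], "A": [2.8, 3.2], "D": [3.4, 3.0]},
    index=["110CYL068_1036_1079", "110CYL068_1079_1110"],
)
gap = perspective_gap(reader_demo, writer_demo)
print(gap.round(2))
```

Restricting the comparison to the shared ids mirrors the intersection step used when the two sets of ratings are combined.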

2 Function for Combining Individual Parts

This code was used to generate emobank.csv.

from pathlib import Path
import pandas as pd

def load_emobank(path):
    """
    Combine the reader and writer ratings into one DataFrame.

    path..........The path to this folder.
    """
    path = Path(path)
    raw = pd.read_csv(path / "raw.csv", index_col=0)
    writer = pd.read_csv(path / "writer.csv", index_col=0)
    reader = pd.read_csv(path / "reader.csv", index_col=0)

    common = sorted(list(set(writer.index).intersection(set(reader.index))))

    # redefine reader, writer as arrays
    N_reader = (reader.loc[common,"N"]).values.reshape((len(common),1))
    N_writer = (writer.loc[common,"N"]).values.reshape((len(common),1))

    reader = (reader.loc[common, ["V", "A","D"]]).values
    writer = (writer.loc[common, ["V", "A","D"]]).values

    #compute weighted average of annotations
    combined = ( (reader * N_reader) + (writer * N_writer) ) / (N_reader + N_writer)

    combined = pd.DataFrame(columns = ["V", "A", "D"], data=combined, index=common).round(2)
    # select the text column explicitly so that a Series (not a DataFrame) is assigned
    combined["text"] = raw.loc[common, "text"]
    combined.index.rename("id", inplace=True)

    assert combined.shape == (10062, 4)
    return combined

import csv

# This assumes that /.../EmoBank/corpus is your working directory.
# Otherwise, make sure to insert the correct path to /.../EmoBank/corpus between the quotes.
eb = load_emobank(".")
eb.to_csv("emobank.csv", quoting=csv.QUOTE_NONNUMERIC)
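
As a worked example of the weighting, the following sketch recomputes one combined row by hand. It uses the values shown for id 110CYL068_1137_1188 in the reader.csv and writer.csv previews (N = 5 and N = 4, respectively). The variable names are illustrative only.

```python
import numpy as np

# V, A, D and the number of ratings N for id 110CYL068_1137_1188
reader_vad = np.array([3.6, 3.0, 3.4])
n_reader = 5
writer_vad = np.array([3.25, 3.0, 3.0])
n_writer = 4

# Weighted average: each perspective counts in proportion to its number of ratings
combined = (reader_vad * n_reader + writer_vad * n_writer) / (n_reader + n_writer)
print(combined.round(2))  # V = 3.44, A = 3.0, D = 3.22
```

For V this is (3.6 * 5 + 3.25 * 4) / 9 = 31 / 9, or about 3.44, which agrees with the corresponding row in the emobank.csv preview.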