# Labels

## Part-of-speech

These are the possible values of `token.pos_`, the string form of `token.pos` (which holds the corresponding integer ID):

| POS | Explanation |
|---|---|
| ADJ | adjective |
| ADP | adposition |
| ADV | adverb |
| AUX | auxiliary verb |
| CCONJ | coordinating conjunction |
| INTJ | interjection |
| NOUN | noun |
| NUM | numeral |
| PRON | pronoun |
| PROPN | proper noun |
| PUNCT | punctuation |
| SCONJ | subordinating conjunction |
| SPACE | space |
| SYM | symbol |
| VERB | verb |
| X | other, e.g. foreign words |
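A common use of these tags is filtering tokens by word class. The sketch below does not depend on spaCy. It works on hand-tagged `(text, pos)` pairs instead of a loaded pipeline. The function name `content_words` and the set of tags counted as content words are assumptions for illustration. With spaCy, the same pairs could be built as `[(t.text, t.pos_) for t in doc]`.

```python
# Which POS tags count as "content words" is an illustrative assumption.
CONTENT_POS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}


def content_words(tagged):
    """Return the words whose POS tag is in CONTENT_POS.

    `tagged` is a list of (text, pos) pairs using the tag set above.
    """
    return [text for text, pos in tagged if pos in CONTENT_POS]


# Hand-tagged example: "Auto on vihreä." ("The car is green.")
# The copula "on" is AUX, so it is not kept as a content word.
sentence = [("Auto", "NOUN"), ("on", "AUX"), ("vihreä", "ADJ"), (".", "PUNCT")]
print(content_words(sentence))  # ['Auto', 'vihreä']
```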

## Dependency tags

These are the possible values of `token.dep_`, the string form of `token.dep`:

| dep | Explanation |
|---|---|
| acl | clausal modifier of noun |
| acl:relcl | relative clause modifier |
| advcl | adverbial clause modifier |
| advmod | adverbial modifier |
| amod | adjectival modifier |
| appos | apposition |
| aux | auxiliary verb; one of: olla, ei, voida, pitää, saattaa, täytyä, joutua, aikoa, taitaa, tarvita, mahtaa |
| aux:pass | passive auxiliary; the only possible verb is olla |
| case | case marking |
| cc | coordinating conjunction |
| ccomp | clausal complement |
| cc:preconj | preconjunct, as in constructions like sekä ... että ("both ... and") |
| compound | compound |
| compound:nn | noun compound modifier |
| compound:prt | phrasal particle |
| conj | coordinated element |
| cop | copula, e.g. auto on vihreä ("the car is green") |
| cop:own | copula in possessive clauses, e.g. minulla on kynä ("I have a pen") |
| csubj | clausal subject |
| csubj:cop | clausal copular subject |
| det | determiner |
| dep | unspecified dependency |
| discourse | discourse element |
| fixed | fixed multi-word expression |
| flat | flat phrase without a clear head |
| flat:foreign | foreign words |
| flat:name | names |
| mark | subordinating conjunction, complementizer, or comparative conjunction |
| nmod | nominal modifier |
| nmod:gobj | genitive object |
| nmod:gsubj | genitive subject |
| nmod:poss | genitive modifier |
| nsubj | nominal subject |
| nsubj:cop | nominal copular subject |
| nummod | numeric modifier |
| obj | direct object |
| obl | oblique nominal |
| orphan | orphaned dependent in gapping |
| parataxis | parataxis |
| punct | punctuation |
| root | grammatical root of the sentence |
| vocative | vocative |
| xcomp | open clausal complement |
| xcomp:ds | clausal complement with a different subject |
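Labels containing a colon are UD subtypes of a universal relation. For example, `acl:relcl` refines `acl` and `nsubj:cop` refines `nsubj`. The sketch below strips the subtype and walks a tiny parse to find the root and its subject. The parse of "Auto on vihreä." is hand-annotated for illustration and is not model output. It uses the convention that the root's head is the root itself, which is also how spaCy's `token.head` behaves. The helper names `base_relation`, `find_root` and `dependents` are hypothetical.

```python
def base_relation(dep):
    """Strip the subtype from a UD label: 'nmod:poss' -> 'nmod'."""
    return dep.split(":", 1)[0]


# Hand-annotated parse of "Auto on vihreä." ("The car is green.").
# Each entry is (text, dep, head index); the root points to itself.
parse = [
    ("Auto", "nsubj:cop", 2),
    ("on", "cop", 2),
    ("vihreä", "root", 2),
    (".", "punct", 2),
]


def find_root(tokens):
    """Index of the token labelled 'root'."""
    return next(i for i, (text, dep, head) in enumerate(tokens) if dep == "root")


def dependents(tokens, head, relation):
    """Texts of tokens attached to `head` whose base relation is `relation`."""
    return [
        text
        for i, (text, dep, h) in enumerate(tokens)
        if h == head and i != head and base_relation(dep) == relation
    ]


root = find_root(parse)
print(parse[root][0])                    # vihreä
print(dependents(parse, root, "nsubj"))  # ['Auto']
```

Matching on the base relation lets one query cover both `nsubj` and `nsubj:cop`. Use the full label when the subtype matters.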

## Morphology

The morphology labels (`token.morph`) follow the Universal Dependencies (UD) specification for Finnish.
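In UD, morphological features are written as a FEATS string. This is a `|`-separated list of `Feature=Value` pairs, and multiple values for one feature are separated by commas. In spaCy v3, `str(token.morph)` gives this string, and `token.morph.get(field)` returns the values of one feature as a list. The sketch below parses the string directly so it runs without spaCy. The function name `parse_feats` and the example feature strings are assumptions for illustration.

```python
def parse_feats(feats):
    """Parse a UD FEATS string such as 'Case=Nom|Number=Sing' into a dict.

    Every value is returned as a list, so multi-valued features
    like 'PronType=Int,Rel' keep all their values.
    An empty analysis ('' or '_') gives an empty dict.
    """
    if feats in ("", "_"):
        return {}
    result = {}
    for pair in feats.split("|"):
        name, value = pair.split("=", 1)
        result[name] = value.split(",")
    return result


# Illustrative analysis of the adjective "vihreä" ("green").
print(parse_feats("Case=Nom|Degree=Pos|Number=Sing"))
# {'Case': ['Nom'], 'Degree': ['Pos'], 'Number': ['Sing']}
```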

## Named entities

The recognized named entities (`token.ent_type_`, the string form of `token.ent_type`) follow the OntoNotes scheme:

| ent_type | Explanation |
|---|---|
| CARDINAL | Numerals that do not fall under another type |
| DATE | Absolute or relative dates or periods |
| EVENT | Named hurricanes, battles, wars, sports events, etc. |
| FAC | Buildings, airports, highways, bridges, etc. |
| GPE | Geopolitical entities: countries, cities, states |
| LANGUAGE | Any named language |
| LAW | Named documents made into laws |
| LOC | Non-GPE locations, mountain ranges, bodies of water |
| MONEY | Monetary values, including unit |
| NORP | Nationalities or religious or political groups |
| ORDINAL | Ordinal numbers, e.g. ensimmäinen ("first"), toinen ("second") |
| ORG | Companies, agencies, institutions, etc. |
| PERCENT | Percentages, including "%" |
| PERSON | People, including fictional |
| PRODUCT | Vehicles, weapons, foods, etc. (not services) |
| QUANTITY | Measurements, such as weight or distance |
| TIME | Times smaller than a day |
| WORK_OF_ART | Titles of books, songs, etc. |
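`token.ent_type_` is a per-token label. A multi-token entity is recovered by combining it with the token's IOB tag. In spaCy, `token.ent_iob_` is "B" at the start of an entity, "I" inside one, "O" outside, and an empty string when unset. In practice `doc.ents` already returns the merged spans. The sketch below shows how the per-token labels relate to those spans. It uses hand-labelled triples rather than model output, and the name `entity_spans` is hypothetical. Joining words with single spaces is a simplifying assumption: spaCy spans keep the original whitespace instead.

```python
def entity_spans(tokens):
    """Group (text, iob, ent_type) triples into (entity_text, ent_type) spans.

    'B' starts a new entity. 'I' continues the current one, or starts one
    if none is open. Anything else ('O' or '') closes the open entity.
    """
    spans = []
    current = None  # (list of words, label) for the entity being built

    def close():
        if current:
            spans.append((" ".join(current[0]), current[1]))

    for text, iob, label in tokens:
        if iob == "B" or (iob == "I" and current is None):
            close()
            current = ([text], label)
        elif iob == "I":
            current[0].append(text)
        else:
            close()
            current = None
    close()
    return spans


# Hand-labelled: "Sauli Niinistö vieraili Helsingissä." ("Sauli Niinistö visited Helsinki.")
tokens = [
    ("Sauli", "B", "PERSON"),
    ("Niinistö", "I", "PERSON"),
    ("vieraili", "O", ""),
    ("Helsingissä", "B", "GPE"),
    (".", "O", ""),
]
print(entity_spans(tokens))  # [('Sauli Niinistö', 'PERSON'), ('Helsingissä', 'GPE')]
```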