Many of the examples make use of equations such as:
--eqn 'classify: label ~ question1 + question2'
The keyword, classify, is the task (also known as text-classification). See here for more discussion of other tasks, such as regress, classify_tokens, classify_spans, and ctc.
In the classification case, each input (a pair of questions) has a single label (semantically similar or not). The classify task also generalizes from binary classification to multiclass classification (for tasks such as emotion classification). In an equation, the left-hand side (lhs) of ~ names the output and the right-hand side (rhs) names the inputs. Equations start with one of several keywords (tasks):
- classify: the lhs is a label drawn from a set of classes
- regress: the lhs is a point in a vector space
- classify_tokens: there is a classification task for each token on the rhs
- classify_spans: used for SQuAD-like tasks where the output is a span (substring) of the rhs
- ctc: used in speech recognition, where the input is audio and the output is text
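To make the structure of an equation concrete, here is a minimal Python sketch that splits an equation string into its task keyword, lhs terms, and rhs terms. The names `Equation` and `parse_eqn` are hypothetical and chosen only for illustration; this is not gft's own parser, and it assumes the simple `task: lhs ~ rhs` form with terms joined by `+`.

```python
from typing import List, NamedTuple


class Equation(NamedTuple):
    task: str        # keyword before the colon, e.g. "classify"
    lhs: List[str]   # output column(s), left of "~"
    rhs: List[str]   # input column(s), right of "~"


def _terms(side: str) -> List[str]:
    """Split one side of an equation on '+' and trim whitespace."""
    return [term.strip() for term in side.split("+") if term.strip()]


def parse_eqn(eqn: str) -> Equation:
    """Split 'task: lhs ~ rhs' into parts (illustrative only, not gft's parser)."""
    task, colon, body = eqn.partition(":")
    if not colon:
        raise ValueError(f"missing task keyword in {eqn!r}")
    lhs, tilde, rhs = body.partition("~")
    if not tilde:
        raise ValueError(f"missing '~' in {eqn!r}")
    return Equation(task.strip(), _terms(lhs), _terms(rhs))


if __name__ == "__main__":
    print(parse_eqn("classify: label ~ question1 + question2"))
```

Under these assumptions, the example equation yields the task `classify`, one lhs term (`label`), and two rhs terms (`question1` and `question2`).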
The table below lists example equations for several datasets:
Dataset | Subset | Data Argument | Equation | Pipeline Task
---|---|---|---|---
GLUE | COLA | H:glue,cola | classify: label ~ sentence | text-classification
GLUE | SST2 | H:glue,sst2 | classify: label ~ sentence | text-classification
GLUE | WNLI | H:glue,wnli | classify: label ~ sentence1 + sentence2 | text-classification
GLUE | MRPC | H:glue,mrpc | classify: label ~ sentence1 + sentence2 | text-classification
GLUE | QNLI | H:glue,qnli | classify: label ~ question + sentence | text-classification
GLUE | QQP | H:glue,qqp | classify: label ~ question1 + question2 | text-classification
GLUE | STSB | H:glue,stsb | regress: label ~ sentence1 + sentence2 |
GLUE | MNLI | H:glue,mnli | classify: label ~ premise + hypothesis | text-classification
SQuAD 1.0 | | H:squad | classify_spans: answers ~ question + context | question-answering
SQuAD 2.0 | | H:squad_v2 | classify_spans: answers ~ question + context | question-answering
CONLL2003 | POS | H:conll2003 | classify_tokens: pos_tags ~ tokens | token-classification
CONLL2003 | NER | H:conll2003 | classify_tokens: ner_tags ~ tokens | token-classification
CONLL2003 | Chunking | H:conll2003 | classify_tokens: chunk_tags ~ tokens | token-classification
TIMIT | | H:timit_asr | ctc: text ~ audio | automatic-speech-recognition
LibriSpeech | | H:librispeech_asr | ctc: text ~ audio | automatic-speech-recognition
Amazon Reviews | | H:amazon_reviews_multi | classify: stars ~ review_title + review_body | text-classification
VAD | | C:$gft/datasets/VAD/VAD | regress: Valence + Arousal + Dominance ~ Word |
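The relationships in the table can be sketched as two small Python helpers: one looks up the pipeline task for an equation's keyword, and one splits a data argument into its parts. The names `PIPELINE_TASK`, `pipeline_task_for`, and `split_data_arg` are hypothetical. The mapping contains only the pairs that appear in the table (regress rows list no pipeline task), and the split is purely syntactic: it assumes the form `prefix:name[,subset]` and does not interpret what the prefix letters mean.

```python
from typing import Optional, Tuple

# Task keyword -> pipeline task, read off the table above.
# regress is omitted because the table lists no pipeline task for it.
PIPELINE_TASK = {
    "classify": "text-classification",
    "classify_tokens": "token-classification",
    "classify_spans": "question-answering",
    "ctc": "automatic-speech-recognition",
}


def pipeline_task_for(eqn: str) -> Optional[str]:
    """Return the pipeline task for an equation's keyword, or None if unlisted."""
    task = eqn.split(":", 1)[0].strip()
    return PIPELINE_TASK.get(task)


def split_data_arg(arg: str) -> Tuple[str, str, Optional[str]]:
    """Split 'prefix:name[,subset]' into (prefix, name, subset); assumed format."""
    prefix, colon, rest = arg.partition(":")
    if not colon:
        raise ValueError(f"missing prefix in {arg!r}")
    name, _, subset = rest.partition(",")
    return prefix, name, subset or None


if __name__ == "__main__":
    print(pipeline_task_for("classify_spans: answers ~ question + context"))
    print(split_data_arg("H:glue,cola"))
    print(split_data_arg("H:squad"))
```

Returning `None` for a missing subset or an unlisted task keeps the empty table cells distinguishable from real values.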