gft Equations

Many of the examples make use of equations such as:

--eqn 'classify: label ~ question1 + question2'

The keyword, classify, is a task (also known as text-classification). See here for more discussion of tasks such as: regress, classify_tokens, classify_spans, ctc, etc.

In the classification case, each input (a pair of questions) has a single label (semantically similar or not). In equation terms, the left-hand side (lhs) of the ~ is what the model predicts, and the right-hand side (rhs) is the input. Classify also generalizes from binary classification to multiclass classification (for tasks such as emotion classification). Equations start with one of several keywords (tasks):

  1. classify: lhs denotes a set of classes
  2. regress: lhs denotes a point in a vector space
  3. classify_tokens: there is a classification task for each token on the rhs
  4. classify_spans: used for SQuAD-like tasks where the output should be a span (substring) of the rhs
  5. ctc: used in speech recognition where the input is audio and the output is text
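
To make the structure of an equation concrete, here is a minimal Python sketch that splits an equation string into its task keyword, lhs terms, and rhs terms. The names `Equation` and `parse_eqn` are hypothetical and introduced only for illustration; this is not gft's own parser, and it assumes the simple `task: lhs ~ rhs` shape with `+` separating terms.

```python
from typing import List, NamedTuple


class Equation(NamedTuple):
    task: str        # keyword before the colon, e.g. "classify"
    lhs: List[str]   # terms to the left of "~" (what is predicted)
    rhs: List[str]   # terms to the right of "~" (the inputs)


def parse_eqn(eqn: str) -> Equation:
    """Split 'task: lhs ~ rhs' into its three parts (illustrative only)."""
    task, sep, body = eqn.partition(":")
    if not sep:
        raise ValueError(f"missing ':' after the task keyword: {eqn!r}")
    lhs, sep, rhs = body.partition("~")
    if not sep:
        raise ValueError(f"missing '~' between lhs and rhs: {eqn!r}")

    def split_terms(side: str) -> List[str]:
        # Terms on either side are joined with '+'.
        return [term.strip() for term in side.split("+") if term.strip()]

    return Equation(task.strip(), split_terms(lhs), split_terms(rhs))


if __name__ == "__main__":
    print(parse_eqn("classify: label ~ question1 + question2"))
    print(parse_eqn("regress: Valence + Arousal + Dominance ~ Word"))
```

The second call shows that the lhs can also hold several terms, as in the VAD regression equation.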

There are a number of examples of equations in the table below:

| Dataset | Subset | Data Argument | Equation | Pipeline Task |
|---|---|---|---|---|
| GLUE | COLA | H:glue,cola | classify: label ~ sentence | text-classification |
| GLUE | SST2 | H:glue,sst2 | classify: label ~ sentence | text-classification |
| GLUE | WNLI | H:glue,wnli | classify: label ~ sentence1 + sentence2 | text-classification |
| GLUE | MRPC | H:glue,mrpc | classify: label ~ sentence1 + sentence2 | text-classification |
| GLUE | QNLI | H:glue,qnli | classify: label ~ question + sentence | text-classification |
| GLUE | QQP | H:glue,qqp | classify: label ~ question1 + question2 | text-classification |
| GLUE | STSB | H:glue,stsb | regress: label ~ sentence1 + sentence2 | |
| GLUE | MNLI | H:glue,mnli | classify: label ~ premise + hypothesis | text-classification |
| SQuAD 1.0 | | H:squad | classify_spans: answers ~ question + context | question-answering |
| SQuAD 2.0 | | H:squad_v2 | classify_spans: answers ~ question + context | question-answering |
| CONLL2003 | POS | H:conll2003 | classify_tokens: pos_tags ~ tokens | token-classification |
| CONLL2003 | NER | H:conll2003 | classify_tokens: ner_tags ~ tokens | token-classification |
| CONLL2003 | Chunking | H:conll2003 | classify_tokens: chunk_tags ~ tokens | token-classification |
| TIMIT | | H:timit_asr | ctc: text ~ audio | automatic-speech-recognition |
| LibriSpeech | | H:librispeech_asr | ctc: text ~ audio | automatic-speech-recognition |
| Amazon Reviews | | H:amazon_reviews_multi | classify: stars ~ review_title + review_body | text-classification |
| VAD | | C:$gft/datasets/VAD/VAD | regress: Valence + Arousal + Dominance ~ Word | |
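
The relationship between the columns of the table can be sketched in a few lines of Python. The helpers `pipeline_task` and `split_data_arg` are hypothetical names used only for illustration and are not part of gft. The keyword-to-pipeline mapping is read directly off the table, and the data argument is assumed to have the shape `prefix:name[,subset]` as in the rows above.

```python
from typing import Optional, Tuple

# Pipeline task for each equation keyword, as listed in the table.
# "regress" is omitted because the table leaves its pipeline task blank.
PIPELINE_TASK = {
    "classify": "text-classification",
    "classify_spans": "question-answering",
    "classify_tokens": "token-classification",
    "ctc": "automatic-speech-recognition",
}


def pipeline_task(eqn: str) -> Optional[str]:
    """Return the pipeline task for an equation, or None if none is listed."""
    keyword = eqn.split(":", 1)[0].strip()
    return PIPELINE_TASK.get(keyword)


def split_data_arg(arg: str) -> Tuple[str, str, Optional[str]]:
    """Split a data argument such as 'H:glue,cola' into (prefix, name, subset)."""
    prefix, _, rest = arg.partition(":")
    name, _, subset = rest.partition(",")
    return prefix, name, subset or None


if __name__ == "__main__":
    print(split_data_arg("H:glue,cola"), pipeline_task("classify: label ~ sentence"))
    print(split_data_arg("H:squad"), pipeline_task("classify_spans: answers ~ question + context"))
```

A row without a subset, such as `H:squad`, yields `None` in the third position.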