gft Equations

Many of the examples make use of equations such as:

--eqn 'classify: label ~ question1 + question2'

The keyword, classify, is a task (also known as text-classification). See here for more discussion of the other tasks: regress, classify_tokens, classify_spans, and ctc.

In the classification case, each input (a pair of questions) has a single label (semantically similar or not). The classify task also generalizes from binary classification to multiclass classification (for tasks such as emotion classification). As shown in the table below, each equation starts with one of several keywords (tasks):

classify: lhs (the left-hand side of ~) denotes a set of classes
regress: lhs denotes a point in a vector space
classify_tokens: there is a classification task for each token on the rhs (the right-hand side of ~)
classify_spans: used for SQuAD-like tasks where the output should be a span (substring) of the rhs
ctc: used in speech recognition where the input is audio and the output is text
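
The structure described above (a task keyword, a colon, then lhs ~ rhs with fields joined by +) can be illustrated with a small parser. This is only a sketch for reading the notation: `parse_eqn` and `Equation` are hypothetical names introduced here, not part of gft, and gft's own parsing may differ.

```python
from typing import List, NamedTuple


class Equation(NamedTuple):
    """Parts of an equation of the form 'task: lhs ~ rhs'."""
    task: str
    lhs: List[str]
    rhs: List[str]


def parse_eqn(eqn: str) -> Equation:
    """Split an equation string into task, lhs fields, and rhs fields.

    Hypothetical helper for illustration only; not gft's parser.
    """
    task, colon, body = eqn.partition(":")
    if not colon:
        raise ValueError(f"expected 'task: lhs ~ rhs', got {eqn!r}")
    lhs, tilde, rhs = body.partition("~")
    if not tilde:
        raise ValueError(f"expected '~' between lhs and rhs in {eqn!r}")

    def fields(side: str) -> List[str]:
        # Fields on either side are separated by '+'; whitespace is ignored.
        return [f.strip() for f in side.split("+") if f.strip()]

    return Equation(task.strip(), fields(lhs), fields(rhs))


if __name__ == "__main__":
    print(parse_eqn("classify: label ~ question1 + question2"))
```

For the example equation, the task is classify, the lhs is the single field label, and the rhs holds the two fields question1 and question2.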

There are a number of examples of equations in the table below:

Dataset	Subset	Data Argument	Equation	Pipeline Task
GLUE	COLA	H:glue,cola	classify: label ~ sentence	text-classification
GLUE	SST2	H:glue,sst2	classify: label ~ sentence	text-classification
GLUE	WNLI	H:glue,wnli	classify: label ~ sentence1 + sentence2	text-classification
GLUE	MRPC	H:glue,mrpc	classify: label ~ sentence1 + sentence2	text-classification
GLUE	QNLI	H:glue,qnli	classify: label ~ question + sentence	text-classification
GLUE	QQP	H:glue,qqp	classify: label ~ question1 + question2	text-classification
GLUE	STSB	H:glue,stsb	regress: label ~ sentence1 + sentence2
GLUE	MNLI	H:glue,mnli	classify: label ~ premise + hypothesis	text-classification
SQuAD 1.0		H:squad	classify_spans: answers ~ question + context	question-answering
SQuAD 2.0		H:squad_v2	classify_spans: answers ~ question + context	question-answering
CONLL2003	POS	H:conll2003	classify_tokens: pos_tags ~ tokens	token-classification
CONLL2003	NER	H:conll2003	classify_tokens: ner_tags ~ tokens	token-classification
CONLL2003	Chunking	H:conll2003	classify_tokens: chunk_tags ~ tokens	token-classification
TIMIT		H:timit_asr	ctc: text ~ audio	automatic-speech-recognition
LibriSpeech		H:librispeech_asr	ctc: text ~ audio	automatic-speech-recognition
Amazon Reviews		H:amazon_reviews_multi	classify: stars ~ review_title + review_body	text-classification
VAD		C:$gft/datasets/VAD/VAD	regress: Valence + Arousal + Dominance ~ Word
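
The Data Argument and Pipeline Task columns can be read mechanically, as in the sketch below. It makes two assumptions: the lookup table is copied from the rows above (regress has no pipeline task listed, so it maps to None), and a data argument is treated as a prefix, a colon, a dataset name or path, and an optional subset after a comma. The names `PIPELINE_TASK`, `parse_data_arg`, and `pipeline_task` are hypothetical and not part of gft, and the sketch does not interpret what the H and C prefixes mean.

```python
from typing import Optional, Tuple

# Task keyword -> pipeline task, copied from the table above.
# The regress rows list no pipeline task, so regress maps to None.
PIPELINE_TASK = {
    "classify": "text-classification",
    "classify_tokens": "token-classification",
    "classify_spans": "question-answering",
    "ctc": "automatic-speech-recognition",
    "regress": None,
}


def parse_data_arg(arg: str) -> Tuple[str, str, Optional[str]]:
    """Split a data argument such as 'H:glue,cola' into (prefix, name, subset).

    Hypothetical helper: the subset is None when no comma is present.
    """
    prefix, _, rest = arg.partition(":")
    name, _, subset = rest.partition(",")
    return prefix, name, subset or None


def pipeline_task(eqn: str) -> Optional[str]:
    """Look up the pipeline task for the keyword that starts an equation."""
    task = eqn.partition(":")[0].strip()
    return PIPELINE_TASK[task]


if __name__ == "__main__":
    print(parse_data_arg("H:glue,cola"))
    print(parse_data_arg("C:$gft/datasets/VAD/VAD"))
    print(pipeline_task("classify_spans: answers ~ question + context"))
```

Keeping the lookup as a plain dictionary makes an unknown task keyword fail loudly with a KeyError rather than silently picking a default.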
