Chinese Question Answering

Background

Question answering (QA) automatically provides answers to questions posed in natural language. Answers may be contained in structured databases or unstructured text collections.

Example

Input:

世界上最大的国家是什么? (What is the largest country in the world?)

Output:

俄国 (Russia)

Standard Metrics

  • Typical metrics are accuracy, exact match, and F1.
  • Some tests require systems to locate answers in provided text, rather than return a string.
  • Some tests include questions for which no answer exists in the provided database or text collection, in which case systems must return “no answer exists” to get credit.
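As a rough illustration of the string-level metrics listed above, the following sketch computes exact match and an overlap F1. It assumes character-level overlap (a common choice for Chinese, where word boundaries are not marked) and models "no answer exists" as an empty string; individual benchmarks may normalize text differently, and the function names are hypothetical.

```python
from collections import Counter


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the stripped strings are identical, else 0.0."""
    return float(prediction.strip() == gold.strip())


def char_f1(prediction: str, gold: str) -> float:
    """Character-overlap F1; characters stand in for tokens (an assumption)."""
    pred_chars = [c for c in prediction if not c.isspace()]
    gold_chars = [c for c in gold if not c.isspace()]
    # Two empty strings model a correct "no answer exists" response.
    if not pred_chars and not gold_chars:
        return 1.0
    if not pred_chars or not gold_chars:
        return 0.0
    # Multiset intersection counts each shared character at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_chars) & Counter(gold_chars)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(gold_chars)
    return 2 * precision * recall / (precision + recall)
```

For instance, comparing the prediction 俄国 with a gold answer of 俄罗斯 gives an exact match of 0 but a non-zero F1, because one character is shared.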

NLPCC KBQA shared task.

The KBQA shared task at NLPCC 2017 asks systems to retrieve answers from a provided knowledge base (KB) of factual triples. The knowledge base consists of 8.7 million entities and 47.9 million triples.

The test set was formed by human annotators who selected triples. For each triple, the annotator wrote down a natural-language question whose answer is the object of the triple. Q/A pairs are provided, but the triple is not provided.
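To make the triple-based setup concrete, here is a minimal sketch of answering a question once its subject and predicate have been identified. The triples are toy data for illustration and are not taken from the NLPCC knowledge base, the function names are hypothetical, and the hard part of the task (mapping a natural-language question to a subject and predicate) is not shown.

```python
from collections import defaultdict


def build_index(triples):
    """Map each (subject, predicate) pair to the set of its objects."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].add(obj)
    return index


def answer(index, subject, predicate):
    """Return all objects for the pair, or an empty set if none is stored."""
    return set(index.get((subject, predicate), set()))


# Toy triples (illustrative only, not from the shared-task KB).
toy_triples = [
    ("俄罗斯", "首都", "莫斯科"),
    ("俄罗斯", "官方语言", "俄语"),
]
toy_index = build_index(toy_triples)
print(answer(toy_index, "俄罗斯", "首都"))
```

Returning a set rather than a single string reflects that one (subject, predicate) pair can have several objects in a knowledge base.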

| Test set | Size (Q/A pairs) | Genre |
| --- | --- | --- |
| NLPCC-ICCPOL KBQA 2016 | 9870 | Open domain |
| NLPCC KBQA 2017 | 7631 | Open domain |

Metric

Averaged F1.
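One common reading of averaged F1 is to compute F1 per question between the predicted and gold answer sets, then take the mean over all questions. The sketch below assumes that reading; the official scorer may normalize answers or handle empty predictions differently, and the function names are hypothetical.

```python
def set_f1(predicted: set, gold: set) -> float:
    """F1 between a predicted answer set and a gold answer set."""
    if not predicted or not gold:
        return 0.0
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


def averaged_f1(predictions, golds) -> float:
    """Mean of per-question F1 scores over aligned lists of answer sets."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    if not golds:
        return 0.0
    scores = [set_f1(p, g) for p, g in zip(predictions, golds)]
    return sum(scores) / len(scores)
```

Under this definition, a system that returns one extra wrong answer for a question is penalized on precision for that question only.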

Results

14 teams participated.

| System | Averaged F1 |
| --- | --- |
| Best anonymous score reported | 0.47 |

Resources

| Train set | Size (Q/A pairs) | Genre |
| --- | --- | --- |
| NLPCC KBQA 2016/2017 | 14,609 | Open domain |

NLPCC DBQA shared task.

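The DBQA (document-based QA) shared task at NLPCC 2017 asks systems to select, from a given document, the sentence that answers a natural-language question.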

The test set was formed by human annotators who were given documents. For each document, an annotator selected a sentence, then constructed a natural-language question whose answer is that sentence.
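Because each answer is a sentence from the document, the task can be framed as ranking candidate sentences for a question. The sketch below is a naive character-overlap baseline, shown only to illustrate that framing; it is not the method of any participating system, and the function names are hypothetical.

```python
def overlap_score(question: str, sentence: str) -> float:
    """Fraction of distinct question characters that also occur in the sentence."""
    q_chars = {c for c in question if not c.isspace()}
    if not q_chars:
        return 0.0
    return len(q_chars & set(sentence)) / len(q_chars)


def rank_sentences(question: str, sentences: list) -> list:
    """Return sentences sorted by descending overlap; ties keep document order."""
    return sorted(sentences, key=lambda s: overlap_score(question, s), reverse=True)


# Toy document (illustrative only, not from the DBQA data).
doc = ["今天天气很好", "中国的首都是北京"]
print(rank_sentences("中国的首都是哪里", doc)[0])
```

A learned ranker would replace `overlap_score` with a model score, while the ranking step stays the same.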

| Test set | Size (document/sentence pairs) | Genre |
| --- | --- | --- |
| NLPCC-ICCPOL DBQA 2016 | 5779 | Open domain |
| NLPCC DBQA 2017 | 2500 | Open domain |

Metrics

  • MRR.
  • MAP.
  • Accuracy @ N.
  • F1
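The ranking metrics above can be sketched as follows, assuming each question comes with a ranked list of binary relevance labels (1 for a correct answer sentence, 0 otherwise). The input format and function names are assumptions for illustration, and the official scorer may treat questions without any correct sentence differently.

```python
def reciprocal_rank(labels) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, label in enumerate(labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def average_precision(labels) -> float:
    """Mean of precision values taken at each relevant item's rank."""
    hits = 0
    total = 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def accuracy_at_n(labels, n: int) -> float:
    """1.0 if any of the top-n items is relevant, else 0.0."""
    return float(any(labels[:n]))


def evaluate(ranked_lists, n: int = 1) -> dict:
    """Average each metric over all questions."""
    if not ranked_lists:
        raise ValueError("need at least one ranked list")
    count = len(ranked_lists)
    return {
        "MRR": sum(reciprocal_rank(x) for x in ranked_lists) / count,
        "MAP": sum(average_precision(x) for x in ranked_lists) / count,
        f"Accuracy@{n}": sum(accuracy_at_n(x, n) for x in ranked_lists) / count,
    }
```

MRR only rewards the first correct sentence, whereas MAP also accounts for the ranks of any further correct sentences in the same document.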

Results

NLPCC DBQA 2016

| System | MRR | F1 |
| --- | --- | --- |
| ERNIE 2.0 | 95.8 | 85.8 |
| Meng et al. (2019) (Glyce + BERT) | - | 83.4 |
| ERNIE (Baidu) | 95.1 | 82.7 |
| BERT | 94.6 | 80.8 |

NLPCC DBQA 2017

| System | MRR | MAP | Accuracy @ 1 |
| --- | --- | --- | --- |
| Best anonymous score reported | 72.0 | 71.7 | 59.2 |

Resources

| Train set | Size (document/sentence pairs) | Genre |
| --- | --- | --- |
| NLPCC DBQA 2016/2017 | 8772 | Open domain |

Machine Reading Comprehension (MRC) tasks from CLUE benchmark.

CLUE is a Chinese Language Understanding Evaluation benchmark. Machine reading comprehension (MRC) requires a system to read unstructured text and answer questions about it. The MRC portion of CLUE consists of three datasets: CMRC 2018 (Cui et al.), ChID (Zheng et al.), and C3 (Sun et al.).

Metrics

  • Exact Match (CMRC 2018)
  • Accuracy (ChID and C3)

Results

| System | CMRC 2018 | ChID | C3 |
| --- | --- | --- | --- |
| Human (CLUE origin) | 92.40 | 87.10 | 96.00 |
| RoBERTa-wwm-ext-large (CLUE origin) | 76.58 | 85.37 | 72.32 |
| BERT-base (CLUE origin) | 69.72 | 82.04 | 64.50 |

Resources

CLUE benchmark

Other resources.

  • WebQA (Baidu) has 42k questions and answers.
  • DuReader (Baidu) has 200k questions from online query logs.
  • Zhang and Zhao (2018) provide 1929 Q/A pairs in the domain of Gaokao history exam questions.

Suggestions? Changes? Please send email to [email protected]