Question answering (QA) is the task of automatically answering questions posed in natural language. Answers may be found in structured databases or unstructured text collections.
Input:
世界上最大的国家是什么? (What is the largest country in the world?)
Output:
俄国 (Russia)
- Typical metrics are accuracy, exact match, and F1 (see the scoring sketch after this list).
- Some tests require systems to locate answers in provided text, rather than return a string.
- Some tests include questions for which no answer exists in the provided database or text collection, in which case systems must return “no answer exists” to get credit.
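A minimal scoring sketch in Python, assuming character-level tokens for Chinese answers and simple whitespace stripping; the shared tasks below ship their own official scoring scripts, so this is only illustrative.

```python
from collections import Counter

def exact_match(pred: str, gold: str) -> float:
    """1.0 if the normalized prediction equals the normalized gold answer, else 0.0."""
    return float(pred.strip() == gold.strip())

def char_f1(pred: str, gold: str) -> float:
    """Token-level F1 where tokens are characters, a common choice for Chinese."""
    pred_toks, gold_toks = list(pred.strip()), list(gold.strip())
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(exact_match("俄国", "俄国"))      # 1.0
print(char_f1("俄罗斯联邦", "俄罗斯"))  # partial credit: precision 0.6, recall 1.0 -> 0.75
```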
The KBQA shared task at NLPCC 2017 asks systems to retrieve answers from a provided knowledge base (KB) of factual triples. The knowledge base consists of 8.7m entities and 47.9m triples.
The test set was created by human annotators who selected triples from the KB. For each triple, the annotator wrote a natural-language question whose answer is the object of that triple. Only the question/answer pairs are released; the underlying triples are not.
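A toy illustration of the task shape, not the official data format or a real baseline: the triples and the string-matching lookup below are made up for this sketch.

```python
# Hypothetical triples (subject, predicate, object); the actual NLPCC KB uses
# its own distribution format and is far too large to match by brute force.
kb = [
    ("俄罗斯", "面积", "1,707.54万平方公里"),
    ("俄罗斯", "首都", "莫斯科"),
]

def answer(question: str, triples):
    """Naive lookup: return the object of the first triple whose subject and
    predicate both appear verbatim in the question, else report no answer."""
    for subj, pred, obj in triples:
        if subj in question and pred in question:
            return obj
    return "no answer exists"

print(answer("俄罗斯的首都是哪里?", kb))  # 莫斯科
```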
Test set | Size (Q/A pairs) | Genre |
---|---|---|
NLPCC-ICCPOL KBQA 2016 | 9,870 | Open domain |
NLPCC KBQA 2017 | 7631 | Open domain |
- Averaged F1.
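A sketch of one plausible reading of averaged F1, assuming per-question F1 is computed between the set of predicted answers and the set of gold answers and then averaged over all questions; the official evaluation script may differ in details such as answer normalization.

```python
def set_f1(pred: set, gold: set) -> float:
    """F1 between predicted and gold answer sets for a single question."""
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

def averaged_f1(predictions, references) -> float:
    """Mean of per-question F1 scores over the test set."""
    scores = [set_f1(set(p), set(g)) for p, g in zip(predictions, references)]
    return sum(scores) / len(scores)

print(averaged_f1([["俄国"], ["北京", "上海"]], [["俄国"], ["北京"]]))  # (1.0 + 0.667) / 2
```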
14 teams participated.
System | Averaged F1 |
---|---|
Best anonymous score reported | 0.47 |
Train set | Size (Q/A pairs) | Genre |
---|---|---|
NLPCC KBQA 2016/2017 | 14,609 | Open domain |
The DBQA (document-based QA) shared task at NLPCC 2017 asks systems to answer each question by selecting the sentence from a given document that answers it.
The test set was formed by human annotators who were given documents. For each document, an annotator selected a sentence, then constructed a natural-language question whose answer is that sentence.
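An illustrative baseline for this sentence-selection setup, assuming candidate sentences are ranked by character overlap with the question; real systems use learned matching models, so this only shows the input/output shape.

```python
def rank_sentences(question: str, sentences: list) -> list:
    """Toy baseline: order candidate sentences by character overlap with the question."""
    def overlap(sentence: str) -> int:
        return len(set(question) & set(sentence))
    return sorted(sentences, key=overlap, reverse=True)

document = [
    "俄罗斯是世界上面积最大的国家。",
    "它的首都是莫斯科。",
]
print(rank_sentences("世界上最大的国家是什么?", document)[0])
```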
Test set | Size (document/sentence pairs) | Genre |
---|---|---|
NLPCC-ICCPOL DBQA 2016 | 5,779 | Open domain |
NLPCC DBQA 2017 | 2,500 | Open domain |
- MRR (see the scoring sketch after this list).
- MAP.
- Accuracy @ N.
- F1.
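A sketch of how these ranking metrics could be computed, assuming each question comes with the system's ranked list of candidate sentences encoded as 0/1 relevance labels; the shared task's official scorer may handle ties and normalization differently.

```python
def mrr(ranked_labels) -> float:
    """Mean reciprocal rank; each element is a list of 0/1 labels in ranked order."""
    total = 0.0
    for labels in ranked_labels:
        rank = next((i for i, y in enumerate(labels, start=1) if y), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(ranked_labels)

def average_precision(labels) -> float:
    """AP for one question: precision averaged over the ranks of correct sentences."""
    hits, ap = 0, 0.0
    for i, y in enumerate(labels, start=1):
        if y:
            hits += 1
            ap += hits / i
    return ap / hits if hits else 0.0

def mean_average_precision(ranked_labels) -> float:
    return sum(average_precision(ls) for ls in ranked_labels) / len(ranked_labels)

def accuracy_at_n(ranked_labels, n: int = 1) -> float:
    """Fraction of questions with a correct sentence somewhere in the top n."""
    return sum(any(labels[:n]) for labels in ranked_labels) / len(ranked_labels)

runs = [[0, 1, 0], [1, 0, 0]]  # two questions, 0/1 relevance labels in ranked order
print(mrr(runs), mean_average_precision(runs), accuracy_at_n(runs, n=1))
```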
NLPCC DBQA 2016
System | MRR | F1 |
---|---|---|
ERNIE 2.0 | 95.8 | 85.8 |
Meng et al. (2019) (Glyce + BERT) | - | 83.4 |
ERNIE (Baidu) | 95.1 | 82.7 |
BERT | 94.6 | 80.8 |
NLPCC DBQA 2017
System | MRR | MAP | Accuracy @ 1 |
---|---|---|---|
Best anonymous score reported | 72.0 | 71.7 | 59.2 |
Train set | Size (document/sentence pairs) | Genre |
---|---|---|
NLPCC DBQA 2016/2017 | 8,772 | Open domain |
CLUE is a Chinese Language Understanding Evaluation benchmark. Machine Reading Comprehension (MRC) is the task of reading and understanding unstructured text and then answering questions about it. The MRC portion of CLUE consists of three datasets: CMRC 2018 (Cui et al.), ChID (Zheng et al.), and C3 (Sun et al.).
- Exact Match (CMRC 2018)
- Accuracy (ChID and C3)
System | CMRC 2018 | ChID | C3 |
---|---|---|---|
HUMAN (CLUE origin) | 92.40 | 87.10 | 96.00 |
RoBERTa-wwm-ext-large (CLUE origin) | 76.58 | 85.37 | 72.32 |
BERT-base (CLUE origin) | 69.72 | 82.04 | 64.50 |
- WebQA (Baidu) has 42k questions and answers. link
- DuReader (Baidu) has 200k questions from online query logs. link
- Zhang and Zhao (2018) provide 1929 Q/A pairs in the domain of Gaokao history exam questions.
Suggestions? Changes? Please send email to [email protected]