Some datasets provided by download links (SQUAD 2.0 format) have no answers in them. #12
Comments
Hey, this problem affected only TriviaQA-Unfiltered, and I'm working on a fix now. I have also uploaded most of the SQuAD 2.0 converted datasets; please see the main readme. multi75k can be created by adding multiple datasets to the list of datasets to convert, but I will also upload it eventually. Thanks |
Hi, thanks for your reply. However, some of the SQuAD 2.0 format download links for the SQuAD datasets seem to be broken. Thanks in advance. |
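File "/people/sanjay/MultiQA/models/multiqa_reader.py", line 242, in combine_context
for ac in qa['answers']['open-ended']["answer_candidates"]:
KeyError: 'answer_candidates'
Same error when I run this:
python multiqa.py evaluate --model HotpotQA --datasets NewsQA,SearchQA --cuda_device 0
or this,
python multiqa.py evaluate --model SQuAD1-1 --datasets SQuAD1-1,NewsQA,SearchQA --cuda_device 0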
|
Thanks for noticing!
These are still minor bugs from the format change. I will take care of them shortly
and provide tests.
…On Fri, 20 Sep 2019 at 12:34 Sanjay Kamath ***@***.***> wrote:
File "/people/sanjay/MultiQA/models/multiqa_reader.py", line 242, in combine_context
for ac in qa['answers']['open-ended']["answer_candidates"]:
KeyError: 'answer_candidates'
Same error when I run this:
python multiqa.py evaluate --model HotpotQA --datasets NewsQA,SearchQA --cuda_device 0
or this,
python multiqa.py evaluate --model SQuAD1-1 --datasets SQuAD1-1,NewsQA,SearchQA --cuda_device 0
|
Hey, I've made the fixes and uploaded the datasets. In addition, I will upload PytorchTransformers results and models on several datasets soon... |
Perfect! The problem is solved, so I'm closing the thread. |
Hi, sorry to reopen this. I was wondering how you combine datasets when a list of datasets is passed for training. Thanks in advance. |
Hey, It concatenates per example so that each batch has a mix of datasets. Hope that helped... |
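For readers following along, here is a minimal sketch of what "concatenating per example" could look like. It is not the repository's implementation: the function name `mixed_batches`, the `(dataset name, example)` pairing, and the shuffle-then-slice strategy are all assumptions made for illustration.

```python
import random
from typing import Dict, Iterator, List, Tuple

# (dataset name, example payload); this pairing is a hypothetical convenience.
Example = Tuple[str, dict]


def mixed_batches(
    datasets: Dict[str, List[dict]], batch_size: int, seed: int = 0
) -> Iterator[List[Example]]:
    """Pool the examples of every dataset, shuffle, and yield batches.

    Assumption: mixing is done by shuffling one concatenated pool, so a
    batch usually (but not necessarily) contains several datasets.
    """
    pool: List[Example] = [
        (name, example)
        for name, examples in datasets.items()
        for example in examples
    ]
    # A seeded generator keeps the order reproducible between runs.
    random.Random(seed).shuffle(pool)
    for start in range(0, len(pool), batch_size):
        yield pool[start:start + batch_size]
```

With this approach the number of examples seen per epoch is simply the size of the concatenated pool, whatever the batch size.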
Hi, OK, so each batch has a mixture of data. But with the default setting, do we use all samples? For instance, if 5 datasets have 100k samples each, does concatenating them with the default training setting shown in the readme give 500k samples, rather than 15k*5 = 75k? I just want to know how many samples are used when I train on 5 datasets. Thanks |
To use all the data, just set sample_size=-1
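The arithmetic behind the question can be sketched as follows. This assumes that `sample_size` acts as a per-dataset cap and that `-1` disables the cap, as the reply suggests; the 15k and 100k figures come from the question above, and the helper name `total_training_examples` is hypothetical.

```python
from typing import Dict


def total_training_examples(dataset_sizes: Dict[str, int], sample_size: int) -> int:
    """Count examples used for training under an assumed per-dataset cap.

    Assumption: sample_size == -1 means "use every example"; any other
    value keeps at most that many examples from each dataset.
    """
    if sample_size == -1:
        return sum(dataset_sizes.values())
    return sum(min(size, sample_size) for size in dataset_sizes.values())


# Five datasets of 100k examples each, as in the question.
sizes = {f"dataset_{i}": 100_000 for i in range(5)}
print(total_training_examples(sizes, 15_000))  # 75000
print(total_training_examples(sizes, -1))      # 500000
```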
Ok alright. |
Hi, I'm trying to run the pytorch-transformers code as you highlight in your repository, using the same code. It works well for SQuAD 1 and 2 and for NewsQA. But when I try the exact same code for DROP and TriviaQA_Wiki, it fails with "ValueError: num_samples should be a positive integer value, but got num_samples=0". I see that DROP_dev.json contains no answers at all: the "answers" field contains an empty list, and "is_impossible" is false. |
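One way to check a converted file for this symptom is to count how many questions actually carry answers. The sketch below assumes the standard SQuAD 2.0 JSON layout (`data` → `paragraphs` → `qas`, each with `answers` and `is_impossible`); the function name `count_answer_coverage` is hypothetical. A `num_samples=0` error is likely what a PyTorch sampler reports for an empty dataset, which would be consistent with every question being dropped, but that link is an inference and not confirmed in this thread.

```python
import json
from typing import Dict


def count_answer_coverage(squad: dict) -> Dict[str, int]:
    """Classify every question in a SQuAD 2.0 style dictionary."""
    counts = {"answerable": 0, "marked_impossible": 0, "empty_not_impossible": 0}
    for article in squad.get("data", []):
        for paragraph in article.get("paragraphs", []):
            for qa in paragraph.get("qas", []):
                if qa.get("answers"):
                    counts["answerable"] += 1
                elif qa.get("is_impossible", False):
                    counts["marked_impossible"] += 1
                else:
                    # Empty answers but not flagged impossible: the
                    # inconsistent case described in the comment above.
                    counts["empty_not_impossible"] += 1
    return counts


# Usage on a file (path taken from the comment above):
# with open("DROP_dev.json", encoding="utf-8") as handle:
#     print(count_answer_coverage(json.load(handle)))
```

A file where `empty_not_impossible` equals the total number of questions would match the description of DROP_dev.json given here.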
@rsanjaykamath This is a few years on, but I'm actually trying to do the same thing. Did you ever get a good answer for evaluating DROP using the SQuAD setup? |
Hi,
This error seems to occur when I convert any dataset from jsonl.gz to SQuAD-format JSON.
File "convert_multiqa_to_squad_format.py", line 43, in multi_example_to_squad
for answer_cand in qa['answers']["open-ended"]['answer_candidates']:
KeyError: 'answer_candidates'
Also, how can I get the multi75k dataset? Are you planning on releasing it?
Thanks for your time.
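As a stopgap while the data files were being fixed, the lookup in the traceback could be made tolerant of the missing key. This is only a sketch: the key names are copied from the traceback, the helper name `get_answer_candidates` is hypothetical, and treating a missing `answer_candidates` entry as "no candidates" is an assumption that hides the underlying data problem instead of solving it.

```python
from typing import List


def get_answer_candidates(qa: dict) -> List[dict]:
    """Return the open-ended answer candidates, or [] when they are absent."""
    answers = qa.get("answers") or {}
    if not isinstance(answers, dict):
        return []
    open_ended = answers.get("open-ended") or {}
    if not isinstance(open_ended, dict):
        return []
    # Assumption: a missing key means the question has no candidates.
    return open_ended.get("answer_candidates") or []


# The failing loop would then read:
# for answer_cand in get_answer_candidates(qa):
#     ...
```

Questions that come back empty would then produce no answers in the converted output, so counting them is still worthwhile before training.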