
Some datasets provided by download links (SQUAD 2.0 format) have no answers in them. #12

Open
rsanjaykamath opened this issue Sep 19, 2019 · 13 comments


@rsanjaykamath

Hi,

This error seems to occur when I'm converting any dataset from jsonl.gz to SQuAD-format JSON.

File "convert_multiqa_to_squad_format.py", line 43, in multi_example_to_squad
for answer_cand in qa['answers']["open-ended"]['answer_candidates']:
KeyError: 'answer_candidates'
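For reference, a defensive variant of that loop might look like the sketch below. The field names (`answers`, `open-ended`, `answer_candidates`) are taken from the traceback above; the helper name is hypothetical, and the sketch simply skips QAs that lack annotated candidates rather than raising.

```python
# Hypothetical guard for the loop in the traceback above.
# Field names follow the MultiQA open-ended answer schema seen there.
def iter_answer_candidates(qa):
    """Yield answer candidates, skipping QAs with no 'answer_candidates' key."""
    open_ended = qa.get('answers', {}).get('open-ended', {})
    for answer_cand in open_ended.get('answer_candidates', []):
        yield answer_cand
```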

Also, how can I get the multi75k dataset? Are you planning on releasing it?

Thanks for your time.

@alontalmor
Owner

Hey,

This problem was only for TriviaQA-Unfiltered. I'm working on a fix now.

Also, I have uploaded most SQuAD 2.0-converted datasets; please see the main README.

multi75k can be created by adding multiple datasets to the list of datasets to convert, but I will also upload it eventually.

Thanks
Alon

@rsanjaykamath
Author

Hi,

Thanks for your reply.
Perfect, this makes it all easy for me. :)

However, it seems that some of the download links for the SQuAD 2.0-format versions of the SQuAD datasets are broken.

Thanks in advance.

@rsanjaykamath
Author

File "/people/sanjay/MultiQA/models/multiqa_reader.py", line 242, in combine_context
for ac in qa['answers']['open-ended']["answer_candidates"]:
KeyError: 'answer_candidates'

Same error when I run this:

python multiqa.py evaluate --model HotpotQA --datasets NewsQA,SearchQA --cuda_device 0
or this,

python multiqa.py evaluate --model SQuAD1-1 --datasets SQuAD1-1,NewsQA,SearchQA --cuda_device 0

@alontalmor
Owner

alontalmor commented Sep 20, 2019 via email

@alontalmor
Owner

Hey, I've made the fixes and uploaded the datasets. In addition, I will upload PytorchTransformers results and models on several datasets soon...

@rsanjaykamath
Author

Perfect! The problem is solved, so I'm closing the thread.
Thanks.

@rsanjaykamath
Author

Hi, sorry to reopen this again.

I was wondering how you combine datasets when a list of datasets is passed during training.
Does it concatenate all questions from the datasets in the list, or take only 15k from each to create multi75k, as you mentioned above in this thread?

Thanks in advance.

@alontalmor
Owner

Hey,

It concatenates per example, so that each batch has a mix of datasets.
You can control how much is taken from each dataset using "sample_size".

Hope that helped...
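The mixing behavior described above can be sketched roughly as follows. This is a minimal illustration, not MultiQA's actual implementation: the function name is hypothetical, and only the `sample_size` parameter is taken from the thread (per-dataset cap, with -1 meaning "use everything"). Shuffling the combined pool is what gives each batch a mix of datasets.

```python
import random

def mix_datasets(datasets, sample_size=-1, seed=0):
    """Combine several example lists into one shuffled pool.

    sample_size=-1 uses every example from every dataset; otherwise
    each dataset contributes at most sample_size examples.
    """
    rng = random.Random(seed)
    pool = []
    for examples in datasets:
        if sample_size >= 0:
            examples = examples[:sample_size]  # cap this dataset's contribution
        pool.extend(examples)
    rng.shuffle(pool)  # so each batch drawn from the pool mixes datasets
    return pool
```

Under this reading, five datasets of 100k examples each give 5 × 15k = 75k examples with sample_size=15000 (the multi75k setup), and the full 500k with sample_size=-1.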

@rsanjaykamath
Author

Hi,

OK, so each batch has a mixture of data, but when I use the default setting, are all samples used?

For instance, if 5 datasets have 100k examples each, concatenating them in the default training setting shown in the README results in 500k samples, right? Not 15k × 5 = 75k?

I just wanted to know how many samples are used when I train on 5 sets.

Thanks

@alontalmor
Owner

To use all the data, just make sure sample_size=-1.
It will consume all the datasets until they are exhausted (even if they are not equally sized).

@rsanjaykamath
Author

Ok alright.
Got it now. Thanks

@rsanjaykamath
Author

Hi,

I'm trying to run the pytorch-transformers code you highlight in your repository, using the same code. It works well for SQuAD 1 & 2 and NewsQA. But when I try the exact same code for DROP and TriviaQA_Wiki, it fails with "ValueError: num_samples should be a positive integer value, but got num_samples=0".

I see that DROP_dev.json has no answers at all: the "answers" field contains an empty list, and "is_impossible" is false.
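A small diagnostic along these lines can confirm the problem. This is a hypothetical sketch assuming the standard SQuAD 2.0 JSON layout (`data` → `paragraphs` → `qas`) and the `answers`/`is_impossible` fields mentioned above; the function name is my own.

```python
import json

def count_missing_answers(squad_data):
    """Count SQuAD 2.0-format questions whose 'answers' list is empty
    even though 'is_impossible' is false (the DROP_dev.json symptom)."""
    empty = total = 0
    for article in squad_data['data']:
        for para in article['paragraphs']:
            for qa in para['qas']:
                total += 1
                if not qa.get('answers') and not qa.get('is_impossible', False):
                    empty += 1
    return empty, total
```

Usage would be something like `count_missing_answers(json.load(open('DROP_dev.json')))`; if the first count equals the second, every question in the file is effectively unanswerable, which would explain the num_samples=0 error.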

@rsanjaykamath rsanjaykamath reopened this Oct 28, 2019
@rsanjaykamath rsanjaykamath changed the title from "KeyError while converting datasets." to "Some datasets provided by download links (SQUAD 2.0 format) have no answers in them." Oct 28, 2019
@juliuseizure

@rsanjaykamath This is a few years on, but I'm actually trying to do the same thing. Did you ever get a good answer for evaluating DROP using the SQuAD setup?

3 participants