How can I predict on my own dataset? #16
You would need to save your dataset in the MultiQA format. This format is described in the datasets readme, https://github.com/alontalmor/MultiQA/tree/master/datasets, which also comes with a JSON-schema checker for the format you output. I think the fastest approach is to copy the code for one of the datasets that is close to yours, say SQuAD1.1, make the changes needed, and build your dataset using:
Hope this helps.
Thanks for the info. I followed the MultiQA format to build the dataset.
It's good to have evaluation metrics in the evaluate command, but usually we don't have gold labels in test data.
Suppose I have a document and a question, and I'd like to get the answer span and the answer string.
What steps should I take to get that?
I tried to put it in the MultiQA format, that is like
and dump it to test.gz, and then ran predict like:
python predict.py --model https://multiqa.s3.amazonaws.com/models/BERTBase/SQuAD1-1.tar.gz --dataset test.gz --dataset_name SQuAD --cuda_device 0
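For anyone following along, the "dump it to test.gz" step could look roughly like the sketch below. It assumes the file is gzip-compressed with one JSON object per line, which should be checked against the datasets readme. The helper names (`dump_jsonl_gz`, `load_jsonl_gz`) and every field in the example record are hypothetical placeholders, not the real MultiQA schema; take the actual field names from the readme and its JSON-schema checker.

```python
import gzip
import json


def dump_jsonl_gz(records, path):
    """Write one JSON object per line to a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_jsonl_gz(path):
    """Read the file back as a list of dicts, skipping blank lines."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Placeholder record: these keys are NOT the real MultiQA schema.
# Replace them with the fields required by the datasets readme.
example = {
    "id": "example-0",
    "context": "Some document text goes here.",
    "question": "What does the document say?",
    "answers": [],  # left empty: no gold labels at prediction time
}

if __name__ == "__main__":
    dump_jsonl_gz([example], "test.gz")
    # Round-trip check that the file can be read back unchanged.
    assert load_jsonl_gz("test.gz") == [example]
```

Reading the file back right after writing it is a cheap way to catch encoding or line-splitting problems before passing the file to predict.py.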