Improve this project to reach the scores reported in the original paper (#5) and make it more efficient and more readable.

major modifications:
- use one-hot encoding for POS & NER features instead of using separate embeddings; change dropout to 0.4 accordingly.
- if words are identical after normalization, their embeddings are averaged.
- replace each invisible character ('\s') with exactly one space. More training examples are retained this way, because merging multiple spaces into one results in misalignment of answers.

improved efficiency:
- Code in "prepro.py" is refactored and made more efficient. Time consumption of "prepro.py" is reduced from 515s to 172s (3x faster) on a machine with 8 i7 CPUs and 16GB RAM.
- function "get_answer_index" is simplified and much more readable.

other improvements:
- id of each example is kept for ease of debugging.
- vocabulary of POS and NER tags is saved for ease of debugging.
- other tiny improvements to make the code more readable.
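
The whitespace point in the commit message can be illustrated with a minimal sketch. It is not the code from "prepro.py"; the function names and the sample text are hypothetical, and it assumes answers are located by character offset into the raw context. Replacing each whitespace character with one space keeps the text length unchanged, so the offset stays valid, while merging runs of whitespace shifts everything after the first run.

```python
import re


def replace_each_whitespace(text: str) -> str:
    """Replace every whitespace character with one space (length-preserving)."""
    return re.sub(r"\s", " ", text)


def merge_whitespace(text: str) -> str:
    """Collapse each run of whitespace into one space (may shorten the text)."""
    return re.sub(r"\s+", " ", text)


# Hypothetical example: the answer offset is measured on the raw context.
context = "The  cat\n\nsat on the mat."
answer = "sat"
answer_start = context.index(answer)

one_to_one = replace_each_whitespace(context)
merged = merge_whitespace(context)

# Length is preserved, so the raw offset still points at the answer.
aligned = one_to_one[answer_start:answer_start + len(answer)]

# The text got shorter, so the same offset now points at other characters.
misaligned = merged[answer_start:answer_start + len(answer)]

print(repr(aligned), repr(misaligned))
```

With the one-to-one replacement the slice still yields the answer string, so the example does not have to be dropped for misalignment.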
Showing 9 changed files with 2,476 additions and 3,299 deletions.