Commit

Found the links to download word2vec binary model trained on google news corpus.
cgsdfc committed Mar 13, 2019
1 parent f7a1227 commit 9a2c9a9
Showing 3 changed files with 14 additions and 2 deletions.
3 changes: 3 additions & 0 deletions GoogleNewsCorpusEmbLink.md
@@ -0,0 +1,3 @@
# Links to the data
- https://doc-04-bc-docs.googleusercontent.com/docs/securesc/hjocr289sqh1r4455mj0jihan5v2ingr/pe4tmd0a5bd3ue6nc1lqmbb8iamd2ics/1552471200000/06848720943842814915/04145494130524406310/0B7XkCwpI5KDYNlNUTTlSS21pQmM?e=download&nonce=dn92bkknfn7l6&user=04145494130524406310&hash=s68fjotcmst190dg9vsf54v9bplaqo0j
- https://deeplearning4jblob.blob.core.windows.net/resources/wordvectors/GoogleNews-vectors-negative300.bin.gz
9 changes: 9 additions & 0 deletions README.md
@@ -53,6 +53,15 @@ You can install these deps with conda:
The script assumes one example per line (e.g. one dialogue or one sentence per line),
where line n of `'path_to_ground_truth.txt'` is scored against line n of `'path_to_predictions.txt'`.
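For instance, two aligned files could look like this (toy contents, hypothetical data; only the line-by-line alignment matters):

```shell
# Each line is one example; line n of the two files forms one scored pair.
printf 'how are you ?\nsee you later .\n' > path_to_ground_truth.txt
printf 'i am fine .\ngoodbye .\n' > path_to_predictions.txt
# Show the aligned pairs side by side.
paste -d '|' path_to_ground_truth.txt path_to_predictions.txt
```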

# Recommended Word Embedding
We recommend the *Word2Vec* vectors trained on the *Google News Corpus*; the original repository recommends the same. Useful links for downloading this pre-trained embedding:
- [word2vec Google News model](https://github.com/mmihaltz/word2vec-GoogleNews-vectors.git) is a GitHub mirror of the Google archive; you need *Git LFS* to clone it.
- [Google Code Archive](https://code.google.com/archive/p/word2vec/)
- [Google Drive](https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit)


# Where did the code come from?
The main script `embedding_metrics.py` is adapted from [hed-dlg-truncated](https://github.com/julianser/hed-dlg-truncated).
Thanks for their great script!
4 changes: 2 additions & 2 deletions embedding_metrics.py
@@ -29,7 +29,7 @@
__authors__ = ("Chia-Wei Liu", "Iulian Vlad Serban")

from random import randint
- from gensim.models import Word2Vec
+ from gensim.models import KeyedVectors
import numpy as np
import argparse

@@ -186,7 +186,7 @@ def average(fileone, filetwo, w2v):
args = parser.parse_args()

print("loading embeddings file...")
- w2v = Word2Vec.load_word2vec_format(args.embeddings, binary=True)
+ w2v = KeyedVectors.load_word2vec_format(args.embeddings, binary=True)

r = average(args.ground_truth, args.predicted, w2v)
print("Embedding Average Score: %f +/- %f ( %f )" % (r[0], r[1], r[2]))
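The `average` metric printed here is, at heart, a cosine similarity between the mean word vectors of each sentence pair. A minimal numpy sketch under that reading (toy 3-d vectors stand in for the 300-d GoogleNews model; the helper names are illustrative, not from the script):

```python
import numpy as np

def embedding_average(tokens, w2v, dim):
    """Mean of the word vectors of the in-vocabulary tokens (zeros if none)."""
    vecs = [w2v[t] for t in tokens if t in w2v]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

# Toy 3-d "embedding" standing in for the 300-d GoogleNews model.
toy_w2v = {"hello": np.array([1.0, 0.0, 0.0]),
           "world": np.array([0.0, 1.0, 0.0])}

score = cosine(embedding_average(["hello", "world"], toy_w2v, dim=3),
               embedding_average(["hello"], toy_w2v, dim=3))
print(score)  # about 0.707
```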
