Short Wikipedia article lookup using Google's USE (Universal Sentence Encoder) and Annoy (Approximate Nearest Neighbors Oh Yeah)
git clone https://github.com/jaganlal/wiki-use-annoy.git
cd wiki-use-annoy/
pip install -r requirements.txt
- Download the universal-sentence-encoder-large model using the download-use.py script:
  python download-use.py
- Build the Annoy index for the short-wiki.csv file:
  python build-short-wiki-annoy-index.py
- Find similar articles by providing an id:
  python find-similar-wiki-articles.py
  Key in an id, for example music-wikipedia.
  You'll see results like the following (the ids of similar articles):
pop-wikipedia
guitar-wikipedia
brain-wikipedia
world-wikipedia
science-wikipedia
malayalam-wikipedia
sourashtra-wikipedia
apple-wikipedia
usa-wikipedia
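
The build and lookup steps above can be pictured with a small, dependency-free sketch. It is not the code from this repository: the CSV column names (id, text), the toy_embed function (a stand-in for the Universal Sentence Encoder), and the exhaustive scan in find_similar (a stand-in for Annoy's approximate search) are all assumptions made for illustration. The distance function follows the definition Annoy documents for its "angular" metric.

```python
import csv
import io
import math

# Hypothetical stand-in for short-wiki.csv; the real column names may differ.
SAMPLE_CSV = """id,text
music-wikipedia,music is an art form of sound rhythm and melody
pop-wikipedia,pop music is a genre of popular music with melody and rhythm
guitar-wikipedia,the guitar is a string instrument used in music
apple-wikipedia,an apple is an edible fruit produced by a tree
"""

DIM = 64  # toy embedding size (assumption); the real model outputs larger vectors


def toy_embed(text, dim=DIM):
    """Placeholder for the sentence encoder: a hashed bag-of-words vector."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # Summing character codes keeps the bucket stable across runs.
        vec[sum(ord(c) for c in word) % dim] += 1.0
    return vec


def angular_distance(u, v):
    """Annoy's 'angular' metric: sqrt(2 * (1 - cos(u, v)))."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = dot / norm if norm else 0.0
    return math.sqrt(max(0.0, 2.0 * (1.0 - cos)))


def build_index(csv_text):
    """Give each article id an integer position and store its vector there."""
    ids, vectors = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        ids.append(row["id"])
        vectors.append(toy_embed(row["text"]))
    return ids, vectors


def find_similar(query_id, ids, vectors, n=3):
    """Return the n ids closest to query_id, nearest first (exhaustive scan)."""
    q = ids.index(query_id)
    scored = [
        (angular_distance(vectors[q], vectors[i]), ids[i])
        for i in range(len(ids))
        if i != q
    ]
    return [article_id for _, article_id in sorted(scored)[:n]]


ids, vectors = build_index(SAMPLE_CSV)
print(find_similar("music-wikipedia", ids, vectors, n=2))
```

With the Annoy library itself, the corresponding calls would be AnnoyIndex(dim, 'angular'), add_item, build, and get_nns_by_item. Annoy items are keyed by integers, which is why the sketch keeps a separate list that maps positions back to article ids.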
I started creating short-wiki.csv, a collection of short intros to some articles (source: Wikipedia) about places, people, culture, etc. This application looks up results from those articles. Check out short-wiki.csv for more information. You can think of this as a lookup over cleaned-up data. If you want to contribute (code or data), please feel free to fork the repository and create a PR.
As you may have noticed, there is no error handling yet.
https://jaganlal.github.io/ui-sentence-similarity/
https://github.com/jaganlal/wiki-use-annoy-tf2/blob/master/README.md