Word Embedding in Go

wego is an implementation of word embedding (a.k.a. word representation) models in Go. Word embedding maps a word's meaning, structure, and concepts into a low-dimensional vector space. A representative example:

Vector("King") - Vector("Man") + Vector("Woman") = Vector("Queen")

As in this example, the models generate word vectors whose arithmetic reflects relationships between word meanings. wego provides a CLI that supports not only training embedding models but also similarity search between words.

Models

🎃 Word2Vec: Distributed Representations of Words and Phrases and their Compositionality [pdf]

🎃 GloVe: Global Vectors for Word Representation [pdf]

🎃 LexVec: Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations [pdf]

Why Go?

Data Science in Go @chewxy

Installation

$ go get -u github.com/ynqa/wego
$ bin/wego -h

Demo

Run the following command to download the text8 corpus and train Word2Vec on it.

$ sh scripts/demo.sh

Usage

Usage:
  wego [flags]
  wego [command]

Available Commands:
  glove       GloVe: Global Vectors for Word Representation
  help        Help about any command
  lexvec      Lexvec: Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations
  repl        Search similar words with REPL mode
  search      Search similar words
  word2vec    Word2Vec: Continuous Bag-of-Words and Skip-gram model

Flags:
  -h, --help   help for wego

File I/O

Input

The input corpus must consist of whitespace-separated words, like text8, because wego tokenizes it with scanner.Split(bufio.ScanWords).

Output

wego outputs a .txt file of word vectors in the following format:

<word> <value1> <value2> ...

Example

You can also train word vectors using the wego API. An example follows.

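package main

import (
	"log"
	"os"

	"github.com/ynqa/wego/pkg/builder"
	"github.com/ynqa/wego/pkg/model/word2vec"
)

func main() {
	b := builder.NewWord2vecBuilder()

	// Configure a 10-dimensional CBOW model trained with negative sampling.
	b.Dimension(10).
		Window(5).
		Model(word2vec.CBOW).
		Optimizer(word2vec.NEGATIVE_SAMPLING).
		NegativeSampleSize(5).
		Verbose()

	m, err := b.Build()
	if err != nil {
		log.Fatalf("failed to build word2vec: %v", err)
	}

	// Open the training corpus; text8 must exist in the working directory.
	input, err := os.Open("text8")
	if err != nil {
		log.Fatalf("failed to open corpus: %v", err)
	}
	defer input.Close()

	// Start training.
	if err = m.Train(input); err != nil {
		log.Fatalf("failed to train word2vec: %v", err)
	}

	// Save word vectors to a text file.
	m.Save("example.txt")
}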
