git clone https://github.com/drbh/pretty-good-embeddings.git
cd pretty-good-embeddings
cargo run --example basic
# Input: Hello, world!
# Embeddings: [-0.03817713, 0.03291113, -0.0054594614, 0.014369917]
The embeddings capture semantic similarity between two sentences even when they do not share any words in common; the distance example prints a distance for several phrase pairs (smaller means closer), and a sketch of the computation follows the output below.
cargo run --example distance
# Distance: 0.19
# Input 1: begin immediately
# Input 2: start right away
# Distance: 0.22
# Input 1: highly skilled
# Input 2: extremely proficient
# Distance: 0.25
# Input 1: quickly approaching
# Input 2: rapidly nearing
# Distance: 0.31
# Input 1: gather information
# Input 2: collect data
# Distance: 0.47
# Input 1: quickly approaching
# Input 2: begin immediately
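A common way to compute such a distance is cosine distance between the two embedding vectors. The sketch below illustrates the idea with placeholder vectors rather than the crate's API, and does not claim to be the example's exact metric.

```rust
/// Cosine distance between two embeddings: 1 - cosine similarity.
/// Smaller values mean the two sentences are semantically closer.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}

fn main() {
    // Placeholder vectors; in practice both come from the embedding model.
    let a = vec![-0.038, 0.033, -0.005, 0.014];
    let b = vec![-0.031, 0.036, -0.009, 0.020];
    println!("Distance: {:.2}", cosine_distance(&a, &b));
}
```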
Quantization can reduce the memory footprint of the embeddings and improve runtime performance; the bit_distance example compares quantized representations, and a rough sketch of the idea follows the output below.
cargo run --example bit_distance
# Memory size: 30720
# Memory size: 7680
# Quantized embedding is 75% smaller
# Distance: 198
# Input 1: begin immediately
# Input 2: start right away
# Distance: 207
# Input 1: quickly approaching
# Input 2: rapidly nearing
# Distance: 246
# Input 1: highly skilled
# Input 2: extremely proficient
# Distance: 268
# Input 1: gather information
# Input 2: collect data
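A rough sketch of one way this can work: keep only the sign of each dimension (stored as one byte per dimension, which matches the 75% reduction above) and count differing signs as the distance. This is an illustration of the idea, not necessarily the crate's exact scheme; packing the signs into actual bits would shrink the footprint further.

```rust
/// Quantize an f32 embedding to one sign per dimension, stored one byte per dimension.
/// 4 bytes per dimension become 1, i.e. the quantized form is 75% smaller.
fn quantize(embedding: &[f32]) -> Vec<u8> {
    embedding.iter().map(|&v| if v > 0.0 { 1 } else { 0 }).collect()
}

/// Distance between two quantized embeddings: the number of positions whose signs differ.
fn bit_distance(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b).filter(|(x, y)| x != y).count()
}

fn main() {
    // Stand-in 8-dimensional embeddings; real ones are much larger (e.g. 384 dimensions).
    let a = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8];
    let b = [0.1, 0.2, 0.3, -0.4, -0.5, -0.6, 0.7, 0.8];
    let (qa, qb) = (quantize(&a), quantize(&b));
    println!("Memory size: {}", a.len() * std::mem::size_of::<f32>()); // 32
    println!("Memory size: {}", qa.len());                             // 8
    println!("Distance: {}", bit_distance(&qa, &qb));                  // 3
}
```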
We can use k-nearest neighbors (k-NN) to find the closest labeled examples in the embedding space and use their labels to classify the input; a small sketch follows the example output below.
cargo run --example knn_classifier "ice cream"
# [
# (
# "food",
# 0.62412727,
# ),
# (
# "food",
# 0.670159,
# ),
# (
# "food",
# 0.67886794,
# ),
# ]
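A minimal sketch of the k-NN idea: compare the query embedding against a small labeled set and keep the k closest matches. The labels and vectors below are placeholders standing in for real embeddings.

```rust
/// Cosine distance between two embeddings: 1 - cosine similarity.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (na * nb)
}

/// Return the labels and distances of the k labeled embeddings closest to `query`.
fn knn<'a>(query: &[f32], labeled: &'a [(&'a str, Vec<f32>)], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = labeled
        .iter()
        .map(|(label, emb)| (*label, cosine_distance(query, emb)))
        .collect();
    scored.sort_by(|a, b| a.1.partial_cmp(&b.1).unwrap());
    scored.truncate(k);
    scored
}

fn main() {
    // Placeholder labeled embeddings; in practice each vector comes from the model.
    let labeled = vec![
        ("food", vec![0.9, 0.1, 0.0]),
        ("food", vec![0.8, 0.2, 0.1]),
        ("animal", vec![0.1, 0.9, 0.2]),
        ("vehicle", vec![0.0, 0.2, 0.9]),
    ];
    let query = vec![0.85, 0.15, 0.05]; // stands in for the embedding of "ice cream"
    println!("{:#?}", knn(&query, &labeled, 3));
}
```

The code example applies the same nearest-neighbor idea to source code: it embeds chunks of a file ahead of time, then returns the chunk closest to whatever sentence you type.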
cargo run --example code
# All embeddings are ready! Type a sentence to get the closest chunk of code.
# how do we initialize the client?
# --- line 21 ---
# .unwrap();
# Self { environment }
# }
# pub fn init(&self, model_path: String) -> ClientSession {
# let tokenizer_path = format!("{}/tokenizer.json", model_path);
# let model_path = format!("{}/model.onnx", model_path);
# // Create a new session with optimizations
# --- line 31 ---
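The retrieval step behind that output can be sketched as: split a source file into overlapping windows of lines, embed each window, and return the window whose embedding is closest to the query's. The `embed` function below is a hypothetical stand-in for the real model call, not the crate's API.

```rust
/// Split source text into overlapping windows of `window` lines, stepping by `step`.
/// Each chunk keeps its starting line number so a match can be reported as "--- line N ---".
fn chunk_lines(source: &str, window: usize, step: usize) -> Vec<(usize, String)> {
    let lines: Vec<&str> = source.lines().collect();
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < lines.len() {
        let end = (start + window).min(lines.len());
        chunks.push((start + 1, lines[start..end].join("\n")));
        if end == lines.len() {
            break;
        }
        start += step;
    }
    chunks
}

// Hypothetical stand-in for the real embedding call (NOT the crate's API).
// It only counts a few character classes so the sketch runs end to end.
fn embed(text: &str) -> Vec<f32> {
    let letters = text.chars().filter(|c| c.is_alphabetic()).count() as f32;
    let digits = text.chars().filter(|c| c.is_ascii_digit()).count() as f32;
    let symbols = text.chars().filter(|c| c.is_ascii_punctuation()).count() as f32;
    vec![letters + 1.0, digits + 1.0, symbols + 1.0]
}

fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (na * nb)
}

fn main() {
    let source = "pub struct Client;\nimpl Client {\n    pub fn init(&self) {}\n}\n";
    // Embed every chunk up front, then answer queries against the stored embeddings.
    let chunks: Vec<(usize, String, Vec<f32>)> = chunk_lines(source, 2, 1)
        .into_iter()
        .map(|(line, text)| {
            let emb = embed(&text);
            (line, text, emb)
        })
        .collect();
    let query = embed("how do we initialize the client?");
    if let Some((line, text, _)) = chunks.iter().min_by(|a, b| {
        cosine_distance(&query, &a.2)
            .partial_cmp(&cosine_distance(&query, &b.2))
            .unwrap()
    }) {
        println!("--- line {} ---\n{}", line, text);
    }
}
```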