
drbh/pretty-good-embeddings


git clone https://github.com/drbh/pretty-good-embeddings.git
cd pretty-good-embeddings
cargo run --example basic
# Input: Hello, world!
# Embeddings: [-0.03817713, 0.03291113, -0.0054594614, 0.014369917]

The `distance` example measures the semantic similarity between two sentences, even when they share no words.

cargo run --example distance
# Distance: 0.19
# Input 1: begin immediately
# Input 2: start right away

# Distance: 0.22
# Input 1: highly skilled
# Input 2: extremely proficient

# Distance: 0.25
# Input 1: quickly approaching
# Input 2: rapidly nearing

# Distance: 0.31
# Input 1: gather information
# Input 2: collect data

# Distance: 0.47
# Input 1: quickly approaching
# Input 2: begin immediately
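The README does not state which metric the `distance` example reports. Cosine distance (1 minus cosine similarity) is a common choice for sentence embeddings and fits the pattern above, where the paraphrase pairs score lower than the unrelated control pair. A minimal sketch, assuming cosine distance; `cosine_distance` is a hypothetical helper and the 4-dimensional vectors are toy values, not model output:

```rust
/// Cosine distance: 1 - cos(angle between a and b).
/// 0.0 means same direction, 1.0 orthogonal, 2.0 opposite.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        // Convention for this sketch: a zero vector is treated as unrelated.
        return 1.0;
    }
    1.0 - dot / (norm_a * norm_b)
}

fn main() {
    // Toy vectors standing in for real sentence embeddings.
    let a: [f32; 4] = [1.0, 0.0, 0.0, 0.0];
    let b: [f32; 4] = [1.0, 1.0, 0.0, 0.0];
    let c: [f32; 4] = [0.0, 0.0, 1.0, 0.0];
    println!("Distance: {:.2}", cosine_distance(&a, &b)); // Distance: 0.29
    println!("Distance: {:.2}", cosine_distance(&a, &c)); // Distance: 1.00
}
```

Cosine distance ignores vector length, so only the direction of each embedding matters; the zero-vector guard is an arbitrary convention chosen for this sketch.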

The `bit_distance` example explores quantization to reduce the memory footprint of the embeddings and improve runtime performance.

cargo run --example bit_distance

# Memory size: 30720
# Memory size: 7680
# Quantized embedding is 75% smaller

# Distance: 198
# Input 1: begin immediately
# Input 2: start right away

# Distance: 207
# Input 1: quickly approaching
# Input 2: rapidly nearing

# Distance: 246
# Input 1: highly skilled
# Input 2: extremely proficient

# Distance: 268
# Input 1: gather information
# Input 2: collect data
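The README does not describe its quantization scheme. The reported sizes (30720 to 7680 bytes) are exactly a 4:1 ratio, which is consistent with storing one byte per dimension instead of a 4-byte `f32`. A minimal sketch assuming sign-based binarization (positive components become 1, everything else 0), kept in one `u8` per dimension and compared with Hamming distance; `quantize` and `hamming_distance` are hypothetical names and the vectors are toy values:

```rust
/// Quantize each f32 dimension to a single 0/1 value stored in a u8:
/// 1 if the component is positive, 0 otherwise.
fn quantize(embedding: &[f32]) -> Vec<u8> {
    embedding.iter().map(|&x| if x > 0.0 { 1 } else { 0 }).collect()
}

/// Hamming distance: the number of dimensions where the two codes differ.
fn hamming_distance(a: &[u8], b: &[u8]) -> u32 {
    assert_eq!(a.len(), b.len(), "codes must have the same dimension");
    a.iter().zip(b).filter(|(x, y)| x != y).count() as u32
}

fn main() {
    // Toy embeddings; real ones would come from the model.
    let e1: Vec<f32> = vec![0.12, -0.40, 0.05, -0.01];
    let e2: Vec<f32> = vec![0.30, 0.20, -0.07, -0.02];
    let (q1, q2) = (quantize(&e1), quantize(&e2));

    // f32 uses 4 bytes per dimension, the quantized code 1 byte: 75% smaller.
    let float_bytes = e1.len() * std::mem::size_of::<f32>();
    let quant_bytes = q1.len() * std::mem::size_of::<u8>();
    println!("Memory size: {} -> {}", float_bytes, quant_bytes); // Memory size: 16 -> 4
    println!("Distance: {}", hamming_distance(&q1, &q2)); // Distance: 2
}
```

Packing eight dimensions into each byte (for example into `u64` words, counting differing bits with `count_ones` on the XOR) would shrink the codes further, to 1/32 of the `f32` size.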

We can use k-nearest neighbors (kNN) to find the closest labeled examples in the embedding space and use their labels to classify the input.

cargo run --example knn_classifier "ice cream"
# [
#     (
#         "food",
#         0.62412727,
#     ),
#     (
#         "food",
#         0.670159,
#     ),
#     (
#         "food",
#         0.67886794,
#     ),
# ]
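The README does not show how `knn_classifier` works internally; its output lists three (label, distance) pairs sorted by increasing distance, consistent with k = 3. A minimal sketch of kNN classification over labeled embeddings, assuming cosine distance and a simple majority vote; `k_nearest` and `classify` are hypothetical names, and the labels and 2-dimensional vectors are toy data:

```rust
use std::collections::HashMap;

/// Cosine distance: 1 - cosine similarity (zero vectors count as unrelated).
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 1.0;
    }
    1.0 - dot / (norm_a * norm_b)
}

/// The k labeled examples closest to `query`, nearest first, as (label, distance).
fn k_nearest<'a>(query: &[f32], examples: &'a [(&'a str, Vec<f32>)], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&'a str, f32)> = examples
        .iter()
        .map(|(label, emb)| (*label, cosine_distance(query, emb)))
        .collect();
    // total_cmp gives a total order on f32, so sorting never panics.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}

/// Majority vote over the neighbors' labels.
fn classify<'a>(neighbors: &[(&'a str, f32)]) -> Option<&'a str> {
    let mut votes: HashMap<&'a str, usize> = HashMap::new();
    for (label, _) in neighbors {
        *votes.entry(*label).or_insert(0) += 1;
    }
    votes.into_iter().max_by_key(|&(_, n)| n).map(|(label, _)| label)
}

fn main() {
    // Toy labeled "embeddings"; real ones would come from the model.
    let examples = vec![
        ("food", vec![0.9f32, 0.1]),
        ("food", vec![0.8, 0.3]),
        ("sport", vec![0.1, 0.9]),
        ("sport", vec![0.2, 0.8]),
    ];
    let query = [0.85f32, 0.2];
    let neighbors = k_nearest(&query, &examples, 3);
    println!("{:#?}", neighbors);
    println!("{:?}", classify(&neighbors)); // Some("food")
}
```

When votes tie, `max_by_key` over a `HashMap` picks an arbitrary winner because map iteration order is unspecified; a real classifier might break ties by summed distance instead.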
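
The `code` example embeds chunks of source code and returns the chunk closest to a natural-language question.

cargo run --example code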
# All embeddings are ready! Type a sentence to get the closest chunk of code.
# how do we initialize the client?

# --- line 21 ---
# .unwrap();

#         Self { environment }
#     }

#     pub fn init(&self, model_path: String) -> ClientSession {
#         let tokenizer_path = format!("{}/tokenizer.json", model_path);
#         let model_path = format!("{}/model.onnx", model_path);

#         // Create a new session with optimizations
# --- line 31 ---
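The output above prints a window of source lines between `--- line 21 ---` and `--- line 31 ---` markers, which suggests the example splits files into line-based chunks, embeds each one up front, and returns the chunk nearest the question. A minimal sketch of that retrieval loop under those assumptions; the chunk size, `chunk_lines`, `closest_chunk`, and the letter-histogram `toy_embed` (a stand-in for the real model) are all hypothetical:

```rust
/// A window of consecutive source lines, remembering where it starts (1-based).
struct Chunk {
    start_line: usize,
    text: String,
}

/// Split `source` into non-overlapping windows of `size` lines.
fn chunk_lines(source: &str, size: usize) -> Vec<Chunk> {
    assert!(size > 0, "chunk size must be positive");
    let lines: Vec<&str> = source.lines().collect();
    lines
        .chunks(size)
        .enumerate()
        .map(|(i, window)| Chunk { start_line: i * size + 1, text: window.join("\n") })
        .collect()
}

/// Toy stand-in for the real model: a 26-bucket letter histogram.
fn toy_embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; 26];
    for ch in text.chars().filter(|c| c.is_ascii_alphabetic()) {
        v[(ch.to_ascii_lowercase() as u8 - b'a') as usize] += 1.0;
    }
    v
}

/// Cosine distance: 1 - cosine similarity (zero vectors count as unrelated).
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 1.0;
    }
    1.0 - dot / (norm_a * norm_b)
}

/// Return the chunk whose precomputed embedding is nearest the query embedding.
fn closest_chunk<'a>(query_emb: &[f32], chunks: &'a [Chunk], chunk_embs: &[Vec<f32>]) -> Option<&'a Chunk> {
    chunks
        .iter()
        .zip(chunk_embs)
        .map(|(c, e)| (c, cosine_distance(query_emb, e)))
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(c, _)| c)
}

fn main() {
    let source = "fn main() {\n    run();\n}\n\nfn init_client() -> Client {\n    Client::new()\n}";
    let chunks = chunk_lines(source, 3);
    // Embed every chunk once, before answering any questions.
    let embs: Vec<Vec<f32>> = chunks.iter().map(|c| toy_embed(&c.text)).collect();
    if let Some(best) = closest_chunk(&toy_embed("init client"), &chunks, &embs) {
        println!("--- line {} ---\n{}", best.start_line, best.text); // starts at line 4
    }
}
```

Embedding all chunks up front, as the example's "All embeddings are ready!" message implies, means each question costs one model call plus a linear scan over the stored vectors.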
