forked from langchain-ai/langchainjs
feat: MemoryVectorStore using mathjs cos (langchain-ai#753)
* feat: MemoryVectorStore using mathjs cos
* feat: lint and export inmemoryvectorstore
* Replace dependency, make it an int test, add entrypoint
* Add docs

---------

Co-authored-by: Nuno Campos <[email protected]>
1 parent b5edb5c, commit b3fe15c
Showing 19 changed files with 308 additions and 1 deletion.
31 changes: 31 additions & 0 deletions
docs/docs/modules/indexes/vector_stores/integrations/memory.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
--- | ||
hide_table_of_contents: true | ||
sidebar_label: Memory | ||
sidebar_position: 1 | ||
--- | ||
|
||
import CodeBlock from "@theme/CodeBlock"; | ||
|
||
# `MemoryVectorStore` | ||
|
||
`MemoryVectorStore` is an ephemeral vector store that keeps embeddings in memory and runs an exact, linear search for the most similar embeddings. The default similarity metric is cosine similarity, but it can be changed to any of the similarity metrics supported by [ml-distance](https://mljs.github.io/distance/modules/similarity.html). | ||
|
||
## Usage | ||
|
||
### Create a new index from texts | ||
|
||
import ExampleTexts from "@examples/indexes/vector_stores/memory.ts"; | ||
|
||
<CodeBlock language="typescript">{ExampleTexts}</CodeBlock> | ||
|
||
### Create a new index from a loader | ||
|
||
import ExampleLoader from "@examples/indexes/vector_stores/memory_fromdocs.ts"; | ||
|
||
<CodeBlock language="typescript">{ExampleLoader}</CodeBlock> | ||
|
||
### Use a custom similarity metric | ||
|
||
import ExampleCustom from "@examples/indexes/vector_stores/memory_custom_similarity.ts"; | ||
|
||
<CodeBlock language="typescript">{ExampleCustom}</CodeBlock> |
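The docs page above describes an exact, linear search ranked by cosine similarity. As a rough illustration of that idea only, here is a minimal, dependency-free sketch. The helper names `cosineSimilarity` and `topK` are hypothetical and are not part of this commit; the store itself delegates the metric to `ml-distance`. Returning 0 for a zero vector is an assumption made here to avoid dividing by zero.

```typescript
// Cosine similarity between two equal-length vectors (hand-written sketch).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Assumption: a zero vector is treated as having similarity 0.
  return denom === 0 ? 0 : dot / denom;
}

// Exact linear search: score every stored vector, sort descending, keep k.
function topK(
  query: number[],
  stored: number[][],
  k: number
): { index: number; score: number }[] {
  return stored
    .map((vector, index) => ({ index, score: cosineSimilarity(query, vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// [2, 0] points the same way as the query, [1, 1] is 45 degrees off,
// and [0, 1] is orthogonal, so the ranking is index 2, then index 1.
const hits = topK([1, 0], [[0, 1], [1, 1], [2, 0]], 2);
console.log(hits.map((h) => h.index));
```

Because every stored vector is scored on each query, the cost grows linearly with the number of stored embeddings, which is the trade-off for exact results.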
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
import { MemoryVectorStore } from "langchain/vectorstores/memory"; | ||
import { OpenAIEmbeddings } from "langchain/embeddings/openai"; | ||
|
||
export const run = async () => { | ||
const vectorStore = await MemoryVectorStore.fromTexts( | ||
["Hello world", "Bye bye", "hello nice world"], | ||
[{ id: 2 }, { id: 1 }, { id: 3 }], | ||
new OpenAIEmbeddings() | ||
); | ||
|
||
const resultOne = await vectorStore.similaritySearch("hello world", 1); | ||
console.log(resultOne); | ||
}; |
15 changes: 15 additions & 0 deletions
examples/src/indexes/vector_stores/memory_custom_similarity.ts
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
import { MemoryVectorStore } from "langchain/vectorstores/memory"; | ||
import { OpenAIEmbeddings } from "langchain/embeddings/openai"; | ||
import { similarity } from "ml-distance"; | ||
|
||
export const run = async () => { | ||
const vectorStore = await MemoryVectorStore.fromTexts( | ||
["Hello world", "Bye bye", "hello nice world"], | ||
[{ id: 2 }, { id: 1 }, { id: 3 }], | ||
new OpenAIEmbeddings(), | ||
{ similarity: similarity.pearson } | ||
); | ||
|
||
const resultOne = await vectorStore.similaritySearch("hello world", 1); | ||
console.log(resultOne); | ||
}; |
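The example above passes `similarity.pearson` from `ml-distance`. To make clearer what kind of function the `similarity` option expects, here is a hedged sketch of a hand-written Pearson correlation with the shape `(a, b) => number`. The names `SimilarityFn`, `mean`, and `pearsonSimilarity` are hypothetical; this is not the `ml-distance` implementation, and whether a plain `number[]` function type-checks against `typeof similarity.cosine` is an assumption.

```typescript
// Hypothetical type for a similarity metric: higher means more similar.
type SimilarityFn = (a: number[], b: number[]) => number;

const mean = (v: number[]): number =>
  v.reduce((sum, x) => sum + x, 0) / v.length;

// Pearson correlation: cosine similarity of the mean-centered vectors.
const pearsonSimilarity: SimilarityFn = (a, b) => {
  const meanA = mean(a);
  const meanB = mean(b);
  let cov = 0;
  let varA = 0;
  let varB = 0;
  for (let i = 0; i < a.length; i += 1) {
    const da = a[i] - meanA;
    const db = b[i] - meanB;
    cov += da * db;
    varA += da * da;
    varB += db * db;
  }
  const denom = Math.sqrt(varA * varB);
  // Assumption: constant vectors (zero variance) score 0.
  return denom === 0 ? 0 : cov / denom;
};

// Perfectly correlated and perfectly anti-correlated vectors.
const up = pearsonSimilarity([1, 2, 3], [2, 4, 6]);
const down = pearsonSimilarity([1, 2, 3], [3, 2, 1]);
console.log(up, down);
```

Since the store ranks results from highest to lowest score, any custom function should return larger values for closer vectors; a distance function would need to be negated or inverted first.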
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
import { MemoryVectorStore } from "langchain/vectorstores/memory"; | ||
import { OpenAIEmbeddings } from "langchain/embeddings/openai"; | ||
import { TextLoader } from "langchain/document_loaders/fs/text"; | ||
|
||
export const run = async () => { | ||
// Create docs with a loader | ||
const loader = new TextLoader( | ||
"src/document_loaders/example_data/example.txt" | ||
); | ||
const docs = await loader.load(); | ||
|
||
// Load the docs into the vector store | ||
const vectorStore = await MemoryVectorStore.fromDocuments( | ||
docs, | ||
new OpenAIEmbeddings() | ||
); | ||
|
||
// Search for the most similar document | ||
const resultOne = await vectorStore.similaritySearch("hello world", 1); | ||
|
||
console.log(resultOne); | ||
}; |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
import { similarity as ml_distance_similarity } from "ml-distance"; | ||
import { VectorStore } from "./base.js"; | ||
import { Embeddings } from "../embeddings/base.js"; | ||
import { Document } from "../document.js"; | ||
|
||
interface MemoryVector { | ||
content: string; | ||
embedding: number[]; | ||
// eslint-disable-next-line @typescript-eslint/no-explicit-any | ||
metadata: Record<string, any>; | ||
} | ||
|
||
export interface MemoryVectorStoreArgs { | ||
similarity?: typeof ml_distance_similarity.cosine; | ||
} | ||
|
||
export class MemoryVectorStore extends VectorStore { | ||
memoryVectors: MemoryVector[] = []; | ||
|
||
similarity: typeof ml_distance_similarity.cosine; | ||
|
||
constructor( | ||
embeddings: Embeddings, | ||
{ similarity, ...rest }: MemoryVectorStoreArgs = {} | ||
) { | ||
super(embeddings, rest); | ||
|
||
this.similarity = similarity ?? ml_distance_similarity.cosine; | ||
} | ||
|
||
async addDocuments(documents: Document[]): Promise<void> { | ||
const texts = documents.map(({ pageContent }) => pageContent); | ||
return this.addVectors( | ||
await this.embeddings.embedDocuments(texts), | ||
documents | ||
); | ||
} | ||
|
||
async addVectors(vectors: number[][], documents: Document[]): Promise<void> { | ||
const memoryVectors = vectors.map((embedding, idx) => ({ | ||
content: documents[idx].pageContent, | ||
embedding, | ||
metadata: documents[idx].metadata, | ||
})); | ||
|
||
this.memoryVectors = this.memoryVectors.concat(memoryVectors); | ||
} | ||
|
||
async similaritySearchVectorWithScore( | ||
query: number[], | ||
k: number | ||
): Promise<[Document, number][]> { | ||
const searches = this.memoryVectors | ||
.map((vector, index) => ({ | ||
similarity: this.similarity(query, vector.embedding), | ||
index, | ||
})) | ||
.sort((a, b) => b.similarity - a.similarity) | ||
.slice(0, k); | ||
|
||
const result: [Document, number][] = searches.map((search) => [ | ||
new Document({ | ||
metadata: this.memoryVectors[search.index].metadata, | ||
pageContent: this.memoryVectors[search.index].content, | ||
}), | ||
search.similarity, | ||
]); | ||
|
||
return result; | ||
} | ||
|
||
static async fromTexts( | ||
texts: string[], | ||
metadatas: object[] | object, | ||
embeddings: Embeddings, | ||
dbConfig?: MemoryVectorStoreArgs | ||
): Promise<MemoryVectorStore> { | ||
const docs: Document[] = []; | ||
for (let i = 0; i < texts.length; i += 1) { | ||
const metadata = Array.isArray(metadatas) ? metadatas[i] : metadatas; | ||
const newDoc = new Document({ | ||
pageContent: texts[i], | ||
metadata, | ||
}); | ||
docs.push(newDoc); | ||
} | ||
return MemoryVectorStore.fromDocuments(docs, embeddings, dbConfig); | ||
} | ||
|
||
static async fromDocuments( | ||
docs: Document[], | ||
embeddings: Embeddings, | ||
dbConfig?: MemoryVectorStoreArgs | ||
): Promise<MemoryVectorStore> { | ||
const instance = new this(embeddings, dbConfig); | ||
await instance.addDocuments(docs); | ||
return instance; | ||
} | ||
|
||
static async fromExistingIndex( | ||
embeddings: Embeddings, | ||
dbConfig?: MemoryVectorStoreArgs | ||
): Promise<MemoryVectorStore> { | ||
const instance = new this(embeddings, dbConfig); | ||
return instance; | ||
} | ||
} |
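`fromTexts` above accepts either one metadata object per text or a single object applied to every text. The following standalone sketch mirrors that branching outside the class. `Doc` and `buildDocs` are hypothetical names used only for illustration, and the `Record<string, unknown>` typing is an assumption that is stricter than the `object` type in the diff.

```typescript
// Hypothetical stand-in for the library's Document shape.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

type Metadata = Record<string, unknown>;

// Pair each text with its own metadata entry, or with the one shared object.
function buildDocs(texts: string[], metadatas: Metadata[] | Metadata): Doc[] {
  return texts.map((pageContent, i) => ({
    pageContent,
    metadata: Array.isArray(metadatas) ? metadatas[i] : metadatas,
  }));
}

// One metadata object per text.
const perText = buildDocs(["Hello world", "Bye bye"], [{ id: 2 }, { id: 1 }]);

// A single metadata object reused for every text.
const shared = buildDocs(["Hello world", "Bye bye"], { source: "demo" });

console.log(perText, shared);
```

Note that in the shared case every document holds a reference to the same object, so mutating one document's metadata would change it for all of them.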
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
import { test, expect } from "@jest/globals"; | ||
|
||
import { OpenAIEmbeddings } from "../../embeddings/openai.js"; | ||
import { Document } from "../../document.js"; | ||
import { MemoryVectorStore } from "../memory.js"; | ||
|
||
test("MemoryVectorStore with external ids", async () => { | ||
const embeddings = new OpenAIEmbeddings(); | ||
|
||
const store = new MemoryVectorStore(embeddings); | ||
|
||
expect(store).toBeDefined(); | ||
|
||
await store.addDocuments([ | ||
{ pageContent: "hello", metadata: { a: 1 } }, | ||
{ pageContent: "hi", metadata: { a: 1 } }, | ||
{ pageContent: "bye", metadata: { a: 1 } }, | ||
{ pageContent: "what's this", metadata: { a: 1 } }, | ||
]); | ||
|
||
const results = await store.similaritySearch("hello", 1); | ||
|
||
expect(results).toHaveLength(1); | ||
|
||
expect(results).toEqual([ | ||
new Document({ metadata: { a: 1 }, pageContent: "hello" }), | ||
]); | ||
}); |
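The test above calls `OpenAIEmbeddings`, so it needs network access and an API key, which is presumably why the commit message calls it an integration test. As a sketch of how the same ranking logic could be exercised offline, the block below uses a deterministic letter-count embedder. `embedLetters`, `cosine`, and `mostSimilar` are hypothetical helpers, not part of the library or of this commit.

```typescript
// Hypothetical embedder: a 26-dimensional count of the letters a-z.
function embedLetters(text: string): number[] {
  const counts = new Array<number>(26).fill(0);
  const lower = text.toLowerCase();
  for (let i = 0; i < lower.length; i += 1) {
    const code = lower.charCodeAt(i) - 97; // 97 is the char code of "a"
    if (code >= 0 && code < 26) counts[code] += 1;
  }
  return counts;
}

// Cosine similarity; zero vectors score 0 by assumption.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Linear scan over the texts, returning the best match for the query.
function mostSimilar(query: string, texts: string[]): string {
  const q = embedLetters(query);
  let best = texts[0];
  let bestScore = -Infinity;
  for (const text of texts) {
    const score = cosine(q, embedLetters(text));
    if (score > bestScore) {
      best = text;
      bestScore = score;
    }
  }
  return best;
}

// Same texts as the integration test; an identical text has the top score.
const match = mostSimilar("hello", ["hello", "hi", "bye", "what's this"]);
console.log(match);
```

A letter-count vector says nothing about meaning, so this only checks the search mechanics, not embedding quality.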