Add TimeWeightedVectorStoreRetriever to retrievers (langchain-ai#911)

* feat: Introduce Time-Weighted Retrieval for Relevant Documents

- Add new file `time_weighted.ts` with functions to retrieve, add, and get documents from a vector store
- Define `TimeWeightedVectorStoreRetriever` class that inherits from `BaseRetriever`
- Include functions to calculate scores and hours passed to help identify salient documents

* test: Add tests for time-weighted retrievers

- Add new test file for time-weighted retrievers
- Implement several tests for the `getRelevantDocuments` method of `TimeWeightedVectorStoreRetriever` with different data and `searchKwargs` values

* refactor: Refactor TimeWeightedVectorStoreRetriever for better organization

- Refactored code in TimeWeightedVectorStoreRetriever for better organization
- Combined memoryDocsAndScores and salientDocsAndScores for easier retrieval
- Renamed methods for more descriptive, consistent naming

* refactor: Reorganize retrievers exports in langchain codebase

- Reorganize exports in the `index.ts` file in the `retrievers` directory
- Move `MetalRetriever` and `RemoteLangChainRetriever` exports to the top
- Add `TimeWeightedVectorStoreRetriever` export at the bottom for improved organization

* feat: Refactor TimeWeightedVectorStoreRetriever interface and add tests

- Improve time-weighted retriever functionality
- Add JSDoc comments for key retriever functions
- Refactor tests for better isolation and add tests for new functionality
- Expose new interface for retriever in index file

* refactor: Refactor time-weighted retriever constructor and add JSDoc comments

- Simplified the constructor arguments using optional chaining and default values
- Improved code clarity by adding JSDoc comments for private functions
- Renamed a variable to better convey its purpose
- Removed an unnecessary keyword in a method

* refactor: Refactor test file naming convention

- Rename `time_weighted3.js` to `time_weighted.js` in the `time_weighted.test.ts` file in `langchain/src/retrievers/tests`
- No significant changes to other files were made

* style: Improve code style in retrievers index file

- Add missing comma to SupabaseHybridSearchParams
- Reformat import statements in retrievers/index.ts
- Improve code readability and maintainability in langchain/src/retrievers/

* style: Remove unused eslint-disable comments in test file

- Remove unnecessary eslint-disable comments in the `time_weighted.test.ts` file
- Improve code readability and maintainability
- Enhance overall code quality and consistency

* Lint

* Undo changes to index

* Add entrypoint

* Throw an error when TimeWeightedVectorStoreRetriever is used with unsupported vector stores; add docs and test

---------

Co-authored-by: Nuno Campos <[email protected]>
Co-authored-by: Jacob Lee <[email protected]>
Vinzhuo · May 4, 2023 · 0eccc85
1 parent 220c47e

Showing 14 changed files with 664 additions and 0 deletions.
diff --git a/docs/docs/modules/indexes/retrievers/time-weighted-retriever.mdx b/docs/docs/modules/indexes/retrievers/time-weighted-retriever.mdx
@@ -0,0 +1,23 @@
+# Time-Weighted Retriever
+
+A Time-Weighted Retriever is a retriever that takes into account recency in addition to similarity. The scoring algorithm is:
+
+```typescript
+let score = (1.0 - this.decayRate) ** hoursPassed + vectorRelevance;
+```
+
+Notably, `hoursPassed` above refers to the time since the object in the retriever was last accessed, not since it was created. This means that frequently accessed objects remain "fresh" and score higher.
+
+`this.decayRate` is a configurable decimal number between 0 and 1. A lower number means that documents will be "remembered" for longer, while a higher number strongly weights more recently accessed documents.
+
+Note that setting a decay rate of exactly 0 or 1 makes `hoursPassed` irrelevant and makes this retriever equivalent to a standard vector lookup.
+
+## Usage
+
+This example shows how to initialize a `TimeWeightedVectorStoreRetriever` with a vector store.
+Because the retriever attaches the access-history metadata it needs for scoring, all documents must be added to the backing vector store using the `addDocuments` method on the **retriever**, not on the vector store itself.
+
+import CodeBlock from "@theme/CodeBlock";
+import Example from "@examples/retrievers/time-weighted-retriever.ts";
+
+<CodeBlock language="typescript">{Example}</CodeBlock>
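As a rough illustration of the scoring formula in the new docs page, the sketch below evaluates it for two documents with equal similarity but different last-access times, and computes how many hours the recency term takes to halve for a given decay rate. The helper names `combinedScore` and `recencyHalfLifeHours` are hypothetical, and `vectorRelevance` is assumed to be a similarity score where higher means more relevant.

```typescript
// Hypothetical helpers illustrating the documented formula:
//   score = (1.0 - decayRate) ** hoursPassed + vectorRelevance
// hoursPassed is assumed to be measured from the document's last access.
function combinedScore(
  decayRate: number,
  hoursPassed: number,
  vectorRelevance: number
): number {
  if (decayRate < 0 || decayRate > 1) {
    throw new RangeError("decayRate must be between 0 and 1");
  }
  // Recency term: 1 right after an access, shrinking geometrically per hour.
  const recency = (1.0 - decayRate) ** hoursPassed;
  return recency + vectorRelevance;
}

// Hours until the recency term falls to half its value.
function recencyHalfLifeHours(decayRate: number): number {
  if (decayRate <= 0) return Infinity; // no decay: the term never halves
  return Math.log(2) / -Math.log(1 - decayRate);
}

// Same similarity, different last access: the recently used document wins.
const fresh = combinedScore(0.01, 1, 0.8); // 0.99 + 0.8, about 1.79
const stale = combinedScore(0.01, 48, 0.8); // about 0.617 + 0.8, about 1.417
console.log(fresh > stale); // true
console.log(recencyHalfLifeHours(0.01).toFixed(1)); // "69.0"
```

With `decayRate` at 0 the recency term is always 1, so only `vectorRelevance` affects the ranking, which matches the docs' remark that the retriever then behaves like a plain vector lookup.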
diff --git a/examples/src/retrievers/time-weighted-retriever.ts b/examples/src/retrievers/time-weighted-retriever.ts
@@ -0,0 +1,47 @@
+import { TimeWeightedVectorStoreRetriever } from "langchain/retrievers/time_weighted";
+import { MemoryVectorStore } from "langchain/vectorstores/memory";
+import { OpenAIEmbeddings } from "langchain/embeddings/openai";
+
+const vectorStore = new MemoryVectorStore(new OpenAIEmbeddings());
+
+const retriever = new TimeWeightedVectorStoreRetriever({
+  vectorStore,
+  memoryStream: [],
+  searchKwargs: 2, // number of candidate documents fetched from the vector store per query
+});
+
+const documents = [
+  "My name is John.",
+  "My name is Bob.",
+  "My favourite food is pizza.",
+  "My favourite food is pasta.",
+  "My favourite food is sushi.",
+].map((pageContent) => ({ pageContent, metadata: {} }));
+
+// All documents must be added using this method on the retriever (not the vector store!)
+// so that the correct access history metadata is populated
+await retriever.addDocuments(documents);
+
+const results1 = await retriever.getRelevantDocuments(
+  "What is my favourite food?"
+);
+
+console.log(results1);
+
+/*
+[
+  Document { pageContent: 'My favourite food is pasta.', metadata: {} }
+]
+ */
+
+const results2 = await retriever.getRelevantDocuments(
+  "What is my favourite food?"
+);
+
+console.log(results2);
+
+/*
+[
+  Document { pageContent: 'My favourite food is pasta.', metadata: {} }
+]
+ */
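The docs stress that recency is measured from a document's last access, and the example queries the retriever twice. The toy below is a hypothetical, self-contained sketch of that mechanism, not the langchain implementation: `addDocuments` stamps a `lastAccessedAt` timestamp (a made-up key name), retrieval ranks by the recency term plus similarity, and every returned document gets its timestamp refreshed. Word overlap stands in for vector similarity, and a fake clock replaces `Date.now` so the timing is deterministic.

```typescript
// Hypothetical toy retriever illustrating time weighting with access refresh.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

const HOUR_MS = 60 * 60 * 1000;

// Stand-in for vector similarity (assumption): share of query words in the text.
function toySimilarity(query: string, text: string): number {
  const tokenize = (s: string): string[] => s.toLowerCase().match(/[a-z]+/g) ?? [];
  const queryWords = Array.from(new Set(tokenize(query)));
  const textWords = new Set(tokenize(text));
  if (queryWords.length === 0) return 0;
  const hits = queryWords.filter((w) => textWords.has(w)).length;
  return hits / queryWords.length;
}

class ToyTimeWeightedRetriever {
  private docs: Doc[] = [];
  private decayRate: number;
  private k: number;
  private now: () => number;

  constructor(decayRate: number, k: number, now: () => number = Date.now) {
    this.decayRate = decayRate;
    this.k = k;
    this.now = now;
  }

  // Stamps the access-history metadata that scoring depends on.
  addDocuments(docs: Doc[]): void {
    const t = this.now();
    for (const d of docs) {
      this.docs.push({
        pageContent: d.pageContent,
        metadata: { ...d.metadata, lastAccessedAt: t },
      });
    }
  }

  getRelevantDocuments(query: string): Doc[] {
    const t = this.now();
    const top = this.docs
      .map((doc) => {
        const hoursPassed = (t - (doc.metadata.lastAccessedAt as number)) / HOUR_MS;
        const score =
          (1 - this.decayRate) ** hoursPassed + toySimilarity(query, doc.pageContent);
        return { doc, score };
      })
      .sort((a, b) => b.score - a.score)
      .slice(0, this.k);
    // Returned documents have their access time reset, keeping them "fresh".
    for (const { doc } of top) doc.metadata.lastAccessedAt = t;
    return top.map(({ doc }) => doc);
  }
}

// Usage with a fake clock.
let clock = 0;
const retriever = new ToyTimeWeightedRetriever(0.5, 1, () => clock);
retriever.addDocuments([
  { pageContent: "My favourite food is pizza.", metadata: {} },
  { pageContent: "My favourite food is pasta.", metadata: {} },
]);

clock += 10 * HOUR_MS;
const first = retriever.getRelevantDocuments("favourite food pizza");
clock += 1 * HOUR_MS;
// Both documents match this query equally; only recency separates them.
const second = retriever.getRelevantDocuments("favourite food");
console.log(first[0].pageContent, second[0].pageContent);
```

In the second query both documents have the same toy similarity, so the pizza document stays on top only because the first query refreshed its `lastAccessedAt`.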
diff --git a/langchain/.gitignore b/langchain/.gitignore
@@ -235,6 +235,9 @@ retrievers/contextual_compression.d.ts
 retrievers/document_compressors.cjs
 retrievers/document_compressors.js
 retrievers/document_compressors.d.ts
+retrievers/time_weighted.cjs
+retrievers/time_weighted.js
+retrievers/time_weighted.d.ts
 retrievers/document_compressors/chain_extract.cjs
 retrievers/document_compressors/chain_extract.js
 retrievers/document_compressors/chain_extract.d.ts

diff --git a/langchain/package.json b/langchain/package.json
@@ -247,6 +247,9 @@
     "retrievers/document_compressors.cjs",
     "retrievers/document_compressors.js",
     "retrievers/document_compressors.d.ts",
+    "retrievers/time_weighted.cjs",
+    "retrievers/time_weighted.js",
+    "retrievers/time_weighted.d.ts",
     "retrievers/document_compressors/chain_extract.cjs",
     "retrievers/document_compressors/chain_extract.js",
     "retrievers/document_compressors/chain_extract.d.ts",
@@ -926,6 +929,11 @@
       "import": "./retrievers/document_compressors.js",
       "require": "./retrievers/document_compressors.cjs"
     },
+    "./retrievers/time_weighted": {
+      "types": "./retrievers/time_weighted.d.ts",
+      "import": "./retrievers/time_weighted.js",
+      "require": "./retrievers/time_weighted.cjs"
+    },
     "./retrievers/document_compressors/chain_extract": {
       "types": "./retrievers/document_compressors/chain_extract.d.ts",
       "import": "./retrievers/document_compressors/chain_extract.js",
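The `exports` entry added above is what makes the `langchain/retrievers/time_weighted` import path from the example resolvable: Node refuses subpaths that a package's `exports` map does not list, and for a listed subpath it takes the first condition key, in object order, that the current environment matches. TypeScript's `node16`/`nodenext`/`bundler` resolution also matches `types`, which is why that key comes first. The sketch below models only this flat case (no nested conditions, arrays, or `*` patterns); `resolveSubpath` is a hypothetical helper, not a Node API.

```typescript
// Simplified model of conditional-exports lookup for a flat exports entry.
type ConditionalTarget = Record<string, string>;

// The entry added to package.json in this commit.
const exportsMap: Record<string, ConditionalTarget> = {
  "./retrievers/time_weighted": {
    types: "./retrievers/time_weighted.d.ts",
    import: "./retrievers/time_weighted.js",
    require: "./retrievers/time_weighted.cjs",
  },
};

// Walks the condition keys in declaration order and returns the first match.
function resolveSubpath(subpath: string, activeConditions: Set<string>): string | undefined {
  const target = exportsMap[subpath];
  if (target === undefined) return undefined; // not exported: Node would reject it
  for (const condition of Object.keys(target)) {
    if (activeConditions.has(condition)) return target[condition];
  }
  return undefined;
}

console.log(resolveSubpath("./retrievers/time_weighted", new Set(["node", "import"])));
console.log(resolveSubpath("./retrievers/time_weighted", new Set(["node", "require"])));
```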

diff --git a/langchain/scripts/create-entrypoints.js b/langchain/scripts/create-entrypoints.js
@@ -106,6 +106,7 @@ const entrypoints = {
   "retrievers/databerry": "retrievers/databerry",
   "retrievers/contextual_compression": "retrievers/contextual_compression",
   "retrievers/document_compressors": "retrievers/document_compressors/index",
+  "retrievers/time_weighted": "retrievers/time_weighted",
   "retrievers/document_compressors/chain_extract":
     "retrievers/document_compressors/chain_extract",
   "retrievers/hyde": "retrievers/hyde",
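Each key in the `entrypoints` map appears to correspond to three generated files, which is why the same `.cjs`/`.js`/`.d.ts` triple for `retrievers/time_weighted` shows up in `.gitignore`, in the `files` array, and in the `exports` map of `package.json`. The following is a hypothetical sketch of that derivation, reconstructed only from the entries visible in this commit rather than from the script's actual code. It also shows that the key, not the source path on the right-hand side, determines the published names: `retrievers/document_compressors` maps to `retrievers/document_compressors/index`, yet its files are named after the key.

```typescript
// Hypothetical sketch; create-entrypoints.js may implement this differently.
interface ExportConditions {
  types: string;
  import: string;
  require: string;
}

// Keys are public subpaths; values appear to be source module paths.
const entrypoints: Record<string, string> = {
  "retrievers/document_compressors": "retrievers/document_compressors/index",
  "retrievers/time_weighted": "retrievers/time_weighted",
};

// The three generated files per entrypoint, as listed in .gitignore and "files".
function generatedFiles(key: string): string[] {
  return [`${key}.cjs`, `${key}.js`, `${key}.d.ts`];
}

// The conditional "exports" entry for one entrypoint.
function exportsEntry(key: string): { subpath: string; conditions: ExportConditions } {
  return {
    subpath: `./${key}`,
    conditions: {
      types: `./${key}.d.ts`,
      import: `./${key}.js`,
      require: `./${key}.cjs`,
    },
  };
}

for (const key of Object.keys(entrypoints)) {
  const { subpath, conditions } = exportsEntry(key);
  console.log(generatedFiles(key).join("\n"));
  console.log(JSON.stringify({ [subpath]: conditions }, null, 2));
}
```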