Add TimeWeightedVectorStoreRetriever to retrievers (langchain-ai#911)
* feat: Introduce Time-Weighted Retrieval for Relevant Documents

- Add new file `time_weighted.ts` with functions to retrieve, add, and get documents from a vector store
- Define `TimeWeightedVectorStoreRetriever` class that inherits from `BaseRetriever`
- Include functions to calculate scores and hours passed to help identify salient documents

* test: Add tests for time weighted retrievers.

- Add new test file for time weighted retrievers
- Implement several tests for TimeWeightedVectorStoreRetriever's getRelevantDocuments method with different data and searchKwargs values

* refactor: Refactor TimeWeightedVectorStoreRetriever for better organization.

- Refactored code in TimeWeightedVectorStoreRetriever for better organization
- Combined memoryDocsAndScores and salientDocsAndScores for easy retrieval
- Changed method names for better descriptive naming consistency

* refactor: Reorganize retrievers exports in langchain codebase

- Reorganize exports in the `index.ts` file in the `retrievers` directory
- Move `MetalRetriever` and `RemoteLangChainRetriever` exports to the top
- Add `TimeWeightedVectorStoreRetriever` export at the bottom for improved organization

* feat: Refactor TimeWeightedVectorStoreRetriever interface and add tests

- Improve time-weighted retriever functionality
- Add JSDoc comments for key retriever functions
- Refactor tests for better isolation and add tests for new functionality
- Expose new interface for retriever in index file

* refactor: Refactor time-weighted retriever constructor and add JSDoc comments

- Simplified the constructor arguments using optional chaining and default values.
- Improved code clarity by adding JSDoc comments for private functions.
- Renamed variable for better understanding of its purpose.
- Removed unnecessary keyword in a method.

* refactor: Refactor test file naming convention.

- Rename `time_weighted3.js` to `time_weighted.js` in the `time_weighted.test.ts` file in `langchain/src/retrievers/tests`.
- No significant changes to other files were made.

* style: Improve code style in retrievers index file

- Add missing comma to SupabaseHybridSearchParams
- Reformat import statements in retrievers/index.ts
- Improve code readability and maintainability in langchain/src/retrievers/

* style: Remove unused eslint-disable comments in test file

- Remove unnecessary eslint-disable comments in the `time_weighted.test.ts` file
- Improve code readability and maintainability
- Enhance overall code quality and consistency

* Lint

* Undo changes to index

* Add entrypoint

* Adds thrown error when using TimeWeightedVectorStoreRetriever on unsupported vector stores, adds docs and test

---------

Co-authored-by: Nuno Campos <[email protected]>
Co-authored-by: Jacob Lee <[email protected]>
3 people authored May 4, 2023
1 parent 220c47e commit 0eccc85
Showing 14 changed files with 664 additions and 0 deletions.
23 changes: 23 additions & 0 deletions docs/docs/modules/indexes/retrievers/time-weighted-retriever.mdx
@@ -0,0 +1,23 @@
# Time-Weighted Retriever

A Time-Weighted Retriever is a retriever that takes into account recency in addition to similarity. The scoring algorithm is:

```typescript
let score = (1.0 - this.decayRate) ** hoursPassed + vectorRelevance;
```

Notably, `hoursPassed` above refers to the time since the object in the retriever was last accessed, not since it was created. This means that frequently accessed objects remain "fresh" and score higher.
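As a rough illustration of the "last accessed" behaviour described above, the sketch below computes hours passed from a timestamp and refreshes that timestamp on access. The `TrackedDoc` type, the `lastAccessedAt` field, and the `touch` helper are hypothetical names chosen for this example and are not taken from the retriever's source.

```typescript
// Illustrative sketch only: names below are assumptions, not the library's API.
const MS_PER_HOUR = 60 * 60 * 1000;

interface TrackedDoc {
  pageContent: string;
  lastAccessedAt: number; // epoch milliseconds (hypothetical field name)
}

// Hours elapsed between the last access and "now".
function hoursPassed(now: number, lastAccessedAt: number): number {
  return (now - lastAccessedAt) / MS_PER_HOUR;
}

// Returns a copy of the document with its access time refreshed.
function touch(doc: TrackedDoc, accessedAt: number): TrackedDoc {
  return { ...doc, lastAccessedAt: accessedAt };
}

const createdAt = Date.UTC(2023, 4, 1, 0, 0, 0);
const now = Date.UTC(2023, 4, 2, 0, 0, 0); // exactly 24 hours after creation

// Never accessed since creation.
const stale: TrackedDoc = {
  pageContent: "My name is John.",
  lastAccessedAt: createdAt,
};

// Same document, but retrieved two hours before "now".
const fresh = touch(stale, now - 2 * MS_PER_HOUR);

console.log(hoursPassed(now, stale.lastAccessedAt)); // 24
console.log(hoursPassed(now, fresh.lastAccessedAt)); // 2
```

Both documents were created at the same moment, yet the one that was read recently reports far fewer hours passed, which is why it would decay less under the scoring formula.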

`this.decayRate` is a configurable decimal number between 0 and 1. A lower number means that documents will be "remembered" for longer, while a higher number strongly weights more recently accessed documents.

Note that setting a decay rate of exactly 0 or 1 makes `hoursPassed` irrelevant and makes this retriever equivalent to a standard vector lookup.
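To make the effect of the decay rate concrete, here is a minimal sketch that applies the scoring expression to two made-up documents. The standalone function `timeWeightedScore` and the sample relevance values are assumptions for illustration; only the formula itself comes from this page.

```typescript
// Standalone restatement of the documented formula (hypothetical function name).
function timeWeightedScore(
  decayRate: number,
  hoursPassed: number,
  vectorRelevance: number
): number {
  return (1.0 - decayRate) ** hoursPassed + vectorRelevance;
}

// Made-up inputs: an older but more relevant document vs. a newer, less relevant one.
const older = { hoursPassed: 48, vectorRelevance: 0.9 };
const newer = { hoursPassed: 1, vectorRelevance: 0.7 };

for (const decayRate of [0, 0.01, 1]) {
  const olderScore = timeWeightedScore(
    decayRate,
    older.hoursPassed,
    older.vectorRelevance
  );
  const newerScore = timeWeightedScore(
    decayRate,
    newer.hoursPassed,
    newer.vectorRelevance
  );
  const winner = olderScore > newerScore ? "older" : "newer";
  console.log(`decayRate=${decayRate}: ${winner} document ranks first`);
}
```

With a decay rate of 0 the recency term is always 1, and with a decay rate of 1 it is 0 for any `hoursPassed` greater than zero, so in both cases the ranking follows `vectorRelevance` alone. With the intermediate rate of 0.01, the recency term is large enough to move the newer document ahead.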

## Usage

This example shows how to initialize a `TimeWeightedVectorStoreRetriever` with a vector store.
Because the retriever depends on access-history metadata, all documents must be added to the backing vector store using the `addDocuments` method on the **retriever**, not on the vector store itself.

import CodeBlock from "@theme/CodeBlock";
import Example from "@examples/retrievers/time-weighted-retriever.ts";

<CodeBlock language="typescript">{Example}</CodeBlock>
47 changes: 47 additions & 0 deletions examples/src/retrievers/time-weighted-retriever.ts
@@ -0,0 +1,47 @@
import { TimeWeightedVectorStoreRetriever } from "langchain/retrievers/time_weighted";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";

const vectorStore = new MemoryVectorStore(new OpenAIEmbeddings());

const retriever = new TimeWeightedVectorStoreRetriever({
  vectorStore,
  memoryStream: [],
  searchKwargs: 2,
});

const documents = [
  "My name is John.",
  "My name is Bob.",
  "My favourite food is pizza.",
  "My favourite food is pasta.",
  "My favourite food is sushi.",
].map((pageContent) => ({ pageContent, metadata: {} }));

// All documents must be added using this method on the retriever (not the vector store!)
// so that the correct access history metadata is populated
await retriever.addDocuments(documents);

const results1 = await retriever.getRelevantDocuments(
  "What is my favourite food?"
);

console.log(results1);

/*
[
  Document { pageContent: 'My favourite food is pasta.', metadata: {} }
]
*/

const results2 = await retriever.getRelevantDocuments(
  "What is my favourite food?"
);

console.log(results2);

/*
[
  Document { pageContent: 'My favourite food is pasta.', metadata: {} }
]
*/
3 changes: 3 additions & 0 deletions langchain/.gitignore
@@ -235,6 +235,9 @@ retrievers/contextual_compression.d.ts
retrievers/document_compressors.cjs
retrievers/document_compressors.js
retrievers/document_compressors.d.ts
retrievers/time_weighted.cjs
retrievers/time_weighted.js
retrievers/time_weighted.d.ts
retrievers/document_compressors/chain_extract.cjs
retrievers/document_compressors/chain_extract.js
retrievers/document_compressors/chain_extract.d.ts
8 changes: 8 additions & 0 deletions langchain/package.json
@@ -247,6 +247,9 @@
"retrievers/document_compressors.cjs",
"retrievers/document_compressors.js",
"retrievers/document_compressors.d.ts",
"retrievers/time_weighted.cjs",
"retrievers/time_weighted.js",
"retrievers/time_weighted.d.ts",
"retrievers/document_compressors/chain_extract.cjs",
"retrievers/document_compressors/chain_extract.js",
"retrievers/document_compressors/chain_extract.d.ts",
@@ -926,6 +929,11 @@
"import": "./retrievers/document_compressors.js",
"require": "./retrievers/document_compressors.cjs"
},
"./retrievers/time_weighted": {
"types": "./retrievers/time_weighted.d.ts",
"import": "./retrievers/time_weighted.js",
"require": "./retrievers/time_weighted.cjs"
},
"./retrievers/document_compressors/chain_extract": {
"types": "./retrievers/document_compressors/chain_extract.d.ts",
"import": "./retrievers/document_compressors/chain_extract.js",
1 change: 1 addition & 0 deletions langchain/scripts/create-entrypoints.js
@@ -106,6 +106,7 @@ const entrypoints = {
"retrievers/databerry": "retrievers/databerry",
"retrievers/contextual_compression": "retrievers/contextual_compression",
"retrievers/document_compressors": "retrievers/document_compressors/index",
"retrievers/time_weighted": "retrievers/time_weighted",
"retrievers/document_compressors/chain_extract":
"retrievers/document_compressors/chain_extract",
"retrievers/hyde": "retrievers/hyde",