-
-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Description
Describe the bug
I'm investigating adopting tanstack/query in a large production application called Notion. We need to support tens of thousands of queries on screen at once.
The current implementation of QueryCache.findAll unnecessarily does a linear search of queries. This results in allocations O(N) where N = number of queries, and O(n) calls to hashQueryKeyByOptions
; for queries with large objects in their key this is especially expensive.
This happens even when we use an exact
QueryFilters. This is necessary because we don't know the correct hash function to convert the QueryKey to a queryHash string for O(1) lookup.
I have two ideas to optimize this: one gives approximately O(1) for exact filters at cost of additional book-keeping, another is to index primitive key prefixes to reduce total scope.
O(1) exact filter lookup
My idea to fix is to keep a registry of known queryKeyHashFn
functions in use, and then when performing findAll
for an exact key, first try converting the given key to with these for O(unique queryKeyHashFn) maximum complexity.
I think this idea is probably a pure win with little downside, although maybe I'm missing something.
Pseudocode below.
class RefCountSet<T> {
#refcounts = new Map<T, number>
[Symbol.iterator]() { return this.#refcounts.keys() }
add(value: T) {
let n = this.#refcounts.get(value) ?? 0
this.#refcounts.set(value, n + 1)
}
remove(value: T) {
let n = this.#refcounts.get(value)
if (n === undefined) { return }
n--
if (n === 0) {
this.#refcounts.delete(value)
} else {
this.#refcounts.set(value, n)
}
}
}
// Somewhere; I'm not sure what the appropriate place to store this is. Maybe part of QueryCache?
// Making it global is easier in my example code.
export const allQueryKeyHashFns = new RefCountSet<QueryKeyHashFunction<any>>
// in Query#setOptions
// https://github.com/TanStack/query/blob/main/packages/query-core/src/query.ts#L202-L208
class Query {
setOptions(options) {
const oldHashFn = this.options?.queryKeyHashFn
this.options = { ...this.#defaultOptions, ...options }
const newHashFn = this.options?.queryKeyHashFn
if (oldHashFn !== newHashFn) {
if (oldHashFn !== hashKey) { allQueryKeyHashFns.remove(oldHashFn) }
if (newHashFn !== hashKey) { allQueryKeyHashFns.add(newHashFn) }
}
this.updateGcTime(this.options.gcTime)
}
}
class QueryCache {
// Call from #find and #findAll when filters.exact && filters.queryKey
private findExact<
TQueryFnData = unknown,
TError = DefaultError,
TData = TQueryFnData,
>(
filters: QueryFilters<any>,
): Query<TQueryFnData, TError, TData> | undefined {
const tryHashFn = (hashFn: QueryKeyHashFunction<any>) => {
try {
const query = this.get(hashFn(filters.queryKey))
if (query && matchQuery(filters, query)) {
// Confirmed the query actually uses the hash function we tried
// and matches the non-queryKey filters. Could be further-optimized
// to avoid calling hashFn a second time
return query
} else {
return undefined
}
} catch (error) {
return undefined
}
}
let query = tryHashFn(hashKey)
if (!query) {
for (const hashFn of allQueryKeyHashFns) {
query = tryHashFn(hashFn)
if (query) {
break
}
}
}
return query as unknown as Query<TQueryFnData, TError, TData> | undefined
}
}
Key prefix indexing
For prefix query matching, we can index the primitive prefix of the query key in a map-prefix-trie. I'm not sure what the exact name of this data structure is, maybe just "trie" is the right word.
There is a tradeoff here between theoretical complexity advantages versus the time & space required for trie book-keeping. I haven't benchmarked to understand the exact tradeoff between the trie book-keeping and the current "just scan".
// Note: RefCountSet could be replaced with normal Set here
// if it holds that a Query object can only be associated exactly one queryPath.
// I couldn't confirm from reading the code that this assumption holds
// given arbitrary calls to Query#setOptions
type MapTrieNode<TValue> = {
key: Primitive
/**
* - Entries who's TKey path leads exactly to this node,
* therefor their path is all primitives.
* ourPath = [p1, p2, p3, ...pN]
* theirPath = [p1, p2, p3, ...pN]
*/
exact?: RefCountSet<TValue>
/**
* - Entries who's path after this point contains non-primitive keys.
* Such entries cannot be looked up by value deeper in the trie.
* Implies their TKey path != this node's keyPath.
* ourPath = [p1, p2, p3, ...pN]
* theirPath = [p1, p2, p3, ...pN, nonPrimitive, ...]
*/
nonPrimitiveSuffix?: RefCountSet<TValue>
/** Child nodes storing entries who's TKey path is prefixed with this node's path. */
children?: Map<Primitive, MapTrieNode<TValue>>
}
/** Path length is always 1 greater than the key length, as it includes the root node. */
function traverse<TValue>(
root: MapTrieNode<TValue>,
key: QueryKey,
// May create a child node if needed
lookup: (
parent: MapTrieNode<TValue>,
key: Primitive,
) => MapTrieNode<TValue> | undefined,
): Array<MapTrieNode<TValue>> {
const path: Array<MapTrieNode<TValue>> = [root]
let node: MapTrieNode<TValue> | undefined = root
for (let i = 0; i < key.length && node; i++) {
const keyPart = key[i]
if (isPrimitive(keyPart)) {
node = lookup(node, keyPart)
} else {
node = undefined
}
if (node) {
path.push(node)
}
}
return path
}
function gcPath(path: Array<MapTrieNode<any>>): void {
if (path.length === 0) {
return
}
for (let i = path.length - 1; i > 0; i--) {
const node = path[i]
if (!node) {
throw new Error('Should never occur (bug in MapTrie)')
}
if (
node.exact?.size ||
node.nonPrimitiveSuffix?.size ||
node.children?.size
) {
// Has data. Do not GC.
return
}
const parent = path[i - 1]
parent?.children?.delete(node.key)
}
}
class MapTrieSet<TKey extends QueryKey, TValue> {
#root: MapTrieNode<TValue> = {
key: undefined,
}
add(key: TKey, value: TValue): void {
const path = traverse(this.#root, key, (parent, keyPart) => {
parent.children ??= new Map()
let child = parent.children.get(keyPart)
if (!child) {
child = { key: keyPart }
parent.children.set(keyPart, child)
}
return child
})
const lastPathNode = path[path.length - 1]
if (!lastPathNode) {
throw new Error('Should never occur (bug in MapTrie)')
}
if (key.length === path.length - 1) {
lastPathNode.exact ??= new RefCountSet()
lastPathNode.exact.add(value)
} else {
lastPathNode.nonPrimitiveSuffix ??= new RefCountSet()
lastPathNode.nonPrimitiveSuffix.add(value)
}
}
remove(key: TKey, value: TValue): void {
const path = traverse(this.#root, key, (parent, keyPart) =>
parent.children?.get(keyPart),
)
const lastPathNode = path[path.length - 1]
if (!lastPathNode) {
throw new Error('Should never occur (bug in MapTrie)')
}
if (key.length === path.length - 1) {
lastPathNode.exact?.remove(value)
gcPath(path)
} else if (!isPrimitive(key[path.length - 1])) {
lastPathNode.nonPrimitiveSuffix?.remove(value)
gcPath(path)
}
}
/**
* Returns all values that match the given key:
* Either the value has the same key and is all primitives,
* Or the value's key is a suffix of the given key and contains a non-primitive key.
*/
getByPrefix(key: TKey): Iterable<TValue> | undefined {
let miss = false
const path = traverse(this.#root, key, (parent, keyPart) => {
const child = parent.children?.get(keyPart)
if (!child) {
miss = true
return undefined
}
return child
})
// Failed to look up one of the primitive keys in the path.
// This means there's no match at all.
// Appears to be incorrectly reported by @typescript-eslint as always false :\
if (miss) {
return undefined
}
const lastNode = path[path.length - 1]
if (!lastNode) {
throw new Error('Should never occur (bug in MapTrie)')
}
// If the `key` is all primitives then we need to recurse to find all values
// that match the prefix, as these values will be stored deeper in the trie.
//
// If the `key` contains a non-primitive part after the returned path,
// then all possible values that have the suffix are stored in this node.
const isPrimitivePath = path.length - 1 === key.length
if (!isPrimitivePath) {
return lastNode.nonPrimitiveSuffix?.[Symbol.iterator]()
}
// See if we can avoid instantiating a generator
if (
!lastNode.children &&
(lastNode.exact || lastNode.nonPrimitiveSuffix) &&
!(lastNode.exact && lastNode.nonPrimitiveSuffix)
) {
return lastNode.exact ?? lastNode.nonPrimitiveSuffix
}
return (function* depthFirstPrefixIterator() {
const queue = [lastNode]
while (queue.length > 0) {
const node = queue.pop()!
if (node.exact) {
yield* node.exact
}
if (node.nonPrimitiveSuffix) {
yield* node.nonPrimitiveSuffix
}
if (node.children) {
for (const child of node.children.values()) {
queue.push(child)
}
}
}
})()
}
}
Your minimal, reproducible example
N/A
Steps to reproduce
- Put 50,000 queries into the cache.
- Call
queryClient.invalidateQueries
with a unique, non-existing queryKey.
Expected behavior
I expect O(1) time and space used to invalidate a single exact query key.
I expect prefix invalidation does not take O(N) time and space when no query matches the prefix.
How often does this bug happen?
Every time
Screenshots or Videos
No response
Platform
Any platform
Tanstack Query adapter
None
TanStack Query version
5.85.3
TypeScript version
No response
Additional context
I'm not sure if this belongs better in discussion, but I see it as more of a bug than a feature request.
I am happy to contribute a fix, but wanted to open the issue first to see what y'all think and if an improvement here would be welcome.