Proximity Map implementation with support for incremental edits. #8686
base: main
…ncy on github.com/esote/minmaxheap)
This also removes the `offset` parameter from table.NewTableIterator because it's unused.
LGTM, although the new types in prolly could probably use a little more documentation on how they fit together; it was initially confusing.
@reltuk should take a look at the files in that package as well, it's not my area of expertise.
	mustRebuild bool
}

func (f ProximityFlusher) visitNode(
Probably deserves a comment describing the algorithm at a high level and documenting the params.
Done.
…turing it in a closure.
… same hash as if they were made directly.
…om the parent, so cap the tree level used when rebuilding the subtree.
Based on #8408, now with additional functionality for incremental changes to indexes.
This is a large-scale PR merging several features into main, all designed for supporting vector indexes.
Vector Index Nodes
1defec9 adds a new message/node type: the vector index node. This message stores a node in a Merkle tree index whose structure is based on some distance measure in a multi-dimensional space: at each level, keys are arranged such that each key is closer to its own parent key than to any other key in the parent node.
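To make the "closest parent" rule concrete, here is a minimal sketch of how child keys could be bucketed under parent keys. It is not taken from this PR: the names `sqDist`, `nearestParent`, and `groupByParent` are hypothetical, keys are assumed to be `[]float32` vectors, and squared Euclidean distance stands in for whatever distance measure the index is configured with.

```go
package main

// sqDist returns the squared Euclidean distance between two vectors.
// Assumption: both vectors have the same length.
func sqDist(a, b []float32) float32 {
	var sum float32
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum
}

// nearestParent returns the index of the parent key closest to key.
// Ties go to the lowest index; it returns -1 when parents is empty.
func nearestParent(key []float32, parents [][]float32) int {
	best := -1
	var bestDist float32
	for i, p := range parents {
		d := sqDist(key, p)
		if best == -1 || d < bestDist {
			best, bestDist = i, d
		}
	}
	return best
}

// groupByParent buckets each child key under its nearest parent key.
// Nothing here bounds the size of any one bucket.
func groupByParent(keys, parents [][]float32) [][][]float32 {
	groups := make([][][]float32, len(parents))
	for _, k := range keys {
		if i := nearestParent(k, parents); i >= 0 {
			groups[i] = append(groups[i], k)
		}
	}
	return groups
}
```

Because bucket membership depends only on distance, a cluster of nearby keys can all land under the same parent, which is why node sizes cannot be capped in this kind of layout.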
One consequence of this design is that it's not possible to put a hard limit on the number of keys contained in each node. We can control the mean node size, but there's always a non-zero chance that a node will be large enough to break our usual encoding scheme (which uses 16-bit ints to store message offsets). To address this, the vector index node uses 32-bit ints to store message offsets instead of the 16 bits used by other node types.
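The offset-width issue can be illustrated with a small sketch. This is not Dolt's serialization code: `offsetsFit16` and `encodeOffsets32` are hypothetical helpers, and the little-endian layout of cumulative end offsets is an assumption made for the example.

```go
package main

import (
	"encoding/binary"
	"math"
)

// offsetsFit16 reports whether every cumulative end offset for the
// given item sizes (in bytes) fits in an unsigned 16-bit integer.
func offsetsFit16(sizes []int) bool {
	total := 0
	for _, s := range sizes {
		total += s
		if total > math.MaxUint16 {
			return false
		}
	}
	return true
}

// encodeOffsets32 writes the cumulative end offset of each item as a
// little-endian uint32 (byte order is an assumption of this sketch).
func encodeOffsets32(sizes []int) []byte {
	buf := make([]byte, 4*len(sizes))
	var end uint32
	for i, s := range sizes {
		end += uint32(s)
		binary.LittleEndian.PutUint32(buf[4*i:], end)
	}
	return buf
}
```

For instance, a node holding 1,000 keys of 100 bytes each has a final offset of 100,000, which exceeds the 16-bit maximum of 65,535 but fits comfortably in 32 bits.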
Proximity Map
A ProximityMap is a new implementation of Dolt's Map, a data structure built on Merkle trees that maps key bytestrings to value bytestrings. The ProximityMap is backed by a tree of vector index nodes, allowing it to perform an approximate nearest neighbor search.
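One common way to run an approximate nearest neighbor search over a tree like this is a greedy descent: at each level, follow the child whose key is closest to the query. The sketch below shows that idea only. It is not the ProximityMap's actual search routine, and the `node` type, `sqDist`, and `approxNearest` are hypothetical names using squared Euclidean distance as a stand-in measure.

```go
package main

// node is a hypothetical in-memory tree node: a key vector plus children.
// Leaf nodes have no children.
type node struct {
	key      []float32
	children []*node
}

// sqDist returns the squared Euclidean distance between two vectors.
// Assumption: both vectors have the same length.
func sqDist(a, b []float32) float32 {
	var sum float32
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return sum
}

// approxNearest walks from root to a leaf, always stepping into the child
// whose key is closest to query, and returns the key of the leaf reached.
// The result is approximate: the true nearest key may sit in a subtree
// that the greedy walk never visits. It returns nil for a nil root.
func approxNearest(root *node, query []float32) []float32 {
	if root == nil {
		return nil
	}
	cur := root
	for len(cur.children) > 0 {
		best := cur.children[0]
		bestDist := sqDist(best.key, query)
		for _, c := range cur.children[1:] {
			if d := sqDist(c.key, query); d < bestDist {
				best, bestDist = c, d
			}
		}
		cur = best
	}
	return cur.key
}
```

The walk inspects only one node per level, so its cost grows with tree height rather than with the total number of keys, at the price of possibly missing the exact nearest neighbor.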
Proximity Maps resemble other Prolly Maps, but have the following invariants:
Notably, while the keys of an individual node are sorted, walking all of a vector index's keys in standard iteration order will not visit them in sorted order.
28b7065 and 6b91635 contain the bulk of the ProximityMap implementation.
Most of the changes are in these three commits (1defec9, 28b7065, and 6b91635). Each of the other commits is a smaller, self-contained change necessary to support vector indexes.