Kd trees for Julia.
Currently supports KNN-search and finding all points inside an hyper sphere centered at a given point
Care has been taken with regards to performance. The tree is for example not naively implemented as nodes pointing to other nodes but instead as a collection of densely packed arrays. This gives better cache locality. This however means that the tree is immutable and new points can not be entered into the tree after it has been created.
There are some benchmarks for the creation of the tree and the different searches in the benchmark folder.
Kristoffer Carlsson (@KristofferC)
The tree is created with the command:
using KDTrees
data = rand(3,10^3)
tree = KDTree(data, leaf_size=5)
The data
argument for the tree should be a matrix of floats of dimension (n_dim, n_points)
. The leaf_size
determines for what number of points the tree should stop splitting. 5 is a good number and is also the default value.
The exported query_ball_point(tree, point, radius)
finds all points inside a hyper sphere centered at a given point with the given radius. The function
returns a sorted list of the indices of the points in the sphere.
using KDTrees
tree = KDTree(randn(3, 1000))
query_ball_point(tree, [0.0, 0.0, 0.0], 0.4)
gives the indices:
8-element Array{Int64,1}:
184
199
307
586
646
680
849
906
926
The exported function k_nearest_neighbour(tree, point, k)
finds the k nearest neighbours to a given point. The function returns a tuple of two lists with the indices and the distances from the given points respectively. These are sorted in the order of smallest to largest distance.
using KDTrees
tree = KDTree(randn(3, 1000))
k_nearest_neighbour(tree, [0.0, 0.0, 0.0], 5)
gives both the indices and distances:
([300,119,206,180,845],[0.052019,0.200885,0.220441,0.22447,0.235882])
The benchmarks have been made with computer with a 4 core Intel i5-2500K @ 3.3 GHz with Julia v0.4.0-dev+3034.
Clicking on a plot takes you to the Plotly site for the plot where the exact data can be seen.
- Add proper benchmarks, compare with others implementations. Update: Partly done
- Add other measures than Euclidean distance.
- Use a heap for storing the K best points in KNN instead of a linear array (should only matter for large K).
Contributions are more than welcome. If you have an idea that would make the tree have better performance or be more general please create a PR. Make sure you run the benchmarks before and after your changes.