Skip to content

🏅 A state-of-the-art data structure that enables fast point and range searches in arrays of billions of items using orders of magnitude less space than traditional indexes

License

Notifications You must be signed in to change notification settings

RyanMarcus/PGM-index

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

The PGM-index

The Piecewise Geometric Model index (PGM-index) is a data structure that enables fast point and range searches in arrays of billions of items using orders of magnitude less space than traditional indexes.

Website | Documentation | Paper | A³ Lab

Travis (.org) License GitHub stars

Building the code

To download and build the library use the following commands:

git clone https://github.com/gvinciguerra/PGM-index.git
cd PGM-index
git submodule update --init --recursive
cmake . -DCMAKE_BUILD_TYPE=Release
make -j8

Now you can run the unit tests via:

./test/tests

Minimal example

#include <vector>
#include <cstdlib>
#include <iostream>
#include <algorithm>
#include "pgm_index.hpp"

int main(int argc, char **argv) {
    // Generate some random data
    std::vector<int> dataset(1000000);
    std::generate(dataset.begin(), dataset.end(), std::rand);
    dataset.push_back(42);
    std::sort(dataset.begin(), dataset.end());

    // Construct the PGM-index
    const int error = 128;
    PGMIndex<int, error> index(dataset);

    // Query the PGM-index
    auto q = 42;
    auto approx_range = index.find_approximate_position(q);
    auto lo = dataset.cbegin() + approx_range.lo;
    auto hi = dataset.cbegin() + approx_range.hi;
    std::cout << *std::lower_bound(lo, hi, q);

    return 0;
}

License

This project is licensed under the terms of the GNU General Public License v3.0.

If you use the library please put a link to the website and cite the following paper:

@misc{FerraginaVinciguerra:2019,
  Archiveprefix = {arXiv},
  Author = {Paolo Ferragina and Giorgio Vinciguerra},
  Day = {14},
  Eprint = {1910.06169},
  Month = {10},
  Primaryclass = {cs.DS},
  Title = {The PGM-index: a multicriteria, compressed and learned approach to data indexing},
  Url = {https://arxiv.org/abs/1910.06169},
  Year = {2019}}

About

🏅 A state-of-the-art data structure that enables fast point and range searches in arrays of billions of items using orders of magnitude less space than traditional indexes

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C++ 83.8%
  • HTML 16.2%