Setup • Unit Tests • Model Training • Web Application • Practices • Citation
Grep is a database partitioning framework based on graph embedding algorithms. It selects a partition key for each table so as to maximize workload performance. Grep consists of four parts: partition-key selection, selected-key evaluation, model training, and a frontend demo.
A complete version will be made available at https://github.com/zhouxh19/grep
- A demo of Grep: grep_6_21_2.mov
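To make the idea of partition-key selection concrete, here is a minimal sketch. It is not Grep's graph-learning method: it simply enumerates one key per table by brute force and scores each choice by how many joins become co-located, meaning both tables are partitioned on their join columns. The workload in `JOINS`, the scoring rule, and all function names are hypothetical and exist only for illustration.

```python
from itertools import product

# Hypothetical toy workload: (left column, right column, join frequency).
JOINS = [
    ("orders.customer_id", "customers.id", 50),
    ("orders.product_id", "products.id", 20),
    ("reviews.product_id", "products.id", 30),
]


def candidate_keys(joins):
    """Group the columns that appear in joins by table: {table: [columns]}."""
    candidates = {}
    for left, right, _ in joins:
        for qualified in (left, right):
            table, column = qualified.split(".")
            candidates.setdefault(table, set()).add(column)
    return {table: sorted(cols) for table, cols in candidates.items()}


def colocated_weight(assignment, joins):
    """Sum the frequency of joins where both sides are partitioned on the join column."""
    total = 0
    for left, right, freq in joins:
        left_table, left_col = left.split(".")
        right_table, right_col = right.split(".")
        if assignment[left_table] == left_col and assignment[right_table] == right_col:
            total += freq
    return total


def select_keys(joins):
    """Try every combination of one key per table and keep the best-scoring one."""
    candidates = candidate_keys(joins)
    tables = sorted(candidates)
    best, best_score = None, -1
    for combo in product(*(candidates[t] for t in tables)):
        assignment = dict(zip(tables, combo))
        score = colocated_weight(assignment, joins)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score


if __name__ == "__main__":
    keys, score = select_keys(JOINS)
    print(keys, score)
```

Exhaustive search grows exponentially with the number of tables, which is one reason a learned approach such as Grep's is attractive for real schemas.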
Setting up a distributed database cluster is tricky. We recommend deploying Greenplum, which is open source and much more stable than some classic distributed databases such as PG-XL. See our install_db_cluster.md for installation instructions.
pip install -r requirements.txt
Step 1: Change the settings within ./api/services/partition/config.py.
Step 2: Run the test script that selects partition keys without evaluation feedback.
python test_partition_key_selection.py
Step 1: Change the settings within ./api/services/partition/config.py.
Step 2: Run the test script that estimates the performance under the selected partition keys.
python test_partition_key_evaluation.py
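On a Greenplum cluster (the database recommended in Setup), trying out a set of selected keys generally means redistributing each table by its chosen column. The sketch below only generates the corresponding `ALTER TABLE ... SET DISTRIBUTED BY` statements; it does not connect to a database, and it is an assumption about how selected keys might be applied, not a description of what the evaluation script does. The function name and the `{table: column}` input format are hypothetical.

```python
import re

# Accept only plain SQL identifiers so the generated statements are safe to run.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def distribution_statements(partition_keys):
    """Return one ALTER TABLE statement per table for a {table: column} mapping."""
    statements = []
    for table, column in sorted(partition_keys.items()):
        for name in (table, column):
            if not _IDENTIFIER.match(name):
                raise ValueError(f"unsafe identifier: {name!r}")
        statements.append(f"ALTER TABLE {table} SET DISTRIBUTED BY ({column});")
    return statements


if __name__ == "__main__":
    # Hypothetical selection result, used only to show the output format.
    for statement in distribution_statements({"orders": "customer_id", "products": "id"}):
        print(statement)
```

Redistributing a large table moves data between segments, so statements like these are best run on a test cluster before any workload is measured.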
python train_partition_models.py
TBD
python app.py
cd web/
npm install
npm run dev
Check out the subset of queries and partition results (only those that are publicly available) in ./practices.
If you use Grep in your research, please cite:
@article{DBLP:journals/pacmmod/ZhouLFLG23,
author = {Xuanhe Zhou and
Guoliang Li and
Jianhua Feng and
Luyang Liu and
Wei Guo},
title = {Grep: {A} Graph Learning Based Database Partitioning System},
journal = {Proc. {ACM} Manag. Data},
volume = {1},
number = {1},
pages = {94:1--94:24},
year = {2023},
url = {https://doi.org/10.1145/3588948},
doi = {10.1145/3588948},
timestamp = {Mon, 19 Jun 2023 16:36:09 +0200},
biburl = {https://dblp.org/rec/journals/pacmmod/ZhouLFLG23.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}