DedupSearch

General Information

DedupSearch is a string search algrithm for deduplicated storage systems.

Implementation is based on the open-source deduplication system, Destor, publicly available in C: "Design Tradeoffs for Data Deduplication Performance in Backup Workloads" (FAST 15).

We modified a publicly available implementation of the Aho Corasick algorithm in C++ to improve its data structures, memory locality, and suffix matching, and to support non-ASCII strings. For best integration of this implementation into Destor, we refactored the Destor code to use C++ instead of C.

Environment

Ubuntu 16.04

Dependencies

libssl-dev is required to calculate sha-1 digest;
GLib 2.32 or later version
Berkeley Database

libglib.so and glib.h may not be found when you first install it.

The header files (that originally locate in /usr/local/include/glib-2.0 and /usr/local/lib/glib-2.0/include) are required to be moved to a searchable path, such as "/usr/local/include".

Also a link named libglib.so should be created in "/usr/local/lib".

Install Berkeley DB by executing "sudo apt-get install libdb-dev"
Makefile is automatically generated by GNU autoconf and automake.

INSTALL

If all dependencies are installed, compiling destor with 2D-search is straightforward:

./configure CC="g++-7 -fpermissive --std=c++11 -Ofast"

make

make install

To uninstall destor, run

make uninstall

Running

If compile and install are successful, the executable file, destor, should have been moved to /usr/local/bin by default. You can create a config file, destor.config, in where you run destor. A sample destor.config is in the project directory.

NOTE: run rebuild script before destor backup task to prepare working directory and clear data.

destor can run as follows:

start a backup task,

destor /path/to/data -p"a line as in config file"
start a restore task,

destor -r /path/to/restore -p"a line as in config file"
show the current statistics of system,

destor -s
show help

destor -h
make a trace

destor -t /path/to/data
Parameter

-p"a line in config file"
Naive search

destor -n<JOB_ID> keyword1 keyword2 ... Use at end of command only!
DedupSearch search

destor -q<JOB_ID> keyword1 keyword2 ... Use at end of command only!
Tiny directory for DedupSearch search

-d"path to tiny's directory"
Keywords input file for DedupSearch search where each keyword is XX chars long (for binary keywords)

-f"path to keywords file" -lXX

Searching Examples

Search "hello" and "bye" with naive search

destor -n0 hello bye

Search "hello" with DedupSearch without 'Tiny' structure

destor -q0 hello

Search "hello" with DedupSearch with 'Tiny' structure, wich will be temporary saved at "./tiny_dir/tiny.db"

destor -d"./tiny_dir" -q0 hello

Search the keywords from the file "keywords.txt", where each keywords is 32 chars long, with DedupSearch without 'Tiny' structure

destor -f"keywords.txt" -q0 -l32

Configuration

A sample configuration is shown in destor.conf

Paper

To read more about DedupSearch please check our paper: "DedupSearch: Two-Phase Deduplication Aware Keyword Search". In 20th USENIX Conference on File and Storage Technologies (FAST 22), 2022.

Autors: Nadav Elias, Gala Yadgar (Technion - Israel Institute of Technology); Philip Shilane (Dell Technologies); Sarai Sheinvald (ORT Braude College of Engineering).

Email : nadav[dot]elias[at]technion[dot]ac[dot]il

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
DedupSearch-main		DedupSearch-main
DedupSearch-changes.md		DedupSearch-changes.md
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DedupSearch

General Information

Environment

Dependencies

INSTALL

Running

Searching Examples

Configuration

Paper

About

Releases

Packages

Languages

NadavElias/DedupSearch

Folders and files

Latest commit

History

Repository files navigation

DedupSearch

General Information

Environment

Dependencies

INSTALL

Running

Searching Examples

Configuration

Paper

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages