
CoffeeDB is an out-of-the-box string/keyword search database. It builds indexes for numeric data and suffix arrays for text data to speed up queries. Unlike most existing databases, CoffeeDB keeps these data structures in memory. This significantly improves query speed, but it also limits CoffeeDB's ability to handle very large datasets. If the data you need to search does not exceed tens of gigabytes, CoffeeDB may be a good fit.

Get Started

Download CoffeeDB, place it in any folder, and run ./coffeedb. CoffeeDB will create a database in that folder and start the service. By default, the service address is http://127.0.0.1:14920/coffeedb. All database operations are sent to this address as HTTP POST requests.

For example, you can insert an object into the database by sending the following JSON to http://127.0.0.1:14920/coffeedb.

{
    "operation": "insert",
    "data": {
        "number": 123,
        "name": "sunkafei",
        "secret": "01010"
    }
}

Note that the value of "data" is the object to insert. It can contain any number of key-value pairs, and each value can be an integer, a real number, or a string.
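As a minimal sketch, the insert above could be sent from Python using only the standard library. The helper names `build_request` and `send`, the `Content-Type` header, and the UTF-8 decoding of the response are illustrative assumptions, not something this README specifies.

```python
import json
import urllib.request

# Default service address from the text above.
COFFEEDB_URL = "http://127.0.0.1:14920/coffeedb"

def build_request(payload, url=COFFEEDB_URL):
    """Wrap an operation (a dict) in an HTTP POST request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    # The Content-Type header is an assumption; CoffeeDB may not require it.
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send(payload, url=COFFEEDB_URL):
    """Send the operation and return the raw response text.

    Requires a running CoffeeDB instance.
    """
    with urllib.request.urlopen(build_request(payload, url)) as response:
        return response.read().decode("utf-8")

insert_op = {
    "operation": "insert",
    "data": {"number": 123, "name": "sunkafei", "secret": "01010"},
}
request = build_request(insert_op)
# send(insert_op)  # uncomment when the service is running
```

Separating `build_request` from `send` lets you inspect the exact request before anything goes over the network.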

Next, we insert another piece of data into the database:

{
    "operation": "insert",
    "data": {
        "number": 234,
        "name": "yulemao",
        "position": 1.7724,
        "secret": "010"
    }
}

As we can see, different objects can contain different fields, but a field that appears in several objects must have the same type in all of them. After modifying the database, call the build operation to create the corresponding data structures and make the modifications take effect:

{
    "operation": "build"
}

Next we can query the database:

{
    "operation": "query",
    "constraints": {
        "number": "[100,200]"
    }
}

This returns all objects whose number is between $100$ and $200$ (inclusive). The result may look like this:

[{"number":123,"name":"sunkafei","secret":"01010"}]

To query an open interval instead, write "(100,200)". If you only want some keys of each object rather than all of them, list them in fields. Here's an example:

{
    "operation": "query",
    "constraints": {
        "number": "[100,900]"
    },
    "fields": ["name"]
}

The query result is:

[{"name":"sunkafei"},{"name":"yulemao"}]

In the case of string/keyword searches, an additional key named $correlation is added to the object to indicate the number of occurrences of the keyword. For example:

{
    "operation": "query",
    "constraints": {
        "secret": "010"
    }
}

The query result is:

[{"$correlation":2,"number":123,"name":"sunkafei","secret":"01010"},{"$correlation":1,"number":234,"name":"yulemao","position":1.7724,"secret":"010"}]

If there are multiple query constraints that need to be met, you can simply list them in constraints. For example:

{
    "operation": "query",
    "constraints": {
        "secret": "010",
        "number": "[200,900]"
    },
    "fields": ["name"]
}

The query result is:

[{"name":"yulemao"}]

You can find sample Python code here.

Run CoffeeDB

Download CoffeeDB, place it in any folder, and run ./coffeedb. CoffeeDB will create a database in that folder and start the service. By default, the service address is http://127.0.0.1:14920/coffeedb, and all database operations are sent to this address as HTTP POST requests. You can change this default behavior with the following command-line parameters:

| Key | Value Type | Explanation |
| --- | --- | --- |
| port | Integer | The port number to bind to. |
| clear | None | Clear all data before starting the service. Use this command with caution as it will delete all data irretrievably. |
| directory | String | The folder where the data is saved. |

For example, the following command clears all existing data and starts the service at http://127.0.0.1:12345/coffeedb.

$ ./coffeedb --port=12345 --clear
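If you start CoffeeDB from a script, the command line can be assembled as in the sketch below. The `--port=` and `--clear` forms come from the example above; `--directory=<path>` is assumed by analogy with `--port=` and is not confirmed here. The function name `coffeedb_command` is hypothetical.

```python
import subprocess

def coffeedb_command(binary="./coffeedb", port=None, clear=False, directory=None):
    """Build the argument list for launching CoffeeDB."""
    args = [binary]
    if port is not None:
        args.append(f"--port={port}")
    if clear:
        # Irreversibly deletes all existing data before the service starts.
        args.append("--clear")
    if directory is not None:
        # Assumed syntax, by analogy with --port=.
        args.append(f"--directory={directory}")
    return args

command = coffeedb_command(port=12345, clear=True)
# process = subprocess.Popen(command)  # uncomment to actually start the service
```

Keeping `clear` off by default means data is only wiped when the caller asks for it explicitly.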

Supported Operations

Insert

The insert operation has the following general format:

{
    "operation": "insert",
    "data": {
        ...
    }
}

where the data field contains the object to be inserted. The inserted object can contain any number of key-value pairs. Note that all insert operations will be cached and will not take effect immediately. To make insert operations effective, you need to call the build operation.

Remove

The remove operation has the following general format:

{
    "operation": "remove",
    "constraints": {
        ...
    }
}

where all objects that satisfy the constraints will be removed from the database. The format of constraints is the same as in the query operation. Note that all remove operations are cached and do not take effect immediately. To make them take effect, you need to call the build operation.

Build

The build operation has the following format:

{
    "operation": "build"
}

It makes all database modification operations take effect. As the build operation is time-consuming, you should call build once after all modifications are completed, rather than calling build after each modification.
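The advice above can be sketched as a small planner that queues every modification and appends a single build at the end. `plan_modifications` is a hypothetical helper; it only produces the JSON payloads, and sending them to the service is left to whatever HTTP client you use. The remove constraint in the example is made up for illustration.

```python
def plan_modifications(inserts=(), removes=()):
    """Return the payloads for a batch of modifications followed by one build.

    inserts: iterable of objects to insert
    removes: iterable of constraint dicts, same format as query constraints
    """
    payloads = [{"operation": "insert", "data": obj} for obj in inserts]
    payloads += [{"operation": "remove", "constraints": c} for c in removes]
    if payloads:
        # One build for the whole batch, not one per modification.
        payloads.append({"operation": "build"})
    return payloads

batch = plan_modifications(
    inserts=[
        {"number": 123, "name": "sunkafei", "secret": "01010"},
        {"number": 234, "name": "yulemao", "position": 1.7724, "secret": "010"},
    ],
    removes=[{"number": "[500,900]"}],
)
```

An empty batch produces no payloads at all, so no build is issued when nothing changed.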

Query

The query operation has the following general format:

{
    "operation": "query",
    "constraints": {
        ...
    },
    "fields": [
        ...
    ]
}

All objects that meet the constraints in constraints will be selected, and only the fields listed in fields will be returned for each of them. You can get all objects in the database by omitting constraints, and you can get all fields of each object by omitting fields.
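A minimal sketch of building query payloads so that arguments you do not supply are left out of the JSON entirely. `make_query` is a hypothetical helper name, not part of CoffeeDB.

```python
import json

def make_query(constraints=None, fields=None):
    """Build a query payload, leaving out keys that were not supplied."""
    payload = {"operation": "query"}
    if constraints is not None:
        payload["constraints"] = constraints
    if fields is not None:
        payload["fields"] = list(fields)
    return payload

everything = make_query()                 # all objects, all fields
names_only = make_query(fields=["name"])  # all objects, only "name"
filtered = make_query({"secret": "010", "number": "[200,900]"}, ["name"])
print(json.dumps(filtered))
```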

For fields of string type, the constraint can be a substring that must appear. In this case, an additional field named $correlation will be added to the returned object to indicate the number of occurrences of this substring.
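To illustrate how occurrence counts can come out of a suffix array, here is a small sketch. It is not CoffeeDB's implementation: the construction is the naive sort-all-suffixes approach, and the function names are made up for this example. Overlapping matches are counted separately, which is consistent with the Get Started example where "010" is reported twice in "01010".

```python
from bisect import bisect_left, bisect_right

def build_suffix_array(text):
    """Return the start positions of all suffixes, in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def count_occurrences(text, suffix_array, pattern):
    """Count (possibly overlapping) occurrences of pattern in text."""
    m = len(pattern)
    # Truncating sorted suffixes to m characters keeps them sorted, so every
    # suffix that starts with the pattern sits in one contiguous range.
    # Materializing the list is for clarity only; a real index would
    # binary-search the suffix array directly.
    prefixes = [text[i:i + m] for i in suffix_array]
    return bisect_right(prefixes, pattern) - bisect_left(prefixes, pattern)

secret = "01010"
sa = build_suffix_array(secret)
hits = count_occurrences(secret, sa, "010")
```

Note that Python's built-in `str.count` counts non-overlapping matches only, so it would not reproduce the value shown in Get Started.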

For fields of type integer and float, the constraint can be an interval indicating the range of numbers. For example:

| Value | Explanation |
| --- | --- |
| [1,100] | Values between $1$ and $100$. |
| [1,inf) | Values greater than $1$ (inclusive). |
| [-inf,1) | Values less than $1$ (exclusive). |
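The interval notation can be read as in the following sketch, which parses a constraint string into a predicate. It illustrates the semantics described above and is not CoffeeDB's parser; `parse_interval` is a hypothetical name, and relying on Python's `float()` to interpret `inf` and `-inf` is an assumption about how those tokens should be read.

```python
def parse_interval(text):
    """Turn a string such as "[1,100]" or "(100,200)" into a predicate."""
    text = text.strip()
    if len(text) < 2 or text[0] not in "[(" or text[-1] not in "])":
        raise ValueError(f"not an interval: {text!r}")
    low_text, high_text = text[1:-1].split(",")
    # float() accepts "inf" and "-inf" as unbounded ends.
    low, high = float(low_text), float(high_text)
    low_closed = text[0] == "["    # "[" includes the lower bound
    high_closed = text[-1] == "]"  # "]" includes the upper bound

    def matches(value):
        above = value >= low if low_closed else value > low
        below = value <= high if high_closed else value < high
        return above and below

    return matches

in_closed = parse_interval("[100,200]")
in_open = parse_interval("(100,200)")
at_least_one = parse_interval("[1,inf)")
below_one = parse_interval("[-inf,1)")
```

With these predicates, `in_closed(100)` is true while `in_open(100)` is false, which is the difference between the two bracket styles.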

You can find some use cases of the query operation in Get Started.