SQLSolver is an automated prover that verifies the equivalence of SQL queries, which is presented in the paper "Proving Query Equivalence Using Linear Integer Arithmetic" (SIGMOD 2024).
- Ubuntu (22.04.1 LTS is tested)
- z3 4.8.9 (SMT solver)
- antlr 4.8 (Generate tokenizer and parser for SQL AST)
- Python 3
- Java 17
- Gradle 7.3.3
z3 and antlr library have been put in lib/
off-the-shelf.
Python is typically installed by default.
Type python3 --version
to check whether Python 3 is installed.
In cases where it is not installed, use this following instruction:
sudo apt install python3
If you do not have Java or Gradle installed, you may refer to these instructions:
# Installing Java 17
sudo add-apt-repository ppa:linuxuprising/java
sudo apt update
sudo apt-get install -y oracle-java17-installer oracle-java17-set-default
# Installing Gradle 7.3.3
wget https://services.gradle.org/distributions/gradle-7.3.3-bin.zip -P /tmp
sudo unzip -d /opt/gradle /tmp/gradle-7.3.3-bin.zip
sudo touch /etc/profile.d/gradle.sh
sudo chmod a+wx /etc/profile.d/gradle.sh
sudo echo -e "export GRADLE_HOME=/opt/gradle/gradle-7.3.3 \nexport PATH=\${GRADLE_HOME}/bin:\${PATH}" >> /etc/profile.d/gradle.sh
source /etc/profile
We use Gradle as the project build tool. After all the above dependencies are installed, you can compile the SQLSolver project with the following command.
gradle compileJava
After Compilation, you can build a JAR file using the following command.
gradle fatjar
The JAR file will be generated in the build/libs/
directory relative
to the current directory.
To use the JAR file of SQLSolver, you need to put libz3.so and libz3java.so in a custom directory: <path/to/lib>
and then use the command:
export LD_LIBRARY_PATH=<path/to/lib>
The code below shows how to use SQLSolver's JAR File to verify SQLs.
-sql1=<path/to/query1>
: the first file whose path is<path/to/query1>
to store the SQLs.-sql2=<path/to/query2>
: the second file whose path is<path/to/query2>
to store the SQLs.-schema=<path/to/schema>
: the schema file whose path is<path/to/schema>
to store the schema.-print
: output the result to standard output stream.-output=<path/to/output>
: output the result to a file whose path is<path/to/output>
.
Each SQL file has multiple SQL statements and should store a SQL statement in one line, and the corresponding lines in both files will be considered as pairs of SQL statements to be verified for equivalence.
java -jar sqlsolver.jar -sql1=<path/to/query1> -sql2=<path/to/query2> -schema=<path/to/schema> [-print] [-output=<path/to/output>]
Since SQLSolver uses the parser of Calcite to parse SQL queries, please ensure that queries in SQL files satisfy the syntax requirements of Calcite parser.
The example here shows how to use SQLSolver's JAR File to verify three simple SQLs.
First SQL file with path ./example_sql1.sql
:
SELECT i, j FROM a
SELECT x, y FROM b
Second SQL file with path ./example_sql2.sql
:
SELECT T.COL1, T.COL2 FROM (SELECT i AS COL1, j AS COL2 FROM a) AS T
SELECT T.COL1, T.COL2 FROM (SELECT x AS COL1, y AS COL2 FROM b) AS T
Schema file with path ./example_schema.sql
:
CREATE TABLE a ( i INT PRIMARY KEY, j INT, k INT );
CREATE TABLE b ( x INT PRIMARY KEY, y INT, z INT );
Then, use the following command to verify the three SQL pairs:
java -jar sqlsolver.jar -sql1=./example_sql1.sql -sql2=./example_sql2.sql -schema=./example_schema.sql -output=./result
You will get a file named result
in the current directory with the following content:
EQ
EQ
The Java users can directly download SQLSolver's source code and access the package sqlsolver.api
.
The entry of SQLSolver is in sqlsolver.api.entry.Verification
.
There are two main interfaces:
/**
* Verify two sql equivalence.
*/
VerificationResult verify(String sql0, String sql1, String schema)
/**
* Verify pairwise sql equivalence in the sqlList.
* The two sql to verify are in the same index of both lists.
* The two sqlList (sqlList0 and sqlList1) should have same size.
*/
List<VerificationResult> verify(List<String> sqlList0, List<String> sqlList1, String schema)
The VerificationResult is an enum class for verification result which has four cases:
- EQ: two SQLs are equivalent
- NEQ: two SQLs are not equivalent
- UNKNOWN: SQLSolver doesn't know the equivalency between the two SQLs
Note that verifying the equivalence of two SQL queries is an undecidable problem. Thus, the verification algorithm of SQLSolver does not guarantee to identify all equivalent queries. SQLSolver may output NEQ or UNKNOWN for some equivalent query pairs.
You can import SQLSolver by JAR File or directly by downloading source code in your JAVA's project.
By calling the interface of SQLSolver, you can get the SQLs verification result.
Before you use SQLSolver, you should put libz3.so
and libz3java.so
in a custom directory: <path/to/lib>
and then use the command:
export LD_LIBRARY_PATH=<path/to/lib>
Here is a little demo that shows how to use SQLSolver's interface:
import sqlsolver.api.entry.Verification;
import sqlsolver.superopt.logic.VerificationResult;
public class Main {
public static void main(String[] args) {
String sql1 = "SELECT i, j FROM a";
String sql2 = "SELECT T.COL1, T.COL2 FROM (SELECT i AS COL1, j AS COL2 FROM a) AS T";
String schema = "CREATE TABLE a ( i INT PRIMARY KEY, j INT, k INT );\n" +
"CREATE TABLE b ( x INT PRIMARY KEY, y INT, z INT );";
VerificationResult result = Verification.verify(sql1, sql2, schema);
System.out.println(result);
}
}
This program will output a single EQ
that indicates the verification result of two SQLs sql1
and sql2
under the schema schema
.
In our paper, SQLSolver is evaluated on four benchmarks, including test cases derived from Calcite, Spark SQL, TPC-C, and TPC-H. The files of SQL queries and schemas are listed in the following table.
Each file of SQL queries consists of multiple pairs of SQL queries. Each query in the odd-numbered line is equivalent to the query in the next line. For example, the first query is equivalent to the second query in each file.
Benchmark | File of Schema | File of SQL Queries |
---|---|---|
Calcite | Calcite Schema | Calcite Test Set |
Spark SQL | Spark SQL Schema | Spark SQL Test Set |
TPC-C | TPC-C Schema | TPC-C Test Set |
TPC-H | TPC-H Schema | TPC-H Test Set |
This repository includes the source code and benchmarks.
|-- api # SQLSolver's entry.
|-- lib # Required external library.
|-- common # Common utilites.
|-- sql # Data structures of SQL AST and query plan.
|-- stmt # Manipulation program of queries.
|-- superopt # Core algorithm of SQLSolver
|-- sqlsolver_data # Data input/output directory, such as benchmarks
If you use SQLSolver in your projects or research, please kindly cite our paper:
@article{sqlsolver,
author = {Ding, Haoran and Wang, Zhaoguo and Yang, Yicun and Zhang, Dexin and Xu, Zhenglin and Chen, Haibo and Piskac, Ruzica and Li, Jinyang},
title = {Proving Query Equivalence Using Linear Integer Arithmetic},
year = {2023},
issue_date = {December 2023},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {1},
number = {4},
journal = {Proc. ACM Manag. Data},
month = {Dec},
articleno = {227},
numpages = {26},
}
If you have any questions, please submit an issue or contact our email.