Datafuse is a real-time data processing and analytics DBMS with a cloud-native architecture. It is written in Rust, inspired by ClickHouse, powered by arrow-rs, and built to make it easy to power the Data Cloud.
- Fearless
  - No data races, no unsafe code, minimal unhandled errors
- High Performance
  - Everything is parallel
- High Scalability
  - Everything is distributed
- High Reliability
  - True separation of storage and compute

Benchmark setup (in-memory SIMD vector processing performance only):
- Dataset: 100,000,000,000 (100 billion) rows, except where a query passes a smaller argument to numbers_mt
- Hardware: AMD Ryzen 7 PRO 4750U, 8 CPU cores, 16 threads
- Rust: rustc 1.49.0 (e1884a8e3 2020-12-29)
- Built with link-time optimization and CPU-specific instructions
- ClickHouse server version 21.2.1 revision 54447

Query | FuseQuery (v0.1) | ClickHouse (v21.2.1) |
---|---|---|
SELECT avg(number) FROM numbers_mt(100000000000) | (3.11 s.) | ×3.14 slower, (9.77 s.) 10.24 billion rows/s., 81.92 GB/s. |
SELECT sum(number) FROM numbers_mt(100000000000) | (2.96 s.) | ×2.02 slower, (5.97 s.) 16.75 billion rows/s., 133.97 GB/s. |
SELECT min(number) FROM numbers_mt(100000000000) | (3.57 s.) | ×3.90 slower, (13.93 s.) 7.18 billion rows/s., 57.44 GB/s. |
SELECT max(number) FROM numbers_mt(100000000000) | (3.59 s.) | ×4.09 slower, (14.70 s.) 6.80 billion rows/s., 54.44 GB/s. |
SELECT count(number) FROM numbers_mt(100000000000) | (1.76 s.) | ×2.22 slower, (3.91 s.) 25.58 billion rows/s., 204.65 GB/s. |
SELECT sum(number+number+number) FROM numbers_mt(100000000000) | (23.14 s.) | ×5.47 slower, (126.67 s.) 789.47 million rows/s., 6.32 GB/s. |
SELECT sum(number) / count(number) FROM numbers_mt(100000000000) | (3.09 s.) | ×1.96 slower, (6.07 s.) 16.48 billion rows/s., 131.88 GB/s. |
SELECT sum(number) / count(number), max(number), min(number) FROM numbers_mt(100000000000) | (6.73 s.) | ×4.10 slower, (27.59 s.) 3.62 billion rows/s., 28.99 GB/s. |
SELECT number FROM numbers_mt(10000000000) ORDER BY number DESC LIMIT 1000 | (6.91 s.) | ×1.42 slower, (9.83 s.) 1.02 billion rows/s., 8.14 GB/s. |
SELECT max(number),sum(number) FROM numbers_mt(1000000000) GROUP BY number % 3, number % 4, number % 5 | (10.87 s.) | ×1.95 faster, (5.58 s.) 179.23 million rows/s., 1.43 GB/s. |
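
The derived figures in the ClickHouse column can be reproduced from the raw timings. The sketch below assumes the × factor is the ratio of the two elapsed times, that each `number` value is an 8-byte unsigned integer, and that 1 GB means 10^9 bytes. The function names are illustrative and not part of either system.

```rust
/// Size of one `number` value in bytes, assuming a 64-bit unsigned integer column.
const BYTES_PER_VALUE: f64 = 8.0;

/// How many times longer the slower engine took than the faster one.
fn time_ratio(slower_secs: f64, faster_secs: f64) -> f64 {
    slower_secs / faster_secs
}

/// Rows processed per second.
fn rows_per_sec(rows: f64, secs: f64) -> f64 {
    rows / secs
}

/// Throughput in GB/s (1 GB = 10^9 bytes) when scanning a single 8-byte column.
fn gb_per_sec(rows: f64, secs: f64) -> f64 {
    rows_per_sec(rows, secs) * BYTES_PER_VALUE / 1e9
}

fn main() {
    // First row of the table: avg(number) over 100 billion rows.
    let rows = 100_000_000_000_f64;
    let (fusequery_secs, clickhouse_secs) = (3.11, 9.77);

    println!("ratio:  x{:.2}", time_ratio(clickhouse_secs, fusequery_secs));
    println!("rows/s: {:.2} billion", rows_per_sec(rows, clickhouse_secs) / 1e9);
    println!("GB/s:   {:.2}", gb_per_sec(rows, clickhouse_secs));
}
```

Because the published timings are rounded to two decimals, recomputed throughput can differ from the table in the last digit.
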
Note:
- ClickHouse system.numbers_mt uses 16-way parallel processing
- FuseQuery system.numbers_mt uses 16-way parallel processing

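
To make the idea of 16-way parallel processing concrete, here is a minimal sketch of a `sum(number)` aggregation split across worker threads. It is not FuseQuery's or ClickHouse's implementation: `parallel_sum` is a hypothetical helper that cuts the range into contiguous chunks, sums each chunk on its own thread, and merges the partial results.

```rust
use std::thread;

/// Sum the integers in 0..n using `ways` worker threads.
/// Each worker sums one contiguous chunk; the partial sums are merged at the end.
fn parallel_sum(n: u64, ways: u64) -> u64 {
    assert!(ways > 0, "need at least one worker");
    let chunk = (n + ways - 1) / ways; // ceiling division
    let handles: Vec<thread::JoinHandle<u64>> = (0..ways)
        .map(|i| {
            // Clamp to n so trailing workers get an empty range when n is small.
            let start = (i * chunk).min(n);
            let end = (start + chunk).min(n);
            thread::spawn(move || (start..end).sum::<u64>())
        })
        .collect();

    // Merge step: add up the partial sums from all workers.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .sum()
}

fn main() {
    let n: u64 = 10_000_000;
    let total = parallel_sum(n, 16);
    // Closed form for 0 + 1 + ... + (n - 1).
    assert_eq!(total, n * (n - 1) / 2);
    println!("sum over 0..{} with 16 workers = {}", n, total);
}
```

The same split, aggregate, and merge shape applies to `min`, `max`, and `count`; `avg` would merge a (sum, count) pair per worker instead of a single value.
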
- SQL Parser
- Query Planner
- Query Optimizer
- Predicate Push Down
- Limit Push Down
- Projection Push Down
- Type coercion
- Parallel Query Execution
- Distributed Query Execution
- Hash GroupBy
- Merge-Sort OrderBy
- Joins (WIP)
- Projection
- Filter (WHERE)
- Limit
- Aggregate Functions
- Scalar Functions
- User-Defined Functions (UDFs)
- SubQueries
- Sorting
- Joins (WIP)
- Window (TODO)
- 0.1 Support aggregation select (2021.02)
- 0.2 Support distributed query (2021.03)
- 0.3 Support group by (2021.04)
- 0.4 Support order by (2021.04)
- 0.5 Support join
- 1.0 Support TPC-H benchmark
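
As an illustration of the hash-based GROUP BY mentioned in the feature list and roadmap, the sketch below evaluates the shape of the benchmark query `SELECT max(number), sum(number) ... GROUP BY number % 3, number % 4, number % 5` on a single thread with a standard hash map. The `MaxSum` type and `group_by_mods` function are hypothetical names for this example only and do not reflect FuseQuery's internals.

```rust
use std::collections::HashMap;

/// Running aggregate state for one group.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MaxSum {
    max: u64,
    sum: u64,
}

/// Group the integers in 0..n by (number % 3, number % 4, number % 5)
/// and keep max(number) and sum(number) for every group.
fn group_by_mods(n: u64) -> HashMap<(u64, u64, u64), MaxSum> {
    let mut groups: HashMap<(u64, u64, u64), MaxSum> = HashMap::new();
    for number in 0..n {
        let key = (number % 3, number % 4, number % 5);
        // Look up the group's state, creating it on first sight of the key.
        let state = groups.entry(key).or_insert(MaxSum { max: number, sum: 0 });
        state.max = state.max.max(number);
        state.sum += number;
    }
    groups
}

fn main() {
    let groups = group_by_mods(1_000);

    // Hash maps are unordered, so sort the keys before printing.
    let mut keys: Vec<(u64, u64, u64)> = groups.keys().copied().collect();
    keys.sort();
    for key in keys.iter().take(3) {
        println!("{:?} -> {:?}", key, groups[key]);
    }
    println!("{} groups in total", groups.len());
}
```

In a parallel or distributed setting, each worker could build its own map and the maps would then be merged key by key, taking the larger `max` and adding the `sum` values.
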
Datafuse is licensed under Apache 2.0.