This repo benchmarks several C/C++ logging libraries. The tested libraries are listed below (sorted alphabetically):
- easyloggingpp: C++ logging library. It is extremely powerful, extendable, light-weight, fast performing, thread and type safe and consists of many built-in features. It provides the ability to write logs in your own customized format. It also provides support for logging your classes, third-party libraries, STL and third-party containers etc.
- fmtlog: fmtlog is a performant fmtlib-style logging library with latency in nanoseconds.
- glog: C++ implementation of the Google logging module
- haclog: Haclog (Happy Async C log) is an extremely fast plain C logging library
- loguru: A lightweight C++ logging library
- nanolog: Nanolog is an extremely performant nanosecond scale logging system for C++ that exposes a simple printf-like API.
- quill: Asynchronous Low Latency C++ Logging Library
- reckless: Reckless logging. Low-latency, high-throughput, asynchronous logging library for C++.
- spdlog: Fast C++ logging library
Google Benchmark is used for performance testing. The tests are divided into two scenarios:
- Scenario 1: Fix the minimum test time (set MinTime in Google Benchmark) and write as many logs as possible during that time. This scenario mainly targets asynchronous loggers: it reflects the throughput of the log library and how efficiently the logging front end writes when the buffer is under heavy stress. The thread counts used in this scenario are 1/2/4/8.
- Scenario 2: Fix the number of iterations and repetitions (set Iterations + Repetitions in Google Benchmark) to reflect the performance of each log library when it is not under stress. The thread counts tested in this scenario are 1/${0.5 * num_of_CPU}/${1 * num_of_CPU}/${2 * num_of_CPU}.
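The Scenario 2 thread counts can be derived from the CPU count roughly as follows. This is only a sketch: the helper name `scenario2_thread_counts` is hypothetical and does not come from this repo, and rounding 0.5 * num_of_CPU down (with a floor of 1) is an assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from this repo): thread counts for Scenario 2,
// i.e. 1, 0.5 * N, 1 * N and 2 * N for N logical CPUs.
// Assumption: 0.5 * N is rounded down and never drops below 1.
std::vector<unsigned> scenario2_thread_counts(unsigned num_of_cpu) {
    assert(num_of_cpu > 0 && "CPU count must be positive");
    unsigned half = num_of_cpu / 2;
    if (half == 0) half = 1;
    return {1u, half, num_of_cpu, 2u * num_of_cpu};
}
```

For 12 logical CPUs this yields 1/6/12/24. `std::thread::hardware_concurrency()` is one way to obtain the CPU count at run time.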
Run `build.sh` (`build.bat` on Windows) to build; the log libraries under test are downloaded automatically during the build. Then run `run_benchmark.sh` to run the tests and generate the benchmark report. Report files are generated in the `build` directory, with names matching `benchmark_*.txt`.
Note: Since some log libraries do not support all platforms, only Linux is guaranteed to test every log library.
Date: 2023-10-18
Log library versions: see CMakeLists.txt
Data written per log call:
struct LogMsg {
    uint64_t u64;
    uint32_t u32;
    int64_t i64;
    int32_t i32;
    char s[128];
};
Logging output pattern:
${Level}|${datetime}|${filename}.${line_no}|${func_name}|${thread_id} - u64: msg.u64, i64: msg.i64, u32: msg.u32, i32: msg.i32, s: msg.s
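To make the pattern concrete, here is a minimal sketch that renders one line with `snprintf`. The helper `format_line` and its parameters are hypothetical; each benchmarked library produces this layout through its own formatting API, and a real logger would capture the level, time, source location and thread id itself.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

struct LogMsg {
    uint64_t u64;
    uint32_t u32;
    int64_t i64;
    int32_t i32;
    char s[128];
};

// Hypothetical helper (not from this repo): renders one line in the
// pattern above. All metadata is passed in explicitly for illustration.
std::string format_line(const char* level, const char* datetime,
                        const char* file, int line, const char* func,
                        unsigned long thread_id, const LogMsg& msg) {
    char buf[512];
    // "%.127s" reads at most 127 bytes of msg.s, so it stays inside s[128].
    int n = std::snprintf(buf, sizeof(buf),
        "%s|%s|%s.%d|%s|%lu - u64: %" PRIu64 ", i64: %" PRId64
        ", u32: %" PRIu32 ", i32: %" PRId32 ", s: %.127s",
        level, datetime, file, line, func, thread_id,
        msg.u64, msg.i64, msg.u32, msg.i32, msg.s);
    assert(n >= 0 && "snprintf failed");
    // snprintf truncates when the line does not fit; clamp the length.
    std::size_t len = static_cast<std::size_t>(n) < sizeof(buf)
                          ? static_cast<std::size_t>(n)
                          : sizeof(buf) - 1;
    return std::string(buf, len);
}
```

Calling `format_line("INFO", "2023-10-18 12:00:00", "main.cpp", 42, "main", 7ul, msg)` with made-up values fills every placeholder of the pattern in order.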
Type: laptop
Machine: 20R10002CD ThinkPad X1 Carbon 7th
System: Arch Linux x86_64
Kernel: 6.5.6-arch2-1
CPU: Intel i7-10710U (12) @ 4.700GHz
GPU: Intel Comet Lake UHD Graphics
Memory: 15659MiB
gcc: (GCC) 13.2.1 20230801
ldd: (GNU libc) 2.38
Run `cpupower frequency-info` to view the CPU frequency information on my machine:
analyzing CPU 7:
driver: intel_pstate
...
hardware limits: 400 MHz - 4.70 GHz
...
boost state support:
Supported: yes
Active: yes
Before starting the benchmark, run `scripts/set_cpu_freq.sh` to pin the CPU frequency to a fixed value (I chose 3.2 GHz). While the benchmark is running, run `scripts/monitor_cpu_freq.sh` to monitor the CPU frequency and confirm that it does not drop because of high CPU temperature.
Due to some problems encountered while benchmarking on my local machine, `fmtlog`, `quill` and `reckless` do not fully cover all scenarios.
- With `#define FMTLOG_BLOCK 1` in `fmtlog`, the benchmark process gets stuck and cannot continue, so `fmtlog` was only tested in the mode that discards logs when the buffer is full
- With `UNBOUNDED` in `quill`, memory usage keeps growing in Scenario 1 until the run fails, so Scenario 1 is skipped when benchmarking `quill_unbounded`
- I did not find an interface for setting the buffer size in `fmtlog` or `Nanolog`, so I used their default buffer sizes
My knowledge is limited; if you find any errors, omissions, or oversights, please feel free to correct me!
The gbenchmark directory contains the detailed benchmark reports from my local machine. They are shown graphically below:
Scenario 1: Determine the minimum test time (x axis: logging library + number of threads, y axis: google benchmark - Time)
Scenario 2: Determine the number of iterations and repetitions (x axis: logging library + number of threads, y axis: google benchmark - Time)
The y-axis is the Time value reported by Google Benchmark. With multiple threads, it does not represent the average cost of a single call, but an average derived from throughput and time. The formula is:

google benchmark Time = (sum(time consumed by each thread) ÷ number of threads) ÷ total number of executions

Therefore, multi-threaded results can be compared across libraries, but they cannot simply be read as the cost of a single call. If you want the average per-call time under multi-threading, you can modify the Google Benchmark (v1.8.3) code:
// benchmark_runner.cc
BenchmarkRunner::IterationResults BenchmarkRunner::DoNIterations() {
    ...
    // Just comment out the following line
    // i.results.real_time_used /= b.threads();
    ...
}
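The Time formula above can be checked with a small worked example. This is a sketch only: the function names are hypothetical, and the numbers in the note below are made up for illustration, not measured results.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Reported Time as described by the formula above:
// (sum of per-thread elapsed time / number of threads) / total executions.
double reported_time_ns(const std::vector<double>& per_thread_elapsed_ns,
                        std::uint64_t total_executions) {
    assert(!per_thread_elapsed_ns.empty() && total_executions > 0);
    double sum = std::accumulate(per_thread_elapsed_ns.begin(),
                                 per_thread_elapsed_ns.end(), 0.0);
    double mean_thread_time =
        sum / static_cast<double>(per_thread_elapsed_ns.size());
    return mean_thread_time / static_cast<double>(total_executions);
}

// Average cost of a single call: the same computation without the
// division by the thread count.
double per_call_time_ns(const std::vector<double>& per_thread_elapsed_ns,
                        std::uint64_t total_executions) {
    assert(!per_thread_elapsed_ns.empty() && total_executions > 0);
    double sum = std::accumulate(per_thread_elapsed_ns.begin(),
                                 per_thread_elapsed_ns.end(), 0.0);
    return sum / static_cast<double>(total_executions);
}
```

Suppose 4 threads each spend 1,000,000 ns on 1000 calls (4000 calls in total). The reported Time is (4,000,000 ÷ 4) ÷ 4000 = 250 ns, while the average cost of one call is 4,000,000 ÷ 4000 = 1000 ns.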
The charts above show the following:
- In Scenario 1, the faster log libraries are (in alphabetical order): fmtlog, haclog, Nanolog, quill_bounded
- In Scenario 2, the faster log libraries are (in alphabetical order): fmtlog, haclog, Nanolog
**So why are the four log libraries mentioned above faster? What are their advantages and disadvantages, and what pitfalls should we watch for when using them?**
The four log libraries above are all asynchronous. They all use a multi-buffer queue design in which a consumer is responsible for polling. The overall ideas are similar, and the code details of each are interesting in their own way. But as a user, there is one thing that needs special attention:
Since a buffer is used, you must consider what happens when the buffer is full. There are three ways to handle it:
- Blocking: The producer thread blocks, waiting for the buffer to have enough space before writing.
- Discard: Drop the current log, or write the current log but discard all logs already in the buffer
- Expand: Dynamically increase the buffer length and continue writing
`haclog` and `Nanolog` chose option 1, `fmtlog` chose option 1/2, and `quill` chose option 2/3.
Because of the benchmark problems described above, `fmtlog` could only be tested with option 2, and `quill` is discussed only in bounded mode for Scenario 1.
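The three buffer-full strategies can be sketched with a toy queue. This is not the implementation of any library mentioned here: `LogQueue` and `FullPolicy` are hypothetical names, a mutex is used only for brevity, and the discard variant shown is the one that drops the current log.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

enum class FullPolicy { Block, Discard, Expand };

// Hypothetical toy queue that illustrates the three policies.
class LogQueue {
public:
    LogQueue(std::size_t capacity, FullPolicy policy)
        : capacity_(capacity), policy_(policy) {
        assert(capacity > 0 && "capacity must be positive");
    }

    // Producer side. Returns false only when the message was discarded.
    bool push(std::string msg) {
        std::unique_lock<std::mutex> lock(mu_);
        if (q_.size() >= capacity_) {
            switch (policy_) {
            case FullPolicy::Block:    // option 1: wait until the consumer frees space
                not_full_.wait(lock, [this] { return q_.size() < capacity_; });
                break;
            case FullPolicy::Discard:  // option 2: drop the current log
                ++dropped_;
                return false;
            case FullPolicy::Expand:   // option 3: grow the buffer and continue
                capacity_ *= 2;
                break;
            }
        }
        q_.push_back(std::move(msg));
        return true;
    }

    // Consumer side. Returns false when the queue is empty.
    bool pop(std::string& out) {
        std::lock_guard<std::mutex> lock(mu_);
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop_front();
        not_full_.notify_one();  // wake one blocked producer, if any
        return true;
    }

    std::size_t dropped() const {
        std::lock_guard<std::mutex> lock(mu_);
        return dropped_;
    }

    std::size_t capacity() const {
        std::lock_guard<std::mutex> lock(mu_);
        return capacity_;
    }

private:
    std::size_t capacity_;
    FullPolicy policy_;
    std::size_t dropped_ = 0;
    mutable std::mutex mu_;
    std::condition_variable not_full_;
    std::deque<std::string> q_;
};
```

With `FullPolicy::Discard` and a capacity of 2, the third `push` returns false and `dropped()` becomes 1. With `FullPolicy::Expand` the same push succeeds and the capacity doubles to 4. With `FullPolicy::Block` the producer waits until a consumer thread calls `pop`.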
fmtlog
Advantages
- It shows good speed in both scenarios; in Scenario 1 it is on par with `quill_bounded` and tied for fastest
- In both scenarios its speed is stable, with little fluctuation
- Uses `format`-style formatting
Disadvantages
- A large number of logs were lost in Scenario 1
- Logs were also lost in Scenario 2
- With `#define FMTLOG_BLOCK 1`, the Scenario 1 benchmark cannot complete
haclog
Advantages
- In most cases it shows good speed in both scenarios. In the multi-threaded case of Scenario 2, it is only slightly slower than Nanolog
- When the buffer is full, it blocks, so no logs are lost
Disadvantages
- When throughput exceeds a certain threshold and the buffer fills up, efficiency drops significantly
Nanolog
Advantages
- It shows good speed in both scenarios and ranks first in the multi-threaded case of Scenario 2
- Very high throughput, with no significant drop in efficiency in either Scenario 1 or Scenario 2
- When the buffer is full, it blocks, so no logs are lost
Disadvantages
- The log cannot be read directly and must be decoded with the `decompressor` program
- Only supports Linux
quill
Advantages
- Its speed in Scenario 1 is on par with `fmtlog` and tied for fastest
- Uses `format`-style formatting
Disadvantages
- Logs may be lost in `bounded` mode
- In `unbounded` mode, memory may keep growing, and the Scenario 1 benchmark cannot complete
`haclog` is written in plain C, while `fmtlog`, `Nanolog` and `quill` are written in C++. For a logging library, the theoretical speed ceiling of a C++ implementation is slightly higher, in the following two respects:
- Compile-time calculation: C++ can precompute log parameter information at compile time, while a log library implemented in plain C has to compute it during the first run
- Log front-end serialization
  - C has to traverse a va_list at run time, and the generated assembly serializes the parameters one by one in a loop. Whether it dispatches on the precalculated type through a switch or calls a preset function pointer, this may add a little extra overhead
  - C++ performs serialization through variadic templates, so the assembly can be emitted as flat, unrolled code, trading space for time
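Both points can be illustrated with a small sketch. None of this is taken from the libraries above: `count_args`, `serialize`, `serialize_va` and `ArgType` are hypothetical names, and only two argument types are handled to keep the example short.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Compile-time calculation: count the '%' conversions in a printf-style
// format string, treating "%%" as a literal percent sign.
constexpr std::size_t count_args(const char* fmt) {
    return fmt[0] == '\0' ? 0
         : fmt[0] != '%'  ? count_args(fmt + 1)
         : fmt[1] == '%'  ? count_args(fmt + 2)
                          : 1 + count_args(fmt + 1);
}
static_assert(count_args("u64: %lu, i32: %d, 100%%") == 2,
              "counted by the compiler, not at run time");

// C++ front end: a variadic template copies each argument using its static
// type, so the compiler can expand the calls into straight-line code.
inline std::size_t serialize(unsigned char*) { return 0; }

template <typename T, typename... Rest>
std::size_t serialize(unsigned char* out, const T& first, const Rest&... rest) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "this sketch only handles trivially copyable arguments");
    std::memcpy(out, &first, sizeof(T));
    return sizeof(T) + serialize(out + sizeof(T), rest...);
}

enum ArgType { ARG_I32, ARG_U64 };

// C-style front end: argument types are only known at run time, so the
// va_list is walked in a loop with a switch per argument.
std::size_t serialize_va(unsigned char* out, const ArgType* types,
                         std::size_t n, ...) {
    assert(out != nullptr && types != nullptr);
    va_list ap;
    va_start(ap, n);
    std::size_t off = 0;
    for (std::size_t i = 0; i < n; ++i) {
        switch (types[i]) {
        case ARG_I32: {
            std::int32_t v = va_arg(ap, std::int32_t);
            std::memcpy(out + off, &v, sizeof v);
            off += sizeof v;
            break;
        }
        case ARG_U64: {
            std::uint64_t v = va_arg(ap, std::uint64_t);
            std::memcpy(out + off, &v, sizeof v);
            off += sizeof v;
            break;
        }
        }
    }
    va_end(ap);
    return off;
}
```

Both front ends write the same bytes for the same arguments. The difference is that `serialize` resolves types and sizes at compile time, whereas `serialize_va` has to branch on a type tag for every argument at run time.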
- `fmtlog` and `quill` use `format`-style formatting, which is easy to write, and both have very stable speed. `fmtlog` and `quill_bounded` perform well in Scenario 1. However, as analyzed in the previous section, their greater speed comes at the cost of losing logs, which requires special attention in real usage scenarios
- `haclog` also performs well in both scenarios. In Scenario 2, its performance is only slightly inferior to `Nanolog`. However, in Scenario 1, when throughput exceeds a certain threshold, its efficiency drops sharply. This is exactly the opposite of `fmtlog` and `quill`: `haclog` gives up speed in exchange for not losing logs
- `Nanolog` shows extremely high efficiency and very high throughput in both scenarios. It is the fastest in Scenario 2, and in Scenario 1 it guarantees that no logs are lost under pressure without sacrificing much efficiency. However, its very high throughput comes at the cost of logs that cannot be read in real time, which also makes tools such as `tail` unusable
As you can see, no current asynchronous log library beats the others in every respect; each of them makes trade-offs somewhere.