LogBlock

About LogBlock

This is the replication package for the paper titled "Improving State-of-the-art Compression Techniques for Log Management Tools".

LogBlock is a log preprocessing tool that improves the compression of small blocks of log data. Modern log management tools usually split log data into small blocks to improve query performance. As shown in the following table, different log management tools adopt different block sizes. On such small blocks, LogBlock achieves a better compression ratio than direct compression or traditional log preprocessing tools, which achieve good compression ratios on large log files.

Block sizes used by log management tools:

Log Management Tool	Block Size	Reference
ELK Stack	16KB, 60KB	https://lucene.apache.org/core/7_4_0/core/org/apache/lucene/codecs/lucene50/Lucene50StoredFieldsFormat.html
Splunk	128KB	https://static.rainfocus.com/splunk/splunkconf18/sess/1523558790516001KFjM/finalPDF/Behind-The-Magnifying-Glass-1734_1538786592130001CBKR.pdf
Sumo Logic	64KB	https://help.sumologic.com/05Search/Get-Started-with-Search/Search-Basics/Search-Large-Messages
Syslog Ng	64KB	https://www.syslog-ng.com/technical-documents/doc/syslog-ng-open-source-edition/3.16/release-notes/summary-of-changes
Nginx	64KB	http://nginx.org/en/docs/http/ngx_http_log_module.html
Rsyslog	8KB	https://www.rsyslog.com/doc/master/rainerscript/global.html
DataDog	256KB	https://docs.datadoghq.com/api/v1/logs/
Sentry	1000 characters	https://docs.sentry.io/clients/java/config/
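
The effect of block size on compression can be illustrated with a short experiment. The sketch below is not LogBlock itself: it compresses a synthetic log (made up for this example) with zlib, once as a whole and once split into independent blocks of a given size. The function names compression_ratio and blockwise_ratio are hypothetical and do not come from this repository.

```python
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size for non-empty data."""
    return len(data) / len(zlib.compress(data, 6))


def blockwise_ratio(data: bytes, block_size: int) -> float:
    """Compression ratio when every block is compressed independently."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    compressed_total = sum(len(zlib.compress(block, 6)) for block in blocks)
    return len(data) / compressed_total


# Synthetic log lines (an assumption; real logs will behave differently).
lines = [
    f"2021-01-01 00:00:{i % 60:02d} INFO worker-{i % 8} handled request id={i} status=200\n"
    for i in range(20000)
]
log = "".join(lines).encode()

print(f"whole file: {compression_ratio(log):.2f}")
for size in (8 * 1024, 16 * 1024, 64 * 1024, 128 * 1024):
    print(f"{size // 1024:>4} KB blocks: {blockwise_ratio(log, size):.2f}")
```

Each block is compressed without the context of the other blocks, so small blocks generally yield a lower ratio than the whole file. That gap is what preprocessing for small blocks tries to narrow.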

Evaluation

We include sample logs in this repo for evaluation purposes. To access the full dataset, please contact Loghub.

The repository contains a framework for evaluating different log preprocessing approaches. We consider the following approaches:

LogBlock - Reduce repetitiveness through preprocessing heuristics.

LogZip - Extract repetitive templates and variables through iterative clustering. Please check the full paper for more details: Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression.

Cowic - Compress log entries with a pre-trained compression model. Please check the full paper for more details: Cowic: A Column-Wise Independent Compression for Log Stream Analysis.

LogArchive (not included in the comparison) - Cluster log messages by text similarity, then compress them. For more details, please check: Adaptive log compression for massive log data.

We do not include the source code of Logzip, Cowic and LogArchive due to copyright reasons. To evaluate these approaches with our framework, clone and compile these tools first. Start from here to evaluate the compression performance of each approach on small logs.
During execution, the randomly truncated log blocks are saved under the 'temp' folder, the preprocessed data is recorded under the 'data' folder, and the compression performance results are saved under the 'result' folder. All of these folders are created at runtime.
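
The folder layout described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's evaluation script: only the folder names 'temp', 'data' and 'result' come from this README. The function names, file names, CSV columns and the use of zlib are hypothetical, and placeholder_preprocess is a stand-in that returns its input unchanged.

```python
import csv
import random
import tempfile
import zlib
from pathlib import Path


def random_block(log: bytes, block_size: int, rng: random.Random) -> bytes:
    """Cut a block of at most block_size bytes starting at a random offset."""
    if len(log) <= block_size:
        return log
    start = rng.randrange(0, len(log) - block_size + 1)
    return log[start:start + block_size]


def placeholder_preprocess(block: bytes) -> bytes:
    """Hypothetical stand-in for a real preprocessor; returns the block as-is."""
    return block


def evaluate(log: bytes, block_size: int, runs: int, root: Path, seed: int = 0) -> Path:
    """Sample random blocks, preprocess and compress them, and write a CSV report."""
    rng = random.Random(seed)
    temp, data, result = (root / name for name in ("temp", "data", "result"))
    for folder in (temp, data, result):
        folder.mkdir(parents=True, exist_ok=True)  # folders are created at runtime

    report = result / f"ratio_{block_size}.csv"
    with report.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["run", "raw_bytes", "compressed_bytes", "ratio"])
        for run in range(runs):
            block = random_block(log, block_size, rng)
            (temp / f"block_{run}.log").write_bytes(block)      # truncated block
            processed = placeholder_preprocess(block)
            (data / f"block_{run}.bin").write_bytes(processed)  # preprocessed data
            compressed = zlib.compress(processed, 6)
            ratio = round(len(block) / len(compressed), 3)
            writer.writerow([run, len(block), len(compressed), ratio])
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        sample = b"2021-01-01 INFO service started on port 8080\n" * 5000
        print(evaluate(sample, 16 * 1024, 5, Path(root)).read_text())
```

Swapping placeholder_preprocess for a call into a real preprocessor would let the same loop compare approaches at each block size.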

SAILResearch/suppmaterial-21-kundi-logblock

Folders and files

Latest commit

History

Repository files navigation

LogBlock

About LogBlock

Block sizes used by log management tools:

Evaluation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages