Updated: 2019-09-23
qdda - the quick & dirty dedupe analyzer
qdda <options> [FILE...]
qdda checks files, data streams or block devices for duplicate blocks to estimate deduplication efficiency on dedupe capable storage systems, using key-value stores in SQLite, MD5 hashing and LZ4 or DEFLATE compression. It also estimates compression ratios for all-flash arrays XtremIO X1 and X2 as well as VMAX All-flash / PowerMAX.
qdda can create very large database files and generate lots of read I/O and heavy CPU load. Check the RESOURCE REQUIREMENTS section before you start.
The SQLite database file(s) (qdda.db) may be removed at any time using 'qdda --delete' or simply deleting the qdda.db file.
For additional safety, run qdda as non-root user. See the SECURITY AND SAFETY section for details on how to do this.
- -V, --version
- show version and copyright info
- -h, --help
- show usage
- -m, --man
- show detailed manpage
- -d, --db <file>
- database file path (default $HOME/qdda.db)
- -a, --append
- Append data instead of deleting database
- --delete
- Delete database
- -q, --quiet
- Don't show progress indicator or intermediate results
- -b, --bandwidth <mb/s>
- Throttle bandwidth in MB/s (default 200, 0=disable)
- --array <list|array>
- show/set arraytype or custom (see man page section STORAGE ARRAYS)
- --compress <method>
- set compression method <none|lz4|deflate>[:interval]
- -x, --detail
- Detailed report (file info and dedupe/compression histograms)
- -n, --dryrun
- skip staging db updates during scan
- --purge
- Reclaim unused space in database (sqlite vacuum)
- --import <file>
- import another database (must have compatible metadata)
- --cputest
- Single thread CPU performance test
- --nomerge
- Skip staging data merge and reporting, keep staging database
- --debug
- Enable debug output
- --queries
- Show SQLite queries and results
- --tmpdir <dir>
- Set $SQLITE_TMPDIR for temporary files
- --workers <wthreads>
- number of worker threads
- --readers <rthreads>
- (max) number of reader threads
- --findhash <hash>
- find blocks with hash=<hash> in staging db
- --tophash <num>
- show top <num> hashes by refcount
- --squash
- set all refcounts to 1
- --mandump
- dump raw manpage to stdout
- --bashdump
- dump bash_completion script to stdout
- --demo
- show quick demo
FILE description:
FILE is usually a disk device such as /dev/sda but can also be a flat file, a named pipe, a disk partition or anything else that can be read as a stream. qdda also reads from stdin if stdin is not connected to a tty, i.e. you can do 'cat <file> | qdda'.
Modifiers
Each file can have a modifier, appended with a colon (:). Currently the modifier has the format <maxmb[,dup]>, where maxmb is the maximum number of mebibytes to read from the stream: /dev/sda:1024 will read the first 1024 MiB and then stop. dup is the number of times the data is processed (for testing purposes only), i.e. /dev/sda:1024,2 will generate 2048 MiB with dupcount=2 (the first 1024 MiB of sda, processed twice).
Special filenames
Special filenames are zero (alias for /dev/zero), random (alias for /dev/urandom) and compress (same as random but make the data compressible). This allows you to generate test data.
Test data example
qdda zero:512 random:512 random:256,2 compress:128,4
Generates a test dataset with 512MiB of zeroed blocks, 512MiB of uncompressible unique data, 256MiB of uncompressible data processed twice (512MiB with dupcount 2), and 128MiB of compressible data with 4 copies each (512MiB with dupcount 4).
Currently qdda supports the following storage array types:
- XtremIO X1 (--array=x1)
- The first generation XtremIO with 8KB block size and compression bucket sizes 2K, 4K, 8K. As XtremIO performs all compression and dedupe operations inline, the results of qdda for dedupe should match the array dedupe very closely. XtremIO uses a proprietary compression algorithm which has a slightly lower compression ratio compared to LZ4, but claims to be much faster. This means the qdda results are slightly over-optimistic. The differences are too small however to be a major issue.
- XtremIO X2 (--array=x2) (default)
- With the X2, the internal blocksize was increased to 16KiB and many more (15) compression bucket sizes are available: 1K up to 16K in 1K increments, where 14K is missing because in the XtremIO X2 architecture it would allocate the same capacity as 15K uncompressed. The larger block size and the greater variety of buckets make XtremIO compression much more effective. There is still a slight difference in compression ratio due to LZ4 versus the native XtremIO algorithm.
- VMAX (--array=vmax)
- VMAX All-Flash compresses/dedupes data using 128K chunks which get compressed into bucket sizes from 8K up to 128K in 8K increments. Compression is initially performed by splitting the 128K into four 32K chunks, but if the data stays cold for a while it can get re-compressed as a full 128K block. Not all bucket sizes are available at initial configuration, and VMAX changes the compression layout dynamically. The compression algorithm in VMAX is LZS, which is similar to LZ4, so qdda uses LZ4 to estimate VMAX compression. VMAX can also delay or skip compression entirely for up to 20% of all data to avoid overhead for hot data blocks. As the data reduction rate is not immediately known or deterministic, qdda assumes the final-state scenario where 128K blocks get fully compressed and deduped again, so the qdda result reflects the optimal end result for idle data.
- PowerMAX (--array=pmax)
- PowerMAX uses DEFLATE (zlib) compression, on 128K blocks split into buckets of 8K .. 128K like VMAX. DEFLATE achieves a higher compression ratio but at a higher CPU overhead.
- custom (--array=<custom definition>)
- Specify a string in the form --array=custom:<blocksize>:<size1,size2,...>
example: qdda --array=custom:64:8,16,32,48,64
for a custom array with a 64K blocksize and buckets of 8, 16, 32, 48 and 64K.
The compress and hash algorithms are slightly different from these actual arrays and the results are a (close) approximation of the real array data reduction. Currently qdda only uses LZ4 (default) or DEFLATE (ZLIB) compression.
Currently qdda supports LZ4 as well as ZLIB (DEFLATE) compression.
LZ4 is a very fast, lightweight compression algorithm with reasonable compression ratios. My i5-4440 @ 3.10GHz can compress roughly 500MB/s per core. DEFLATE offers higher compression ratios, but at the expense of much heavier CPU load: the same i5-4440 does roughly 55MB/s per core.
Both compression algorithms use their default compression level.
For this reason, when using DEFLATE, a default random sample interval of 20 is used, so that on average 1 out of every 20 blocks gets sampled. The final compression ratio is then calculated from the sampled values.
You can change the default algorithm and interval using the --compress option:
--compress <none|lz4|deflate>[:interval]
where interval is the average number of blocks per sampled block, i.e. an interval of 20 means on average one out of every 20 blocks gets sampled (the inverse of the sample rate).
When selecting 'none' no compression is done, only dedupe analysis.
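For example (device names are placeholders):
qdda --compress deflate:10 /dev/<disk>      # DEFLATE, sampling roughly 1 in 10 blocks
qdda --compress none /dev/<disk>            # dedupe analysis only, no compression estimate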
qdda has basic error handling. Most errors simply result in aborting with an error message and a non-zero return code.
Currently, aborting qdda with ctrl-c may corrupt the SQLite qdda database.
- qdda -d /tmp/demo compress:128,4 compress:256,2 compress:512 zero:512
- Analyze a compressible reference test data set with 128MiB x4, 256MiB x2, 512MiB x1 and 512MiB zeroed.
Example output
Database info (/tmp/demo.db):
database size       = 1.12 MiB
array id            = XtremIO X2
blocksize           = 16 KiB
compression         = lz4
sample percentage   = 100.00 %

Overview:
total               = 2048.00 MiB (    131072 blocks)
free (zero)         =  512.00 MiB (     32768 blocks)
used                = 1536.00 MiB (     98304 blocks)
dedupe savings      =  640.00 MiB (     40960 blocks)
deduped             =  896.00 MiB (     57344 blocks)
compressed          =  451.62 MiB (     46.08 %)
allocated           =  483.09 MiB (     30918 blocks)

Details:
used                = 1536.00 MiB (     98304 blocks)
unique data         =  512.00 MiB (     32768 blocks)
non-unique data     = 1024.00 MiB (     65536 blocks)
compressed raw      =  772.98 MiB (     49.67 %)
compressed net      =  451.62 MiB (     49.59 %)

Summary:
percentage used     =   75.00 %
percentage free     =   25.00 %
deduplication ratio =    1.71
compression ratio   =    1.85
thin ratio          =    1.33
combined            =    4.24
raw capacity        = 2048.00 MiB
net capacity        =  483.09 MiB
Explanation
- database size
- Size of the primary SQLite database on disk
- array ID
- Name of array for which dedupe and compress estimates are calculated. Can be a custom string.
- blocksize
- Blocksize on which hashes and compression sizes are calculated
- compression
- Compression algorithm used
- sample
- Percentage of all blocks that were sampled for compression ratio. Equals 1/interval.
- total
- Total scanned blocks
- free
- Free (zero) blocks
- used
- Used (non-zero) blocks
- dedupe savings
- Blocks saved by merging duplicate blocks
- deduped
- Blocks required after dedupe
- compressed
- Capacity after compressing (deduped) blocks, i.e. the sum of the compressed sizes of all blocks after dedupe
- allocated
- Capacity after allocating compressed blocks into buckets. This is the required capacity on an inline dedupe/compress capable storage array
- unique data
- Blocks that are unique (cannot be deduped)
- non-unique data
- Blocks that appear at least 2x (can be deduped)
- compressed raw
- Capacity required for compressing all raw data (before dedupe) i.e. sum of compressed size of all scanned blocks
- compressed net
- Capacity required for compressing all deduped data (after dedupe) i.e. sum of compressed size of all deduped blocks
- percentage used
- Percentage of all raw blocks that are non-zero
- percentage free
- Percentage of all raw blocks that are zero
- deduplication ratio
- capacity used divided by deduped
- compression ratio
- capacity deduped divided by allocated
- thin ratio
- capacity total divided by used
- combined
- Overall data reduction (dedupe ratio * compress ratio * thin ratio); see the example calculation after this list
- raw capacity
- equal to total
- net capacity
- equal to allocated
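As a cross-check, the summary ratios in the example report can be recomputed from the reported capacities (illustrative awk arithmetic, not qdda output):
awk 'BEGIN { used=1536; deduped=896; allocated=483.09; total=2048
  dedupe=used/deduped; compress=deduped/allocated; thin=total/used
  printf "dedupe=%.2f compress=%.2f thin=%.2f combined=%.2f\n", dedupe, compress, thin, dedupe*compress*thin }'
This prints dedupe=1.71 compress=1.85 thin=1.33 combined=4.24, matching the report.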
- qdda --detail
- Show detailed histograms from the database
Example output
File list:
file   blksz   blocks   MiB   date            url
1      16384   8192     128   20190204_0944   workstation:/dev/urandom
2      16384   16384    256   20190204_0944   workstation:/dev/urandom
3      16384   32768    512   20190204_0944   workstation:/dev/zero
4      16384   32768    512   20190204_0944   workstation:/dev/urandom

Dedupe histogram:
dup     blocks    perc     MiB
0       32768     25.00    512.00
1       32768     25.00    512.00
2       32768     25.00    512.00
4       32768     25.00    512.00
Total:  131072    100.00   2048.00

Compression Histogram (2):
size    buckets   RawMiB   perc     blocks   MiB
1       3350      52.34    5.84     210      3.28
2       3642      56.91    6.35     456      7.12
3       3568      55.75    6.22     669      10.45
4       3648      57.00    6.36     912      14.25
5       3607      56.36    6.29     1128     17.62
6       3510      54.84    6.12     1317     20.58
7       3603      56.30    6.28     1577     24.64
8       3415      53.36    5.96     1708     26.69
9       3516      54.94    6.13     1978     30.91
10      3532      55.19    6.16     2208     34.50
11      3572      55.81    6.23     2456     38.38
12      3539      55.30    6.17     2655     41.48
13      3682      57.53    6.42     2992     46.75
15      7322      114.41   12.77    6865     107.27
16      3838      59.97    6.69     3838     59.97
Total:  57344     896.00   100.00   30969    483.89
Explanation
File list shows info on the files that were scanned.
The Dedupe histogram shows the distribution of duplicate counts of the scanned blocks. The first row (dup 0) is a special case and shows how many blocks were blank (zeroed). Every other row shows the dupcount (how many copies of each block were found), the number of blocks, the percentage of all scanned blocks, and the corresponding mebibytes. For example, the row with dupcount 4 holds 32768 blocks, meaning qdda found 8192 distinct blocks that each occur 4 times (8192 sets of 4 identical blocks). The row with dup=1 contains the blocks that are unique in the dataset. A very high dupcount is usually the result of special blocks such as blocks filled with ones (0xFFFFFFFF...) or other common data structures such as metadata or padding blocks. In our reference test set the dupcounts are distributed evenly.
Compression histogram
qdda calculates the compressed size for each (deduped) block and rounds it up to the next 1KiB multiple. It then sorts these amounts into the bucket sizes defined for the array. For example, XtremIO X1 has bucket sizes 2K, 4K and 8K: a block with a compressed size between 1 and 2048 bytes goes into the 2K bucket, sizes between 2049 and 4096 go into the 4K bucket, and everything else into 8K.
The compression histogram shows the distribution of bucket sizes. In this case (XtremIO X2) it shows that 3350 blocks were compressed into 1K buckets. The array has a blocksize of 16K, so storing 3350 1K buckets requires 210 16K blocks (3350 * 1 / 16, rounded up).
3415 blocks were compressed into the 8K bucket, which requires 1708 allocated blocks (3415 * 8 / 16, rounded up).
3838 blocks could not be compressed to less than 16K, so these are stored 1:1.
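The allocated block counts are consistent with rounding up to whole 16K blocks. For instance, recomputing the 1K and 8K rows from the histogram above (illustrative awk arithmetic, not qdda output):
awk 'BEGIN { bs=16
  n=3350; sz=1; printf "%d\n", int((n*sz + bs-1)/bs)   # -> 210
  n=3415; sz=8; printf "%d\n", int((n*sz + bs-1)/bs)   # -> 1708
}'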
qdda --tophash 5
Shows the 5 most common hash values in the database. Note that these are the 60-bit truncated MD5 hashes of each block.
Example output
(from a scan of a Linux bootdisk i.e. /dev/sda)
hash                 blocks
452707908990536042   402
110182122676868616   146
356086100205023294   16
918237101795036785   9
941619753194353532   9
Explanation
We see that 452707908990536042 is the most common hash in the database, with a dupcount of 402. To find out the contents of a block with this hash value, we can scan the data again but keep the staging database with the --nomerge option, as the staging database keeps the offsets of all block hashes (provided we scan only one file). We can then query the staging database for the offsets (here we look up the 2nd most common hash, 110182122676868616):
sqlite3 qdda-staging.db "select * from offsets where hash=110182122676868616 limit 2"
hash                hexhash             offset      bytes
------------------  ------------------  ----------  ----------
110182122676868616  0x0187720e8ac0d608  181         2965504
110182122676868616  0x0187720e8ac0d608  182         2981888
We see that the hash appears at block offsets 181 and 182 (and 144 more, but we limit the query to the first 2). We can hexdump the contents of this particular block to see what's in there:
dd bs=16K status=none count=1 if=/dev/sda skip=181 | hexdump -Cv|head
00000000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000010 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000020 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
00000030 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
etc...
We can verify the MD5 hash:
dd bs=16K status=none count=1 if=/dev/sda skip=181 | md5sum
92ab673d915a94dcf187720e8ac0d608  -
                 |-------------|
Note that the last 15 hex digits (equal to 60 bits) match the hexadecimal hash value in the database.
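To reproduce the stored hash value from an MD5 sum, take the last 15 hex digits and print them as a decimal integer (a bash sketch using the example block above):
md5=$(dd bs=16K status=none count=1 if=/dev/sda skip=181 | md5sum | cut -d' ' -f1)
printf '%d\n' 0x${md5: -15}                 # prints 110182122676868616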
By default, when scanning data, qdda deletes the existing database and creates a new one. Using the --append option you can keep existing data and add more file(s) to the existing database:
qdda /dev/<disk1>
qdda --append /dev/<disk2>
It is also possible to join 2 databases together using --import:
qdda --delete
qdda --db db1 random:512
qdda --db db2 random:512
qdda --import db1.db
qdda --import db2.db
The newly created database qdda.db will contain data from both db1 and db2.
The combined databases can be gathered from different servers (by copying the qdda.db files to one central location) so this allows one to create a data reduction analysis across multiple hosts.
Storage capacity
During scans, data is stored in a SQLite staging database table with 2 columns (hash and bytes).
The hash is usually a large integer requiring 8 bytes, bytes is another int which is usually 2 bytes (sometimes 3 when
using block sizes larger than 64K). The database also stores the rowid, which is up to 4 bytes, and a b-tree internal index which usually
gets about 7 bytes per row.
So the amount of bytes per row equals b-tree (7) + rowid (4) + hash (8) + val (2) = 21
Scanning a terabyte disk at 16K blocksize requires 67,108,864 rows. The database capacity required for
the staging table is then 67108864 * 21 = 1,409,286,144 bytes = 1344 MiB (not including a little bit extra capacity for SQLite internals).
So at 16K blocksize the database capacity for scanning is roughly 0.11% of the data size.
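This estimate is easy to reproduce (illustrative awk arithmetic; 1 TiB at 16K blocksize, 21 bytes per row as described above):
awk 'BEGIN { rows = 2^40 / (16*1024)               # 67108864 blocks in 1 TiB
  printf "%d rows, %.0f MiB staging\n", rows, rows * 21 / 2^20 }'
This prints 67108864 rows, 1344 MiB staging.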
After scanning, the primary database is updated from the staging database (the merge process). During the merge the required capacity is double that size, or 0.22% for both databases; however, SQLite also creates hidden temporary tables which require another 0.22%.
Sizing summary for a 1TiB random dataset:
Primary database - 1400 MiB (will be smaller if the data has zero blocks or can be deduped)
Staging database - 1400 MiB (deleted after merging)
Temp tables      - 2800 MiB (hidden, deleted after merging)
Total            - 5600 MiB (file system free space required, or 0.56%)
A (very) safe assumption for reserved space for qdda is 1% of the data size at a 16KiB blocksize.
After merging the data, the staging database is deleted and the database size is about 0.12% of the original data size (at 16K blocksize).
Default SQLite storage locations:
Primary database: $HOME/qdda.db
Staging database: $HOME/qdda-staging.db
Temp storage: /var/tmp
Note that you may change the locations of primary and staging database (--db <file> option) and for the hidden temp tables (--tmpdir <path> option or by setting SQLITE_TMPDIR).
I/O requirements
By default qdda scans data throttled at 200MB/s using large blocks, reading all files concurrently. If throttling is disabled, qdda scans as fast as possible until CPU power for processing the data or writing to the database becomes the bottleneck.
CPU
qdda starts a separate reader thread for each given file or stream (max 32), and a number of worker threads equal to the number of CPU cores, unless the number of workers and readers is changed via command line options.
If the number of readers is less than the number of files, each reader processes one file at a time, so some files wait in the queue until another file is completed. If the number of workers is set lower than the number of CPU cores, hashing and compression are limited to those threads.
Memory
qdda allocates a number of read buffers of 1 MiB each. The number of buffers is set to #workers + #readers + 32, so on a system with 8 cores reading 2 files, the number of buffers is 2 + 8 + 32 = 42, i.e. 42 MiB of buffer memory.
qdda also requires additional memory for SQLite, etc. but the total required memory usually fits in less than 100MiB.
How qdda works:
Each stream (device, file or pipe) is scanned; each block is hashed and compressed.
The results (hash,compressed_bytes) go into a staging table. At the end of processing, the
staging data is merged into the main kv table (which actually holds 3 columns: hash - blocks - compressed_bytes).
The report is then generated by querying the kv table.
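Conceptually, the merge is a single 'insert or replace' statement joining the staging rows into the key-value table. The sketch below is illustrative only - the table and column names are simplified and the real qdda schema and query differ in detail:
sqlite3 qdda.db "attach database 'qdda-staging.db' as stage;
  insert or replace into kv (hash, blocks, bytes)
    select s.hash, coalesce(k.blocks, 0) + count(*), s.bytes
    from stage.staging s left join kv k on k.hash = s.hash
    group by s.hash;"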
Hashing:
The hash value is a 60-bit truncated MD5 sum (a tradeoff between database limits, efficiency and a low chance of hash collisions when scanning very large data sets). Although storage arrays typically use the SHA algorithm with a higher number of bits, MD5 has better performance, and a very small number of collisions will not impact the results or cause data corruption. See also the ACCURACY section.
Compression:
Some All-Flash arrays use "bucket" compression to achieve high throughput, low overhead and good compression. qdda simulates this using LZ4 compression. LZ4 has very high throughput and its compression ratios are very close to what All-Flash Arrays achieve. For VMAX/PowerMAX, DEFLATE (zlib) is used, which is much slower but achieves a higher compression ratio (everything is a tradeoff).
Bucket Compression:
If an array stored compressed blocks by simply concatenating them (as file compression tools like ZIP or GZIP do), random access would be very poor because the overhead of finding block offsets would be very high. Modifying a compressed block would also cause severe fragmentation and other issues. For this reason, AFAs like XtremIO use "bucket compression". For example, XtremIO has bucket sizes of 1K to 16K in 1K steps. Say an incoming 16K block compresses to 4444 bytes. The smallest bucket this fits into is the 5K bucket, which means the remaining 676 bytes in the bucket are not used. This causes a slightly lower compression ratio (16384:5120 vs 16384:4444) but vastly improves performance and reduces fragmentation and partial-write issues.
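For example, mapping a compressed size to the smallest fitting X2-style bucket (illustrative awk arithmetic; the 14K bucket is skipped as described in the STORAGE ARRAYS section):
awk 'BEGIN { size=4444
  b = int((size + 1023) / 1024); if (b == 14) b = 15
  printf "%d bytes -> %dK bucket, %d bytes unused\n", size, b, b*1024 - size }'
This prints 4444 bytes -> 5K bucket, 676 bytes unused.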
Throttling:
qdda processes a number of blocks per read I/O and measures the service time. If the service time is too low, the throughput is higher than the bandwidth limit, and the reader is put to sleep for a number of microseconds to match the overall bandwidth limit. This prevents accidentally starving I/O on a production host. Disable throttling with '--bandwidth 0' or set a different bandwidth.
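For example (device names are placeholders):
qdda --bandwidth 0 /dev/<disk>      # disable throttling, scan at full speed
qdda -b 500 /dev/<disk>             # limit scanning to roughly 500 MB/s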
Blocksize:
The default blocksize is 16KiB (XtremIO X2). The block size is stored in metadata and only datasets with matching blocksizes can be merged or combined. The maximum blocksize is currently 128K, the minimum is 1K.
Notes on hash algorithm
qdda uses an integer field to store the hash value in SQLite.
An SQLite integer (used for the hash in the primary key-value table) is at most 8 bytes and is a
signed integer with a maximum value of 9,223,372,036,854,775,807 (2^63 - 1).
The hashing algorithm of qdda is MD5, which is 128 bits - this does not fit in an SQLite
integer and would have to be stored as another datatype (TEXT or BLOB), resulting in poor performance.
Therefore qdda only uses the 7.5 least significant bytes (60 bits) of the MD5 hash.
The number of rows with a 50% chance of a hash collision is roughly
rows = 0.5 + sqrt(0.25 + 2 * ln (2) * 2^bits)
See https://en.wikipedia.org/wiki/Birthday_problem#Probability_table
for more info on hash collisions.
A 60-bit hash is a tradeoff between database space consumed, performance and accuracy: it has a 50% chance of a single collision at roughly 1.2 billion rows (at 16K blocksize this equals about 19 TiB), which is fine for datasets up to many terabytes. A 64-bit hash would get roughly 1 collision every 77TB at 16K. A collision would be a serious problem for a deduplicating storage array, but for an analysis tool a few collisions are not a serious problem, so we can get away with truncated hashes.
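Plugging 60 bits into the formula above confirms these numbers (illustrative awk arithmetic; awk's log() is the natural logarithm):
awk 'BEGIN { bits=60; rows = 0.5 + sqrt(0.25 + 2 * log(2) * 2^bits)
  printf "%.3g rows, %.1f TiB at 16K blocksize\n", rows, rows * 16 * 1024 / 2^40 }'
This prints roughly 1.26e+09 rows and 18.8 TiB.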
Notes on compression algorithm
qdda uses LZ4 with default parameters for compression. Some storage arrays (including XtremIO) use a proprietary compression algorithm
usually for performance reasons or to achieve higher compression ratios. Also some arrays (such as VMAX) don't compress ALL
data but keep frequently accessed data in uncompressed storage pools.
Some arrays do post-processing which also results in not all data being compressed or deduped all the time. qdda currently ignores these effects and produces results for all data as if it was compressed and deduped immediately (inline).
qdda is multi-threaded during disk scans, so the read processes can go as fast as possible while the worker threads handle the compression and hashing calculations. A dedicated updater thread updates the SQLite staging database.
Giving accurate numbers for performance is almost impossible due to differences in IO speed, CPU power and other factors. You can get a rough idea of your system's capabilities by running the --cputest option which gives the estimates for a single thread:
*** Synthetic performance test, 1 thread ***
Initializing:     65536 blocks, 16k (1024 MiB)
Hashing:          1842670 usec, 582.71 MB/s, 35565.78 rows/s
Compress DEFLATE: 32676647 usec, 32.86 MB/s, 2005.59 rows/s
Compress LZ4:     2503945 usec, 428.82 MB/s, 26173.10 rows/s
DB insert:        51219 usec, 20963.74 MB/s, 1279525.12 rows/s
The overview shows how fast a single core can hash, compress and update a dataset of the given size (this is on an Intel Core i5-4440 CPU @ 3.10GHz). The reference dataset is a random(ish) block of data and the numbers are an indication only. Note that the compress rate is inaccurate but repeatable. A real dataset is usually less random and may show higher or lower speeds.
By default, a data scan allocates 1 reader thread per file, 1 thread for database updates, and a number of worker threads equal to the number of CPU cores. Experience shows that the bottleneck is usually read I/O bandwidth until the database updater is maxed out (on a fast reference system this happened at about 7000MB/s). Future versions may use multiple updater threads to avoid this bottleneck.
After data scan the staging data has to be merged with the primary database. This is done by joining existing data with staging data and running an 'insert or replace' job in SQLite. Testing the speed can be done with the --dbtest option. Output of a merge of 1TB data @16K on i5-4440 CPU @ 3.10GHz:
Merging 67108864 blocks (1048576 MiB) with 0 blocks (0 MiB) in 157.28 sec (426686 blocks/s, 6666 MiB/s)
Tuning - You may speed up I/O by moving the database location from the default $HOME/qdda.db to another path with the '-d' option,
e.g. on a faster file system (such as SSD based). You can also set the SQLite TEMP dir to an alternative location with '--tmpdir <dir>'
or by setting SQLITE_TMPDIR (this also helps if you run out of disk space).
You can avoid the merge (join) phase and delay it to a later moment using the "--nomerge" (no report) option. Ideal if you scan on a slow server with limited space and you want to do the heavy lifting on a faster host later.
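For example (paths and device names are placeholders):
qdda -d /<fast-fs>/qdda --tmpdir /<fast-fs>/tmp /dev/<disk>    # databases and temp tables on a faster file system
qdda --nomerge -d /<fast-fs>/qdda /dev/<disk>                  # scan only; defer merge/report and keep the staging database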
None; everything is contained in the SQLite database and command line options.
- SQLITE_TMPDIR
- If set, used for temporary tables such as those used for sorting and joining
- TMPDIR
- If SQLITE_TMPDIR is not set, TMPDIR is used for temporary tables
qdda is safe to run even on files/devices that are in use. It opens streams read-only and, by design, cannot modify any files except SQLite3 database files. It writes to a database file that must be either newly created or a pre-existing SQLite3 database. It will only remove a database file if it is an SQLite3 file.
For added safety you may run qdda as a non-privileged user. However, non-root users usually do not have read access to block devices, so you need to provide read access to the disk devices you want to scan. There are various methods to do this.
Changing the group/permissions using chmod on /dev/<disk> is problematic as it either gives all users read access
or alters permissions which may break other applications such as Oracle ASM.
The best solution to this issue I have found is to use extended ACLs on Linux:
setfacl -m u:<user>:r /dev/<disk>
This gives <user> read-only access without altering any of the existing ownerships/permissions. The permissions will typically be reset at next reboot or through udev(7). You need to have ACL enabled on the file system containing /dev/ and the setfacl tool installed.
qdda currently only runs on 64-bit Linux. Disks from other platforms can be processed by using qdda over a named or unnamed pipe.
You can do this using netcat (nc)
target host: (as qdda)
nc -l 19000 | qdda
source host: (as root)
cat /dev/<disk> | nc targethost 19000
On ESXi, this worked for me. Make sure you pick the raw disk and not a partition (i.e. not ending in :1 or similar).
You also need an open outgoing port; port 902 is usually open as it is reserved for vCenter.
ESX host:
cat /vmfs/devices/disks/t10.ATA_____Samsung_SSD_840_PRO_Series______________S1ATNSADB36601D_____ | nc db01 902
Linux host (make sure to have netcat installed):
nc -l 902 | qdda
Database journaling and synchronous mode are disabled for performance reasons. This means the internal database may be corrupted if qdda ends
in an abnormal way (killed, file system full, etc.).
Accessing the SQLite database directly requires recent versions of the sqlite3 tools. Older versions are not compatible with the database
schema and abort with an error upon opening.
Scanning disk partitions (/dev/sdb1, /dev/sdd4 etc) or otherwise unaligned partitions may produce poor dedupe results. This is
"as designed" - we assume you know what you are doing.
Dumping multiple devices to a single pipe (i.e. cat /dev/sda /dev/sdb | qdda) may result in wrong alignment as well.
lz4(1), zlib(3), md5(1), sqlite3(1), mkfifo(1), nc(1), udev(7), setfacl(1)
Written by Bart Sjerps http://bartsjerps.wordpress.com
If you have suggestions for improvements in this tool, please send them along via the above address.
The source code and downloadable binaries are available from https://github.com/bsjerps/qdda
Copyright © 2018 Bart Sjerps, License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.
This software is provided "as is" and follows the licensing and warranty guidelines of the GPL. In normal language that means I will not be held responsible for any problems you may encounter with this software.
- NAME
- SYNOPSIS
- DESCRIPTION
- IMPORTANT NOTES
- OPTIONS
- STORAGE ARRAYS
- COMPRESSION
- ERRORS
- EXAMPLE
- COMBINING MULTIPLE SCANS
- RESOURCE REQUIREMENTS
- EXPLANATION
- ACCURACY
- PERFORMANCE
- CONFIG FILES
- ENVIRONMENT VARIABLES
- SECURITY AND SAFETY
- OTHER PLATFORMS
- KNOWN ISSUES
- SEE ALSO
- AUTHOR
- COPYRIGHT
- DISCLAIMER