zstd (1) — Linux Man Pages
zstd: zstd, zstdmt, unzstd, zstdcat — Compress or decompress .zst files
Command to display zstd manual in Linux: $ man 1 zstd
SYNOPSIS
zstdmt is equivalent to zstd -T0
unzstd is equivalent to zstd -d
zstdcat is equivalent to zstd -dcf
DESCRIPTION
zstd command-line syntax is generally similar to gzip, but features the following differences: • Source files are preserved by default. It is possible to remove them automatically by using the --rm option. • When compressing a single file, zstd displays progress notifications and a result summary by default. Use -q to turn them off. • zstd does not accept input from the console, but it does accept stdin when it is not a terminal. • zstd displays a short help page when the command line is erroneous. Use -q to turn it off.
zstd compresses or decompresses each file according to the selected operation mode. If no files are given or file is -, zstd reads from standard input and writes the processed data to standard output. zstd will refuse to write compressed data to standard output if it is a terminal: it will display an error message and skip the file. Similarly, zstd will refuse to read compressed data from standard input if it is a terminal.
Unless --stdout or -o is specified, files are written to a new file whose name is derived from the source file name: • When compressing, the suffix .zst is appended to the source filename to get the target filename. • When decompressing, the .zst suffix is removed from the source filename to get the target filename.
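As a quick illustration of these defaults (the file names below are placeholders, not taken from the man page):

```sh
# Compress file.txt into file.txt.zst; the source file is kept by default
zstd file.txt

# Compress and remove the source in one step
zstd --rm file.txt

# Decompress file.txt.zst back to file.txt
unzstd file.txt.zst        # equivalent to: zstd -d file.txt.zst

# stdin is accepted when it is not a terminal; -o names the output file
cat access.log | zstd -o access.log.zst
```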
Linux kernel zstd compression
Zstandard — Fast real-time compression algorithm
Zstandard, or zstd for short, is a fast lossless compression algorithm, targeting real-time compression scenarios at zlib-level and better compression ratios. It is backed by a very fast entropy stage, provided by the Huff0 and FSE library.
Zstandard’s format is stable and documented in RFC8878. Multiple independent implementations are already available. This repository represents the reference implementation, provided as an open-source dual BSD and GPLv2 licensed C library, and a command line utility producing and decoding .zst, .gz, .xz and .lz4 files. Should your project require another programming language, a list of known ports and bindings is provided on the Zstandard homepage.
For reference, several fast compression algorithms were tested and compared on a server running Arch Linux (Linux version 5.5.11-arch1-1), with a Core i9-9900K CPU @ 5.0GHz, using lzbench, an open-source in-memory benchmark by @inikep compiled with gcc 9.3.0, on the Silesia compression corpus.
Compressor name | Ratio | Compression | Decompress. |
---|---|---|---|
zstd 1.4.5 -1 | 2.884 | 500 MB/s | 1660 MB/s |
zlib 1.2.11 -1 | 2.743 | 90 MB/s | 400 MB/s |
brotli 1.0.7 -0 | 2.703 | 400 MB/s | 450 MB/s |
zstd 1.4.5 --fast=1 | 2.434 | 570 MB/s | 2200 MB/s |
zstd 1.4.5 --fast=3 | 2.312 | 640 MB/s | 2300 MB/s |
quicklz 1.5.0 -1 | 2.238 | 560 MB/s | 710 MB/s |
zstd 1.4.5 --fast=5 | 2.178 | 700 MB/s | 2420 MB/s |
lzo1x 2.10 -1 | 2.106 | 690 MB/s | 820 MB/s |
lz4 1.9.2 | 2.101 | 740 MB/s | 4530 MB/s |
zstd 1.4.5 --fast=7 | 2.096 | 750 MB/s | 2480 MB/s |
lzf 3.6 -1 | 2.077 | 410 MB/s | 860 MB/s |
snappy 1.1.8 | 2.073 | 560 MB/s | 1790 MB/s |
The negative compression levels, specified with --fast=#, offer faster compression and decompression speed in exchange for some loss in compression ratio compared to level 1, as seen in the table above.
Zstd can also offer stronger compression ratios at the cost of compression speed. The speed vs. compression trade-off is configurable in small increments. Decompression speed is preserved and remains roughly the same at all settings, a property shared by most LZ compression algorithms, such as zlib or lzma.
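For instance, the trade-off is selected entirely on the command line; the file and output names below are only examples:

```sh
zstd -1 data.tar -o data.tar.l1.zst         # fastest regular level
zstd -19 data.tar -o data.tar.l19.zst       # much stronger compression, much slower
zstd --fast=5 data.tar -o data.tar.f5.zst   # negative level: even faster, weaker ratio
zstd -19 -T0 data.tar -o data.tar.mt.zst    # same ratio as -19, using all available cores
```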
The following tests were run on a server running Linux Debian (Linux version 4.14.0-3-amd64) with a Core i7-6700K CPU @ 4.0GHz, using lzbench, an open-source in-memory benchmark by @inikep compiled with gcc 7.3.0, on the Silesia compression corpus.
(Charts not reproduced here: compression speed vs. ratio, and decompression speed.)
A few other algorithms can produce higher compression ratios at slower speeds, falling outside of the graphs. For a larger picture including slow modes, see the upstream Zstandard benchmark page.
The case for Small Data compression
Previous charts provide results applicable to typical file and stream scenarios (several MB). Small data comes with different perspectives.
The smaller the amount of data to compress, the more difficult it is to compress. This problem is common to all compression algorithms, and the reason is that compression algorithms learn from past data how to compress future data. But at the beginning of a new data set, there is no "past" to build upon.
To solve this situation, Zstd offers a training mode, which can be used to tune the algorithm for a selected type of data. Training Zstandard is achieved by providing it with a few samples (one file per sample). The result of this training is stored in a file called a "dictionary", which must be loaded before compression and decompression. Using this dictionary, the compression ratio achievable on small data improves dramatically.
The following example uses the github-users sample set, created from github public API. It consists of roughly 10K records weighing about 1KB each.
(Charts not reproduced here: compression ratio, compression speed and decompression speed on the small-data sample set.)
These compression gains are achieved while simultaneously providing faster compression and decompression speeds.
Training works if there is some correlation in a family of small data samples. The more data-specific a dictionary is, the more efficient it is (there is no universal dictionary). Hence, deploying one dictionary per type of data will provide the greatest benefits. Dictionary gains are mostly effective in the first few KB. Then, the compression algorithm will gradually use previously decoded content to better compress the rest of the file.
Dictionary compression How To:
Create the dictionary
zstd --train FullPathToTrainingSet/* -o dictionaryName
Compress with dictionary
zstd -D dictionaryName FILE
Decompress with dictionary
zstd -D dictionaryName --decompress FILE.zst
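Putting the three steps together, a small end-to-end session could look like this (the samples/ directory and file names are placeholders, not part of the upstream documentation):

```sh
# Train a dictionary from a directory of small, similar samples
zstd --train samples/*.json -o users.dict

# Compress one small record with and without the dictionary to compare sizes
zstd -D users.dict record.json -o record.dict.zst
zstd record.json -o record.nodict.zst

# Decompress again; the same dictionary must be supplied
zstd -D users.dict --decompress record.dict.zst -o record.out.json
```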
If your system is compatible with standard make (or gmake), invoking make in the root directory will generate the zstd CLI in the root directory.
Other available options include:
- make install : create and install zstd cli, library and man pages
- make check : create and run zstd, and test its behavior on the local platform
A cmake project generator is provided within build/cmake . It can generate Makefiles or other build scripts to create zstd binary, and libzstd dynamic and static libraries.
By default, CMAKE_BUILD_TYPE is set to Release .
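A minimal out-of-source invocation, assuming a reasonably recent CMake, might look like this (the build directory name is arbitrary):

```sh
cmake -S build/cmake -B build-cmake    # CMAKE_BUILD_TYPE defaults to Release
cmake --build build-cmake
sudo cmake --install build-cmake       # optional: install zstd and libzstd
```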
A Meson project is provided within build/meson . Follow build instructions in that directory.
You can also take a look at the .travis.yml file for an example of how Meson is used to build this project.
Note that default build type is release.
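A typical invocation, sketched under the assumption that meson and ninja are installed, would be:

```sh
meson setup build-meson build/meson    # default build type is release
ninja -C build-meson
```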
You can also build and install zstd using the vcpkg dependency manager:
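The usual vcpkg steps apply; this is a sketch assuming vcpkg is checked out into ./vcpkg:

```sh
git clone https://github.com/Microsoft/vcpkg.git
./vcpkg/bootstrap-vcpkg.sh
./vcpkg/vcpkg install zstd
```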
The zstd port in vcpkg is kept up to date by Microsoft team members and community contributors. If the version is out of date, please create an issue or pull request on the vcpkg repository.
Visual Studio (Windows)
Going into the build directory, you will find additional possibilities:
- Projects for Visual Studio 2005, 2008 and 2010.
- VS2010 project is compatible with VS2012, VS2013, VS2015 and VS2017.
- Automated build scripts for Visual compiler by @KrzysFR, in build/VS_scripts , which will build zstd cli and libzstd library without any need to open Visual Studio solution.
You can build the zstd binary via buck by executing: buck build programs:zstd from the root of the repo. The output binary will be in buck-out/gen/programs/ .
You can run quick local smoke tests by executing the playTest.sh script from the src/tests directory. Two environment variables, $ZSTD_BIN and $DATAGEN_BIN, are needed for the test script to locate the zstd and datagen binaries. For information on CI testing, please refer to TESTING.md.
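For example, a local run could be invoked like this (the relative paths to the zstd and datagen binaries are assumptions about where your build put them):

```sh
cd src/tests
ZSTD_BIN=../../zstd DATAGEN_BIN=./datagen ./playTest.sh
```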
Zstandard is currently deployed within Facebook. It is used continuously to compress large amounts of data in multiple formats and use cases. Zstandard is considered safe for production environments.
Zstandard is dual-licensed under BSD and GPLv2.
The dev branch is the one where all contributions are merged before reaching release. If you plan to propose a patch, please commit into the dev branch, or its own feature branch. Direct commits to release are not permitted. For more information, please read CONTRIBUTING.
Comparison of Compression Algorithms
GNU/Linux and *BSD have a wide range of compression algorithms available for file archiving purposes. There's gzip, bzip2, xz, lzip, lzma, lzop, and less-free tools like rar, zip and arc to choose from. Knowing which one to use can be confusing. Here's an attempt to give you an idea of how the various choices compare.
Introduction
Most file archiving and compression on GNU/Linux and BSD is done with the tar utility. Its name is short for tape archiver, which is why every tar command you will ever use has to include the f flag to tell it that you will be working on files rather than an ancient tape device. Creating a compressed file with tar is typically done by running tar with c (create), f and a compression-algorithm flag, followed by the files and/or directories. The standard compression flags are listed below (a short usage example follows the table):
Short Option | Long Option | Algorithm |
---|---|---|
z | --gzip | gzip |
j | --bzip2 | bzip2 |
J | --xz | xz |
Z | --compress | LZW (compress) |
| --lzip | lzip |
| --lzma | lzma |
| --zstd | zstd |
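For example, creating archives with the standard flags above (file and directory names are placeholders):

```sh
tar czf archive.tar.gz  folder/          # gzip
tar cjf archive.tar.bz2 folder/          # bzip2
tar cJf archive.tar.xz  folder/          # xz
tar --zstd -cf archive.tar.zst folder/   # zstd (long option only)
```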
These are not your only options, there’s more. tar accepts -I to invoke any third party compression utility.
Short Option | Algorithm |
---|---|
-Iplzip | Parallel lzip |
-Ipigz | Parallel gzip |
-Ipxz | Parallel XZ (LZMA) |
The above arguments will only work if you actually have plzip and pigz installed. Also note that you will have to put c or x before -I and -f after it when you use -I. Example: tar c -I"pigz -9" -f archive.tar.gz folder/
So which should you use? It depends on the level of compression you want and the speed you desire; you may have to pick just one of the two. Speed will depend widely on what binary you use for the compression algorithm you pick. As you will see below, there is a huge difference between using the standard bzip2 binary most (all?) distributions use by default and the parallel pbzip2, which can take advantage of multiple cores.
Compressing The Linux Kernel
The following results are what you can expect in terms of relative performance when using tar to compress the Linux kernel with tar c --algo -f linux-5.8.1.tar.algo linux-5.8.1/ (or tar cfX linux-5.8.1.tar.algo linux-5.8.1/, or tar c -I"programname -options" -f linux-5.8.1.tar.algo linux-5.8.1/).
Ruling out cache impact was done by running sync; echo 3 > /proc/sys/vm/drop_caches between runs.
The exact numbers will vary depending on your CPU, the number of cores and SSD/HDD speed, but the relative performance differences will be somewhat similar.
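A single timed run combines the cache-dropping step with one of the commands from the table below; the zstd -19 -T0 variant is used as the example here, and the cache drop needs root:

```sh
sync; echo 3 > /proc/sys/vm/drop_caches
time tar c -I"zstd -19 -T0" -f linux-5.8.1.tar.zst linux-5.8.1/
```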
Algorithm | Time | Size | Command | Parameters | Comment |
---|---|---|---|---|---|
none | 0m0.934s | 939M | tar | cf | tar itself is an archiving tool; you do not need to compress the archives. |
gzip | 0m23.502s | 177M | gzip | cfz | |
gzip | 0m3.132s | 177M | pigz | c -Ipigz -f | Parallel gzip using pigz 2.4. |
bzip2 | 1m0.798s | 134M | bzip2 | cfj | Standard bzip2 will only use one core (at 100%). |
bzip2 | 0m9.091s | 135M | pbzip2 | c -Ipbzip2 -f | Parallel bzip2. The pbzip2 process used about 900 MiB RAM at maximum. |
lz4 | 0m3.914s | 287M | lz4 | c -I"lz4" -f | Really fast but the resulting archive is barely compressed. Worst compression ratio of them all. |
lz4 | 0m56.506s | 207M | lz4 -12 | c -I"lz4 -12" -f | Supports levels up to -12. Uses 1 core, and there does not appear to be any multi-threaded variant. |
lzip | 4m42.017s | 116M | lzip | c --lzip -f | v1.21. Standard lzip will only use one core (at 100%). Very slow. |
lzip | 0m42.542s | 118M | plzip | c -Iplzip -f | plzip 1.8 (parallel lzip), default level (-6). |
lzip | 1m39.697s | 110M | plzip -9 | c -I"plzip -9" -f | Parallel lzip at best compression (-9). The plzip process used 5.1 GiB RAM at maximum. |
xz | 5m2.952s | 114M | xz | cfJ | Standard xz will only use one core (at 100%). Unbearably slow. |
xz | 0m53.569s | 115M | pxz | c -Ipxz -f | Parallel PXZ 4.999.9beta. The process used 1.4 GiB RAM at maximum. |
xz | 1m33.441s | 110M | pxz -9 | c -I"pxz -9" -f | Parallel PXZ 4.999.9beta using its best possible compression. The pxz process used 3.5 GiB at maximum. |
zstd | 0m3.034s | 167M | zstd | c --zstd -f | zstd uses 1 core by default. |
zstd | 1m18.238s | 117M | zstd -19 -T0 | c -I"zstd -19 -T0" -f | -19 gives the best possible compression and -T0 utilizes all cores. If a non-zero number is specified, zstd uses that many cores instead. |
Notable Takeaways
A few notable points should be apparent from the above numbers:
- All the standard binaries GNU/Linux distributions give you as a default for all the commonly used compression algorithms are extremely slow compared to the parallel implementations that are available but not defaults.
- This is true for bzip2: there is a huge difference between 10 seconds and one minute. And it is especially true for lzip and xz: the difference between one minute and five is significant.
- The difference between the pigz parallel implementation of gzip and regular gzip may appear to be small since both are very fast, but the difference between 3 and 23 seconds is huge in percentage terms.
- lzip and xz offer the best compression. They are also the slowest alternatives. This is especially true if you do not use the parallel implementations.
- Both plzip (5.1 GiB) and pxz (3.5 GiB at -9) use a lot of memory. Expect much worse performance on memory-constrained machines.
- The difference between bzip2 and pbzip2 is huge. It may not appear that way since bzip2 is so much faster than xz and lzip, but pbzip2 is actually about ten times faster than regular bzip2.
- pbzip2's default compression level is apparently its best (-9). A close-up inspection of the output files reveals that they are identical (130,260,727 bytes) with and without -9.
- zstd appears to be the clear winner, with leading compression speed, decompression speed, and an acceptable compression ratio.
Decompressing The Linux Kernel
Compression ratio is not the only concern one may want to consider: a well-compressed archive that takes forever to decompress will make end-users unhappy. Thus, it may be worthwhile to look at the respective decompression speeds.
Keep in mind that most people will not use any parallel implementation to decompress their archives; it is much more likely that they will use whatever defaults the distributions provide. And those defaults are the single-threaded implementations.
Algorithm | Time | Command | Parameters | Comments |
---|---|---|---|---|
none | 0m1.204s | tar | -xf | Raw tar with no compression. |
gzip | 0m4.232s | gzip | -xfz | |
gzip | 0m2.729s | pigz | -x -Ipigz -f | gzip is a clear winner if decompression speed is the only consideration. |
bzip2 | 0m20.181s | bzip2 | xfj | |
bzip2 | 0m19.533s | pbzip2 | -x -Ipbzip2 -f | The difference between bzip2 and pbzip2 when decompressing is barely measurable. |
lzip | 0m10.590s | lzip | -x --lzip -f | |
lz4 | 0m1.873s | lz4 | -x -Ilz4 -f | Fastest of them all, but not very impressive considering the compression it offers is almost nonexistent. |
lzip | 0m8.982s | plzip | -x -Iplzip -f | |
xz | 0m7.419s | xz | -xfJ | xz offers the best decompression speeds of all the well-compressed algorithms. |
xz | 0m7.462s | pxz | -x -Ipxz -f | |
zstd | 0m3.095s | zstd | -x --zstd -f | When compressed with no options (the default compression level is 3). |
zstd | 0m2.556s | zstd | -x --zstd -f | When compressed with tar c -I"zstd -19 -T0" (compression level 19). |
TIP: tar is typically able to figure out what kind of archive you are trying to extract. tar xf linux-5.8.1.tar.xz and tar xfJ linux-5.8.1.tar.xz will both work. You need to specify what kind of compression algorithm you want to use when you make an archive, but you can omit the algorithm-specific flags and let tar figure it out when you extract archives.
- xz is the fastest-decompressing well-compressed algorithm. gzip does offer much faster decompression, but the compression ratio gzip offers is far worse. bzip2 offers much faster compression than xz, but xz decompresses a lot faster than bzip2.
- zstd is also looking very good when the best compression level 19 and multiple cores are used. Decompression is very fast and it is faster, not slower, when higher compression is used.
zram block drive compression
The Linux kernel allows you to create a compressed block device in RAM using the zram module. It is typically used to create a compressed RAM-backed swap device, but it does not have to be used for that purpose; you can use it like you would use any other block device, such as an HDD or an NVMe drive. The Linux kernel supports several compression algorithms for zram devices: lzo, lzo-rle, lz4, lz4hc, 842 and zstd.
Benchmarking how these in-kernel compression algorithms perform on block devices in a repeatable way is a bit tricky. Here's what happens if you extract the Linux 5.9 rc4 kernel to an uncompressed tmpfs:
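A sketch of that kind of setup; the mount point and tmpfs size are assumptions:

```sh
# Create a tmpfs and extract the kernel source into it
mkdir -p /mnt/tmpfs
mount -t tmpfs -o size=4G tmpfs /mnt/tmpfs
tar xf linux-5.9-rc4.tar.xz -C /mnt/tmpfs
```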
and then create and mount a compressed zram file system using the various compression algorithms:
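Again a sketch rather than the exact original commands; the device name, size and mount point are assumptions, and zstd is used as the example algorithm:

```sh
modprobe zram
zramctl --find --size 2G --algorithm zstd   # prints the allocated device, e.g. /dev/zram0
mkfs.ext4 /dev/zram0
mkdir -p /mnt/zram
mount /dev/zram0 /mnt/zram
```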
We repeated the above steps for each of the available compression algorithms (lzo, lzo-rle, lz4, lz4hc, 842, zstd) and ran the same "benchmark":
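Judging by the "cp time" column below, the benchmark was a timed copy from the tmpfs into the zram-backed file system; the exact command is an assumption:

```sh
time cp -r /mnt/tmpfs/linux-5.9-rc4 /mnt/zram/
sync
```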
We then used zramctl to see the compressed and the total memory use by the zram device.
In case you want to try yourself, do this between each run:
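A sketch of the cleanup between runs, under the same assumptions about device and mount point:

```sh
umount /mnt/zram
zramctl --reset /dev/zram0
sync; echo 3 > /proc/sys/vm/drop_caches
```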
These are the results:
Algorithm | cp time | Data | Compressed | Total |
---|---|---|---|---|
lzo | 4.571s | 1.1G | 387.8M | 409.8M |
lzo-rle | 4.471s | 1.1G | 388M | 410M |
lz4 | 4.467s | 1.1G | 403.4M | 426.4M |
lz4hc | 14.584s | 1.1G | 362.8M | 383.2M |
842 | 22.574s | 1.1G | 538.6M | 570.5M |
zstd | 7.897s | 1.1G | 285.3M | 298.8M |
Time, in this case, is mostly irrelevant. There is a practical difference, and the numbers in the above table do vary. However, be aware that kernel write caching was not disabled. The numbers provide an indication, and they are what time returned; they just don't accurately reflect the total time before all data was actually written to the zram device.
It would seem that the zstd compression algorithm is vastly superior when it comes to compressing the Linux kernel in memory. It is also notably slower than lzo-rle; not that the times listed above are very accurate, they should merely be taken as an indication.
We are not entirely clear on what compression level the kernel uses for zstd by default. For comparison, compressing the same kernel tree with tar and zstd -19 produces a 117 MiB large linux-5.9-rc4.tar.zstd file, while compressing with zstd at its default level produces a 166 MiB file in 1.389 seconds. Going down to level 1 (-1) increases the file size to 186M while the time is reduced to 0m1.064s. That is still one hundred megabytes less than what the Linux kernel version 5.9 rc4 uses to store itself on a zram block device. It is safe to say that the compression you can expect when you use the kernel-provided implementations of various compression algorithms differs from what you get when you create archives using tar.