Extensions can be recompressed with `advdef`


Forum: DSL Ideas and Suggestions
Topic: Extensions can be recompressed with `advdef`
started by: stupid_idiot

Posted by stupid_idiot on July 10 2007,14:56
To all extension makers:
I would like to bring a great program to your attention: `advdef`, part of the < AdvanceCOMP > project.
The source code can be downloaded < here >.

`advdef` recompresses zlib-compressed files with a 100%-zlib-compatible 'Deflate' encoder taken from 7zip. The recompressed file is usually between 90% and 100% of the original size. The size reductions are enough to suggest always using `advdef`. All '.dsl' and '.tar.gz' extensions are suitable. '.uci' extensions are not - my layman's explanation is that '.uci' files are iso images with compressed blocks rather than one whole compressed file. This does not make '.uci' inferior - `create_compressed_fs` already uses 7zip's LZMA in addition to 7zip's Deflate for optimal efficiency.
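For example, recompressing a typical extension looks like this (just a sketch - 'foo.dsl' is a placeholder and the numbers will vary with the content):
Code Sample
# size before (du reports KB)
du foo.dsl
# recompress in place with -z4 (the level used throughout this thread);
# the result is still an ordinary gzip stream, so loading is unaffected
advdef -z4 foo.dsl
# size after
du foo.dsl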

Point of interest:
The latest version of `create_compressed_fs` (2.05) can create slightly smaller '.uci' files than the version in DSL (2.01), because it can choose between different compression methods for each block (zlib levels 1-9 or lzma) based on the resulting size - even though lzma is better in most cases, it sometimes loses due to higher overhead compared to zlib. This process is very time-consuming and is used only if you specify `-L-2`, so it is best used during the final make.
The source code for the latest version can be downloaded < here >.
You might want to read the help for `create_compressed_fs`, because the syntax has completely changed between 2.01 and 2.05.
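As a rough sketch of what that final make could look like - only the `-L-2` level above is certain; the argument layout here is my assumption, so do check the built-in help first:
Code Sample
# slow, best-compression pass for the final .uci; -L-2 lets the tool
# pick zlib level 1-9 or lzma for each block, whichever is smaller
# (input/output argument order is an assumption - see the tool's help)
create_compressed_fs -L-2 extension.iso extension.uci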

Posted by mikshaw on July 10 2007,19:40
Quote
The size of the reduced file is usually between 90-100% of the original
This doesn't sound like much of an advantage, unless there is _no_ extra cost in processing (which also equals time) during decompression. Bzip2 can also create much smaller packages, and is already available in DSL, but is not used because of the dramatic increase in processing needed.

Quote
The latest version of `create_compressed_fs` (2.05) can create slightly smaller '.uci' files than the version in DSL
I recall reading something about the newer version of cloop being made specifically for kernel 2.6. Of course I could be completely wrong about this, and for all I know the cloop module version may have very little influence on the create_compressed_fs version. I do know that at one time I tried to install a recent version of cloop in another 2.4-kernel distro. The cloop module failed, but create_compressed_fs still worked. I thought it was odd.
As with advdef, if the extra time needed extends to decompression in addition to compression, it would probably be faced with some resistance from some people. Otherwise, it sounds like something worth looking into.

Posted by WDef on July 11 2007,13:30
On a related note: I tried without success to get anyone here interested in lzop a few years back. It is much faster (e.g. 9X) than gzip or bz2, with only a slightly lower compression ratio - a huge improvement for backup times.
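For anyone curious, a quick way to compare on a backup tarball (just a sketch; 'backup.tar' is a placeholder):
Code Sample
# lzop keeps the original and writes backup.tar.lzo
time lzop backup.tar
# gzip to stdout so the original is kept for a fair comparison
time gzip -c backup.tar > backup.tar.gz
# compare the resulting sizes
ls -l backup.tar backup.tar.lzo backup.tar.gz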
Posted by ^thehatsrule^ on July 11 2007,13:41
I think one of the major points is memory consumption... since gzip really takes up very little.  This link may be interesting for this topic: < http://tukaani.org/lzma/benchmarks >
Posted by roberts on July 11 2007,15:01
Reducing size is not of paramount interest when compared to decompression time and memory used. This is especially true given DSL's design philosophy of primarily being a nomadic live CD/frugal system, where decompression of MyDSL extensions occurs many times and therefore the time and memory required to do so are paramount.
Posted by WDef on July 11 2007,17:05
Interesting link, hatsrule.

lzop uses LZO compression - I suppose that is different from LZMA. I do suspect that lzop uses more memory than gzip, but it is way faster.

Posted by stupid_idiot on July 13 2007,11:22
I refer to the Wikipedia < entry > for AdvanceCOMP:

"DEFLATE specifies a stream-encoding such that any compliant decoder is able to parse any valid stream; the algorithm and program used for the compression stage are not mandated.

For generation of compressed sections of DEFLATE data, an encoder available in the zlib/gzip reference implementation has typically been utilised. The zlib/gzip compressor offers the user a sliding scale between CPU usage and the likely amount of reduction in size achieved on a range of -0 (no compression) to -9 (maximum gzip compression).

The 7-Zip DEFLATE encoder, used in the AdvanceCOMP suite, effectively extends the sliding scale further. A much more detailed search of compression possibilities is performed, at the expense of significant further processor time spent searching. Effectively, the 10-point scale used in gzip is extended to include extra settings above -9, the previous maximum search level. There will be no difference in decompression speed, regardless of the level of compressed size achieved or time taken to encode the data."
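One practical consequence (a small sketch; 'foo.tar.gz' is a placeholder): a file recompressed by advdef is still a plain gzip stream, so the stock tools read it unchanged.
Code Sample
advdef -z4 foo.tar.gz
# still decodes with the ordinary gzip/tar tools
gzip -t foo.tar.gz && echo "valid gzip stream"
gunzip -c foo.tar.gz | tar -tf - | head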

Posted by ^thehatsrule^ on July 13 2007,13:38
So, that means it just does a better job of optimizing the compressed file?  Did you actually test this out yet?
Posted by stupid_idiot on July 13 2007,16:32
Yes - all the .dsl and .tar.gz extensions I've submitted in the last few months.

To recompress a '.tar.gz' file I would use:
`advdef -z4 foo.tar.gz`
When done, it shows the size in bytes of the original file and the recompressed file, and the size of the recompressed file as a percentage of the original.

Posted by humpty on July 13 2007,21:18
I'm all for anything that compresses the backup file faster. It would justify upgrading to a faster CF card.
Posted by WDef on July 14 2007,11:39
Sorry Humpty - I confused the thread a little by bringing up lzop.  Lzop compresses *faster* (the fastest there is) but slightly bigger - from what stupid_idiot says, advdef compresses *smaller* (both as compared with gzip).

So different advantages with different progs.

Posted by ^thehatsrule^ on July 14 2007,17:54
But a major point is that advdef's format is still gzip-compatible - it just takes more time for compression, which is fine since that only happens when making extensions. So this seems more realistic to implement right away.

As for lzop, perhaps it would be great for backup/restore... though I suppose it would cause people's already-large backup.tar.gz's to grow even bigger.

Posted by ^thehatsrule^ on July 15 2007,19:02
Some reports using advdef:

Regular .tar.gz extension: ~50MB file (using gzip default -9 level)
Used: advdef -z4
~4 minutes; saved ~2%; ~1MB difference

Posted by WDef on July 18 2007,12:57
hats:  a pedantic aside  - I don't think -9 is the default for GNU gzip.
Posted by ^thehatsrule^ on July 18 2007,13:29
wdef: Ah.. well, I used tar zcvf... I meant the default gzip level that GNU tar uses, because the size of that and of manually using gzip -9 was the same (or is it?)
Posted by mikshaw on July 18 2007,13:34
Still, I wonder why bother at all for 2% when gzip -9 is nearly the same and takes less time. If you want max compression when using tar -z, you can set the variable GZIP="-9".
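For example (the file and directory names are just placeholders):
Code Sample
# tell gzip to use maximum compression when tar invokes it via -z
GZIP="-9" tar -czvf myapp.tar.gz myapp/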
Posted by stupid_idiot on July 18 2007,16:00
Two things:
1. Recompression does not take very long.
2. Usually, I am seeing double-digit (KB) size reductions even for extensions of 1MB or less, e.g.
Code Sample
du install_flash_player_7_linux.tar.gz
999     install_flash_player_7_linux.tar.gz

time advdef -z4 install_flash_player_7_linux.tar.gz
    1017790      980222  96% install_flash_player_7_linux.tar.gz
    1017790      980222  96%

real    0m5.163s
user    0m4.788s
sys     0m0.204s

du install_flash_player_7_linux.tar.gz
963     install_flash_player_7_linux.tar.gz


Filetypes do play a part as well. Fonts, for example, seem to benefit more from the new algorithm.
e.g. (DSL 3.3 '/usr/X11R6/lib/X11/fonts/misc')
Code Sample
/usr/X11R6/lib/X11/fonts/misc# du -cs *.gz
1096    total
/usr/X11R6/lib/X11/fonts/misc# advdef -z4 *.gz
919075      785524  85%

An important note: the CPU is a P4 1.7 'Willamette' on PC-133 SDRAM.

Posted by mikshaw on July 18 2007,17:48
And again, the question of the amount of compression used in these *.gz files has not been addressed. If they weren't originally compressed to the maximum (most aren't), the comparison is an unfair one. It's possible that compressing with gzip at the maximum level, as ^thehatsrule^ demonstrated, might show the size decrease to be consistently insignificant. Or the results might be more encouraging than that. I stand by my original (and repeated) statement that if it doesn't make files noticeably smaller (which has yet to be shown), or if it uses noticeably more resources (a P4 won't say much about how it works on a 486), there is very little point in it.

I can understand the desire for better compression techniques, of course, but over the last several years there has been little acceptable improvement. Either the technique is too slow, or the result isn't worth replacing the de facto standard.

Posted by stupid_idiot on July 18 2007,23:00
Quote (mikshaw @ July 18 2007,21:48)
And again, the question of the amount of compression used in these *.gz files has not been addressed. If it wasn't originally compressed to maximum (most aren't), the comparison is an unfair one. It's possible that compressing with gzip at the maximum level, as ^thehatsrule^ demonstrated, might show the size decrease to be consistently insignificant.

Agreed. I will post the font-compression comparison again, this time using '-9' on the original files first.

Update:
Code Sample

/usr/X11R6/lib/X11/fonts/misc# gzip -d *.gz
/usr/X11R6/lib/X11/fonts/misc# du -cs *.pcf
5148    total
/usr/X11R6/lib/X11/fonts/misc# gzip -9 *.pcf
/usr/X11R6/lib/X11/fonts/misc# du -cs *.gz
1056    total
/usr/X11R6/lib/X11/fonts/misc# advdef -z4 *.gz
880984      786600  89%
/usr/X11R6/lib/X11/fonts/misc# du -cs *.gz
948     total

Posted by ^thehatsrule^ on July 19 2007,05:02
I think the reason my package couldn't get much smaller was that it already contained many archived files. If I ran advdef on each of those as well and then compared, there could be a significant difference. <edit>Tried it out... now there is a total of 8% difference in size - around 4 MB</edit>

Since I mainly do my testing in virtualized or emulated environments, I do not think posting system specs will be beneficial for comparison.

I also tried some ~3MB binaries and the recompressed files were usually 4-5% smaller.

Decompression:
I did some timed tests using busybox's time (I wonder why this was included in the busybox build), and the minor differences (a few milliseconds) were small enough to conclude that the time taken to decompress is the same with or without advdef. I also compared system memory by subtracting before/after readings from free (is this a valid use of it?), and the peak values were also very close - so the memory consumption should be about the same.
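Roughly the kind of comparison I mean (a sketch only, not the exact commands; the filenames are placeholders):
Code Sample
mkdir -p /tmp/a /tmp/b
# time extraction of the original vs. the advdef-recompressed copy
time tar -xzf original.tar.gz -C /tmp/a
time tar -xzf recompressed.tar.gz -C /tmp/b
# run free before and after each extraction and compare the differences
free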

So what it boils down to is whether the extension maintainer has the time and the resources to use advdef.

Also, for reference, from the man page:
Quote
4 Limitations
The advdef program cannot be used to recompress huge files because it needs to allocate memory for both the complete compressed and uncompressed data.

Posted by mikshaw on July 19 2007,13:19
That sounds promising.
Posted by ^thehatsrule^ on July 19 2007,17:28
Just as I thought all was going well, I realized the compressed archives inside the package were only "lightly" compressed... so those reported differences aren't any good for comparison.

<edit>repackaged everything, now the total difference is ~6%, which is still not bad</edit>
