Building on a previous article where I encrypt/decrypt a string, I wanted to go further and ZIP it. So started a
rather long and painful wander into the world of compression.
Knowing that GZIP is included in the .NET Framework, I naturally started there.
However, after crafting a basic unit test that did a round trip SOURCE->GZIP->RESULT and checked that SOURCE==RESULT (which worked),
I discovered that the compressed size was LARGER than the SOURCE. Some articles on the web led me to believe that my implementation was
incorrect and that this was why the result was larger.
However, this did not resolve my issue. It turns out that I had selected a .JPG file as my source binary test data. The file format itself
is of course already heavily compressed, resulting in the file size growing by almost 100%!
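The round trip described above can be sketched roughly as follows. This is a minimal illustration, assuming the standard System.IO.Compression.GZipStream class; the class and method names (GzipRoundTrip, Compress, Decompress, RoundTrips) are hypothetical and are not the code from my original unit test.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration only.
public static class GzipRoundTrip
{
    // SOURCE -> GZIP
    public static byte[] Compress(byte[] source)
    {
        using (var output = new MemoryStream())
        {
            // The GZipStream must be disposed before reading the buffer,
            // otherwise the final block and trailer are not flushed.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(source, 0, source.Length);
            }
            return output.ToArray();
        }
    }

    // GZIP -> RESULT
    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }

    // Returns true when SOURCE == RESULT byte for byte,
    // and reports the compressed size so the caller can compare it.
    public static bool RoundTrips(byte[] source, out int compressedLength)
    {
        byte[] compressed = Compress(source);
        compressedLength = compressed.Length;

        byte[] result = Decompress(compressed);
        if (result.Length != source.Length) return false;
        for (int i = 0; i < source.Length; i++)
        {
            if (result[i] != source[i]) return false;
        }
        return true;
    }
}
```

Feeding RoundTrips a buffer of random bytes (a stand-in for already-compressed data such as a .JPG) should still round-trip correctly while reporting a compressedLength greater than the input length, whereas highly repetitive text should shrink. How much the incompressible case grows depends on the framework version.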
This means an implementation of GZIP needs to check the resulting size and, if it is larger, discard the compressed output and use the
raw source instead.
This leaves a last problem: how do you know which method was used? We now find ourselves moving back towards a formal file format like .ZIP, .7Z, etc.
My basic solution is to always add a 4-byte prefix to the binary stream: 0x00000000 means RAW, while 0x00000001 means GZIP (the rest
of the padding is future-proofing). Nutz.
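A rough sketch of that prefix scheme, combined with the size check, might look like the following. The names (PrefixedPayload, Pack, Unpack) are hypothetical, and writing the prefix as a little-endian 32-bit integer is an assumption of this sketch; the only requirement is that both sides agree on the byte order.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration only.
public static class PrefixedPayload
{
    private const int Raw = 0;          // prefix value meaning "body is the raw source"
    private const int GZip = 1;         // prefix value meaning "body is GZIP data"
    private const int PrefixLength = 4; // 4-byte prefix, little-endian (assumed)

    public static byte[] Pack(byte[] source)
    {
        byte[] compressed;
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                gzip.Write(source, 0, source.Length);
            }
            compressed = buffer.ToArray();
        }

        // Keep the compressed form only when it is actually smaller.
        bool useGzip = compressed.Length < source.Length;
        byte[] body = useGzip ? compressed : source;
        int method = useGzip ? GZip : Raw;

        var result = new byte[PrefixLength + body.Length];
        result[0] = (byte)method;
        result[1] = (byte)(method >> 8);
        result[2] = (byte)(method >> 16);
        result[3] = (byte)(method >> 24);
        Buffer.BlockCopy(body, 0, result, PrefixLength, body.Length);
        return result;
    }

    public static byte[] Unpack(byte[] packed)
    {
        if (packed == null || packed.Length < PrefixLength)
            throw new InvalidDataException("Missing 4-byte method prefix.");

        int method = packed[0] | (packed[1] << 8) | (packed[2] << 16) | (packed[3] << 24);
        int bodyLength = packed.Length - PrefixLength;

        if (method == Raw)
        {
            var raw = new byte[bodyLength];
            Buffer.BlockCopy(packed, PrefixLength, raw, 0, bodyLength);
            return raw;
        }
        if (method == GZip)
        {
            using (var input = new MemoryStream(packed, PrefixLength, bodyLength))
            using (var gzip = new GZipStream(input, CompressionMode.Decompress))
            using (var output = new MemoryStream())
            {
                gzip.CopyTo(output);
                return output.ToArray();
            }
        }
        throw new InvalidDataException("Unknown method prefix: " + method);
    }
}
```

With this shape the worst case costs exactly four extra bytes over the raw source, and Unpack never has to guess: it reads the prefix and either copies the body or decompresses it. Any other prefix value is rejected, which leaves room to add further methods later.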
This deep dive into compression led me to the 7Zip SDK, which has a complete implementation in C#,
amongst others. Wow! Clearly, I would need to implement this...
Peter Bromberg made a great start.
In the end all I did was simplify what he had built and create a clean implementation I called 7Zip.SDK.Clean (
[7Zip.SDK.Clean.zip])