Building on a previous article where I encrypt/decrypt a string, I wanted to go further and ZIP it. So started a rather long and painful wander into the world of compression.

Knowing that GZIP is included in the .NET Framework, I naturally started there.

However, after crafting a basic unit test that did a simple round trip (SOURCE->GZIP->RESULT) and checked that SOURCE==RESULT, which worked, I discovered that the compressed size was LARGER than the SOURCE. Some articles on the web led me to believe that my implementation was incorrect and that this was why the result was larger.

However, this did not resolve my issue. It turns out that I had selected a .JPG file as my source binary test data. The file format itself is of course heavily compressed already, which resulted in the file size growing by almost 100%!
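The effect is easy to reproduce. The following is a minimal sketch in Python rather than the .NET code discussed here, and it makes one assumption: random bytes stand in for the .JPG, since both are high-entropy data that GZIP cannot shrink. The helper name `gzip_size` is made up for illustration. How much the output grows depends on the implementation, so the numbers will not match the ones I saw.

```python
import gzip
import os

# Assumption: random bytes stand in for an already-compressed file such as a JPEG.
incompressible = os.urandom(64 * 1024)

# Highly repetitive data, for contrast.
compressible = b"hello world " * 4096


def gzip_size(data: bytes) -> int:
    """Return the size in bytes of `data` after GZIP compression."""
    return len(gzip.compress(data))


if __name__ == "__main__":
    for name, data in (("random", incompressible), ("repetitive", compressible)):
        print(f"{name}: {len(data)} bytes -> {gzip_size(data)} bytes")
```

The random input comes out larger than it went in, because the GZIP header and trailer are added on top of data that could not be reduced, while the repetitive input shrinks dramatically.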

This means an implementation of GZIP needs to check the resulting size and, if it is larger than the source, discard the compressed output and use the raw source instead.

This leaves a last problem: how do you know which method was used? We now find ourselves moving back towards a formal file format like .ZIP, .7Z, etc. My basic solution is to always add a 4-byte prefix to the binary stream: 0x0000 means RAW, while 0x0001 means GZIP (the rest of the padding is future-proofing). Nutz.
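The prefix idea, together with the size check, can be sketched as follows. This is again a Python illustration, not my .NET code, and several details are assumptions: the names `pack`, `unpack`, `METHOD_RAW` and `METHOD_GZIP` are hypothetical, and the prefix is written as a 4-byte big-endian integer, since the byte order is not pinned down above.

```python
import gzip
import struct

# Hypothetical method codes, matching the RAW/GZIP values described above.
METHOD_RAW = 0
METHOD_GZIP = 1

# Assumption: the 4-byte prefix is an unsigned big-endian integer.
_PREFIX = struct.Struct(">I")


def pack(data: bytes) -> bytes:
    """Prefix `data` with a method code, compressing only when that helps."""
    compressed = gzip.compress(data)
    if len(compressed) < len(data):
        return _PREFIX.pack(METHOD_GZIP) + compressed
    # Compression did not shrink the data, so store it untouched.
    return _PREFIX.pack(METHOD_RAW) + data


def unpack(blob: bytes) -> bytes:
    """Read the method code and return the original bytes."""
    if len(blob) < _PREFIX.size:
        raise ValueError("blob is shorter than the 4-byte prefix")
    (method,) = _PREFIX.unpack_from(blob)
    payload = blob[_PREFIX.size:]
    if method == METHOD_RAW:
        return payload
    if method == METHOD_GZIP:
        return gzip.decompress(payload)
    raise ValueError(f"unknown method code: {method}")


if __name__ == "__main__":
    for sample in (b"abc", b"a" * 1000):
        blob = pack(sample)
        print(len(sample), "->", len(blob), "prefix", blob[:4].hex())
        assert unpack(blob) == sample
```

With this layout the worst case costs only the 4 prefix bytes, and any unused code values remain free for other methods later.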

This deep dive into compression led me to the 7Zip SDK, which has a complete implementation in C#, amongst others. Wow! Clearly, I would need to implement this...

Peter Bromberg made a great start. In the end, all I did was simplify what he had built and create a clean implementation I called 7Zip.SDK.Clean ([7Zip.SDK.Clean.zip] [7Zip.SDK.Clean.zip]).