Thursday, August 25, 2011

Shrinking Down JPEGs! (JPEGmini)

Recently, I came across this post from PetaPixel about JPEGmini.

It details a company that has come up with a way to optimize an image so that it looks virtually identical to the original but takes up a fraction of the original file size. That's awesome! I even tested it against some image files:

Black Sea Nettles - JPEG shrunk using custom algorithm.

  • 24 megapixel JPEG / 12MB in size => 1.6MB resulting file after JPEGmini.
  • 12 megapixel JPEG / 4.8MB in size => 1.1MB resulting file after JPEGmini.
I performed a difference test between the original and the JPEGmini version. They are virtually identical. If you take the resulting difference image and adjust the levels, there are in fact differences, but they can be attributed to minor random noise rather than compression artifacts. Amazing!
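
For anyone without Photoshop, the same kind of difference test can be done with ImageMagick. This is just a sketch with placeholder file names, not part of JPEGmini's tooling:

    # Build a difference image between the original and the JPEGmini output,
    # then stretch the levels so the tiny differences become visible.
    composite -compose difference original.jpg jpegmini-output.jpg diff.png
    convert diff.png -auto-level diff-stretched.png
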
DIY JPEG Compression Improvement

Being the kind of guy I am, I was wondering... how did they do this? According to the JPEGmini site:
  • improved algorithms
  • compressing to a level just before artifacting becomes an issue
Well, thinking that I could at least get some improvement, I leveraged ImageMagick and some shell scripting.

I tested with the same images and came up with the following:
  • 24MP / 12MB => 2.8MB (1.7MB with updates below) (jpeg compression quality @73%)
  • 12MP / 4.6MB => 1.1MB (jpeg compression quality @87%)
The images produced were likewise virtually the same, and a difference mask between the two in Photoshop shows nothing unless you auto-level. The amount of difference between my reduced images and JPEGmini's reduced images was comparable. No new JPEG compression algorithm on my part, just applying basic JPEG compression guidelines from FileFormat's JPEG page:


The JPEG library supplied by the Independent JPEG Group uses a quality setting scale of 1 to 100. To find the optimal compression for an image using the JPEG library, follow these steps:
  • Encode the image using a quality setting of 75 (-Q 75). If you observe unacceptable defects in the image, increase the value, and re-encode the image. If the image quality is acceptable, decrease the setting until the image quality is barely acceptable. This will be the optimal quality setting for this image.  Repeat this process for every image you have (or just encode them all using a quality setting of 75).
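
In ImageMagick terms, that starting point looks something like the line below (file names are placeholders):

    # Encode at the suggested starting quality of 75, then inspect the result
    # and nudge the setting up or down until it is just barely acceptable.
    convert original-image.jpg -quality 75 test-q75.jpg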

The process that my script goes through:

  1. Compress original image at 99% image quality
  2. Do a comparison metric between the original and the new image.
  3. If the compression has resulted in a difference greater than a certain threshold, use the compression quality percentage prior to the current one.
  4. If difference is less than threshold, decrease image quality by 1% and repeat from step #2.
Using this process against the whole image, I could reduce <12MP files down to ranges similar to what JPEGmini was able to achieve, albeit slower. For files >12MP, I would achieve great savings in space, but not as great as JPEGmini, for a given threshold of difference. 
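
My actual script isn't published (see "The Code" below), but a minimal sketch of the loop, using a made-up distortion threshold, could look like this:

    #!/bin/bash
    # Rough sketch of the quality-search loop described above (not my actual script).
    # Usage: ./shrink-sketch.sh original.jpg output.jpg
    ORIG="$1"
    OUT="$2"
    THRESHOLD=265          # placeholder distortion ceiling; see the baselines below
    TMP=/tmp/quality-candidate.jpg

    QUALITY=99
    BEST=100               # falls back to a near-lossless re-encode if even 99% is over the threshold
    while [ "$QUALITY" -ge 1 ]; do
      convert "$ORIG" -quality "$QUALITY" "$TMP"
      # compare prints the MAE metric to stderr, e.g. "352.869 (0.00538)";
      # keep only the integer part so the shell can compare it.
      MAE=$(compare -metric mae "$ORIG" "$TMP" null: 2>&1 | cut -d' ' -f1 | cut -d'.' -f1)
      if [ "$MAE" -gt "$THRESHOLD" ]; then
        break              # too much distortion; fall back to the previous quality
      fi
      BEST=$QUALITY
      QUALITY=$((QUALITY - 1))
    done
    convert "$ORIG" -quality "$BEST" "$OUT"
    rm -f "$TMP"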

Edit: Upon reviewing my notes, it looks like the thresholds I derived from the JPEGmini files differ between the <12MP and >12MP cases. When optimizing with the expanded threshold for larger files, I am able to get results comparable to what JPEGmini gets.

How I Did The Compression Process

  • Use ImageMagick's "convert" and "compare" to generate compressed candidates and "image distortion" readings:
    • convert -quality PERCENTAGE% original-image.jpg compressed-image.jpg
    • compare -verbose -metric mae original-image.jpg compressed-image.jpg
  • Using the example file from JPEGmini, establish a baseline:
    • 24MP image = ceiling of 370 points of distortion
    • 12MP image = ceiling of 265 points of distortion
  • Starting at 99% quality for JPEG compression, keep reducing the quality until the measured distortion exceeds the baseline, then use the quality level from the step before.
The level of acceptable compression will differ from image to image. If the image has more pixels, it can absorb more points of distortion before the image quality suffers.
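
The baseline ceilings in the list above came from comparing the original against the JPEGmini output with the same metric; roughly (file names are placeholders):

    # The first number printed (to stderr) is the mean absolute error between
    # the original and the JPEGmini version; that value becomes the ceiling.
    compare -verbose -metric mae original-24mp.jpg jpegmini-24mp.jpg null: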

Optimizations (speed)

To optimize for speed, especially for larger images, I scaled down the original image and performed the tests on the scaled-down copy. Once the ideal quality was located, I performed the compression on the original with the quality determined from the scaled-down image. For very large images, I take a crop from the center of the image, which results in a more accurate quality prediction.
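
Roughly, the two speed-ups look like this; the resize factor and crop size here are arbitrary examples, not the exact values I settled on:

    # Speed-up 1: run the quality search on a scaled-down copy of the original.
    convert original.jpg -resize 25% /tmp/search-copy.jpg

    # Speed-up 2: for very large images, search on a crop from the center instead,
    # which keeps full-resolution detail and predicts the quality more accurately.
    convert original.jpg -gravity center -crop 1024x1024+0+0 +repage /tmp/search-copy.jpg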

Thoughts

I can see why most applications don't bother with this process... it's basically exhaustive testing to determine how much you can compress before image quality is degraded unacceptably. In addition to scaling down the image, I could also do quality percentage point skipping and backtracking when quality is degraded.
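
A sketch of that skip-and-backtrack idea, reusing the loop from earlier (the step size of 5 is arbitrary):

    # Coarse-to-fine sketch: drop quality in steps of 5; when the threshold is
    # crossed, back up into the last good interval and refine one percent at a time.
    ORIG=original.jpg
    THRESHOLD=265          # placeholder ceiling, as before
    TMP=/tmp/quality-candidate.jpg
    QUALITY=99
    STEP=5
    BEST=100
    while [ "$QUALITY" -ge 1 ]; do
      convert "$ORIG" -quality "$QUALITY" "$TMP"
      MAE=$(compare -metric mae "$ORIG" "$TMP" null: 2>&1 | cut -d' ' -f1 | cut -d'.' -f1)
      if [ "$MAE" -gt "$THRESHOLD" ]; then
        if [ "$STEP" -eq 1 ]; then
          break                            # refined step failed too; keep BEST
        fi
        QUALITY=$((QUALITY + STEP - 1))    # backtrack into the last interval that passed
        [ "$QUALITY" -gt 99 ] && QUALITY=99
        STEP=1
        continue
      fi
      BEST=$QUALITY
      QUALITY=$((QUALITY - STEP))
    done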

What would be ideal is an API and library set that does an image quality analysis and gives you an optimally compressed JPEG out of the box. That is the value add of what JPEGmini is offering, and I think that it is a technology which any company involved with image storage should look into.

The Code

Note: the shell script, etc., is not published, as the goal was to explore whether or not it could be done. Based on the description above, one can easily write the appropriate wrapper script or use the appropriate Java/Python/PHP/etc. hooks.

Update/Edited (8/29/2011):

  • Improved the quality-locating logic, reducing the time to locate the quality setting and compress a 24MP image from 5 minutes to 13 seconds.
  • Planning on creating a Lightroom 3 export plugin for this scripted method.
  • Planning on trying to reduce the time required from 13 seconds down to under 10 seconds.

6 comments:

  1. Updated blog, got the quality level setting logic to achieve a hit at 13-14 seconds for a 24MP image vs the 5 minutes it was taking. Woot!

    When I can get it down to below 5 seconds (unlikely), I should be able to turn out a viable Lightroom 3 export plugin.

  2. Some metrics:

    Optimal compression quality level determined: 77% >> Compresing [ DSC00620.jpg ] => [ ../mini-2/DSC00620.jpg ]
    Optimal compression quality level determined: 77% >> Compresing [ DSC00622.jpg ] => [ ../mini-2/DSC00622.jpg ]
    Optimal compression quality level determined: 77% >> Compresing [ DSC00624.jpg ] => [ ../mini-2/DSC00624.jpg ]
    Optimal compression quality level determined: 77% >> Compresing [ DSC00626.jpg ] => [ ../mini-2/DSC00626.jpg ]
    Optimal compression quality level determined: 77% >> Compresing [ DSC00629.jpg ] => [ ../mini-2/DSC00629.jpg ]
    Optimal compression quality level determined: 77% >> Compresing [ DSC00632.jpg ] => [ ../mini-2/DSC00632.jpg ]
    Optimal compression quality level determined: 75% >> Compresing [ DSC00633.jpg ] => [ ../mini-2/DSC00633.jpg ]
    Optimal compression quality level determined: 77% >> Compresing [ DSC00636.jpg ] => [ ../mini-2/DSC00636.jpg ]
    Optimal compression quality level determined: 77% >> Compresing [ DSC00637.jpg ] => [ ../mini-2/DSC00637.jpg ]
    Optimal compression quality level determined: 59% >> Compresing [ DSC00639.jpg ] => [ ../mini-2/DSC00639.jpg ]
    Optimal compression quality level determined: 65% >> Compresing [ DSC00642.jpg ] => [ ../mini-2/DSC00642.jpg ]
    Optimal compression quality level determined: 59% >> Compresing [ DSC00644.jpg ] => [ ../mini-2/DSC00644.jpg ]
    Optimal compression quality level determined: 77% >> Compresing [ DSC00648.jpg ] => [ ../mini-2/DSC00648.jpg ]
    Optimal compression quality level determined: 75% >> Compresing [ DSC00652.jpg ] => [ ../mini-2/DSC00652.jpg ]
    Optimal compression quality level determined: 68% >> Compresing [ DSC00653.jpg ] => [ ../mini-2/DSC00653.jpg ]
    Optimal compression quality level determined: 81% >> Compresing [ DSC00665.jpg ] => [ ../mini-2/DSC00665.jpg ]

    ORIGINALS:


    6.2M DSC00620
    6.2M DSC00622
    6.3M DSC00624
    6.0M DSC00626
    5.7M DSC00629
    6.0M DSC00632
    5.9M DSC00633
    6.2M DSC00636
    6.0M DSC00637
    4.4M DSC00639
    4.6M DSC00642
    4.3M DSC00644
    5.8M DSC00648
    6.2M DSC00652
    5.4M DSC00653
    5.5M DSC00665


    Recompressed:

    988K DSC00620
    1020K DSC00622
    1016K DSC00624
    936K DSC00626
    864K DSC00629
    984K DSC00632
    844K DSC00633
    988K DSC00636
    924K DSC00637
    448K DSC00639
    528K DSC00642
    424K DSC00644
    808K DSC00648
    796K DSC00652
    596K DSC00653
    1.0M DSC00665

    Summary:

    91MB vs 13MB

    Total time taken, about 3:20 to process 16 files, or about 12-13 seconds apiece.

  3. Just processed a bunch of images from a recent trip to the Monterey Bay Aquarium. Some are still triggering a bug where it is compressing, but not at a level nearly as high as one would like. 93% vs say 80%. Will need to tweak.

    62 10MP downsampled images re-compressed from 605MB to 321MB. Comparing them at full screen, they look the same. The images with large swaths of out-of-focus detail compress much better than those without. For example, a 7.2MB image compressed down to 560KB was pretty cool.

  4. Which OS and ImageMagick version are you using for the comparison?

    When I tried to use the compare command as you've written, it failed to work until I provided another parameter (difference PNG output file). And when it did work, it took between 5 and 10 seconds per comparison (on a 12MP image), so for example a 10 step compression-comparison operation could last a couple of minutes.

    Are you releasing some code that can help us out?

  5. On Mac OS X (MBP):
    * Darwin Kernel Version 11.1.0
    * Version: ImageMagick 6.7.1-1 2011-07-21

    On the Linux VPS (256MB Linode instance):
    * Ubuntu 11.04
    * Version: ImageMagick 6.6.2-6 2011-03-16

    The key to ensuring a usable time frame for processing the images is to sample and work on a subset of the image. If you work on the whole image, yes, it will take quite a while per image. I was testing against 12MP images from my A700 and 24MP images from the A77 found on online forums.

    Technique #1: I just downsampled the original image. However, this gave inaccurate results.

    Technique #2: I take 4-5 sample slices of the original image, and do the processing on those slices. I find this works better to preserve the details, but adds additional steps.

    In either case, instead of going through all quality levels, I start from a best guess level, and compare it to the target difference value, and change direction to match. This reduces the search space by about half, so instead of say doing 10-15 compress and compares, I'm maybe doing 3-7 compress and compares. Hence I'm able to get a final output image in under 15 seconds for an image.

    I'm working on creating a stable script/library for this, as I think it's pretty cool. Currently, there is a bug where it sets the target too low and nothing good comes of it.

  6. Just swung by the PetaPixel page again and it looks like someone else found a native Ubuntu binary that does this:

    http://www.akirasan.net/?p=799

    jpegoptim :: http://freshmeat.net/projects/jpegoptim/

    Sweet! No sense re-inventing the wheel. :)
