Introduction: Quantimetric Image Processing
(The figure above compares an existing image processing method with quantimetric image processing. Note the improved result: the top right image shows strange artifacts that arise from the incorrect assumption that pixel values directly measure a physical quantity such as light, while the bottom right image shows the better result obtained by doing the same processing quantimetrically.)
In this Instructable you will learn how to greatly improve the performance of existing imaging or vision sensing systems by using a very simple concept: Quantimetric image sensing.
Quantimetric image processing greatly improves upon any of the following:
- Existing image processing such as image deblurring;
- Machine learning, computer vision, and pattern recognition;
- Wearable face recognizers (see http://wearcam.org/vmp.pdf), AI- and HI-based vision, etc.
The basic idea is to quantimetrically pre-process and post-process the images, as follows (a minimal code sketch appears after the list):
- Expand the dynamic range of the image or images;
- Process the image or images as you normally would;
- Compress the dynamic range of the image or images (i.e. undo step 1).
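Here's a minimal sketch of that recipe in Python/NumPy. The power-law model and the default gamma of 2.22 are simplifying assumptions for illustration (use your camera's measured inverse response if you have it), and pixel values are assumed to be normalized to [0, 1]:

```python
import numpy as np

def quantimetric(process, image, gamma=2.22):
    # Step 1: expand the dynamic range (undo the camera's compression).
    q = image.astype(np.float64) ** gamma
    # Step 2: process as you normally would, but now in lightspace.
    result = process(q)
    # Step 3: compress the dynamic range of the result (undo step 1).
    return result ** (1.0 / gamma)
```

Any routine that expects a linear-light image can be passed in as `process`; the rest of this Instructable walks through the three steps in detail.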
In previous Instructables, I taught some aspects of HDR (High Dynamic Range) sensing and quantimetric sensing, e.g. linearity, superposition, etc.
Now let us put this knowledge to use.
Take any existing process you'd like to use. The example I'll show is image deblurring, but you can also use it for just about anything else.
Step 1: Expand the Dynamic Range of Your Image or Images
(Figures adapted from "Intelligent Image Processing", John Wiley and Sons Interscience Series, Steve Mann, November 2001)
The first step is to expand the dynamic range of the input image.
Ideally you should first determine the camera's response function, f, and then apply the inverse response, f inverse, to the image.
Typical cameras are compressive of dynamic range, so we typically want to apply an expansive function.
If you don't know the response function, begin by trying something simple: load the image into an image array, cast the values to a floating-point data type such as (float) or (double), and raise each pixel value to an exponent, for example by squaring each pixel value.
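For instance, in Python with NumPy and PIL (a sketch; the filename is a placeholder and the image is assumed to be 8-bit):

```python
import numpy as np
from PIL import Image

# Load the image and cast it to a floating-point type.
img = np.asarray(Image.open("input.png"), dtype=np.float64) / 255.0

# Expand the dynamic range: squaring each pixel is a first guess at
# f-inverse, reasonable when the camera response is roughly a square root.
q = img ** 2.0
```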
Rationale:
Why are we doing this?
The answer is that most cameras compress their dynamic range. The reason they do this is that most display media expand dynamic range. This is quite by accident: the amount of light emitted by a cathode-ray tube television display is approximately proportional to the input voltage raised to the exponent 2.22, so that when the video voltage input is about half way, the amount of light emitted is much less than half.
Photographic media are also dynamic-range expansive. For example, a photographic "neutral" grey card reflects 18% of the incident light (not 50%), yet this 18% is considered to be the middle of the tonal scale. So if we look at a graph of output as a function of input, display media behave like an ideal linear display preceded by a dynamic range expander.
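You can check these numbers yourself, assuming the simple power-law display model above:

```python
# CRT display model: light out is roughly (voltage in) ** 2.22.
print(0.5 ** 2.22)         # ~0.21: half the voltage yields about a fifth of the light
print(0.18 ** (1 / 2.22))  # ~0.46: 18% grey corresponds to a near-middle input
```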
In the top figure, above, you can see the display boxed in with a dotted line, and it is equivalent to having an expander before the ideal linear display.
Since displays are inherently expansive, cameras need to be designed to be compressive so that the images look good on the existing displays.
Back in the old days, when there were thousands of television receivers but only one or two broadcasting stations (i.e. just one or two television cameras), it was far easier to put a compressive nonlinearity into each camera than to recall all the televisions and add a corrective circuit to each receiver.
By accident this also helped with noise reduction. In audio, this compression-then-expansion is called "companding" (as in Dolby noise reduction) and was considered worth patenting; in video it happened totally by accident. Stockham proposed that we take the logarithm of images before processing them, and then take the antilog afterwards. What he did not realize is that most cameras and displays already do something like this quite by chance. Instead, what I proposed is that we do the exact opposite of what Stockham proposed. (See "Intelligent Image Processing", John Wiley and Sons Interscience Series, pp. 109-111.)
In the lower picture, you see the proposed anti-homomorphic (quantimetric) image processing, where we've added the step of expansion and compression of the dynamic range.
Step 2: Process the Images, or Perform the Computer Vision, Machine Learning, or the Like
The second step, after dynamic range expansion, is to process the images.
In my case, I simply deconvolved the image with the blur function, i.e. image deblurring, as is commonly known in the prior art.
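As one hedged example, here is a simple Wiener-style deconvolution operating on the expanded (lightspace) image q. The blur kernel h and the regularization constant k = 0.01 are illustrative assumptions; any deblurring routine you already use will work just as well:

```python
import numpy as np

def wiener_deblur(q, h, k=0.01):
    # Transfer function of the blur, zero-padded to the image size.
    H = np.fft.fft2(h, s=q.shape)
    Q = np.fft.fft2(q)
    # Wiener filter: the small constant k keeps near-zero frequencies
    # of H from blowing up the result.
    G = np.conj(H) / (np.abs(H) ** 2 + k)
    return np.real(np.fft.ifft2(Q * G))
```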
There are two broad categories of quantimetric image sensing:
- Helping people see;
- Helping machines see.
If we're trying to help people see (which is the example I'm showing here), we're not done yet: we need to take the processed result back into imagespace.
If we're helping machines see (e.g. face recognition), we're done now (no need to go on to step 3).
Step 3: Recompress the Dynamic Range of the Result
When we're working in expanded dynamic range, we're said to be in "lightspace" (quantimetric imagespace).
At the end of Step 2, we're in lightspace, and we need to get back to imagespace.
So this step 3 is about getting back to imagespace.
To perform step 3, simply compress the dynamic range of the output of Step 2.
If you know the response function of the camera, simply apply it to get the result, f(p(q)).
If you don't know the response function of the camera, simply apply a good guess.
If you squared the image pixels in step 1, now is the time to take the square root of each image pixel to get back to your guess regarding imagespace.
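Continuing the NumPy example, where the pixels were squared in Step 1 and `deblurred` is the output of Step 2 (a sketch; the clipping bounds assume values normalized to [0, 1]):

```python
import numpy as np

# Clip first, since processing can overshoot the valid range,
# then undo the squaring from Step 1 to get back to imagespace.
output = np.sqrt(np.clip(deblurred, 0.0, 1.0))
```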
Step 4: You Might Want to Try Some Other Variations
Deblurring is just one of many possible examples. Consider, for example, the combining of multiple exposures.
Take any two pictures, such as the two shown above: one taken during the day, and the other at night.
Combine them to make a dusk-like picture.
If you just average them together it looks like garbage. Try this yourself!
But if you first expand the dynamic range of each image, then add them, and then compress the dynamic range of the sum, it looks great.
Compare image processing (adding the images) with quantimetric image processing (expanding, adding, and then compressing).
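Here's a sketch of that comparison in NumPy (the filenames and the gamma of 2.22 are illustrative assumptions):

```python
import numpy as np
from PIL import Image

def load_float(path):
    # Load an image as floating point, normalized to [0, 1].
    return np.asarray(Image.open(path), dtype=np.float64) / 255.0

day = load_float("day.png")
night = load_float("night.png")

# Image processing: averaging in imagespace (looks like garbage).
naive = (day + night) / 2.0

# Quantimetric image processing: expand, average in lightspace, compress.
gamma = 2.22
dusk = ((day ** gamma + night ** gamma) / 2.0) ** (1.0 / gamma)
```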
You can download my code and more example material from here: http://wearcam.org/ece516/pnmpwadd/
Step 5: Going Further: Now Try It With HDR Image Composites
(Above image: an HDR welding helmet that uses quantimetric image processing for augmented reality overlays. See SlashGear, 2012 September 12.)
In summary: capture an image, and apply the following steps:
- expand the dynamic range of the image;
- process the image;
- compress the dynamic range of the result.
And if you want an even better result, try the following (a sketch follows the list):
- capture a plurality of differently exposed images;
- expand the dynamic range into lightspace, as per my previous Instructable on HDR;
- process the resulting quantimetric image, q, in lightspace;
- compress the dynamic range through tonemapping.
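As a rough sketch of that recipe (the power-law response, the equal weighting of frames, and the q/(1+q) tonemap are all simplifying assumptions; proper comparametric calibration, as in my HDR Instructable, gives better results):

```python
import numpy as np

def hdr_compose(images, exposures, gamma=2.22):
    # Expand each exposure into lightspace and divide by exposure time,
    # so every frame estimates the same photoquantity q.
    total = np.zeros_like(images[0], dtype=np.float64)
    for img, t in zip(images, exposures):
        total += (img.astype(np.float64) ** gamma) / t
    q = total / len(images)

    # ... process the quantimetric image q in lightspace here ...

    # Compress: a simple global tonemap followed by display gamma.
    return (q / (1.0 + q)) ** (1.0 / gamma)
```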
Have fun and please click "I made it" and post your results, and I'll be happy to comment or provide some constructive help.