DCMTK: min-max-window vs min-max-window-n vs histogram-window - image-processing

I'd like to convert a set of DICOM instances (CT, MR, X-Ray basically) to JPEG (regular 8-bit lossy grayscale). I have been staring at the options of dcmj2pnm:
dcmj2pnm: Convert DICOM images to PGM/PPM, PNG, TIFF, JPEG or BMP
I am quite confident that --use-window 1 is the right option when the input DICOM instance provides a single value for (0028,1050) Window Center & (0028,1051) Window Width. What I fail to understand is what is the "right" option to choose from (when WC/WW is missing):
--min-max-window,
--min-max-window-n (what does extreme values mean ?),
--histogram-window 5 (found ref here)
Are there any good rules of thumb when processing DICOM CT (Pixel Padding Value) images? MR images? X-Ray images?

Being the author of this DCMTK tool, I should be able to answer your questions.
I am quite confident that --use-window 1 is the right option when the input DICOM instance provides a single value for (0028,1050) Window Center & (0028,1051) Window Width.
This is at least true if the stored VOI window is correct (i.e. appropriate for the stored pixel data).
What I fail to understand is what is the "right" option to choose from (when WC/WW is missing):
As a rule of thumb, --min-max-window (compute the VOI window using the min-max algorithm) usually gives good results. If there are extreme values in the image (very low and/or very high pixel values) that do not belong to the medical content, they can be ignored: option --min-max-window-n ignores the minimum and maximum values (and uses the next higher and next lower values in the pixel data, respectively, as the borders of the VOI window), while option --histogram-window [n] computes a histogram of the pixel values used in the image and ignores "n" percent of both the low and the high values (when computing the VOI window).
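To make the difference between the three options concrete, here is a minimal Python sketch of the behaviour described above. It is an illustration only, not DCMTK's actual implementation: the function names are made up, the fallback for images with fewer than three distinct values is an assumption, and DCMTK's exact counting for the histogram percentage may differ.

```python
def min_max_window(pixels):
    """Window borders spanning the full range of pixel values."""
    return min(pixels), max(pixels)


def min_max_window_n(pixels):
    """Like min_max_window, but ignore the single lowest and highest values."""
    distinct = sorted(set(pixels))
    if len(distinct) < 3:
        # Assumption: nothing sensible to skip, so fall back to plain min/max.
        return distinct[0], distinct[-1]
    return distinct[1], distinct[-2]


def histogram_window(pixels, percent):
    """Ignore `percent` percent of the pixels at both the low and the high end."""
    ordered = sorted(pixels)
    skip = int(len(ordered) * percent / 100)
    skip = min(skip, (len(ordered) - 1) // 2)  # never skip past the middle
    return ordered[skip], ordered[-1 - skip]


def to_8bit(value, low, high):
    """Linearly map a pixel value to 0..255, clamping outside the window."""
    if high <= low:
        return 0
    scaled = (value - low) / (high - low)
    return int(255 * min(1.0, max(0.0, scaled)))


# Hypothetical CT-like data: one padding value, one metal artefact, tissue in between.
ct_pixels = [-2000] + list(range(0, 100)) + [4095]
print(min_max_window(ct_pixels))       # the outliers dominate the window
print(min_max_window_n(ct_pixels))     # the two extreme values are ignored
print(histogram_window(ct_pixels, 5))  # 5 percent trimmed from each end
```

With data like this, the plain min-max window is stretched by the padding value and the artefact, which is the situation where the other two options help.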

Related

OpenCV: How to use free scaling parameter (alpha) when dealing with getOptimalNewCameraMatrix and stereoRectify?

How to use free scaling parameter (alpha) when dealing with getOptimalNewCameraMatrix and stereoRectify : should one use the same value ?
As far as I understand it, a few things that led me to this question are worth listing:
In getOptimalNewCameraMatrix, OpenCV doc says "alpha Free scaling parameter between 0 (when all the pixels in the undistorted image are valid) and 1 (when all the source image pixels are retained in the undistorted image)" [sounds to me like 1 = retain source pixels = minimize loss]
In stereoRectify, OpenCV doc says "alpha Free scaling parameter.... alpha=0 means that ... (no black areas after rectification). alpha=1 means that ... (no source image pixels are lost)"
So in the end, alpha seems to be a parameter that may "act" the same way in both? (1 = no source pixel lost, it sounds like, but I am not sure here)
As far as I understand it, after calibrateCamera, one may want to call getOptimalNewCameraMatrix (computing new matrices as outputs) and then stereoRectify (using the newly computed matrices as inputs): does one want to use the same alpha?
Are these 2 alphas the same? Or does one want to use 2 different alphas?
The alphas are the same.
The choice of value depends entirely on the application. Ask yourself:
Does the application need to see all the input pixels to do its job (because, for example, it must use all the "extra" FOV near the image edges, or because you know that the scene's subject that's of interest to the application may be near the edges and you can't lose even a pixel of it)?
Yes: choose alpha=1
No: choose a value of alpha that keeps the "interesting" portion of the image predictably inside the undistorted image.
In the latter case (again, depending on the application) you may need to compute the boundary of the undistorted image within the input one. This is just a poly-curve that can be approximated by a polygon to any level of accuracy you need, down to the pixel. Or you can use a mask.

Scaling images before doing conversion or vice versa?

I wonder which one among methods below should preserve more details of images:
Down scaling BGRA images and then converting them to NV12/YV12.
Converting BGRA images to NV12/YV12 images and then down scaling them.
Thanks for your recommendation.
Updated 2020-02-04:
To make my question clearer, I want to describe a little more.
The images come from a video stream like this:
Video Stream
-> decoded to YV12.
-> converted to BGRA.
-> stamped texts.
-> scaling down (or YV12/NV12).
-> YV12/NV12 (or scaling down).
-> H264 encoder.
-> video stream.
The whole sequence of tasks ranges from 300 to 500ms.
The issue I have is that the text stamped over the images does not look clear after conversion and scaling. I wonder about the order of items 4 and 5: 4 then 5, or 5 then 4?
Noting that the RGB data is very likely to be non-linear (e.g. in an sRGB format) ideally you need to
Convert from the non-linear "R'G'B'" data to linear RGB (Note this needs higher bit precision per channel) (see function spec on wikipedia)
Apply your downscaling filter
Convert the linear result back to non-linear R'G'B' (ie. sRGB)
Convert this to YCbCr/NV12
Ideally you should always do filtering/blending/shading in linear space. To give you an intuitive justification for this, the average of black (0) and white (255) in linear colour space will be ~128 but in sRGB this mid grey is represented as (IIRC) 186. If you thus do your maths in sRGB space, your result will look unnaturally dark/murky.
(If you are in a hurry, you can sometimes get away with just using squaring (and sqrt()) as a kludge/hack to convert from sRGB to linear (and vice versa))
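As a rough illustration of the black/white averaging example, the sketch below uses the standard piecewise sRGB transfer function on 8-bit values. The helper names are made up for this example. With the exact curve the linear-space average lands in the high 180s, close to the value quoted from memory above, while the naive average stays around 128.

```python
def srgb_to_linear(c8):
    """Convert an 8-bit sRGB channel value to linear light in 0..1."""
    c = c8 / 255.0
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(lin):
    """Convert linear light in 0..1 back to an 8-bit sRGB channel value."""
    if lin <= 0.0031308:
        c = 12.92 * lin
    else:
        c = 1.055 * lin ** (1 / 2.4) - 0.055
    return round(255 * c)


def average_naive(a, b):
    """Average two 8-bit sRGB values directly (the 'murky' result)."""
    return round((a + b) / 2)


def average_linear(a, b):
    """Average two 8-bit sRGB values in linear light, then re-encode."""
    return linear_to_srgb((srgb_to_linear(a) + srgb_to_linear(b)) / 2)


print(average_naive(0, 255), average_linear(0, 255))
```

A downscaling filter is just a weighted average of neighbouring pixels, so the same difference shows up wherever dark and bright pixels get mixed, for example along the edges of stamped text.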
For avoiding two phases of spatial interpolation the following order is recommended:
Convert RGBA to YUV444 (YCbCr) without resizing.
Resize Y channel to your destination resolution.
Resize U (Cb) and V (Cr) channels to half resolution in each axis.
The result format is YUV420 in the resolution of the output image.
Pack the data as NV12 (NV12 is YUV420 in specific data ordering).
It is possible to do the resize and NV12 packing in a single pass (if efficiency is a concern).
In case you don't do the conversion to YUV444, U and V channels are going to be interpolated twice:
First interpolation when downscaling RGBA.
Second interpolation when U and V are downscaled by half when converting to 420 format.
When downscaling the image, it's recommended to blur it first (sometimes referred to as an "anti-aliasing" filter).
Remark: since the eye is less sensitive to chromatic resolution, you are probably not going to see any visible difference (unless image has fine resolution graphics like colored text).
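The 444-to-420 step and the NV12 packing can be sketched in plain Python as follows. This is a simplified illustration under several assumptions: it uses the full-range BT.601 (JFIF-style) matrix, whereas a video pipeline may need limited range or BT.709; it averages each 2x2 block as the chroma filter; it expects even image dimensions; and it leaves out the resizing of the Y channel entirely. The function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 conversion of one RGB pixel (assumed matrix)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def clamp8(value):
    """Round to the nearest integer and clamp to the 0..255 byte range."""
    return max(0, min(255, round(value)))


def rgb_to_nv12(rgb_rows):
    """Pack rows of (r, g, b) tuples as NV12: Y plane, then interleaved Cb/Cr."""
    height, width = len(rgb_rows), len(rgb_rows[0])
    # Step 1: full-resolution YCbCr (YUV444), no chroma interpolation yet.
    yuv444 = [[rgb_to_ycbcr(*px) for px in row] for row in rgb_rows]
    y_plane = bytes(clamp8(px[0]) for row in yuv444 for px in row)
    # Step 2: chroma is reduced once, by averaging each 2x2 block.
    uv_plane = bytearray()
    for top in range(0, height, 2):
        for left in range(0, width, 2):
            block = [yuv444[top + dy][left + dx] for dy in (0, 1) for dx in (0, 1)]
            uv_plane.append(clamp8(sum(px[1] for px in block) / 4))  # Cb (U)
            uv_plane.append(clamp8(sum(px[2] for px in block) / 4))  # Cr (V)
    return y_plane + bytes(uv_plane)


frame = [[(128, 128, 128)] * 4 for _ in range(4)]
print(len(rgb_to_nv12(frame)))  # width * height * 3 // 2 bytes
```

A real implementation would fold the destination resize into the same pass, with the chroma planes sampled at half the destination resolution.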
Remarks:
Simon's answer is more accurate in terms of color accuracy.
In most cases you are not going to see the difference.
The gamma information is lost when converting to NV12.
Update: Regarding "Text stamped over the images after converted and scaled looks not so clear":
In case getting clear text is the main issue, the following stages are suggested:
Downscale BGRA.
Stamp text (using smaller font).
Convert to NV12.
Downsampling an image with stamped text is going to result in unclear text.
A better solution is to stamp the text with a smaller font after downscaling.
Modern fonts use vector graphics rather than raster graphics, so stamping text with a smaller font gives a better result than downscaling an image with stamped text.
NV12 format is YUV420, the U and V channels are downscaled by a factor of x2 in each axis, so the text quality will be lower compared to RGB or YUV444 format.
Encoding image with text is also going to damage the text.
For subtitles the solution is attaching the subtitles in a separate stream, and adding the text after decoding the video.

How to choose the number of bins when creating HSV histogram?

I was reading some documentation about HSV histogram, and in several refs the Saturation channel was quantized into 256 values. Why is that? Is there any reason behind choosing this number?
I have the same questions for the Hue channel, often it is quantized into 180 values.
Disclaimer: Off-hand answers (i.e., not backed up by any documentation):
"256" is a popular number of bins because Programmers Like Round Numbers -- 256 distinct values fit in a single byte. And "180" because the HSB circle spans 360 degrees, but 360 distinct values do not fit into a single byte.
For many image formats, the range of RGB values is limited to 0..255 per channel -- 3 bytes in total. To store the same amount of data (ignoring any artifacts of converting to another color model), Saturation and Brightness are often expressed in single bytes as well. The same could be done for Hue, by scaling the original range of 0..359 (as Hue is usually expressed as a value in degrees on the HSB Color Wheel) into the byte range 0..255. However, probably because it's easier to do calculations with a number close to the original 360° full circle, the range is halved to 0..179. That way the value can be stored in a single byte (and thus "HSB" uses as much memory as "RGB") and can be converted trivially back to (close to) its original value -- multiply by 2. Obviously, sticking to the storage space wins over fidelity.
Given 256 values for both S and B, and 180 for H, you end up with a color space of 256*256*180 = 11,796,480 colors. To inspect the number of colors, you build a histogram: an array where you can read out the total amount of pixels in a certain color or color range. Using a color range here, instead of actual values, significantly cuts down the memory requirements.
For an RGB color image, with the colors fairly evenly distributed, you could shift down each channel a certain number of bits. This is how a straightforward conversion from 24-bit "true-color" RGB down to 15-bit RGB "high-color" space works: each channel gets divided by 8, reducing 256 values down to 32 (5 bits per channel). Conversion to a 16-bit high-color RGB space works the same; the bit that got left over in the 15-bit conversion is assigned to green. Thus, the range of colors for green is doubled, which is useful since the human eye is more perceptive for shades of green than for the other two primaries.
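The bit-shifting described above looks like this in Python. The function names and the bit layout (red in the high bits, blue in the low bits) are assumptions for illustration; actual formats may order the channels differently.

```python
def rgb888_to_rgb555(r, g, b):
    """Keep the top 5 bits of each 8-bit channel (divide by 8) and pack them."""
    return ((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3)


def rgb888_to_rgb565(r, g, b):
    """Same idea, but green keeps 6 bits (divide by 4) and uses the spare bit."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


print(hex(rgb888_to_rgb555(255, 255, 255)))  # 15 bits set
print(hex(rgb888_to_rgb565(255, 255, 255)))  # 16 bits set
```

Shifting right by 3 is the same as dividing by 8, which is exactly the kind of fixed, evenly spaced binning a histogram with 32 bins per channel would use.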
It gets more complicated when the colors in the input image are not evenly distributed. A naive solution is to create an array of [256][256][256], initialize all to zero, then fill the array with the colors of the image, and finally sort them. There are better alternatives -- let me consult my old Computer Graphics [1] here. Hold on.
13.4 Reproducing Color mentions the names of two different approaches from Heckbert (Color Image Quantization for Frame Buffer Display, SIGGRAPH 82): the popularity and the median-cut algorithms. (Unfortunately, that's all they say about this topic. I assume efficient code for both can be googled for.)
A rough guess:
The size for each bin (H,S,B) should be reflected by what you are trying to use it for. This older SO question, for example, uses a large bin for hue -- color is considered the most important -- and only 3 different values for both saturation and brightness. Thus, bright images with some subdued areas (say, a comic book) will give a good spread in this histogram, but a real-color photograph will not so much.
The main limit is that the bin counts, multiplied with each other, should use a reasonably small amount of memory, yet cover enough of each component to get evenly filled. Perhaps some trial-and-error comes into play here. You could initially distribute all of the H, S, and B components evenly over the available memory in your histogram and process a small part of the image; say, 1 out of 4 pixels, horizontally and vertically. If you notice that the bins of one component fill up too fast while others stay untouched, adjust the ranges and restart.
If you need to analyze multiple pictures, make sure they are all alike in their color gamut. You cannot expect a reasonable bin size to work on all sorts of images; you would end up with an even distribution, where all matches are only so-so.
[1] Computer Graphics: Principles and Practice. (1997) J.D. Foley, A. van Dam, S.K. Feiner, and J.F. Hughes, 2nd ed., Reading, MA: Addison-Wesley.
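A coarse HSV histogram with adjustable bin counts can be sketched with the standard library alone. The default of 8 hue bins and 3 bins each for saturation and brightness is only an example in the spirit of the question mentioned above, and the function names and the `step` sampling parameter are made up for this sketch.

```python
import colorsys
from collections import Counter


def hsv_bin(r, g, b, h_bins=8, s_bins=3, v_bins=3):
    """Map an 8-bit RGB pixel to a (hue, saturation, value) bin index triple."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)  # each in 0..1

    def quantize(x, bins):
        # A value of exactly 1.0 belongs in the last bin, not one past it.
        return min(int(x * bins), bins - 1)

    return quantize(h, h_bins), quantize(s, s_bins), quantize(v, v_bins)


def hsv_histogram(pixels, step=1, **bins):
    """Count pixels per bin, looking only at every `step`-th pixel."""
    return Counter(hsv_bin(*px, **bins) for px in pixels[::step])


pixels = [(255, 0, 0)] * 3 + [(0, 0, 255)]
print(hsv_histogram(pixels))
```

Re-running `hsv_histogram` with different `h_bins`, `s_bins`, and `v_bins` on a sparse sample (for example `step=4`) is one cheap way to do the trial-and-error described above before committing to a bin layout.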

direction on image pattern description and representation

I have a basic question regarding pattern learning, or pattern representation. Assume I have a complex pattern of this form, could you please provide me with some research directions or concepts that I can follow to learn how to represent (mathematically describe) these forms of patterns? in general the pattern does not have a closed contour nor it can be represented with analytical objects like boxes, circles etc.
By mathematically describe I'm assuming you mean derive from the image a vector of values that represents the content of the image. In computer vision/image processing we call this an "image descriptor".
There are several image descriptors that could be applied to pixel based data of the form you showed, which appear to be 1 value per pixel i.e. greyscale images.
One approach is to perform "spatial gridding" where you divide the image up into a regular grid of a constant size e.g. a 4x4 grid. You then average the pixel values within each cell of the grid. Then concatenate these values to form a 16 element vector - this coarsely describes the pixel distribution of the image.
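A minimal sketch of this gridding descriptor, assuming the image is a list of rows of grey values and is at least as large as the grid (the function name is made up):

```python
def grid_descriptor(image, rows=4, cols=4):
    """Average the pixel values in each cell of a rows x cols grid."""
    height, width = len(image), len(image[0])
    descriptor = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            cell = [image[y][x] for y in range(top, bottom) for x in range(left, right)]
            descriptor.append(sum(cell) / len(cell))
    return descriptor  # rows * cols values, read row by row


image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [20, 20, 30, 30],
    [20, 20, 30, 30],
]
print(grid_descriptor(image, rows=2, cols=2))
```

Two images can then be compared by the distance between their descriptor vectors.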
Another approach would be to use "image moments", which are 2D statistical moments. The central moment of order (i, j) is:
$$\mu_{ij} = \sum_{x=0}^{W-1} \sum_{y=0}^{H-1} (x - \mu_x)^i \, (y - \mu_y)^j \, f(x, y)$$
where f(x,y) is the pixel value at coordinates (x,y), W and H are the image width and height, and mu_x and mu_y are the average x and y. The values i and j select the order of moment you want to compute. Moments of various orders can be combined in different ways; for example, the "Hu moments" are 7 numbers computed from combinations of image moments.
The cool thing about the Hu moments is that you can translate, scale, or rotate the image and still get the same 7 values (flipping only changes the sign of the seventh), which makes this a robust image descriptor. Strictly speaking this is invariance to translation, scale, and rotation rather than to general affine transforms.
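As a small illustration, the sketch below computes central moments as described (with mu_x and mu_y taken as the intensity-weighted average coordinates), normalizes them for scale, and forms the first two of the seven Hu invariants. It assumes a grey-value image given as a list of rows with a non-zero pixel sum; the function names are made up, and the remaining five invariants are left out for brevity.

```python
def central_moment(image, i, j):
    """Sum of (x - mu_x)^i * (y - mu_y)^j * f(x, y) over all pixels."""
    total = sum(sum(row) for row in image)
    mu_x = sum(x * v for row in image for x, v in enumerate(row)) / total
    mu_y = sum(y * v for y, row in enumerate(image) for v in row) / total
    return sum(
        (x - mu_x) ** i * (y - mu_y) ** j * v
        for y, row in enumerate(image)
        for x, v in enumerate(row)
    )


def normalized_moment(image, i, j):
    """Scale-normalized central moment used by the Hu invariants."""
    return central_moment(image, i, j) / central_moment(image, 0, 0) ** (1 + (i + j) / 2)


def hu_first_two(image):
    """First two Hu invariants, built from second-order normalized moments."""
    n20 = normalized_moment(image, 2, 0)
    n02 = normalized_moment(image, 0, 2)
    n11 = normalized_moment(image, 1, 1)
    return n20 + n02, (n20 - n02) ** 2 + 4 * n11 ** 2


def rotate90(image):
    """Rotate the image by 90 degrees (rows become columns)."""
    return [list(row) for row in zip(*image[::-1])]


image = [[0, 1, 2], [3, 4, 0], [0, 0, 5]]
print(hu_first_two(image))
print(hu_first_two(rotate90(image)))  # same values up to rounding error
```

For real images a library routine (OpenCV provides moment and Hu-moment functions) is the more practical choice; the point here is only to show how the invariants are assembled.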
Hope this helps as a general direction to read more in.

Differences between gamma correction and exposure in image processing

Anyone know what is the difference between gamma and exposure? And what is the difference between gamma correction and exposure adjustment in image processing?
Since you don't have an image processing background, I will start with the basics.
1) Every digital image has a dynamic range of gray levels. Gray levels are simply values that ultimately correspond to a color. A monochrome (black-and-white) image has only 2 gray levels, 0 and 1, where 0 means black and 1 means white. Here the dynamic range is [0, 1], and each pixel is stored as a single bit.
Similarly, gray-scale images contain shades of gray. Here each pixel is stored in 8 bits, so the dynamic range is [0, 255]. How? Just apply the formula 2^n - 1, where n is the number of bits: 2^8 - 1 = 256 - 1 = 255.
There are also color images, which are 24-bit images. In general, the dynamic range of gray levels in an image is given by [0, L-1], where L is the number of gray levels.
2) Now that you understand what dynamic range is, let's look at gamma correction. Gamma correction is a function that compresses the dynamic range of an image so that we can view it properly. But why do we need to compress the dynamic range? A good everyday example is that we cannot see the stars during the day, because the intensity of the sun is so much larger than that of the stars. Similarly, when the dynamic range of an image is higher than that of the display device, we cannot see the image properly. Therefore we can use gamma correction to compress the dynamic range of the image.
3) Gamma correction can be written as g(x,y) = c * f(x,y) ^ γ, where γ is gamma, f(x,y) is the original image with high dynamic range, g(x,y) is the modified image, and c is a positive constant.
4) Exposure, as said in an earlier answer, is a phenomenon in the camera. I don't know much about it, as it is not covered in the syllabus of the image processing course I am currently studying.
Gamma correction is a non-linear global function that compresses certain ranges in your image. It is mainly used to make fixed-point encoding more efficient from the point of view of human vision. It is absent in raw files, but exists in JPEG. Each pixel undergoes the following transformation:
y = x^p
Exposure is a physical phenomenon in your camera. Exposure adjustment, on the other hand, is a linear global function. It is used mainly to compensate for a lack or excess of exposure in the camera:
y = a*x
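The two formulas can be compared directly on pixel values normalized to 0..1. In this sketch the function names are made up, and clipping the exposure result at 1.0 is an added assumption to model a display or file format that cannot go above white.

```python
def gamma_correct(x, p):
    """Non-linear: y = x^p. Black (0) and white (1) stay where they are."""
    return x ** p


def adjust_exposure(x, a):
    """Linear: y = a*x, clipped at white (the clipping is an assumption)."""
    return min(1.0, a * x)


for x in (0.0, 0.25, 0.5, 1.0):
    print(x, gamma_correct(x, 0.5), adjust_exposure(x, 2.0))
```

Both operations brighten the mid-tones here, but gamma keeps the end points fixed and redistributes the values in between, while the exposure gain scales everything by the same factor and pushes bright values into clipping.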
Exposure is an indication of the total quantity of light that reaches the CCD of your camera (or the silver ions on film). It can be expressed as the number of photons that hit your image-recording elements.
Films and CCD are calibrated to expect a certain quantity of light (certain number of photons) in order to be able to create an "average" image.
The higher the "expected" quantity of light, the lower the ISO number of your film (or camera setting) => in order to obtain a normal image, a film (or camera setting) of 100 ISO needs more light than a film of 3200 ISO, hence the use of 3200 ISO films for night photography.
Next step: the camera. When you want to take a picture (= have photons hit your CCD or film), you need to open the diaphragm of your camera. Depending on how much you open your diaphragm, the nature of your image will change (speaking from an artistic point of view here). If your diaphragm is wide open, most of the image which is not perfectly in focus will be blurred (e.g. as used in portrait photography). Conversely, if your diaphragm is only a little bit open during exposure, most of your image will be very sharp. This is used very often for landscape photography.
As your film (or CCD) expects a certain quantity of light for a given ISO value, it follows that a smaller diaphragm opening requires longer exposure times, whereas a wide open diaphragm requires a very short time.
Good books about this subject are the series "The Camera", "The Negative" and "The Print" by Ansel Adams.
Conclusion: exposure and gamma correction are different things.
- Exposure is a part of the parameters you need to control while creating your initial image through the use of a camera.
- Gamma correction is related to subsequent manipulation of your image file. I'm not sure if the notion of "gamma correction" is being used in the context of film.
Basically:
Gamma is a monitor thing.
Exposure is a camera thing.
