How to get the color depth of DICOM pixel data in a reliable way? - image-processing

A DICOM file may have either uncompressed or compressed pixel data. Its Photometric Interpretation (0028,0004) can be MONOCHROME1, MONOCHROME2, RGB, PALETTE COLOR, YBR, etc. There is also a pydicom page about color space.
But from these pages, or any other DICOM resources, it is not clear to me how to get the color depth.
Does either the BitsAllocated (0028,0100) or the BitsStored (0028,0101) tag refer to the color depth? Can the color depth be different from both of these tag values?
How can I get the color depth of DICOM pixel data in a reliable way?

Bits Stored is the number of bits used for the actual color or grayscale data, so it is at least related to the color depth. Bits Allocated is always a multiple of 8, as the data is always organized in bytes, where some of the upper bits may not be used for data (with the exception of single-bit data, where Bits Allocated is 1).
Getting the bit depth is not as straightforward as it may seem. While the number of bits used for the data can usually be determined, the resolution of the data (i.e. the distance between adjacent values) may also depend on the Photometric Interpretation, and of course on the resolution provided by the modality itself.
The easiest case is monochrome data (Photometric Interpretation is MONOCHROME1 or MONOCHROME2), where the color depth is directly defined by Bits Stored, with typical values being 12, 14 or 16. The same is mostly true for RGB data (i.e. data originally recorded as RGB), and while it is true that Bits Stored can have different values for JPEG 2000 encoded images, as correctly mentioned by @kritzel_sw, I have yet to see any RGB data with Bits Stored different from 8. Update: I still haven't seen this, but found that RTDOSE images can have 32 Bits Stored.
For color data in the YBR color space (Photometric Interpretation is YBR_xxx) this is less clear, and it somewhat depends on your definition of color depth. Given that the color space used is YBR instead of RGB, and that the number of bits used for each component may be different (for example in YBR_FULL_422, which is used for some JPEG compressed images, two channels are downsampled), the resulting image, if converted into RGB (which is usually done), uses 8 bits for each color component, but the actual number of possible values is less than 256 for that reason. So if your definition of color depth depends on the number of bits used per RGB channel, the answer would probably be 8 in this case, but if you define the color depth per YBR channel, the answer could be different and depends on both the Photometric Interpretation and Bits Stored.
A special case is the Photometric Interpretation of PALETTE COLOR, where the possible colors are defined in the color table. In this case, the number of colors per color component is defined in the first value of the Palette Color Lookup Table Descriptor (0028,1101-1103), which is equal for all three tables (i.e. for the Red, Green and Blue components). The actual color depth has to be derived from that value.
Given all that, the answer is probably: it depends. I'll also add the note by @kritzel_sw that many of the IODs significantly limit the degrees of freedom in how pixel data is encoded, which narrows down the possibilities for the color depth of any concrete type of image.
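To make that concrete, here is a rough pydicom sketch of the heuristic described above (the file name is a placeholder, and the result is only as reliable as the assumptions discussed in this answer):

    import math
    import pydicom

    ds = pydicom.dcmread("image.dcm")  # placeholder path

    pi = ds.PhotometricInterpretation
    if pi in ("MONOCHROME1", "MONOCHROME2"):
        depth = ds.BitsStored                      # grayscale: Bits Stored is effectively the depth
    elif pi == "PALETTE COLOR":
        entries = ds.RedPaletteColorLookupTableDescriptor[0]
        if entries == 0:                           # the standard uses 0 to mean 2**16 entries
            entries = 2 ** 16
        depth = math.ceil(math.log2(entries))      # bits needed to address all palette entries
    else:                                          # RGB / YBR_xxx
        depth = ds.BitsStored                      # per-channel depth, typically 8

    print(pi, depth)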
I'm interested if anybody has a more straightforward answer.

Related

Understanding NetPBM's PNM nonlinear RGB color space for converting to grayscale

I am trying to understand how to properly work with the RGB values found in PNM formats in order to eventually convert them to grayscale.
Researching the subject, it appears that if the RGB values are nonlinear, then I would need to first convert them to a linear RGB color space, apply my weights, and then convert them back to the same nonlinear color space.
There appears to be an expected format http://netpbm.sourceforge.net/doc/ppm.html:
In the raster, the sample values are "nonlinear." They are proportional to the intensity of the ITU-R Recommendation BT.709 red, green, and blue in the pixel, adjusted by the BT.709 gamma transfer function.
So I take it these values are nonlinear, but not sRGB. I found some thread topics around ImageMagick that say they might save them as linear RGB values.
Am I correct that PNM specifies a standard, but various editors like Photoshop or GIMP may or may not follow it?
From http://netpbm.sourceforge.net/doc/pamrecolor.html
When you use this option, the input and output images are not true Netpbm images, because the Netpbm image format specifies a particular color space. Instead, you are using a variation on the format in which the sample values in the raster have different meaning. Many programs that ostensibly use Netpbm images actually use a variation with a different color space. For example, GIMP uses sRGB internally and if you have GIMP generate a Netpbm image file, it really generates a variation of the format that uses sRGB.
Elsewhere I see this (http://netpbm.sourceforge.net/doc/pgm.html):
Each gray value is a number proportional to the intensity of the pixel, adjusted by the ITU-R Recommendation BT.709 gamma transfer function. (That transfer function specifies a gamma number of 2.2 and has a linear section for small intensities). A value of zero is therefore black. A value of Maxval represents CIE D65 white and the most intense value in the image and any other image to which the image might be compared.
BT.709's range of channel values (16-240) is irrelevant to PGM.
Note that a common variation from the PGM format is to have the gray value be "linear," i.e. as specified above except without the gamma adjustment. pnmgamma takes such a PGM variant as input and produces a true PGM as output.
Most sources out there assume they are dealing with linear RGB and just apply their weights and save, possibly not preserving the luminance. I assume that any compliant renderer will assume that these RGB values are gamma compressed... thus technically displaying different grayscale "colors" than what I had specified. Is this correct? Maybe to ask it differently, does it matter? I know it is a loaded question, but if I can't really tell if it is linear or nonlinear, or how it has been compressed or is expected to be compressed, will image processing algorithms (binarization) be greatly affected if I just assume linear RGB values?
There may have been some confusion with my question, so I would like to answer it now that I have researched the situation much further.
To make a long story short... it appears that no one really bothers to re-encode an image's gamma when saving to the PNM format. Because of that, and since almost everything is sRGB, it will stay sRGB as opposed to the technically correct BT.709 called for by the spec.
I reached out to Bryan Henderson of NetPBM. He held the same belief and stated that the method of gamma compression is not as important as knowing whether it was applied or not, and that we should always assume it is applied when working with PNM color formats.
To reaffirm the effect of that opinion with regard to image processing, please read "Color-to-Grayscale: Does the Method Matter in Image Recognition?", 2012, by Kanan and Cottrell. Basically, if you calculate the mean of the RGB values you will end up in one of three situations: Gleam, Intensity', or Intensity. After comparing the effects of different grayscale conversion formulas, taking into account when and how gamma correction was applied, they discovered that Gleam and Intensity' were the best performers. They differ only by when the gamma correction is applied (Gleam applies the gamma correction to the input RGB values, while Intensity' takes in linear RGB and applies gamma afterwards). Sadly you drop from 1st and 2nd place down to 8th when no gamma correction is applied at all, aka Intensity. It's interesting to note that it was the simple mean formula that worked the best, not one of the more popular grayscale formulas most people tout. All of that is to say that if you use the mean formula for converting PNM color to grayscale for image processing applications, you will ensure good performance, since we can assume some gamma compression will have been applied. My comment about ImageMagick and linear values appears to apply only to their PGM format.
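As a rough illustration of the three variants described above (a sketch that assumes a float RGB array in [0, 1] with channels in the last axis, and a plain 1/2.2 power law instead of a full piecewise transfer curve):

    import numpy as np

    GAMMA = 1.0 / 2.2  # simple power-law encoding, used here only for illustration

    def gleam(rgb_linear):
        """Gleam: gamma-correct each channel first, then take the mean."""
        return np.mean(rgb_linear ** GAMMA, axis=-1)

    def intensity_prime(rgb_linear):
        """Intensity': take the mean of the linear channels, then gamma-correct the result."""
        return np.mean(rgb_linear, axis=-1) ** GAMMA

    def intensity(rgb_linear):
        """Intensity: plain mean of the linear channels, no gamma correction at all."""
        return np.mean(rgb_linear, axis=-1)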
I hope that helps!
There is only one good way to convert a colour signal to greyscale: go to linear space and add the light (and so the colour intensities). In this manner you have the effective light, and so you can calculate the brightness. Then you can "gamma" correct the value. This is the way light behaves (linear space), and how brightness was measured by the CIE (by wavelength).
On television it is standard to build luma (and then black and white images) from non-linear R, G, B. This is done for simplicity and because of the way analog colour television (NTSC and PAL) worked: the black and white signal (for BW televisions) is the main signal, and the colours are added (as a subcarrier) to the BW image. For this reason, the calculations are done in non-linear space.
Video often uses such factors (in non-linear space) because it is much quicker to calculate, and you can do it easily with integers (there are special matrices for use with integers).
For edge detection algorithms, it should not be important which method you use: we have difficulty detecting edges between regions with similar L or Y', so we do not care if computers have the same problem.
Note: our eyes are non-linear in detecting light intensities, with a gamma similar to that of the phosphors in our old televisions. For this reason using gamma-corrected values is useful: it compresses the information in an optimal way (or, in the analog-TV past, it reduced perceived noise).
So if you want Y', compute it from the non-linear R', G', B'. But if you need a real greyscale, you need to calculate it by going to linear space.
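A minimal sketch of that linear-space route, assuming the input is a float RGB array in [0, 1] encoded with a plain 1/2.2 power law (substitute the exact transfer function of your data, e.g. the sRGB curve):

    import numpy as np

    def to_linear(c, gamma=2.2):
        return c ** gamma                 # decode to linear light

    def to_nonlinear(c, gamma=2.2):
        return c ** (1.0 / gamma)         # re-encode for display

    def greyscale_linear(rgb):
        """Weighted sum of *linear* R, G, B using the BT.709 luminance weights, then re-encoded."""
        r, g, b = (to_linear(rgb[..., i]) for i in range(3))
        y = 0.2126 * r + 0.7152 * g + 0.0722 * b
        return to_nonlinear(y)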
You may see differences especially in the mid-greys, and in purples or yellows, where two of R, G, B are nearly the same (and are the maximum of the three).
But in photography programs there are many different algorithms to convert RGB to greyscale: we do not see the world in greyscale, so different weights (possibly non-linear) can help to bring out some part of the image, which is the purpose of greyscale photos (removing distracting colours).
Note that Rec. 709 never specified the gamma correction to apply (the OETF in the standard is not useful; we need the EOTF, and often one is not the inverse of the other, for practical reasons). Only in a later recommendation was this missing information finally provided. But because many people speak about Rec. 709, the inverse of the OETF is used as gamma, which is incorrect.
How to detect it: take the classical yellow sun on a blue sky, choosing a yellow and a blue with the same L. If you see the sun in the grey image, you are transforming in non-linear space (Y' is not equal). If you do not see the sun, you are transforming linearly.

OpenCV erosion and dilation on colour images

Erosion on a binary image decreases the white regions, while dilation increases them. I tried the same on colour images using OpenCV and got similar results. I tried to erode/dilate binary JPEG images. Due to lossy compression, the image had intensities in [0,5] and [250,255]. The results I found were interesting: erosion replaces each pixel with the smallest value within the structuring element, while dilation uses the largest value.
In the case of colour images, how are colours considered to be smaller or larger? Does OpenCV indirectly convert the values to gray, look at the intensity and then decide which is larger? Or does it use the mean of the three colours? A third possibility is that it erodes/dilates separately on all three colours (R, G, B). Which one of these methods is used?
These morphological operations are not easy to define for color images, as colors convey vector information (three components) and cannot be compared as smaller/larger.
The common implementations just treat the color planes independently. This has the disadvantage of having no good mathematical justification, and it introduces colors that aren't present in the original image.
Another option is possible, though it seems to be nowhere in use: if you choose one arbitrary reference color, you can dilate/erode by choosing, within the neighborhood considered, the color of the pixel that is farthest from/closest to the chosen one, as sketched below.
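As a toy illustration of that idea, a slow pure-NumPy sketch (the reference colour, window size and Euclidean distance are all arbitrary choices, not an established algorithm):

    import numpy as np

    def dilate_towards(img, ref, radius=1):
        """For each pixel, pick the neighborhood colour farthest from `ref` (use argmin for erosion)."""
        h, w, _ = img.shape
        pad = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
        out = np.empty_like(img)
        for y in range(h):
            for x in range(w):
                window = pad[y:y + 2 * radius + 1, x:x + 2 * radius + 1].reshape(-1, 3)
                dist = np.linalg.norm(window.astype(float) - ref, axis=1)
                out[y, x] = window[np.argmax(dist)]
        return out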
Each of the R, G and B channels is processed separately.
From the manual (emphasis mine):
The function dilates the source image using the specified structuring element that determines the shape of a pixel neighborhood over which the maximum is taken ... The function supports the in-place mode. Dilation can be applied several (iterations) times. In case of multi-channel images, each channel is processed independently.
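A quick way to verify this yourself with the Python bindings (the random image is just a stand-in for real data):

    import numpy as np
    import cv2

    img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)   # stand-in colour image
    kernel = np.ones((3, 3), np.uint8)

    dilated = cv2.dilate(img, kernel)
    per_channel = cv2.merge([cv2.dilate(img[:, :, c], kernel) for c in range(3)])

    print(np.array_equal(dilated, per_channel))   # True: each channel is dilated independently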

How to add a colour palette to b/w image?

I have noticed that in some tile-based game engines, tiles are saved as grayscale or sometimes even black and white, and the colour is then added by storing a 'palette' along with them to apply to certain pixels; however, I've never seen how it knows which pixels.
Just to name a few engines I've seen use this: Notch's Minicraft and the old Pokemon games for the Gameboy. This is what informed me of how a colour palette is used in old games: deconstructulator.
From the little I've seen of people using this technique in tutorials, it uses a form of bit-shifting; however, I'd like to know how that was so efficient that it was next to mandatory on old 8-bit consoles - how it is possible to apply red, green and blue to specific pixels of an image every frame instead of saving the whole coloured image (some pseudo-code would be nice).
The efficient thing about it is that it saves memory. Storing the RGB values directly usually requires 24 bits per pixel (8 bits per channel). With a palette of 256 colors (requiring 256 * 24 bits = 768 bytes), each pixel requires just 8 bits (2^8 = 256 colors). So three times as many pixels can be stored in the same amount of memory (if you don't count the palette itself), but with a limited set of colors, obviously. This used to be supported in hardware so the graphics memory could be used more efficiently too (this is actually still supported in modern PC hardware, but it is almost never used since graphics memory isn't that limited anymore).
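To put numbers on that, a tiny worked example (the 320x200 resolution is only a hypothetical choice):

    width, height = 320, 200                            # hypothetical screen size

    truecolor_bytes = width * height * 3                # 24-bit RGB: 192,000 bytes
    palette_bytes = 256 * 3                             # 256 entries * 24 bits = 768 bytes
    indexed_bytes = width * height + palette_bytes      # 8 bits per pixel + palette: 64,768 bytes

    print(truecolor_bytes, indexed_bytes)               # 192000 vs 64768, roughly a 3x saving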
Some hardware (including the Super Gameboy supported by the first Pokemon games) used more than one hardware palette at a time. Different sets of tiles are mapped to different palettes, but the way tiles are mapped to palettes is very hardware-dependent and often not very straightforward.
The way the bits of the image are stored isn't always so straightforward either. If the pixels are 8 bits, it can be an array of bytes where every byte is simply one pixel (as in the classic VGA mode used in many old DOS games). But the Gameboy, for example, uses 2 bits per pixel, and others used 4, i.e. 2^4 = 16 colors per palette. A common way to arrange the bits in memory is by using bitplanes, that is (in the case of 16-color graphics) storing 4 separate b/w images. The bitplanes can in some cases also be interleaved in different ways, so there is no simple and generic answer to how to decode/encode graphics this way. I guess you have to be more specific about what you want pseudo-code for; a sketch of one common scheme is shown below.
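Since you asked for pseudo-code, here is a rough Python sketch of a Gameboy-style 2-bits-per-pixel tile decoder, assuming the common layout of two bytes per 8-pixel row (one byte per bitplane); real hardware and other consoles differ in the details:

    def decode_2bpp_tile(tile_bytes, palette):
        """tile_bytes: 16 bytes (8 rows x 2 bitplane bytes); palette: 4 (r, g, b) tuples."""
        pixels = []
        for row in range(8):
            lo, hi = tile_bytes[2 * row], tile_bytes[2 * row + 1]
            row_pixels = []
            for bit in range(7, -1, -1):                          # leftmost pixel = highest bit
                index = (((hi >> bit) & 1) << 1) | ((lo >> bit) & 1)
                row_pixels.append(palette[index])                 # palette lookup -> actual colour
            pixels.append(row_pixels)
        return pixels

    # Hypothetical 4-entry greyscale palette; swapping the palette recolours the tile for free.
    palette = [(255, 255, 255), (170, 170, 170), (85, 85, 85), (0, 0, 0)]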
And I'm not sure how any of this applies to Minicraft. Maybe just to how the graphics are stored on disk; it has no significance once the graphics are loaded into graphics memory. (Maybe you have some other feature of Minicraft in mind?)

Understanding just what is an image

I suppose the simplest understanding of what a (bitmap) image is would be an array of pixels. After that, it gets pretty technical.
I've been trying to understand the sort of information that an image may provide and have come across a large collection of technical terms like "mipmap", "pitch", "stride", "linear", "depth", as well as other format-specific things.
These seem to pop up across a lot of different formats, so it'd probably be useful to understand what purpose they serve in an image. Looking at the DDS, BMP, PNG, TGA and JPG documentation has only made it clear that an image is pretty confusing.
Though I searched around for some hours, there wasn't any nice tutorial-like breakdown of just what an image is and all of its different properties.
The eventual goal would be to take proprietary image formats and convert them to more common formats like DDS or BMP. Or to make up some image format.
Any good readings?
Even your simplified explanation of an image doesn't encompass all the possibilities. For example, an image can be divided into planes, where the red pixel values are all together, followed by the green pixel values, followed by the blue pixel values. Such layouts are uncommon but still possible.
Assuming a simple layout of pixels you must still determine the pixel format. You might have a paletted image where some number of bits (1, 4, or 8) will be an index into a palette or color table which will define the RGB color of the pixel along with the transparency of the pixel (one index will typically be reserved as a transparent pixel). Otherwise the pixel will be 3 or 4 bytes depending on whether a transparency or alpha value is included. The order of the values (R,G,B) or (B,G,R) will depend on the format - Windows bitmaps are B,G,R while everything else will most likely be R,G,B.
The stride is the number of bytes between rows of the image. Windows bitmaps, for example, take the width of the image times the number of bytes per pixel and round it up to the next multiple of 4 bytes.
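A minimal sketch of that rounding rule (the pad-to-4-bytes convention is the Windows BMP one mentioned above):

    def bmp_stride(width_px, bytes_per_pixel):
        """Bytes per row, rounded up to the next multiple of 4 (Windows BMP convention)."""
        return ((width_px * bytes_per_pixel + 3) // 4) * 4

    print(bmp_stride(101, 3))   # 101 * 3 = 303 bytes of pixel data, padded to 304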
I've never heard of DDS, and BMP is only common in the Windows world (and there's a lot more computing in the non-Windows world than you might think). Rather than worry about all of the technical details, why not just use an existing toolkit such as ImageMagick, which can already batch-convert dozens of formats to your one common format?
Unless you're doing specialized work where you would need something fancy like HDR (which most image formats don't even support, so most of your sources would not have it in the first place), you're probably best off picking something standard like PNG or JPG. They both have pluses and minuses; you might want to support both of them, depending on the image.

RAW Image file - what format is my data in?

I'm working on processing .raw image files, but I'm not sure how the image is being stored. Each pixel is an unsigned 16-bit value, with typical values ranging from 0 to about 1000 (in integer form). This isn't enough bits for hex values, and it's not RGB (0-255), so I'm not quite sure what it is.
Bonus: if you have any idea on how to convert this to grayscale in OpenCV (or just mathematically) that would be a huge help too.
The name RAW comes from the fact that the values stored in the file are not pixel RGB values, but the raw values that were measured from the camera itself. The values have meaning only if you know how the camera works. There are some standards, but really, you should just consider RAW to be a collection of poorly defined, undocumented, proprietary formats that probably won't intuitively match any idea you have about how images are stored.
Check out DCRaw -- it's the code that nearly every program that supports RAW uses
https://www.dechifro.org/dcraw/
The author reverse-engineered and implemented nearly every proprietary RAW format -- and keeps it up to date.
The other answers are correct: RAW is not a standard, it's shorthand. Camera CCDs often do not have separate red, green and blue sensors for each pixel; instead, they use what's called a Bayer pattern and then only save the pixel values for that pattern. You will then need to convert that pattern to RGB values.
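If your data does turn out to be a Bayer mosaic, OpenCV can demosaic it directly; a rough sketch (the file name, dimensions and the BayerBG pattern constant are all assumptions; the correct COLOR_Bayer* constant depends on which corner of the 2x2 pattern your sensor starts with):

    import numpy as np
    import cv2

    # Assume a tightly packed, little-endian, single-channel uint16 sensor dump.
    raw = np.fromfile("frame.raw", dtype="<u2").reshape(480, 640)   # placeholder name and size

    # Demosaic the Bayer mosaic into a 3-channel BGR image.
    bgr = cv2.cvtColor(raw, cv2.COLOR_BayerBG2BGR)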
Also, for the bonus question: if you are simply trying to convert an RGB image to grayscale, or something like that, you can either use the matrix operators or call convertTo.
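And for scaling the 0..~1000 values down to something displayable, a hedged sketch with the Python bindings (file name, dimensions and little-endian packing are assumptions about your particular .raw dump):

    import numpy as np
    import cv2

    width, height = 640, 480                                             # assumed sensor dimensions
    raw = np.fromfile("frame.raw", dtype="<u2").reshape(height, width)   # assumed packing

    # Scale the ~0..1000 range into 0..255 for display
    # (roughly what convertTo with a scale factor does on the C++ side).
    gray8 = cv2.convertScaleAbs(raw, alpha=255.0 / max(int(raw.max()), 1))

    cv2.imwrite("frame_gray.png", gray8)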
I forgot what the R/G/B layout of 16-bit color was:
"there can be 5 bits for red, 6 bits for green, and 5 bits for blue"
http://en.wikipedia.org/wiki/Color_depth#16-bit_direct_color
I've seen it used in game code before.
It's a complete shot in the dark though, given that there are also proprietary RAW formats.
