RAW Image file - what format is my data in? - image-processing

I'm working on processing .raw image files, but I'm not sure how the image is being stored. Each pixel is an unsigned 16-bit value, with typical values ranging from 0 to about 1000 (in integer form). This isn't enough bits for hex values, and it's not RGB (0-255), so I'm not quite sure what it is.
Bonus: if you have any idea on how to convert this to grayscale in OpenCV (or just mathematically) that would be a huge help too.

The name RAW comes from the fact that the values stored in the file are not pixel RGB values, but the raw values that were measured from the camera itself. The values have meaning only if you know how the camera works. There are some standards, but really, you should just consider RAW to be a collection of poorly defined, undocumented, proprietary formats that probably won't intuitively match any idea you have about how images are stored.
Check out DCRaw -- it's the code that nearly every program that supports RAW uses:
https://www.dechifro.org/dcraw/
The author reverse-engineered and implemented nearly every proprietary RAW format -- and keeps it up to date.

The other answers are correct: RAW is not a standard, it's shorthand. Camera CCDs often do not record separate red, green and blue values for each pixel. Instead, they use what's called a Bayer pattern and save only the pixel values for that pattern. You will then need to convert that pattern to RGB values.
Also, for the bonus question, if you are simply trying to convert an RGB image to grayscale, or something like that, you can either use the matrix operators or call cvtColor (convertTo only changes the element type and scaling, not the number of channels).
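For the bonus question, here is a minimal sketch of the purely mathematical route. It assumes the file is headerless, row-major and single-channel, with each sample already a brightness value; none of that is confirmed by the question, and a Bayer mosaic would need demosaicing first. The function name raw16_to_gray8 and its parameters are made up for illustration.

```python
import struct


def raw16_to_gray8(data, width, height, little_endian=True):
    """Unpack width*height unsigned 16-bit samples and stretch them to 0-255.

    Assumption: no header, one sample per pixel, rows stored top to bottom.
    """
    count = width * height
    fmt = ("<" if little_endian else ">") + f"{count}H"
    samples = struct.unpack(fmt, data[:count * 2])
    peak = max(samples) or 1  # avoid dividing by zero on an all-black frame
    # Linear stretch so the brightest sample in the frame maps to 255.
    flat = [s * 255 // peak for s in samples]
    # Return a list of rows so that gray[y][x] addresses one pixel.
    return [flat[y * width:(y + 1) * width] for y in range(height)]
```

Stretching by the frame's own maximum suits data that only reaches about 1000; dividing by a fixed sensor maximum instead would keep brightness comparable between frames.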

Forgot what the R/G/B of 16-bit was:
"there can be 5 bits for red, 6 bits for green, and 5 bits for blue"
http://en.wikipedia.org/wiki/Color_depth#16-bit_direct_color
Seen it used in game code before.
Complete shot in the dark though, since there are also proprietary RAW formats.
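If the data did turn out to be 5-6-5 packed colour, unpacking it would look roughly like this. This is only a sketch under that assumption (red in the top bits, blue in the bottom bits); the function name unpack_rgb565 is made up for illustration.

```python
def unpack_rgb565(value):
    """Split a 16-bit 5-6-5 value into 8-bit (r, g, b).

    Assumption: red occupies the top 5 bits, green the middle 6, blue the low 5.
    """
    r5 = (value >> 11) & 0x1F
    g6 = (value >> 5) & 0x3F
    b5 = value & 0x1F
    # Replicate the high bits into the low bits so full scale maps to 255.
    r8 = (r5 << 3) | (r5 >> 2)
    g8 = (g6 << 2) | (g6 >> 4)
    b8 = (b5 << 3) | (b5 >> 2)
    return r8, g8, b8
```

Note that values capped around 1000 would leave the red field almost always zero under this layout, which is one quick way to check whether the guess fits the data.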

Related

How to get color depth of DICOM pixel data in reliable way?

A DICOM file may have either uncompressed pixel data or compressed pixel data. Its PhotometricInterpretation (0028,0004) can be MONOCHROME1/MONOCHROME2/RGB/PALETTE COLOR/YBR etc. There is also a Pydicom page about color space.
But from these pages or any other DICOM websites, it is not clear to me how to get the color depth.
Does either the BitsAllocated (0028,0100) or the BitsStored (0028,0101) tag refer to color depth? Can the color depth be different from these two tag values?
How to get color depth of DICOM pixel data in reliable way?
Bits Stored is the number of bits that is used for the actual color or grayscale data, so it is at least related to the color depth. Bits Allocated is always a multiple of 8, as the data is always organized in bytes, where some of the upper bits may not be used for data (with the exception of Bit Data, where it is 1).
Getting the bit depth is not as straightforward as it may seem. While the number of bits used for the data can mostly be defined, the resolution of the data (i.e. the distance between adjacent values) may also depend on the Photometric Interpretation, and of course on the resolution provided by the modality itself.
The easiest case is monochrome data (Photometric Interpretation is MONOCHROME1 or MONOCHROME2), where the color depth is directly defined by Bits Stored, with typical values being 12, 14 or 16. The same is mostly true for RGB data (i.e. data originally recorded as RGB), and while it is true that Bits Stored can have different values for JPEG2000 encoded images, as correctly mentioned by @kritzel_sw, I have yet to see any RGB data with Bits Stored different from 8. Update: I still haven't seen this, but found that RTDOSE images can have 32 Bits Stored.
For color data in the YBR color space (Photometric Interpretation is YBR_xxx) this is less clear. It somewhat depends on your definition of color depth. The color space used is YBR instead of RGB, and the number of bits used for each component may be different (for example in YBR_FULL_422, which is used for some JPEG compressed images, 2 channels are downsampled). The resulting image, if converted into RGB (which is what is mostly done), uses 8 bits for each color component, but the actual number of possible values is less than 256 for that reason. So if your definition of color depth depends on the number of bits used per RGB channel, the answer would probably be 8 in this case, but if you define the color depth per YBR channel, the answer could be different and depends both on the Photometric Interpretation and Bits Stored.
A special case is the PhotometricInterpretation of PALETTE COLOR, where the possible colors are defined in the color table. In this case, the number of colors per color component is defined in the first value of the Palette Color Lookup Table Descriptor (0028,1101-1104), which is equal for all 3 tables (e.g. for the Red, Green and Blue components). The actual color depth has to be derived from that value.
Given all that, the answer is probably: it depends. I'll also add the note by @kritzel_sw that many of the IODs significantly limit the degrees of freedom in how pixel data is encoded, which will narrow down the possibilities for the color depth for any concrete type of image.
I'm interested if anybody has a more straightforward answer.
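The case analysis above can be sketched as a small helper. This is not a definitive rule, just the answer's reasoning written down; the function name describe_depth and the shape of its return value are made up for illustration. With pydicom, the inputs would presumably come from dataset attributes such as ds.PhotometricInterpretation, ds.BitsAllocated, ds.BitsStored, ds.SamplesPerPixel and ds.RedPaletteColorLookupTableDescriptor.

```python
def describe_depth(photometric, bits_allocated, bits_stored,
                   samples_per_pixel=1, palette_descriptor=None):
    """Summarise what the header values say about colour depth.

    palette_descriptor is the three-value Palette Color Lookup Table
    Descriptor; it is only needed for PALETTE COLOR images.
    """
    if photometric == "PALETTE COLOR":
        if palette_descriptor is None:
            raise ValueError("PALETTE COLOR needs a lookup table descriptor")
        # First descriptor value: number of table entries (0 encodes 65536).
        entries = palette_descriptor[0] or 65536
        return {"kind": "palette", "palette_entries": entries}

    if photometric in ("MONOCHROME1", "MONOCHROME2"):
        kind = "grayscale"
    elif photometric == "RGB":
        kind = "rgb"
    elif photometric.startswith("YBR"):
        # Depth here is per YBR component, not per RGB channel after conversion.
        kind = "ybr"
    else:
        kind = "unknown"

    return {
        "kind": kind,
        "samples_per_pixel": samples_per_pixel,
        "bits_per_sample": bits_stored,
        "levels_per_sample": 2 ** bits_stored,
        "unused_bits": bits_allocated - bits_stored,
    }
```

For a typical CT slice (MONOCHROME2, 16 bits allocated, 12 stored) this reports 4096 levels per sample and 4 unused bits.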

Understanding NetPBM's PNM nonlinear RGB color space for converting to grayscale

I am trying to understand how to properly work with the RGB values found in PNM formats in order to eventually convert them to grayscale.
Researching the subject, it appears that if the RGB values are nonlinear, then I would need to first convert them to a linear RGB color space, apply my weights, and then convert them back to the same nonlinear color space.
There appears to be an expected format http://netpbm.sourceforge.net/doc/ppm.html:
In the raster, the sample values are "nonlinear." They are proportional to the intensity of the ITU-R Recommendation BT.709 red, green, and blue in the pixel, adjusted by the BT.709 gamma transfer function.
So I take it these values are nonlinear, but not sRGB. I found some thread topics around ImageMagick that say they might save them as linear RGB values.
Am I correct that PNM specifies a standard, but various editors like Photoshop or GIMP may or may not follow it?
From http://netpbm.sourceforge.net/doc/pamrecolor.html
When you use this option, the input and output images are not true Netpbm images, because the Netpbm image format specifies a particular color space. Instead, you are using a variation on the format in which the sample values in the raster have different meaning. Many programs that ostensibly use Netpbm images actually use a variation with a different color space. For example, GIMP uses sRGB internally and if you have GIMP generate a Netpbm image file, it really generates a variation of the format that uses sRGB.
Elsewhere I see this at http://netpbm.sourceforge.net/doc/pgm.html:
Each gray value is a number proportional to the intensity of the
pixel, adjusted by the ITU-R Recommendation BT.709 gamma transfer
function. (That transfer function specifies a gamma number of 2.2 and
has a linear section for small intensities). A value of zero is
therefore black. A value of Maxval represents CIE D65 white and the
most intense value in the image and any other image to which the image
might be compared.
BT.709's range of channel values (16-240) is irrelevant to PGM.
Note that a common variation from the PGM format is to have the gray
value be "linear," i.e. as specified above except without the gamma
adjustment. pnmgamma takes such a PGM variant as input and produces a
true PGM as output.
Most sources out there assume they are dealing with linear RGB and just apply their weights and save, possibly not preserving the luminance. I assume that any compliant renderer will assume that these RGB values are gamma compressed, and will therefore technically display different grayscale "colors" than what I had specified. Is this correct? Or, to ask it differently, does it matter? I know it is a loaded question, but if I can't really tell whether the data is linear or nonlinear, or how it has been compressed or is expected to be compressed, will image processing algorithms (binarization) be greatly affected if I just assume linear RGB values?
There may have been some confusion with my question, so I would like to answer it now that I have researched the situation much further.
To make a long story short: it appears that no one really bothers to re-encode an image's gamma when saving to PNM format. Because of that, since almost everything is sRGB, it will stay sRGB as opposed to the technically correct BT.709, as per the spec.
I reached out to Bryan Henderson of NetPBM. He held the same belief and stated that the method of gamma compression is not as important as knowing whether it was applied or not, and that we should always assume it is applied when working with PNM color formats.
To reaffirm the effect of that opinion in regard to image processing, please read "Color-to-Grayscale: Does the Method Matter in Image Recognition?", 2012, by Kanan and Cottrell. Basically, if you calculate the mean of the RGB values you will end up in one of three situations: Gleam, Intensity', or Intensity. After comparing the effects of different grayscale conversion formulas, taking into account when and how gamma correction was applied, they found that Gleam and Intensity' were the best performers. They differ only in when the gamma correction is applied (Gleam has the gamma correction on the input RGB values, while Intensity' takes in linear RGB and applies gamma afterwards). Sadly you drop from 1st and 2nd place down to 8th when no gamma correction is applied, aka Intensity. It's interesting to note that it was the simple mean formula that worked best, not one of the more popular grayscale formulas most people tout. All of that to say that if you use the mean formula for converting PNM color to grayscale for image processing applications, you should get good performance, since we can assume some gamma compression will have been applied. My comment about ImageMagick and linear values appears to apply only to their PGM format.
I hope that helps!
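The three mean-based variants described above can be sketched as follows. This assumes channel values normalised to [0, 1] and a plain power-law gamma of 1/2.2, which is a simplification (neither the sRGB nor the BT.709 curve is a pure power law); the function names are made up for illustration.

```python
GAMMA = 2.2  # assumed simple power-law gamma, not the exact sRGB/BT.709 curve


def gamma_encode(value):
    """Compress a linear value in [0, 1] with the assumed power law."""
    return value ** (1.0 / GAMMA)


def intensity(r, g, b):
    """Mean of linear RGB, with no gamma correction anywhere."""
    return (r + g + b) / 3.0


def intensity_prime(r, g, b):
    """Mean of linear RGB, gamma corrected afterwards."""
    return gamma_encode(intensity(r, g, b))


def gleam(r, g, b):
    """Mean of the individually gamma-corrected channels."""
    return (gamma_encode(r) + gamma_encode(g) + gamma_encode(b)) / 3.0
```

If the PNM samples are already gamma compressed, as the answer argues they usually are, then simply averaging the stored R, G, B values gives you Gleam without any extra work.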
There is only one good way to convert a colour signal to greyscale: go to linear space and add the light (that is, the colour intensities). That way you are working with actual light, so you can calculate the brightness. Then you can "gamma" correct the value. This is how light behaves (in linear space), and how brightness was measured by the CIE (by wavelength).
In television it is standard to build luma (and thus black-and-white images) from non-linear R, G, B. This is done for simplicity and because of the way analog colour television (NTSC and PAL) worked: the black-and-white signal (for BW televisions) was the main signal, and colour was added (as a subcarrier) to the BW image. For this reason, the calculations are done in non-linear space.
Video often uses such factors (in non-linear space) because they are much quicker to calculate, and you can do it easily with integers (there are special matrices for use with integers).
For edge detection algorithms, it should not matter much which method you use: we have difficulty detecting edges between regions with similar L or Y', so we do not care if computers have a similar problem.
Note: our eyes respond non-linearly to light intensity, with a gamma similar to that of the phosphors in old televisions. For this reason, using gamma-corrected values is useful: it compresses the information in a near-optimal way (or, in the analog TV past, it reduced perceived noise).
So if you want Y', compute it from non-linear R', G', B'. But if you need real greyscale, you need to calculate it by going to linear space.
You may see differences especially in mid-greys, and in purple or yellow, where two of R, G, B are nearly the same (and are the largest of the three).
But in photography programs there are many different algorithms for converting RGB to greyscale: we do not see the world in greyscale, so different weights (possibly non-linear) can help bring out some part of the image, which is the purpose of greyscale photos (removing distracting colours).
Note that Rec.709 never specified the gamma correction to apply (the OETF in the standard is not useful here; we need the EOTF, and often one is not the inverse of the other, for practical reasons). Only a later recommendation finally provided this missing information. But because many people speak about Rec.709, the inverse of the OETF is often used as the gamma, which is incorrect.
How to detect which one you have: take a classic yellow sun on a blue sky, choosing a yellow and a blue with the same L. If you can see the sun in the grey image, you are transforming in non-linear space (the Y' values are not equal). If you do not see the sun, you are transforming linearly.
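The sun-and-sky check can be tried numerically. The sketch below assumes the samples were encoded with the BT.709 transfer function quoted from the Netpbm docs and linearises with its inverse; as noted above, a display EOTF would give somewhat different numbers. The Rec.709 weights are standard, while the function names and the two test colours are made up for illustration.

```python
WEIGHTS = (0.2126, 0.7152, 0.0722)  # Rec.709 luminance weights for R, G, B


def bt709_oetf(linear):
    """BT.709 transfer function: linear light in [0, 1] to a coded value."""
    if linear < 0.018:
        return 4.5 * linear
    return 1.099 * linear ** 0.45 - 0.099


def bt709_oetf_inverse(coded):
    """Inverse of bt709_oetf, used here to get back to linear light."""
    if coded < 0.081:
        return coded / 4.5
    return ((coded + 0.099) / 1.099) ** (1 / 0.45)


def luma(rp, gp, bp):
    """Y': weighted sum taken directly on the non-linear values."""
    return sum(w * c for w, c in zip(WEIGHTS, (rp, gp, bp)))


def grey_via_linear(rp, gp, bp):
    """Linearise, take the weighted sum of light, then re-encode."""
    linear = [bt709_oetf_inverse(c) for c in (rp, gp, bp)]
    y = sum(w * c for w, c in zip(WEIGHTS, linear))
    return bt709_oetf(y)


# A blue sky and a yellow sun chosen to have the same linear luminance.
sky = (0.0, 0.0, 1.0)
sun_linear = WEIGHTS[2] / (WEIGHTS[0] + WEIGHTS[1])
sun = (bt709_oetf(sun_linear), bt709_oetf(sun_linear), 0.0)

print("linear route:", grey_via_linear(*sky), grey_via_linear(*sun))
print("luma route:  ", luma(*sky), luma(*sun))
```

The linear route gives the two colours the same grey, so the sun disappears, while the luma route gives the sun a clearly brighter grey than the sky.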

How can I write a histogram-like kernel filter for CoreImage?

In the docs for Kernel Routine Rules, it says 'A kernel routine computes an output pixel by using an inverse mapping back to the corresponding pixels of the input images. Although you can express most pixel computations this way—some more naturally than others—there are some image processing operations for which this is difficult, if not impossible. For example, computing a histogram is difficult to describe as an inverse mapping to the source image.'
However, Apple obviously is doing it somehow, because they do have a CIAreaHistogram Core Image filter that does just that.
I can see one theoretical way to do it with the given limitations:
Let's say you wanted a 256 element red-channel histogram...
You have a 256x1 pixel output image. The kernel function gets called for each of those 256 pixels. The kernel function would have to read EVERY PIXEL IN THE ENTIRE IMAGE each time it's called, checking if that pixel's red value matches that bucket and incrementing a counter. When it's processed every pixel in the entire image for that output pixel, it divides by the total number of pixels and sets that output pixel value to that calculated value. The problem is, assuming it actually works, this is horribly inefficient, since every input pixel is accessed 256 times, although every output pixel is written only once.
What would be optimal would be a way for the kernel to iterate over every INPUT pixel, and let us update any of the output pixels based on that value. Then the input pixels would each be read only once, and the output pixels would be read and written a total of (input width)x(input height) times altogether.
Does anyone know of any way to get this kind of filter working? Obviously there's a filter available from Apple for doing a histogram, but I need it for doing a more limited form of histogram. (For example, a blue histogram limited to samples that have a red value in a given range.)
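To make the two access patterns concrete, here they are as plain CPU code. This is not Core Image kernel code, only a sketch of the logic; the function names are made up, pixels are assumed to be (r, g, b) integer tuples, and raw counts are returned rather than counts divided by the number of pixels.

```python
def histogram_gather(pixels, bins, red_range):
    """Inverse-mapping style: each output bin scans every input pixel."""
    lo, hi = red_range
    hist = []
    for bucket in range(bins):
        count = 0
        for r, g, b in pixels:  # the whole image is read once per bin
            if lo <= r <= hi and b == bucket:
                count += 1
        hist.append(count)
    return hist


def histogram_scatter(pixels, bins, red_range):
    """Forward style: each input pixel is read once and updates one bin."""
    lo, hi = red_range
    hist = [0] * bins
    for r, g, b in pixels:
        if lo <= r <= hi:
            hist[b] += 1
    return hist
```

Both produce the same blue histogram restricted to a red range; the gather version does bins times as many pixel reads, which is the inefficiency described above.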
The issue with this is that custom kernel code in Core Image works like a function which remaps pixel by pixel. You don't actually have a ton of information to go off of except for the pixel that you are currently computing. A custom Core Image filter sort of goes like this (pseudocode):
for i in 1 ... image.width
    for j in 1 ... image.height
        New_Image[i][j] = CustomKernel(Current_Image[i][j])
    end
end
So actually, it's not really plausible to make your own histogram via custom kernels, because you do not have any control over the new image other than in the CustomKernel function. This is actually one of the reasons that CIImageProcessor was created for iOS 10; you would probably have an easier time making a histogram with it (and also producing other cool effects via image processing), and I suggest checking out the WWDC 2016 video on it (Raw images and live images session).
IIRC, if you really want to make a histogram, it is still possible, but you will have to work with the UIImage version, and then convert the resulting image to an RGB image on which you can do the counting, storing the results in bins. I would recommend Simon Gladman's book on this, as he has a chapter devoted to histograms, but there is a lot more that goes into the Core Image default version because they have MUCH more control over the image than we do using the framework.

Client-side conversion of rgb-jpg to 8-bit-jpg using Canvas+HTML5

Many articles show ways of converting JPEG files to grayscale using canvas+html5 on the client side. But what I need is to convert an image to 8-bit grayscale to reduce its size before uploading to my server.
Is it possible to do it using canvas+html5?
The whatwg specification mentions a toBlob method, which is supposed to convert the canvas to a jpeg or png and give you the binary representation. Unfortunately, it isn't widely supported yet.
So all you can do is use getImageData to get an array of the bytes of the raw image data. In this array, every pixel is represented by 4 bytes: red, green, blue and alpha. You can easily calculate the grayscale values from this (gray = (red + green + blue) / 3 * alpha / 255;). But the resulting array will be completely uncompressed, so it will likely be even larger than the original JPEG, even though it only uses 8 bits per pixel. In order to reduce the size, you will have to implement an image compression algorithm yourself. You might consider using the DEFLATE algorithm used by PNG instead of JPEG encoding: it's a lot easier to implement, doesn't introduce further artifacts because it's lossless, and performs pretty well on 8-bit images.
The boilerplate data to turn this compressed data stream into a valid PNG/JPEG file should be added on the server (when you need it).
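The arithmetic and the compression step can be sketched like this. It is written in Python only to show the logic; in the browser the same loop would run over the array returned by getImageData, and a JavaScript DEFLATE implementation would stand in for zlib. The function names are made up for illustration.

```python
import zlib


def rgba_to_gray8(rgba):
    """Collapse RGBA bytes (4 per pixel) into one gray byte per pixel.

    Uses gray = (red + green + blue) / 3 * alpha / 255 in integer arithmetic.
    """
    gray = bytearray(len(rgba) // 4)
    for i in range(len(gray)):
        r, g, b, a = rgba[4 * i:4 * i + 4]
        gray[i] = (r + g + b) * a // (3 * 255)
    return bytes(gray)


def pack_gray(rgba):
    """Gray conversion followed by lossless DEFLATE compression."""
    return zlib.compress(rgba_to_gray8(rgba), 9)
```

On the server, zlib.decompress(payload) would recover the gray bytes, which can then be wrapped in whatever file format is needed; the width and height have to be sent alongside, since the stream itself does not carry them.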

Understanding just what is an image

I suppose the simplest understanding of what a (bitmap) image is would be an array of pixels. After that, it gets pretty technical.
I've been trying to understand the sort of information that an image may provide and have come across a large collection of technical terms like "mipmap", "pitch", "stride", "linear", "depth", as well as other format-specific things.
These seem to pop up across a lot of different formats so it'd probably be useful to understand what purpose they serve in an image. Looking at the DDS, BMP, PNG, TGA, JPG documentations has only made it clear that an image is pretty confusing.
Despite searching around for some hours, I couldn't find any nice tutorial-like breakdown of just what an image is and all of its different properties.
The eventual goal would be to take proprietary image formats and convert them to more common formats like DDS or BMP. Or to make up some image format.
Any good readings?
Even your simplified explanation of an image doesn't encompass all the possibilities. For example an image can be divided by planes, where the red pixel values are all together followed by the green pixel values, followed by the blue pixel values. Such layouts are uncommon but still possible.
Assuming a simple layout of pixels you must still determine the pixel format. You might have a paletted image where some number of bits (1, 4, or 8) will be an index into a palette or color table which will define the RGB color of the pixel along with the transparency of the pixel (one index will typically be reserved as a transparent pixel). Otherwise the pixel will be 3 or 4 bytes depending on whether a transparency or alpha value is included. The order of the values (R,G,B) or (B,G,R) will depend on the format - Windows bitmaps are B,G,R while everything else will most likely be R,G,B.
The stride is the number of bytes between rows of the image. Windows bitmaps for example will take the width of the image times the number of bytes per pixel and round it up to the next multiple of 4 bytes.
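As a rough illustration of how stride and layout turn into byte offsets, here is a small sketch. The rounding rule is the Windows bitmap one described above; the planar layout assumes one byte per sample and no row padding, and all function names are made up for illustration.

```python
def bmp_stride(width, bytes_per_pixel):
    """Row size in bytes, rounded up to the next multiple of 4."""
    return (width * bytes_per_pixel + 3) // 4 * 4


def pixel_offset(x, y, stride, bytes_per_pixel):
    """Byte offset of pixel (x, y) in an interleaved (e.g. B,G,R) buffer,
    where y counts rows in the order they are stored."""
    return y * stride + x * bytes_per_pixel


def planar_offset(x, y, channel, width, height):
    """Byte offset of one sample when each channel is stored as its own plane.

    Assumption: one byte per sample and no padding between rows or planes.
    """
    return channel * width * height + y * width + x
```

For example, a 5-pixel-wide 24-bit row needs 15 bytes of data but occupies 16 bytes, so pixel (2, 1) starts at byte 22 rather than 21.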
I've never heard of DDS, and BMP is only common in the Windows world (and there's a lot more computing in the non-Windows world than you might think). Rather than worry about all of the technical details of this, why not just use an existing toolkit such as ImageMagick, which can already batch convert from dozens of formats to your one common format?
Unless you're doing specialized work where you would need something fancy like HDR (which most image formats don't even support, so most of your sources would not have it in the first place), you're probably best off picking something standard like PNG or JPG. They both have pluses and minuses. You might want to support both of those, depending on the image.
