How to read a bitmap in OCaml? - image-processing

I want to read a bitmap file (from the file system) using OCaml and store the pixels (the colors) in an array that has the dimensions of the bitmap, with each pixel taking one cell of the array.
I found the function Graphics.dump_image : image -> color array array,
but it doesn't read from a file.

CamlImages should do it. There is also a Debian package (libcamlimages-ocaml-dev), as well as an installation through GODI, if you use that to manage your OCaml packages.
As a useful example of reading and manipulating images in OCaml, I suggest looking over the code for a seam removal algorithm over at eigenclass.
You can also, as jonathan mentioned (though not very clearly), call C functions from OCaml, such as those of ImageMagick. However, you would have to do a lot of manipulation of the image data to bring the image into OCaml. You could instead write all the image-manipulating functions in C and treat the image as an abstract data type, but that seems to be the opposite of what you want: most of the program would be written in C, not OCaml.
I recently wanted to play around with CamlImages myself (and had some trouble installing it: I had to modify two of the .ml files to fix compilation errors, very simple ones though), so here is a quick program, black_and_white.ml, and how to compile it. This should get someone painlessly started with the package (especially with dynamic image generation):
let () =
  let width = int_of_string Sys.argv.(1)
  and length = int_of_string Sys.argv.(2)
  and name = Sys.argv.(3)
  and black = {Color.Rgb.r = 0; g=0; b=0; }
  and white = {Color.Rgb.r = 255; g=255; b=255; } in
  let image = Rgb24.make width length black in
  for i = 0 to width-1 do
    for j = 0 to (length/2) - 1 do
      Rgb24.set image i j white;
    done;
  done;
  Png.save name [] (Images.Rgb24 image)
And to compile,
ocamlopt.opt -I /usr/local/lib/ocaml/camlimages/ ci_core.cmxa graphics.cmxa ci_graphics.cmxa ci_png.cmxa black_and_white.ml -o black_and_white
And to run,
./black_and_white 20 20 test1.png

I don't know of an out-of-the-box way to do it. You could open the file with open_in_bin (open_in may translate line endings on some platforms) and read it one byte at a time with input_char, reading in the header and the data and building up the color array array that way. That is feasible for simple formats (e.g. BMP), but for anything like JPEG or PNG a roll-your-own solution would probably be more work than you want to get into.
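To give a feel for how much work the roll-your-own route is for a simple format, here is a minimal sketch in Python rather than OCaml. It assumes an uncompressed 24-bit BMP with a standard 40-byte info header; the function names read_bmp24 and parse_bmp24 are made up for this illustration, and palettes, RLE compression and other bit depths are not handled.

```python
import struct

def parse_bmp24(data):
    """Return pixels[y][x] = (r, g, b) from the bytes of an uncompressed 24-bit BMP."""
    # File header (14 bytes): magic "BM", file size, two reserved words, pixel data offset.
    magic, _size, _res1, _res2, offset = struct.unpack_from("<2sIHHI", data, 0)
    if magic != b"BM":
        raise ValueError("not a BMP file")
    # Start of the info header: header size, width, height, planes, bits per pixel, compression.
    _hdr, width, height, _planes, bpp, compression = struct.unpack_from("<IiiHHI", data, 14)
    if bpp != 24 or compression != 0:
        raise ValueError("this sketch only handles uncompressed 24-bit BMPs")
    top_down = height < 0              # a negative height means rows are stored top to bottom
    height = abs(height)
    row_size = (width * 3 + 3) & ~3    # each row is padded to a multiple of 4 bytes
    rows = []
    for y in range(height):
        start = offset + y * row_size
        row = []
        for x in range(width):
            b, g, r = data[start + 3 * x : start + 3 * x + 3]  # BMP stores B, G, R
            row.append((r, g, b))
        rows.append(row)
    if not top_down:
        rows.reverse()                 # the default storage order is bottom-up
    return rows

def read_bmp24(path):
    """Read a BMP file from disk and parse it (hypothetical helper for this sketch)."""
    with open(path, "rb") as f:
        return parse_bmp24(f.read())
```

The same steps (unpack the two headers, skip to the pixel offset, walk the padded rows) carry over to OCaml with open_in_bin and input_char or really_input.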

You could also use one of the numerous SDL bindings for OCaml, specifically the SDL_image ones, which let you load all kinds of images easily and provide functions to access individual pixels and the raw data as an array.
OCamlSDL is a popular one.

If you don't want to use CamlImages, raw RGB or PNM/PPM images (which have an easy-to-create header followed by the RGB values) are commonly used. ImageMagick then lets you view these formats or convert them into more usable ones.
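As a rough illustration of how little there is to the binary PPM (P6) format, here is a small Python sketch that writes and reads one. The names write_ppm and read_ppm are hypothetical, and the reader assumes a maxval of 255 and a header without comment lines.

```python
def write_ppm(path, pixels):
    """Write pixels[y][x] = (r, g, b), values 0..255, as a binary PPM (P6)."""
    height, width = len(pixels), len(pixels[0])
    with open(path, "wb") as f:
        f.write(b"P6\n%d %d\n255\n" % (width, height))   # magic, size, maxval
        for row in pixels:
            for r, g, b in row:
                f.write(bytes((r, g, b)))

def read_ppm(path):
    """Read a binary PPM (P6) back into pixels[y][x] = (r, g, b)."""
    with open(path, "rb") as f:
        data = f.read()
    tokens, pos = [], 0
    while len(tokens) < 4:                 # magic, width, height, maxval
        while data[pos:pos + 1].isspace():
            pos += 1
        end = pos
        while end < len(data) and not data[end:end + 1].isspace():
            end += 1
        tokens.append(data[pos:end])
        pos = end
    pos += 1                               # exactly one whitespace byte follows maxval
    if tokens[0] != b"P6" or int(tokens[3]) != 255:
        raise ValueError("this sketch only handles P6 files with maxval 255")
    width, height = int(tokens[1]), int(tokens[2])
    pixels = []
    for y in range(height):
        row = []
        for x in range(width):
            i = pos + 3 * (y * width + x)
            row.append(tuple(data[i:i + 3]))
        pixels.append(row)
    return pixels
```

The header is parsed token by token instead of with a generic split, because the first pixel bytes may themselves look like whitespace.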

Related

Mathematical Operations on an Image Stack in ImageJ (Fiji)

I am writing an ImageJ/Fiji plugin in Jython using the PyDev plugin in Eclipse. The plugin will be the ImageJ version of an existing denoising program called CANDLE, which is written in MATLAB. Changing the value of every pixel (voxel) of an image in MATLAB is trivial:
InputImage = 2 * sqrt(InputImage + (3/8));
Median3DFilteredImage = 2 * sqrt(Median3DFiltered + (3/8));
Here "InputImage" and "Median3DFilteredImage" are 3D matrices, with the last dimension being time (slices). To reproduce this operation on an ImageJ image, I had to use two for loops: one to iterate through the image slices (the third dimension) and another to iterate over all the pixels in a particular slice:
# Imports assumed for the names used below (not shown in the original snippet)
from ij import ImagePlus, ImageStack
from java.lang import Math as javaMath

medFiltStack = medianFilteredImage.getStack()
newMedFiltStack = ImageStack(medianFilteredImage.width, medianFilteredImage.height)
InputStack = InputImage.getStack()
newInputStack = ImageStack(InputImage.width, InputImage.height)
# Slice indices in an ImageStack are 1-based
for i in xrange(1, medianFilteredImage.getNSlices() + 1):
    ip = medFiltStack.getProcessor(i).convertToFloat()
    ip2 = InputStack.getProcessor(i).convertToFloat()
    pixels = ip.getPixels()
    pixels2 = ip2.getPixels()
    # Transform every pixel of the current slice in place
    for j in xrange(len(pixels)):
        pixels[j] = 2 * javaMath.sqrt(pixels[j] + (3.0/8.0))
        pixels2[j] = 2 * javaMath.sqrt(pixels2[j] + (3.0/8.0))
    newMedFiltStack.addSlice(ip)
    newInputStack.addSlice(ip2)
medianFilteredImage = ImagePlus("MedianFiltered-Image", newMedFiltStack)
InputImage = ImagePlus("Input-Image", newInputStack)
My question is as follows: is there a way to perform mathematical operations on an image stack, i.e. on every pixel (voxel) in the stack, without writing code that explicitly visits every pixel in every slice with for loops? It seems a very primitive way of going about it, and I am wondering whether there is a better way of doing this operation. I also had to work with copies and then give the new images the same names as before, as opposed to working with the original images and editing them directly. So is there a way to edit the pixel values of the original images rather than copies? Any help would be appreciated, as I have plenty more math operations to perform. It would be very useful to find a way to do mathematical operations on images that is optimal both in the amount of code and, if possible, in speed.
In pure ImageJ 1.x, the answer is no: there is no other way than to visit every slice and get its ImageProcessor. That is how ImageJ1 deals with its limited number of dimensions (z, time, channel): you always have a (hyper)stack of 2D planes.
There is, however, a more powerful way of dealing with n-dimensional images called ImgLib, which is included in Fiji together with ImageJ2.
To avoid reinventing the wheel, you should have a look at Jean-Yves Tinevez's great plugin, Image Expression Parser. Use it headlessly with Fiji, or just have a look at its source code (it uses a previous version, ImgLib1, but the idea is the same: you avoid hard-coding the dimensions by using Java generics). See e.g. the sqrt function:
public final <R extends RealType<R>> float evaluate(final R alpha) {
    return (float) Math.sqrt(alpha.getRealDouble());
}
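The idea of writing the per-value function once and letting a generic helper walk the dimensions can be illustrated outside ImageJ. The following is a pure-Python sketch, not ImgLib code: map_voxels and sqrt_transform are hypothetical names, and the image is assumed to be a nested list of numbers of any depth (2D plane, 3D stack, and so on).

```python
import math

def map_voxels(image, f):
    """Apply f to every number in a nested list, whatever its number of dimensions."""
    if isinstance(image, list):
        return [map_voxels(sub, f) for sub in image]
    return f(image)

def sqrt_transform(x):
    """The per-voxel operation from the question: 2 * sqrt(x + 3/8)."""
    return 2.0 * math.sqrt(x + 3.0 / 8.0)

# A tiny "stack" of two slices, each one row of two pixels.
stack = [[[0.625, 3.625]], [[8.625, 15.625]]]
transformed = map_voxels(stack, sqrt_transform)
```

The caller never mentions slices or rows, and the same helper works unchanged for a single 2D plane. The loops still exist, of course; they have only moved into the helper.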

OpenCV imwrite increases the size of png image

I am doing image manipulation on PNG images and have the following problem. After saving an image with the imwrite() function, the size of the image increases. For example, an image that was 847 KB becomes 1.20 MB after saving. Here is the code. I just read an image and then save it, but the size increases. I tried to set the compression params, but it doesn't help.
Mat image;
image = imread("5.png", -1);
vector<int> compression_params;
compression_params.push_back(CV_IMWRITE_PNG_COMPRESSION);
compression_params.push_back(9);
compression_params.push_back(0);
imwrite("output.png",image,compression_params);
What could be the problem? Any help, please.
Thanks.
PNG has several options that influence the compression: the deflate compression level (0-9), the deflate strategy (HUFFMAN/FILTERED), and the choice of (or strategy for dynamically choosing) the internal prediction error filter (AVERAGE, PAETH...).
It seems OpenCV only lets you change the first one, and it does not have a good default value for the second. So it seems you must live with that.
Update: looking into the sources, it seems that a compression strategy setting has been added (after complaints), but it isn't documented. I wonder whether that source has been released. Try setting the option CV_IMWRITE_PNG_STRATEGY to Z_FILTERED and see what happens.
See the linked source code for more details about the params.
@Karmar, it's been many years since your last edit.
I had a similar confusion to yours in June 2021, and I found out something that might benefit others like us.
PNG files have a property that PIL calls the mode. Here, let's focus on only three modes: RGB, P (palette-indexed) and L (8-bit grayscale).
To quickly check an image's mode, you can use Python:
from PIL import Image
print(Image.open("5.png").mode)
Basically, P and L use 8 bits per pixel, while RGB uses 3*8 bits per pixel.
For more detailed explanation, one can refer to this fine stackoverflow post: What is the difference between images in 'P' and 'L' mode in PIL?
Now, when we use OpenCV to open a PNG file with the default flags, we get an array of three channels, regardless of which mode the file was saved in. That is three channels of data type uint8, so when we imwrite this array to a file, no matter how hard we compress it, it will be hard to beat the original file if it was saved in P or L mode.
I guess @Karmar might have already solved this question. For future readers: check the mode of your own 5.png.
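If PIL is not at hand, the same information can be read from the PNG header itself. The sketch below uses only the standard library and relies on the IHDR chunk layout from the PNG specification; the function name png_info is made up for this example. Color type 3 corresponds to PIL's P mode, 0 to grayscale (L at 8 bits) and 2 to RGB.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Samples per pixel for each PNG color type:
# 0 grayscale, 2 truecolor, 3 palette index, 4 grayscale+alpha, 6 truecolor+alpha
SAMPLES = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}

def png_info(data):
    """Return (width, height, bit_depth, color_type, bits_per_pixel) from PNG bytes."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR data starts at byte 16: width, height, bit depth, color type (big-endian).
    width, height, bit_depth, color_type = struct.unpack_from(">IIBB", data, 16)
    return width, height, bit_depth, color_type, bit_depth * SAMPLES[color_type]

def png_info_from_file(path):
    with open(path, "rb") as f:
        return png_info(f.read(32))    # signature plus IHDR fit in the first 32 bytes
```

For an 8-bit palette image this reports 8 bits per pixel, against 24 for the three-channel array that gets written back, which is the threefold difference in raw data described above.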

How do I choose a pixel format when creating a new Texture2D?

I'm using the SharpDX Toolkit, and I'm trying to create a Texture2D programmatically, so I can manually specify all the pixel values. And I'm not sure what pixel format to create it with.
SharpDX doesn't even document the toolkit's PixelFormat type (they have documentation for another PixelFormat class but it's for WIC, not the toolkit). I did find the DirectX enum it wraps, DXGI_FORMAT, but its documentation doesn't give any useful guidance on how I would choose a format.
I'm used to plain old 32-bit bitmap formats with 8 bits per color channel plus 8-bit alpha, which is plenty good enough for me. So I'm guessing the simplest choices will be R8G8B8A8 or B8G8R8A8. Does it matter which I choose? Will they both be fully supported on all hardware?
And even once I've chosen one of those, I then need to further specify whether it's SInt, SNorm, Typeless, UInt, UNorm, or UNormSRgb. I don't need the sRGB colorspace. I don't understand what Typeless is supposed to be for. UInt seems like the simplest -- just a plain old unsigned byte -- but it turns out it doesn't work; I don't get an error, but my texture won't draw anything to the screen. UNorm works, but there's nothing in the documentation that explains why UInt doesn't. So now I'm paranoid that UNorm might not work on some other video card.
Here's the code I've got, if anyone wants to see it. Download the SharpDX full package, open the SharpDXToolkitSamples project, go to the SpriteBatchAndFont.WinRTXaml project, open the SpriteBatchAndFontGame class, and add code where indicated:
// Add new field to the class:
private Texture2D _newTexture;
// Add at the end of the LoadContent method:
_newTexture = Texture2D.New(GraphicsDevice, 8, 8, PixelFormat.R8G8B8A8.UNorm);
var colorData = new Color[_newTexture.Width*_newTexture.Height];
_newTexture.GetData(colorData);
for (var i = 0; i < colorData.Length; ++i)
    colorData[i] = (i%3 == 0) ? Color.Red : Color.Transparent;
_newTexture.SetData(colorData);
// Add inside the Draw method, just before the call to spriteBatch.End():
spriteBatch.Draw(_newTexture, new Vector2(0, 0), Color.White);
This draws a small rectangle with diagonal lines in the top left of the screen. It works on the laptop I'm testing it on, but I have no idea how to know whether that means it's going to work everywhere, nor do I have any idea whether it's going to be the most performant.
What pixel format should I use to make sure my app will work on all hardware, and to get the best performance?
The formats in the SharpDX Toolkit map to the underlying DirectX/DXGI formats, so you can, as usual with Microsoft products, get your info from the MSDN:
DXGI_FORMAT enumeration (Windows)
32-bit textures are a common choice for most texture scenarios and perform well on older hardware. UNorm means, as already answered in the comments, "unsigned normalized": each channel is stored as an unsigned integer and read as a value in the range 0.0 .. 1.0. It is, again, a common way to access color data in textures.
If you look at the Hardware Support for Direct3D 10Level9 Formats (Windows) page, you will see that DXGI_FORMAT_R8G8B8A8_UNORM as well as DXGI_FORMAT_B8G8R8A8_UNORM are supported on DirectX 9 hardware. You will not run into compatibility problems with either of them.
Performance depends on how your device is initialized (RGBA/BGRA?) and on what hardware (= supported DX feature level) and OS you are running your software on. You will have to run your own tests to find out (though for these common and similar formats the difference should be a single-digit percentage at most).
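To make the two points above concrete (what UNorm does to a byte, and how the two formats differ), here is a small Python sketch. It is not SharpDX code; the function names are invented for illustration, and it assumes the usual convention that a DXGI format name lists its channels in memory order.

```python
def unorm8_to_float(byte):
    """How an 8-bit UNorm channel value is seen when sampled: 0..255 maps to 0.0..1.0."""
    return byte / 255.0

def float_to_unorm8(value):
    """Inverse mapping: clamp to 0.0..1.0, scale by 255 and round to the nearest integer."""
    clamped = min(max(value, 0.0), 1.0)
    return int(clamped * 255.0 + 0.5)

def pack_r8g8b8a8(r, g, b, a):
    """Bytes of one pixel for an R8G8B8A8 texture."""
    return bytes((r, g, b, a))

def pack_b8g8r8a8(r, g, b, a):
    """Bytes of the same pixel for a B8G8R8A8 texture: red and blue swap places."""
    return bytes((b, g, r, a))

opaque_red_rgba = pack_r8g8b8a8(255, 0, 0, 255)
opaque_red_bgra = pack_b8g8r8a8(255, 0, 0, 255)
```

The only difference between the two formats is the byte order within each pixel, so the choice mainly matters for whoever fills the buffer: the pixel data has to be written in the order the chosen format expects.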

How to save a CV_32F type cv::Mat to a file without losing precision?

I'm using the cv::PCA class for a face recognition project. I convert photos of faces to one-row vectors, concatenate them into one big array and feed it to the PCA to obtain a new space in which I can try to use distance for recognition. The problem is that calculating the PCA from scratch each time I start the program is really time-consuming (almost five minutes). I figured that I need to save the calculated PCA to the hard drive and load it when I start the program again. And here is the problem. As far as I can see, all cv::Mat objects in cv::PCA are of type CV_32F. When I try to save one as a normal picture, it is converted to an 8-bit image and some data is lost. When I use XML/YAML persistence, the generated file is really big, and data is also lost (I saved it, loaded it into another structure and ran cerr<<sum(pca_orginal.mean==pca_loaded.mean)[0]<<endl to check how big the difference is). Right now I'm trying to use std::ofstream::write with the std::ofstream::binary flag, and istream::read, but there are some type issues: out.write(_pca.mean.data,_pca.mean.rows*_pca.mean.cols*4/*CV_32F->4*CV_8U*/); generates error: no matching function for call to ‘std::basic_ofstream<char, std::char_traits<char> >::write(uchar*&, int)’. I've also heard about the OpenEXR library and its file format, but I would rather avoid using additional libraries. I'm using OpenCV 2.3.1 and OpenCV 2.2.
edit:
I'm sorry for the confusion. I misread the description of cv::Mat operator== and thought that it works the opposite way from how it does, so sum(pca_orginal.mean==pca_loaded.mean)[0] giving 0 is the worst possible result, not the best. It means that XML/YML works fine apart from generating huge files. Also, after using C-style casting I was able to make the binary streams work, but the files generated are also big (over 150MB).
In the C interface, there are functions cvSave and cvLoad for saving arbitrary matrices. There are probably C++ interface counterparts, too.
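For comparison, the raw binary approach from the question can be sketched in a few lines of Python using only the standard library. This is not OpenCV code: save_matrix and load_matrix are hypothetical names, and the file layout (two little-endian 32-bit integers for rows and columns, followed by the row-major float32 values) is an assumption made for this example.

```python
import struct

def save_matrix(path, rows):
    """Write a matrix (list of equal-length rows) as raw little-endian float32."""
    n_rows, n_cols = len(rows), len(rows[0])
    with open(path, "wb") as f:
        f.write(struct.pack("<II", n_rows, n_cols))          # 8-byte header
        for row in rows:
            f.write(struct.pack("<%df" % n_cols, *row))      # 4 bytes per value

def load_matrix(path):
    """Read back a matrix written by save_matrix."""
    with open(path, "rb") as f:
        n_rows, n_cols = struct.unpack("<II", f.read(8))
        return [list(struct.unpack("<%df" % n_cols, f.read(4 * n_cols)))
                for _ in range(n_rows)]
```

Values that are already float32 survive the round trip bit for bit, so no precision is lost. The file size is 8 + 4 * rows * cols bytes, so a large eigenvector matrix still produces a large file, just a much smaller one than a text format such as XML or YAML would.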

Using existing tools, how can I extract into separate images the Luma, Cb, Cr channels of a JPEG image?

I am seeking a method to extract into separate images the Luma (Y), Cb (blue component), Cr (red component), channels of the JPEG images:
Seattle Police Department image #1
Seattle Police Department image #2
Seattle Police Department image #3
I would like results equivalent to this example from Wikipedia.
The output must be calculated directly from the JPEG Start-of-Scan (SOS) data and other data in the JPEG, rather than 'back calculated' from the RGB images output by a decompressor. The purpose of this task is to produce images which represent the 'raw data' as closely as possible.
Are there existing tools which can accomplish this? I am considering throwing together something using Python, PyImage, etc. but I am surprised my search for open source or free tools has come up empty. I am aware there are many libraries which could help, although I am open to becoming aware of more libraries.
For this question, the correct answer would be a tool chain of free and/or open-source tools which can do the job. Tools with source are preferred. These tools can run on any platform, but Linux or Win32 would be immediately useful.
Answer inspired by codelogic
Given the libjpeg implementation, change djpeg.c and wrppm.c.
wrppm.c:
189: case JCS_RGB:
190: + case JCS_YCbCr:
191: /* emit header for raw PPM format */
djpeg.c
560: case FMT_PPM:
561: + cinfo.quantize_colors = 0;
562: + cinfo.out_color_space = JCS_YCbCr;
Obviously, this is a quick hack, because I have a private copy where PPM output is always forced to YCbCr, but it works, and I thank you, codelogic, for your Stone Code Logic.
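With a djpeg patched this way, the resulting PPM should hold Y, Cb and Cr where R, G and B would normally be. Splitting that into three grayscale images is then simple. The Python sketch below makes that assumption about the patched output; split_planes and pgm_bytes are hypothetical helper names, and the input is the raw pixel data that follows the PPM header.

```python
def split_planes(interleaved, width, height):
    """Split interleaved 3-byte pixels (assumed Y, Cb, Cr) into three planes."""
    if len(interleaved) != width * height * 3:
        raise ValueError("pixel data does not match the given dimensions")
    # Every third byte, starting at offsets 0, 1 and 2
    return interleaved[0::3], interleaved[1::3], interleaved[2::3]

def pgm_bytes(plane, width, height):
    """Wrap one plane in a binary PGM (P5) header so image viewers can open it."""
    return b"P5\n%d %d\n255\n" % (width, height) + plane

def write_planes(interleaved, width, height, prefix):
    """Write prefix_Y.pgm, prefix_Cb.pgm and prefix_Cr.pgm."""
    planes = split_planes(interleaved, width, height)
    for name, plane in zip(("Y", "Cb", "Cr"), planes):
        with open("%s_%s.pgm" % (prefix, name), "wb") as f:
            f.write(pgm_bytes(plane, width, height))
```

Each output is a plain grayscale rendering of one channel. The colored Cb and Cr images in the Wikipedia example would need an additional false-color mapping, which is not attempted here.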
As suggested, your best bet would be to use libjpeg directly. Specifically, you might be able to set the out_color_space member of jpeg_decompress_struct to JCS_YCbCr instead of JCS_RGB and read the scanlines as usual. Here's some sample code (GPL).
Well, the obvious one is libjpeg.

Resources