How to save a CV_32F cv::Mat to a file without losing precision? - opencv

I'm using the cv::PCA class for a face recognition project. I convert photos of faces to one-row vectors, concatenate them into one big array and feed it to PCA to obtain a new space in which I can try to use distance for recognition. The problem is that calculating the PCA from scratch each time I start the program is really time consuming (almost five minutes). I figured out that I need to save the calculated PCA to the hard drive and load it when I start the program again.
And here is the problem. As far as I can see, all cv::Mat objects in cv::PCA are of type CV_32F. When I try to save one as a normal picture, it's converted to an 8-bit image and some data is lost. When I use XML/YAML persistence, the generated file is really big, and data is also lost (I saved it, loaded it into another structure and ran cerr<<sum(pca_orginal.mean==pca_loaded.mean)[0]<<endl to check how big the difference is).
Right now I'm trying to use std::ofstream::write with the std::ofstream::binary flag, and istream::read, but there are some type issues: out.write(_pca.mean.data,_pca.mean.rows*_pca.mean.cols*4/*CV_32F->4*CV_8U*/); generates error: no matching function for call to ‘std::basic_ofstream<char, std::char_traits<char> >::write(uchar*&, int)’. I've also heard about the OpenEXR library and its file format, but I would rather avoid using additional libraries. I'm using OpenCV 2.3.1 and OpenCV 2.2.
edit:
I'm sorry for the confusion. I misread the description of cv::Mat operator== and thought that it works the opposite way to how it does, so sum(pca_orginal.mean==pca_loaded.mean)[0] giving 0 is the worst possible result, not the best. It means that XML/YML works fine apart from generating huge files. Also, after using C-style casting I was able to make the binary streams work, but the files generated are also big (over 150MB).

In the C interface, there are functions cvSave and cvLoad for saving arbitrary matrices. There are probably C++ interface counterparts, too.
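For the binary-stream route mentioned in the question, here is a minimal sketch using only the standard library. FloatMatrix, saveMatrix and loadMatrix are hypothetical names introduced for illustration; FloatMatrix stands in for a continuous CV_32F matrix. The compile error quoted above comes from passing a uchar* where write expects a const char*, so the sketch casts the buffer pointer explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a continuous CV_32F matrix: row-major floats.
struct FloatMatrix {
    std::int32_t rows = 0;
    std::int32_t cols = 0;
    std::vector<float> data;  // rows * cols elements
};

// Writes the dimensions followed by the raw float bytes.
// Returns false if the file cannot be opened or a write fails.
bool saveMatrix(const std::string& path, const FloatMatrix& m) {
    assert(m.data.size() == static_cast<std::size_t>(m.rows) * m.cols);
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    // ostream::write takes const char*, hence the reinterpret_cast.
    out.write(reinterpret_cast<const char*>(&m.rows), sizeof m.rows);
    out.write(reinterpret_cast<const char*>(&m.cols), sizeof m.cols);
    out.write(reinterpret_cast<const char*>(m.data.data()),
              static_cast<std::streamsize>(m.data.size() * sizeof(float)));
    return static_cast<bool>(out);
}

// Reads back a file produced by saveMatrix on the same platform.
bool loadMatrix(const std::string& path, FloatMatrix& m) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    in.read(reinterpret_cast<char*>(&m.rows), sizeof m.rows);
    in.read(reinterpret_cast<char*>(&m.cols), sizeof m.cols);
    if (!in || m.rows < 0 || m.cols < 0) return false;
    m.data.resize(static_cast<std::size_t>(m.rows) * m.cols);
    in.read(reinterpret_cast<char*>(m.data.data()),
            static_cast<std::streamsize>(m.data.size() * sizeof(float)));
    return static_cast<bool>(in);
}
```

The bytes are copied unchanged, so the floats round-trip exactly, but the file is not portable across machines with a different byte order. The file size is roughly rows * cols * 4 bytes, so a large file reflects the size of the matrices themselves. With a real cv::Mat the same cast would apply to its data pointer, and only if the matrix is continuous (cv::Mat::isContinuous()).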

Related

Caffe mean file creation without database

I run Caffe using an image_data_layer and don't want to create an LMDB or LevelDB for the data, but the compute_image_mean tool only works with LMDB/LevelDB databases.
Is there a simple solution for creating a mean file from a list of files (the same format that image_data_layer is using)?
You may notice that recent models (e.g., googlenet) do not use a mean file the same size as the input image, but rather a 3-vector representing a mean value per image channel. These values are quite "immune" to the specific dataset used (as long as it is large enough and contains "natural images").
So, as long as you are working with natural images you may use the same values as e.g., GoogLenet is using: B=104, G=117, R=123.
The simplest solution is to create an LMDB or LevelDB database of the image set.
The complicated solution is to write a tool similar to compute_image_mean, which takes image inputs, applies the transformations and finds the mean.
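As a rough illustration of the per-channel mean idea above, here is a sketch that averages B, G and R over a set of already decoded images. BgrImage and channelMeans are hypothetical names; decoding the files from the list is assumed to happen elsewhere (for example with an image library), and this is not what compute_image_mean itself does.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One decoded image as interleaved 8-bit B, G, R bytes (hypothetical type;
// in practice the bytes would come from an image decoder).
using BgrImage = std::vector<std::uint8_t>;

// Returns the mean B, G and R values over every pixel of every image.
// Images may have different sizes; each pixel counts equally.
std::array<double, 3> channelMeans(const std::vector<BgrImage>& images) {
    std::array<double, 3> sum = {{0.0, 0.0, 0.0}};
    std::size_t pixelCount = 0;
    for (const BgrImage& img : images) {
        assert(img.size() % 3 == 0);  // three bytes per pixel
        for (std::size_t i = 0; i < img.size(); i += 3) {
            sum[0] += img[i];      // blue
            sum[1] += img[i + 1];  // green
            sum[2] += img[i + 2];  // red
        }
        pixelCount += img.size() / 3;
    }
    if (pixelCount == 0) return sum;  // no pixels: return zeros
    for (double& s : sum) s /= static_cast<double>(pixelCount);
    return sum;
}
```

The three resulting numbers play the same role as the B=104, G=117, R=123 values quoted above.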

Tesseract on iOS - bad results

After spending over 10 hours to compile tesseract using libc++ so it works with OpenCV, I've got issue getting any meaningful results. I'm trying to use it for digit recognition, the image data I'm passing is a small square (50x50) image with either one or no digits in it.
I've tried using both eng and equ tessdata (from Google Code); the results are different, but neither guesses any digit correctly. Using eng data I get '4\n\n' or '\n\n' as a result most of the time (even when there's no digit in the image), with confidence anywhere from 1 to 99.
Using equ data I get '\n\n' with confidence 0-4.
I also tried binarizing the image and the results are more or less the same. I don't think there's a need for it though, since the images are filtered pretty well.
I'm assuming that there's something wrong since the images are pretty easy to recognize compared to even simplest of the example images.
Here's the code:
Initialization:
_tess = new TessBaseAPI();
_tess->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], "eng");
_tess->SetVariable("tessedit_char_whitelist", "0123456789");
_tess->SetVariable("classify_bln_numeric_mode", "1");
Recognition:
char *text = _tess->TesseractRect(imageData, (int)bytes_per_pixel, (int)bytes_per_line, 0, 0, (int)imageSize.width, (int)imageSize.height);
I'm getting no errors. TESSDATA_PREFIX is set properly and I've tried different methods for recognition. imageData looks ok when inspected.
Here are some sample images:
http://imgur.com/a/Kg8ar
Should this work with the regular training data?
Any help is appreciated. It's my first time trying Tesseract out and I could have missed something.
EDIT:
I've found this:
_tess->SetPageSegMode(PSM_SINGLE_CHAR);
I'm assuming it must be used in this situation, tried it but got the same results.
I think Tesseract is a bit overkill for this stuff. You would be better off with a simple neural network, trained explicitly for your images. At my company, we were recently trying to use Tesseract on iOS for an OCR task (scanning utility bills with the camera), but it was too slow and inaccurate for our purposes (scanning took more than 30 seconds on an iPhone 4 at a tremendously low FPS). In the end, I trained a neural network specifically for our target font, and this solution not only beat Tesseract (it could scan stuff flawlessly even on an iPhone 3GS), but also a commercial ABBYY OCR engine, of which the company had given us a sample.
This course's material would be a good start in machine learning.

Unsupported format or combination of formats when using cv::reduce method in OpenCV

I am using OpenCV 2.4.2 and I am trying to take projections of two matrices (tmpl(32x44), subj(32x44)) along rows and columns. I have initialised a result matrix as rowProjectionSubj(subj.rows,1,CV_8UC1). Then I call cv::reduce(subj,rowProjectionSubj,1,CV_REDUCE_SUM,-1);
Why is this complaining about a type mismatch? I have kept the types the same (by keeping dtype=-1 in cv::reduce). I get the tmpl and subj objects by doing cv::imread("image_path",0), i.e. reading the images in as grayscale.
I might not be right, but after I saw this:
http://answers.opencv.org/question/3698/cvreduce-gives-unsupported-format-exception/?answer=3701#post-id-3701
and, with a little experimenting and an old friend called "register math", I realised that adding two 8-bit numbers can produce a carry, so the sum needs a wider register than the inputs, and summing a whole row needs more room still. Any result of reduce should therefore have more space than the source: if the source is 8-bit unsigned, the destination should be at least 16-bit (unsigned or signed), and might as well be 32-bit if it is going to be used for product calculations and the like.
NOTE: The destination type must be EXPLICITLY stated in the cv::reduce method. Please follow my openCV link for further information.
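The overflow argument above can be checked with plain arithmetic. The sketch below (rowProjection is a hypothetical helper, not an OpenCV function) sums each row of an 8-bit image into 32-bit accumulators. For the 32x44 images in the question, a fully white row sums to 44 * 255 = 11220, which does not fit in 8 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sums each row of a row-major 8-bit image into a 32-bit accumulator,
// so the result cannot wrap around the way an 8-bit destination would.
std::vector<std::int32_t> rowProjection(const std::vector<std::uint8_t>& pixels,
                                        std::size_t rows, std::size_t cols) {
    assert(pixels.size() == rows * cols);
    std::vector<std::int32_t> sums(rows, 0);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            sums[r] += pixels[r * cols + c];
        }
    }
    return sums;
}
```

In OpenCV terms this corresponds to giving the destination a wider depth (for example CV_32S) and passing that depth as the last argument of cv::reduce instead of -1.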

Interpolation and Morphing of an image in labview and/or openCV

I am working on an image manipulation problem. I have an overhead projector that projects onto a screen, and I have a camera that takes pictures of that. I can establish a 1:1 correspondence between a subset of projector coordinates and a subset of camera pixels by projecting dots on the screen and finding the centers of mass of the resulting regions on the camera. I thus have a map
proj_x, proj_y <--> cam_x, cam_y for scattered point pairs
My original plan was to regularize this map using the Mathscript function griddata. This would work fine in MATLAB, as follows
[pgridx, pgridy] = meshgrid(allprojxpts, allprojypts)
fitcx = griddata (proj_x, proj_y, cam_x, pgridx, pgridy);
fitcy = griddata (proj_x, proj_y, cam_y, pgridx, pgridy);
and the reverse for the camera to projector mapping
Unfortunately, this code causes Labview to run out of memory on the meshgrid step (the camera is 5 megapixels, which apparently is too much for labview to handle)
I then started looking through openCV, and found the cvRemap function. Unfortunately, this function takes as its starting point a regularized pixel-pixel map like the one I was trying to generate above. However, it made me hope that functions for creating such a map might be available in openCV. I couldn't find it in the openCV 1.0 API (I am stuck with 1.0 for legacy reasons), but I was hoping it's there or that someone has an easy trick.
So my question is one of the following
1) How can I interpolate from scattered points to a grid in openCV; (i.e., given z = f(x,y) for scattered values of x and y, how to fill an image with f(im_x, im_y) ?
2) How can I perform an image transform that maps image 1 to image 2, given that I know a scattered mapping of points in coordinate system 1 to coordinate system 2. This could be implemented either in Labview or OpenCV.
Note: I am tagging this post delaunay, because that's one method of doing a scattered interpolation, but the better tag would be "scattered interpolation"
So this ends up being a specific fix for bugs in Labview 8.5. Nevertheless, since they're poorly documented, and I've spent a day of pain on them, I figure I'll post them so someone else googling this problem will come across it.
1) Meshgrid bombs. I don't know when this was fixed, but it is definitely a bug in 8.5. Solution: use the meshgrid-like function on the interpolation & extrapolation palette instead. Or upgrade to LV2009, which apparently works (thanks Underflow).
2) Griddata is defective in 8.5. This is badly documented. The 8.6 upgrade notes mention a problem with griddata and the "cubic" setting, but it is in fact also a problem with the DEFAULT LINEAR setting. Solutions in descending order of kludginess: 1) pass the 'v4' flag, which does some kind of spline interpolation but does not have bugs. 2) Upgrade to at least version 8.6. 3) Beat the NI engineers with reeds until they document bugs properly.
3) I was able to use the openCV remap function to do the actual transformation from one image to another. I tried just using the matlab built in interp2 vi, but it choked on large arrays and gave me out of memory errors. On the other hand, it is fairly straightforward to map an IMAQ image to an IPL image, so this isn't that bad, except for the addition of the outside library.
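For question 1 above (scattered points to a grid), here is a sketch of one simple alternative: inverse distance weighting. This is a swapped-in technique chosen for brevity, not what griddata or any OpenCV function does, and Sample, idwAt and idwGrid are hypothetical names. It visits every sample for every pixel, so for a 5-megapixel grid the sample count needs to stay small or a spatial index would be needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One scattered measurement: value observed at position (x, y).
struct Sample {
    double x;
    double y;
    double value;
};

// Inverse-distance-weighted estimate at (qx, qy) with weights 1 / d^2.
// A query that coincides with a sample returns that sample's value.
double idwAt(const std::vector<Sample>& samples, double qx, double qy) {
    assert(!samples.empty());
    double weightedSum = 0.0;
    double weightTotal = 0.0;
    for (const Sample& s : samples) {
        const double dx = s.x - qx;
        const double dy = s.y - qy;
        const double d2 = dx * dx + dy * dy;
        if (d2 == 0.0) return s.value;
        const double w = 1.0 / d2;
        weightedSum += w * s.value;
        weightTotal += w;
    }
    return weightedSum / weightTotal;
}

// Fills a row-major width x height grid by evaluating idwAt at each pixel.
std::vector<double> idwGrid(const std::vector<Sample>& samples,
                            std::size_t width, std::size_t height) {
    std::vector<double> grid(width * height);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            grid[y * width + x] =
                idwAt(samples, static_cast<double>(x), static_cast<double>(y));
        }
    }
    return grid;
}
```

Running this once with cam_x as the value and once with cam_y would give two dense grids indexed by projector pixel, which is the kind of regular pixel-to-pixel map the remap step expects.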

.VTX File Format?

I've recently taken the plunge into DirectX and have been messing around a little with Anim8or, and have discovered several file types that models can be exported to that are text based. I've particularly taken to VTX files. I've learned how to parse some basics out of it, but I'm obviously missing a few things.
It starts with a .Faceset which is immediately (on the same line) followed by the number of meshes in the file.
For each mesh, there is one .Vertex section and one .Index section in that order and the first pair of .Vertex/.Index sections are the first mesh, the second set are the second mesh and so on as you'd expect.
In a .Vertex section of the file, there's 8 numbers per line and an undefined number of lines (unless you want to trust the comments Anim8or has put just before the section, but that doesn't seem to be part of the specs of the file, just Anim8or being kind). The first 3 numbers correspond to X, Y, and Z coordinates for a particular point that'll later be used as a vertex, the other 5 I have no idea. A majority of the time, the last 2 numbers are both 0, but I've noticed that's not ALWAYS true, just usually true.
Next comes the matching .Index section. This section has 4 numbers. The first 3 are reference numbers to the Vertexes previously stated and the 3 points mark a triangle in the model. 0 meaning the first mentioned Vertex, 1 meaning the next one, and so on, like a zero-based array. The 4th number appears to always be -1, I can't figure out what importance it has and I can't promise it's ALWAYS -1. In case you can't tell, I'm not too certain about anything in this file type.
There's also other information in the file that I'm choosing to ignore right now because I'm new and don't want to overcomplicate things too much. Such as after every .Index section is:
.Brdf
// Ambient color
0.431 0.431 0.431
// Diffuse color
0.431 0.431 0.431
// Specular color and exponent
1 1 1 2
// Kspecular = 0.5
// end of .Brdf
It appears to me this is about the surface of the mesh just described. But it's not needed for placement of meshes so I moved past it for now.
Moving on to the real problem... I can load a VTX file when there's only one mesh in the VTX file (meaning the .FaceSet is 1). I can almost successfully load a VTX file that has multiple meshes, each mesh is successfully structured, but not properly placed in relation to the other meshes. I downloaded an AT-AT model from an Anim8or thread in a forum and it's made up of 344 meshes, when I load the file just using the specs I've mentioned so far, it looks like the AT-AT is exploded out as if it were a diagram of how to make it (when loaded in Anim8or, all pieces are close and resemble a fully assembled AT-AT). All the pieces are oriented correctly and have the same up direction, but there's plenty of extra space between the pieces.
Does somebody know how to properly read a VTX file? Or know of a website that'll explain what those other numbers mean?
Edit:
The file extension .VTX is used for a lot of different things and has a lot of different structures depending on what the expected use is. Valve, Visio, Anim8or, and several others use VTX, I'm only interested in the VTX file that Anim8or exports and the structure that it uses.
I have been working on a 3D Modeling program myself and wanted a simple format to be able to bring objects in to the editor to be able to test the speed of my drawing routines with large sets of vertices and faces. I was looking for an easy one where I could get models quickly and found the .vtx format. I googled it and found your question. When I was unable to find the format on the internet, I played around and compared .OBJ exports with .vtx ones. (Maybe it was created just for Anim8or?) Here is what I found:
1) Yes, the vertices have eight numbers on each line. The first three are, as you guessed, the x, y, and z coordinates. The next three are the vertex normals, nx, ny, and nz. You may notice that each vertex appears multiple times with different normals for each face that contains it. The last two numbers are texture coordinates.
2) As for the faces, I reached the same conclusions as you did. The first three numbers are indices into the vertex list above. The last number does appear to always be -1. I am going to assume that it has something to do with the facing of the face. (e.g. facing in or out.) Since most models are created with the faces all facing appropriately, it stands to reason that this would be the same number for all of them.
3) One additional note: When comparing the .obj with the .vtx, I did notice that the positions of the vertices changed. This was also true when comparing with the .an8 file. This should not be a "HUGE" problem as long as they are all offset by the same amount in each vertex and every file. At least then it could be compensated for.
Have you considered using the .obj file format? It is text-based and is not extremely difficult to parse or understand. There is quite a bit of information about it online.
I am going to add that, after a few hours inspection, the vtx export in Anim8or seems to be broken. I experienced the same problem as you did that the pieces were not located properly. My assumption would be that anim8or exports these objects using the local coordinates for each mesh and not accounting for transformations that have been applied. I do also note that it will not IMPORT the vtx file...
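The line layouts described in this answer can be parsed with a few stream extractions. The sketch below assumes the field meanings given above (position, normal, texture coordinates; three indices plus a fourth number of unknown purpose), which come from comparing exports rather than from a published spec. VtxVertex, VtxTriangle and the two parse functions are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// One line of a .Vertex section: 8 numbers, interpreted as in the answer above.
struct VtxVertex {
    float x, y, z;     // position
    float nx, ny, nz;  // vertex normal (assumed meaning)
    float u, v;        // texture coordinates (assumed meaning)
};

// One line of an .Index section: 3 zero-based vertex indices and a 4th value.
struct VtxTriangle {
    int a, b, c;
    int extra;  // observed as -1; meaning unknown
};

// Returns true only if all 8 numbers were read from the line.
bool parseVertexLine(const std::string& line, VtxVertex& out) {
    std::istringstream in(line);
    return static_cast<bool>(in >> out.x >> out.y >> out.z
                                >> out.nx >> out.ny >> out.nz
                                >> out.u >> out.v);
}

// Returns true only if all 4 integers were read from the line.
bool parseIndexLine(const std::string& line, VtxTriangle& out) {
    std::istringstream in(line);
    return static_cast<bool>(in >> out.a >> out.b >> out.c >> out.extra);
}
```

Comment lines starting with // and section markers such as .Brdf fail the numeric extraction and return false, so a caller can skip them. If the export really uses per-mesh local coordinates as suspected above, the parsed positions would still need each mesh's transform, which this file does not appear to carry.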
Based on some googling, it seems you're at the wrong end of the pipeline. As I understand it: A VTX file is a Valve Proprietary File Format that is the result of a set of steps.
The final output of Studiomdl for each Half-Life model is a group of files in the gamedirectory/models folder ready to be used by the Game Engine:
an .MDL file which defines the structure of the model along with animation, bounding box, hit box, material, mesh and LOD information,
a .VVD file which stores position independent flat data for the bone weights, normals, vertices, tangents and texture coordinates used by the MDL,
currently three separate types of VTX file: .sw.vtx (Software), .dx80.vtx (DirectX 8.0) and .dx90.vtx (DirectX 9.0) which store hardware optimized material, skinning and triangle strip/fan information for each LOD of each mesh in the MDL,
often a .PHY file containing a rigid or jointed (ragdoll) collision model, and
sometimes a .ANI file for To do: something to do with model animations
Valve
Now the Valve Source SDK may have some utilities in it to read VTX files (it seems to have the ability to make them anyway). Some people may have made third-party tools or have code to read them, but that code is unlikely to work on all files, since it would be a third-party reading of a proprietary format. I also found this post which might help if you haven't seen it before.
