Tamura's coarseness textural feature - image-processing

I am making use of a Tamura texture feature extraction from a library (JFeatureLib).
Tamura features explore texture representation from a different angle, since they are motivated by psychological studies of human visual perception of textures. One of the most important of these features is coarseness (the others being contrast and directionality).
I am having difficulty understanding the real meaning of the coarseness feature. From the literature I found that a coarse texture consists of a small number of large primitives, while a fine texture contains a large number of small primitives. A more precise definition might be:
The higher the coarseness value is, the rougher is the texture. If
there are two different textures, one macro texture of high coarseness
and one micro texture of low coarseness, the macro texture is
However, I cannot see any relation between the coarseness value and the roughness of the image.
Example: in my opinion the coarseness value of the images below should increase from left to right. However, I am getting the following coarseness values:
1.93155, 3.23740, 2.40476, 3.11979 (left to right).
I find it quite strange that coarseness_2 is higher than coarseness_3 and coarseness_4. Even worse, for the following images I am getting these coarseness values (almost the complete opposite):
7.631, 8.821, 9.0664, 10.564 (left to right)
I tested with several other images; these are just two of them.
I know that my interpretation of coarseness may be too literal, but again Tamura is said to extract features (unlike many other methods) in a way that corresponds to the human visual system. Am I misunderstanding the real meaning of coarseness, or is it a problem of accuracy with the Tamura implementation that I am using?

Coarseness has a direct relationship to scale and repetition rates and was
seen by Tamura et al as the most fundamental texture feature. An image will
contain textures at several scales; coarseness aims to identify the largest size at which a texture exists, even where a smaller micro texture exists.
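For reference, the usual textbook formulation of coarseness can be sketched as below: for every pixel, compare the mean grey level of two adjacent windows of size 2^k on opposite sides of the pixel, keep the k that gives the largest difference, and average the winning window sizes. This is only a sketch under assumptions. It is not JFeatureLib's code, the names `window_mean`, `tamura_coarseness` and `checkerboard` and the `kmax` default are made up for illustration, and implementations differ in border handling and normalisation, so absolute values will not match the numbers in the question.

```python
import numpy as np

def window_mean(img, size):
    """Mean of the size x size window around every pixel (edge-padded)."""
    before, after = size // 2, size - 1 - size // 2
    padded = np.pad(img, ((before, after), (before, after)), mode="edge")
    # Summed-area table with a leading row and column of zeros.
    sat = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    h, w = img.shape
    total = (sat[size:size + h, size:size + w] - sat[:h, size:size + w]
             - sat[size:size + h, :w] + sat[:h, :w])
    return total / (size * size)

def tamura_coarseness(img, kmax=4):
    """Average of the best window size 2**k over interior pixels."""
    img = np.asarray(img, dtype=np.float64)
    best_diff = np.full(img.shape, -1.0)
    best_size = np.zeros(img.shape)
    for k in range(1, kmax + 1):
        size, half = 2 ** k, 2 ** (k - 1)
        a = window_mean(img, size)
        # Difference between the two non-overlapping windows on opposite
        # sides of each pixel, horizontally and vertically.
        e_h = np.abs(np.roll(a, -half, axis=1) - np.roll(a, half, axis=1))
        e_v = np.abs(np.roll(a, -half, axis=0) - np.roll(a, half, axis=0))
        e = np.maximum(e_h, e_v)
        better = e > best_diff          # ties keep the smaller window
        best_diff[better] = e[better]
        best_size[better] = size
    m = 2 ** kmax                       # drop the border, where windows leave the image
    return best_size[m:-m, m:-m].mean()

def checkerboard(n, block):
    """Synthetic n x n test texture made of block x block squares."""
    idx = np.arange(n) // block
    return ((idx[:, None] + idx[None, :]) % 2).astype(np.float64)

fine = tamura_coarseness(checkerboard(128, 1))
coarse = tamura_coarseness(checkerboard(128, 8))
print(fine, coarse)
```

On these synthetic inputs the 1-pixel checkerboard scores the smallest window size and the 8-pixel checkerboard scores higher. Note what is being measured: the scale of the dominant grey-level structure, not how "rough" a surface looks, which may explain part of the mismatch with intuition on natural images.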


Type of graph cut algorithm for 3D reconstruction

I have read several papers on using graph cuts for 3D reconstruction and I have noticed that there seem to be two alternative approaches to posing this problem.
One approach is volumetric and describes a 3D region of voxels for which a graph cut is used to infer a binary labelling (contains object of interest or does not) for each voxel. Papers which take this approach include "Multi-View Stereo via Volumetric Graph Cuts and Occlusion Robust Photo-Consistency" and "A Surface Reconstruction Using Global Graph Cut Optimization".
The second approach is 2D and seeks to label each pixel of a reference image with the depth of the 3D point that projects there. Papers which take this approach include "Computing Visual Correspondence with Occlusions via Graph Cuts".
I want to understand the advantages/disadvantages of each method and which are the most significant when choosing which method to use. So far I understand that some advantages of the first approach are:
It is a binary problem, so is solvable exactly with Max-Flow algorithms.
Provides simple methods of modelling occlusion.
And some advantages of the second approach are:
Smaller neighbor set for each node of the graph.
Easier to model smoothness (but does it give better results?).
Additionally, I would be interested in which situations I would be better off choosing one representation or the other and why.
The most significant difference is the type of scenes the algorithms are typically used with, and the way they represent the 3D shape of the object.
Volumetric approaches perform best...
with a large number of images...
taken from different viewpoints, well distributed around the object,...
of a more or less compact "object" (e.g. an artifact, in contrast, for example, to an outdoor scene observed by a vehicle camera).
Volumetric approaches are popular for reconstructing "objects" (especially artifacts). Given sufficient views (i.e. images), the algorithms give a complete volumetric (i.e. voxel) representation of the object's shape. This can be converted to a surface representation using Marching Cubes or a similar method.
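To make the binary voxel labelling concrete, here is a toy sketch of how an object/empty labelling with a smoothness term reduces to a minimum cut. Everything in it is an assumption for illustration: a 1D row of six "voxels" instead of a 3D grid, made-up integer costs instead of photo-consistency, and a plain Edmonds-Karp max-flow instead of the specialised solvers used in the papers. The function name `min_cut_labels` is hypothetical.

```python
from collections import deque

def min_cut_labels(cost_object, cost_empty, smooth):
    """Label each voxel 1 (object) or 0 (empty) by an s-t minimum cut."""
    n = len(cost_object)
    s, t = n, n + 1
    cap = [[0] * (n + 2) for _ in range(n + 2)]
    for i in range(n):
        cap[s][i] = cost_empty[i]     # this edge is cut if voxel i ends up "empty"
        cap[i][t] = cost_object[i]    # this edge is cut if voxel i ends up "object"
        if i + 1 < n:                 # neighbours with different labels pay `smooth`
            cap[i][i + 1] = cap[i + 1][i] = smooth

    def reachable_from_source():
        """Breadth-first search in the residual graph; returns parent links."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if v not in parent and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable_from_source()
        if t not in parent:
            break                     # no augmenting path left: flow is maximal
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        flow += push
    # Voxels still reachable from the source lie on the "object" side of the cut.
    labels = [1 if i in parent else 0 for i in range(n)]
    return labels, flow

labels, energy = min_cut_labels(
    cost_object=[1, 1, 6, 1, 9, 9],
    cost_empty=[9, 9, 2, 9, 1, 1],
    smooth=3,
)
print(labels, energy)
```

In this example the third voxel on its own would prefer "empty", but the smoothness term makes it cheaper to label it "object" like its neighbours. The value of the maximum flow equals the energy of the optimal labelling, which is what "solvable exactly" refers to in the binary case.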
The second type of algorithms you identified are called stereo algorithms, and graph cuts are just one of many methods of solving such problems. Stereo is best...
if you have only two images...
with a fairly narrow baseline (i.e. distance between cameras)
Generalizations to more than two images (with narrow baselines) exist, but most of the literature deals with the binocular (i.e. two image) case. Some algorithms generalize more easily to more views than others.
Stereo algorithms only give you a depth map, i.e. an image with a depth value for each pixel. This does not allow you to look "around" the object. There are, however, 3D reconstruction systems that start with stereo on image pairs and combine the depth maps in order to get a representation of the complete object, which is a non-trivial problem in its own right. Interestingly, this is often approached using a volumetric representation as an intermediate step.
Stereo algorithms can be, and often are, used for "scenes", e.g. the road observed by a pair of cameras in a vehicle, or people in a room for 3D video conferencing.
Some closing remarks
For both stereo and volumetric reconstruction, graph cuts are just one of several methods to solve the problem. Stereo, for example, can also be formulated as a continuous optimization problem, rather than a discrete one, which implies other optimization methods for its solution.
My answer contains a bunch of generalizations and simplifications. It is not meant to be a definitive treatment of the subject.
I don't necessarily agree that smoothness is easier in the stereo case. Why do you think so?

Object Detection using OpenCV in C++

I'm currently working on a vision system for a UAV I am building. The goal of the system is to find target objects, which are rather well defined (see below), in a video stream that will be a 2-D flyover view of the ground. So far I have tried training and using a Haar-like feature based cascade, a la Viola Jones, to do the detection. I am training it with 5000+ images of the targets at different angles (perspective shifts) and ranges (sizes in the frame), but only 1900 "background" images. This does not yield good results at all, as I cannot find a suitable number of stages to the cascade that balances few false positives with few false negatives.
I am looking for advice from anyone who has experience in this area, as to whether I should: 1) ditch the cascade, in favor of something more suitable to objects defined by their outline and color (which I've read the VJ cascade is not).
2) improve my training set for the cascade, either by adding positives, background frames, organizing/shooting them better, etc.
3) Some other approach I can't fathom currently.
A description of the targets:
Primary shapes: triangles, squares, circles, ellipses, etc.
Distinct, solid, primary (or close to) colors.
Smallest dimension between two and eight feet (big enough to be seen easily from a couple hundred feet AGL).
Large, single alphanumeric in the center of the object, with its own distinct, solid, primary or almost primary color.
My goal is to use something very fast, such as the VJ cascade, to find possible objects and their associated bounding boxes, and then pass these on to finer processing routines to determine the properties (color of the object and AN, value of the AN, actual shape, and GPS location). Any advice you can give me towards completing this goal would be much appreciated. The source code I currently have is a little lengthy to post here, but is freely available should you like to see it for reference. Thanks in advance!
I would recommend ditching Haar classification, since you know a lot about your objects. In these cases you should start by checking what features you can use:
1) Overhead flight means, as you said, you can basically treat these as fixed shapes on a 2D plane. There will be scaling, rotations and some minor affine transformations, which depend a lot on how wide-angled your camera is. If it isn't particularly wide-angled, that part can probably be ignored. Also, you probably know your altitude, from which you can make a very good estimate of the target size in the image (scaling).
2) You know the colors, which also makes it quite easy to find objects. If these are well-defined primary colors, then you can just filter the image based on color and find the contours. If you want to do something a little more advanced (which to me doesn't seem necessary though...) you can do a backprojection, which in my experience is very effective and fast. Note, if you're creating the objects, it would be better to use red, green and blue instead of the traditional primary colors (red, yellow and blue). Then you can simply split the image into its respective channels and use a very high threshold.
3) You know the geometric shapes. I've never done this myself, but as far as I know, the options are using moments or using Hough transforms (although OpenCV only has Hough algorithms for lines and circles, so you'd have to write your own for other shapes...). You might already have sufficiently good results without this step though...
If you want more specific recommendations, it would be very useful to upload a couple sample images. :)
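The channel-threshold idea from point 2 can be sketched as follows. This is a NumPy-only illustration rather than OpenCV code, and the function name `find_solid_color_box` and the threshold values 200 and 80 are assumptions picked for the synthetic frame, not tuned values. In OpenCV itself the corresponding steps would typically be `cv::split` or `cv::inRange` followed by `cv::findContours`; also note that OpenCV loads images in BGR order, whereas this sketch assumes RGB.

```python
import numpy as np

def find_solid_color_box(rgb, channel, high=200, low=80):
    """Bounding box (x0, y0, x1, y1) of pixels that are bright in `channel`
    and dark in the other two channels, or None if there are none."""
    rgb = np.asarray(rgb)
    mask = rgb[..., channel] >= high
    for other in range(3):
        if other != channel:
            mask &= rgb[..., other] <= low
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Synthetic frame: dull grey "ground" with one saturated red rectangle.
frame = np.full((60, 80, 3), 90, dtype=np.uint8)
frame[20:30, 40:55] = (230, 20, 20)

red_box = find_solid_color_box(frame, channel=0)
blue_box = find_solid_color_box(frame, channel=2)
print(red_box, blue_box)
```

A single bounding box only makes sense when there is at most one target of that color in the frame; with several targets you would label connected regions (or find contours) in the mask and box each one separately.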
This may already be solved, but I recently came across a paper, with an open-source implementation, on generic object detection using normalised gradient features: http://mmcheng.net/bing/comment-page-9/
The details of the algorithm's performance under changes in illumination, rotation and scale may require a little digging. I can't remember off the top of my head where the original paper is.

EMGU OpenCV disparity only on certain pixels

I'm using the EMGU OpenCV wrapper for C#. I've got a disparity map being created nicely. However, for my specific application I only need the disparity values of very few pixels, and I need them in real time. The calculation is taking about 100 ms now. I imagine that by getting disparity for hundreds of pixel values rather than thousands, things would speed up considerably. I don't know much about what's going on "under the hood" of the stereo solver code. Is there a way to speed things up by only calculating the disparity for the pixels that I need?
First of all, you fail to mention what you are really trying to accomplish and, moreover, what algorithm you are using. E.g. StereoGC is really slow (i.e. not real-time), but usually far more accurate, compared to both StereoSGBM and StereoBM. Those last two can be used in real time, provided a few conditions are met:
The size of the input images is reasonably small;
You are not using an extravagant set of parameters (for instance, a larger value for numberOfDisparities will increase computation time).
Don't expect miracles when it comes to accuracy though.
Apart from that, there is the issue of "just a few pixels". As far as I understand, the algorithms implemented in OpenCV usually rely on information from more than one pixel to determine the disparity value. E.g. a neighborhood is needed to detect which pixel from image A maps to which pixel in image B. As a result, in general it is not possible to just discard every other pixel of the image (by the way, if you already knew the corresponding locations in both images, you would not need the stereo methods at all). So unless you can discard a large border of your input images in which you know you'll never find your pixels of interest, I'd say the answer to this part of your question would be "no".
If you happen to know that your pixels of interest will always be within a certain rectangle of the input images, you can set the input image ROIs (regions of interest) to this rectangle. Assuming OpenCV does not contain a bug here, this should speed up the computation a little.
With a bit of googling you can find real-time examples on YouTube of finding stereo correspondences using EmguCV (or plain OpenCV) on the GPU. Maybe this could help you.
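To illustrate the neighborhood point: for a purely local matcher, the disparity at one pixel only depends on a small window around it and on a horizontal search range, so in principle it can be evaluated at a sparse set of points. The sketch below is a naive sum-of-absolute-differences block matcher written for illustration. It assumes rectified greyscale images, it is not what StereoBM or StereoSGBM do internally (StereoSGBM in particular adds a smoothness term that couples each pixel to its surroundings), and the name `sparse_disparity` and its parameters are hypothetical.

```python
import numpy as np

def sparse_disparity(left, right, points, block=5, max_disp=32):
    """Naive SAD block matching, evaluated only at (x, y) points of the left image."""
    left = np.asarray(left, dtype=np.float64)
    right = np.asarray(right, dtype=np.float64)
    r = block // 2
    h, w = left.shape
    result = {}
    for x, y in points:
        if not (r <= y < h - r and r <= x < w - r):
            result[(x, y)] = None            # matching window would leave the image
            continue
        patch = left[y - r:y + r + 1, x - r:x + r + 1]
        best_d, best_cost = None, np.inf
        for d in range(max_disp + 1):
            xr = x - d                       # candidate column in the right image
            if xr - r < 0:
                break
            candidate = right[y - r:y + r + 1, xr - r:xr + r + 1]
            cost = np.abs(patch - candidate).sum()
            if cost < best_cost:
                best_d, best_cost = d, cost
        result[(x, y)] = best_d
    return result

# Synthetic pair: the right image is the left image shifted by 7 pixels.
rng = np.random.default_rng(0)
left_img = rng.integers(0, 256, size=(48, 96))
right_img = np.roll(left_img, -7, axis=1)

disp = sparse_disparity(left_img, right_img, [(40, 20), (60, 30), (0, 0)])
print(disp)
```

Each requested point costs roughly `block * block * max_disp` operations, so a few hundred points are cheap, but the result is only as reliable as a raw local match. With the ROI approach described above, the rectangle would similarly need to include the search range to the left of the pixels of interest.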
Disclaimer: this may have been a more complete answer if your question contained more detail.

minimum texture image dimension for effective classification

I am a beginner in image mining. I would like to know the minimum dimension required for effective classification of textured images. My feeling is that if an image is too small, the feature extraction step will not extract enough features, and that if the image size goes beyond a certain dimension, the processing time will increase exponentially with image size.
This is a complex question that requires a bit of thinking.
Short answer: It depends.
Long answer: It depends on the type of texture you want to classify and the type of feature your classification is based on. If the feature extracted is, say, color only, you can use a "texture" as small as 1x1 pixel (in that case, using the word "texture" is a bit of an abuse). If you want to classify, for example, characters, you can usually extract a lot of local information from edges (Hough transform, Gabor filters, etc.). The image plane just has to be big enough to hold the characters (say 16x16 pixels for the Latin alphabet).
If you want to be able to classify any kind of images in any kind of number, you can also base your classification on global information, like entropy, correlogram, energy, inertia, cluster shade, cluster prominence, color and correlation. Those features are used for content based image retrieval.
Off the top of my head, I would try using textures as small as 32x32 pixels if the kind of texture you are using is a priori unknown. If, on the contrary, the kind of texture is known a priori, I would choose one or more features that I know would classify the images according to my needs (1x1 pixel for color only, 16x16 pixels for characters, etc.). Again, it really depends on what you are trying to achieve. There isn't a unique answer to your question.
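As a rough illustration of the global features mentioned above, here is a sketch that computes energy, entropy and inertia from a grey-level co-occurrence matrix. It assumes that those features are meant as co-occurrence statistics, which the answer does not state explicitly, and the function name `glcm_features`, the 8 grey levels and the horizontal one-pixel offset are choices made for the example.

```python
import numpy as np

def glcm_features(patch, levels=8):
    """Energy, entropy and inertia of the horizontal co-occurrence matrix
    of an 8-bit greyscale patch."""
    patch = np.asarray(patch, dtype=np.float64)
    # Quantise grey values 0..255 into `levels` bins.
    q = np.clip((patch / 256.0 * levels).astype(int), 0, levels - 1)
    a, b = q[:, :-1].ravel(), q[:, 1:].ravel()   # each pixel and its right neighbour
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (a, b), 1)                   # count co-occurring level pairs
    p = glcm / glcm.sum()
    i, j = np.indices(p.shape)
    nz = p[p > 0]
    return {
        "energy": float((p ** 2).sum()),
        "entropy": float(-(nz * np.log2(nz)).sum()),
        "inertia": float((p * (i - j) ** 2).sum()),
    }

flat = glcm_features(np.full((32, 32), 100))
stripes = glcm_features(np.tile([0, 255], (32, 16)))   # 32x32 vertical stripes
print(flat, stripes)
```

The patch size enters through the number of pixel pairs available to fill the matrix: a 32x32 patch gives 32 * 31 = 992 horizontal pairs for 64 cells, while a 4x4 patch gives only 12, so the estimated statistics become unreliable for very small patches.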

Algorithm for measuring the Euclidean distance between pixels in an image

I have a number of images where I know the focal length, pixel count, dimensions and position (from GPS). They are all in a high oblique manner, taken on the ground with commercially available cameras.
Example image: http://desmond.yfrog.com/Himg411/scaled.php?tn=0&server=411&filename=mjbm.jpg&xsize=640&ysize=640
What would be the best method for calculating the Euclidean distances between certain pixels within an image, if it is indeed possible?
Assuming you're not looking for full landscape modelling but a simple approximation, this shouldn't be too hard. Basically, a first approximation of your image reduces to a camera with known focal length looking along a plane. So we can create a model of the system in 3D very easily; it's not too far from the classic demo of an observer looking over a checkerboard.
Normally our graphics problem would be to project the 3D model into 2D so we could render the image. Although most programs nowadays use an API (such as OpenGL) to do this, the equations are not particularly complex or difficult to understand. I wrote my first code using the examples from 3D Graphics In Pascal, which is a nice clear treatise, but there will be lots of other similar sources (although probably fewer nowadays, as a hardware API is invariably used).
What's useful about this is that the projection can be inverted once the model is fixed: a point on the image alone only defines a viewing ray, but if you have the point and the plane model you can run the data back through the projection and intersect that ray with the plane to retrieve the original 3D coordinates, which is what you wish to do.
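A minimal sketch of that back-projection for the camera-over-a-flat-plane approximation is shown below. It rests on several assumptions that are not given in the question: the camera height above the ground and its downward pitch are known, the principal point is at (cx, cy), there is no roll or lens distortion, and the ground really is flat. All function and parameter names are made up for the example.

```python
import math

def focal_length_px(focal_mm, sensor_width_mm, image_width_px):
    """Convert a focal length in millimetres to pixels."""
    return focal_mm * image_width_px / sensor_width_mm

def pixel_to_ground(u, v, f_px, cx, cy, height, pitch):
    """Intersect the viewing ray of pixel (u, v) with the ground plane.

    The camera sits `height` above the plane and looks forward, tilted
    down by `pitch` radians. Returns (X, Y) on the ground in the units of
    `height` (X to the right, Y forward), or None for pixels at or above
    the horizon.
    """
    xc = (u - cx) / f_px
    yc = (v - cy) / f_px                     # image y grows downward
    downward = yc * math.cos(pitch) + math.sin(pitch)
    if downward <= 1e-9:
        return None                          # ray never reaches the ground
    t = height / downward                    # scale factor along the ray
    return t * xc, t * (math.cos(pitch) - yc * math.sin(pitch))

def ground_distance(p, q):
    """Euclidean distance between two ground points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Camera 10 m above the ground, pitched down 45 degrees, f = 1000 px.
args = dict(f_px=1000.0, cx=320.0, cy=240.0, height=10.0, pitch=math.radians(45))
a = pixel_to_ground(320, 240, **args)        # image centre
b = pixel_to_ground(320, 1240, **args)       # ray pointing straight down
sky = pixel_to_ground(320, -1760, **args)    # above the horizon
print(a, b, ground_distance(a, b))
```

With these numbers the image centre maps to a point 10 m ahead of the camera and the second pixel to the point directly below it, so the distance between them comes out as 10 m. The accuracy degrades quickly towards the horizon, where one pixel covers a large stretch of ground.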
So a couple of approaches suggest themselves: either write the code to do the above yourself directly, or, probably more simply, use OpenGL (I'd recommend the GLUT toolkit for this). If your math is good and manipulating matrices causes you no issue, then I'd recommend the former, as the solution will be tighter and it's interesting stuff; otherwise take the OpenGL approach. You'd probably want to turn the camera/plane approximation into camera/sphere fairly early too.
If this isn't sufficient for your needs, then in theory going to actual landscape modelling would be feasible. The SRTM data is freely available (albeit not in the friendliest of forms), so combined with your GPS position it should be possible to create a mesh model to which you apply the same algorithms as above.
