I need to prepare training data for OpenCV's cascade classifier. I understand that the training samples must be rectangular images whose aspect ratio matches the -w and -h parameters of OpenCV's training commands.
I was fine with this idea, but then I saw the web-based annotation tool LabelMe.
People have labelled in LabelMe using complex polygons!
Can these polygons somehow be used in cascade training?
Wouldn't using irregular polygons improve the classification results?
If not, then what is the use of the complex polygons that outline objects in LabelMe'd images?

Data sets annotated with LabelMe are used for many different purposes. Some of them, like image segmentation, require tight boundaries, rather than bounding boxes.
On the other hand, the cascade classifier in OpenCV is designed to classify rectangular image regions. It is then used as part of a sliding-window object detector, which also works with bounding boxes.
Whether tight boundaries help improve object detection is an interesting question. There is evidence that the background pixels caught by the bounding box actually help the classification.
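If you still want to reuse polygon annotations for cascade training, one option is to reduce each polygon to its enclosing rectangle. The sketch below is only an illustration: the function names, the sample outline and the image path are made up, and the output line follows the "path count x y w h" layout that, as far as I know, opencv_createsamples expects in its -info file.

```python
def polygon_to_bbox(points):
    """Return (x, y, w, h) of the tightest axis-aligned box around a polygon."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # +1 because both the first and the last pixel belong to the box
    return x_min, y_min, max(xs) - x_min + 1, max(ys) - y_min + 1


def annotation_line(image_path, polygons):
    """Format one line: image path, object count, then x y w h per object."""
    boxes = [polygon_to_bbox(p) for p in polygons]
    fields = [image_path, str(len(boxes))]
    for box in boxes:
        fields.extend(str(v) for v in box)
    return " ".join(fields)


# Hypothetical polygon outline, as (x, y) pixel coordinates
outline = [(12, 40), (55, 22), (90, 48), (60, 95), (20, 80)]
print(annotation_line("img/car_001.jpg", [outline]))
# prints: img/car_001.jpg 1 12 22 79 74
```

The polygon itself is discarded here; only its extent survives, which is all a rectangular-window classifier can use.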


Labeling runways for localization and detection using deep learning

Shown above is a sample image of a runway that needs to be localized (a bounding box drawn around the runway).
I know how image classification is done in TensorFlow. My question is: how do I label this image for training?
I want the model to output 4 numbers that define the bounding box.
In CS231n they say to use a classifier with a localization head.
But how does my model know where the runway is in a 400x400 image?
In short, how do I LABEL this image for training, so that after training my model detects and localizes runways (draws a bounding box around them) in input images?
Please feel free to give me links to lectures, videos, or GitHub tutorials where I can learn about this.
**********Not CS231n********** I already took that lecture and couldn't understand how to solve this using their approach.
If you want to predict bounding boxes, then the labels are also bounding boxes. This is what most object detection systems use for training. Bounding box labels alone are enough for a single class; if you want to detect multiple object classes, each bounding box also needs a class label.
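As a rough sketch of what such a label could look like for a 400x400 image: the four target numbers are the box corners, here divided by the image size so they fall in [0, 1]. The normalization, the function names and the example coordinates are my own assumptions, not something a particular framework requires.

```python
def normalize_box(x_min, y_min, x_max, y_max, width, height):
    """Turn pixel corners into the 4 regression targets in [0, 1]."""
    return (x_min / width, y_min / height, x_max / width, y_max / height)


def denormalize_box(box, width, height):
    """Turn a predicted (normalized) box back into pixel corners."""
    x_min, y_min, x_max, y_max = box
    return (round(x_min * width), round(y_min * height),
            round(x_max * width), round(y_max * height))


# Hypothetical runway annotation in a 400x400 image
label = normalize_box(100, 180, 300, 220, 400, 400)
print(label)                             # (0.25, 0.45, 0.75, 0.55)
print(denormalize_box(label, 400, 400))  # (100, 180, 300, 220)
```

Each training example is then the image plus this 4-tuple, and the network's last layer has 4 outputs trained with a regression loss.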
Collect data from Google or any other source that contains only runway photos (from a fairly close view). I would suggest taking a pre-trained image classification network (such as VGG or AlexNet) and fine-tuning it on the downloaded runway data.
Once you have a good image classifier for the runway data set, you can use any popular algorithm to generate region proposals from the image.
Now pass the region proposals to the classification network one by one and check whether the network classifies each one as positive or negative. If a proposal is classified as positive, your object (the runway) is most probably present in that region; otherwise it is not.
If the classifier finds the object in many region proposals, you can use a non-maximum suppression algorithm to reduce the number of positive proposals.
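The last step can be sketched as a greedy non-maximum suppression: keep the highest-scoring proposal, drop every proposal that overlaps it too much, and repeat. This is a minimal pure-Python version; the 0.5 threshold, the box layout (x_min, y_min, x_max, y_max) and the sample boxes are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two near-duplicate proposals and one separate proposal
boxes = [(10, 10, 110, 60), (15, 12, 115, 62), (200, 200, 300, 250)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```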

CNN Object Localization Preprocessing?

I'm trying to use a pretrained VGG16 as an object localizer in TensorFlow on ImageNet data. In their paper, the group mentions that they basically just strip off the softmax layer and toss on either a 4D or a 4000D fc layer for bounding box regression. I'm not trying to do anything fancy here (sliding windows, RCNN), just get some mediocre results.
I'm sort of new to this and I'm just confused about the preprocessing done here for localization. In the paper, they say that they scale the image to 256 as its shortest side, then take the central 224x224 crop and train on this. I've looked all over and can't find a simple explanation on how to handle localization data.
Questions: How do people usually handle the bounding boxes here?...
Do you use something like the tf.image.sample_distorted_bounding_box command, and then rescale the image based on that?
Do you just rescale/crop the image itself, and then interpolate the bounding box with the transformed scales? Wouldn't this result in negative box coordinates in some cases?
How are multiple objects per image handled?
Do you just choose a single bounding box from the beginning, crop to that, then train on this crop?
Or, do you feed it the whole (centrally cropped) image, and then try to predict 1 or more boxes somehow?
Does any of this generalize to the Detection or segmentation (like MS-CoCo) challenges, or is it completely different?
Anything helps...
Localization is usually performed as an intersection of sliding windows where the network identifies the presence of the object you want.
Generalizing that to multiple objects works the same.
Segmentation is more complex. You can train your model on a pixel mask in which your object is filled in, and have it output a pixel mask of the same size.
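On the rescale-and-crop question: one common way to handle it (my assumption, not something the paper spells out) is to apply to the box the same geometric transform that was applied to the image, then clip the result to the crop. Clipping takes care of negative coordinates, and a box that ends up with no area inside the crop is simply dropped. The function name and the example numbers below are hypothetical.

```python
def transform_box(box, width, height, short_side=256, crop=224):
    """Map (x_min, y_min, x_max, y_max) from the original image into the
    central crop. Returns None if the box lies entirely outside the crop."""
    # Same scale factor used to resize the image's shortest side
    scale = short_side / min(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    # Top-left corner of the central crop in the resized image
    off_x, off_y = (new_w - crop) // 2, (new_h - crop) // 2
    x_min, y_min, x_max, y_max = box
    clipped = (
        min(max(x_min * scale - off_x, 0.0), crop),
        min(max(y_min * scale - off_y, 0.0), crop),
        min(max(x_max * scale - off_x, 0.0), crop),
        min(max(y_max * scale - off_y, 0.0), crop),
    )
    if clipped[2] <= clipped[0] or clipped[3] <= clipped[1]:
        return None  # nothing of the object is left inside the crop
    return clipped


# 500x375 image: a centred box survives, a box on the left edge is cropped away
print(transform_box((100, 50, 400, 300), 500, 375))
print(transform_box((0, 0, 50, 375), 500, 375))  # None
```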

Object Recognition by Outlines vs Features

I have the RGB-D video from a Kinect, which is aimed straight down at a table. There is a library of around 12 objects I need to identify, alone or several at a time. I have been working with SURF extraction and detection from the RGB image, preprocessing by downscaling to 320x240, grayscale, stretching the contrast and balancing the histogram before applying SURF. I built a lasso tool to choose among detected keypoints in a still of the video image. Then those keypoints are used to build object descriptors which are used to identify objects in the live video feed.
SURF examples show successful identification of objects with a decent amount of text-like feature detail, e.g. logos and patterns. The objects I need to identify are relatively plain but have distinctive geometry. The SURF features found in my stills are sometimes consistent but mostly unimportant surface features. For instance, say I have a wooden cube. SURF detects a few bits of grain on one face, then fails on other faces. I need to detect (something like) that there are four corners at equal distances and right angles. None of my objects has much of a pattern but all have distinctive symmetric geometry and color. Think cellphone, lollipop, knife, bowling pin. My thought was that I could build object descriptors for each significantly different-looking orientation of the object, e.g. two descriptors for a bowling pin: one standing up and one lying down. For a cellphone, one lying on the front and one on the back. My recognizer needs rotational invariance and some degree of scale invariance in case objects are stacked. Ability to deal with some occlusion is preferable (SURF behaves well enough) but not the most important characteristic. Skew invariance would be preferable and SURF does well with paper printouts of my objects held by hand at a skew.
Am I using the wrong SURF parameters to find features at the wrong scale? Is there a better algorithm for this kind of object identification? Is there something as readily usable as SURF that uses the depth data from the Kinect along with or instead of the RGB data?
I was doing something similar for a project, and ended up using a super simple method for object recognition: OpenCV blob detection, recognizing objects based on their areas. Obviously, the areas of the objects need to differ enough for this method to work.
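To show the idea without depending on OpenCV, here is a small pure-Python stand-in: it finds connected blobs in a binary mask with a flood fill, measures their areas, and assigns each blob the object whose reference area is closest. The reference areas and object names are invented for the example; in practice you would measure them from your own objects.

```python
from collections import deque


def blob_areas(mask):
    """Return the pixel area of each 4-connected blob in a binary mask."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    areas = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel
            seen[y][x] = True
            queue = deque([(x, y)])
            area = 0
            while queue:
                cx, cy = queue.popleft()
                area += 1
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < width and 0 <= ny < height \
                            and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            areas.append(area)
    return areas


def classify_by_area(area, reference_areas):
    """Pick the object whose reference area is closest to the measured one."""
    return min(reference_areas, key=lambda name: abs(reference_areas[name] - area))


mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
reference = {"small_cube": 4, "cellphone": 10}  # hypothetical areas in pixels
print([classify_by_area(a, reference) for a in blob_areas(mask)])
# ['small_cube', 'cellphone']
```

With a top-down camera at a fixed height the areas stay fairly stable, but stacked objects sit closer to the camera and appear larger, which this sketch does not account for.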
You can see my results here:
I know there are other methods out there. One possible solution for you could be approxPolyDP, which is described here:
How to detect simple geometric shapes using OpenCV
Would love to hear about your progress on this!

Detecting object class using shape descriptors in computer vision

I want to differentiate between two classes of objects by the shape of their blobs (each blob is a binary image), using shape descriptors and machine learning. Is there a good shape feature I can use to describe the irregular contour or blob obtained?
There is a large body of work on shape descriptors. These methods work on either the boundary (the outer edge pixels) or the full filled-in binary shape. Both approaches rely on making the shape descriptors invariant to translation, rotation and scaling, and some also to skew. The classical boundary method is Fourier descriptors and the classical filled-in method is moment invariants; both are covered in most good image processing textbooks and are easy to implement with OpenCV.
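To make the moment-invariant idea concrete, here is a minimal pure-Python sketch of the first of Hu's moment invariants, the sum of the normalized central moments eta20 + eta02, computed on a filled binary shape. The function name and the test shapes are my own; OpenCV provides the full set of seven, which this sketch does not reproduce.

```python
def first_hu_invariant(mask):
    """Compute eta20 + eta02 for the foreground pixels of a binary mask."""
    pixels = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    m00 = len(pixels)  # zeroth moment = area
    if m00 == 0:
        raise ValueError("mask contains no foreground pixels")
    cx = sum(x for x, _ in pixels) / m00
    cy = sum(y for _, y in pixels) / m00
    # Central moments: measured relative to the centroid, so translation drops out
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)
    mu02 = sum((y - cy) ** 2 for _, y in pixels)
    # Normalizing by m00^2 gives scale invariance (approximate on a pixel grid)
    return (mu20 + mu02) / m00 ** 2


l_shape = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
shifted = [[0, 0, 0, 0, 0]] + [[0, 0] + row for row in l_shape]
rotated = [list(row) for row in zip(*l_shape[::-1])]  # 90 degree rotation
bar = [[1, 1, 1, 1, 1]]

for name, shape in [("L", l_shape), ("shifted L", shifted),
                    ("rotated L", rotated), ("bar", bar)]:
    print(name, first_hu_invariant(shape))
```

The three L variants give the same value while the bar gives a different one, which is the property a classifier needs from the feature.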
The answer depends heavily on the kinds of shapes you are looking for. If the contours of the shapes are discriminative enough, you can try shape context. To classify shapes, feed these features into any classifier, for instance an SVM or a random forest.
If the shapes have consistently occurring corners, then you can detect the corners using FAST or SURF, and describe the regions around the corners using SIFT or SURF. In this case, shapes are best recognised by feature matching or bags of words.

OpenCV - Haar classifier for long objects with different angles

I have used a Haar classifier with OpenCV successfully before. Unfortunately, it seems to work only on square objects at fixed angles (e.g. faces). However, I need to find "long" (rectangular) objects at different angles (see sample input image).
Is there a way to train a Haar classifier to find such objects? All I can find are tutorials for face recognition. Are there any alternative approaches?
Haar classifiers are known to work with rigid objects only. You need a classifier for each view. For example, the side-face classifier in OpenCV doesn't work as well as the front-face classifier, because a side face has more variation in yaw, pitch and roll than a front face.
There is no perfect way of answering your question.
However, in your case the objects you are trying to classify (microbes, I suppose) overlap each other. It's a complex issue. But you can isolate the region where microbes occur (rather than isolating each microbe, as you would a face).
You can refer to fingerprint segmentation techniques, which are known to enhance the ridges of a fingerprint (in your case, the microbe edges) against the background and isolate that part of the image.
Check "ridgesegmentation.m" in the following page:
