I am trying to create a new object detection model in autoML vision.
I followed this & this guides about how to prepare and format my training data.
For some unknown reason, there are a lot of missing bounding boxes when importing the data. For example, an image with 84 bounding boxes only loads 12 in autoML.
I have checked for the minimum bounding box size, maximum number of bounding boxes per image and maximum image size.
Is anybody else experiencing the same issues?
There are two additional requirements that you didn't mention in the description, and I'm adding them here, just in case.
Bounding box edge length: At least 0.01 * length of a side of an image. For example, a 1000 * 900 pixel image would require bounding boxes of at least 10 * 9 pixels
All the bounding boxes should be inside images
Also, there seems to be some limits regarding bounding boxes and labels in the UI; however, the limit is supposed to be 50, and 12 is really far from that.
If you are sure that your bounding boxes, as well as the CSV, satisfies all the requirements, I suggest to open a new Issue Tracker to the Cloud Vision team, so they can take a deeper look at your issue.
Related
I have approximately 100K X ray pics of dimension 1024x1024. Only ~970 of them have pre existing bounding box coordinates. I am training the model on 70:30 training and testing ratio. My question is, how do i train the model if the rest of the images do not have bounding box? Since I'm no medical expert, I can't manually draw a bounding box around the image. There are 14 classes and it gets really difficult to draw bounding box manually
If you have a knowledge about the remaining not labelled images, for example if you know if an image has a particular class you can use weakly supervised learning to train image detection on all of them
I am interested in detecting single object more precisely a fire extinguisher which has no inter class variability (all fire extinguisher looks same). However, The application is supposedly realtime i.e a robot is exploring the environment and whenever it sees the object of interest it should be able to detect it and give pixel coordinates of it.
My question is which algorithm will be good choice for this task?
1. Is this a classification problem and should we use features(sift/surf etc) + bow +svm?
2. some other solution (no idea yet).
Any kind of input will be appreciated.
Thanks.
(P.S bear with me i am newbie to computer vision and stack over flow)
update1:
Height varies all are mounted on the wall but with different height. I tried with SIFT features and bow but it is expensive to extract bow descriptors in testing part. Moreover I have no idea how to locate the object(pixel coordinates) inside the image after its been classified positive.
update 2:
I finally used sift + bow + svm and am able to classify the object. But using this technique, i only get output interms of whether the object is present in the scene or not?
How can i detect the object i.e getting the bounding box or centre of the object. what is the compatible approach with the above method for achieving these results.
Thank you all.
I would suggest using color as the main feature to look for, and only try other features as needed. The fire extinguisher red is very distinctive, and should not occur too often elsewhere in an office environment. Other, more computationally expensive tests can then be performed only in regions of the right color.
Here is a good tutorial for color detection that also explains how to find good thresholds for your desired color.
I would suggest the following approach:
denoise your image with a median filter
convert the image to HSV format (Hue, Saturation, Value)
select pixels close to that particular shade of red with InRange()
Now you have a binary image image that contains only the pixels that are red.
count the number of red pixels with CountNonZero()
If that number is too small, abort
remove noise from the binary image by morphological opening / closing
find contours of all blobs in your picture with findContours or the CvBlob library
check if there are blobs of the correct width, correct height and correct width/height ratio
since your fire extinguishers are vertical cylinders, the width/height ratio will be constant from every angle. The width and height will of course vary somewhat with distance to the camera.
if the width and height do not match, abort
repeat these steps to find the black-colored part on the bottom of the extinguisher,
abort if there is no black region with correct width/height below the red region
(perhaps also repeat these steps for the metallic top and the yellow rectangle)
These tests should all be very fast. If they are too slow, you could reduce the resolution of your input images.
Depending on your environment, it is possible that this is already a robust enough test. If not, you can proceed with sift/surf feature matching, but only in a small region around the blobs with the correct color. You also do not necessarily have to do that for each frame, each n-th frame should be be enough for confirmation.
This is a old question .. but will still like to give my recommendation to use YOLO algorithm to solve this problem.
YOLO fits very well to this scenario.
I am posting 3 images of my dataset to show how my image visually looks:
http://s1306.photobucket.com/user/Bidisha_Chakraborty/library/?page=1
I am using VLFFeat DSIFT implementation. I am using per descriptor 4 orientations instead of 8. So in my case it is 64 dimensional vector instead of 128. I am using the original scale for the image, since my image data does is originally taken from fixed distance. I am computing descriptors densely at 4/8 pixels interval. I have conducted several experiments by varying the window size from 80*80 pixels to 20*20 pixels. I did a clustering approach with various number of cluster centers. And finally I used earth mover's Distance to compute the similarity metric.
After various parameter tuning of window size, number of words, I see that even when I have nearly similar images like 1 and 3, the distance metric says image 1 is more similar to image 2 then image 1 to image 3.
I did Principal Component Analysis to see the variance of the data. I expected image 1 and image 2 to have separated clusters and image 1 and 3 to have overlapped clusters. Since I plotted first 3 dimensions and these 3 dimensions accounted for less than 30percentage of data, I am sure including all dimensions(which I of course could not visualize) will give worse results.
Should I conclude that SIFT is not the best thing for my application or I am missing out something. I already used GLCM for these and did not get a good result.
Any suggestion for any other feature space is most welcome.
thanks for any kind of insight.
In my application I am getting images (captured by a high speed camera) containing projections of some light sources on the screen.
1-My first task is to plot a PDF or intensity distribution plot for the light intensity, which should come as bell shape or Gaussian, since at the center the light intensity will be maximum and at the ends it will be diminishing. Like this(just for example, not the exact case for me):
In worst cases I will be having a series of light sources illuminated simultaneously. In such cases theoretically I should get overlapping bell or Gaussian curves, some what like this:
How do I plot such a curve given the Images of light projection (like the one in the figure)?
2-After the Gaussian curve is drawn, the next job is to analyze the same such as finding width and height of the curve. How do I go for this?
I want an executable for this application, so a solution given by MATLAB or similar tool is not acceptable to my client. Also i want the solution to work in real time or near real time.
I guess OpenCV can be used here. But before I start I would like to know opinions of Image processing gurus on this forum. Especially for the step -1 above, I need some inputs.
Any pointers here?
Rgrds,
Heshsham
Note: Image is taken from http://pentileblog.com.
To get the 1D Gaussian out of the 2D one, you can do a couple of things depending on what you want exactly.
- You could sum over every column of the image;
- You could find the local maximum in intensity and copy the intensity profile of that row of the image only;
- You could threshold the image (in case your maximum will be saturated and therefore a plateau), determine the center of gravity of the remaining blob, and copy that row's intensity profile;
- You could threshold, find contours, determine multiple local maxima, and grab multiple intensity profiles if the application calls for it (e.g. if the blobs are not horizontally aligned).
To get the height and width, it's pretty easy, just find the maximum and the points left and right of it where the curve drops to half of the maximum. The standard deviation is the distance between the two points divided by 2.35 (wikipedia link).
Well I solved it:
Algorithms is as follows:
1-use cvSampleLine for reading a particual line of image
2- use cvMinMaxLoc to know the maximum pixel value in a line
3- Note which of these lines is having highest pixel value. Lets say line no. 150
4- Plot pixel value for line 150.
I used MATLAB for verifying my results and graphs, and the OpenCV result is exactly the same.
Thanks for your suggestions guys.
First of all, I very much appreciate the help provided by the experts here at SO. The questions posed by many and answered by the experts has been of immense benefit to me. It had helped me with a very crucial problem few months back when I was a student doing my thesis.
Right now I am working on a problem to detect (and then recognize) numbers in a complex scene image. You can check out these images here: http://imageshack.us/g/823/dsc1757w.jpg/. These are pictures of marathon runners with their numbers on the front of their shirts. I have to detect all the numbers that appear in the image and then recognize them. The recognition wont be difficult as these appear to be OCR friendly characters. The crucial thing is how to detect these numbers.
I had an idea to first color filter it for black color. But when I tried in Matlab, the results were not encouraging, as we can see that many of the regions in the image qualify this criteria (the clothes, some shadows behind the runners, the shadows in the foliage, etc). Either I need to classify these characters from these other regions or need some other good technique.
There are papers available and I have gone through some of them, like the SWT, DWT, etc., but I have a feeling they wont be of much help. I was thinking some kind of training algorithm might be useful. There is another reason for this, in future there might be other photos with possibly different fonts, etc., so I think a dedicated algorithmic approach might fail. Can anyone point me in the right direction?
I am not a novice in image processing, but not an expert either. So, any and all help/suggestion in this regard will be greatly appreciated :) .
Thanks,
MD
You know that your problem is not a simple one, but it seems very interesting!
Although I don't have any solutions for you, I will just share my thoughts in hope that you can make something out of it.
Let's take 2 of your photos as examples:
Photo-A: http://imageshack.us/photo/my-images/59/dsc0275a.jpg/
It shows a single person with a relative "big" green label with numbers in his shirt.
Photo-B: http://imageshack.us/photo/my-images/546/dsc0243u.jpg/
It shows a lot of people with red smaller labels in their shirts.
(The labels' height in pixels is about 1/5 of the label in Photo-A)
Considering the above photos, I will try to write some random thoughts which may help...
(a) Define your scale: There is no point to apply a search algorithm to find labels from 2x2 pixels up-to the full image resolution. You must define the minimum/maximum limits for width & height of a label. Those limits may depend on many different factors:
(1) One factor is the real size of labels (defined by the distance of people from camera) which can be defined as a percentage of the image width & height.
(2) Another factor is the actual reading accurracy of the OCR you are going to use. If the numbers' image height is smaller than Y1 pixels or bigger than Y2 pixels the OCR will not be able to read it (it sounds strange but it's true: big images may seem very clear to the human eye, but an OCR may have problems reading it).
(b) Find the area(s) of interest: In your case, this is equivalent to "Find the approximate position of labels". We can define an athlete label roughly as "An (almost) rectangular area, which may be a bit inclined relative to photo borders, and contains: A central area of black + color C1 [e.g. red or green] + a white (=neutral) area on top and/or bottom of it".
A possible algorithm to find the approximate position of a label is:
(1) Traverse all image left-to-right, top-to-bottom and examine a square area of MinHeight/2 x MinHeight/2
(2) Create the histogram of the square area (or posterize it e.g. to 8 levels) and try to find if there is only Black + Another color C1 in a percentage of e.g. Black: 40% +/- 10, Color: 60% +/- 10%
(3) If (2) is true try to expand the area to Right and Bottom while the percentages are kept in the specified limits
(4) If the square is fully expanded, check if the expanded area size is inside the min/max limits of width/height you specified in (a). If not, go to step 1
(5) Process the expanded area to read the numbers - see (c) bellow
(6) Goto to step 1
(c) Process the area(s) of interest: Try the following steps:
(1) Convert each image-area to Grayscale by applying a color filter that burn Color C1 to white.
(2) Equalize the Grayscale to make the black letters stand-out
(3) If an inclination has been detected, perform a reverse rotation on the image-area to make the letters as horizontal as possible.
(4) Feed the area to an OCR trained only for numbers
Good luck with your project!
You could try to contact the author of this software:
Yaroslav is an active member of StackOverflow.