A lot of research papers that I am reading these days just abstractly write image1 - image2.
I imagine they mean grayscale images. But how does this extend to color images?
Do I take the intensities and subtract? How would I compute these intensities: by taking the average, or by taking the weighted average as illustrated here?
I would also prefer if you could cite a source for this, preferably a research paper or a textbook.
Edit: I am working on motion detection, where many algorithms build a background model of the video (an image) and then subtract the current frame (again an image) from this model. If the difference exceeds a given threshold, the pixel is classified as a foreground pixel. So far I have been subtracting the intensities directly, but I don't know whether another approach is possible.
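The per-pixel subtract-and-threshold scheme described in the edit can be sketched as follows; this is a minimal illustration in numpy, not any specific paper's method, and the threshold value is an arbitrary assumption:

```python
import numpy as np

def foreground_mask(background, frame, threshold=25):
    """Classify a pixel as foreground where the absolute intensity
    difference from the background model exceeds the threshold."""
    # Cast to a signed type first so the subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold

# Toy example: a flat grayscale background and one bright "object" pixel.
bg = np.full((4, 4), 100, dtype=np.uint8)
fr = bg.copy()
fr[2, 2] = 200
mask = foreground_mask(bg, fr)   # True only at (2, 2)
```

For color frames, the same idea applies per channel (or to a luminance/weighted average), which is exactly the choice the question asks about.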
Subtracting directly in RGB space, or after converting to grayscale, can miss useful information and at the same time introduce many unwanted outliers. You may not even need a subtraction operation. By investigating the intensity difference between background and object in all three channels, you can determine the background's range in each channel and simply set those pixels to zero. This study demonstrated that such a method is robust against non-salient motion (such as moving leaves) in the presence of shadows in various environments.
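A sketch of that per-channel range idea, with made-up range values (the actual ranges would come from studying your background frames):

```python
import numpy as np

def suppress_background(frame, bg_lo, bg_hi):
    """Zero out pixels whose R, G and B values all fall inside the
    estimated per-channel background range; keep the rest as foreground."""
    in_range = np.all((frame >= bg_lo) & (frame <= bg_hi), axis=-1)
    out = frame.copy()
    out[in_range] = 0
    return out

# Assume the background was measured to lie in [90, 110] on every channel.
lo = np.array([90, 90, 90])
hi = np.array([110, 110, 110])
frame = np.full((2, 2, 3), 100, dtype=np.uint8)
frame[0, 0] = [200, 50, 30]                  # an "object" pixel
result = suppress_background(frame, lo, hi)  # only (0, 0) survives
```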
I'm learning image classification with PyTorch. I found that the code for some papers applies CenterCrop to both the training set and the test set, e.g. resizing to a larger size and then applying CenterCrop to obtain a smaller size. The smaller size is the standard input size in this research direction.
In my experience, applying CenterCrop to the test set gives a significant improvement (e.g. 1% or 2%) compared to not using it.
Since it is used in top conference papers, I am confused. Does applying CenterCrop to the test set count as cheating? In addition, should I use any data augmentation on the test set other than Resize and Normalization?
Thank you for your answer.
That is not cheating. You can apply any augmentation as long as the label is not used.
In image classification, people sometimes use a FiveCrop+Reflection technique, which is to take five crops (Center, TopLeft, TopRight, BottomLeft, BottomRight) and their reflections as augmentations. They then predict class probabilities for each crop and average the results, typically getting some performance boost at 10x the running time.
In segmentation, people also use a similar test-time augmentation, "multi-scale testing", which is to resize the input image to different scales before feeding it to the network. The predictions are likewise averaged.
If you do use this kind of augmentation, report it when you compare against other methods, for a fair comparison.
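The five-crop averaging idea can be sketched in plain numpy (torchvision also ships `transforms.FiveCrop`/`TenCrop` for this); the "model" here is a stand-in lambda, not a real network:

```python
import numpy as np

def five_crops(img, size):
    """Return the four corner crops and the center crop of an H x W x C image."""
    h, w = img.shape[:2]
    s = size
    ch, cw = (h - s) // 2, (w - s) // 2
    return [img[:s, :s], img[:s, -s:], img[-s:, :s], img[-s:, -s:],
            img[ch:ch + s, cw:cw + s]]

def tta_predict(model, img, size):
    """Average class probabilities over 5 crops plus their horizontal flips."""
    crops = five_crops(img, size)
    crops += [c[:, ::-1] for c in crops]       # add reflections -> 10 views
    probs = np.stack([model(c) for c in crops])
    return probs.mean(axis=0)

# Toy "model": predicts [p, 1 - p] where p is the mean pixel value.
model = lambda c: np.array([c.mean(), 1.0 - c.mean()])
img = np.full((8, 8, 3), 0.5)
avg = tta_predict(model, img, size=4)   # averaged class probabilities
```

The key point of the answer holds here: only the input image is transformed, the label is never touched, so this is evaluation-time ensembling rather than cheating.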
I understand that convert -unsharp from ImageMagick uses unsharp masking to sharpen the image. What kind of algorithm is behind convert -adaptive-sharpen? When I want to sharpen my landscape images, which algorithm should I use? What are the advantages and disadvantages of the two algorithms?
I'm not an expert on the algorithms, but both operations achieve the same goal by creating a "mask" to scale the intensity of the sharpening. They differ in how they generate the "mask" and in the arithmetic operations.
With -unsharp
Given...
For demonstration, let's break this down into channels.
Create a "mask" by applying a Gaussian blur.
Apply the gain of the inverse mask if threshold applies.
Ta-Da
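The -unsharp steps above can be sketched in numpy. This is an illustration of the general unsharp-mask idea, not ImageMagick's exact implementation: a cheap 3x3 box blur stands in for the Gaussian, and the amount/threshold values are arbitrary:

```python
import numpy as np

def box_blur(img):
    """Cheap 3x3 box blur standing in for the Gaussian blur."""
    p = np.pad(img, 1, mode='edge')
    return sum(p[i:i + img.shape[0], j:j + img.shape[1]]
               for i in range(3) for j in range(3)) / 9.0

def unsharp(img, amount=0.5, threshold=0.05):
    """out = img + amount * (img - blurred), applied only where the
    difference exceeds the threshold."""
    diff = img - box_blur(img)
    out = img + amount * np.where(np.abs(diff) > threshold, diff, 0.0)
    return np.clip(out, 0.0, 1.0)

# A step edge gets exaggerated; flat regions stay below the threshold.
img = np.full((5, 6), 0.25)
img[:, 3:] = 0.75
sharp = unsharp(img)
```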
With -adaptive-sharpen
Given...
For demonstration, let's break this down into channels (again).
Create "mask" by applying edge detection, and then Gaussian blur.
Apply sharpen, but scale the intensity against the above mask.
Fin
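And a sketch of the -adaptive-sharpen steps, again as an illustration of the idea rather than ImageMagick's exact code: a finite-difference gradient stands in for the edge detection, and the same toy box blur builds the mask that scales the sharpening:

```python
import numpy as np

def box_blur(img):
    """Cheap 3x3 box blur standing in for the Gaussian blur."""
    p = np.pad(img, 1, mode='edge')
    return sum(p[i:i + img.shape[0], j:j + img.shape[1]]
               for i in range(3) for j in range(3)) / 9.0

def gradient_magnitude(img):
    """Simple finite-difference edge detector standing in for -edge."""
    gy, gx = np.gradient(img)
    return np.hypot(gx, gy)

def adaptive_sharpen(img, amount=0.5):
    """Scale the sharpening gain by a blurred edge mask, so flat areas
    (sky, bokeh) receive little or no sharpening."""
    mask = box_blur(gradient_magnitude(img))
    mask = mask / mask.max() if mask.max() > 0 else mask
    return np.clip(img + amount * mask * (img - box_blur(img)), 0.0, 1.0)

# Flat regions (mask ~ 0) are untouched; the edge is boosted.
img = np.zeros((5, 6))
img[:, 3:] = 0.5
out = adaptive_sharpen(img)
```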
Which command will give the better results for normal outdoor images?
That depends on the subject matter. A good rule of thumb is to use -adaptive-sharpen if the image contains large empty spaces (sky, sea, grass, etc.) or a bokeh/blurred background. Otherwise -unsharp will work just fine.
I'd love to write a program that will take a scanned invoice (original is A4 paper, scanned as JPEG file (wrapped in a PDF), ~4000 pixels wide) and look for logotypes. If a logotype is found, the invoice file (PDF) will be tagged with those tags associated with the logotypes found in the invoice.
I expect 20 or so logotypes to look for, and about 2500 invoices (so yes, a pain to do manually).
My ideas are drawn towards OpenCV, since I know it's used behind the scenes by Sikuli. I would only look for logos in certain areas, i.e. logo A should only be looked for in the top-left corner of every invoice, logo B in the top right, etc. I assume dropping the JPEG to monochrome with high contrast would help too?
"20 or so logotypes" is a good number for using keypoints (corners, blobs, etc.) and their descriptors (SIFT, SURF, FREAK, etc.) in a nearest-neighbor matching scheme. The steps are:
1 train
create a training set of logos (taken from your documents)
calculate a set of keypoints and their descriptors for every logo
2 find
do picture equalization and noise filtering
find keypoints and their descriptors
find the best matching descriptors (nearest neighbors) in your training set
find a homography for the matched keypoint positions, to be sure it is a complete logo and not just one accidental point
All these steps are implemented in OpenCV, but you will need some time to tune the parameters for the best results. Anyway, you have a very low level of logo distortion, so you should get a high rate of true positives and a low rate of false positives.
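The nearest-neighbor step at the core of this (which OpenCV's BFMatcher performs on real SIFT/SURF descriptors) can be sketched in numpy; the toy 2-D "descriptors" below are made up purely for illustration:

```python
import numpy as np

def match_descriptors(query, train, ratio=0.75):
    """Brute-force nearest-neighbor matching with Lowe's ratio test:
    keep a match only if the best distance is clearly smaller than the
    second-best, which filters out ambiguous descriptors."""
    matches = []
    for qi, q in enumerate(query):
        d = np.linalg.norm(train - q, axis=1)   # distance to every train descriptor
        i1, i2 = np.argsort(d)[:2]
        if d[i1] < ratio * d[i2]:
            matches.append((qi, i1))
    return matches

# Toy descriptors: query[0] sits right next to train[1];
# query[1] is equidistant from two train points, so it is rejected.
train = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
query = np.array([[10.1, 10.0], [5.0, 5.0]])
m = match_descriptors(query, train)   # -> [(0, 1)]
```

With real features you would then pass the surviving matches to cv2.findHomography, as in step 2's last point.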
I am doing a project on detecting holes in roads. I am using a laser to emit a beam onto the road and a camera to take an image of it. The image may be like this.
Now I want to process this image and determine whether the line is straight or not, and if it curves, how big the curve is.
I don't understand how to do this. I have searched a lot but can't find an appropriate result. Can anyone help me with this?
This is rather complicated and your question is very broad, but let's have a try:
1. Perhaps you have to identify the dots in the pixel image. There are several options for doing this, but I'd smooth the image with a blur filter and then find the reddest pixels (which are believed to be the centers of the dots). Store these coordinates in a vector array (array of x times y).
2. I'd use a spline interpolation between the dots. This way one can simply get the local derivative of a curve touching each point.
3. If the maximum of the first derivative is small, the dots are in a line. If you believe the dots belong to a single curve, the second derivative is your curvature.
For 1. you may also rely on libraries specialized in image processing (this is the image-processing part of your challenge). One such library is OpenCV.
For 2. I'd use a math toolkit, either Octave or a math library for a native language.
There are several different ways of measuring the straightness of a line. Since your question is rather vague, it's impossible to say what will work best for you.
But here's my suggestion:
Use linear regression to calculate the best-fit straight line through your points, then calculate the mean-squared distance of each point from this line (straighter lines will give smaller results).
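A minimal numpy version of that suggestion, with made-up sample points; the straightness score is the mean squared residual about the best-fit line:

```python
import numpy as np

def straightness(x, y):
    """Fit y = a*x + b by least squares and return the mean squared
    residual; straighter point sets give values closer to zero."""
    a, b = np.polyfit(x, y, 1)
    residuals = y - (a * x + b)
    return np.mean(residuals ** 2)

x = np.arange(10.0)
straight = straightness(x, 2.0 * x + 1.0)      # perfectly collinear points
bent = straightness(x, 0.5 * (x - 4.5) ** 2)   # points on a parabola
```

Note that vertical distance to the fitted line is only a proxy for true perpendicular distance, but for nearly horizontal laser lines the two are close.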
You may want to read this paper; it is an interesting one for solving your problem.
As @urzeit suggested, you should first find the points as accurately as possible. There's really no way to give good advice on that without seeing real pictures, except maybe: make the task as easy as possible for yourself. For example, if you can set the camera to a very short shutter time (microseconds, if possible) and concentrate the laser energy in that same window, the background will contribute less energy to the image brightness, and the laser spots will simply be bright spots on a dark background.
Measuring the linearity should be straightforward, though: "linearity" is just a different word for "linear correlation". So you can simply calculate the correlation between the X and Y values. As the pictures on the linked Wikipedia page show, correlation = 1 means all points are on a line.
If you want the actual line, you can simply use Total Least Squares.
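Both ideas fit in a few lines of numpy; the points below are an invented collinear example. Total least squares falls out of a singular value decomposition of the centered points:

```python
import numpy as np

def linearity(x, y):
    """Absolute Pearson correlation: 1.0 means the points are collinear."""
    return abs(np.corrcoef(x, y)[0, 1])

def tls_line(points):
    """Total least squares line via SVD/PCA: returns (centroid, direction).
    Unlike ordinary regression, TLS minimizes perpendicular distances."""
    c = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - c)
    return c, vt[0]                    # first right singular vector

pts = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 7.0]])  # y = 2x + 1
r = linearity(pts[:, 0], pts[:, 1])
centroid, direction = tls_line(pts)
slope = direction[1] / direction[0]
```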
I was given this question on a job interview and think I really messed up. I was wondering how others would go about it so I could learn from this experience.
You have one image from a surveillance video located at an airport which includes line of people waiting for check-in. You have to assess if the line is big/crowded and therefore additional clerks are necessary. You can assume anything that may help your answer. What would you do?
I told them I would try to
segment the area containing people from the rest by edge detection
use assumptions on body contour such as relative height/width to denoise unwanted edges
use color knowledge; but then they asked how to do that and I didn't know
You failed to mention one of the things that makes it easy to identify people standing in a queue — the fact that they aren't going anywhere (at least, not very quickly). I'd do it something like this (Warning: contains lousy Blender graphics):
You said I could assume anything, so I'll assume that the airport's floor is a nice uniform green colour. Let's take a snapshot of the queue every 10 seconds:
We can use a colour range filter to identify the areas of floor that are empty in each image:
Then, by calculating the per-pixel maximum across these images, we can eliminate people who are just milling around and are not part of the queue. Calculating the queue length from this image should be very easy:
There are several ways of improving on this. For example, green might not be a good choice of colour in Dublin airport on St Patrick's day. Chequered tiles would be a little more difficult to segregate from foreground objects, but the results would be more reliable. Using an infrared camera to detect heat patterns is another alternative.
But the general approach should be fairly robust. There's absolutely no need to try and identify the outlines of individual people — this is really very difficult when people are standing close together.
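The snapshot-and-maximum trick can be sketched in numpy; the floor color, tolerance, and tiny frames below are all illustrative assumptions:

```python
import numpy as np

def empty_floor_mask(frame, floor_color, tol=30):
    """True where a pixel is close to the known uniform floor color."""
    return np.all(np.abs(frame.astype(int) - floor_color) <= tol, axis=-1)

def queue_mask(frames, floor_color):
    """A pixel belongs to the queue only if it is occupied (non-floor) in
    EVERY snapshot; taking the maximum of the empty-floor masks over time
    (equivalently, requiring occupancy in all frames) removes people who
    are just walking past."""
    occupied = np.stack([~empty_floor_mask(f, floor_color) for f in frames])
    return occupied.all(axis=0)

green = np.array([0, 200, 0])
f1 = np.zeros((4, 4, 3), dtype=np.uint8); f1[:] = green
f2 = f1.copy()
f1[1, 1] = [180, 150, 120]   # person standing in the queue, both frames
f2[1, 1] = [180, 150, 120]
f2[3, 3] = [50, 50, 50]      # someone walking past, second frame only
q = queue_mask([f1, f2], green)
queue_pixels = int(q.sum())  # proxy for queue length
```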
I would just use a person detector, for example OpenCV's HOG people detection:
http://docs.opencv.org/modules/gpu/doc/object_detection.html
or latent svm with the person model:
http://docs.opencv.org/modules/objdetect/doc/latent_svm.html
I would count the number of people in the queue...
I would estimate the color of the empty floor and convert to a normalized color space (like { R/(R+G+B), G/(R+G+B) }). Do the same for the image you want to check, and compare the two.
My assumption: where the difference is larger than a threshold T it is due to a person.
When this is happening for too much space it is crowded and you need more clerks for check-in.
This processing will be much more robust than trying to recognize and count individual persons, and will work with quite low resolution / a low number of pixels per person.
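A sketch of that normalized-color-space comparison; the floor color, threshold T, and frames are invented for illustration:

```python
import numpy as np

def chromaticity(img):
    """Map RGB to the normalized color space {R/(R+G+B), G/(R+G+B)},
    which discounts brightness changes such as shadows."""
    s = img.sum(axis=-1, keepdims=True).clip(min=1e-6)
    return (img / s)[..., :2]

def crowded_fraction(floor_rgb, frame, T=0.1):
    """Fraction of pixels whose chromaticity differs from the empty-floor
    reference by more than T; compare this against a crowding threshold
    to decide whether more clerks are needed."""
    diff = np.linalg.norm(chromaticity(frame) - chromaticity(floor_rgb),
                          axis=-1)
    return (diff > T).mean()

floor = np.zeros((4, 4, 3)); floor[:] = [0.2, 0.5, 0.3]  # empty-floor estimate
frame = floor.copy()
frame[0] = [0.5, 0.3, 0.2]                               # one row occupied
frac = crowded_fraction(floor, frame)                    # 0.25 of the area
```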