Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 10 months ago.
The community reviewed whether to reopen this question 10 months ago and left it closed:
Original close reason(s) were not resolved
I've been looking around for some papers (or info) on this topic.
To avoid a misunderstanding: I'm not talking about finding a supplied pattern in multiple locations.
Repeating patterns can also be understood to mean repeating images. The definition of pattern here isn't abstract. Imagine, for instance, a brick wall. The wall is composed of individual bricks. A picture of the wall is composed of the repeating image of a brick.
The solution must preferably find the largest repeating pattern. Large in this context can be defined two ways: pixel area or number of repetitions.
In the above example, you could also cut each brick in half: rotating one half and attaching it to the other reconstructs a full brick. While the complete brick is the largest repeating image in terms of pixel area, the half brick has twice as many repetitions.
Any thoughts?
A number of methods come to mind:
Fourier transform of the image
Wavelet Analysis
Autocorrelation
I'd start with Fourier analysis: any shape repeating at regular intervals in the image creates a very distinct spatial frequency spectrum, with one major frequency and some harmonics.
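The Fourier route can be sketched in a few lines of numpy. Everything below (image size, tile dimensions) is a made-up toy example: a synthetic periodic pattern is built, and the strongest non-DC peak of the 2-D spectrum recovers its repetition period along each axis.

```python
import numpy as np

# Toy example: a synthetic "brick wall" that repeats every 16 pixels
# horizontally and every 8 pixels vertically.
h, w = 64, 64
y, x = np.mgrid[0:h, 0:w]
img = ((x // 8 + y // 4) % 2).astype(float)

# 2-D magnitude spectrum; discard the DC component at (0, 0).
spectrum = np.abs(np.fft.fft2(img))
spectrum[0, 0] = 0.0

# The strongest remaining peak is the fundamental spatial frequency.
fy, fx = np.unravel_index(np.argmax(spectrum), spectrum.shape)
fy, fx = min(fy, h - fy), min(fx, w - fx)  # fold negative frequencies
period_y, period_x = h // fy, w // fx      # repetition period in pixels
```

A real image would need the same treatment per color channel (or on a grayscale version), and peak detection robust to noise rather than a bare argmax.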
I'm not sure if this is what you're looking for, but I suggest searching for "Texture based segmentation". Take a look at this bibliography, for example: http://www.visionbib.com/bibliography/segment366.html
I'm trying to implement this paper right now:
Automatic Skin and Hair Masking Using Convolutional Neural Networks
I've gotten the FCN and CRF part working, and I found the code to generate the alpha mask once I have the trimap.
I'm stuck on the part between (c) and (d), though.
How do I generate a trimap given the binary mask? The paper says:
We apply morphological operators on the binary segmentation mask for hair and skin, obtaining a trimap that indicates foreground (hair/skin), background and unknown pixels. In order to deal with segmentation inaccuracies, and to best capture the appearance variance of both foreground and background, we first erode the binary mask with a small kernel, then extract the skeleton pixels as part of the foreground constraint pixels. We also erode the binary mask with a larger kernel to get more foreground constraint pixels. The final set of foreground constraint pixels is the union of the two parts. If we only keep the second part, some thin hair regions will be gone after erosion with a large kernel. If a pixel is outside the dilated mask, we take it as a background constraint pixel. All other pixels are marked as unknown; see figure 2 (d).
OpenCV supports these morphological operations. See this tutorial explaining how to use the erode and dilate functions.
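As a rough sketch of the recipe from the quoted paragraph (minus the skeleton-extraction step, which would need something like skimage.morphology.skeletonize to preserve thin hair strands), here is a simplified trimap builder using scipy.ndimage; cv2.erode and cv2.dilate are drop-in equivalents for the morphology calls. The iteration counts are arbitrary placeholders, not values from the paper.

```python
import numpy as np
from scipy import ndimage

def make_trimap(mask, fg_iters=3, bg_iters=3):
    """Simplified trimap from a boolean foreground mask.

    255 = foreground constraint, 0 = background constraint, 128 = unknown.
    (The paper additionally unions in skeleton pixels of a lightly eroded
    mask so thin hair strands survive the large-kernel erosion; that
    refinement is omitted here.)
    """
    sure_fg = ndimage.binary_erosion(mask, iterations=fg_iters)
    sure_bg = ~ndimage.binary_dilation(mask, iterations=bg_iters)
    trimap = np.full(mask.shape, 128, dtype=np.uint8)
    trimap[sure_fg] = 255
    trimap[sure_bg] = 0
    return trimap
```

The resulting uint8 image can then be fed directly to the alpha-matting stage that expects a foreground/background/unknown trimap.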
I'd love to write a program that will take a scanned invoice (original is A4 paper, scanned as JPEG file (wrapped in a PDF), ~4000 pixels wide) and look for logotypes. If a logotype is found, the invoice file (PDF) will be tagged with those tags associated with the logotypes found in the invoice.
I expect 20 or so logotypes to look for, and about 2500 invoices (so yes, a pain to do manually).
My thoughts are drawn towards OpenCV, since I know it's used behind the scenes by Sikuli. I would only look for logos in certain areas, i.e. logo A would only be looked for in the top-left corner of every invoice, logo B in the top right, etc. I assume converting the JPEG to high-contrast monochrome would help too?
"20 or so logotypes" is a good number for using keypoints (corners, blobs, etc.) and their descriptors (SIFT, SURF, FREAK, etc.) in a find-nearest-neighbour way. The steps are:

1 train
create a training set of logos (take them from your documents)
calculate a set of keypoints and their descriptors for every logo

2 find
equalize the picture and filter noise
find keypoints and their descriptors
find the best-matching descriptors (nearest neighbours) in your training set
find a homography for the matched keypoint positions, to be sure it is a complete logo and not just one accidental point

All these steps are implemented in OpenCV, but you will need some time to play with the parameters to get the best solution. In any case, you have a very low level of logo distortion, so you should get a high rate of true positives and a low rate of false positives.
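The matching step (find the nearest neighbour, reject ambiguous matches) can be sketched in plain numpy; in practice OpenCV's BFMatcher.knnMatch does the same job. The 0.75 ratio is Lowe's commonly used default, not something specific to this answer.

```python
import numpy as np

def match_descriptors(query, train, ratio=0.75):
    """Nearest-neighbour descriptor matching with Lowe's ratio test.

    query, train: (N, D) float arrays of descriptors (e.g. from SIFT).
    Returns (query_idx, train_idx) pairs whose best match is clearly
    better than the second best, filtering out ambiguous matches.
    """
    matches = []
    for qi, q in enumerate(query):
        d = np.linalg.norm(train - q, axis=1)  # distance to every train descriptor
        i1, i2 = np.argsort(d)[:2]             # two nearest neighbours
        if d[i1] < ratio * d[i2]:
            matches.append((qi, int(i1)))
    return matches
```

The surviving pairs would then be fed to cv2.findHomography with RANSAC to verify that the matched keypoints agree on one consistent logo placement.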
A lot of research papers that I am reading these days just abstractly write image1-image2
I imagine they mean grayscale images. But how do I extend this to color images?
Do I take the intensities and subtract? Would I compute these intensities by taking the average, or by taking the weighted average as illustrated here?
I would also prefer if you could cite a source for this, preferably a research paper or a textbook.
Edit: I am working on motion detection, where there are tons of algorithms that create a background model of the video (an image) and then subtract the current frame (again, an image) from this model. If the difference exceeds a given threshold, the pixel is classified as a foreground pixel. So far I have been subtracting the intensities directly, but I don't know whether another approach is possible.
Subtracting directly in RGB space, or after converting to grayscale, can miss useful information and at the same time introduce many unwanted outliers. It is possible that you don't need the subtraction operation at all. By investigating the intensity difference between background and object in all three channels, you can determine the range of the background in each channel and simply set those pixels to zero. This study demonstrated that such a method is robust against non-salient motion (such as moving leaves) in the presence of shadows in various environments.
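A minimal numpy sketch of the per-channel range test described above. The range bounds would in practice come from observing the background; the values used here are arbitrary assumptions.

```python
import numpy as np

def suppress_background(frame, bg_low, bg_high):
    """Zero every pixel whose R, G and B values all fall inside the
    observed background range [bg_low, bg_high] (one bound per channel).

    frame: (H, W, 3) array. Returns a copy with background set to zero.
    """
    in_range = np.all((frame >= bg_low) & (frame <= bg_high), axis=-1)
    out = frame.copy()
    out[in_range] = 0
    return out
```

Pixels outside the per-channel range survive, so the foreground/background decision needs no explicit subtraction step.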
I was given this question on a job interview and think I really messed up. I was wondering how others would go about it so I could learn from this experience.
You have one image from a surveillance video located at an airport which includes a line of people waiting for check-in. You have to assess whether the line is big/crowded and therefore additional clerks are necessary. You can assume anything that may help your answer. What would you do?
I told them I would try to
segment the area containing people from the rest by edge detection
use assumptions about body contours, such as relative height/width, to discard unwanted edges
use color knowledge; but then they asked how to do that and I didn't know
You failed to mention one of the things that makes it easy to identify people standing in a queue — the fact that they aren't going anywhere (at least, not very quickly). I'd do it something like this (Warning: contains lousy Blender graphics):
You said I could assume anything, so I'll assume that the airport's floor is a nice uniform green colour. Let's take a snapshot of the queue every 10 seconds:
We can use a colour range filter to identify the areas of floor that are empty in each image:
Then, by taking the pixel-wise maximum across these images, we can eliminate people who are just milling around and are not part of the queue. Calculating the queue length from the resulting image should be very easy:
There are several ways of improving on this. For example, green might not be a good choice of colour in Dublin airport on St Patrick's day. Chequered tiles would be a little more difficult to segregate from foreground objects, but the results would be more reliable. Using an infrared camera to detect heat patterns is another alternative.
But the general approach should be fairly robust. There's absolutely no need to try and identify the outlines of individual people — this is really very difficult when people are standing close together.
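The snapshot idea above can be sketched in numpy. The floor colour, tolerance and frame layout are all assumptions for illustration; a pixel counts toward the queue only if it is occupied in every snapshot.

```python
import numpy as np

def persistent_occupancy(frames, floor_rgb, tol=30):
    """frames: (T, H, W, 3) snapshots taken a few seconds apart.

    A pixel counts as floor when it is within `tol` of the (assumed
    uniform) floor colour in every channel. Only pixels occupied in
    *every* snapshot belong to the stationary queue, which filters out
    people just walking past.
    """
    diff = np.abs(frames.astype(int) - np.asarray(floor_rgb))
    occupied = ~np.all(diff <= tol, axis=-1)     # (T, H, W) person masks
    queue = occupied.all(axis=0)                 # persistent occupancy
    return queue, int(queue.sum())               # mask and crude queue size
```

Comparing the occupied area against a threshold then gives the crowded/not-crowded decision without ever detecting an individual person.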
I would just use a person detector, for example OpenCV's HOG people detection:
http://docs.opencv.org/modules/gpu/doc/object_detection.html
or latent svm with the person model:
http://docs.opencv.org/modules/objdetect/doc/latent_svm.html
I would count the number of people in the queue...
I would estimate the color of the empty floor, and go to a normalized color space (like { R/(R+G+B), G/(R+G+B) } ). Also do this for the image you want to check, and compare these two.
My assumption: where the difference is larger than a threshold T it is due to a person.
When this is happening for too much space it is crowded and you need more clerks for check-in.
This processing will be far more robust than trying to recognize and count individual persons, and will work with quite low resolution / a small number of pixels per person.
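A sketch of this idea in numpy, with made-up threshold values. Dividing each channel by R+G+B discards brightness, so shadows on the floor mostly cancel out instead of triggering the detector.

```python
import numpy as np

def chromaticity(img):
    """Map (H, W, 3) pixels to { R/(R+G+B), G/(R+G+B) }, which discards
    brightness so shadows and uneven lighting mostly cancel out."""
    img = np.asarray(img, dtype=float)
    s = img.sum(axis=-1, keepdims=True)
    s[s == 0] = 1.0                      # avoid dividing by zero on black pixels
    return img[..., :2] / s

def crowded(frame, empty_floor, thresh=0.08, area_frac=0.3):
    """Crowded when more than `area_frac` of the pixels deviate from the
    empty-floor chromaticity by more than `thresh` (both values made up)."""
    diff = np.abs(chromaticity(frame) - chromaticity(empty_floor)).sum(axis=-1)
    return bool((diff > thresh).mean() > area_frac)
```

A darker patch of the same floor has the same chromaticity, so only genuinely different colours (clothing, luggage) count toward the crowded decision.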
Given N x-ray images with different exposure doses, I must combine them into a single image that condenses the information from the N source images. If my research is right, this problem falls into the HDRI category.
My first approach is a weighted average. For starters, I'll work with just two frames.
Let A be the first image, which is the one with lowest exposure and thus is set to weigh more in order to highlight details. Let B be the second, overexposed image, C the resulting image and M the maximum possible pixel value. Thus, for each pixel i:
w[i] = A[i] / M
C[i] = w[i] * A[i] + (1 - w[i]) * B[i]
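For reference, the weighting scheme above in numpy, assuming 8-bit-range inputs (hence M = 255):

```python
import numpy as np

def blend(a, b, max_val=255.0):
    """Per-pixel weighted average of low-exposure image `a` and
    over-exposed image `b`: C = w*A + (1 - w)*B with w = A / M."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    w = a / max_val
    return w * a + (1.0 - w) * b
```

Note that dark pixels of A get w near 0, so the blend leans on B there, while bright pixels of A dominate the result.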
An example result of applying this idea:
Notice how the result (third image) nicely captures the information from both source images.
The problem is that the second image has discontinuities around the object edges (this is unavoidable in overexposed images), and that carries on to the result. Looking closer...
The best reputed HDR software seems to be Photomatix, so I fooled around with it and no matter how I tweaked it, the discontinuities always appear in the result.
I think that I should somehow ignore the edges of the second image, but I must do it in a "smooth" way. I tried using a simple threshold, but the result looks even worse.
What do you suggest? (only open source libraries welcome)
The problem here is that each image has a different exposure dose associated. Any HDR algorithm must take this into account.
I asked the people who created the x-ray images, and the exposure dose for the second image is approximately 4.2 times that of the first one. I was giving wrong EV values to Photomatix because I didn't know that EV is expressed in terms of stops, 1 stop meaning twice the reference value. So, assigning 0 EV to the first image and +2.1 EV to the second one, the discontinuities were gone, keeping all information.
Next problem was that I had no idea how Photomatix did this. So then I tried doing the same using Luminance HDR, aka qtpfsgui, which is open source.
To sum it up, the exposure-bracketed images must be fed to an HDR creation algorithm, which produces an HDR image. Basically, that's a floating-point image which contains the information of all source images. There are many algorithms to do this; Luminance HDR calls this the HDR creation model and offers two of them: Debevec and Robertson.
However, an HDR image cannot be displayed directly on a conventional display (i.e. a monitor). So we need to convert it to a "normal" (LDR) image while keeping as much color information as possible. This is called tone mapping, and there are also various algorithms available for this; Luminance calls these Tonemap Operators and offers several, from which you pick the most suitable one. The Pattanaik operator worked great for these images.
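As a rough illustration of what a Debevec-style HDR creation model computes, here is a numpy sketch that assumes a linear sensor response (plausible for x-ray detectors) and a simple hat-shaped weighting; real implementations like Luminance HDR also estimate the camera response curve rather than assuming it.

```python
import numpy as np

def merge_hdr(images, exposures, eps=1e-6):
    """Debevec-style merge assuming a *linear* sensor response.

    images: list of (H, W) float arrays scaled to [0, 1];
    exposures: relative exposure of each image (e.g. [1.0, 4.2]).
    Returns the float radiance map (the HDR image), which still needs
    tone mapping before it can be displayed.
    """
    num = np.zeros_like(images[0], dtype=float)
    den = np.zeros_like(images[0], dtype=float)
    for img, t in zip(images, exposures):
        w = 1.0 - np.abs(2.0 * img - 1.0)        # hat weight: trust mid-range pixels
        num += w * (np.log(img + eps) - np.log(t))
        den += w
    return np.exp(num / np.maximum(den, eps))
```

Because near-black and near-white pixels get almost zero weight, the clipped regions of the overexposed frame (the source of the discontinuities) barely contribute to the merged radiance map.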
So now I'm reading Luminance's code in order to understand it and make my own implementation.