Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Given N x-ray images with different exposure doses, I must combine them into a single one which condenses the information from the N source images. If my research is right, this problem falls in the HDRI category.
My first approach is a weighted average. For starters, I'll work with just two frames.
Let A be the first image, which is the one with lowest exposure and thus is set to weigh more in order to highlight details. Let B be the second, overexposed image, C the resulting image and M the maximum possible pixel value. Thus, for each pixel i:
w[i] = A[i]/M
C[i] = w[i] * A[i] + (1 - w[i]) * B[i]
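A minimal NumPy sketch of the weighted average above, assuming the two frames are already registered and have the same shape. The function name `blend_weighted` and the tiny 8-bit test frames are made up for illustration.

```python
import numpy as np

def blend_weighted(a, b, max_value):
    """Per-pixel weighted average C = w*A + (1 - w)*B with w = A / M."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    w = a / max_value  # weight of A grows with the pixel value of A
    return w * a + (1.0 - w) * b

# Hypothetical 8-bit toy frames standing in for the two x-ray exposures.
A = np.array([[0, 128], [255, 64]], dtype=np.uint8)
B = np.array([[255, 200], [255, 100]], dtype=np.uint8)
C = blend_weighted(A, B, max_value=255)
```

Where A is 0 the result is taken entirely from B, and where A equals M it is taken entirely from A.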
An example result of applying this idea:
Notice how the result (third image) nicely captures the information from both source images.
The problem is that the second image has discontinuities around the object edges (this is unavoidable in overexposed images), and that carries over into the result. Looking closer...
The best reputed HDR software seems to be Photomatix, so I fooled around with it and no matter how I tweaked it, the discontinuities always appear in the result.
I think that I should somehow ignore the edges of the second image, but I must do it in a "smooth way". I tried using a simple threshold but the result looks even worse.
What do you suggest? (only open source libraries welcome)
The problem here is that each image has a different exposure dose associated. Any HDR algorithm must take this into account.
I asked the people who created the x-ray images, and the exposure dose for the second image is approximately 4.2 times that of the first one. I was giving wrong EV values to Photomatix because I didn't know that EV is expressed in terms of stops, 1 stop meaning twice the reference value. So, assigning 0 EV to the first image and +2.1 EV to the second one, the discontinuities were gone, keeping all information.
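As a quick check of the arithmetic, the EV offset is the base-2 logarithm of the dose ratio, since one stop means a factor of two. The helper name `exposure_value_offset` is only illustrative.

```python
import math

def exposure_value_offset(dose_ratio):
    """EV offset in stops for a given exposure (dose) ratio: one stop = factor of 2."""
    return math.log2(dose_ratio)

# Dose ratio of about 4.2 between the second and the first image.
ev = exposure_value_offset(4.2)
print(round(ev, 1))
```

Rounded to one decimal this gives the +2.1 EV used for the second image.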
The next problem was that I had no idea how Photomatix did this. So then I tried doing the same using Luminance HDR, aka qtpfsgui, which is open source.
To sum it up, the exposure bracketed images must be fed to an HDR compression algorithm, which creates an HDR image. Basically, that's a floating-point image which contains the information of all images. There are many algorithms to do this. Luminance HDR calls this the HDR creation model and offers two of them: Debevec and Robertson.
However, an HDR image cannot be displayed directly on a conventional display (i.e. a monitor). So we need to convert it to a "normal" (LDR) image while keeping as much color information as possible. This is called tone-mapping, and there are also various algorithms available for this; Luminance calls these Tonemap Operators and offers several. It also selects the most suitable one. The Pattanaik operator worked great for these images.
So now I'm reading Luminance's code in order to understand it and make my own implementation.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
For a research project at the institute I am working at, we are systematically collecting Street View Panoramas in certain areas.
In our country (Germany), a lot of buildings are censored. As I understand it, this is because according to our laws, Google must remove any personally identifying information upon request.
That is fine and I'm not looking to take away people's constitutional rights.
What I would like to be able to do is programmatically determine whether an image has one (or a certain percentage) of these blurred tiles in it, so we can exclude them as they are not useful to us.
I had a look at the metadata that I receive from a street view api request, but it did not look like there was such a parameter. Maybe I'm looking in the wrong place, though?
Thank you for your help :)
PS: "Alternative" solutions are also welcome - I have looked briefly into whether this kind of thing could be done with certain image evaluation algorithms.
This might be a difficult/impossible task.
Blurred areas should have a lower noise amplitude, and you can enhance this by taking the gradient amplitude (possibly followed by equalization to increase contrast).
Anyway, real-world images can also feature very uniform areas or smooth gradients, and if the image has low noise, there will be no way to distinguish them from blurred areas.
In addition, the images may be JPEG compressed, so that JPEG artefacts can be present and can strongly alter the uniformity and/or noise.
If a censored area is displayed as big pixels, then you are in better luck: you can detect small squares of a uniform color, arranged in a grid. This never occurs in natural images. (But unfortunately again, lossy compression will make it harder.)
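The gradient-amplitude idea can be sketched roughly as follows: compute the gradient magnitude of a grayscale image and flag tiles whose mean magnitude is very low. The tile size and threshold are arbitrary assumptions that would need tuning on real panoramas, and, as noted above, naturally uniform areas (sky, walls) will be flagged too.

```python
import numpy as np

def low_detail_tiles(gray, tile=32, threshold=5.0):
    """Boolean grid marking tiles whose mean gradient magnitude is below threshold."""
    gray = gray.astype(np.float64)
    gy, gx = np.gradient(gray)            # derivatives along rows and columns
    magnitude = np.hypot(gx, gy)
    rows, cols = gray.shape[0] // tile, gray.shape[1] // tile
    cropped = magnitude[:rows * tile, :cols * tile]  # drop incomplete border tiles
    per_tile = cropped.reshape(rows, tile, cols, tile).mean(axis=(1, 3))
    return per_tile < threshold

# Synthetic test image: noisy (detailed) left half, perfectly flat right half.
rng = np.random.default_rng(0)
image = np.full((64, 128), 128.0)
image[:, :64] = rng.integers(0, 256, size=(64, 64))
flags = low_detail_tiles(image, tile=32, threshold=5.0)
blurred_fraction = flags.mean()  # share of tiles flagged as low detail
```

An image could then be excluded when `blurred_fraction` exceeds some chosen percentage.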
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
First of all, this theory confuses me. Could someone explain it to me in a few words?
Also, does the word "scale" in a computer vision context mean the various sizes of objects,
or the various units of measurement of objects (e.g. meters, cm), or, as I think, the various degrees of smoothing/blurring applied to the same image of interest?
Second, a multi-scale representation of an image is made with a smoothing/blur operator; the one I know is the Gaussian blur. Why is the same image smoothed several times? What is the point of producing several smoothed images of the same scene that differ in detail/resolution but not in size (for example, smoothing the image once at 256x256 and once at 512x512)?
I'm asking in the context of feature extraction and description.
I would be thankful if someone could clarify the subject for me. Sorry for my language!
"Scale" here alludes to both the size of the image as well as the size of the objects themselves... at least for current feature detection algorithms. The reason why you construct a scale space is that we can focus on features of a particular size depending on what scale we are looking at. At the smaller (finer) scales, where little blurring has been applied, we can concentrate on the smaller, finer features. Similarly, at the larger (coarser) scales, only the larger, coarser features remain to concentrate on.
You do all of this on the same image because this is a common pre-processing step for feature detection. The whole point of feature detection is to be able to detect features over multiple scales of the image. You only output those features that are reliable over all of the different scales. This is actually the basis of the Scale-Invariant Feature Transform (SIFT) where one of the objectives is to be able to detect keypoints robustly that can be found over multiple scales of the image.
What you do to create multiple scales is decompose an image by repeatedly subsampling the image and blurring the image with a Gaussian filter at each subsampled result. This is what is known as a scale space. A typical example of what a scale space looks like is shown here:
The reason why you choose a Gaussian filter is fundamental to the way the scale space works. At each scale, you can think of each image produced as being a more "simplified" version of the one found from the previous scale. Other typical blurring filters can introduce new spurious structures that don't correspond to those simplifications made in the finer scales. I won't go into the details, but there is a whole body of scale space theory whose conclusion is that scale space construction using the Gaussian blur is the most fundamental way to do this, because new structures are not created when going from a fine scale to any coarser scale. You can check out that Wikipedia article I linked you to above that talks about the scale space for more details.
Now, traditionally a scale space is created by convolving your image with a Gaussian filter of various standard deviations, and that Wikipedia article has a nice pictorial representation of that. However, when you look at more recent feature detection algorithms like SURF or SIFT, they use a combination of blurring using different standard deviations as well as subsampling the image, which is what I talked about at the beginning of this post.
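A minimal sketch of the traditional construction, assuming a grayscale image and blurring it with Gaussians of increasing standard deviation (no subsampling). The helper names and the choice of sigmas are illustrative only; in practice a library routine such as OpenCV's `GaussianBlur` would normally be used.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at about 3 sigma and normalised to sum to 1."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, sigma):
    """Separable Gaussian blur: filter the rows, then the columns."""
    k = gaussian_kernel(sigma)
    pad = len(k) // 2
    padded = np.pad(image.astype(np.float64), pad, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

# Same image, same size, progressively coarser detail.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
sigmas = (1.0, 2.0, 4.0)
scale_space = [gaussian_blur(image, s) for s in sigmas]
```

Each level keeps the original size; only the amount of detail that survives changes with sigma.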
Either way, check out that Wikipedia post for more details. They talk about this stuff in more depth than I have here.
Good luck!
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I'd love to write a program that will take a scanned invoice (original is A4 paper, scanned as JPEG file (wrapped in a PDF), ~4000 pixels wide) and look for logotypes. If a logotype is found, the invoice file (PDF) will be tagged with those tags associated with the logotypes found in the invoice.
I expect 20 or so logotypes to look for, and about 2500 invoices (so yes, a pain to do manually).
My ideas are drawn towards OpenCV since I know that's used behind the scenes by Sikuli. I would only look for logos in certain areas, i.e. logo A should only be looked for in the top left corner of every invoice, logo B in the top right, etc. I assume that reducing the JPEG to high-contrast monochrome would help too?
"20 or so logotypes" is a good number for using keypoints (corners, blobs, etc.) and their descriptors (SIFT, SURF, FREAK, etc.) in a nearest-neighbor search. The steps are:
1. Train
   create a training set of logos (take them from your documents)
   calculate a set of keypoints and their descriptors for every logo
2. Find
   do picture equalization and noise filtering
   find keypoints and their descriptors
   find the best matching descriptors (nearest neighbor) in your training set
   find a homography for the positions of the matching keypoints, to be sure it is a complete logo and not just one accidental point
All these steps are implemented in OpenCV, but you will need some time to play with the parameters to get the best solution. In any case, your logos have a very low level of distortion, so you should get a high rate of "True Positive" results and a low rate of "False Positive" ones.
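To illustrate only the nearest-neighbor matching step, here is a NumPy sketch with brute-force Euclidean distances. The ratio test is an added assumption (a common way to reject ambiguous matches), and the random vectors merely stand in for real descriptors. Keypoint detection and the homography check are not covered.

```python
import numpy as np

def match_descriptors(query, train, ratio=0.75):
    """Match each query descriptor to its nearest training descriptor.

    A match is kept only if the best distance is clearly smaller than the
    second-best one (ratio test). Requires at least two training descriptors.
    """
    dist = np.linalg.norm(query[:, None, :] - train[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(query))
    keep = dist[rows, best] < ratio * dist[rows, second]
    return [(int(q), int(best[q])) for q in rows[keep]]

# Random vectors standing in for logo descriptors from the training set.
rng = np.random.default_rng(0)
train = rng.normal(size=(50, 32))
# "Scanned" descriptors: slightly noisy copies of three training descriptors.
query = train[[3, 10, 25]] + rng.normal(scale=0.01, size=(3, 32))
matches = match_descriptors(query, train)
```

The number of surviving matches per logo can then serve as a first score before the homography check.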
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am doing a project on detecting holes in roads. I am using a laser to project a beam onto the road and a camera to take an image of the road. The image may look like this:
Now I want to process this image and determine whether the line is straight or not and, if it is curved, how large the curvature is.
I don't understand how to do this. I have searched a lot but can't find a suitable result. Can anyone help me with this?
This is rather complicated and your question is very broad, but let's have a try:
1. You probably have to identify the dots in the pixel image first. There are several options to do this, but I'd smooth the image with a blur filter and then find the reddest pixels (which are assumed to be the centers of the dots). Store these coordinates in an array of (x, y) pairs.
2. I'd use a spline interpolation between the dots. This way one can simply get the local derivative of a curve passing through each point.
3. If the maximum of the first derivative is small, the dots are in a line. If you believe the dots belong to a single curve, the second derivative gives you its curvature (approximately, for small slopes).
For 1. you may also rely on some libraries specialized in image processing (this is the image processing part of your challenge). One such library is OpenCV.
For 2. I'd use some math toolkit, either Octave or a math library for a native language.
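As a rough sketch of steps 2 and 3, the derivatives can also be estimated directly from the dot coordinates. Note that this swaps the spline for plain finite differences (`np.gradient`), which is simpler but noisier. It assumes the dots are sorted by x and that no two dots share the same x.

```python
import numpy as np

def slope_and_curvature(x, y):
    """Finite-difference estimates of dy/dx and of the curvature along the dots."""
    dy = np.gradient(y, x)        # first derivative
    d2y = np.gradient(dy, x)      # second derivative
    curvature = d2y / (1.0 + dy ** 2) ** 1.5
    return dy, curvature

x = np.linspace(-2.0, 2.0, 9)

# Dots on a straight line: the curvature is zero everywhere.
_, curvature_line = slope_and_curvature(x, 2.0 * x + 1.0)

# Dots on the parabola y = x^2: the curvature at the vertex (x = 0) is 2.
_, curvature_parabola = slope_and_curvature(x, x ** 2)
```

Values near the two ends of the point list are less accurate because only one-sided differences are available there.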
There are several different ways of measuring the straightness of a line. Since your question is rather vague, it's impossible to say what will work best for you.
But here's my suggestion:
Use linear regression to calculate the best-fit straight line through your points, then calculate the mean-squared distance of each point from this line (straighter lines will give smaller results).
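A small sketch of this suggestion, assuming the dot centres are available as an N x 2 array of (x, y) pairs. The function name is made up, and `np.polyfit` fits y as a function of x, so a nearly vertical line would need x and y swapped.

```python
import numpy as np

def straightness_mse(points):
    """Mean squared perpendicular distance of the points from their best-fit line."""
    x, y = points[:, 0], points[:, 1]
    slope, intercept = np.polyfit(x, y, 1)  # ordinary least-squares line
    dist = (slope * x - y + intercept) / np.sqrt(slope ** 2 + 1.0)
    return float(np.mean(dist ** 2))

x = np.arange(10, dtype=np.float64)
straight = straightness_mse(np.column_stack([x, 2.0 * x + 1.0]))
curved = straightness_mse(np.column_stack([x, 0.1 * x ** 2]))
```

The straight set of dots gives a value that is essentially zero, while the curved set gives a clearly larger one.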
You may want to read this paper; it is an interesting one for solving your problem.
As #urzeit suggested, you should first find the points as accurately as possible. There's really no way to give good advice on that without seeing real pictures, except maybe: try to make the task as easy as possible for yourself. For example, if you can set the camera to a very short shutter time (microseconds, if possible) and concentrate the laser energy into that same time window, the "background" will contribute less energy to the image brightness, and the laser spots will simply be bright spots on a dark background.
Measuring the linearity should be straightforward, though: "Linearity" is just a different word for "linear correlation". So you can simply calculate the correlation between the X and Y values. As the pictures on the linked Wikipedia page show, a correlation of +1 or -1 means all points are on a line.
If you want the actual line, you can simply use Total Least Squares.
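Both measures fit in a few lines of NumPy. This is only a sketch: the total least squares line is obtained here from the SVD of the centred points, which is one standard way to compute it, and the function names are illustrative.

```python
import numpy as np

def linearity(points):
    """Pearson correlation between the x and y coordinates."""
    return float(np.corrcoef(points[:, 0], points[:, 1])[0, 1])

def total_least_squares_line(points):
    """Return (centroid, unit direction) of the line minimising perpendicular distances."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[0]  # first right singular vector = direction of largest spread

x = np.arange(10, dtype=np.float64)
points = np.column_stack([x, 2.0 * x + 1.0])
r = linearity(points)
centroid, direction = total_least_squares_line(points)
```

The sign of `direction` is arbitrary, so compare directions up to sign.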
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I was given this question on a job interview and think I really messed up. I was wondering how others would go about it so I could learn from this experience.
You have one image from a surveillance video located at an airport which includes a line of people waiting for check-in. You have to assess if the line is big/crowded and therefore additional clerks are necessary. You can assume anything that may help your answer. What would you do?
I told them I would try to
segment the area containing people from the rest by edge detection
use assumptions on body contour such as relative height/width to denoise unwanted edges
use color knowledge; but then they asked how to do that and I didn't know
You failed to mention one of the things that makes it easy to identify people standing in a queue — the fact that they aren't going anywhere (at least, not very quickly). I'd do it something like this (Warning: contains lousy Blender graphics):
You said I could assume anything, so I'll assume that the airport's floor is a nice uniform green colour. Let's take a snapshot of the queue every 10 seconds:
We can use a colour range filter to identify the areas of floor that are empty in each image:
Then, by taking the pixel-wise maximum across these images, we can eliminate people who are just milling around and not part of the queue. Calculating the queue length from the resulting image should be very easy:
There are several ways of improving on this. For example, green might not be a good choice of colour in Dublin airport on St Patrick's day. Chequered tiles would be a little more difficult to segregate from foreground objects, but the results would be more reliable. Using an infrared camera to detect heat patterns is another alternative.
But the general approach should be fairly robust. There's absolutely no need to try and identify the outlines of individual people — this is really very difficult when people are standing close together.
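The colour range filter and the pixel-wise maximum could look roughly like this in NumPy. The floor colour range, the toy frames and the function names are all assumptions for illustration; real footage would need a calibrated range.

```python
import numpy as np

def empty_floor_mask(frame, lower, upper):
    """True where every channel lies inside the assumed floor colour range."""
    return np.all((frame >= lower) & (frame <= upper), axis=-1)

def persistent_occupancy(frames, lower, upper):
    """True where the floor was never visible in any of the snapshots."""
    masks = np.stack([empty_floor_mask(f, lower, upper) for f in frames])
    ever_empty = masks.max(axis=0)  # pixel-wise maximum over time (logical OR)
    return ~ever_empty

# Toy snapshots: green floor, one person standing still, one walking past.
floor = np.zeros((4, 6, 3), dtype=np.uint8)
floor[...] = (0, 200, 0)
person = (120, 80, 60)
frames = []
for passer_col in (0, 3, 5):
    f = floor.copy()
    f[1, 2] = person           # queuing person: same place in every snapshot
    f[3, passer_col] = person  # passer-by: a different place each time
    frames.append(f)

queue = persistent_occupancy(
    frames, lower=np.array([0, 150, 0]), upper=np.array([80, 255, 80])
)
```

Only the pixel occupied in every snapshot remains set, so the extent of `queue` approximates the queue itself.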
I would just use a person detector, for example OpenCV's HOG people detection:
http://docs.opencv.org/modules/gpu/doc/object_detection.html
or latent svm with the person model:
http://docs.opencv.org/modules/objdetect/doc/latent_svm.html
I would count the number of people in the queue...
I would estimate the color of the empty floor, and go to a normalized color space (like { R/(R+G+B), G/(R+G+B) } ). Also do this for the image you want to check, and compare these two.
My assumption: where the difference is larger than a threshold T it is due to a person.
When this happens over too large an area, the queue is crowded and you need more clerks for check-in.
This processing will be far more robust than trying to recognize and count individual persons, and will work with quite low resolution / a low number of pixels per person.
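A minimal sketch of this comparison, assuming an RGB image and a known empty-floor colour. The threshold T = 0.1 and the toy colours are arbitrary choices for illustration.

```python
import numpy as np

def normalized_rg(image):
    """Map RGB to the normalised colour space (R/(R+G+B), G/(R+G+B))."""
    rgb = image.astype(np.float64)
    total = rgb.sum(axis=-1, keepdims=True)
    total[total == 0] = 1.0  # avoid division by zero for black pixels
    return rgb[..., :2] / total

def occupied_fraction(image, floor_color, threshold=0.1):
    """Fraction of pixels whose normalised colour differs from the floor by more than T."""
    floor = normalized_rg(np.asarray(floor_color).reshape(1, 1, 3))
    distance = np.linalg.norm(normalized_rg(image) - floor, axis=-1)
    return float((distance > threshold).mean())

# Toy image: green floor, a darker (shadowed) floor strip, and a block of "people".
image = np.zeros((10, 10, 3), dtype=np.uint8)
image[...] = (0, 200, 0)
image[:2] = (0, 100, 0)      # shadow: same chromaticity, lower brightness
image[5:8] = (120, 80, 60)   # 30 of 100 pixels covered by people
fraction = occupied_fraction(image, floor_color=(0, 200, 0))
```

The shadowed strip is not counted, because normalising removes the brightness difference; the queue would be called crowded when `fraction` exceeds a chosen limit.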