I know my question is pretty basic, but I can't seem to find the answer on the internet. I want to know why people tend to convert an image into grayscale prior to adding salt and pepper noise.
It's implied in the name: salt and pepper are black and white. Noise is always present to some degree in images captured by cameras. As such, it is often added to images during testing to check whether the solution is robust to real-world cases.
Noise in a color image can take two forms, chromatic noise or luminance noise. Luminance noise will be consistent across color channels, chromatic noise will vary by color channel. Chromatic noise is caused by the camera sensor's sensitivities to the various wavelengths of light. Luminance noise is caused by the camera system's electrical "noise floor" which is a product of overall sensitivity.
You can add noise in grayscale or in color, the process is the same. For academic purposes, writing a solution that works on a grayscale image with noise is a similar, though possibly less complex, problem than writing a solution for full color images. Computer vision is often only done on grayscale images, so it is common to test against grayscale images.
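Here is a minimal sketch of what adding salt and pepper noise might look like, assuming NumPy and a uint8 image; the function name and the `amount` parameter are illustrative choices, not a standard API. The same code runs unchanged on a grayscale `(H, W)` array or a color `(H, W, 3)` array.

```python
import numpy as np

def add_salt_and_pepper(image, amount=0.02, rng=None):
    """Set a random fraction `amount` of pixels to pure black or pure white."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.copy()
    draws = rng.random(image.shape[:2])
    noisy[draws < amount / 2] = 0          # pepper
    noisy[draws > 1 - amount / 2] = 255    # salt
    return noisy
```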
I understand that convert -unsharp from ImageMagick uses unsharp masking to sharpen the image. What kind of algorithm is behind convert -adaptive-sharpen? When I want to sharpen my landscape images, which algorithm should I use? What are the advantages and disadvantages of the two algorithms?
I'm not an expert on the algorithms, but both operations achieve the same goal by creating a "mask" to scale the intensity of the sharpening. They differ in how they generate the "mask" and in the arithmetic operations they apply.
With -unsharp, given an input image (broken down into channels for demonstration):
1. Create a "mask" by applying a Gaussian blur.
2. Apply the gain of the inverse mask where the threshold applies.
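As a rough illustration only (this is not ImageMagick's exact arithmetic), the unsharp idea can be sketched in Python with OpenCV; the gain and threshold values here are placeholders.

```python
import cv2
import numpy as np

def unsharp(image, sigma=2.0, gain=1.0, threshold=0):
    # "Mask": a Gaussian-blurred copy of the image.
    blurred = cv2.GaussianBlur(image, (0, 0), sigma).astype(np.float32)
    # High-frequency detail = original minus blur.
    detail = image.astype(np.float32) - blurred
    # Apply the gain only where the detail exceeds the threshold.
    boosted = np.where(np.abs(detail) >= threshold, gain * detail, 0)
    return np.clip(image + boosted, 0, 255).astype(np.uint8)
```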
With -adaptive-sharpen, given the same input (again broken down into channels):
1. Create a "mask" by applying edge detection, then a Gaussian blur.
2. Apply the sharpening, but scale its intensity against the above mask.
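And a similar rough sketch of the adaptive idea (again, not ImageMagick's actual implementation): build an edge mask, soften it with a Gaussian blur, and scale the sharpening by that mask so flat regions are left mostly untouched. It assumes a BGR color input and illustrative Canny thresholds.

```python
import cv2
import numpy as np

def adaptive_sharpen(image_bgr, sigma=2.0, gain=1.0):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Edge detection followed by a Gaussian blur gives a soft "mask" in [0, 1].
    edges = cv2.Canny(gray, 50, 150).astype(np.float32) / 255.0
    mask = cv2.GaussianBlur(edges, (0, 0), sigma)

    # Plain unsharp-style detail layer.
    blurred = cv2.GaussianBlur(image_bgr, (0, 0), sigma).astype(np.float32)
    detail = image_bgr.astype(np.float32) - blurred

    # Scale the sharpening by the mask before adding it back.
    sharpened = image_bgr.astype(np.float32) + gain * detail * mask[..., None]
    return np.clip(sharpened, 0, 255).astype(np.uint8)
```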
Which command will give the better results for normal outdoor images?
That depends on the subject matter. A good rule of thumb is to use -adaptive-sharpen if the image contains large empty spaces (sky, sea, grass, etc.) or a bokeh/blurred background; otherwise -unsharp will work just fine.
I'm trying to implement this paper right now:
Automatic Skin and Hair Masking Using Convolutional Neural Networks
I've gotten the FCN and CRF part working, and I found the code to generate the alpha mask once I have the trimap.
I'm stuck on the part between (c) and (d), though.
How do I generate a trimap given the binary mask? The paper says:
We apply morphological operators on the binary segmentation mask for hair and skin, obtaining a trimap that indicates foreground (hair/skin), background and unknown pixels. In order to deal with segmentation inaccuracies, and to best capture the appearance variance of both foreground and background, we first erode the binary mask with a small kernel, then extract the skeleton pixels as part of foreground constrain pixels. We also erode the binary mask with a larger kernel to get more foreground constrain pixels. The final foreground constrain pixels is the union of the two parts. If we only keep the second part then some thin hair regions will be gone after erosion with a large kernel. If a pixel is outside the dilated mask then we take it as background constrain pixel. All other pixels are marked as unknown, see figure 2 (d).
OpenCV supports morphological operations.
Please see this tutorial explaining how to use erode and dilate functions.
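Here is a rough sketch, under my own assumptions, of how the quoted procedure might be written with OpenCV (plus scikit-image for the skeleton); the kernel sizes are illustrative guesses, not values from the paper.

```python
import cv2
import numpy as np
from skimage.morphology import skeletonize

def make_trimap(binary_mask, small=3, large=15, dilate_size=25):
    """binary_mask: uint8 image where 255 = hair/skin and 0 = background."""
    k_small = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (small, small))
    k_large = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (large, large))
    k_dilate = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (dilate_size, dilate_size))

    # Erode with a small kernel, then keep the skeleton of the result so that
    # thin hair strands survive as foreground constraint pixels.
    eroded_small = cv2.erode(binary_mask, k_small)
    skeleton = skeletonize(eroded_small > 0).astype(np.uint8) * 255

    # Erode with a larger kernel for the bulk of the foreground constraints.
    eroded_large = cv2.erode(binary_mask, k_large)
    foreground = cv2.bitwise_or(skeleton, eroded_large)

    # Anything outside the dilated mask becomes a background constraint.
    dilated = cv2.dilate(binary_mask, k_dilate)

    trimap = np.full_like(binary_mask, 128)   # unknown
    trimap[dilated == 0] = 0                  # background
    trimap[foreground == 255] = 255           # foreground
    return trimap
```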
First of all, this theory confuses me; could someone explain it in a few words?
Also, what does the word "scale" mean in the computer vision context? Does it mean the various sizes of objects, the various units of measurement (i.e. meters, cm, etc.), or, as I think, the various degrees of smoothing/blurring applied to the same image of interest?
Second, a multi-scale version of an image is made using a smoothing/blurring operator, of which the one I know is the Gaussian blur. Why is smoothing applied a number of times to the same image? What is the point of making a number of smoothed images with different details/resolution, but not different in size, for the same scene (i.e. applying one smoothing operator to the image of interest at size 256x256, and another time at 512x512)?
I'm talking in the context of feature extraction and description.
I would be thankful if someone could clarify the subject for me. Sorry for my language!
"Scale" here alludes to both the size of the image as well as the size of the objects themselves... at least for current feature detection algorithms. The reason why you construct a scale space is because we can focus on features of a particular size depending on what scale we are looking at. The smaller the scale, the coarser or smaller features we can concentrate on. Similarly, the larger the scale, the finer or larger features we can concentrate on.
You do all of this on the same image because this is a common pre-processing step for feature detection. The whole point of feature detection is to be able to detect features over multiple scales of the image. You only output those features that are reliable over all of the different scales. This is actually the basis of the Scale-Invariant Feature Transform (SIFT) where one of the objectives is to be able to detect keypoints robustly that can be found over multiple scales of the image.
To create multiple scales, you decompose an image by repeatedly subsampling it and blurring each subsampled result with a Gaussian filter. This is what is known as a scale space.
The reason you choose a Gaussian filter is fundamental to the way the scale space works. At each scale, you can think of each image produced as a more "simplified" version of the one found at the previous scale. Typical blurring filters introduce new spurious structures that don't correspond to the simplifications made at the finer scales. I won't go into the details, but there is a whole body of scale space theory showing that, in the end, scale space construction using the Gaussian blur is the most fundamental way to do this, because new structures are not created when going from a fine scale to any coarser scale. You can check out the Wikipedia article on scale space mentioned above for more details.
Now, traditionally a scale space is created by convolving your image with a Gaussian filter of various standard deviations, and that Wikipedia article has a nice pictorial representation of that. However, when you look at more recent feature detection algorithms like SURF or SIFT, they use a combination of blurring using different standard deviations as well as subsampling the image, which is what I talked about at the beginning of this post.
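If it helps, here is a minimal sketch of that second variant, blurring with a few sigmas per octave and then subsampling, using OpenCV; the sigma values and octave count are illustrative, not the exact parameters used by SIFT or SURF.

```python
import cv2

def gaussian_scale_space(image, octaves=4, sigmas=(1.0, 1.6, 2.4, 3.2)):
    space = []
    current = image
    for _ in range(octaves):
        # Blur the current octave with progressively larger sigmas.
        octave = [cv2.GaussianBlur(current, (0, 0), s) for s in sigmas]
        space.append(octave)
        # Halve the resolution before the next octave.
        current = cv2.pyrDown(current)
    return space
```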
Either way, check out that Wikipedia post for more details; it covers this material in more depth than I have here.
Good luck!
A lot of research papers that I am reading these days just abstractly write image1-image2
I imagine they mean grayscale images, but how do I extend this to color images?
Do I take the intensities and subtract? How would I compute these intensities: by taking the average, or by taking the weighted average as illustrated here?
Also, I would prefer it if you could cite a source for this, preferably a research paper or a textbook.
Edit: I am working on motion detection, where there are tons of algorithms that create a background model of the video (an image) and then subtract the current frame (again an image) from this model. We check whether this difference exceeds a given threshold, in which case we classify the pixel as a foreground pixel. So far I have been subtracting the intensities directly, but I don't know whether another approach is possible.
Subtracting directly in RGB space, or after converting to grayscale, can miss useful information and at the same time introduce many unwanted outliers. It is possible that you don't need the subtraction operation at all. By investigating the intensity difference between background and object in all three channels, you can determine the range of the background in the three channels and simply set those pixels to zero. This study demonstrated that such a method is robust against non-salient motion (such as moving leaves) in the presence of shadows in various environments.
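As a deliberately simple sketch of per-channel differencing with a threshold (not taken from the cited study): compare each color channel against the background model and mark a pixel as foreground if any channel deviates enough. The threshold value is an illustrative guess.

```python
import cv2
import numpy as np

def foreground_mask(frame_bgr, background_bgr, threshold=30):
    # Absolute difference per color channel.
    diff = cv2.absdiff(frame_bgr, background_bgr)
    # Foreground if any channel differs by more than the threshold.
    return np.any(diff > threshold, axis=2).astype(np.uint8) * 255
```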
My task is to pinpoint where the plate number is in an image. The image does not contain only the plate number; it may contain the whole car or anything else. I used Gaussian blur, then grayscale, then contrast, then Laplacian of Gaussian to detect the edges.
Now, I am at a loss as to how to detect where the plate number is in the image. I am not going to read the license number, just make the system know where the license number is.
Can you direct me to a study regarding this? Or perhaps the algorithm that can be used to do this.
Thank you!
I think a more robust way to tackle this is to train a detector, if you have enough training images of license plates in different scenarios. One thing you can try is the Haar cascade classifier in the OpenCV library. It does a multi-scale detection of learned patterns.
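A minimal usage sketch with OpenCV's cascade detector, assuming a trained plate cascade XML is available (OpenCV ships a Russian-plate example; otherwise you would train your own with opencv_traincascade) and with illustrative detection parameters:

```python
import cv2

# Assumed cascade file; substitute your own trained cascade if needed.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_russian_plate_number.xml")

image = cv2.imread("car.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Multi-scale detection: the learned pattern is searched at several window sizes.
plates = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)
for (x, y, w, h) in plates:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
```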
You could try edge detection or some form of Hough transforms.
For example, do edge detection and then look for rectangles (or if the images aren't straight on, parallelograms) in the image. If you know that the plates will all be the same shape and size ratios, you can use that to speed up your search.
EDIT:
Found this for you.
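To make the edge-detection-and-rectangles suggestion above concrete, here is a hedged sketch using OpenCV contours; the area threshold and the four-vertex test are my own illustrative choices.

```python
import cv2

def find_rectangle_candidates(image_bgr, min_area=500):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    boxes = []
    for c in contours:
        if cv2.contourArea(c) < min_area:
            continue
        # A four-vertex approximation suggests a rectangle (or parallelogram).
        approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
        if len(approx) == 4:
            boxes.append(cv2.boundingRect(approx))
    return boxes
```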
Using some feature recognition algorithm e.g. SIFT would be a good starting point. Do you need real-time recognition or not? I recommend trying to tighten search space first, for example by filtering out regions from the image (is your environment controlled or not?). There is an article about recognising license plates using SIFT here (I just skimmed it but it looks reasonable).
License plates or number plates of vehicles come with two striking properties:
They have a specified color pattern (black letters on a white, yellow, or gray background).
They have a standard aspect ratio.
These properties can be used to extract only the license plate. First, threshold the image using adaptive thresholding. Then find contours in the image whose aspect ratio is in a close range to the standard value. This method should work for most cases. You can also try erosion followed by dilation of the thresholded image to remove noise.
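A rough sketch of that pipeline with OpenCV; the block size, the 4.5:1 target aspect ratio, and the tolerance are illustrative assumptions rather than fixed standards.

```python
import cv2

def find_plate_candidates(image_bgr, target_ratio=4.5, tolerance=1.5):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Adaptive threshold to separate dark characters from the light plate.
    thresh = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 31, 5)
    # Erosion followed by dilation (an opening) to remove small noise blobs.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    cleaned = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel)

    candidates = []
    contours, _ = cv2.findContours(cleaned, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        ratio = w / float(h) if h else 0
        if abs(ratio - target_ratio) <= tolerance:
            candidates.append((x, y, w, h))
    return candidates
```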