I have a large set of "apple" images in various shapes, sizes, lighting, color, etc. These "apple" images were cropped from larger images taken from different angles.
Now I want to train Darknet to detect "apple"s in images. I don't want to go through the annotation process, as I already have ready, cropped-out JPG images of apples.
Can I use these ready, cropped "apple" images to train Darknet, or do I still have to go through the annotation process?
In object detection models, you annotate the object in an image so the model understands where the object is in that particular image. If your entire dataset contains only apple images, the model will learn that every image it sees contains nothing but apples. So even if you provide an "orange" as a test image, it might still predict apple, because it doesn't know any class other than apple.
So there are two important points to consider:
Build the dataset so that it contains apples alone, apples with other fruits, and apples with other objects. This will help the model understand clearly what an apple is.
The bounding-box coordinates are the inputs for detection. Although you can give the full dimensions of the image as the bounding box, the model won't learn effectively that way, as mentioned above. Therefore, have multiple objects in the image and annotate them well so that the model can learn well.
What you want relates to a process called "data augmentation". You can google how others do it.
Since your apple images are already cropped, you can treat each one as if it were tagged with a bounding box covering its full size. Then collect some background images that are all larger than any of your apple images. Now you can write a tool that randomly selects an apple image and pastes it onto a randomly selected background to generate 'new' apple images with backgrounds. Since you know the size of each apple image and where it was placed, you can calculate the bounding box position and size and generate its tag file.
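For example, here is a rough sketch of such a tool in Python using Pillow; the file names, directory layout, and the single class id are assumptions on my part:

import random
from pathlib import Path
from PIL import Image

def compose_sample(apple_path, background_path, out_stem):
    # Paste one apple crop onto a larger background and write a Darknet/YOLO-style
    # label file: "class x_center y_center width height", all normalized to [0, 1].
    apple = Image.open(apple_path).convert("RGB")
    background = Image.open(background_path).convert("RGB")

    # Pick a random position where the apple fits completely inside the background.
    x = random.randint(0, background.width - apple.width)
    y = random.randint(0, background.height - apple.height)
    background.paste(apple, (x, y))

    # The apple fills its whole crop, so the bounding box is the pasted region.
    x_center = (x + apple.width / 2) / background.width
    y_center = (y + apple.height / 2) / background.height
    w = apple.width / background.width
    h = apple.height / background.height

    background.save(f"{out_stem}.jpg")
    Path(f"{out_stem}.txt").write_text(f"0 {x_center:.6f} {y_center:.6f} {w:.6f} {h:.6f}\n")

# Hypothetical file names; class id 0 stands for "apple".
compose_sample("apples/apple_001.jpg", "backgrounds/bg_042.jpg", "train/apple_on_bg_0001")

In practice you would probably also randomly rescale and flip the apple crop before pasting it, so the synthetic set covers the size and viewpoint variation mentioned in the other answer.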
Related
I am dealing with an issue while using my model to predict masks on MRI images. I have two images with different dimensions, and my goal is to find out how different the masks are. However, because my model only takes (256, 256) inputs, I have to resize both images. During resizing the organ becomes very dissimilar in the two images because the original dimensions were different. Is there any image processing technique I can use to resize both input images so that their content remains comparable?
You could also CenterCrop (https://pytorch.org/vision/stable/generated/torchvision.transforms.CenterCrop.html) your images. Especially if their dimensions are already close to your desired dimensions, you won't lose much information, and usually the information you care about is near the center of the image anyway.
import torchvision.transforms.functional as F

# Crop both images around their centers to the model's input size.
crop_size = [256, 256]
img = F.center_crop(img, crop_size)
I am working on a project which will take multiple images (let's say 2 for the moment) and combine them to generate a better image. The resulting image will be a combination of those input images. As a requirement I want to achieve this using OpenCV. I read about image stitching and saw some example images of the process, and now I am confused whether image overlapping is the same as image stitching, or whether the Stitcher class in OpenCV can do image overlapping. A little clarity on how I can approach this with OpenCV would be appreciated.
"Image overlapping" is not really a term used in the CV literature. The general concept of matching images via transformations is most often called image registration: taking many images and placing them all into one shared coordinate system. Image stitching relies on that same machinery, but additionally concerns itself with how to blend the multiple images. Furthermore, image stitching tries to take multiple images into account at once and makes small adjustments to the pairwise image registrations.
But it seems you're interested in producing higher-quality images from multiple images of the same space (or from a video feed of the space, for example). The term for that is not image overlapping but super-resolution; specifically, super-resolution from multiple images. You'll want to look into specialized filters (applied after warping the images to the same coordinates) to combine those multiple views into one high-resolution image. There are many papers on this topic. Even mean or median filters (that is, taking the mean or median at every pixel location across the images) can work well, assuming your transformations are very good.
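As a small illustration of that last point, here is a sketch that assumes the frames have already been registered into the same coordinate system (the file names are hypothetical):

import cv2
import numpy as np

# Per-pixel median across already-aligned shots of the same scene; with good
# registration this suppresses noise and fuses the views into a single image.
frames = [cv2.imread(p) for p in ("aligned_1.jpg", "aligned_2.jpg", "aligned_3.jpg")]
fused = np.median(np.stack(frames), axis=0).astype(np.uint8)
cv2.imwrite("fused.jpg", fused)

And if what you actually want is a classic panorama from overlapping views, OpenCV's high-level Stitcher class (the one you asked about) can be tried directly; again the file names are placeholders:

import cv2

# Stitch two overlapping photos into one panorama.
images = [cv2.imread("left.jpg"), cv2.imread("right.jpg")]
stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)
status, panorama = stitcher.stitch(images)
if status == cv2.Stitcher_OK:
    cv2.imwrite("panorama.jpg", panorama)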
I read the Dalal and Triggs paper on the HOG descriptor and a blog by Chris McCormick on the same topic. The blog says that the image needs to be re-sampled at different scales to recognize people of different sizes.
My question is: we already have a 64*128 window that slides over the image. Why re-sample the image at different scales instead of just sliding that window over the image to detect people?
Please correct me if I am wrong, thanks in advance!
You're right that the 64*128 window is what gets classified as either 'person' or 'non-person'. But do all the people in real-world images always come in a handy 64*128 size?
That is where the scaling comes into play. By progressively making the image smaller, the same 64*128-pixel region covers a larger area of the original image, allowing the detection of people at multiple sizes.
For example, here is a result from one of my models after running the detection at multiple scales. The result presented is after applying non-maximal suppression to weed out extraneous detection windows.
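If you want to experiment with multi-scale detection yourself, OpenCV ships a HOG-based person detector whose detectMultiScale call does the pyramid resampling internally; here is a short sketch (the input file name is an assumption):

import cv2

# detectMultiScale builds an image pyramid internally so the fixed 64x128
# window can cover people of different sizes; `scale` is the pyramid step.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread("street.jpg")
boxes, weights = hog.detectMultiScale(img, winStride=(8, 8), scale=1.05)
for (x, y, w, h) in boxes:
    cv2.rectangle(img, (int(x), int(y)), (int(x + w), int(y + h)), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", img)

Lowering scale toward 1.0 gives a finer pyramid (more levels, slower), while winStride trades detection density for speed.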
Given a logo image as a reference image, how to detect/recognize it in a cluttered natural image?
The logo may be quite small in the image; it can appear on clothes, hats, shoes, a background wall, etc. I have tried SIFT features for matching without any other preprocessing, and the results are good for cases in which the logo in the image is big and clear. However, it fails for cases where the scene is quite cluttered and the logo is quite small compared with the whole image. It also seems that SIFT features are sensitive to perspective distortions.
Does anyone know better features or ideas for logo detection/recognition in natural images? For example, training a classifier to locate candidate regions first and then applying SIFT matching for further recognition. However, training a model needs a lot of data, especially manually annotated logo regions, and it needs retraining (collecting and annotating new images) if I want to apply it to new logos.
So, any suggestions for this? A detailed workflow/code/references would be highly appreciated, thanks!
There are many algorithms, from shape matching to Haar classifiers. The best algorithm depends very much on the kind of logo.
If you want to continue with feature-based matching, I recommend the following (see the sketch after this list):
For detection of small logos, use tiles. Split the whole image into smaller (overlapping) tiles and perform the usual detection on each one. This exploits the 'locality' of the searched features.
Try ASIFT for affine-invariant detection.
Use many template images for reference feature extraction, with different lighting and different backgrounds (black, white, gray).
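As a rough sketch of the tiling idea (the file names, tile size, and ratio-test threshold are all assumptions), something like this runs plain SIFT matching per overlapping tile and keeps the tile with the most good matches:

import cv2

# Scan the scene in overlapping tiles and match the logo's SIFT features against
# each tile; a small logo then occupies a larger fraction of its tile.
sift = cv2.SIFT_create()
logo = cv2.imread("logo.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
kp_logo, des_logo = sift.detectAndCompute(logo, None)

tile, step = 400, 200  # tile size and stride in pixels (overlapping tiles)
matcher = cv2.BFMatcher()
best_count, best_origin = 0, None
for y in range(0, max(1, scene.shape[0] - tile + 1), step):
    for x in range(0, max(1, scene.shape[1] - tile + 1), step):
        patch = scene[y:y + tile, x:x + tile]
        kp, des = sift.detectAndCompute(patch, None)
        if des is None or len(des) < 2:
            continue
        # Lowe's ratio test keeps only distinctive matches.
        good = [m for m, n in matcher.knnMatch(des_logo, des, k=2)
                if m.distance < 0.75 * n.distance]
        if len(good) > best_count:
            best_count, best_origin = len(good), (x, y)
print("best tile origin:", best_origin, "with", best_count, "good matches")

Once a tile scores enough good matches, you can run the usual cv2.findHomography with RANSAC inside that tile to localize the logo precisely.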
I have a dataset of about 2000 images. This database contains some blurred images.
How can I automatically remove the blurred images from this database?
I read about using the Fourier transform to remove the blurred images: first I need to transform my images into the Fourier domain, and then by applying some threshold I should be able to identify the blurred images. Could anybody give me some sample code in MATLAB for this? I don't know how to determine the threshold. Is there any way of determining this threshold?
This task is really not so simple: if you remove all the images that don't contain high frequencies, you will end up removing many images that contain smooth scenes even though they are not blurred.
There is no 100% in computer vision. The best thing for you (in my opinion) is to make human-aided software: it should suggest the images that it thinks should be removed, but the final call must be made by a human being.
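A rough sketch of that human-aided idea, in Python rather than the MATLAB the question asked for (the directory name and the frequency cutoff are assumptions):

import numpy as np
import cv2
from pathlib import Path

def high_frequency_energy(gray, cutoff=30):
    # Mean magnitude of the spectrum after zeroing a square of low frequencies
    # around the center; blurred images tend to score low on this measure.
    f = np.fft.fftshift(np.fft.fft2(gray))
    h, w = gray.shape
    f[h // 2 - cutoff:h // 2 + cutoff, w // 2 - cutoff:w // 2 + cutoff] = 0
    return np.mean(np.abs(f))

scores = []
for path in Path("dataset").glob("*.jpg"):
    gray = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)
    scores.append((high_frequency_energy(gray), path.name))

# Present the lowest-scoring images as removal candidates; a person decides.
for score, name in sorted(scores)[:50]:
    print(f"{score:10.1f}  {name}")

The cutoff and the length of the suggestion list are exactly the thresholds that are hard to fix in advance; inspecting the ranked list is usually how they get tuned.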