I am building an image classification in caffe. All images are fixed size and I know their nature. Now I want to guide Neural Network to look closer to the specific area, as I know that this area carries a lot of information. And in the end, I want to combine to layer stacks back. This is what I achieved with a custom python layer which crops a specific area for me.
The thing is Python layer even doing its job is very slow. I know about caffe crop layer, but it looks like it doesn't do what I want, as far as I understood it only aligns 1 shape to another, However, I can't specify are direct.
Am I missing something here? It looks like it should be simpler solution here.
The solution is to use Crop layer with DummyData layer as a second bottom layer
Related
Problem i am trying to tackle is very simple, however, i am stuck on thinking how to do that. I have extracted masks using segmentation algorithms for pipe visible openings (not all the openings are clear, most results in partial circles). I want to determine how much percentage the mask is of overall pipe opening.
I am looking for some generic solution since images i am dealing with are of different scales and sizes. An idea i have in mind is to somehow fit the given mask in the circular shape (draw circle from given partial circular shape), however, unable to find any support.
Any idea, how i can tackle this problem. Thanks in advance.
Image 1 is the ideal image where the openings are clear. I can extract the masks as separate and want to determine how much part the mask is of the circle. In this case ideally it should say 100% which means all clear.
Image 2 on the other hand is the partial blocked opening. This is where i want to fit those masks as circle and determine how much part of circle that part is, in this case less than 50%, which means 50% blocked. Hope it clears the situation
An Ideal image with Mask
Image with partial masks
I got an project needs to detect person in anime-like style vedios
I just tested YOLOv3 608x608 with COCO in GTX 1050TI
however speed is only at about ~1.5FPS , but I need at least 10 FPS on 1050TI for my project
1.I want to know that does the number of the classes will effect detection speed? (I assume COCO is about finding 80 kinds object in picture? if I just need find one kind of object, will it go 80x faster?)
2.when I input image for training ,original image are 1920*1080, should I resize them to 608x608 before labeling and training?
3.is there any labeling tool should I use? in README.md at https://github.com/AlexeyAB/darknet <x> <y> <width> <height> seems need to be calculate and input by hand which seems too hard, maybe there is a tool I just need to crop where the object is in image?
4.if the object is not a square in image, how does YOLO know which part are object? How to avoid it train background as object?
do I have to remove all background and fill it as black, only keep the object in image?
5.is the output always a box? can I train and get output as mask? if I detect as mask, will it slower then box because it seems to be more information?
6.to get a good result, how many training image and test image should I make?
I know it's just some noob question in CV, however I really want to know this without spending weeks in training and find out answer myself , an answer will be appreciated!
3.
https://en.wikipedia.org/wiki/List_of_manual_image_annotation_tools
You should be able to get output of corners coordinates by using some image annotation tool.
4.
With enough images with different background for training, supposedly the model should be able to ignore background. A black background is still a background. I guess that's a kind of data augmentation, so it might help reduce overfitting.
5.
If it does not support mask out-of-the-box, maybe you want to do background-subtraction as an extra step to process the output.
1) In my opinion, GTX 1050Ti is not enough to test YOLO v3. Because, the model size (i.e. the number of layers) of the YOLO v3 becomes extremely large compared with the previous versions. The number of classes will be not matter in this case. If you want fast test computing speed, you should upgrade your GPU like 1070Ti.
2) Whatever the size of input images, it will be resized into the pre-defined size, which is depicted as cfg file, by force, so you don't need to resize the input image.
1) I think it may affect the speed a bit in because as you use less classes you get less convolutional filters before each YOLO layer (you set it up in the .cfg file), but it's not likely gonna be an 80x speed up
2) Maybe? I mean, YOLO's gonna resize them when training and then testing, so maybe if you really want to you could, but high res images usually work better, in my experience.
3)I like the OpenLabelling (you can just Google it and it's on GitHub)
4) You may wanna give YOLO negative images that have nothing in them to prevent them picking up on a background, where there's nothing there
5)YOLO doesn't do masks
6)About 1k per class is what probably will work, you can get by with 500 but the rule of thumb is that the more, the better)
If you're interested, I've put out the whole series on YOLO on YouTube, so you may wanna check it out: https://youtu.be/TP67icLSt1Y
I have an image that is both pretty noisy, small (the relevant portion is 381 × 314) and the features are very subtle.
The source image and the cropped relevant area are here as well: http://imgur.com/a/O8Zc2
The task is to count the number of white-ish dots within the relevant area using Python but I would be happy with just isolating the lighter dots and lines within the area and removing the background structure (in this case the cell).
With OpenCV I've tried Histogram equalization (destroys the details), finding contours (didn't work), using color ranges (too close in color?)
Any suggestions or guidance on other things to try? I don't believe I can get a higher res image so is this task possible with the rather difficult source?
(This is not a Python answer, since I never used the Python/OpenCV binding. The images below were created using Mathematica. But I just used basic image processing functions, so you should be able to implement that in Python on your own.)
A very general "trick" in image processing is to think about removing the thing you're looking for, instead of actually looking for it. Because often, removing it is much easier than finding it. You could for instance apply a morphological opening, median filter or a gaussian filter to it:
These filters effectively remove details smaller than the filter size, and leave the coarser structures more or less untouched. So you can just take the difference from the original image and look for local maxima:
(You'll have to play around with different "detail removal filters" and filter sizes. There's no way to tell which one works best with just one image.)
I am working on images that are partially blur on some sections. These are noises that should be taken care of, but here is the problem:
Are there methods to detect whether an image is blur or partially blur at some sections of an image? For instance, take a look at sample image below:
You can see in the image that there are 3 sections that are visually blur: bottom-left, near center region and top-right. Now, is it possible to detect that any portion of an image is blur programming-wise or mathematically?
As lain_b pointed out, with an image like this you can use an edge detector and look for an absence of edges. I tried it on your image and it seems to work pretty well. First I used the kernel
[0,1,0,
1,-4,1,
0,1,0]
Which is a simple edge detector. Its result was
Then I used a threshold to get
Then I closed the image and opened it to get
This is obviously not a finished version, the top right portion did not recognize well at all. Perhaps you could improve it by blurring before performing thresholding, or by choosing better values for the threshold and the radii of the opening and closing operations. A lot of the decisions you will need to make depend on the constraints you can put on your problem. I think this technique will work for you though.
Edit
If you are looking for blur detection of arbitrary images you are going to have to investigate a wide variety of techniques. Things are much easier if you can make assumptions about your set of input images. Without any assumptions I don't know what will work best for you. Here is some reading on the topic
Image Blur Metrics
Reserach paper on using the Harr wavelet transform
Similar SO Question and look at the question that question links to
Blur detection is a very active research field, there is no one answer. You will just need to try all the methods you can find (these were found by googling detect blur in image).
This paper may be of some help. It does blur estimation (mostly for out of focus, but I think it also does blur) to recreate a similarly blurred object in the image.
I think you should be able to use it to detect the blurred areas, and how blurred they are. It should be especially relevent to your problem as it is designed to work with real-world images.
I have a random 2D image. I would like to be able to present the image in 3D. This doesn't have to be very detailed, even if the image were arbitrarily broken into layers like a pop-up cutout from a children's book.
The goal would be that a given image would look normal when directly viewed but that if a viewer were to move/tilt left, right, up, down there would be a 3d effect.
This is similar but not exactly the same as this question here:
How to create 3D streoscopic images using MATLAB with image tool?
This is complete over-kill:
http://make3d.cs.cornell.edu/
And this is probably on the right track:
http://www.imagemagick.org/Usage/distorts/#perspective
My ideal implementation would be a automated PHP script with ImageMagick take is fed an image and spits out as a result either (in order of preference):
Images representing each layer, from
nearest to deepest (closer to the
childs pop-up book layer analogy)
5 images representing the said views
(direct, left, right, top, bottom)
Has this been done (either of the above ideal implementations), or does anyone know how to do all, or part, of this?
As far as the first part of your question is concerned, it sounds like your ideal implementation is http://make3d.cs.cornell.edu/, except that:
you want it simpler (return images from a fixed set of angles as opposed to a walkthrough)
you want it with imagemagick and PHP
I think that last restriction is unrealistic because there's a fair amount of maths and computer vision behind this kind of problem. Imagemagick will help you with lower level-image processing tasks like affine transforms, but it doesn't really provide the required higher-level computer vision functionality like 3D image reconstruction.
So my advice would be to try and work around that restriction somehow. If you implement the approach using more suitable tools (like C++ and OpenCV, for example, or Matlab, as the Make3D guys did), then you can wrap that in a CGI application so your PHP scripts can access it. Cornell (the authors of Make3D) had a similar thing going a while back, but it looks like they're not doing it any more.
For the second part of your question, the theory behind what you want to do has been fairly well-researched. See here for a list of depth estimation papers. Here is what things look like in source.