How to draw bounding boxes for predicted classes for a custom trained image classifier using fastai - bounding-box

I have created and guns and knives image classifier using fastai model resnet34. I am able to predict the class in an image but I want to extend it for real time analysis of a video footage. I want to get bounding boxes around the guns and knives but am not able to understand how I can get this. Can somebody please guide me through.

So I would do it in two steps. First, learn how to get bounding boxes in fastai. Example here.
If you have that you can use opencv to load a video frame by frame and label it. OpenCV example.
When you have both it's quite easy to do it in real time. Just label frames loaded from your webcam. This process is quite expensive so you probably want to lower image resolution before inference. I've done it in tensorflow and it worked quite nicely. This repository was extremely useful

Related

training custom object in YOLOv3, how does it work?

I got an project needs to detect person in anime-like style vedios
I just tested YOLOv3 608x608 with COCO in GTX 1050TI
however speed is only at about ~1.5FPS , but I need at least 10 FPS on 1050TI for my project
1.I want to know that does the number of the classes will effect detection speed? (I assume COCO is about finding 80 kinds object in picture? if I just need find one kind of object, will it go 80x faster?)
2.when I input image for training ,original image are 1920*1080, should I resize them to 608x608 before labeling and training?
3.is there any labeling tool should I use? in README.md at https://github.com/AlexeyAB/darknet <x> <y> <width> <height> seems need to be calculate and input by hand which seems too hard, maybe there is a tool I just need to crop where the object is in image?
4.if the object is not a square in image, how does YOLO know which part are object? How to avoid it train background as object?
do I have to remove all background and fill it as black, only keep the object in image?
5.is the output always a box? can I train and get output as mask? if I detect as mask, will it slower then box because it seems to be more information?
6.to get a good result, how many training image and test image should I make?
I know it's just some noob question in CV, however I really want to know this without spending weeks in training and find out answer myself , an answer will be appreciated!
3.
https://en.wikipedia.org/wiki/List_of_manual_image_annotation_tools
You should be able to get output of corners coordinates by using some image annotation tool.
4.
With enough images with different background for training, supposedly the model should be able to ignore background. A black background is still a background. I guess that's a kind of data augmentation, so it might help reduce overfitting.
5.
If it does not support mask out-of-the-box, maybe you want to do background-subtraction as an extra step to process the output.
1) In my opinion, GTX 1050Ti is not enough to test YOLO v3. Because, the model size (i.e. the number of layers) of the YOLO v3 becomes extremely large compared with the previous versions. The number of classes will be not matter in this case. If you want fast test computing speed, you should upgrade your GPU like 1070Ti.
2) Whatever the size of input images, it will be resized into the pre-defined size, which is depicted as cfg file, by force, so you don't need to resize the input image.
1) I think it may affect the speed a bit in because as you use less classes you get less convolutional filters before each YOLO layer (you set it up in the .cfg file), but it's not likely gonna be an 80x speed up
2) Maybe? I mean, YOLO's gonna resize them when training and then testing, so maybe if you really want to you could, but high res images usually work better, in my experience.
3)I like the OpenLabelling (you can just Google it and it's on GitHub)
4) You may wanna give YOLO negative images that have nothing in them to prevent them picking up on a background, where there's nothing there
5)YOLO doesn't do masks
6)About 1k per class is what probably will work, you can get by with 500 but the rule of thumb is that the more, the better)
If you're interested, I've put out the whole series on YOLO on YouTube, so you may wanna check it out: https://youtu.be/TP67icLSt1Y

Cropping faces from an image

I have a collection of face images, with 1 or sometimes 2 faces in each image. What I wanna do, is find the face in each image and then crop It.
I've tested a couple of methods, which are implemented in python using openCV, but the results weren't that good. These methods are:
1- Implementation 1
2- Implementation 2
There's one more model that I've tested, but I'm not allowed to post more than two links.
The problem is that these Haar-Feature based algorithms, are not robust to face size, and when I tried them on images which were taken close to the face, they couldn't find any faces.
Someone mentioned to try deep learning based algorithms, but I couldn't find one corresponding to what I want to do. Basically, I guess I need a pre-trained model, which can give me the coordinates of the face bounding box in the image, or better, a pre-trained model which gives out the cropped face image as output.
You don't need machine learning algorithms, Graph-Algorithms is enough. For example Snapchats face recognition algorithm works as follows:
Create a Graph with Nodes and Edges from a most common Face ("Standard Face").
Deform that Graph / Recoordinate the Nodes to the fitted pixels in the Input Image
voila you got the face recognized in the Input Image.
Easy said, but harder to code. We implemented in our university the Dijkstra Algorithm for example and I can hand you my "Graph" Class if you need it. But I wrote it in C++.
With these graph-algorithm you can crop out the faces more efficient.

Image recognition and Uniqueness detection

I am new to AI/ML and am trying to use the same for solving the following problem.
I have a set of (custom) images which while having common characteristics also will have a unique pattern/signature and color value. What set of algorithms should I use to have the pass in following manner:
1. Recognize the common characteristic (like presence of a triangle at any position in a 10x10mm image). If present, proceed, else exit.
2. Identify the unique pattern/signature to identify each image individually. The pattern/signature could be shape (visible to human eye or hidden like using an overlay shape using background image with no boundaries).
3. Store color tone/hue/saturation to determine any loss/difference (maybe because the capture source is different from the original one).
While this is in way similar to face recognition algo, for me saturation/shadow will matter while being direction independent.
I figure that using CNN may be the way to go for step#2 and SVN for step#1, any input on training, specifics will be appreciated. What about step#3, use BGR2HSV? The objective is to use ML/AI and not get into machine-vision.
Recognize the common characteristic (like presence of a triangle at any position in a 10x10mm image). If present, proceed, else exit.
In a sense, what you want is a classifier that can detect patterns in an image. However, we can train classifiers to detect certain types of patterns in images.
For example, I can train a classifier to recognise squares and circles, but if I show it an image with a triangle in it, I cannot expect it to tell me it is a triangle, because it has never seen it before. The downside is, your classifier will end up misclassifying it as one of the shapes it knows to exist: either square or circle. The upside is, you can prevent this.
Identify the unique pattern/signature to identify each image individually.
What you want to do is train a classifier on a large amount of labelled data. If you want the classifier to detect squares, circles, or triangles in an image, you must train it with a large amount of labelled images of squares, circles and triangles.
Store color tone/hue/saturation to determine any loss/difference (maybe because the capture source is different from the original one).
Now, you are leaving the territory of simple image labelling and entering the world of computer vision. This is not as simple as a vanilla image classifier, but it is possible and there are a lot of online tools to help you do this. For example, you may take a look at OpenCV. They have an implementation in python and C++.
I figure that using CNN may be the way to go for step#2 and SVN for
step#1
You can combine step 1 and step 2 with a Convolutional Neural Network (CNN). You do not need to use a two step prediction process. However, beware, if you pass the CNN an image of a car, it will still label it as a shape. You can, again circumvent this by training it on a million positive samples of shapes, and a million negative samples of random other images with the class "Other". This way, anything that is not a shape will get classified into "Other". This is one possibility.
What about step#3, use BGR2HSV? The objective is to use ML/AI and not
get into machine-vision.
With the inclusion of this step, there is no option but to get into computer vision. I am not exactly sure how to go about this, but I can guarantee OpenCV will provide you a way to do this. In fact, with OpenCV, you will no longer need to implement your own CNN, because OpenCV has its own image labelling libraries.

Processing 20x20 image before training for machine learning

I have 10,000 examples 20x20 png image (binary image) about triangle. My mission is build program, which predict new image is whether triangle. I think I should convert these image to 400 features example, but I don't know how convert fastest.
Can you show me the way?
Here are a image .
Your question is too broad as you dont specify which technologies you are using , but in general you need to create a vector from an array , that depends on your tools , for example if you use python(and the numpy library) you could use flatten().
image_array.flatten();
If you want to do it manually you just need to move every row to a single row.
The previous answer is correct. Yet I want to add something to it:
The example image that you provided is noisy. This is rather problematic as you are working with only binary images. Therefore I want to suggest preprocessing, such as gaussian filter or edge detection. Denoising will improve your clustering algorithms accuracy stronlgy (to my knowledge).
One important question:
What are the other pictures showing? Do you have to seperate triangles from circles? You will get much better answers if you provide more information.
Anyhow, my key message is: Preprocessing is vital for image-processing.

OCR detection with openCV

I'm trying to create a simpler OCR enginge by using openCV. I have this image: https://dl.dropbox.com/u/63179/opencv/test-image.png
I have saved all possible characters as images and trying to detect this images in input image.
From here I need to identify the code. I have been trying matchTemplate and FAST detection. Both seem to fail (or more likely: I'm doing something wrong).
When I used the matchTemplate method I found the edges of both the input image and the reference images using Sobel. This provide a working result but the accuracy is not good enough.
When using the FAST method it seems like I cant get any interresting descriptions from the cvExtractSURF method.
Any recomendations on the best way to be able to read this kind of code?
UPDATE 1 (2012-03-20)
I have had some progress. I'm trying to find the bounding rects of the characters but the matrix font is killing me. See the samples below:
My font: https://dl.dropbox.com/u/63179/opencv/IMG_0873.PNG
My font filled in: https://dl.dropbox.com/u/63179/opencv/IMG_0875.PNG
Other font: https://dl.dropbox.com/u/63179/opencv/IMG_0874.PNG
As seen in the samples I find the bounding rects for a less complex font and if I can fill in the space between the dots in my font it also works. Is there a way to achieve this with opencv? If I can find the bounding box of each character it would be much more simple to recognize the character.
Any ideas?
Update 2 (2013-03-21)
Ok, I had some luck with finding the bounding boxes. See image:
https://dl.dropbox.com/u/63179/opencv/IMG_0891.PNG
I'm not sure where to go from here. I tried to use matchTemplate template but I guess that is not a good option in this case? I guess that is better when searching for the exact match in a bigger picture?
I tried to use surf but when I try to extract the descriptors with cvExtractSURF for each bounding box I get 0 descriptors... Any ideas?
What method would be most appropriate to use to be able to match the bounding box against a reference image?
You're going the hard way with FASt+SURF, because they were not designed for this task.
In particular, FAST detects corner-like features that are ubiquituous iin structure-from-motion but far less present in OCR.
Two suggestions:
maybe build a feature vector from the number and locations of FAST keypoints, I think that oyu can rapidly check if these features are dsicriminant enough, and if yes train a classifier from that
(the one I would choose myself) partition your image samples into smaller squares. Compute only the decsriptor of SURF for each square and concatenate all of them to form the feature vector for a given sample. Then train a classifier with these feature vectors.
Note that option 2 works with any descriptor that you can find in OpenCV (SIFT, SURF, FREAK...).
Answer to update 1
Here is a little trick that senior people taught me when I started.
On your image with the dots, you can project your binarized data to the horizontal and vertical axes.
By searching for holes (disconnections) in the projected patterns, you are likely to recover almost all the boudnig boxes in your example.
Answer to update 2
At this point, you're back the my initial answer: SURF will be of no good here.
Instead, a standard way is to binarize each bounding box (to 0 - 1 depending on background/letter), normalize the bounding boxes to a standard size, and train a classifier from here.
There are several tutorials and blog posts on the web about how to do digit recognition using neural networks or SVM's, you just have to replace digits by your letters.
Your work is almost done! Training and using a classifier is tedious but straightforward.

Resources