How do I count the number of cars in an image from a 4K aerial camera attached to a DJI drone at 400 feet? The site will not allow me to attach an image to my post.
You can use the TensorFlow Object Detection API. You will find pretrained models for datasets such as COCO. They should certainly be able to recognize cars, but I don't know whether that works for aerial pictures where you only see the tops of the cars. You can always build your own dataset and fine-tune a pretrained model.
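If you go the pretrained-model route, counting the cars is just filtering the detector's output by class and score. A minimal sketch, assuming the COCO label map where class id 3 is "car" and an arbitrary 0.5 score threshold:

```python
CAR_CLASS_ID = 3  # "car" in the COCO label map used by the pretrained models

def count_cars(classes, scores, threshold=0.5):
    """Count detections labelled as cars above a confidence threshold.

    `classes` and `scores` are parallel sequences as returned by the
    TensorFlow Object Detection API for a single image.
    """
    return sum(1 for c, s in zip(classes, scores)
               if c == CAR_CLASS_ID and s >= threshold)
```

Whether detections at this altitude are reliable enough is exactly the open question above; you may need to lower the threshold or fine-tune first.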
I have a collection of face images, with one or sometimes two faces in each image. What I want to do is find the face in each image and then crop it.
I've tested a couple of methods implemented in Python using OpenCV, but the results weren't that good. These methods are:
1- Implementation 1
2- Implementation 2
There's one more model that I've tested, but I'm not allowed to post more than two links.
The problem is that these Haar-feature-based algorithms are not robust to face size; when I tried them on images taken close to the face, they couldn't find any faces.
Someone suggested trying deep-learning-based algorithms, but I couldn't find one that matches what I want to do. Basically, I think I need a pre-trained model that gives me the coordinates of the face's bounding box in the image, or better, one that outputs the cropped face image directly.
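Whichever detector ends up being used, the final cropping step is just array slicing on the returned bounding boxes; a sketch assuming OpenCV's (x, y, w, h) box convention:

```python
import numpy as np

def crop_faces(img, boxes):
    """Crop each (x, y, w, h) box out of an image array.

    `boxes` uses OpenCV's convention, e.g. what
    cv2.CascadeClassifier.detectMultiScale returns; the output of a
    DNN-based face detector can be converted to the same format.
    """
    return [img[y:y + h, x:x + w] for (x, y, w, h) in boxes]
```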
You don't necessarily need machine learning algorithms; graph algorithms can be enough. For example, Snapchat's face recognition algorithm works roughly as follows:
Create a graph with nodes and edges from the most common face (a "standard face").
Deform that graph, i.e. move the nodes, to fit the matching pixels in the input image.
Voilà, you have located the face in the input image.
Easily said, but harder to code. At university we implemented Dijkstra's algorithm, for example, and I can hand you my "Graph" class if you need it, though I wrote it in C++.
With these graph algorithms you can crop out the faces more efficiently.
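Heavily simplified, the "deform the standard graph" step above amounts to snapping template nodes onto candidate image points. A toy sketch; nearest-neighbour snapping stands in for the real energy-based graph fitting, which is an assumption on my part:

```python
def snap_template(template, candidates):
    """Toy version of deforming a 'standard face' graph: move each
    template node to its nearest candidate point in the image.

    `template` and `candidates` are lists of (x, y) coordinates.
    """
    def nearest(p):
        return min(candidates,
                   key=lambda c: (c[0] - p[0]) ** 2 + (c[1] - p[1]) ** 2)
    return [nearest(p) for p in template]
```

A real implementation would also penalise edge-length changes so the graph keeps its face-like shape while deforming.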
I'm trying to train a model to detect basic shapes like circles, squares, rectangles, etc. using TensorFlow. What would be the best input data set: loading the shapes directly, or finding the edges of the image with OpenCV and loading only the edge image?
We can detect shapes using OpenCV too, so what would be the added advantage of using machine learning?
Sample images given for training the model.
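For the OpenCV baseline mentioned above: shape detection there usually reduces to classifying a contour's polygon approximation (cv2.findContours followed by cv2.approxPolyDP and cv2.boundingRect). The decision rule itself is plain Python; the vertex counts and aspect-ratio tolerance below are assumptions:

```python
def classify_shape(num_vertices, width, height):
    """Heuristic shape label from a contour's polygon approximation.

    `num_vertices` is the length of the cv2.approxPolyDP result and
    `width`/`height` come from cv2.boundingRect on the same contour.
    """
    if num_vertices == 3:
        return "triangle"
    if num_vertices == 4:
        ratio = width / float(height)
        # near-unit aspect ratio: treat as a square
        return "square" if 0.95 <= ratio <= 1.05 else "rectangle"
    if num_vertices > 6:
        return "circle"  # many vertices approximate a curve
    return "other"
```

The machine-learning route pays off when shapes are noisy, overlapping, or photographed rather than drawn, where these hand-tuned thresholds break down.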
I would recommend starting with this guide for doing classification, not object detection:
https://kiosk-dot-codelabs-site.appspot.com/codelabs/tensorflow-for-poets/#0
Classification assigns one unique tag to one picture (99% square, 1% circle). Object detection classifies several objects within the picture (x_min=3, y_min=8, x_max=20, y_max=30, 99% square). Your case looks more like a classification problem.
You don't need the full Docker installation as in the guide.
If you have Python 3.6 on your system, you can just do:
pip install tensorflow
And then jump to "4. Retrieving the images"
I had to try it out myself, so I downloaded the first 100 pictures of squares and circles from Google with the add-on "fatkun batch download image" from Chrome Web Store.
On my first 10 tests I got accuracies between 99.2% (0.992...) and 99.58%. If your examples are more uniform than a grab bag of different pictures from Google, you will probably get better results.
You may want to check out object detection in TensorFlow.
https://github.com/tensorflow/models/tree/master/research/object_detection
There is a pre-trained model here
http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_11_06_2017.tar.gz
One potential advantage of using neural nets for detection is that they can reduce the CPU cycles needed, which is useful on mobile devices.
For example, the Hough transform (https://en.wikipedia.org/wiki/Hough_transform) is too expensive to compute in real time, but if a convolutional neural net is used instead, more possibilities open up for real-time image processing.
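To make the cost argument concrete, here is a toy Hough transform for lines: every edge point votes into every angle bin of the accumulator, which is exactly where the expense comes from (the bin resolution here is arbitrary):

```python
import numpy as np

def hough_lines(points, img_shape, n_theta=180):
    """Minimal Hough transform for lines.

    Each edge point (x, y) votes into every theta bin of a
    (rho, theta) accumulator; peaks in the accumulator are lines.
    Cost is O(len(points) * n_theta), hence the expense on big images.
    """
    h, w = img_shape
    diag = int(np.ceil(np.hypot(h, w)))          # max possible |rho|
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((2 * diag, n_theta), dtype=np.int32)
    for x, y in points:
        rhos = (x * np.cos(thetas) + y * np.sin(thetas)).astype(int) + diag
        acc[rhos, np.arange(n_theta)] += 1
    return acc, thetas, diag
```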
To actually train a new model, see https://www.tensorflow.org/tutorials/deep_cnn
I'm trying to use the HOG detector in OpenCV to detect three types of objects in a video feed through a fisheye lens. The types are:
People
Books (when held by some person)
Chairs
A snapshot of the video looks like this image from this website:
I set up the HOG classifier using the default people detector and first tried to detect the people. I noticed that when the people were the size you would expect from a non-fisheye lens (something you would get with a standard 35mm lens), they were detected; otherwise they were not. This seems logical, as the classifier expects people of a standard size.
I was wondering how I could modify the classifier to detect people through a fisheye lens. The options I see are these:
Undistort the fisheye effect and run the classifier. I'd rather not do this, because currently I'm not in a position to calibrate the camera and obtain the distortion coefficients.
Distort images from a people image data set to approximate the distortion in my video, and retrain the classifier. I think this would work, but I'd like to understand whether it would work the way I expect.
My question is:
What would be a valid approach for this problem? Will option #2 work for all three types of objects (people, books and chairs)?
What is a good classifier that can be trained to identify the three types of objects (cascade, HOG, or anything else; please suggest a library as well)? Will my #2 method of distorting and training with positive and negative examples be a good solution?
Retraining the HOG detector to the performance level of the one included with OpenCV would be a fairly involved process, and you would also have to simulate the distortion of your specific lens to modify the training data.
For the quickest solution I would recommend your first option of undistorting the image. If you are willing to put in the time and resources to retrain the classifier (which you may have to do anyway, depending on how you detect chairs and books), there are some publicly available pedestrian datasets that will be useful:
1) http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/
2) http://pascal.inrialpes.fr/data/human/
It's unlikely that you'll find an existing chair cascade, given the variability in chair design; I would recommend training your own cascade on the specific chairs you intend to detect. I don't know of any existing cascade for books either, and a quick Google search didn't yield promising results. If you intend to train your own cascade for books, ImageNet is a good source of data.
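On simulating the lens to distort the training data: a one-parameter radial model is a common starting point. The sketch below works on normalised, centre-origin coordinates, and the k1 value is made up; a real fisheye needs calibrated coefficients:

```python
def radial_distort(x, y, k1=-0.3):
    """One-parameter radial distortion on normalised image coordinates
    (origin at the image centre). Negative k1 gives barrel/fisheye-like
    compression towards the edges; k1 = -0.3 is an arbitrary strength.
    """
    r2 = x * x + y * y          # squared distance from the centre
    f = 1.0 + k1 * r2           # radial scaling factor
    return x * f, y * f
```

Applying this mapping per pixel (e.g. via a remap lookup table) to each training image would approximate option #2's "distort the dataset" step.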
I am working on an image classification problem where I should be able to classify an image as, say, a watch with a rectangular dial, a watch with a circular dial, a shoe, etc.
I have looked into Content Based Image Retrieval (using Dense SIFT for feature detection and Bag of Words + SVM for classification) and am currently exploring Convolutional Neural Networks (Unsupervised Feature Learning).
My problem is that the image is a photo taken from a camera and hence contains other elements (not there in training data). For example, my training data for watches with rectangular dials contains only the watch whereas my test image has the watch and a portion of the hand as well or my test image of a shoe has the shoe oriented in a different direction (when compared with the training data for shoes).
How do I address this issue?
Is CNN (Unsupervised Feature Learning) the correct approach or should I stick to D-SIFT + BOW + SVM?
How do I collect appropriate training data?
Thank you.
I am doing a project in computer vision and I need some help.
The objective of my project is to extract the attributes of any object - for example if I have a Nike running shoe, I should be able to figure out that it is a shoe in the first place, then figure out that it is a Nike shoe and not an Adidas shoe (possibly because of the Nike tick) and then figure out that it is a running shoe and not football studs.
I have started off by treating this as an image classification problem and I am using the following steps:
I have taken training samples (around 60 each) of shoes, heels, and watches, and extracted their features using dense SIFT.
Creating a vocabulary using k-means clustering (arbitrarily chosen the vocabulary size to be 600).
Creating a Bag-Of-Words representation for the images.
Training an SVM classifier to obtain a bag-of-words (feature vector) for every class (shoe,heel,watch).
For testing, I extracted the feature vector for the test image and found its bag-of-words representation from the already created vocabulary.
I compared the bag-of-words of the test image with that of each class and returned the class which matched closest.
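For reference, the quantisation at the heart of steps 2–3 and the test-time lookup can be sketched like this, assuming the descriptors come from dense SIFT and the vocabulary from k-means:

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Quantise local descriptors (e.g. dense SIFT) against a k-means
    vocabulary and return a normalised bag-of-words histogram.

    `descriptors` is (n, d); `vocabulary` is (k, d) cluster centres.
    """
    # distance from every descriptor to every vocabulary word
    d = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = d.argmin(axis=1)                      # nearest word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()
```

These histograms are what the SVM is trained on, one per image, with the class label attached.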
I would like to know how I should proceed from here. Will feature extraction using dense SIFT help me identify the attributes, given that it only represents the gradient around certain points?
Also, sometimes my classification goes wrong: for example, if I have trained the classifier with images of a left shoe and a watch, a right shoe is classified as a watch. I understand that I have to include right shoes in my training set to solve this, but is there any other approach I should follow?
Also, is there any way to capture shape? For example, if I have trained the classifier for watches, and the training set contains watches with both circular and rectangular dials, can I identify the dial shape of a new test image? Or do I simply have to train it separately for watches with circular and rectangular dials?
Thanks