I'm currently doing research that involves identifying food items using image classification techniques. I'm well versed in the theory and mathematics of SVMs, yet I'm completely lost when it comes to implementing them in MATLAB.
I would like some guidance on the steps needed to perform full image classification of food. I believe it will involve color, texture, shape and size features. Where should I start?
Thank you very much
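The feature-extraction step is the same in any language, so here is a minimal Python sketch of two of the features mentioned, assuming each image comes with a binary mask marking the food pixels (how that mask is obtained, e.g. by thresholding or segmentation, is left open). It builds a normalised joint RGB colour histogram of the food pixels and appends the food's area as a fraction of the image as a crude size feature; texture and shape descriptors would be appended to the same vector in the same way. The function names are hypothetical. The resulting vectors, one per image, are what you would train the SVM on, for example with fitcsvm or fitcecoc (multiclass) in MATLAB's Statistics and Machine Learning Toolbox, or sklearn.svm.SVC in Python.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalised joint RGB histogram of (r, g, b) tuples with values in 0..255."""
    n_bins = bins_per_channel ** 3
    hist = [0.0] * n_bins
    step = 256 // bins_per_channel  # width of one intensity bin

    def bin_of(v):
        return min(v // step, bins_per_channel - 1)  # clamp so 255 never overflows

    for r, g, b in pixels:
        idx = (bin_of(r) * bins_per_channel + bin_of(g)) * bins_per_channel + bin_of(b)
        hist[idx] += 1
    total = len(pixels)
    return [h / total for h in hist] if total else hist


def food_features(pixels, mask, bins_per_channel=4):
    """Feature vector = colour histogram of food pixels + food area as a fraction of the image.

    pixels: row-major list of (r, g, b) tuples; mask: same length, 1 where the food is.
    """
    food_pixels = [p for p, m in zip(pixels, mask) if m]
    area_fraction = len(food_pixels) / len(mask)  # crude size feature
    return color_histogram(food_pixels, bins_per_channel) + [area_fraction]


# Tiny hypothetical 2x2 image: two red pixels, one blue pixel, one black background pixel.
pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 0, 0)]
mask = [1, 1, 1, 0]
features = food_features(pixels, mask)
print(len(features))  # 4**3 histogram bins + 1 size feature
```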
I am new to deep learning. I was trying to understand the basics of image classification and followed some tutorials on classifying the MNIST data set. I noticed that various standard models use 224x224 as the input image size, and I got stuck on the question of which image sizes can be used for classification. Is it possible to use images as small as 4x4 to perform a classification task in deep learning using an ANN or other techniques? Or is there a lower limit on image dimensions that must be strictly followed? Please guide me on this. Thanks in advance.
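There is no hard minimum size in the maths of a neural network itself: a plain fully connected network simply receives the 16 pixel values of a 4x4 image as 16 inputs, and 224x224 is largely a convention of ImageNet-trained models rather than a requirement. The practical limit for convolutional networks is that every convolution or pooling layer without padding shrinks the feature map. The sketch below (hypothetical helper names; square inputs and kernels assumed) applies the usual output-size formula, floor((n + 2p - k) / s) + 1, layer by layer to check whether an architecture still has a non-empty feature map. Whether 4x4 images carry enough information to separate your classes is a separate question that only experiments can answer.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_sizes(n, layers):
    """Apply (kernel, stride, padding) layers in order and return every intermediate size.

    Raises ValueError if the feature map would shrink to nothing.
    """
    sizes = [n]
    for kernel, stride, padding in layers:
        n = conv_output_size(n, kernel, stride, padding)
        if n < 1:
            raise ValueError(f"input too small: feature map vanished after {len(sizes)} layer(s)")
        sizes.append(n)
    return sizes


print(trace_sizes(4, [(3, 1, 0), (2, 2, 0)]))  # 3x3 conv, then 2x2 pool: [4, 2, 1]
```

Two unpadded 3x3 convolutions in a row, by contrast, would already raise the error for a 4x4 input, whereas "same" padding (padding=1 for a 3x3 kernel) keeps the size unchanged.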
I am completing a college project about robotic perception and the mathematics behind it. I am currently looking into computer vision as a means of robotic perception and image/video feed analysis. After some experimenting with the OpenCV library, I have come across noise reduction filters such as the median, bilateral and Gaussian filters, but I want to know whether applying these filters makes images easier to analyse. For example, if I were to run a Hough line transform (HoughLines) on an image to find lines, would it be useful to reduce noise in the image beforehand? Are there any applications of noise reduction in computer vision? I cannot find anything online and imagine that it may have some uses, but I am not sure.
I am talking about ordinary photographs that a robot might take with a camera and then use for object recognition. For example, if a robot had to distinguish between two types of animals in an image based on certain mathematically specified criteria, would it help to reduce the noise in the image so that the image and the animals are easier to detect? I understand that filtering also blurs the image, so edges can be lost or weakened. If noise filters are not helpful in this type of scenario, can you think of any scenarios where they would help a robot interpret an image more accurately?
I apologise if this is not clear; this area is unfamiliar to me, and I am pushing myself to understand it so that I can include it in my project.
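To see why filtering before analysis can help, here is a minimal pure-Python median filter; in OpenCV you would call cv2.medianBlur, cv2.GaussianBlur or cv2.bilateralFilter instead. The toy image is a hypothetical vertical step edge with one "salt" noise pixel: the 3x3 median removes the isolated spike but keeps the edge perfectly sharp, which is what makes median and bilateral filters attractive before edge-based steps such as Canny followed by HoughLines. A mean or Gaussian filter would instead smear the spike into its neighbours and soften the edge.

```python
def median_filter(img, radius=1):
    """Median filter for a 2-D list of grey values.

    Each output pixel is the median of the (2*radius+1)^2 window around it;
    at the borders the window is clipped to the pixels that exist.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(
                img[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            )
            out[y][x] = window[len(window) // 2]  # upper median for even-sized border windows
    return out


# Hypothetical test image: a vertical step edge (0 | 200) with one salt-noise pixel.
clean = [[0, 0, 0, 200, 200, 200] for _ in range(5)]
noisy = [row[:] for row in clean]
noisy[2][1] = 255

filtered = median_filter(noisy)
print(filtered == clean)  # the spike is gone and the edge is still a sharp 0 -> 200 step
```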
I am interested in the possibility of training a TensorFlow model to modify images, but I'm not quite sure where to get started. Almost all of the examples/tutorials dealing with images are for image classification, but I think I am looking for something a little different.
Image classification training data typically includes the images plus a corresponding set of classification labels, but I am thinking of a case of an image plus a "to-be" version of the image as the "label". Is this possible? Is it really just a classification problem in disguise?
Any help on where to get started would be appreciated. Also, the solution does not have to use TensorFlow, so any suggestions on alternate machine learning libraries would also be appreciated.
For example, let's say we want to train TensorFlow to draw circles around objects in a picture.
Example input image: [image] (source: pbrd.co)
Label/expected output: [image] (source: pbrd.co)
How could I accomplish that?
I can second that; it's really hard to find information about image modification with TensorFlow :( But have a look here: https://affinelayer.com/pix2pix/
From my understanding, you do use a GAN, but instead of feeding the generator random data during training, you feed it a sample input image (a conditional GAN).
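For reference, the pix2pix paper (Isola et al., "Image-to-Image Translation with Conditional Adversarial Networks") trains the generator G on pairs of an input image x and a target image y, combining a conditional GAN loss with an L1 term that keeps the output close to the target, roughly as follows. Here z is noise, which pix2pix supplies only through dropout rather than as an explicit input, D is the discriminator that sees the input together with a real or generated output, and λ weights the L1 term.

```latex
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\left[\log D(x, y)\right]
  + \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x, z))\right)\right]

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\left[\lVert y - G(x, z) \rVert_1\right]

G^{*} = \arg\min_{G} \max_{D} \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathcal{L}_{L1}(G)
```

The L1 term is what ties the output to the "to-be" label image, so the training data is exactly the image plus target-image pairs described in the question.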
Two popular ways (the ones that I know about) to make models generate/edit images are:
Deep Convolutional Generative Adversarial Networks
Back-propagation through a pre-trained image classification model (in a similar manner to DeepDream): you start from the final layer, feed back the wanted label, and apply gradient descent to the image only, keeping the network weights fixed. This is explained in more detail in the CS231n course (this lecture).
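The idea can be illustrated with a toy linear "classifier" whose score for the wanted class is w·x, with an L2 penalty on the image similar to the class-visualisation method covered in CS231n. The weights stay frozen and only the input x is updated (maximising the score, i.e. descending on its negative). In a real deep network the gradient with respect to the image would come from automatic differentiation (for example tf.GradientTape in TensorFlow) rather than the closed form used here; the function names, weights and hyperparameters are illustrative assumptions.

```python
def visualise_class(weights, steps=300, lr=0.5, l2=0.1):
    """Gradient ascent on the input x to maximise s(x) = w.x - l2 * ||x||^2.

    `weights` are the frozen weights of the target class in a linear scorer;
    only x (the "image") is modified, never the weights.
    """
    x = [0.0] * len(weights)  # start from a blank image
    for _ in range(steps):
        # ds/dx = w - 2 * l2 * x  (analytic gradient of the regularised score)
        grad = [w - 2 * l2 * xi for w, xi in zip(weights, x)]
        x = [xi + lr * g for xi, g in zip(x, grad)]  # ascent step on the image only
    return x


def score(weights, x, l2=0.1):
    """Regularised class score of image x."""
    return sum(w * xi for w, xi in zip(weights, x)) - l2 * sum(xi * xi for xi in x)


w_target = [1.0, -2.0, 0.5]  # hypothetical class weights for a 3-pixel "image"
img = visualise_class(w_target)
print(score(w_target, img) > score(w_target, [0.0, 0.0, 0.0]))
```

For this quadratic objective the iteration converges to x = w / (2·l2), the input that best matches the class under the penalty; with a deep network there is no closed form, which is why the iterative version is used.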
But I don't think either of them fits your example of drawing a circle around the "3". I think object detection and instance segmentation would be more helpful: detect the object you are looking for, extract its boundaries via segmentation, and post-process them to draw the circle you want (or any other shape).
Reference for the images: Intro to Deep Learning for Computer Vision
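Once a detector or segmenter has produced a binary mask for the object, the post-processing step can be as simple as the sketch below: take the centroid of the foreground pixels as the centre and the distance to the farthest foreground pixel (plus a small margin) as the radius, then draw the ring. This centroid circle always encloses the mask but is not the minimal enclosing circle; in OpenCV you would typically use cv2.findContours, cv2.minEnclosingCircle and cv2.circle instead. The mask is a hypothetical toy example and the helper names are illustrative.

```python
import math


def enclosing_circle(mask, margin=1.0):
    """Centre = centroid of foreground pixels, radius = farthest foreground pixel + margin."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        return None  # nothing was segmented
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    r = max(math.hypot(x - cx, y - cy) for x, y in pts) + margin
    return cx, cy, r


def draw_circle(img, cx, cy, r, value=255, thickness=0.5):
    """Paint pixels whose distance from (cx, cy) is within `thickness` of r (in place)."""
    for y, row in enumerate(img):
        for x in range(len(row)):
            if abs(math.hypot(x - cx, y - cy) - r) <= thickness:
                row[x] = value
    return img


# Hypothetical 7x7 segmentation mask with a 3x3 object in the middle.
mask = [[1 if 2 <= x <= 4 and 2 <= y <= 4 else 0 for x in range(7)] for y in range(7)]
cx, cy, r = enclosing_circle(mask)
canvas = draw_circle([[0] * 7 for _ in range(7)], cx, cy, r)
```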
I am doing research in the field of computer vision and am working on the problem of finding images that are visually similar to a query image. For example, finding t-shirts of similar colour with similar patterns (striped/checkered), or shoes of similar colour and shape, and so on.
I have explored hand-crafted image features such as color histograms, texture features, shape features (Histogram of Oriented Gradients), SIFT and so on. I have also read the literature on deep neural networks (convolutional neural networks), which are trained on massive amounts of data and are currently the state of the art in image classification.
I was wondering whether the same features (extracted from CNNs) can also be used for my project: finding fine-grained similarities between images. From what I understand, CNNs learn good representative features that help classify images; for example, whether it is a red shirt, a blue shirt or an orange shirt, the network can identify that the image shows a shirt. However, it doesn't understand that an orange shirt looks more similar to a red shirt than a blue shirt does, and hence it cannot capture these similarities.
Please correct me if I am wrong. I would like to know if there are any Deep Neural Networks that capture these similarities, and have proven to be superior to the hand-crafted features. Thanks in advance.
For your task, a CNN is definitely worth a try!
Many researchers have used networks pretrained for image classification and obtained state-of-the-art results on fine-grained classification, for example classifying bird species or car models.
Now, your task is not classification, but it is related. You can think of similarity as a geometric distance between features, which are basically vectors. So you could run some experiments computing the distance between the feature vector extracted from the query image and the feature vectors of all your training images (the reference set).
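A minimal sketch of that experiment, assuming the feature vectors have already been extracted from some CNN layer. The three-dimensional vectors and image names below are made up for illustration; real CNN features have hundreds or thousands of dimensions. Reference images are ranked by their distance to the query; cosine distance is a common choice for CNN features, and Euclidean distance is included for comparison.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def euclidean_distance(a, b):
    return math.dist(a, b)


def rank_by_similarity(query, references, distance=cosine_distance):
    """Return reference names sorted from most to least similar to the query vector."""
    return sorted(references, key=lambda name: distance(query, references[name]))


# Made-up feature vectors standing in for CNN activations of reference images.
references = {
    "blue_shirt.jpg": [0.0, 0.1, 1.0],
    "orange_shirt.jpg": [0.7, 0.6, 0.0],
    "red_shirt.jpg": [0.9, 0.1, 0.0],
}
query = [1.0, 0.0, 0.0]  # features of the query image
print(rank_by_similarity(query, references))
```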
Features extracted from the first layers of the network should relate more to color and other low-level visual traits than to more "semantic" ones.
Alternatively, there is some work on learning a similarity metric directly with CNNs; see here for example.
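One widely used way of learning a similarity metric directly is a siamese network: two copies of the same CNN with shared weights embed a pair of images, and a contrastive loss pulls similar pairs together while pushing dissimilar pairs at least a margin apart. Below is the per-pair loss in one common form; it is not necessarily the formulation used in the work linked above, and the embeddings in the usage are made-up numbers rather than network outputs.

```python
import math


def contrastive_loss(emb_a, emb_b, similar, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    similar=True : loss = d^2, pulling the pair together.
    similar=False: loss = max(0, margin - d)^2, pushing the pair apart until d >= margin.
    """
    d = math.dist(emb_a, emb_b)  # Euclidean distance between the embeddings (Python 3.8+)
    if similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


def batch_loss(pairs, margin=1.0):
    """Mean loss over (emb_a, emb_b, similar) triples; training minimises this w.r.t. the CNN weights."""
    return sum(contrastive_loss(a, b, s, margin) for a, b, s in pairs) / len(pairs)


# Made-up embeddings: a similar pair that is still far apart, and a dissimilar pair that is too close.
pairs = [([0.0, 0.0], [3.0, 4.0], True), ([0.0, 0.0], [0.3, 0.4], False)]
print(batch_loss(pairs))
```

After training, the distance between two embeddings is itself the learned similarity measure, so orange and red shirts can end up closer than orange and blue ones if the training pairs say so.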
This is a little late, but it may still be useful for other people. Yes, CNNs can be used for image similarity; I have used them for this before. As Flavio pointed out, for a simple start you can use a pre-trained CNN of your choice, such as AlexNet or GoogLeNet, as a feature extractor. You can then compare the features by distance: similar pictures will have a smaller distance between their feature vectors.
I am developing a gesture recognition project. My goal is for the webcam to capture my gestures and match them against the existing gestures in my database. I have been able to capture hand gestures and store them in my project folder. Now, how exactly do I compare them? I am clueless about this part. I have gone through many YouTube videos; most of them just show how it works, and none of them explains which algorithm was used. I am completely stuck, and all I want is some ideas or a link that can help me understand this matching part. Thanks
There are many different approaches that you can follow here.
If your images are of good quality, you could detect feature points in your input image and then match them against a "prior/template" representation of a similar gesture. This would be a brute-force search. You can use SIFT to detect keypoints and compute descriptors for each image, and then match them with BFMatcher or the FLANN-based matcher. All of these are implemented in OpenCV; just read the documentation.
Docs here: detect/match
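What BFMatcher does, plus the ratio test usually applied on top of knnMatch with k=2, fits in a few lines. In recent OpenCV versions you would get descriptors from cv2.SIFT_create().detectAndCompute and match them with cv2.BFMatcher().knnMatch(des1, des2, k=2); the pure-Python sketch below shows the same idea on made-up 2-D "descriptors" (real SIFT descriptors are 128-dimensional). Scoring each stored gesture by the fraction of query descriptors that find a good match, and picking the best-scoring template, is one simple (hypothetical) way to turn this into a recogniser.

```python
import math


def match_descriptors(query, train, ratio=0.75):
    """Brute-force nearest-neighbour matching with Lowe's ratio test.

    Returns (query_index, train_index) pairs whose best match is clearly
    closer than the second-best one.
    """
    matches = []
    for qi, q in enumerate(query):
        dists = sorted((math.dist(q, t), ti) for ti, t in enumerate(train))
        if len(dists) < 2:
            continue  # the ratio test needs two neighbours
        (d1, t1), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((qi, t1))
    return matches


def gesture_score(query, template, ratio=0.75):
    """Fraction of query descriptors that found a good match in the template."""
    return len(match_descriptors(query, template, ratio)) / max(1, len(query))


def recognise(query, templates, ratio=0.75):
    """Name of the stored gesture whose template matches the query best."""
    return max(templates, key=lambda name: gesture_score(query, templates[name], ratio))


# Made-up descriptors for two stored gestures and one captured frame.
templates = {
    "fist": [[0, 0], [10, 0], [0, 10]],
    "palm": [[50, 50], [60, 50], [50, 60]],
}
query = [[0.5, 0], [10, 0.5], [5, 5]]
print(recognise(query, templates))
```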
On the other hand, you could use a Bag-Of-Words approach. A good primer for that approach is here: BoW
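A minimal sketch of the encoding step of Bag-of-Words, assuming the visual vocabulary has already been built (usually by k-means clustering of descriptors from the training images, e.g. with cv2.kmeans or scikit-learn's KMeans). Each descriptor is assigned to its nearest visual word, and the image becomes a normalised histogram of word counts: a fixed-length vector that can be compared between images or fed to a classifier regardless of how many keypoints each image has. The vocabulary and descriptors are toy values.

```python
import math


def bow_histogram(descriptors, vocabulary):
    """Assign each descriptor to its nearest visual word; return normalised word counts."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        nearest = min(range(len(vocabulary)), key=lambda k: math.dist(d, vocabulary[k]))
        hist[nearest] += 1
    n = len(descriptors)
    return [h / n for h in hist] if n else hist


# Toy vocabulary of two visual words and the descriptors of one image.
vocabulary = [[0, 0], [10, 10]]
image_descriptors = [[1, 1], [9, 9], [0, 2], [11, 10]]
print(bow_histogram(image_descriptors, vocabulary))
```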
You can use a classification machine learning algorithm like logistic regression.
This algorithm minimizes a cost function so that it can predict how likely an input picture is to belong to each class (each gesture, in your case), and then it picks the most likely class. For pictures, you can use each pixel as a feature.
After training the algorithm on a large enough training set, it can classify your picture as one of the gestures, and since you are working with webcam images, the running time shouldn't be a problem.
Here is a great video on logistic regression by Professor Andrew Ng of Stanford.
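Andrew Ng's course handles several classes with one-vs-all logistic regression; the sketch below uses its close relative, multinomial (softmax) logistic regression, trained by batch gradient descent on the cross-entropy cost with one feature per pixel. The 2x2 "gesture images" (flattened to 4 values in 0..1) and the three labels are made up for illustration; real webcam frames would first be resized to a small fixed size, converted to greyscale and flattened in the same way.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W, b, x):
    return [sum(w * xi for w, xi in zip(Wk, x)) + bk for Wk, bk in zip(W, b)]


def train_softmax_regression(X, y, n_classes, lr=0.5, epochs=300):
    """Batch gradient descent on the mean cross-entropy cost.

    X: list of flattened images (one feature per pixel, scaled to 0..1)
    y: list of class indices (one gesture id per image)
    """
    n, n_features = len(X), len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * n_features for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, label in zip(X, y):
            p = softmax(class_scores(W, b, x))
            for k in range(n_classes):
                err = p[k] - (1.0 if k == label else 0.0)  # d(cost)/d(score_k)
                gb[k] += err
                for j, xj in enumerate(x):
                    gW[k][j] += err * xj
        for k in range(n_classes):
            b[k] -= lr * gb[k] / n
            for j in range(n_features):
                W[k][j] -= lr * gW[k][j] / n
    return W, b


def predict(W, b, x):
    """Index of the most probable class (gesture) for image x."""
    scores = class_scores(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up 2x2 "gesture images", flattened row by row; labels 0, 1, 2 are three gestures.
X = [[1.0, 0.0, 0.0, 0.1], [0.9, 0.1, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.1], [0.1, 0.9, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.1], [0.0, 0.1, 0.9, 0.0]]
y = [0, 0, 1, 1, 2, 2]
W, b = train_softmax_regression(X, y, n_classes=3)
print(predict(W, b, [0.8, 0.2, 0.0, 0.0]))  # a new image resembling gesture 0
```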