I have been taking an image processing and machine learning course for a couple of weeks. Every day I learn something new about image manipulation, as well as how to train a system to recognize patterns in an image.
My question is: in order to do successful image recognition, what are the steps one has to follow, for example denoising, use of LDA or PCA, then use of a neural network? I am not looking for any particular algorithm, just a brief outline of each of the steps (5-6) from capturing an image to testing an input image for similarity.
P.S. To the mods: before labelling this question as not constructive, I know it's not constructive, but I don't know which site to put this on, so please redirect me to that Stack Exchange site.
Thanks.
I would describe the pipeline as follows (I have omitted many bullet items):
1. Acquire images with ground-truth labeling.
   - Amazon Mechanical Turk
   - images and labels from Flickr
2. Compute features for each image.
   - directly stretch the image into a column vector
   - use more complicated features such as bag of words or LBP
3. Post-process the features to reduce the effect of noise, if needed.
   - sparse coding
   - max pooling
   - whitening
4. Train a classifier/regressor given the (feature, label) pairs.
   - SVM
   - boosting
   - neural network
   - random forest
   - spectral clustering
   - heuristic methods...
5. Use the trained model to recognize unseen images and evaluate the results with some metrics.
BTW, traditionally we would use dimensionality reduction methods such as PCA to make the problem tractable, but recent research seems to care little about that.
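To make the five steps concrete, here is a minimal end-to-end sketch in Python, using scikit-learn's toy digits dataset as a stand-in for your own labelled images. The feature is simply the stretched pixel vector, PCA handles the dimensionality reduction mentioned above, and an SVM is the classifier; all parameter values are illustrative, not recommendations.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

digits = load_digits()                              # step 1: images with ground-truth labels
X = digits.images.reshape(len(digits.images), -1)   # step 2: stretch each image into a vector
X_train, X_test, y_train, y_test = train_test_split(X, digits.target, random_state=0)

pca = PCA(n_components=30).fit(X_train)             # step 3: reduce dimension / suppress noise
clf = SVC().fit(pca.transform(X_train), y_train)    # step 4: train a classifier

pred = clf.predict(pca.transform(X_test))           # step 5: recognize unseen images
print("accuracy:", accuracy_score(y_test, pred))    #         and evaluate with a metric
```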
A few months back I developed a face recognition system using local binary patterns (LBP).
In this method I first took an image, either from local storage or from the camera, then applied the local binary pattern method to each block of the input image. After getting the LBP of the input image, I computed the chi-square distance between the LBP feature histograms. By comparing this value against the stored database images, processed the same way, I was able to retrieve the same face.
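A minimal sketch of that idea with scikit-image is below: it computes a uniform LBP histogram for a grayscale image and the chi-square distance between two such histograms. Splitting the face into blocks, loading the images, and the stored database are assumed to be handled elsewhere, and the parameter values are only illustrative.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_image, points=8, radius=1):
    # Uniform LBP codes for every pixel of the (2-D, grayscale) image.
    lbp = local_binary_pattern(gray_image, points, radius, method="uniform")
    n_bins = points + 2  # uniform LBP with P points yields P + 2 distinct codes
    hist, _ = np.histogram(lbp, bins=n_bins, range=(0, n_bins), density=True)
    return hist

def chi_square_distance(h1, h2, eps=1e-10):
    # Smaller distance means the two LBP histograms (and faces) are more similar.
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```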
Amazon Mechanical Turk is a service where people do work for you (and you pay them).
SIFT is a descriptor for interest points. By comparing those descriptors, you can find correspondences between images. (SIFT fits into step 2.)
When doing step 4, you can choose to combine the results of different classifiers, or simply trust the result of one classifier. That depends on the situation.
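For example, combining classifiers in step 4 can be as simple as a voting ensemble; here is a small sketch with scikit-learn, using toy data in place of the (feature, label) pairs from the pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Toy data standing in for the (feature, label) pairs.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True)),
        ("forest", RandomForestClassifier(n_estimators=100)),
        ("logreg", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",  # average the predicted probabilities instead of hard votes
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```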
Are you going to label the location of the affected region? I am not sure what you are going to do.
I couldn't comment, so I posted another answer.
Say, I want to train a CNN to detect whether an image is a car or not.
What are some best practices or methods for choosing the "Not-Car" dataset?
Because this dataset could potentially be infinite (basically anything that is not a car) - is there a guideline on how big the dataset needs to be? Should they contain objects which are very similar to cars, but are not (planes, boats, etc.)?
Like in all of supervised machine learning, the training set should reflect the real distribution that the model is going to work with. A neural network is basically a function approximator. Your actual goal is to approximate the real-world distribution, but in practice it's only possible to get a sample from it, and this sample is the only thing a neural network will see. For any input far outside of the training manifold, the output will be just a guess (see also this discussion on AI.SE).
So when choosing a negative dataset, the first question you should answer is: What will be the likely use-case of this model? E.g., if you're building an app for a smartphone, then the negative sample should probably include street views, pictures of buildings and stores, people, indoor environment, etc. It's unlikely that the image from the smartphone camera will be a wild animal or abstract painting, i.e., it's an improbable input in your real distribution.
Including images that look like the positive class (trucks, airplanes, boats, etc.) is a good idea, because the low-conv-layer features (edges, corners) will be very similar, and it's important that the neural network learns the important high-level features correctly.
In general, I'd use 5-10x more negative images than positive ones. CIFAR-10 is a good starting point: out of 50,000 training images, 5,000 are cars, 5,000 are planes, etc. In fact, building a 10-class classifier is not a bad idea. In this case, you'll transform this CNN into a binary classifier by thresholding its certainty that the inferred class is a car. Anything that the CNN isn't certain about will be interpreted as not a car.
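As a small sketch of that thresholding rule, assuming you already have the softmax outputs of a 10-class CIFAR-10 CNN (in the usual CIFAR-10 ordering "automobile" is class index 1; the threshold value is an assumption to tune):

```python
import numpy as np

def is_car(probs, car_class=1, threshold=0.5):
    # Accept "car" only when the network is confident enough in that class;
    # anything the CNN is unsure about falls back to "not a car".
    return probs[:, car_class] >= threshold

# Made-up softmax outputs for three images, purely for illustration:
probs = np.array([[0.05, 0.90, 0.01, 0.01, 0.00, 0.01, 0.00, 0.01, 0.01, 0.00],
                  [0.10, 0.30, 0.05, 0.05, 0.05, 0.05, 0.10, 0.10, 0.10, 0.10],
                  [0.00, 0.02, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.90, 0.02]])
print(is_car(probs))   # [ True False False]
```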
I think the negative samples should be selected depending on the setting your model works in. If your model works on the street as a car detector, reasonable negative samples would be street/road background, trees, pedestrians, and other vehicles that are common on the street. So I think there is no universal rule for selecting negative samples; it depends only on your needs.
I have an ML model that takes X seconds to detect an object in an image on which it is trained. Does that mean it took at least X or X+Y seconds during training per image? Can you provide a detailed insight?
For instance, assume the training speed of an SSD512 model is 30 images per second on a given hardware platform. Does this imply that I will be able to achieve an inference speed of at least (if not more than) 30 images per second?
The question is not confined to neural network models; a generic insight is appreciated. I am dealing with cascade classifiers in my case. I am loading a cascade.xml trained model to detect an object. I want to know the relation between the time taken to train on an image and the time taken to detect an object after loading the trained model.
Since it is not stated, I assume here you mean a neural network ML model.
The training process could be seen as two steps: running the network to detect the object and updating the weights to minimize the loss function.
Running the network: during training, in the forward part you essentially run the network as if you were detecting the object using the current network weights, which takes X time as you stated. It should take about the same time as when the network is used after training, for example on the test dataset (to keep things simple I am ignoring the mini-batch learning usually used, which might change things).
Updating the weights: this part of training is done by completing the backpropagation algorithm, which tells you how changing the weights will affect your detection performance (i.e. lower the loss function for the current image); then usually a stochastic gradient descent iteration is done, which updates the weights. This is the Y you stated, which in fact could be bigger than X.
These two parts are done for every image (more commonly, for every mini-batch) in the training process.
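A rough way to see the X versus X+Y split is to time a forward pass against a full training step. A sketch with PyTorch, using a made-up model and random data purely for illustration:

```python
import time
import torch
import torch.nn as nn

# Toy model and mini-batch; sizes are arbitrary.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(16 * 32 * 32, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(32, 3, 32, 32)       # fake images
y = torch.randint(0, 10, (32,))      # fake labels

# X: inference only (forward pass).
t0 = time.time()
with torch.no_grad():
    model(x)
inference_time = time.time() - t0

# X + Y: forward pass plus backpropagation and the SGD update.
t0 = time.time()
optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
train_step_time = time.time() - t0

print(f"forward only: {inference_time:.4f}s, full training step: {train_step_time:.4f}s")
```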
UPDATE: You said in your response that you are looking for an answer for a generic algorithm. It is an interesting question! When looking at the training task, you always need to learn some kind of weights W that are the outcome of the training process and are the essence of what was learned. The update needs to make the learned function better, which basically sounds harder than simply running the function. I really don't know of any algorithm (certainly not the commonly used ones) that would take less training time than running time per image, but it might be theoretically possible.
How do I perform image classification with Mahout? How do I convert an image to a form that is accepted by Mahout's classification algorithms? Is there any starter code to start with? Please share some starter tutorials. Is Mahout a good library for image classification?
There are two answers to your question:
The simple answer is that from a Mahout point of view classifying images is no different than classifying any other type of data. You find a suitable set of features to describe your data, and then: train, validate, test, and deploy.
The second answer is a bit more involved, and I'm going to summarize. In the case of images the step in which you compute a suitable set of features spans a whole research area (called computer vision). There are many methods: DHOG, direction of gradient, SURF, SIFT, etc. Depending on the images and what your expectations are, you may obtain reasonable results just using an existing method, or maybe not. It would be impossible to say without looking at your images and you telling us your objectives.
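As a minimal illustration of the "compute features first, then train" point: the sketch below turns each RGB image into a fixed-length colour histogram and writes it to a CSV that any trainer, Mahout's included, could consume. It is written in Python for brevity; the file names and labels are placeholders, not part of any Mahout API.

```python
import csv
import numpy as np
from skimage import io

def color_histogram(path, bins=8):
    # Assumes an RGB image; flatten to (n_pixels, 3) and build a joint colour histogram.
    img = io.imread(path)
    hist, _ = np.histogramdd(img.reshape(-1, 3), bins=(bins, bins, bins),
                             range=((0, 256),) * 3)
    return (hist / hist.sum()).ravel()

# Placeholder (path, label) pairs; in practice this list comes from your dataset.
samples = [("img001.jpg", 1), ("img002.jpg", 0)]

with open("features.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for path, label in samples:
        writer.writerow(list(color_histogram(path)) + [label])
```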
My image dataset is from http://www.image-net.org. There are various synsets for different things like flora, fauna, persons, etc.
I have to train a classifier which predicts 1 if the image belongs to the floral synset and 0 otherwise.
Images belonging to floral synset can be viewed at http://www.image-net.org/explore, by clicking on the plant, flora, plant life option in the left pane.
These images include a wide variety of flora, like trees, herbs, shrubs, flowers, etc.
I am not able to figure out what features to use to train the classifier. There is a lot of greenery in these images, but there are many flower images which don't have much of a green component. Another feature is the shape of the leaves and the petals.
It would be helpful if anyone could suggest how to extract this shape feature and use it to train the classifier. Also suggest what other features could be used to train the classifier.
And after extracting features, which algorithm is to be used to train the classifier?
I'm not sure that shape information is the right approach for the data set you have linked to.
Just having a quick glance at some of the images I have a few suggestions for classification:
Natural scenes rarely have straight lines - Line detection
You can discount scenes which have swathes of "unnatural" colour in them.
If you want to try something more advanced, I would suggest that a hybrid between entropy and pattern recognition would form a good classifier, as natural scenes have a lot of both.
Attempting template-matching/shape matching for leaves/petals will break your heart - you need to use something much more generalised.
As for which classifier to use... I'd normally advise K-means initially and once you have some results determine if the extra effort to implement Bayes or a Neural Net would be worth it.
Hope this helps.
T.
Expanded:
"Unnatural Colors" could be highly saturated colours outside of the realms of greens and browns. They are good for detecting nature scenes as there should be ~50% of the scene in the green/brown spectrum even if a flower is at the center of it.
Additionally, straight line detection should yield few results in nature scenes, as straight edges are rare in nature. On a basic level, generate an edge image, threshold it, and then search for line segments (pixels which approximate a straight line).
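A small sketch of that line check with OpenCV (the image path and all thresholds are placeholders to tune):

```python
import cv2
import numpy as np

gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path
edges = cv2.Canny(gray, 50, 150)                        # edge image
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                        minLineLength=40, maxLineGap=5)  # line segments
n_lines = 0 if lines is None else len(lines)
# Assumed heuristic: many long straight segments suggest a man-made scene.
looks_manmade = n_lines > 10
```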
Entropy requires some machine vision knowledge. You would approach the scene by determining localised entropies and then histogramming the results; here is a similar approach that you will have to use.
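For the localised entropy idea, a minimal sketch with scikit-image (again, the path and parameters are only illustrative):

```python
import numpy as np
from skimage import io, img_as_ubyte
from skimage.filters.rank import entropy
from skimage.morphology import disk

gray = img_as_ubyte(io.imread("scene.jpg", as_gray=True))  # placeholder path
ent = entropy(gray, disk(5))                               # local entropy around every pixel
# Histogram of the entropy map as a simple feature vector for a classifier.
feature, _ = np.histogram(ent, bins=16, range=(0, ent.max() + 1e-6), density=True)
```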
You would want to be advanced at machine vision if you are to attempt pattern recognition, as this is a difficult subject and not something you can throw together in a code sample. I would only attempt to implement these as a classifier once colour and edge (line) information has been exhausted.
If this is a commercial application then a MV expert should be consulted. If this is a college assignment (unless it is a thesis) colour and edge/line information should be more than enough.
HOG features are pretty much the de facto standard for these kinds of problems, I think. They're a bit involved to compute (and I don't know what environment you're working in) but powerful.
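For reference, a minimal sketch of extracting HOG features with scikit-image (the file name is a placeholder):

```python
from skimage import io
from skimage.feature import hog

gray = io.imread("flower.jpg", as_gray=True)   # placeholder path
features = hog(gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")
# `features` is a flat vector that can be fed to any classifier (e.g. a linear SVM).
```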
A simpler solution which might get you up and running, depending on how hard the dataset is, is to extract all overlapping patches from the images, cluster them using k-means (or whatever you like), and then represent an image as a distribution over this set of quantised image patches for a supervised classifier like an SVM. You'd be surprised how often something like this works, and it should at least provide a competitive baseline.
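A rough sketch of that patch-quantisation baseline with scikit-learn; `train_images` (a list of 2-D grayscale arrays) and `labels` are assumed to exist, and the patch size, codebook size, and sample counts are illustrative:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.svm import LinearSVC

def patch_features(image, n=200, size=(8, 8), seed=0):
    # Sample overlapping patches and flatten each one into a vector.
    patches = extract_patches_2d(image, size, max_patches=n, random_state=seed)
    return patches.reshape(len(patches), -1)

# Learn a codebook of quantised patches (the "visual words") with k-means.
all_patches = np.vstack([patch_features(img) for img in train_images])
codebook = MiniBatchKMeans(n_clusters=100, random_state=0).fit(all_patches)

def bag_of_patches(image):
    # Represent the image as a histogram over the codebook entries.
    words = codebook.predict(patch_features(image))
    hist, _ = np.histogram(words, bins=100, range=(0, 100), density=True)
    return hist

X = np.array([bag_of_patches(img) for img in train_images])
clf = LinearSVC().fit(X, labels)
```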
Does anyone know of recent academic work which has been done on logo recognition in images?
Please answer only if you are familiar with this specific subject (I can search Google for "logo recognition" myself, thank you very much).
Anyone who is knowledgeable in computer vision and has done work on object recognition is welcome to comment as well.
Update:
Please refer to the algorithmic aspects (what approach you think is appropriate, papers in the field, whether it should work (and has been tested) for real-world data, efficiency considerations) and not the technical sides (the programming language used or whether it was done with OpenCV...).
Work on image indexing and content based image retrieval can also help.
You could try to use local features like SIFT here:
http://en.wikipedia.org/wiki/Scale-invariant_feature_transform
It should work because logo shape is usually constant, so extracted features shall match well.
The workflow will be like this:
Detect corners (e.g. Harris corner detector) - for Nike logo they are two sharp ends.
Compute descriptors (like SIFT - 128D integer vector)
At the training stage, remember them; at the matching stage, find nearest neighbours for every feature in the database obtained during training. Finally, you have a set of matches (some of them are probably wrong).
Weed out wrong matches using RANSAC. Thus you'll get the matrix that describes the transform from the ideal logo image to the one where you find the logo. Depending on the settings, you could allow different kinds of transforms (just translation; translation and rotation; affine transform).
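A hedged sketch of this workflow with OpenCV, using SIFT's own keypoint detector in place of Harris; file names are placeholders, and the ratio-test and RANSAC thresholds are common defaults rather than values prescribed above:

```python
import cv2
import numpy as np

logo = cv2.imread("logo_template.png", cv2.IMREAD_GRAYSCALE)   # placeholder paths
scene = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(logo, None)    # keypoints + 128-D descriptors
kp2, des2 = sift.detectAndCompute(scene, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des1, des2, k=2)
# Lowe's ratio test to keep only distinctive nearest-neighbour matches.
good = [pair[0] for pair in matches
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance]

if len(good) >= 4:
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # RANSAC rejects wrong matches and estimates the transform from the
    # ideal logo image into the photo.
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
```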
Szeliski's book has a chapter (4.1) on local features.
http://research.microsoft.com/en-us/um/people/szeliski/Book/
P.S.
I assumed you want to find logos in photos, for example all Pepsi billboards, so they could be distorted. If you need to find a TV channel logo on the screen (so that it is not rotated and scaled), you could do it more easily (pattern matching or something).
Conventional SIFT does not consider color information. Since logos usually have constant colors (though the exact color depends on lighting and camera), you might want to consider color information somehow.
We worked on logo detection/recognition in real-world images. We also created a dataset FlickrLogos-32 and made it publicly available, including data, ground truth and evaluation scripts.
In our work we treated logo recognition as retrieval problem to simplify multi-class recognition and to allow such systems to be easily scalable to many (e.g. thousands) logo classes.
Recently, we developed a bundling technique called Bundle min-Hashing that aggregates spatial configurations of multiple local features into highly distinctive feature bundles. The bundle representation is usable for both retrieval and recognition; example heatmaps of logo detections are shown in the papers.
You will find more details on the internal operations, potential applications of the approach, experiments on its performance and of course also many references to related work in the papers [1][2].
I worked on that: Trademark matching and retrieval in sports video databases.
You can get a PDF of the paper here: http://scholar.google.it/scholar?cluster=9926471658203167449&hl=en&as_sdt=2000
We used SIFT as trademark and image descriptors, and a normalized threshold matching to compute the distance between models and images. In our latest work we have been able to greatly reduce computation using meta-models, created by evaluating the relevance of the SIFT points that are present in different versions of the same trademark.
I'd say that in general working with videos is harder than working on photos due to the very bad visual quality of the TV standards currently used.
Marco
I worked on a project where we had to do something very similar. At first I tried using Haar training techniques with OpenCV.
It worked, but was not an optimal solution for our needs. Our source images (where we were looking for the logo) were a fixed size and only contained the logo. Because of this we were able to use cvMatchShapes with a known good match and compare the value returned to deem a good match.
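For what it's worth, a small sketch of that cvMatchShapes comparison using the modern cv2 API; the file names and the acceptance threshold are placeholders, not values from this project:

```python
import cv2

ref = cv2.imread("known_good_logo.png", cv2.IMREAD_GRAYSCALE)   # placeholder paths
test = cv2.imread("candidate.png", cv2.IMREAD_GRAYSCALE)

def main_contour(img):
    # Binarise with Otsu's threshold and keep the largest external contour.
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)

# Lower scores mean more similar shapes.
score = cv2.matchShapes(main_contour(ref), main_contour(test), cv2.CONTOURS_MATCH_I1, 0.0)
is_match = score < 0.1   # assumed threshold; tune against known good/bad pairs
```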