My image dataset comes from http://www.image-net.org. There are synsets for various categories such as flora, fauna, people, etc.
I have to train a classifier that predicts 1 if an image belongs to the flora synset and 0 otherwise.
Images belonging to the flora synset can be viewed at http://www.image-net.org/explore by clicking on the "plant, flora, plant life" option in the left pane.
These images cover a wide variety of flora: trees, herbs, shrubs, flowers, etc.
I am not able to figure out what features to use to train the classifier. There is a lot of greenery in these images, but there are many flower images that don't have much of a green component. Another candidate feature is the shape of the leaves and the petals.
It would be helpful if anyone could suggest how to extract this shape feature and use it to train the classifier, and also what other features could be used.
After extracting features, which algorithm should be used to train the classifier?
Not sure that shape information is the right approach for the dataset you have linked to.
Having had a quick glance at some of the images, I have a few suggestions for classification:
Natural scenes rarely have straight lines - use line detection.
You can discount scenes which have swathes of "unnatural" colour in them.
If you want to try something more advanced, I would suggest that a hybrid of entropy/pattern recognition would make a good classifier, as natural scenes have a lot of both.
Attempting template matching/shape matching for leaves and petals will break your heart - you need to use something much more generalised.
As for which classifier to use... I'd normally advise k-means initially, and once you have some results, determine whether the extra effort to implement Bayes or a neural net would be worth it.
Hope this helps.
T.
Expanded:
"Unnatural Colors" could be highly saturated colours outside of the realms of greens and browns. They are good for detecting nature scenes as there should be ~50% of the scene in the green/brown spectrum even if a flower is at the center of it.
Additionally, straight-line detection should yield few results in nature scenes, as straight edges are rare in nature. At a basic level: generate an edge image, threshold it, and then search for line segments (runs of pixels that approximate a straight line).
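A minimal sketch of that edge-then-line pipeline, assuming OpenCV (the answer names no library, and all thresholds are illustrative):

```python
import cv2
import numpy as np

def count_straight_segments(path):
    """Edge image -> threshold (done inside Canny) -> probabilistic
    Hough transform to find pixel runs approximating straight lines."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                            threshold=60, minLineLength=40, maxLineGap=5)
    return 0 if lines is None else len(lines)

# Many long segments suggest a man-made scene; few suggest a natural one.
```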
Entropy requires some machine vision knowledge. You would approach the scene by computing localised entropies and then histogramming the results; here is a similar approach that you will have to use.
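For illustration only, here is one way the localised-entropy histogram could be computed with scikit-image (a library of my choosing; the neighbourhood radius and bin count are assumptions):

```python
import numpy as np
from skimage import io
from skimage.filters.rank import entropy
from skimage.morphology import disk

gray = io.imread("scene.jpg", as_gray=True)          # float image in [0, 1]
local_ent = entropy((gray * 255).astype(np.uint8), disk(5))
hist, _ = np.histogram(local_ent, bins=16)
feature = hist / hist.sum()   # normalised entropy histogram as the feature
```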
You would need to be advanced at machine vision to attempt pattern recognition, as it is a difficult subject and not something you can throw together from a code sample. I would only attempt to implement it as a classifier once colour and edge (line) information has been exhausted.
If this is a commercial application, then an MV expert should be consulted. If this is a college assignment (unless it is a thesis), colour and edge/line information should be more than enough.
HOG features are pretty much the de facto standard for these kinds of problems, I think. They're a bit involved to compute (and I don't know what environment you're working in), but powerful.
A simpler solution that might get you up and running, depending on how hard the dataset is, is to extract all overlapping patches from the images, cluster them using k-means (or whatever you like), and then represent each image as a distribution over this set of quantised image patches for a supervised classifier like an SVM. You'd be surprised how often something like this works, and it should at least provide a competitive baseline.
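A hedged sketch of that baseline, assuming scikit-learn and grayscale images already loaded as numpy arrays (`train_images` and `labels` are placeholders; patch size, patch count, and codebook size are arbitrary choices):

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import LinearSVC

def sample_patches(img, n=200):
    # random 8x8 patches, flattened to 64-dimensional vectors
    return extract_patches_2d(img, (8, 8), max_patches=n,
                              random_state=0).reshape(-1, 64)

# 1) learn a codebook of quantised patches with k-means
codebook = MiniBatchKMeans(n_clusters=256, random_state=0).fit(
    np.vstack([sample_patches(img) for img in train_images]))

def bovw_histogram(img):
    # represent an image as a distribution over the 256 visual words
    words = codebook.predict(sample_patches(img))
    hist = np.bincount(words, minlength=codebook.n_clusters)
    return hist / hist.sum()

# 2) feed the histograms to a supervised classifier
X = np.array([bovw_histogram(img) for img in train_images])
clf = LinearSVC().fit(X, labels)
```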
Classification problems can exhibit strong label imbalance in a given dataset. This can be overcome by subsampling or by attributing class weights, which balance the label distribution at least during model training. Stratification, on the other hand, keeps a given label distribution within every respective fold.
For regression problems this is not provided by standard libraries such as scikit-learn. There are few approaches that cover stratification, and a well-written theoretical treatment of subsampling for regression by Scott Lowe here.
I am wondering why label balancing for regression, as opposed to classification, has received so little attention in the machine learning community. Regression problems also exhibit characteristics that might be easier or harder to acquire in a data-collection setting. Is there any framework or paper that addresses this issue further?
The complexity of the problem lies in the continuous nature of regression. With classification it is very natural to split the data into classes, because it is basically already split into classes :) With regression, the number of possible splits is essentially infinite and, most importantly, it is impossible to know what a good split would be. As in the article you linked, you might apply sorted or fractional approaches, but in the end you have no idea to what extent they would be correct. You can also split the target into intervals. This is what the verstack library does; in its documentation it says: "For continuous target variable verstack uses binning and categoric split based on bins". What they do is first assign the continuous values to bins (classes) and then apply stratification on them.
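To make that bin-then-stratify idea concrete, here is a minimal sketch with pandas and scikit-learn (the quantile count and toy data are my own assumptions):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.exponential(size=1000)          # a skewed continuous target

# assign each continuous value to a quantile bin (a pseudo-class)...
bins = pd.qcut(y, q=10, labels=False)
# ...and stratify the split on those bins
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=bins, random_state=0)
```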
There are not many studies on this because anything you can come up with is going to be a heuristic. However, there can be exceptions if you can incorporate some domain knowledge. As an example, say you are trying to predict the frequency of some electromagnetic waves from a set of features. In that case you have prior knowledge of how the wave frequencies are split (https://en.wikipedia.org/wiki/Electromagnetic_spectrum), so it is natural to split them into continuous intervals with respect to their wavelengths and do a regression stratification. Otherwise it is hard to come up with something that would generalise.
I personally never encountered a study on this.
Say I want to train a CNN to detect whether an image is a car or not.
What are some best practices or methods for choosing the "not-car" dataset?
Because this dataset could potentially be infinite (basically anything that is not a car), is there a guideline on how big it needs to be? Should it contain objects that are very similar to cars but are not cars (planes, boats, etc.)?
Like in all of supervised machine learning, the training set should reflect the real distribution that the model is going to work with. A neural network is basically a function approximator. Your actual goal is to approximate the real-world distribution, but in practice it's only possible to get a sample from it, and this sample is the only thing the neural network will see. For any input far outside the training manifold, the output will be just a guess (see also this discussion on AI.SE).
So when choosing a negative dataset, the first question you should answer is: What will be the likely use-case of this model? E.g., if you're building an app for a smartphone, then the negative sample should probably include street views, pictures of buildings and stores, people, indoor environment, etc. It's unlikely that the image from the smartphone camera will be a wild animal or abstract painting, i.e., it's an improbable input in your real distribution.
Including images that look like the positive class (trucks, airplanes, boats, etc.) is a good idea, because the low-level convolutional features (edges, corners) will be very similar, and it's important that the neural network learns the important high-level features correctly.
In general, I'd use 5-10x more negative images than positive ones. CIFAR-10 is a good starting point: out of 50,000 training images, 5,000 are cars, 5,000 are planes, etc. In fact, building a 10-class classifier is not a bad idea. In that case, you'd turn the CNN into a binary classifier by thresholding its certainty that the inferred class is a car; anything the CNN isn't certain about is interpreted as not a car.
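For illustration, that thresholding could look like the sketch below (my own code; `probs` stands for your CNN's softmax output, and the 0.5 cut-off is an assumption to tune):

```python
import numpy as np

CAR_IDX = 1        # "automobile" is class index 1 in CIFAR-10
THRESHOLD = 0.5    # illustrative confidence cut-off

def is_car(probs: np.ndarray) -> bool:
    """Binary decision from a 10-class softmax: call it a car only when
    "car" is the top class and the network is confident about it."""
    return probs.argmax() == CAR_IDX and probs[CAR_IDX] >= THRESHOLD
```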
I think the negative sample should be selected depending on the setting your model will work in. If your model works on the street as a car detector, reasonable negative samples would be street and road backgrounds, trees, pedestrians, and other vehicles common on the street. So I think there is no universal rule for selecting negative samples; it depends only on your needs.
I started learning image recognition a few days back, and I would like to do a project in which it needs to identify different brand logos on Android.
For example: if I take a picture of a Nike logo on an Android device, then it needs to display "Nike".
Low computational time would be the main criterion for me.
For this, I have done some work and started learning from the OpenCV sample examples.
What would be the best image recognition approach for me to use?
1) I came to know from template matching that its applicability is limited mostly by the available computational power, as identification of big and complex templates can be time-consuming (and so I don't want to use it).
2) Feature-based detectors like SIFT/SURF/STAR (as far as I know, this would be a better option for me).
3) How about deep learning and pattern recognition concepts? (I was digging into this and don't know whether it would be an option for me.) Can any of you let me know whether I can use this, and why it would be a better choice compared with 1 and 2?
4) Haar cascade classifiers (from one of the posts on SO, I came to know that Haar is not rotation and scale invariant, and so I haven't concentrated much on this). Would this be a better option for me if I focused on it?
I'm now running one of my pet projects, and it requires face recognition - detecting the area containing a face in a photo, if one exists, on a Raspberry Pi - so I've done some analysis of that task.
And I found this approach. The key idea is to avoid scanning the entire picture with windows of different sizes, as OpenCV does, and instead divide the whole photo into 49 (7x7) squares and train the model not only to detect the presence of one of the classes inside each square, but also to determine the location and size of the detected object.
That's only 49 runs of the trained model, so I think it's possible to execute this in under a second even on non-state-of-the-art smartphones. In any case, it will be a trade-off between accuracy and performance.
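A very rough sketch of that 49-cell idea (my own illustration; `model` stands for whatever per-cell classifier you train, and the 0.5 score cut-off is an assumption):

```python
import numpy as np

def detect_on_grid(image: np.ndarray, model, grid: int = 7):
    """Crop each of the grid x grid cells and run the classifier once per
    cell: 49 forward passes total for the default 7x7 grid."""
    h, w = image.shape[:2]
    ch, cw = h // grid, w // grid
    detections = []
    for row in range(grid):
        for col in range(grid):
            cell = image[row * ch:(row + 1) * ch, col * cw:(col + 1) * cw]
            score = model.predict(cell)   # hypothetical per-cell score
            if score > 0.5:
                detections.append((row, col, score))
    return detections
```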
About the model
I will use a VGG-like model, probably a bit simpler than even VGG-11 (configuration A).
In my case a ready dataset already exists, so I can train a convolutional network with it.
Why is the deep learning approach better than options 1-3 you mentioned? Because of its higher accuracy for this kind of task; it's practically proven. You can check it on Kaggle: the majority of the top models in classification competitions are based on convolutional networks.
The only disadvantage for you is that it will probably be necessary to create your own dataset to train the model.
Here is a post that I think can be useful for you: Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition. Another one: Logo recognition in images.
2) Feature-based detectors like SIFT/SURF/STAR (as far as I know, this would be a better option for me).
Just remember that SIFT and SURF are both patented so you will need a license for any commercial use (free for non-commercial use).
4) Haar cascade classifiers (from one of the posts on SO, I came to know that Haar is not rotation and scale invariant, and so I haven't concentrated much on this). Would this be a better option for me if I focused on it?
It works (if I understand your question correctly); much of this depends on how you trained your classifier. You could train it to detect all kinds of rotations and scales. Anyway, I would discourage you from going for this option, as I think the other possible solutions are better suited to this case.
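For completeness, a minimal detection (not training) sketch with an OpenCV cascade; the bundled frontal-face XML stands in for the logo cascade you would have to train yourself, and the detection parameters are typical defaults rather than tuned values:

```python
import cv2

# OpenCV ships several pretrained cascades; cv2.data.haarcascades points
# at them in the pip opencv-python distribution.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in boxes:
    print(f"hit at ({x}, {y}), size {w}x{h}")
```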
Four years ago I posted this question and got a few answers that were unfortunately outside my skill level. I just attended a Build tour conference where they spoke about machine learning, and this got me thinking about the possibility of using ML as a solution to my problem. I found this on the Azure site, but I don't think it will help me because its scope is pretty narrow.
Here is what I am trying to achieve:
I have a source image:
and I want to know which of the following symbols (if any) are contained in the image above:
The comparison needs to tolerate minor distortion, scaling, color differences, rotation, and brightness differences.
The number of symbols to match will ultimately be greater than 100.
Is ML a good tool to solve this problem? If so, any starting tips?
As far as I know, Project Oxford (the MS Azure CV API) wouldn't be suitable for your task. Their APIs are very focused on face-related tasks (detection, verification, etc.), OCR, and image description. And apparently you can't extend their models or train new ones from the existing ones.
However, even though I don't know of an out-of-the-box solution for your object detection problem, there are easy enough approaches that you could try and that would give you some starting-point results.
For instance, here is a naive method you could use:
1) Create your dataset:
This is probably the most tedious step and, paradoxically, a crucial one. I will assume you have a good amount of images to work with. What you would need to do is pick a fixed window size and extract positive and negative examples.
If some of the images in your dataset are of different sizes, you would need to rescale them to a common size. You don't need to get too crazy about the size; 30x30 images would probably be more than enough. To make things easier, I would convert the images to grayscale too.
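A possible loading sketch for this step, assuming OpenCV and a "positives"/"negatives" folder layout of my own invention:

```python
import os
import cv2

def load_examples(folder, label, size=(30, 30)):
    """Read every image in `folder`, convert to grayscale, and rescale
    to a common window size; returns (images, labels)."""
    X, y = [], []
    for name in os.listdir(folder):
        img = cv2.imread(os.path.join(folder, name), cv2.IMREAD_GRAYSCALE)
        if img is None:
            continue   # skip non-image files
        X.append(cv2.resize(img, size))
        y.append(label)
    return X, y

pos_X, pos_y = load_examples("positives", 1)   # windows containing the logo
neg_X, neg_y = load_examples("negatives", 0)   # background windows
```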
2) Pick a classification algorithm and train it:
There is an awful number of classification algorithms out there. But if you are new to machine learning, I would pick the one that is easiest to understand. Keeping that in mind, I would check out logistic regression, which gives decent results, is easy enough for starters, and has a lot of libraries and tutorials; for instance, this one or this one. At first I would say to focus on a binary classification problem (e.g., whether there is a UD logo in the picture or not), and when you master that one you can jump to the multi-class case. There are resources for that too, or you can always have several models, one per logo, and run this recipe for each one separately.
To train your model, you just need to read the images generated in step 1, turn them into vectors, and label them accordingly. That would be the dataset that will feed your model. If you are using grayscale images, then each position in the vector corresponds to a pixel value in the range 0-255. Depending on the algorithm, you might need to rescale those values to the range [0, 1], because some algorithms perform better with values in that range. Notice that rescaling the range in this case is fairly easy (new_value = value / 255).
You also need to split your dataset, reserving some examples for training, a subset for validation, and another one for testing. Again, there are different ways to do this, but I'm keeping this answer as simple as possible.
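Putting those paragraphs together, a minimal scikit-learn sketch could look like this (it reuses `pos_X`/`neg_X` from the loading sketch in step 1 and, to stay simple, splits off only a test set; a second split would give you the validation set):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# flatten each 30x30 window to a 900-dim vector and rescale to [0, 1]
X = np.array([img.ravel() / 255.0 for img in pos_X + neg_X])
y = np.array(pos_y + neg_y)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```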
3) Perform the detection:
So now let's start the fun part. Given any image, you want to run your model and produce coordinates in the picture where there is a logo. There are different ways to do this, and I will describe one that is probably neither the best nor the most efficient, but in my opinion the easiest to develop.
You are going to scan the picture, extracting the pixels in a "window", rescaling those pixels to the size you selected in step 1, and then feeding them to your model.
If the model gives you a positive answer, then you mark that window in the original image. Since the logo might appear at different scales, you need to repeat this process with different window sizes. You would also need to tweak the amount of space between windows.
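A naive version of that scan, reusing the classifier from step 2 (window sizes and stride are illustrative assumptions):

```python
import cv2

def detect(image, clf, window_sizes=(30, 60, 120), stride=10):
    """Slide windows of several sizes over a grayscale image; every
    window the model labels positive is reported as (left, top, size)."""
    hits = []
    for ws in window_sizes:
        for top in range(0, image.shape[0] - ws + 1, stride):
            for left in range(0, image.shape[1] - ws + 1, stride):
                patch = image[top:top + ws, left:left + ws]
                vec = cv2.resize(patch, (30, 30)).ravel() / 255.0
                if clf.predict([vec])[0] == 1:
                    hits.append((left, top, ws))
    return hits
```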
4) Rinse and repeat:
In the first iteration it's very likely that you will get a lot of false positives. You then take those as negative examples and retrain your model. This is an iterative process, and hopefully on each iteration you will have fewer false positives and fewer false negatives.
Once you are reasonably happy with your solution, you might want to improve it. You might want to try other classification algorithms like SVMs or deep artificial neural networks, or better object detection frameworks like Viola-Jones. Also, you will probably need to use cross-validation to compare all your solutions (you can actually use cross-validation from the beginning). By this point, I bet you would be confident enough to use OpenCV or another ready-to-use framework, with a fair understanding of what is going on under the hood.
Also, you could just disregard this whole answer and go for an OpenCV object detection tutorial like this one. Or take another answer from another question, like this one. Good luck!
I have been taking an image processing and machine learning course for a couple of weeks. Every day I learn something new about image manipulation as well as how to train a system to recognize patterns in an image.
My question is: in order to do successful image recognition, what are the steps one has to follow? For example: denoising, use of LDA or PCA, then use of a neural network. I am not looking for any algorithm, just a brief outline of the steps (5-6) from capturing an image to testing an input image for similarity.
P.S. To the mods: before labelling this question as not constructive - I know it's not constructive, but I don't know which site to put it on, so please redirect me to the right Stack Exchange site.
Thanks.
I would describe the pipeline as follows (I have omitted many bullet items):
1. Acquire images with ground-truth labels.
   - Amazon Mechanical Turk
   - images and labels from Flickr
2. Compute features for each image.
   - directly stretch the image into a column vector
   - use more complicated features such as bag of words or LBP
3. Post-process the features to reduce the effect of noise, if needed.
   - sparse coding
   - max pooling
   - whitening
4. Train a classifier/regressor given the (feature, label) pairs.
   - SVM
   - boosting
   - neural network
   - random forest
   - spectral clustering
   - heuristic methods...
5. Use the trained model to recognize unseen images and evaluate the results with some metrics (see the sketch below).
BTW: traditionally, we would use dimensionality reduction methods such as PCA to make the problem tractable, but recent research seems to care little about that.
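As a toy end-to-end illustration of steps 3-5 (my own sketch, using scikit-learn's bundled digits dataset in place of steps 1-2):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)              # steps 1-2: images + labels
pipeline = make_pipeline(PCA(n_components=30),   # step 3: reduce noise/dim
                         SVC())                  # step 4: train a classifier
scores = cross_val_score(pipeline, X, y, cv=5)   # step 5: evaluate
print("cross-validated accuracy:", scores.mean())
```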
A few months back I developed a face recognition system using local binary patterns.
In this method I first took an image, either from local storage or the camera, then computed the local binary pattern over each block of the input image. After getting the LBP of the input image, I computed the chi-square distance between LBP feature histograms, comparing the input's histogram against each stored database image processed the same way. I was able to retrieve the same face.
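The answer names no library, but with scikit-image the histogram-and-distance part could be sketched like this (P, R, and the uniform-LBP bin count are common defaults, not necessarily what the author used):

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, P=8, R=1):
    """Uniform LBP codes take values 0..P+1, hence P+2 histogram bins."""
    lbp = local_binary_pattern(gray, P, R, method="uniform")
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2))
    return hist.astype(float) / hist.sum()

def chi_square(h1, h2, eps=1e-10):
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

# The database face with the smallest chi-square distance to the
# input's histogram is reported as the match.
```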
Amazon Mechanical Turk is a service for getting people to do work for you (and you pay them).
SIFT is a descriptor for interest points. By comparing those descriptors, you can find correspondences between images. (SIFT fits into step 2.)
When doing step 4, you can choose to combine the results of different classifiers, or simply trust the result of one classifier; that depends on the situation.
Are you going to label the location of the affected region? I am not sure what you are trying to do.
(I couldn't comment, so I posted another answer.)