I want to apply SVM on audio data det. I am extarcting difftrent features from the speech signal. After reducing the dimention of this matrix, I am still getting a features in matix form. Can anyone help me regarding the data formating
should i have to convert the feature matix in a row vector? Can i assign same label to each row of one feature matrix and other label to the rows of other matrix?
Little bit ambiguous question but let me try to resolve your problem. For feature selection, you can use filter method, wrapper method etc. One popularly used method is principle component analysis. Once you select your feature you can directly feed them to the classifier. In your case, i guess you are getting lower dimensional representation of your training data (for example, if you have used SVD). In this case, its fine, now you can use it for SVM classification.
What did you mean by adding label to feature matrix? You can add label to the training instances, not the features. I guess you are talking about separate matrix for each of the class labels. If that is the case, yes you can use as you want but remember it depends on the model design.
Related
I have many evolution curves (on time), of a system as images.
These evolution curves are plotted when the system behave in a normal way ('ok').
I want to train a model, which learn the shapes of the curves (or parts of the shapes) when it behave in a normal way, so it will be able to classify new curves to normal (or abnormal).
Any ideas of the model to use, or how to proceed ?
Thank you
You can perform PCA, and then classify. Also look for functional data analysis
Here is a nice getting started guide with PCA
You can start with labeling (annotating) the images. The label can be as Normal/ Not Normal as 0/1 or as many classes you want to divide the data into.
Since it's a chart so the orientation is important, a wrong orientation can destroy the meaning of the image.
So make an algorithm which always orient the chart in the same way while reading.
Now that the labeling is done you need to train these images for correct classification.
Augment the data if needed
Find a image classification model
Use the trained weights
feed you images and annotations in the desired format
Train the model
Check for the output error or classification errors.
Create an evaluation matrix like confusion matrix in case of classification.
If the model is right and training is properly done you will get good accuracy.
Otherwise repeat the steps.
This is just an overview, with this you can start towards your goal.
I know that feature selection helps me remove features that may have low contribution. I know that PCA helps reduce possibly correlated features into one, reducing the dimensions. I know that normalization transforms features to the same scale.
But is there a recommended order to do these three steps? Logically I would think that I should weed out bad features by feature selection first, followed by normalizing them, and finally use PCA to reduce dimensions and make the features as independent from each other as possible.
Is this logic correct?
Bonus question - are there any more things to do (preprocess or transform)
to the features before feeding them into the estimator?
If I were doing a classifier of some sort I would personally use this order
Normalization
PCA
Feature Selection
Normalization: You would do normalization first to get data into reasonable bounds. If you have data (x,y) and the range of x is from -1000 to +1000 and y is from -1 to +1 You can see any distance metric would automatically say a change in y is less significant than a change in X. we don't know that is the case yet. So we want to normalize our data.
PCA: Uses the eigenvalue decomposition of data to find an orthogonal basis set that describes the variance in data points. If you have 4 characteristics, PCA can show you that only 2 characteristics really differentiate data points which brings us to the last step
Feature Selection: once you have a coordinate space that better describes your data you can select which features are salient.Typically you'd use the largest eigenvalues(EVs) and their corresponding eigenvectors from PCA for your representation. Since larger EVs mean there is more variance in that data direction, you can get more granularity in isolating features. This is a good method to reduce number of dimensions of your problem.
of course this could change from problem to problem, but that is simply a generic guide.
Generally speaking, Normalization is needed before PCA.
The key to the problem is the order of feature selection, and it's depends on the method of feature selection.
A simple feature selection is to see whether the variance or standard deviation of the feature is small. If these values are relatively small, this feature may not help the classifier. But if you do normalization before you do this, the standard deviation and variance will become smaller (generally less than 1), which will result in very small differences in std or var between the different features.If you use zero-mean normalization, the mean of all the features will equal 0 and std equals 1.At this point, it might be bad to do normalization before feature selection
Feature selection is flexible, and there are many ways to select features. The order of feature selection should be chosen according to the actual situation
Good answers here. One point needs to be highlighted. PCA is a form of dimensionality reduction. It will find a lower dimensional linear subspace that approximates the data well. When the axes of this subspace align with the features that one started with, it will lead to interpretable feature selection as well. Otherwise, feature selection after PCA, will lead to features that are linear combinations of the original set of features and they are difficult to interpret based on the original set of features.
I'm interested in taking advantage of some partially labeled data that I have in a deep learning task. I'm using a fully convolutional approach, not sampling patches from the labeled regions.
I have masks that outline regions of definite positive examples in an image, but the unmasked regions in the images are not necessarily negative - they may be positive. Does anyone know of a way to incorporate this type of class in a deep learning setting?
Triplet/contrastive loss seems like it may be the way to go, but I'm not sure how to accommodate the "fuzzy" or ambiguous negative/positive space.
Try label smoothing as described in section 7.5.1 of Deep Learning book:
We can assume that for some small constant eps, the training set label y is correct with probability 1 - eps, and otherwise any of the other possible labels might be correct.
Label smoothing regularizes a model based on a softmax with k output values by replacing the hard 0 and 1 classification targets with targets of eps / k and 1 - (k - 1) / k * eps, respectively.
See my question about implementing label smoothing in Pandas.
Otherwise if you know for sure, that some areas are negative, other are positive while some are uncertain, then you can introduce a third uncertain class. I have worked with data sets that contained uncertain class, which corresponded to samples that could belong to any of the available classes.
I'm assuming that you are struggling with a data segmantation task with a problem of a ill-definied background (e.g. you are not sure if all examples are correctly labeled). Recently I came across the similiar problem and this is what I came across during my research:
In old days before deep learning and at the begining of deep learning era - the common way to deal with that is to smooth your output with some kind of a probability model which would take into account the possibility of a noisy labels (you could read about this in a Learning to Label from Noisy Data chapter from this book. It's important to discriminate this probabilistic models from models used to smooth your labels w.r.t. to image or label structure like classical CRFs for bilateral smoothing.
What we finally used (and worked really well) is the Channel Inhibited Softmax idea from this paper. In terms of a mathematical properties - it makes your network much more robust to some objects not labeled - because it makes your network to output much higher positive valued logits at correctly labeled objects.
You could treat this as a semi-supervised problem. Use the full dataset without labels to train a bottleneck autoencoder structure (or a GAN approach). This pretrained model can then be adjusted (e.g. removing the last layers, adding a better layer structure at the end on top of the bottleneck features) and finetuned on the labeled data.
I'm trying to set up an object classification system with OpenCV. When I detect a new object in a scene, I want to know if the new object belongs to a known object class (is it a box, a bottel, something unknown, etc.).
My steps so far:
Cutting down the Image to the roi where a new object could appear
Calculating keypoints for every Image (cv::SurfFeatureDetector)
Calculating descriptors for each keypoint (cv::SurfDescriptorExtractor)
Generating a vocabulary using Bag of Words (cv::BOWKMeansTrainer)
Calculating Response histograms (cv::BOWImgDescriptorExtractor)
Use the Response histograms to train a cv::SVM for every object class
Using the same set of images again to test the classification
I know that there is still something wrong with my code since the classification don't work yet.
But I don't really know, where I should use the full image (cutted down to the roi) or when I should extract the new object from the image and use just the object itself.
It's my first step into object recognition/classification and I saw people using both, full Images and extracted objects, but I just don't know when to use what.
I hope womeone can clarify this for me.
You should not use the same images for both testing and training.
In training, ideally you need to extract a ROI which includes just one dominant object, since the algorithm will assume that the codewords extracted from positive samples are the ones that should be presented in a test image to label it as positive. However, if you have a really big dataset like ImageNet, the algorithm should make a generalization.
In testing, you don't need to extract a ROI, because SIFT/SURF are scale invariant features. However, it's good to have a one dominant object in the test set, as well.
I think you should train 1 classifier for your each object class. This is called one-vs-all classifier.
One little note, if you don't want to worry about this issues and have big dataset. Just go with Convolutional Neural Networks. They have a really good generalization capability and are inherently multi-label thanks to their fully connected last layer.
I have been doing reading about Self Organizing Maps, and I understand the Algorithm(I think), however something still eludes me.
How do you interpret the trained network?
How would you then actually use it for say, a classification task(once you have done the clustering with your training data)?
All of the material I seem to find(printed and digital) focuses on the training of the Algorithm. I believe I may be missing something crucial.
Regards
SOMs are mainly a dimensionality reduction algorithm, not a classification tool. They are used for the dimensionality reduction just like PCA and similar methods (as once trained, you can check which neuron is activated by your input and use this neuron's position as the value), the only actual difference is their ability to preserve a given topology of output representation.
So what is SOM actually producing is a mapping from your input space X to the reduced space Y (the most common is a 2d lattice, making Y a 2 dimensional space). To perform actual classification you should transform your data through this mapping, and run some other, classificational model (SVM, Neural Network, Decision Tree, etc.).
In other words - SOMs are used for finding other representation of the data. Representation, which is easy for further analyzis by humans (as it is mostly 2dimensional and can be plotted), and very easy for any further classification models. This is a great method of visualizing highly dimensional data, analyzing "what is going on", how are some classes grouped geometricaly, etc.. But they should not be confused with other neural models like artificial neural networks or even growing neural gas (which is a very similar concept, yet giving a direct data clustering) as they serve a different purpose.
Of course one can use SOMs directly for the classification, but this is a modification of the original idea, which requires other data representation, and in general, it does not work that well as using some other classifier on top of it.
EDIT
There are at least few ways of visualizing the trained SOM:
one can render the SOM's neurons as points in the input space, with edges connecting the topologicaly close ones (this is possible only if the input space has small number of dimensions, like 2-3)
display data classes on the SOM's topology - if your data is labeled with some numbers {1,..k}, we can bind some k colors to them, for binary case let us consider blue and red. Next, for each data point we calculate its corresponding neuron in the SOM and add this label's color to the neuron. Once all data have been processed, we plot the SOM's neurons, each with its original position in the topology, with the color being some agregate (eg. mean) of colors assigned to it. This approach, if we use some simple topology like 2d grid, gives us a nice low-dimensional representation of data. In the following image, subimages from the third one to the end are the results of such visualization, where red color means label 1("yes" answer) andbluemeans label2` ("no" answer)
onc can also visualize the inter-neuron distances by calculating how far away are each connected neurons and plotting it on the SOM's map (second subimage in the above visualization)
one can cluster the neuron's positions with some clustering algorithm (like K-means) and visualize the clusters ids as colors (first subimage)