As you know, each sample in the Iris dataset is a 1-dimensional feature array, for example:
[5.1, 3.5, 1.4, 0.2] Iris-Setosa
[7.0, 3.2, 4.7, 1.4] Iris-versicolor
I'm wondering whether features in machine learning can be 2D arrays or not?
For example:
Feature Result
[[1.0 2.3 3.1],[2.4 6.3 9.6]] A
[[1.5 3.3 5.1],[5.4 9.3 7.0]] B
Thank you!
Yes, they can be. In that case, the inputs become tensors:
0 dimensional input - scalars
1 dimensional input - vectors
2 dimensional input - matrices
3 (and above) dimensional input - tensors
An image, for example, is an (m x n x p) tensor with RGB components, so each input is a multi-dimensional array of numbers - a matrix of matrices. This is an excellent explanation.
Word embeddings are similar: words make up a document, and a corpus is a collection of documents. Each word is typically a 300-number vector, so a document is a matrix - and thus each input is a matrix. Start here.
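As a quick numpy sketch of the 2x3-matrix samples from the question (the array contents are just the hypothetical values above):
import numpy as np

# Two samples, each a 2x3 matrix of features, with labels A and B
X = np.array([
    [[1.0, 2.3, 3.1], [2.4, 6.3, 9.6]],   # sample with label A
    [[1.5, 3.3, 5.1], [5.4, 9.3, 7.0]],   # sample with label B
])
y = np.array(["A", "B"])

print(X.shape)                  # (2, 2, 3): a 3D tensor of inputs

# Deep-learning frameworks take this tensor directly; classical scikit-learn
# estimators expect 2D input, so flatten each sample first:
X_flat = X.reshape(len(X), -1)  # shape (2, 6)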
Related
Just wondering whether anybody has done this. I have a dataset that is one-dimensional (not sure whether that's the right word choice, though). Unlike the usual CNN inputs, which are images (so 2D), my data only has one dimension. An example would be:
instance1 - feature1, feature2,...featureN
instance2 - feature1, feature2,...featureN
...
instanceM - feature1, feature2,...featureN
How do I use my dataset with CNNs? The ones I have looked at (like AlexNet and GoogLeNet) accept images in the form:
instance1 - 2d feature matrix
instance2 - 2d feature matrix2
...
instanceM - 2d feature matrixN
Appreciate any help on it.
Thanks!
If your data were spatially related (you said it isn't), then you'd feed it to a convnet (or, specifically, a conv2d layer) with shape 1xNx1 or Nx1x1 (rows x cols x channels).
If this isn't spatial data at all (you just have N non-spatially-related features), then the shape should be 1x1xN.
For completeness, I should point out that if your data really is non-spatial, then there's really no point in using a convolutional layer/net. You could shape it as 1x1xN and then use 1x1 convolutions, but since a 1x1 convolution does the exact same thing as a fully-connected (aka dense aka linear) layer, you might as well just use that instead.
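As a minimal sketch of the two options, assuming TensorFlow/Keras and a hypothetical (M, N) feature array (the layer sizes here are arbitrary):
import numpy as np
import tensorflow as tf

M, N = 100, 20                         # hypothetical number of instances / features
X = np.random.rand(M, N).astype("float32")   # placeholder for your real data
y = np.random.randint(0, 2, M)

# Spatial interpretation: a length-N signal with 1 channel (the 1xNx1 case above)
X_conv = X.reshape(M, N, 1)
conv_model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(16, kernel_size=3, activation="relu", input_shape=(N, 1)),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Non-spatial features: just use a fully-connected network instead
dense_model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(N,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

conv_model.compile(optimizer="adam", loss="binary_crossentropy")
dense_model.compile(optimizer="adam", loss="binary_crossentropy")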
When the features are numeric, the feature matrix X in the hypothesis sigmoid(transpose(theta) * X) is just those numeric values arranged as a matrix.
However, when we have one more feature, color, which can be red, blue, or green, one-hot encoding turns each value into a vector like [1 0 0], [0 1 0], or [0 0 1].
I'm unable to figure out how to merge these one-hot encoded vectors into the already existing feature matrix and then use it in the equation for the hypothesis.
Yes: you should remove the unencoded categorical features from the dataset, encode them, and append the resulting one-hot columns to the feature matrix; you also have to add the corresponding weights to theta, of course. Then you can fit your model on this new dataset.
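For example, a minimal sketch with scikit-learn's OneHotEncoder (the numeric values and colors below are hypothetical):
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X_numeric = np.array([[1.0, 2.3],
                      [2.4, 6.3],
                      [1.5, 3.3]])
color = np.array([["red"], ["blue"], ["green"]])

enc = OneHotEncoder(sparse_output=False)   # older scikit-learn versions use sparse=False
color_ohe = enc.fit_transform(color)       # one 0/1 column per color (columns ordered alphabetically)

# Append the encoded columns to the numeric features
X = np.hstack([X_numeric, color_ohe])      # shape (3, 5)
# theta now needs 2 + 3 weights (plus the bias term) for sigmoid(transpose(theta) * X)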
I have a text classification task. So far I have only tagged a corpus and extracted some features in bigram format (i.e. bigram = [('word', 'word'), ..., ('word', 'word')]). I would like to classify some text, and as I understand it the SVM algorithm can only receive vectors for classification, so I use a vectorizer from scikit-learn as follows:
bigram = [[('load', 'superior'),
           ('point', 'medium'), ('color', 'white'),
           ('the load', 'tower')]]
fh = FeatureHasher(input_type='string')
X = fh.transform(((' '.join(x) for x in sample)
for sample in bigram))
print(X)
the output is a sparse matrix:
(0, 226456) -1.0
(0, 607603) -1.0
(0, 668514) 1.0
(0, 715910) -1.0
How can I use this sparse matrix X to classify with SVC, assuming that I have 2 classes and a train and a test set?
As others have pointed out, your matrix is just a list of feature vectors for the documents in your corpus. Use these vectors as features for classification. You just need classification labels y and then you can use SVC().fit(X, y).
But... the way that you have asked this makes me think that maybe you don't have any classification labels. In this case, I think you want to be doing clustering rather than classification. You could use one of the clustering algorithms to do this. I suggest sklearn.cluster.MiniBatchKMeans to start. You can then output the top 5-10 words for each cluster and form labels from those.
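A minimal sketch of both routes (X is the hashed matrix from the question; `y`, `X_test`, and the cluster count are hypothetical placeholders):
from sklearn.svm import SVC
from sklearn.cluster import MiniBatchKMeans

# With labels (one label per row of X): train and apply an SVM classifier
clf = SVC()
clf.fit(X, y)
predictions = clf.predict(X_test)   # X_test must be hashed with the same FeatureHasher

# Without labels: cluster the documents instead
km = MiniBatchKMeans(n_clusters=2, n_init=10)
cluster_ids = km.fit_predict(X)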
I've been playing with some SVM implementations and I am wondering: what is the best way to normalize feature values to fit them into one range (from 0 to 1)?
Let's suppose I have 3 features with values in the ranges:
3 - 5
0.02 - 0.05
10 - 15
How do I convert all of those values into the range [0, 1]?
What if, during training, the highest value of feature number 1 that I encounter is 5, and after I begin to use my model on much bigger datasets I stumble upon values as high as 7? Then, in the converted range, it would exceed 1...
How do I normalize values during training to account for the possibility of "values in the wild" exceeding the highest (or lowest) values the model has seen during training? How will the model react to that, and how do I make it work properly when that happens?
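To make that concern concrete, here is a quick min-max scaling sketch with scikit-learn, using the 3-to-5 feature from above and a later out-of-range value of 7:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[3.0], [4.0], [5.0]])   # feature 1 during training: 3 to 5
X_new = np.array([[7.0]])                   # a value encountered later "in the wild"

scaler = MinMaxScaler().fit(X_train)        # learns min=3, max=5
print(scaler.transform(X_train).ravel())    # [0.  0.5 1. ]
print(scaler.transform(X_new).ravel())      # [2.] -- the scaled value exceeds 1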
Besides the scaling-to-unit-length method provided by Tim, standardization is the approach most often used in the machine learning field. Note that when your test data comes in, it makes more sense to use the mean and standard deviation from your training samples to do this scaling. If you have a very large amount of training data, it is safe to assume it follows a normal distribution, so the chance that new test data falls far out of range won't be that high. Refer to this post for more details.
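For instance, with scikit-learn's StandardScaler (a sketch using the three example feature ranges from the question):
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[3.0, 0.02, 10.0],
                    [5.0, 0.05, 15.0],
                    [4.0, 0.03, 12.0]])
X_test = np.array([[7.0, 0.04, 14.0]])   # first feature lies outside the training range

scaler = StandardScaler().fit(X_train)   # mean and std are estimated on the training data only
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)    # reuse the training mean/std; out-of-range values are allowed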
You normalise a vector by converting it to a unit vector. This trains the SVM on the relative values of the features, not the magnitudes. The normalisation algorithm will work on vectors with any values.
To convert to a unit vector, divide each value by the length of the vector. For example, a vector of [4 0.02 12] has a length of 12.6491. The normalised vector is then [4/12.6491 0.02/12.6491 12/12.6491] = [0.316 0.0016 0.949].
If "in the wild" we encounter a vector of [400 2 1200], it will normalise to the same unit vector as above. The magnitudes of the features are "cancelled out" by the normalisation, and we are left with relative values between 0 and 1.
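A quick numpy check of that arithmetic:
import numpy as np

v = np.array([4.0, 0.02, 12.0])
w = np.array([400.0, 2.0, 1200.0])   # the "in the wild" vector, 100x larger

print(v / np.linalg.norm(v))   # roughly [0.316 0.0016 0.949]
print(w / np.linalg.norm(w))   # same direction, so the same unit vector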
I previously asked for an explanation of linearly separable data. Still reading Mitchell's Machine Learning book, I have some trouble understanding why exactly the perceptron rule only works for linearly separable data.
Mitchell defines a perceptron as follows:
That is, the output y is 1 if the sum of the weighted inputs exceeds some threshold, and -1 otherwise.
Now, the problem is to determine a weight vector that causes the perceptron to produce the correct output (1 or -1) for each of the given training examples. One way of achieving this is through the perceptron rule:
One way to learn an acceptable weight vector is to begin with random weights, then iteratively apply the perceptron to each training example, modifying the perceptron weights whenever it misclassifies an example. This process is repeated, iterating through the training examples as many times as needed until the perceptron classifies all training examples correctly. Weights are modified at each step according to the perceptron training rule, which revises the weight wi associated with input xi according to the rule:
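(For reference, the perceptron training rule given by Mitchell is w_i <- w_i + delta_w_i, with delta_w_i = eta * (t - o) * x_i, where t is the target output, o is the perceptron's output, and eta is a small positive learning rate.)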
So, my question is: Why does this only work with linearly separable data? Thanks.
Because the dot product of w and x is a linear combination of the x's, you are, in fact, splitting your data into 2 classes with the hyperplane a_1 x_1 + … + a_n x_n = 0 (one class is the side where a_1 x_1 + … + a_n x_n > 0).
Consider a 2D example: X = (x, y) and W = (a, b), so X * W = a*x + b*y. sgn returns 1 if its argument is greater than 0; that is, for class #1 you have a*x + b*y > 0, which is equivalent to y > -(a/b)*x (assuming b > 0; the inequality flips for b < 0). That boundary is a straight line, and it divides the 2D plane into 2 parts.
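To see the rule in action, here is a minimal sketch of the perceptron rule on a toy linearly separable 2D problem (the data and learning rate are arbitrary):
import numpy as np

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
t = np.array([1, 1, -1, -1])   # targets; the classes are separable by the line x + y = 0

w = np.zeros(2)                # weights
b = 0.0                        # bias (threshold) term
eta = 0.1                      # learning rate

for _ in range(100):           # repeat passes until every example is classified correctly
    errors = 0
    for x_i, t_i in zip(X, t):
        o = 1 if w @ x_i + b > 0 else -1
        if o != t_i:           # update weights only on a misclassification
            w += eta * (t_i - o) * x_i
            b += eta * (t_i - o)
            errors += 1
    if errors == 0:
        break

print(w, b)                    # parameters of a separating hyperplane w.x + b = 0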