Why is this a rank 2 tensor?

tensor3 = tf.Variable([["hi", "hello","yo"],["hi","hello","hi"],["yo","whats","up"]], tf.string)
My understanding is that this should be a rank 3 tensor, but it turns out to be a rank 2 tensor. I'm new to machine learning, so I'm not sure if I'm missing something here.

A tensor's rank is the number of its dimensions, not the maximal size along a dimension.
C_ijkl would be a rank 4 tensor (see e.g. the tensor formulation of Hooke's law). Yours has only 2 dimensions: a row index and a column index are enough to select any element. You must be confusing it with matrix rank. Straight from the TF documentation:
Note: The rank of a tensor is not the same as the rank of a matrix.
The rank of a tensor is the number of indices required to uniquely
select each element of the tensor. Rank is also known as "order",
"degree", or "ndims."

Related

generate unique row index in a 2D tensor as an output 1D tensor with PyTorch

When implementing targets for in-batch multi-class classification in PyTorch (version 1.6), I have the following problem.
I have a variable D of type <class 'torch.Tensor'> (related to label descriptions) with size torch.Size([16, 128]), i.e. [data_size, token_id_size].
The original idea was to generate a target tensor of torch.Size([16]), each value unique and corresponding to a row in D, running from 0 to 15 as [0,1,2,...,15], for in-batch multi-class classification.
This can be done using target = torch.LongTensor(torch.arange(16))
But there may be repeated, non-unique rows in D, so I would like each distinct row in D to have its own unique index in target. For example, if rows 0, 1 and 8 of D have the same token_ids (vector) and all other rows differ from each other, then target should be [0,0,2,3,4,5,6,0,8,9,10,11,12,13,14,15] or [0,0,1,2,3,4,5,0,6,7,8,9,10,11,12,13], where the former still uses indices from 0-15 (but skips 1 and 7) and the latter uses all indices in 0-13.
How can I implement this?
See the answers to the simplified questions (i) generate 1D tensor as unique index of rows of an 2D tensor and (ii) generate 1D tensor as unique index of rows of an 2D tensor (keeping the order and the original index), which address the problem of this question.
But these do not seem useful for improving the contrastive multi-class classification.
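For reference, a minimal sketch of one way to build both target formats with torch.unique (available in PyTorch 1.6); the toy D below is a stand-in for the real [16, 128] tensor:
import torch

D = torch.tensor([[1, 2], [1, 2], [3, 4], [1, 2]])  # toy stand-in for D

# return_inverse maps every row of D to the index of its unique row; this is
# the second target format, except the unique rows come out in sorted order
# rather than in order of first appearance.
_, inverse = torch.unique(D, dim=0, return_inverse=True)
print(inverse)  # tensor([0, 0, 1, 0])

# first target format: every duplicate keeps the index of its first occurrence
seen = {}
target = torch.tensor([seen.setdefault(idx, i)
                       for i, idx in enumerate(inverse.tolist())])
print(target)  # tensor([0, 0, 2, 0])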

Tensorly and tensor rank (CP rank)

I am trying to compute the tensor rank aka CP rank (https://en.wikipedia.org/wiki/Tensor_rank_decomposition#Tensor_rank) for a specific sparse tensor that is 8 x 8 x 8.
I am new to Tensorly and have only just installed it. After reading the documentation on the parafac function (http://tensorly.org/stable/modules/generated/tensorly.decomposition.parafac.html), it seems I need to specify a particular tensor rank in order to find a tensor rank decomposition of that particular rank. How can one compute the tensor rank using this software? Is there perhaps a different function which yields the tensor rank when given a specific tensor?
Determining the rank of a tensor is, in general, NP-hard. Typically, the CP decomposition takes a tensor and the desired rank as input.
If you use the latest version of TensorLy from Github, you can set rank='same' or any floating value between 0 and 1 to set the rank so as to keep either the same number of parameters as the original tensor, or a fraction of the parameters.
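Since the exact rank is NP-hard to determine, a common workaround is to sweep candidate ranks and watch the reconstruction error. A minimal sketch along those lines, assuming a recent TensorLy where parafac returns a (weights, factors) pair and cp_to_tensor rebuilds the full tensor (the random T is only a stand-in for the real 8 x 8 x 8 tensor):
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

T = tl.tensor(np.random.rand(8, 8, 8))  # stand-in for the sparse 8 x 8 x 8 tensor

# fit a CP model at each candidate rank; the smallest rank whose relative
# reconstruction error drops to (near) zero is an estimate of the CP rank
for rank in range(1, 9):
    cp = parafac(T, rank=rank)
    rel_err = tl.norm(T - tl.cp_to_tensor(cp)) / tl.norm(T)
    print(rank, rel_err)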

Deep Learning Log Likelihood

I am a newbie in the Deep Learning field, and I am using the log-likelihood method to compare MSE metrics. Could anyone show how to calculate the following 2 predicted output examples with 3 output neurons each? Thanks
yt = [[1, 0, 0], [0, 0, 1]]
yp = [[0.9, 0.2, 0.2], [0.2, 0.8, 0.3]]
MSE, or Mean Squared Error, is simply the expected value of the squared difference between the predicted and the ground-truth labels, represented as
\text{MSE}(\hat{\theta}) = E\left[(\hat{\theta} - \theta)^2\right]
where \theta denotes the ground-truth labels and \hat{\theta} the predicted labels.
I am not sure what you are referring to exactly, whether a theoretical question or a part of the code.
As a Python implementation:
import numpy as np

def mean_squared_error(A, B):
    # element-wise squared differences, averaged over every entry
    return np.square(np.subtract(A, B)).mean()

yt = [[1, 0, 0], [0, 0, 1]]
yp = [[0.9, 0.2, 0.2], [0.2, 0.8, 0.3]]
mse = mean_squared_error(yt, yp)
print(mse)
This will give a value of 0.21
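Worked out by hand: the squared errors are 0.01, 0.04, 0.04 for the first example and 0.04, 0.64, 0.49 for the second, so the mean is (0.09 + 1.17) / 6 = 1.26 / 6 = 0.21.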
If you are using one of the DL frameworks like TensorFlow, they already provide a function which calculates the MSE loss between tensors:
tf.losses.mean_squared_error
where
tf.losses.mean_squared_error(
    labels,
    predictions,
    weights=1.0,
    scope=None,
    loss_collection=tf.GraphKeys.LOSSES,
    reduction=Reduction.SUM_BY_NONZERO_WEIGHTS
)
Args:
labels: The ground truth output tensor, same dimensions as 'predictions'.
predictions: The predicted outputs.
weights: Optional Tensor whose rank is either 0, or the same rank as labels, and must be broadcastable to labels (i.e., all dimensions
must be either 1, or the same as the corresponding losses dimension).
scope: The scope for the operations performed in computing the loss.
loss_collection: collection to which the loss will be added.
reduction: Type of reduction to apply to loss.
Returns:
Weighted loss float Tensor. If reduction is NONE, this has the same
shape as labels; otherwise, it is scalar.
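A minimal usage sketch, assuming the TF 1.x API quoted above (graph mode with a Session):
import tensorflow as tf

yt = tf.constant([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
yp = tf.constant([[0.9, 0.2, 0.2], [0.2, 0.8, 0.3]])

loss = tf.losses.mean_squared_error(labels=yt, predictions=yp)
with tf.Session() as sess:
    print(sess.run(loss))  # ~0.21, matching the NumPy result above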

Image classification with Sift features and Knn?

Can you help me with image classification using SIFT features?
I want to classify images based on SIFT features:
Given a training set of images, extract SIFT descriptors from them.
Compute K-Means over the entire set of SIFTs extracted from the training set. The "K" parameter (the number of clusters) depends on the number of SIFTs that you have for training, but is usually around 500->8000 (the higher, the better).
Now you have obtained K cluster centers.
You can compute the descriptor of an image by assigning each SIFT of the image to one of the K clusters. In this way you obtain a histogram of length K (see the sketch after this list).
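A minimal sketch of that pipeline in Python, assuming OpenCV with SIFT support (cv2.SIFT_create) plus scikit-learn for the clustering; train_images is a hypothetical list of grayscale images, not something from the question:
import cv2
import numpy as np
from sklearn.cluster import KMeans

K = 500  # number of visual words, one value from the suggested 500->8000 range
sift = cv2.SIFT_create()

def bow_histogram(descriptors, kmeans, k):
    # assign each SIFT descriptor to its nearest cluster center,
    # then count the assignments -> a histogram of length k
    words = kmeans.predict(descriptors)
    return np.bincount(words, minlength=k).astype(float)

# extract SIFT descriptors (an N x 128 array per image); train_images is assumed
descs = [sift.detectAndCompute(img, None)[1] for img in train_images]
descs = [d for d in descs if d is not None]

kmeans = KMeans(n_clusters=K).fit(np.vstack(descs))  # the K cluster centers
trainingset = np.array([bow_histogram(d, kmeans, K) for d in descs])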
I have 130 images in the training set, so my training set is 130 x K dimensional.
I want to classify my test images. I have 1 image, so my sample is 1 x K dimensional. I wrote this code: knnclassify(sample, trainingset, group).
I want to classify into 7 groups. So: knnclassify(sample (1x10), trainingset (130x10), group (7x1))
The error is: The length of GROUP must equal the number of rows in TRAINING. What can I do?
Straight from the docs:
CLASS = knnclassify(SAMPLE,TRAINING,GROUP) classifies each row of the
data in SAMPLE into one of the groups in TRAINING using the nearest-
neighbor method. SAMPLE and TRAINING must be matrices with the same
number of columns. GROUP is a grouping variable for TRAINING. Its
unique values define groups, and each element defines the group to
which the corresponding row of TRAINING belongs. GROUP can be a
numeric vector, a string array, or a cell array of strings. TRAINING
and GROUP must have the same number of rows.
What this means is that group should be 130x1, and should indicate which group each of the training samples belongs to. unique(group) should return 7 values in your case - the seven categories represented in your training set.
If you don't already have a group vector which specifies which categories which image falls into, you could use kmeans to split your training set into 7 groups:
group = kmeans(trainingset,7);
knnclassify(sample, trainingset, group);
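For comparison, the same shape requirement in Python, as a hedged sketch with scikit-learn (the random data here is purely illustrative):
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

K = 10
trainingset = np.random.rand(130, K)        # one histogram per training image
group = np.random.randint(0, 7, size=130)   # one label per row: 130x1, not 7x1
sample = np.random.rand(1, K)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(trainingset, group)
print(knn.predict(sample))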

Genetic algorithms: fitness function for feature selection algorithm

I have a data set n x m where there are n observations and each observation consists of m values for m attributes. Each observation also has an observed result assigned to it. m is big, too big for my task. I am trying to find the best and smallest subset of the m attributes that still represents the whole dataset quite well, so that I could use only these attributes for training a neural network.
I want to use a genetic algorithm for this. The problem is the fitness function. It should tell how well the generated model (subset of attributes) still reflects the original data. And I don't know how to evaluate a certain subset of attributes against the whole set.
Of course I could use the neural network (that will later use this selected data anyway) to check how good the subset is - the smaller the error, the better the subset. BUT, this takes a lot of time in my case and I do not want to use this solution. I am looking for some other way that would preferably operate only on the data set.
What I thought about was: having a subset S (found by the genetic algorithm), trim the data set so that it contains values only for subset S, and check how many observations in this data set are no longer distinguishable (have the same values for the same attributes) while having different result values. The bigger that number, the worse the subset. But this seems a bit too computationally exhausting to me.
Are there any other ways to evaluate how well a subset of attributes still represents the whole data set?
This cost function should do what you want: sum the factor loadings that correspond to the features comprising each subset.
The higher that sum, the greater the share of variability in the response variable that is explained by just those features. If I understand the OP, this cost function is a faithful translation of "represents the whole set quite well" from the OP.
Reducing to code is straightforward:
Calculate the correlation matrix of your dataset (first remove the column that holds the response variable, i.e., probably the last one; the code below uses NP.corrcoef, which is why there are "1"s down the main diagonal). If your dataset has m feature columns, this matrix will be m x m.
Next, perform an eigenvalue decomposition on this matrix; this will give you, for each eigenvalue, its proportion of the total variability (each eigenvalue corresponds to a feature, or column). [Note, singular-value decomposition (SVD) is often used for this step, but it's unnecessary - an eigenvalue decomposition is much simpler, and always does the job as long as your matrix is square, which correlation matrices always are.]
Your genetic algorithm will, at each iteration, return a set of candidate solutions (feature subsets, in your case). The next task in a GA, or any combinatorial optimization, is to rank those candidate solutions by their cost function score. In your case, the cost function is a simple summation of the eigenvalue proportions for each feature in that subset. (I guess you would want to scale/normalize that calculation so that the higher numbers are the least fit, though.)
A sample calculation (using python + NumPy):
>>> # there are many ways to do an eigenvalue decomp, this is just one way
>>> import numpy as NP
>>> import numpy.linalg as LA
>>> # correlation matrix of the data set (d3 is the data with the response
>>> # variable column removed; its definition is not shown here)
>>> C = NP.corrcoef(d3, rowvar=0)
>>> C.shape
(4, 4)
>>> C
array([[ 1. , -0.11, 0.87, 0.82],
[-0.11, 1. , -0.42, -0.36],
[ 0.87, -0.42, 1. , 0.96],
[ 0.82, -0.36, 0.96, 1. ]])
>>> # now calculate eigenvalues & eigenvectors of the correlation matrix:
>>> eva, evc = LA.eig(C)
>>> # sort the eigenvalues, highest to lowest:
>>> eva1 = NP.sort(eva)[::-1]
>>> # get the cumulative value proportion of each eigenvalue:
>>> eva2 = NP.cumsum(eva1 / NP.sum(eva1))  # "cumsum" is just cumulative sum
>>> # stack index, eigenvalue, and cumulative proportion into one table:
>>> q = NP.column_stack((NP.arange(1, eva1.size + 1), eva1, eva2))
>>> title1 = "ev value proportion"
>>> print(title1)
ev value proportion
>>> print("-" * len(title1))
-------------------
>>> for row in q:
...     print("{0:d} {1:.2f} {2:.3f}".format(int(row[0]), row[1], row[2]))
...
1 2.91 0.727
2 0.92 0.953
3 0.14 0.995
4 0.02 1.000
So it's the third column of values just above (one per feature) that is summed - selectively, depending on which features are present in a given subset you are evaluating with the cost function.
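A minimal sketch of that cost function, assuming (as the answer does) that the sorted eigenvalues pair off with features; subset_cost is this sketch's own name, and it uses per-eigenvalue proportions rather than the cumulative sums:
import numpy as np

def subset_cost(eigenvalues, feature_idx):
    # fitness of a candidate feature subset: the summed share of total
    # variability attributed to the selected features (higher = fitter)
    proportions = eigenvalues / eigenvalues.sum()
    return proportions[list(feature_idx)].sum()

eva1 = np.array([2.91, 0.92, 0.14, 0.02])  # the sorted eigenvalues from above
print(subset_cost(eva1, [0, 1]))           # ~0.96 for the two strongest features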
