vectorize FOR loop in octave - vectorization

I'm trying to get rid of the FOR loop and vectorize this if possible. The numerical data in the variable data1 will not be sequential numbers but random numerical data.
data1 = [1,2,3]; % test data; the real data will be random, not sequential numbers
array_joined = [];
for ii = 0:2
  % append every element of data1 paired with the current loop index ii
  array_joined = [array_joined; data1(:), repmat(ii, [1, length(data1)])(:)];
endfor
array_joined
Result:
1 0
2 0
3 0
1 1
2 1
3 1
1 2
2 2
3 2
I'm using Octave 4.4, which is similar to MATLAB.

repmat can be applied to both data1 and the iteration variable ii as follows:
data1 = [1,2,3]; ii = 0:2; %inputs
array_joined = [repmat(data1.',numel(ii),1) repmat(ii,numel(data1),1)(:)];
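For anyone doing the same thing in Python, here is a rough NumPy equivalent of the repmat answer (an assumption that NumPy is available): np.tile stacks the whole of data1 once per index, like repmat(data1.',numel(ii),1), and np.repeat repeats each index numel(data1) times, like repmat(ii,numel(data1),1)(:).

```python
import numpy as np

data1 = np.array([1, 2, 3])  # stand-in test data; the real data may be random
ii = np.arange(3)            # the former loop indices 0, 1, 2

# Column 1: data1 repeated as a block once per index -> 1 2 3 1 2 3 1 2 3
col_data = np.tile(data1, len(ii))
# Column 2: each index repeated len(data1) times -> 0 0 0 1 1 1 2 2 2
col_idx = np.repeat(ii, len(data1))

array_joined = np.column_stack((col_data, col_idx))
print(array_joined)
```

Neither column depends on the values in data1, only on its length, so random data works the same way.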

How to calculate multiclass overall accuracy, sensitivity and specificity?

Can anyone explain how to calculate the accuracy, sensitivity and specificity of a multi-class dataset?
The sensitivity of each class is TP/(TP+FN), and its specificity is TN/(TN+FP).
For more information about the concepts and equations, see
http://en.wikipedia.org/wiki/Sensitivity_and_specificity
For multi-class classification, you can use a one-against-all approach.
Suppose there are three classes: C1, C2, and C3.
"TP of C1" is all C1 instances that are classified as C1.
"TN of C1" is all non-C1 instances that are not classified as C1.
"FP of C1" is all non-C1 instances that are classified as C1.
"FN of C1" is all C1 instances that are not classified as C1.
To find these four terms of C2 or C3 you can replace C1 with C2 or C3.
In simple terms:
In a 2x2, once you have picked one category as positive, the other is automatically negative. With 9 categories, you basically have 9 different sensitivities, depending on which of the nine categories you pick as "positive". You could calculate these by collapsing to a 2x2, i.e. Class1 versus not-Class1, then Class2 versus not-Class2, and so on.
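The one-against-all reduction can be sketched directly from lists of true and predicted labels. This is a minimal pure-Python illustration; the function names and the toy labels are hypothetical, chosen only to mirror the C1/C2/C3 example.

```python
def one_vs_all_counts(y_true, y_pred, cls):
    """Return (TP, TN, FP, FN) with `cls` as the positive class and all others negative."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == cls and p == cls:
            tp += 1      # cls instance classified as cls
        elif t == cls:
            fn += 1      # cls instance classified as something else
        elif p == cls:
            fp += 1      # non-cls instance classified as cls
        else:
            tn += 1      # non-cls instance not classified as cls
    return tp, tn, fp, fn


def sensitivity_specificity(y_true, y_pred):
    """Per-class (sensitivity, specificity) via the one-against-all reduction."""
    result = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp, tn, fp, fn = one_vs_all_counts(y_true, y_pred, cls)
        sens = tp / (tp + fn) if tp + fn else float("nan")
        spec = tn / (tn + fp) if tn + fp else float("nan")
        result[cls] = (sens, spec)
    return result


# Hypothetical toy labels for the three classes C1, C2, C3
y_true = ["C1", "C1", "C2", "C2", "C3", "C3"]
y_pred = ["C1", "C2", "C2", "C2", "C3", "C1"]
for cls, (sens, spec) in sensitivity_specificity(y_true, y_pred).items():
    print(cls, "sensitivity", round(sens, 3), "specificity", round(spec, 3))
```

For C1 this gives TP=1, FN=1, FP=1, TN=3, so sensitivity 0.5 and specificity 0.75; the other classes follow by swapping in C2 and C3 as the positive class.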
Example:
For the 7 types of glass, we get this confusion matrix:
=== Confusion Matrix ===
  a  b  c  d  e  f  g   <-- classified as
 50 15  3  0  0  1  1 |  a = build wind float
 16 47  6  0  2  3  2 |  b = build wind non-float
  5  5  6  0  0  1  0 |  c = vehic wind float
  0  0  0  0  0  0  0 |  d = vehic wind non-float
  0  2  0  0 10  0  1 |  e = containers
  1  1  0  0  0  7  0 |  f = tableware
  3  2  0  0  0  1 23 |  g = headlamps
and the true positive rate (sensitivity) calculated for each type of glass, plus an overall weighted average:
=== Detailed Accuracy By Class ===
TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
0.714    0.174    0.667      0.714   0.690      0.532  0.806     0.667     build wind float
0.618    0.181    0.653      0.618   0.635      0.443  0.768     0.606     build wind non-float
0.353    0.046    0.400      0.353   0.375      0.325  0.766     0.251     vehic wind float
0.000    0.000    0.000      0.000   0.000      0.000  ?         ?         vehic wind non-float
0.769    0.010    0.833      0.769   0.800      0.788  0.872     0.575     containers
0.778    0.029    0.538      0.778   0.636      0.629  0.930     0.527     tableware
0.793    0.022    0.852      0.793   0.821      0.795  0.869     0.738     headlamps
0.668    0.130    0.670      0.668   0.668      0.539  0.807     0.611     Weighted Avg.
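As a check on how these numbers come out of the confusion matrix, the sketch below recomputes the TP Rate and FP Rate columns by treating each glass type in turn as positive (rows are actual classes, columns are predicted classes, as in the Weka output). Specificity is 1 minus the FP rate. The helper names are illustrative, not part of Weka.

```python
# Confusion matrix copied from the Weka output above:
# rows = actual class, columns = predicted class ("<-- classified as")
labels = ["build wind float", "build wind non-float", "vehic wind float",
          "vehic wind non-float", "containers", "tableware", "headlamps"]
cm = [
    [50, 15, 3, 0, 0, 1, 1],
    [16, 47, 6, 0, 2, 3, 2],
    [5, 5, 6, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 2, 0, 0, 10, 0, 1],
    [1, 1, 0, 0, 0, 7, 0],
    [3, 2, 0, 0, 0, 1, 23],
]


def per_class_rates(cm):
    """Return (TP rate, FP rate, specificity) per class, one class against all others."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    rates = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # actual k, predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, actually another class
        tn = total - tp - fn - fp
        # an empty class gets 0.0, matching the 0.000 row for vehic wind non-float
        tp_rate = tp / (tp + fn) if tp + fn else 0.0
        fp_rate = fp / (fp + tn) if fp + tn else 0.0
        rates.append((tp_rate, fp_rate, 1.0 - fp_rate))
    return rates


def overall_accuracy(cm):
    """Correctly classified instances (the diagonal) divided by all instances."""
    return sum(cm[k][k] for k in range(len(cm))) / sum(sum(row) for row in cm)


for name, (tpr, fpr, spec) in zip(labels, per_class_rates(cm)):
    print(f"{name:22s} TP rate {tpr:.3f}  FP rate {fpr:.3f}  specificity {spec:.3f}")
print(f"overall accuracy {overall_accuracy(cm):.3f}")
```

The overall accuracy is 143/214 ≈ 0.668, the same as the weighted-average TP rate. That is no coincidence: weighting each class's recall TP_k/n_k by its share n_k/N of the instances sums to (total TP)/N.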
You can print a classification report (see the link below); it includes the overall accuracy of your model.
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report
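A minimal sketch of that approach, assuming scikit-learn is installed; the toy labels are hypothetical. In the report, the per-class recall is the per-class sensitivity, and accuracy_score gives the overall accuracy directly.

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical toy labels for a three-class problem
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Per-class precision, recall (= sensitivity), F1 and support
print(classification_report(y_true, y_pred))

# Overall accuracy: fraction of samples whose predicted class is correct
acc = accuracy_score(y_true, y_pred)
print("overall accuracy:", round(acc, 3))
```

Four of the six predictions match, so the accuracy here is 4/6.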
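To compute sensitivity and specificity for multi-class classification, one class against all others at a time:
import numpy as np
import pandas as pd
from sklearn.metrics import precision_recall_fscore_support
res = []
for l in [0, 1, 2, 3]:  # y_true / y_prediction: integer class labels (assumed defined)
    # one-vs-all: class l becomes True, every other class becomes False
    prec, recall, _, _ = precision_recall_fscore_support(
        np.array(y_true) == l, np.array(y_prediction) == l,
        labels=[False, True], average=None)
    # recall[1] = recall of True = sensitivity; recall[0] = recall of False = TN/(TN+FP) = specificity
    res.append([l, recall[1], recall[0]])
pd.DataFrame(res, columns=['class', 'sensitivity', 'specificity'])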

Returning a list of nearest neighbors from KNN

I am trying to use a KNN model to show the brands most closely related to brand X. I have read in my data and transposed it so that the format is like this:
        User1 User2 User3 User4 User5
Brand1  1     0     0     0     1
Brand2  0     0     0     1     1
Brand3  0     0     1     1     1
Brand4  1     1     1     0     1
Brand5  0     0     0     1     1
I've then defined my model:
from sklearn.neighbors import NearestNeighbors
model_knn = NearestNeighbors(metric='cosine', algorithm='brute')
model_knn.fit(df_mini)
Then I use the following code to list the 5 brands nearest to a randomly selected brand:
query_index = np.random.choice(df_mini.shape[0])
distances, indices = model_knn.kneighbors(df_mini.iloc[query_index, :].values.reshape(1, -1), n_neighbors = 6)
for i in range(0, len(distances.flatten())):
    if i == 0:
        print ('Recommendations for {0}:\n'.format(df_mini.index[query_index]))
    else:
        print ('{0}: {1}, with distance of {2}:'.format(i, df_mini.index[indices.flatten()[i]], distances.flatten()[i]))
Returning sample results like this:
Recommendations for BRAND_X:
1: BRAND_a, with distance of 1.0:
2: BRAND_b, with distance of 1.0:
3: BRAND_c, with distance of 1.0:
4: BRAND_d, with distance of 1.0:
5: BRAND_e, with distance of 1.0:
All my results show every brand at a distance of 1.0. Where have I gone wrong in my code for this to be the case? I have tried increasing the size of the sample data and the result stays the same, which makes me think it's a code error rather than a data quirk.
EDIT: Here is a fuller sample of my code:
import numpy as np
import pandas as pd
df = pd.read_csv('sample.csv')
print(df.head())
df_mini = df[:5000]                        # first 5000 users
df_mini = df_mini.transpose()              # rows = brands, columns = users
df_mini = df_mini.drop('UserID', axis=0)   # drop the row of user IDs left by the transpose
from sklearn.neighbors import NearestNeighbors
model_knn = NearestNeighbors(metric='cosine', algorithm='brute')
model_knn.fit(df_mini)
query_index = np.random.choice(df_mini.shape[0])
distances, indices = model_knn.kneighbors(df_mini.iloc[query_index, :].values.reshape(1, -1), n_neighbors = 6)
for i in range(0, len(distances.flatten())):
    if i == 0:
        print ('Recommendations for {0}:\n'.format(df_mini.index[query_index]))
    else:
        print ('{0}: {1}, with distance of {2}:'.format(i, df_mini.index[indices.flatten()[i]], distances.flatten()[i]))
Sample data file:
https://drive.google.com/open?id=19KRJDGrsLNpDD0WNAz4be76O66fGmQtJ
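One thing worth checking is what the cosine metric actually measures: cosine distance is 1 − cos θ between two rows, so for non-negative 0/1 data a distance of exactly 1.0 means the two brands have no users in common. The sketch below recomputes pairwise cosine distances with NumPy on the toy table from the question (assuming rows = brands and columns = users, as shown), then counts brand pairs with no shared users and brands with no users at all. In this sketch an all-zero row is given zero similarity, so it sits at distance 1.0 from every brand. If many rows of the real df_mini are empty or nearly empty after taking df[:5000], that could plausibly explain the uniform 1.0 distances, but the question does not confirm this.

```python
import numpy as np
import pandas as pd

# Toy brand-by-user matrix copied from the table above (rows = brands, columns = users)
df_mini = pd.DataFrame(
    [[1, 0, 0, 0, 1],
     [0, 0, 0, 1, 1],
     [0, 0, 1, 1, 1],
     [1, 1, 1, 0, 1],
     [0, 0, 0, 1, 1]],
    index=["Brand1", "Brand2", "Brand3", "Brand4", "Brand5"],
    columns=["User1", "User2", "User3", "User4", "User5"],
)


def cosine_distance_matrix(X):
    """Pairwise cosine distance 1 - (x . y) / (|x| |y|) between the rows of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0   # all-zero rows: similarity 0, hence distance 1.0 to every row
    unit = X / norms
    return 1.0 - unit @ unit.T


D = cosine_distance_matrix(df_mini.values)
print(pd.DataFrame(D, index=df_mini.index, columns=df_mini.index).round(3))

# For 0/1 data, entry (i, j) counts the users that brands i and j share
shared_users = df_mini.values @ df_mini.values.T
print("brand pairs with no shared users:", int((shared_users == 0).sum()))

# Brands with no users at all end up at distance 1.0 from everything
empty = df_mini.index[~df_mini.values.any(axis=1)]
print("brands with no users:", list(empty))
```

On this toy table every brand shares User5, so no distance reaches 1.0 (for example Brand1 to Brand2 is 0.5). Running the last two checks on the real df_mini would show whether the data itself is the cause.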

Torch: Concatenating tensors of different dimensions

I have an x_at_i = torch.Tensor(1,i) that grows at every iteration, where i = 0 to n. I would like to concatenate all these tensors of different sizes into a matrix and fill the remaining cells with zeroes. What is the most idiomatic way to do this? For example:
x_at_1 = 1
x_at_2 = 1 2
x_at_3 = 1 2 3
x_at_4 = 1 2 3 4
X = torch.cat(x_at_1, x_at_2, x_at_3, x_at_4)
X = [ 1 0 0 0
1 2 0 0
1 2 3 0
1 2 3 4 ]
If you know n, and assuming you can easily access your x_at_i at each iteration, I would try something like:
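-- assumes x_at is a Lua table where x_at[i] holds x_at_i (i elements)
X = torch.Tensor(n, n):zero()
for i = 1, n do
  -- copy x_at[i] into the first i cells of row i; the remaining cells stay zero
  X[i]:narrow(1, 1, i):copy(x_at[i])
end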
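The same preallocate-and-copy idea, sketched in NumPy for comparison (pad_rows is a hypothetical helper name): allocate a zero matrix with one row per input and as many columns as the longest input, then copy each input into the leftmost cells of its row.

```python
import numpy as np


def pad_rows(rows):
    """Stack 1-D arrays of different lengths into a matrix, zero-filling the rest."""
    width = max(len(r) for r in rows)
    X = np.zeros((len(rows), width), dtype=np.result_type(*rows))
    for i, r in enumerate(rows):
        X[i, :len(r)] = r   # copy row i into its leftmost len(r) cells
    return X


# x_at_1 .. x_at_4 from the question: [1], [1 2], [1 2 3], [1 2 3 4]
x_at = [np.arange(1, i + 1) for i in range(1, 5)]
print(pad_rows(x_at))
```

In PyTorch itself, torch.nn.utils.rnn.pad_sequence(list_of_tensors, batch_first=True) produces the same zero-padded layout from a list of 1-D tensors.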

Is there a way to use Machine Learning to classify discrete and infinite scale data?

The data looks like this:
x y
7773 0
9805 4
7145 0
7645 1
2529 1
4814 2
6027 2
7499 2
3367 1
8861 5
9776 2
8009 5
3844 2
1218 2
1120 1
4553 0
3017 1
2582 2
1691 2
5342 0
...
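The real function f(x) returns the circle count of a decimal integer (the number of closed loops in its digits):
# digit:  0  1  2  3  4  5  6  7  8  9
_f_map = [1, 0, 0, 0, 0, 0, 1, 0, 2, 1]
def f(x):
    x = int(x)
    assert x >= 0
    if x == 0:
        return 1
    r = 0
    while x:
        r += _f_map[x % 10]  # loops contributed by the last digit
        x //= 10             # integer division; `x /= 10` gives a float on Python 3
    return r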
The training and test data can be generated randomly:
import random
data = []
target = []
for i in range(3000):
    x = random.randint(0, 999999)  # hardcoded scale
    data.append([x])
    target.append(f(x))
The real function is discrete and its input scale is unbounded (infinite).
Is there a way, or a model, that can classify this data?
I tried an SVM (Support Vector Machine) and got a 20% accuracy rate.
This looks like a typical use case for sequential models. You can train an LSTM or another recurrent neural network to do this by treating each number as a sequence of digits fed to the network. The network then only has to learn a sum and a simple per-digit mapping (your _f_map).

Torch tensors swapping dimensions

I came across these two lines (back-to-back) of code in a torch project:
im4[{1,{},{}}] = im3[{3,{},{}}]
im4[{3,{},{}}] = im3[{1,{},{}}]
What do these two lines do? I assumed they did some sort of swapping.
This is covered in the indexing section of the Torch Tensor documentation.
Indexing using the empty table {} is shorthand for all indices in that dimension. Below is a demo which uses {} to copy an entire row from one matrix to another:
> a = torch.Tensor(3, 3):fill(0)
0 0 0
0 0 0
0 0 0
> b = torch.Tensor(3, 3)
> for i=1,3 do for j=1,3 do b[i][j] = (i - 1) * 3 + j end end
> b
1 2 3
4 5 6
7 8 9
> a[{1, {}}] = b[{3, {}}]
> a
7 8 9
0 0 0
0 0 0
This assignment is equivalent to: a[1] = b[3].
Your example is similar:
im4[{1,{},{}}] = im3[{3,{},{}}]
im4[{3,{},{}}] = im3[{1,{},{}}]
which is more clearly stated as:
im4[1] = im3[3]
im4[3] = im3[1]
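The first line copies im3's third slice along the first dimension (a 2-D sub-matrix, e.g. the third channel of a 3×H×W image) into im4's first slice, and the second line copies the first slice of im3 into the third slice of im4. If im4's second slice also holds im3's second slice, the net effect is im3 with its first and third channels exchanged (e.g. RGB to BGR).
Note that this is not an in-place swap: im3 is never written to and im4 is never read from.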
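The same two assignments can be mirrored in NumPy to see their effect. This is a sketch under stated assumptions: a hypothetical 3×2×2 channel-first array, and im4 starting as a copy of im3 so that its middle channel is already filled. Torch indices are 1-based, so im4[1] = im3[3] becomes im4[0] = im3[2] here.

```python
import numpy as np

# Hypothetical 3 x 2 x 2 "image" in channel-first (C x H x W) layout
im3 = np.arange(12).reshape(3, 2, 2)
im4 = im3.copy()   # assumption: im4 starts as a copy, so its channel 2 already matches im3

# Torch (1-based): im4[1] = im3[3]; im4[3] = im3[1]
im4[0] = im3[2]
im4[2] = im3[0]

print(im3[:, 0, 0])   # first pixel's channel values in im3
print(im4[:, 0, 0])   # the same pixel in im4, first and third channels exchanged
```

Under that assumption im4 ends up equal to im3 with its channel order reversed, while im3 itself is untouched.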
