I'm trying to run the segmentation model on iOS, and I have several questions about how to properly use the output tensor.
Here is the link to the model I'm using:
https://www.tensorflow.org/lite/models/segmentation/overview
When I run this model, I get an output tensor with dimensions 1 x 257 x 257 x 21.
Why is the last dimension 21? It looks like we get the class scores for each pixel. Do we need to take the argmax here to get the correct class value?
But why only 21 classes? I was expecting more. And where can I find which value corresponds to which class?
In the ImageClassification example we have a label.txt with 1001 classes.
Based on the ImageClassification example, I attempted to parse the tensor: first I transform it into a Float array of size 1 387 029 (21 x 257 x 257), and then I create an image pixel by pixel using the following code:
// size = 257
// depth = 21
// array - float array of size 1 387 029
for i in 0..<size {
    for j in 0..<size {
        var scores: [Float] = []
        for k in 0..<depth {
            let index = i * size * depth + j * depth + k
            let score = array[index]
            scores.append(score)
        }
        if let maxScore = scores.max(),
           let maxClass = scores.firstIndex(of: maxScore) {
            let index = i * size + j
            if maxClass == 0 {
                pixelBuffer[index] = .blue
            } else if maxClass == 12 {
                pixelBuffer[index] = .black
            } else {
                pixelBuffer[index] = .green
            }
        }
    }
}
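To check the indexing, here is a minimal NumPy sketch (an illustration, not part of the original question). It assumes the output is stored in row-major NHWC order, which is what the index formula i * size * depth + j * depth + k implies. Reshaping the flat array to (size, size, depth) and taking np.argmax over the last axis should then give the same class map as the triple loop; like firstIndex(of:), np.argmax returns the first index on ties. The function names are hypothetical.

```python
import numpy as np

def class_map_loop(flat, size, depth):
    # Direct port of the Swift loop: argmax over the class scores of each pixel
    out = np.zeros((size, size), dtype=np.int64)
    for i in range(size):
        for j in range(size):
            scores = [flat[i * size * depth + j * depth + k] for k in range(depth)]
            out[i, j] = scores.index(max(scores))  # first index of the max, like firstIndex(of:)
    return out

def class_map_vectorized(flat, size, depth):
    # View the flat buffer as (height, width, classes) and take the argmax over classes
    return np.argmax(np.asarray(flat).reshape(size, size, depth), axis=-1)

# Small random stand-in for the model output (the real one would be size=257, depth=21)
rng = np.random.default_rng(0)
size, depth = 8, 21
flat = rng.standard_normal(size * size * depth).astype(np.float32)
assert np.array_equal(class_map_loop(flat, size, depth), class_map_vectorized(flat, size, depth))
```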
Here is the result I get:
You can see that the quality is not really good. What have I missed?
The segmentation model for Core ML (https://developer.apple.com/machine-learning/models/) works much better on the same example:
It seems like your model was trained on PASCAL VOC data that has 21 classes for segmentation.
You can find a list of the classes here:
background
aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor
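The list above can serve as a lookup table, assuming the model's class indices follow this order (index 0 = background, 15 = person). Under that ordering, class 12, which the question's code colors black, is "dog". Here is a small hypothetical helper that names the classes present in a class map:

```python
import numpy as np

# PASCAL VOC segmentation labels in index order (0..20), as listed above
VOC_LABELS = [
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

def classes_present(class_map):
    # class_map: 2-D array of per-pixel class indices (e.g. the argmax over the 21 scores)
    return [VOC_LABELS[k] for k in np.unique(class_map)]

print(classes_present(np.array([[0, 15], [15, 12]])))  # ['background', 'dog', 'person']
```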
Adding to the answer by Shai, you can also use a tool like Netron to visualize your network and get more insight into its inputs and outputs. For example, your input is an image of size 257x257x3:
You already know your output size. For segmentation models, the last dimension is 21 because that is the number of classes your model supports, as Shai mentioned. Take the argmax over all classes for each pixel, which should give you a more decent output; there is no need to resize anything. Try something like this (in pseudo-code):
output = [rows][cols]
for i in rows:
    for j in cols:
        best_score = -infinity
        for c in classes:
            if tensor_out[i][j][c] > best_score:
                best_score = tensor_out[i][j][c]
                output[i][j] = c
Then output would be your segmented image.
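Here is a vectorized version of the pseudo-code, sketched with NumPy and assuming the raw output has shape (1, 257, 257, 21). np.argmax over the last axis replaces the inner loop. A small color palette (an arbitrary, hypothetical choice) then turns the class map into an RGB image.

```python
import numpy as np

def segment(tensor_out):
    # tensor_out: (1, H, W, num_classes) scores; drop the batch dimension and
    # pick the highest-scoring class for every pixel
    return np.argmax(tensor_out[0], axis=-1)

def colorize(class_map, num_classes=21, seed=0):
    # Hypothetical palette: one random RGB color per class, background forced to black
    rng = np.random.default_rng(seed)
    palette = rng.integers(0, 256, size=(num_classes, 3)).astype(np.uint8)
    palette[0] = 0
    return palette[class_map]  # (H, W, 3) uint8 image

# Toy scores: class 15 wins in the centre, everything else ties at 0 (argmax -> 0)
scores = np.zeros((1, 4, 4, 21), dtype=np.float32)
scores[0, 1:3, 1:3, 15] = 5.0
class_map = segment(scores)
rgb = colorize(class_map)
```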
Kaggle Dataset and code link
I'm trying to solve the above Kaggle problem, and I want to export the preprocessed data as CSV so that I can build a model in Weka. But when I try to save it as CSV, I lose a dimension. I want to retain all the information in the CSV.
Please help me with the relevant code or any resource.
Thanks
print (scaled_x)
|x |y |z |label
|1.485231 |-0.661030 |-1.194153 |0
|0.888257 |-1.370361 |-0.829636 |0
|0.691523 |-0.594794 |-0.936247 |0
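import numpy as np
from scipy import stats

Fs = 20
frame_size = Fs*4  # 80
hop_size = Fs*2    # 40

def get_frames(df, frame_size, hop_size):
    N_FEATURES = 3
    frames = []
    labels = []
    for i in range(0, len(df) - frame_size, hop_size):
        x = df['x'].values[i: i+frame_size]
        y = df['y'].values[i: i+frame_size]
        z = df['z'].values[i: i+frame_size]
        # Most frequent label in this window. On SciPy >= 1.11, stats.mode returns
        # scalars by default, so pass keepdims=True there for [0][0] to work.
        label = stats.mode(df['label'][i: i+frame_size])[0][0]
        frames.append([x, y, z])
        labels.append(label)
    # np.asarray(frames) has shape (n, N_FEATURES, frame_size); transpose (not reshape)
    # so that each of the frame_size time steps holds its own (x, y, z) triple
    frames = np.asarray(frames).transpose(0, 2, 1)
    labels = np.asarray(labels)
    return frames, labels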
x,y = get_frames(scaled_x, frame_size, hop_size)
x.shape, y.shape
((78728, 80, 3), (78728,))
According to the link you posted, the data is time-series accelerometer/gyro data sampled at 20 Hz, with a label for each sample. They want to aggregate the time series into frames (with the corresponding label being the most common label during a given frame).
So frame_size is the number of samples in a frame, and hop_size is the amount the sliding window moves forward each iteration. In other words, the frames overlap by 50% since hop_size = frame_size / 2.
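As a worked example of the window arithmetic (the sample count of 200 is made up for illustration): with Fs = 20, frame_size = 80 and hop_size = 40, the range in get_frames yields the start index of each frame, and consecutive frames share frame_size - hop_size = 40 samples.

```python
Fs = 20
frame_size = Fs * 4  # 80 samples = 4 s
hop_size = Fs * 2    # 40 samples = 2 s

n_samples = 200      # hypothetical recording length
starts = list(range(0, n_samples - frame_size, hop_size))
print(starts)  # [0, 40, 80]

overlap = frame_size - hop_size
print(overlap / frame_size)  # 0.5

# Each frame covers samples [start, start + frame_size)
print([(s, s + frame_size) for s in starts])  # [(0, 80), (40, 120), (80, 160)]
```

Because the stop value len(df) - frame_size is exclusive, a final full window starting exactly at len(df) - frame_size (here, 120 to 200) is skipped.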
Thus at the end you get a 3D array of 78728 frames of length 80, with 3 values (x, y, z) each.
EDIT: To answer your new question about how to export as CSV, you'll need to "flatten" the 3D frame array to a 2D array since that's what a CSV represents. There are multiple different ways to do this but I think the easiest may just be to concatenate the final two dimensions, so that each row is a frame, consisting of 240 values (80 samples of 3 co-ordinates each). Then concatenate the labels as the final column.
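import numpy as np
import pandas as pd

# (n_frames, 80, 3) -> (n_frames, 240): one row per frame
x_2d = np.reshape(x, (x.shape[0], -1))
# Append the labels as the last column (y must be 2-D to concatenate)
full = np.concatenate([x_2d, y.reshape(-1, 1)], axis=1)
df = pd.DataFrame(full)
df.to_csv("frames.csv", index=False)  # index=False keeps the row index out of the CSV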
If you also want proper column names:
columns = []
for i in range(1, x.shape[1] + 1):
    columns.extend([f"{i}_X", f"{i}_Y", f"{i}_Z"])
columns.append("label")
df = pd.DataFrame(full, columns=columns)
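To check that nothing is lost, here is a sketch of the round trip, using small random arrays as stand-ins for the real x (shape (n, 80, 3)) and y. It flattens the frames, writes them with the column names above, reads them back, and reshapes to 3-D. It writes to an in-memory buffer instead of frames.csv so that it can run anywhere.

```python
import io
import numpy as np
import pandas as pd

frame_size, n_features = 80, 3
rng = np.random.default_rng(0)
x = rng.standard_normal((5, frame_size, n_features))  # stand-in for the real frames
y = rng.integers(0, 6, size=5)                         # stand-in for the real labels

# Flatten each frame to one row of 240 values, then append the label column
x_2d = x.reshape(x.shape[0], -1)
full = np.concatenate([x_2d, y.reshape(-1, 1)], axis=1)
columns = [f"{i}_{axis}" for i in range(1, frame_size + 1) for axis in ("X", "Y", "Z")]
columns.append("label")

buf = io.StringIO()
pd.DataFrame(full, columns=columns).to_csv(buf, index=False)

# Read the CSV back and undo the flattening
buf.seek(0)
df = pd.read_csv(buf)
x_back = df.drop(columns="label").to_numpy().reshape(-1, frame_size, n_features)
y_back = df["label"].to_numpy().astype(int)
```

Because the labels are concatenated with float features, they are written as floats (e.g. 3.0). In Weka you may need to convert that column to a nominal attribute before using it as the class.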
I'm doing an experiment with face images in PyTorch. The input x is a face image of size 5 x 5 (height x width) with 192 channels.
Objective: to obtain patches of x of size patch_size (given as an argument).
I have obtained the required result using two for loops, but I want a better vectorized solution so that the computation cost is much lower than with two for loops.
Used: PyTorch 0.4.1, Nvidia TitanX GPU (12 GB).
The following is my implementation using two for loops:
import torch

def extractpatches(x, patch_size):  # x is bs x 192 x 5 x 5
    patches = x.unfold(2, patch_size, 1).unfold(3, patch_size, 1)
    bs, c, pi, pj, _, _ = patches.size()  # (bs, 192, pi, pj, patch_size, patch_size)
    cnt = 0
    p = torch.empty((bs, pi*pj, c, patch_size, patch_size)).to(device)
    s = torch.empty((bs, pi*pj, c*patch_size*patch_size)).to(device)
    # Want a vectorized method instead of the two for loops below
    for i in range(pi):
        for j in range(pj):
            p[:, cnt, :, :, :] = patches[:, :, i, j, :, :]
            s[:, cnt, :] = p[:, cnt, :, :, :].view(-1, c*patch_size*patch_size)
            cnt = cnt + 1
    return s
Thanks for your help in advance.
You could try the following. I used parts of your code for my experiment, and it worked for me. Here l and f are lists of patch tensors; i // pj and i % pj give the row and column of patch i in the patch grid (for a square grid like 5 x 5, pi == pj):
l = [patches[:, :, i // pj, i % pj, :, :] for i in range(pi * pj)]
f = [l[i].contiguous().view(-1, c*patch_size*patch_size) for i in range(pi * pj)]
You can verify the above code using toy input values.
Thanks.
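Here is a loop-free alternative, sketched in NumPy so that it runs without a GPU (it needs NumPy >= 1.20 for sliding_window_view). sliding_window_view produces the same (bs, c, pi, pj, p, p) layout as the two unfold calls. A single transpose plus reshape then gives the (bs, pi*pj, c*p*p) result of the double loop. In PyTorch, the analogous step would presumably be patches.permute(0, 2, 3, 1, 4, 5).reshape(bs, pi * pj, -1); that translation is an assumption worth checking against the loop on your setup. Similarly, torch.stack(f, dim=1) would turn the list f from the answer above into a single (bs, pi*pj, c*p*p) tensor. The function names below are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def extract_patches_loop(x, p):
    # Reference version with two loops, producing the same output as the question's code
    patches = sliding_window_view(x, (p, p), axis=(2, 3))  # (bs, c, pi, pj, p, p)
    bs, c, pi, pj, _, _ = patches.shape
    s = np.empty((bs, pi * pj, c * p * p), dtype=x.dtype)
    cnt = 0
    for i in range(pi):
        for j in range(pj):
            s[:, cnt, :] = patches[:, :, i, j, :, :].reshape(bs, -1)
            cnt += 1
    return s

def extract_patches_vectorized(x, p):
    patches = sliding_window_view(x, (p, p), axis=(2, 3))  # (bs, c, pi, pj, p, p)
    bs, c, pi, pj, _, _ = patches.shape
    # Move the patch-grid axes in front of the channel axis, then merge:
    # (bs, pi, pj, c, p, p) -> (bs, pi*pj, c*p*p); patch index = i * pj + j, as in the loop
    return patches.transpose(0, 2, 3, 1, 4, 5).reshape(bs, pi * pj, c * p * p)

x = np.random.default_rng(0).standard_normal((2, 192, 5, 5)).astype(np.float32)
out = extract_patches_vectorized(x, 3)
```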
I want to calculate the information gain on the 20_newsgroup dataset.
I am using the code here (a copy of the code is also included below the question).
As you can see, the inputs to the algorithm are X and y.
My confusion is that X is going to be a matrix with documents as rows and features as columns (for 20_newsgroup it is 11314 x 1000 when I only consider 1000 features).
But according to the concept of information gain, it should calculate the information gain for each feature.
(So I was expecting the code to loop through each feature, so that the input to the function would be a matrix whose rows are features and whose columns are classes.)
But the rows of X are documents, not features, and I cannot see the part of the code that takes care of this (I mean considering each document and then going through each feature of that document, i.e. looping through rows while at the same time looping through columns, since the features are stored in columns).
I have read this and this and many similar questions, but they are not clear about the shape of the input matrix.
This is the code for reading 20_newsgroup:
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

newsgroup_train = fetch_20newsgroups(subset='train')
X, y = newsgroup_train.data, newsgroup_train.target
cv = CountVectorizer(max_df=0.99, min_df=0.001, max_features=1000, stop_words='english', lowercase=True, analyzer='word')
X_vec = cv.fit_transform(X)
X_vec.shape is (11314, 1000), i.e. documents x features, not features in the rows. Am I calculating information gain incorrectly?
This is the code for Information gain:
import numpy as np

def information_gain(X, y):

    def _calIg():
        entropy_x_set = 0
        entropy_x_not_set = 0
        for c in classCnt:
            probs = classCnt[c] / float(featureTot)
            entropy_x_set = entropy_x_set - probs * np.log(probs)
            probs = (classTotCnt[c] - classCnt[c]) / float(tot - featureTot)
            entropy_x_not_set = entropy_x_not_set - probs * np.log(probs)
        for c in classTotCnt:
            if c not in classCnt:
                probs = classTotCnt[c] / float(tot - featureTot)
                entropy_x_not_set = entropy_x_not_set - probs * np.log(probs)
        return entropy_before - ((featureTot / float(tot)) * entropy_x_set
                                 + ((tot - featureTot) / float(tot)) * entropy_x_not_set)

    tot = X.shape[0]
    classTotCnt = {}
    entropy_before = 0
    for i in y:
        if i not in classTotCnt:
            classTotCnt[i] = 1
        else:
            classTotCnt[i] = classTotCnt[i] + 1
    for c in classTotCnt:
        probs = classTotCnt[c] / float(tot)
        entropy_before = entropy_before - probs * np.log(probs)

    nz = X.T.nonzero()
    pre = 0
    classCnt = {}
    featureTot = 0
    information_gain = []
    for i in range(0, len(nz[0])):
        if (i != 0 and nz[0][i] != pre):
            for notappear in range(pre+1, nz[0][i]):
                information_gain.append(0)
            ig = _calIg()
            information_gain.append(ig)
            pre = nz[0][i]
            classCnt = {}
            featureTot = 0
        featureTot = featureTot + 1
        yclass = y[nz[1][i]]
        if yclass not in classCnt:
            classCnt[yclass] = 1
        else:
            classCnt[yclass] = classCnt[yclass] + 1
    ig = _calIg()
    information_gain.append(ig)

    return np.asarray(information_gain)
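Well, after going through the code in detail, I learned more about X.T.nonzero().
It is indeed correct that information gain needs to loop through features.
It is also correct that the matrix scikit-learn gives us here is documents x features.
But:
the code calls X.T.nonzero(), which returns two index arrays for the nonzero entries of the transposed (features x documents) matrix: nz[0] holds the feature index and nz[1] the document index of each entry. The loop then walks over all of these entries with range(0, len(nz[0])), and it assumes the entries are grouped by feature: whenever nz[0][i] changes, it finishes the information gain of the previous feature and starts a new one.
Overall, X.T.nonzero()[0] gives the feature index of every nonzero entry, which is how the code ends up looping over features :)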
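A small dense example (toy data, not 20 Newsgroups) may make this concrete. With X stored as documents x features, np.nonzero(X.T) on a dense array lists (feature, document) pairs in row-major order, so the entries for feature 0 come first, then feature 1, and so on. The hypothetical information_gain_per_feature below computes the same quantity more directly by looping over the columns of X, using the natural log like the code above. For a scipy.sparse matrix such as the CountVectorizer output, the order of .nonzero() entries depends on the storage format, so it may be worth checking that nz[0] really is grouped by feature (for example, converting with X.T.tocsr() first gives row-ordered entries).

```python
import numpy as np

# Toy doc x feature matrix: 4 documents (rows), 2 features (columns)
X = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [0, 0]])
y = np.array([0, 0, 1, 1])

feat_idx, doc_idx = np.nonzero(X.T)
print(feat_idx)  # [0 0 1 1]  -> grouped by feature
print(doc_idx)   # [0 1 1 2]  -> documents that contain that feature

def entropy(labels):
    # Shannon entropy (natural log) of a label array; 0 for an empty array
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def information_gain_per_feature(X, y):
    # One IG value per column (feature) of the doc x feature matrix
    h_before = entropy(y)
    gains = []
    for f in range(X.shape[1]):
        present = X[:, f] != 0
        p_present = present.mean()
        h_after = p_present * entropy(y[present]) + (1 - p_present) * entropy(y[~present])
        gains.append(h_before - h_after)
    return np.array(gains)

ig = information_gain_per_feature(X, y)
```

Feature 0 separates the two classes perfectly, so its gain equals the full label entropy (ln 2). Feature 1 appears once in each class, so it carries no information and its gain is 0.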
I am currently working in Torch to implement a random shuffle (on the rows, i.e. the first dimension in this case) of some input data. I am new to Torch, so I have some trouble figuring out how permutation works.
The following is supposed to shuffle the data:
if argshuffle then
    local perm = torch.randperm(sids:size(1)):long()
    print("\n\n\nSize of X and y before")
    print(X:view(-1, 1000, 128):size())
    print(y:size())
    print(sids:size())
    print("\nPerm size is: ")
    print(perm:size())
    X = X:view(-1, 1000, 128)[{{perm},{},{}}]
    y = y[{{perm},{}}]
    print(sids[{{1}, {}}])
    sids = sids[{{perm},{}}]
    print(sids[{{1}, {}}])
    print(X:size())
    print(y:size())
    print(sids:size())
    os.exit(69)
end
This prints out
Size of X and y before
99
1000
128
[torch.LongStorage of size 3]
99
1
[torch.LongStorage of size 2]
99
1
[torch.LongStorage of size 2]
Perm size is:
99
[torch.LongStorage of size 1]
5
[torch.LongStorage of size 1x1]
5
[torch.LongStorage of size 1x1]
99
1000
128
[torch.LongStorage of size 3]
99
1
[torch.LongStorage of size 2]
99
1
[torch.LongStorage of size 2]
From these values, I can infer that the data was not shuffled: the first row of sids is 5 both before and after the permutation. How can I make it shuffle correctly, and what is the common solution in Lua/Torch?
I also faced a similar issue. The documentation has no shuffle function for tensors (there is one for data loaders). I found a workaround using torch.randperm.
>>> a=torch.rand(3,5)
>>> print(a)
tensor([[0.4896, 0.3708, 0.2183, 0.8157, 0.7861],
[0.0845, 0.7596, 0.5231, 0.4861, 0.9237],
[0.4496, 0.5980, 0.7473, 0.2005, 0.8990]])
>>> # Row shuffling
...
>>> a=a[torch.randperm(a.size()[0])]
>>> print(a)
tensor([[0.4496, 0.5980, 0.7473, 0.2005, 0.8990],
[0.0845, 0.7596, 0.5231, 0.4861, 0.9237],
[0.4896, 0.3708, 0.2183, 0.8157, 0.7861]])
>>> # column shuffling
...
>>> a=a[:,torch.randperm(a.size()[1])]
>>> print(a)
tensor([[0.2005, 0.7473, 0.5980, 0.8990, 0.4496],
[0.4861, 0.5231, 0.7596, 0.9237, 0.0845],
[0.8157, 0.2183, 0.3708, 0.7861, 0.4896]])
I hope it answers the question!
To shuffle a tensor t along its first dimension:
dim = 0
idx = torch.randperm(t.shape[dim])
t_shuffled = t[idx]
If your tensor is e.g. of shape CxNxF (channels by rows by features), then you can shuffle along the second dimension like so:
dim=1
idx = torch.randperm(t.shape[dim])
t_shuffled = t[:,idx]
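For the original question, the key point is usually to apply one and the same permutation to X, y and sids so that their rows stay aligned. Here is a NumPy sketch of that idea; the arrays are small stand-ins for the question's tensors. It also shows that np.take with axis= generalizes t[:, idx] to any dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
X = np.arange(n * 4 * 2).reshape(n, 4, 2)  # stand-in for X:view(-1, 1000, 128)
y = np.arange(n).reshape(n, 1) * 10        # stand-in labels; row i belongs to X[i]
sids = np.arange(n).reshape(n, 1)          # stand-in sample ids

perm = rng.permutation(n)                  # one permutation, reused for every array
X_s, y_s, sids_s = X[perm], y[perm], sids[perm]

# Shuffling along another dimension: np.take(t, idx, axis=dim) equals indexing that axis
idx = rng.permutation(X.shape[1])
assert np.array_equal(np.take(X, idx, axis=1), X[:, idx])
```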
A straightforward solution is to use permutation matrices (the usual ones from linear algebra). Since you seem to be interested in the 3D case, we first have to flatten your 3D tensor. Here is ready-to-use example code that I came up with:
data=torch.floor(torch.rand(5,3,2)*100):float()
reordered_data=data:view(5,-1)
perm=torch.randperm(5);
perm_rep=torch.repeatTensor(perm,5,1):transpose(1,2)
indexes=torch.range(1,5);
indexes_rep=torch.repeatTensor(indexes,5,1)
permutation_matrix=indexes_rep:eq(perm_rep):float()
permuted=permutation_matrix*reordered_data
print("perm")
print(perm)
print("before permutation")
print(data)
print("after permutation")
print(permuted:view(5,3,2))
As you will see from one execution, it reorders the tensor data according to the row indexes given in perm.
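Here is the same permutation-matrix idea in NumPy, sketched to show why it works. Row i of P has a single 1 in column perm[i], so P @ data picks row perm[i] of the flattened data. Plain fancy indexing (data[perm]) gives the same result without building an n x n matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.floor(rng.random((5, 3, 2)) * 100)
flat = data.reshape(5, -1)     # flatten to 5 rows, like data:view(5, -1)

perm = rng.permutation(5)      # 0-based here; Lua torch indices are 1-based
P = np.eye(5)[perm]            # P[i, perm[i]] = 1, all other entries 0
permuted = (P @ flat).reshape(5, 3, 2)
```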
Based on your syntax, I assume you're using Torch with Lua rather than PyTorch.
torch.Tensor.index is the function you need; it works like this:
x = torch.rand(4, 4)
p = torch.randperm(4)
print(x)
print(p)
print(x:index(1, p:long()))
This is my code for slicing my 512*512 image into a cube of 64*64*64 dimensions. But when I reshape it back into a 2D array, why does it not give me the original image? Am I doing something incorrect? Please help.
clc;
im=ind2gray(y,ymap);
% im=imresize(im,0.125);
[rows ,columns, colbands] = size(im)
end
image3d=reshape(image3d,512,512);
figure,imshow(uint8(image3d));
Just a small hint.
P(:,:,1) = [0,0;0,0]
P(:,:,2) = [1,1;1,1]
P(:,:,3) = [2,2;2,2]
P(:,:,4) = [3,3;3,3]
B = reshape(P,4,4)
B =
0 1 2 3
0 1 2 3
0 1 2 3
0 1 2 3
So you might change the slicing or do the reshaping on your own.
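To make the hint concrete, here is a NumPy sketch. It uses Python rather than MATLAB and assumes the tiles are taken in row-major block order, since the slicing loop is not shown in the question. Cutting a 512 x 512 image into 64 tiles of 64 x 64 and stacking them gives a 64 x 64 x 64 cube, and only the inverse of that tiling, not a plain reshape, restores the image. The function names are hypothetical.

```python
import numpy as np

def image_to_cube(img, tile=64):
    # (H, W) -> (tile, tile, n_tiles); tile k is block (k // n_cols, k % n_cols)
    h, w = img.shape
    nr, nc = h // tile, w // tile
    return img.reshape(nr, tile, nc, tile).transpose(1, 3, 0, 2).reshape(tile, tile, nr * nc)

def cube_to_image(cube, nr, nc):
    # Exact inverse of image_to_cube
    tile = cube.shape[0]
    return cube.reshape(tile, tile, nr, nc).transpose(2, 0, 3, 1).reshape(nr * tile, nc * tile)

img = np.arange(512 * 512).reshape(512, 512)
cube = image_to_cube(img)           # shape (64, 64, 64)
restored = cube_to_image(cube, 8, 8)
naive = cube.reshape(512, 512)      # reshaping alone scrambles the tiles
```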
If I have understood your question correctly, you can use the code below to perform the same operation.
% Random image of the provided size 512X512
imageX = rand(512,512)
imagesc(imageX)
% Converting the image "imageX" into the cube of 64X64X64 dimension
sliceColWise = reshape(imageX,64,64,64)
size(sliceColWise)
% Reshape the cube back to obtain the original image "imageX";
% the difference is plotted to show that they are identical
imageY = reshape(sliceColWise,512,512);
imagesc(imageX-imageY)
N.B.: The MATLAB help shows that reshape works column-wise:
reshape(X,M,N) or reshape(X,[M,N]) returns the M-by-N matrix
whose elements are taken columnwise from X. An error results
if X does not have M*N elements.
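The column-wise behaviour can be reproduced in NumPy with order="F" (NumPy's default, order="C", is row-wise), which is a quick way to check the hint's B matrix and to confirm that reshaping back with the same order recovers P.

```python
import numpy as np

# P(:,:,k) = k-1 in MATLAB; here axis 2 holds the 4 slices, 0-based
P = np.zeros((2, 2, 4), dtype=int)
for k in range(4):
    P[:, :, k] = k

B = P.reshape(4, 4, order="F")   # column-major, like MATLAB's reshape(P, 4, 4)
print(B)
# [[0 1 2 3]
#  [0 1 2 3]
#  [0 1 2 3]
#  [0 1 2 3]]

P_back = B.reshape(2, 2, 4, order="F")  # same order in reverse restores P
```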