Relationship between Parameters and FLOPs

Model                Params   FLOPs   Input   Top1 Acc
ConvNeXtV2Tiny       28.6M    4.47G   224     83.0
- ImageNet21k-ft1k   28.6M    4.47G   224     83.9
- ImageNet21k-ft1k   28.6M    13.1G   384     85.1
I understand why the FLOPs didn't change at all between the first two rows, since the input size is the same.
However, the number of Params stays at 28.6M even for the 384 input, where the FLOPs do change. How can that be?
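For intuition, here is a minimal sketch (my own example, not the ConvNeXt code): a convolution's weights are shared across spatial positions, so the parameter count depends only on the layer's definition, while the FLOPs scale with the number of output positions and therefore with the input resolution.
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3)

# Parameter count depends only on the layer definition: 64*3*7*7 weights + 64 biases.
print(sum(p.numel() for p in conv.parameters()))  # 9472, for any input size

for size in (224, 384):
    out = conv(torch.randn(1, 3, size, size))
    h, w = out.shape[-2:]
    # Rough multiply-accumulate count: kernel_area * in_channels * out_channels per
    # output position, and the number of output positions grows with the input size.
    print(size, 2 * 7 * 7 * 3 * 64 * h * w)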

Related

Ran into "TypeError: '<' not supported between instances of 'Tensor' and 'list'" when going through dataset

I am replicating ResNet (source: https://arxiv.org/abs/1512.03385).
I ran into the error "TypeError: '<' not supported between instances of 'Tensor' and 'list'" when trying to iterate over several different datasets in different sections of my code.
I tried different fixes but none worked: (i) I removed enumerate because I worried it might be causing the problem; (ii) I tried iterating over the dataloader rather than the dataset, but that didn't work either.
1st time: When I tried to view images:
for images, _ in train_loader:
    print('images.shape:', images.shape)
    plt.figure(figsize=(16,8))
    plt.axis('off')
    plt.imshow(torchvision.utils.make_grid(images, nrow=16).permute((1, 2, 0)))
    break
2nd/3rd time: when I tried to validate/test the resnet:
with torch.no_grad():
    for j, (inputs, labels) in enumerate(test_loader, start=0):
        outputs = resnet_models[i](inputs)
        _, prediction = torch.max(outputs, dim=1)
You may notice that I didn't run into this error when training the resnet, and the code is quite similar:
for batch, data in enumerate(train_dataloader, start=0):
    inputs, labels = data
    inputs, labels = inputs.to(device), labels.to(device)
Error message (taking the first error as an example; the rest are pretty much the same):
TypeError                                 Traceback (most recent call last)
Input In [38], in <cell line: 8>()
      6 print("Images AFTER NORMALIZATION")
      7 print("--------------------------")
----> 8 for images, _ in training_data:
      9     sort=False
     10     print('images.shape:', images.shape)
File ~/miniconda3/envs/resnet/lib/python3.9/site-packages/torch/utils/data/dataset.py:471, in Subset.__getitem__(self, idx)
    469 if isinstance(idx, list):
    470     return self.dataset[[self.indices[i] for i in idx]]
--> 471 return self.dataset[self.indices[idx]]
File ~/miniconda3/envs/resnet/lib/python3.9/site-packages/torchvision/datasets/cifar.py:118, in CIFAR10.__getitem__(self, index)
    115 img = Image.fromarray(img)
    117 if self.transform is not None:
--> 118     img = self.transform(img)
    120 if self.target_transform is not None:
    121     target = self.target_transform(target)
File ~/miniconda3/envs/resnet/lib/python3.9/site-packages/torchvision/transforms/transforms.py:95, in Compose.__call__(self, img)
     93 def __call__(self, img):
     94     for t in self.transforms:
---> 95         img = t(img)
     96     return img
File ~/miniconda3/envs/resnet/lib/python3.9/site-packages/torch/nn/modules/module.py:1110, in Module._call_impl(self, *input, **kwargs)
   1106 # If we don't have any hooks, we want to skip the rest of the logic in
   1107 # this function, and just call forward.
   1108 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110     return forward_call(*input, **kwargs)
   1111 # Do not call functions when jit is used
   1112 full_backward_hooks, non_full_backward_hooks = [], []
File ~/miniconda3/envs/resnet/lib/python3.9/site-packages/torchvision/transforms/transforms.py:707, in RandomHorizontalFlip.forward(self, img)
    699 def forward(self, img):
    700     """
    701     Args:
    702         img (PIL Image or Tensor): Image to be flipped.
    (...)
    705         PIL Image or Tensor: Randomly flipped image.
    706     """
--> 707 if torch.rand(1) < self.p:
    708     return F.hflip(img)
    709 return img
TypeError: '<' not supported between instances of 'Tensor' and 'list'
I was having the same error message, probably under different circumstances, but I just found my own bug and figured I would share it anyway for various readers. I was using a torchvision transformation in my dataset, which the dataloader was loading from. The transformation was
torchvision.transforms.RandomHorizontalFlip([0.5]),
and the bug is that the argument to this transform should not be a list; it should be
torchvision.transforms.RandomHorizontalFlip(0.5),
So if there is anything I can recommend, it's to check whether a list argument is being passed somewhere (to a transform or otherwise) where a scalar is expected.
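To illustrate the fix in context, here is a minimal sketch of a transform pipeline (the Compose contents are my own example, not the asker's exact code):
import torchvision.transforms as T

# Wrong: p as a list makes RandomHorizontalFlip evaluate `torch.rand(1) < [0.5]`,
# i.e. compare a Tensor against a list, which raises the TypeError above.
# bad = T.RandomHorizontalFlip([0.5])

# Right: p is a plain float.
transform = T.Compose([
    T.RandomHorizontalFlip(0.5),
    T.ToTensor(),
])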

Converting Fully connected layers into Convolutional layers?

I've read another post about converting FC layers into convolutional layers here:
https://stats.stackexchange.com/questions/263349/how-to-convert-fully-connected-layer-into-convolutional-layer
but I don't understand how you get the 4096x1x1 in the last calculations. I know that after going through the convolution layers and the pooling we end up with a volume of 7x7x512.
I got this from the CS231n notes: https://cs231n.github.io/convolutional-networks/#convert
Conversely, any FC layer can be converted to a CONV layer. For example, an FC layer with K=4096 that is looking at some input volume of size 7×7×512 can be equivalently expressed as a CONV layer with F=7,P=0,S=1,K=4096. In other words, we are setting the filter size to be exactly the size of the input volume, and hence the output will simply be 1×1×4096 since only a single depth column “fits” across the input volume, giving identical result as the initial FC layer.
It's the math I don't completely understand. In the other post, the author wrote:
In this example, as far as I understood, the converted CONV layer should have the shape (7,7,512), meaning (width, height, feature dimension). And we have 4096 filters. And the output of each filter's spatial size can be calculated as (7-7+0)/1 + 1 = 1. Therefore we have a 1x1x4096 vector as output.
But where do you get the other 7 from? Do you convolve the layer with a filter of its own size, and is that how you end up with 1x1x4096?
Say I have a CNN that consists of Input(234×234)-Conv(7,32,1)-Pool(2,2)-Conv(7,32,1)-Pool(2,2)-Conv(7,32,1)-Pool(2,2)-FC(1024)-FC(1024)-FC(1000), and zero-padding is not used.
Running this through and calculating the conv and pooling layers should leave us at 24x24x32 after the last pooling, if I'm not wrong. Stride is 1 for the conv layers.
234x234x1 > conv7x7x32 > (234-7)/1+1 = 228
228x228x32 > pool2x2 > (228 - 2 )/2 + 1 = 114
114x114x32 > conv7x7x32 > (114 - 7 ) / 1 + 1 = 108
108x108x32 > pool2x2 > (108-2)/2 + 1 = 54
54x54x32 > conv7x7x32 > (54-7)/1 + 1 = 48
48x48x32 > pool2x2 > (48-2)/2 + 1 = 24
24x24x32
(24-24)/1 + 1 = 1 > 1024x1x1, 1024x1x1, 1000x1x1
Is this the right way to convert the FC layers into convolutional layers?
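As a sanity check, here is a minimal PyTorch sketch (my own illustration, not from the linked posts) that reproduces both the 1x1x4096 example and the shape calculations above:
import torch
import torch.nn as nn

# The CS231n example: an FC layer with K=4096 acting on a 7x7x512 volume is
# equivalent to a conv with F=7, which leaves a single 1x1 spatial position.
x = torch.randn(1, 512, 7, 7)
print(nn.Conv2d(512, 4096, kernel_size=7)(x).shape)  # torch.Size([1, 4096, 1, 1])

# The network from the question: Input(234x234) through three Conv(7,32,1)+Pool(2,2) stages.
backbone = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=7), nn.MaxPool2d(2),
    nn.Conv2d(32, 32, kernel_size=7), nn.MaxPool2d(2),
    nn.Conv2d(32, 32, kernel_size=7), nn.MaxPool2d(2),
)
feat = backbone(torch.randn(1, 1, 234, 234))
print(feat.shape)  # torch.Size([1, 32, 24, 24])

# FC(1024)-FC(1024)-FC(1000) converted to convs: the first kernel covers the whole
# 24x24 feature map, the remaining FCs become 1x1 convs.
head = nn.Sequential(
    nn.Conv2d(32, 1024, kernel_size=24),   # (24 - 24)/1 + 1 = 1
    nn.Conv2d(1024, 1024, kernel_size=1),
    nn.Conv2d(1024, 1000, kernel_size=1),
)
print(head(feat).shape)  # torch.Size([1, 1000, 1, 1])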

Number of parameters for Keras SimpleRNN

I have a SimpleRNN like:
model.add(SimpleRNN(10, input_shape=(3, 1)))
model.add(Dense(1, activation="linear"))
The model summary says:
simple_rnn_1 (SimpleRNN) (None, 10) 120
I am curious about the parameter number 120 for simple_rnn_1.
Could someone answer my question?
When you look at the header of the table, you see the column title Param:
Layer (type)               Output Shape     Param
==================================================
simple_rnn_1 (SimpleRNN)   (None, 10)       120
This number represents the number of trainable parameters (weights and biases) in the respective layer, in this case your SimpleRNN.
Edit:
The formula for calculating the weights is as follows:
recurrent_weights + input_weights + biases
resp. (num_features + num_units) * num_units + num_units
Explanation:
num_units = the number of units in the RNN
num_features = the number of features of your input
Now you have two things happening in your RNN.
First you have the recurrent loop, where the state is fed recurrently into the model to generate the next step. Weights for the recurrent step are:
recurrent_weights = num_units*num_units
Second, you have the new input of your sequence at each step:
input_weights = num_features*num_units
(Usually the last RNN state and the new input are concatenated and then multiplied with a single weight matrix; nevertheless, the input and the last RNN state still use different weights.)
So now we have the weights; what's missing are the biases - one bias for every unit:
biases = num_units*1
So finally we have the formula:
recurrent_weights + input_weights + biases
or
num_units * num_units + num_features * num_units + biases
=
(num_features + num_units) * num_units + biases
In your case, this means the trainable parameters are:
10*10 + 1*10 + 10 = 120
I hope this is understandable; if not, just tell me so I can edit it to make it clearer.
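As a quick sanity check of this formula (my own snippet, not part of the original answer):
def simple_rnn_params(num_units, num_features):
    recurrent_weights = num_units * num_units
    input_weights = num_features * num_units
    biases = num_units
    return recurrent_weights + input_weights + biases

print(simple_rnn_params(num_units=10, num_features=1))  # 120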
It might be easier to understand visually with a simple network like this:
The number of weights is 16 (4 * 4) + 12 (3 * 4) = 28 and the number of biases is 4.
where 4 is the number of units and 3 is the number of input dimensions, so the formula is just like in the first answer: num_units ^ 2 + num_units * input_dim + num_units or simply num_units * (num_units + input_dim + 1), which yields 10 * (10 + 1 + 1) = 120 for the parameters given in the question.
I visualized the SimpleRNN you added; I think the figure can explain a lot.
SimpleRNN layer (I'm a newbie here and can't post images directly, so you need to click the link).
From the unrolled version of the SimpleRNN layer, it can be seen as a dense layer, and the previous layer is the concatenation of the input and the current layer itself (from the previous step).
So the number of parameters of SimpleRNN can be computed as a dense layer:
num_para = units_pre * units + num_bias
where:
units_pre is the sum of input neurons (1 in your settings) and units (see below),
units is the number of neurons (10 in your settings) in the current layer,
num_bias is the number of bias terms in the current layer, which is the same as units.
Plugging in your settings, we get num_para = (1 + 10) * 10 + 10 = 120.
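The same count can be read directly off the layer's weight tensors; a minimal sketch, assuming a tf.keras 2.x-style installation:
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(10, input_shape=(3, 1)),
    tf.keras.layers.Dense(1, activation="linear"),
])

kernel, recurrent_kernel, bias = model.layers[0].get_weights()
print(kernel.shape, recurrent_kernel.shape, bias.shape)  # (1, 10) (10, 10) (10,)
print(kernel.size + recurrent_kernel.size + bias.size)   # 120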

How to calculate the number of parameters of an LSTM network?

Is there a way to calculate the total number of parameters in an LSTM network?
I have found an example, but I'm unsure how correct it is or whether I have understood it correctly.
For example, consider the following:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding
from keras.layers import LSTM
model = Sequential()
model.add(LSTM(256, input_dim=4096, input_length=16))
model.summary()
Output
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
lstm_1 (LSTM) (None, 256) 4457472 lstm_input_1[0][0]
====================================================================================================
Total params: 4457472
____________________________________________________________________________________________________
As per my understanding, n is the input vector length and m is the number of time steps, and in this example they consider the number of hidden layers to be 1.
Hence, according to the formula in the post, 4(nm + n^2), with m = 16, n = 4096, num_of_units = 256 in my example:
4*((4096*16)+(4096*4096))*256 = 17246978048
Why is there such a difference?
Did I misunderstand the example, or is the formula wrong?
No - the number of parameters of an LSTM layer in Keras equals:
params = 4 * ((size_of_input + 1) * size_of_output + size_of_output^2)
The additional 1 comes from the bias terms. So n is the size of the input (increased by the bias term) and m is the size of the output of the LSTM layer.
So finally :
4 * (4097 * 256 + 256^2) = 4457472
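A quick check of both calculations (my own snippet) shows where the discrepancy in the question comes from - the correct count does not involve the number of time steps:
size_of_input, size_of_output, timesteps = 4096, 256, 16

# Keras count: input weights, recurrent weights and biases for the 4 gates,
# shared across all time steps.
keras_params = 4 * ((size_of_input + 1) * size_of_output + size_of_output ** 2)
print(keras_params)  # 4457472

# The question's calculation additionally multiplies by the sequence length and
# by the number of units, which is why it is off by several orders of magnitude.
question_params = 4 * ((size_of_input * timesteps) + size_of_input ** 2) * size_of_output
print(question_params)  # 17246978048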
image via this post
num_params = [(num_units + input_dim + 1) * num_units] * 4
num_units + input_dim: concat [h(t-1), x(t)]
+ 1: bias
* 4: there are 4 neural network layers (yellow box) {W_forget, W_input, W_output, W_cell}
model.add(LSTM(units=256, input_dim=4096, input_length=16))
[(256 + 4096 + 1) * 256] * 4 = 4457472
PS: num_units = num_hidden_units = output_dims
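To verify the number end to end, here is a minimal sketch assuming a recent tf.keras, where the legacy input_dim/input_length arguments are written as input_shape=(timesteps, features):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

model = Sequential()
model.add(LSTM(256, input_shape=(16, 4096)))  # 16 time steps, 4096 features per step
model.summary()
print(model.count_params())  # 4457472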
I think it would be easier to understand if we start with a simple RNN.
Let's assume that we have 4 units (please ignore the ... in the network and concentrate only on visible units), and the input size (number of dimensions) is 3:
The number of weights is 28 = 16 (num_units * num_units) for the recurrent connections + 12 (input_dim * num_units) for input. The number of biases is simply num_units.
Recurrency means that each neuron output is fed back into the whole network, so if we unroll it in time sequence, it looks like two dense layers:
and that makes it clear why we have num_units * num_units weights for the recurrent part.
The number of parameters for this simple RNN is 32 = 4 * 4 + 3 * 4 + 4, which can be expressed as num_units * num_units + input_dim * num_units + num_units or num_units * (num_units + input_dim + 1)
Now, for an LSTM, we must multiply the number of these parameters by 4, as this is the number of sub-parameters inside each unit, and it was nicely illustrated in the answer by #FelixHo.
Expanding the formula for #JohnStrong:
The 4 means we have different weight and bias variables for the 3 gates (read / write / forget) and a 4th set for the cell state (within the same hidden state).
(These are shared across time steps along a particular hidden state vector.)
4 * lstm_hidden_state_size * (lstm_inputs_size + bias_variable + lstm_outputs_size)
Since the LSTM output (y) is h (the hidden state) by construction, without an extra projection we have, for the LSTM outputs:
lstm_hidden_state_size = lstm_outputs_size
let's say it's d :
d = lstm_hidden_state_size = lstm_outputs_size
Then
params = 4 * d * ((lstm_inputs_size + 1) + d) = 4 * ((lstm_inputs_size + 1) * d + d^2)
LSTM Equations (via deeplearning.ai Coursera)
It is evident from the equations that the final dimensions of all 6 equations will be the same, and that this final dimension must equal the dimension of a(t).
Out of these 6 equations, only 4 contribute to the number of parameters, and by looking at them it can be deduced that all 4 are symmetric. So, if we find the number of parameters for 1 equation, we can just multiply it by 4 to get the total number of parameters.
One important point to note is that the total number of parameters doesn't depend on the time steps (or input_length), as the same "W" and "b" are shared throughout the time steps.
Assume the inside of the LSTM cell has just one layer per gate (as in Keras).
Take equation 1 and let's relate it. Let the number of neurons in the layer be n and the number of dimensions of x be m (not counting the number of examples and time steps). Therefore, the dimension of the forget gate will be n too. Now, just as in an ANN, the dimension of "Wf" will be n*(n+m) and the dimension of "bf" will be n. Therefore, the total number of parameters for one equation will be [{n*(n+m)} + n], and the total number of parameters will be 4*[{n*(n+m)} + n]. Opening the brackets gives 4*(nm + n^2 + n).
So, plugging in your values (n=256, m=4096), the total number of parameters is 4*((256*256) + (256*4096) + 256) = 4*(1114368) = 4457472.
The others have pretty much answered it, but just for further clarification: on creating an LSTM layer, the number of params is as follows:
No. of params = 4 * ((num_features + 1) * num_units + num_units^2)
The +1 is because of the additional bias we take.
Here num_features is the number of features in your input shape to the LSTM:
Input_shape = (window_size, num_features)

create a loop in random forest

I received some advice from a friend about how to improve the model, but I can't follow the instructions very well. Would someone please help me work it out? Below is my prediction model:
fit1 <- cforest(SWL ~ affect + negemo+ future+swear+sad
+negate+sexual+death + filler+leisure + conj+ funct +discrep + i
+future + past + bio + body + cogmech + death + cause + quant
+ future +incl + motion + sad + tentat + excl+insight +percept +posemo
+ppron +quant + relativ + space + article + age + s_all + s_sad + gender
, data = trainset1,
controls=cforest_unbiased(ntree=500, mtry= 1))
test_predict1<-predict(fit1, newdata=testset1, type='response')
My friend's advice:
You simply run a loop, e.g. 500 times. In each iteration you create a temporary copy of the data.frame and then shuffle all the SWL values around randomly. Then you run the whole process on that scrambled data.frame and finally save the total accuracy at the end. When it finishes you have e.g. 500 saved accuracies that you can compare to your best result.
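What the friend describes is a permutation test: shuffling the outcome breaks any real relationship, so the saved accuracies form a baseline distribution that the real model should clearly beat. A minimal sketch of that loop, shown in Python with scikit-learn as a stand-in (the original model is R's cforest; the synthetic data, regressor and loop count here are my own assumptions, purely to illustrate the structure):
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for trainset1/testset1.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)

# Real model and its test accuracy (R^2 here).
real_score = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train).score(X_test, y_test)

# Permutation loop: refit on shuffled outcomes to build a baseline distribution.
null_scores = []
for _ in range(50):  # the friend suggests e.g. 500; kept small here for speed
    y_shuffled = rng.permutation(y_train)
    score = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_shuffled).score(X_test, y_test)
    null_scores.append(score)

# Fraction of shuffled runs that match or beat the real score (an empirical p-value).
p_value = (np.sum(np.array(null_scores) >= real_score) + 1) / (len(null_scores) + 1)
print(real_score, p_value)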
