PyTorch and Scikit-Learn - Labels starting from 0 or 1 - machine-learning

For both PyTorch and Scikit-Learn, is there any requirement for labels to start at 0 and run 0, 1, 2, ... up to the number of classes minus one? Or can I have labels start from 1, for example?
This is just a matter of whether or not I have to apply a label encoder manually to the dataset.
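For reference, PyTorch losses such as nn.CrossEntropyLoss do expect class indices in [0, num_classes - 1], whereas scikit-learn estimators encode labels internally, so arbitrary label values are fine there. A minimal sketch of the manual remapping with scikit-learn's LabelEncoder (the sample labels are made up):

from sklearn.preprocessing import LabelEncoder
import numpy as np

y = np.array([1, 2, 3, 1, 2])         # labels that start at 1
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)  # array([0, 1, 2, 0, 1])
# encoder.inverse_transform(y_encoded) recovers the original labels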

Related

Creating 3D Convolutional Layer

ValueError: One of the dimensions in the output is <= 0 due to downsampling in conv3d_15. Consider increasing the input size. Received input shape [None, 1, 1, 1, 1904211] which would produce output shape with a zero or negative value in a dimension.
Can anyone explain to me what this error means?
I got this error while trying to build my 3D convolutional neural network.
A 3D convolution layer expects the input to have a shape similar to (4, 28, 28, 28, 1), i.e. a batch of four 28x28x28 volumes with a single channel. More info here - https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv3D
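A minimal sketch of a correctly shaped input, assuming TensorFlow 2.x (the filter count and kernel size are arbitrary):

import tensorflow as tf

# A batch of 28x28x28 volumes with a single channel.
inputs = tf.keras.Input(shape=(28, 28, 28, 1))
outputs = tf.keras.layers.Conv3D(filters=16, kernel_size=3, activation='relu')(inputs)
model = tf.keras.Model(inputs, outputs)

x = tf.random.normal((4, 28, 28, 28, 1))  # 4 volumes of 28x28x28 with 1 channel
print(model(x).shape)                     # (4, 26, 26, 26, 16)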

How exactly do you compute the gradients for the filters in a convolutional neural network?

I learned from several articles that to compute the gradients for the filters, you just do a convolution with the input volume as input and the error matrix as the kernel. After that, you subtract the gradients (multiplied by the learning rate) from the filter weights. I implemented this process, but it's not working.
I even tried doing the backpropagation process myself with pen and paper, but the gradients I calculated don't make the filters perform any better. So am I understanding the whole process wrong?
Edit:
I will provide an example of my understanding of the backpropagation in CNNs and the problem with it.
Consider a randomised input matrix for a convolutional layer:
1, 0, 1
0, 0, 1
1, 0, 0
And a randomised weight matrix:
1, 0
0, 1
The output would be (after applying the ReLU activation):
1, 1
0, 0
The target for this layer is a 2x2 matrix filled with zeros. This way, we know the weight matrix should be filled with zeros also.
Error:
-1, -1
0, 0
By applying the process as stated above, the gradients are:
-1, -1
1, 0
So the new weight matrix is:
2, 1
-1, 1
This is not getting anywhere. If I repeat the process, the filter weights just grow to extremely high values. So I must have made a mistake somewhere. What is it that I'm doing wrong?
I'll give you a full example; it's not going to be short, but hopefully you will get it. I'm omitting both the bias and the activation functions for simplicity, but once you get it, it's simple enough to add those too. Remember: backpropagation is essentially the SAME in a CNN as in a simple MLP, but instead of multiplications you have convolutions. So, here's my sample:
Input:
.7 -.3 -.7 .5
.9 -.5 -.2 .9
-.1 .8 -.3 -.5
0 .2 -.1 .6
Kernel:
.1 -.3
-.5 .7
Doing the convolution yields (Result of 1st convolutional layer, and input for the 2nd convolutional layer):
.32 .27 -.59
.99 -.52 -.55
-.45 .64 .13
L2 Kernel:
-.5 .1
.3 .9
L2 activation:
.73 .29
.37 -.63
Here you would have a flatten layer and a standard MLP or SVM to do the actual classification. During backpropagation you'll receive a delta, which for fun let's assume is the following:
-.07 .15
-.09 .02
This will always be the same size as your activation before the flatten layer. Now, to calculate the kernel's delta for the current layer L2, you convolve L1's activation with the above delta. I'm not writing this down again, but the result will be:
.17 .02
-.05 .13
Updating the kernel is done as L2.Kernel -= LR * ROT180(dL2.K), meaning you first rotate the above 2x2 matrix by 180 degrees and then update the kernel. For our toy example this turns out to be:
-.51 .11
.3 .88
Now, to calculate the delta for the first convolutional layer, recall that in an MLP you had: current_delta * current_weight_matrix. In a conv layer you have pretty much the same: you convolve the original kernel (before the update) of layer L2 with your delta for the current layer, except that this is a full convolution. The result turns out to be:
.04 -.08 .02
.02 -.13 .14
-.03 -.08 .01
With this you go to the 1st convolutional layer and convolve the original input with this 3x3 delta:
.16 .03
-.09 .16
And update your L1 kernel the same way as above:
.08 -.29
-.5 .68
Then you can start over from the feed-forward pass. The above calculations were rounded to 2 decimal places, and a learning rate of .1 was used for calculating the new kernel values.
TLDR:
You get a delta.
You calculate the delta that will be passed to the previous layer as: FullConvolution(Li.W, delta).
You calculate the kernel gradient that is used to update the kernel as: Convolution(Li.Input, delta).
Move to the next layer in the backward pass and repeat.
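Here is a minimal numpy/scipy sketch of those two steps, using true convolution (scipy.signal.convolve2d flips the kernel, matching the convention in the worked example above); the function name and variables are mine:

import numpy as np
from scipy.signal import convolve2d

def conv_layer_backward(layer_input, kernel, delta, lr=0.1):
    # Kernel gradient: convolve the layer's input with the incoming delta.
    dK = convolve2d(layer_input, delta, mode='valid')
    # Delta for the previous layer: FULL convolution of the kernel with delta.
    prev_delta = convolve2d(kernel, delta, mode='full')
    # Update: rotate the gradient 180 degrees, then take a gradient step.
    new_kernel = kernel - lr * np.rot90(dK, 2)
    return new_kernel, prev_delta

# The L2 numbers from the example above:
a1 = np.array([[.32, .27, -.59], [.99, -.52, -.55], [-.45, .64, .13]])
k2 = np.array([[-.5, .1], [.3, .9]])
d2 = np.array([[-.07, .15], [-.09, .02]])
new_k2, d1 = conv_layer_backward(a1, k2, d2)  # dK comes out as roughly [[.17, .02], [-.05, .13]]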

Feature scaling a linear regression model and how it affects the output

I have input data that looks like:
col1 col2 col3 col4 col5 col6
-0.1144887 -0.1717161 3847 3350 2823 2243
0.3534122 0.53008300 4230 3520 2421 3771
...
So columns 1 and 2 range from -1 to 1, and columns 3-6 range from 2000 to 5000.
The output data ranges from 5.0 to 10.0. I expect to predict a single real-valued output for each input vector and am using a linear regression dense neural network with an 'mse' loss function.
I'm thinking I should scale columns 3-6 to between 0 and 1 and leave columns 1 and 2 as is. Is that correct or should I also scale columns 1 and 2 to be between 0 and 1? If I scale the input, does that affect my predicted output value or does it only speed up the learning? Is there any need to scale the output?
You should scale all the features to the same range. The standard way is to center each feature on its mean and scale it by its standard deviation:
1) compute the mean and standard deviation of each feature using the training set (e.g. col1_av = average(col1_train), col2_av = average(col2_train), ...)
2) from each feature, subtract the corresponding mean and divide by the corresponding standard deviation (e.g. [x1 = -0.1144887, x2 = 0.3534122, ...] -> (x1 - col1_av) / col1_std). The samples in the test set must be scaled using the values estimated on the training set.
Having features of such different magnitudes will affect not only the learning process but also the output, since features with larger magnitude will carry more weight in the model.
In general there is no need to scale the output.
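A minimal sketch of steps 1) and 2) with scikit-learn's StandardScaler, which centers on the mean and divides by the standard deviation (the two training rows are the ones from the question):

import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[-0.1144887, -0.1717161, 3847, 3350, 2823, 2243],
                    [ 0.3534122,  0.5300830, 4230, 3520, 2421, 3771]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # mean/std estimated on the training set only
# Later: X_test_scaled = scaler.transform(X_test), reusing the training statistics.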
An interesting read: https://medium.com/greyatom/why-how-and-when-to-scale-your-features-4b30ab09db5e

How to calculate partial derivatives of error function with respect to values in matrix

I am building a basic neural network that takes in a 2x2 image, with the goal of classifying the image as either a forward-slash (1-class) or back-slash (0-class) shape. The input data is a flat numpy array, where 1 represents a black pixel and 0 represents a white pixel.
0-class: [1, 0, 0, 1]
1-class: [0, 1, 1, 0]
If I start my filter as a random 4x1 matrix, how can I use gradient descent to arrive at either perfect matrix, [1,-1,-1,1] or [-1,1,1,-1], to classify the data points?
Side note: even when multiplied with the "perfect" answer matrix and then summed, the label outputs would be -2 and 2. Would my data labels need to be -2 and 2? What if I want my classes labeled as 0 and 1?
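One common way to keep the labels as 0 and 1 is to squash the weighted sum with a sigmoid and minimize the log loss by gradient descent; here is a minimal sketch under that assumption (all names are illustrative):

import numpy as np

X = np.array([[1, 0, 0, 1],    # 0-class: back slash
              [0, 1, 1, 0]])   # 1-class: forward slash
y = np.array([0.0, 1.0])

rng = np.random.default_rng(0)
w = rng.normal(size=4)          # random 4x1 filter

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for _ in range(1000):
    p = sigmoid(X @ w)          # predicted probability of the 1-class
    w -= lr * X.T @ (p - y)     # gradient of the log loss w.r.t. w

print(np.round(w, 2))  # sign pattern approaches [-, +, +, -], a scaled [-1, 1, 1, -1]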

Output dimensions of convolutional layer with Keras

The Keras tutorial gives the following code example (with comments):
# apply a convolution 1d of length 3 to a sequence with 10 timesteps,
# with 64 output filters
model = Sequential()
model.add(Convolution1D(64, 3, border_mode='same', input_shape=(10, 32)))
# now model.output_shape == (None, 10, 64)
I am confused about the output size. Shouldn't it create 10 timesteps with a depth of 64 and a width of 32 (stride defaults to 1, no padding)? So (10, 32, 64) instead of (None, 10, 64)?
In k-dimensional convolution you have filters which preserve the structure of the first k dimensions and squash the information from all the other dimensions by convolving them with the filter weights. So every filter in your network has dimension (3 x 32), and all the information from the last dimension (the one with size 32) is squashed into a single real number, while the first dimension is preserved. This is the reason why you get a shape like this.
You could imagine a similar situation in the 2-D case, when you have a colour image. Your input then has a 3-dimensional structure (picture_length, picture_width, colour). When you apply a 2-D convolution with respect to the first two dimensions, all the information about colours is squashed by your filter and is not preserved in the output structure. The same happens here.
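You can check the shape with the current Keras API (Conv1D has replaced Convolution1D, and padding='same' replaces border_mode='same'):

import tensorflow as tf

inputs = tf.keras.Input(shape=(10, 32))               # 10 timesteps, 32 features
outputs = tf.keras.layers.Conv1D(64, 3, padding='same')(inputs)
model = tf.keras.Model(inputs, outputs)
print(model.output_shape)                             # (None, 10, 64)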
