My confusion is about Pipeline. Suppose my code is
pipe = Pipeline([('sc', StandardScaler()),
                 ('pca', PCA(n_components=2)),
                 ('lr', LinearRegression())])
and I called pipe.fit(X_train, y_train). Does this also scale the y_train values?
No, it doesn't. If the pipeline scaled the labels too, you would get scaled predictions as well.
No, it does not.
Pipeline sequentially applies the fit method and then the transform method to each of the steps, except for the last one, which only needs the fit method. The first two steps in your pipeline are StandardScaler and PCA, and both of them ignore the y_train values when fitting; they depend only on the X_train data. The final step, LinearRegression, receives the transformed X_train values and calls its fit method with them together with the original, untouched y_train values.
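For illustration, here is a minimal sketch (using the step names from the pipeline above) of what pipe.fit(X_train, y_train) effectively does internally:

Xt = pipe.named_steps['sc'].fit_transform(X_train)   # y_train is ignored here
Xt = pipe.named_steps['pca'].fit_transform(Xt)       # y_train is ignored here
pipe.named_steps['lr'].fit(Xt, y_train)              # the original y_train is used here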
I am trying to conduct a simple feature scaling in PyTorch. For example, I have an image, and I want to scale certain pixel values down by 10. Now I have 2 options:
Directly divide those features by 10.0 in __getitem__ function in dataloader;
Pass the original features into the model forward function, but before pass them through trainable layers, scale down the corresponding features.
I have conducted several experiments, but observed that after the first epoch the validation losses of the two approaches start to diverge slightly, and after a couple of hundred epochs the two trained models differ substantially. Any suggestions on this?
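For reference, a minimal sketch of the two options described above (the dataset, the model, and the scaled pixel region are all hypothetical):

import torch
from torch.utils.data import Dataset

# Option 1: scale the features in the dataset's __getitem__.
class ScaledDataset(Dataset):
    def __init__(self, images):
        self.images = images          # tensor of shape (N, C, H, W)
    def __len__(self):
        return len(self.images)
    def __getitem__(self, idx):
        img = self.images[idx].clone()
        img[:, :10, :10] /= 10.0      # hypothetical region of pixels to scale
        return img

# Option 2: scale inside forward(), before any trainable layers.
class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(1, 8, kernel_size=3)
    def forward(self, x):
        x = x.clone()
        x[:, :, :10, :10] = x[:, :, :10, :10] / 10.0
        return self.conv(x)

In exact arithmetic the two options are equivalent, so any divergence between the runs typically comes from unseeded randomness (shuffling, augmentation, weight initialization) or floating-point ordering rather than from the scaling itself.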
I'm running an FCN in Keras that uses binary cross-entropy as the loss function. However, I'm not sure how the losses are accumulated.
I know that the loss is applied at the pixel level, but then are the losses for each pixel in the image summed up to form a single loss per image, or are they averaged instead?
And furthermore, is the loss of each image simply summed (or combined by some other operation) over the batch?
I assume that your question is a general one and not specific to a particular model (if not, can you share your model?).
You are right that if the cross-entropy is used at a pixel level, the results have to be reduced (summed or averaged) over all pixels to get a single value.
Here is an example of a convolutional autoencoder in TensorFlow where this step is explicit:
https://github.com/udacity/deep-learning/blob/master/autoencoder/Convolutional_Autoencoder_Solution.ipynb
The relevant lines are:
loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=targets_, logits=logits)
cost = tf.reduce_mean(loss)
Whether you take the mean or the sum of the cost function does not change the minimizer. But if you take the mean, the value of the cost function is more easily comparable between experiments when you change the batch size or image size.
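The same behavior can be checked directly in Keras; a minimal sketch (assumes tf.keras, TensorFlow 2.x) showing that the default reduction averages the per-pixel losses into a single scalar:

import tensorflow as tf

y_true = tf.zeros((2, 4, 4, 1))             # batch of 2 images, 4x4 pixels
y_pred = tf.fill((2, 4, 4, 1), 0.5)         # uniform 0.5 predictions

bce = tf.keras.losses.BinaryCrossentropy()  # default reduction is a mean
print(bce(y_true, y_pred).numpy())          # one scalar, approx ln(2) = 0.6931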
My FCN is trained to detect 10 different classes and produces an output of 500x500x10 with each of the final dimensions being the prediction probabilities for a different class.
Usually, I've seen a uniform threshold, for instance 0.5, used to binarize the probability maps. However, in my case this doesn't quite cut it, because the IoU for some of the classes peaks when the threshold is 0.3, while for other classes it peaks at 0.8.
Hence, I don't want to arbitrarily pick the threshold for each class, but rather use a more probabilistic approach to finalizing the threshold values. I thought of using CRFs, but this also requires the thresholding to have already been done. Any ideas on how to proceed?
Example: consider an image of a forest with 5 different birds. Now I'm trying to output an image that segments the forest and the five birds, 6 classes, each with a separate label. The network outputs 6 confidence maps indicating the confidence that a pixel falls into a particular class. Now, the correct answer for a pixel isn't always the class with the highest confidence value. Therefore, a one-size-fits-all method or a max-value method won't work.
CRF Postprocessing Approach
You don't need to set thresholds to use a CRF. I'm not familiar with any python libraries for CRFs, but in principle, what you need to define is:
A probability distribution over the 10 classes for each of the nodes (pixels), which is simply the output of your network.
Pairwise potentials: a 10x10 matrix, where element Aij denotes the "strength" of the configuration that one pixel is of class i and the other of class j. If you set the potentials to have a value alpha (alpha >> 1) on the diagonal and 1 elsewhere, then alpha is the regularization force that gives you consistency of the predictions (if pixel X is of class Y, then the neighboring pixels of X are more likely to be of the same class).
This is just one example of how you can define your CRF.
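As a concrete illustration, a minimal sketch (NumPy; the value of alpha is chosen purely for illustration) of the two ingredients described above:

import numpy as np

num_classes = 10
alpha = 10.0  # regularization strength, alpha >> 1

# Unary term: the network's per-pixel class probabilities,
# e.g. an array of shape (H, W, num_classes) taken directly from your model.

# Pairwise potentials: favor neighboring pixels that share a class.
pairwise = np.ones((num_classes, num_classes))
np.fill_diagonal(pairwise, alpha)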
End to End NN Approach
Add a loss to your network that penalizes pixels that have neighbors of a different class. Please note that you will still end up with a tunable parameter for the weight of the new regularization loss.
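One way to realize such a penalty, as a minimal sketch (assumes TensorFlow/Keras and a softmax output of shape (batch, H, W, num_classes); the function name and weight are illustrative):

import tensorflow as tf

def smoothness_loss(probs, weight=0.1):
    # Penalize differences between horizontally and vertically neighboring
    # class-probability vectors (a total-variation-style term).
    dh = tf.reduce_mean(tf.abs(probs[:, 1:, :, :] - probs[:, :-1, :, :]))
    dw = tf.reduce_mean(tf.abs(probs[:, :, 1:, :] - probs[:, :, :-1, :]))
    return weight * (dh + dw)

This term would be added to the main segmentation loss; weight plays the same role as alpha in the CRF formulation above.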
I am reading Fit generator and data augmentation in keras, but there is still something I am not quite sure about regarding image augmentation in Keras.
(1) In datagen.flow(), we also set a batch_size. I know batch_size is needed if we do mini-batch training, so are these two batch_size values the same? I mean, if we specify batch_size in the flow() generator, are we assuming we will do mini-batch training with that same batch_size?
(2)
Let me assume the size of the training set is 10,000. I guess the only difference between model.fit_generator() and model.fit() at each epoch is that, for the former, we are using 10,000 randomly transformed images rather than the original 10,000. And in each subsequent epoch we are using another 10,000 images, totally different from those used in the first epoch, because all the images are randomly generated. Is that right?
It is like we are always using new images at each epoch, which is different from the ordinary case, where the same set of images is used at each epoch.
I am new to this area. Please help!
The 1st question: the answer is YES.
The 2nd question: yes, we are always using new images at each epoch, if we use data augmentation in model.fit_generator().
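To make the relationship concrete, a minimal sketch (Keras 2.x generator API; model, X_train, and y_train are assumed to be defined elsewhere):

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=20, horizontal_flip=True)

batch_size = 32  # this IS the mini-batch size used for training
generator = datagen.flow(X_train, y_train, batch_size=batch_size)

# Each epoch draws steps_per_epoch * batch_size freshly augmented images.
model.fit_generator(generator,
                    steps_per_epoch=len(X_train) // batch_size,
                    epochs=10)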
I modified the MNIST example, and when I train it with my 3 image classes it returns an accuracy of 91%. However, when I modify the C++ example with a deploy prototxt file and a labels file and try to test it on some images, it returns a prediction of the second class (1 circle) with a probability of 1.0 no matter what image I give it, even for images that were used in the training set. I've tried a dozen images and it consistently predicts just the one class.
To clarify things, in the C++ example that I modified, I did scale the image to be predicted, just like the images were scaled in the training stage:
img.convertTo(img, CV_32FC1);  // convert to 32-bit float, as in training
img = img * 0.00390625;        // scale by 1/256, matching the training scale
If that was the right thing to do, then it makes me wonder if I've done something wrong with the output layers that calculate probability in my deploy_arch.prototxt file.
I think you have forgotten to scale the input image at classification time, as can be seen in line 11 of the train_test.prototxt file. You should probably multiply by that factor somewhere in your C++ code, or alternatively use a Caffe layer to scale the input (look into the ELTWISE or POWER layers for this).
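For the layer-based route, a minimal sketch of a POWER layer in prototxt (the layer and blob names here are illustrative) that applies the same 1/256 scaling inside the network:

layer {
  name: "scale_input"
  type: "Power"
  bottom: "data"
  top: "scaled_data"
  power_param {
    power: 1
    scale: 0.00390625
    shift: 0
  }
}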
EDIT:
After a conversation in the comments, it turned out that the image mean was mistakenly being subtracted in the classification.cpp file whereas it was not being subtracted in the original training/testing pipeline.
Are your training classes balanced?
You may have ended up with a network stuck on predicting one majority class.
To find the issue, I suggest outputting the training predictions during training and comparing them with the predictions from the forward (classification) example on the same training images from a different class.