While training my LSTM model, I ran into a problem I couldn't solve. To begin with, let me describe the model: a stacked LSTM in PyTorch with 3 layers and 256 hidden units per layer, used to predict human joint torques and joint angles from EMG features. After training, the model predicts well when the ground truth is far from 0, but when the ground truth is near zero there is always an offset between the prediction and the ground truth. My guess is that large ground-truth values have more impact on reducing the loss during training.
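Roughly, the setup looks like this (a minimal sketch only; the class name, the EMG feature dimension, and the output dimension are placeholders, not the actual training code):

import torch
import torch.nn as nn

class EMGRegressor(nn.Module):
    # Stacked LSTM: 3 layers, 256 hidden units per layer, as described above.
    # n_features (EMG feature dimension) and n_outputs (torques/angles) are placeholders.
    def __init__(self, n_features=16, n_outputs=2, hidden=256, layers=3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, n_outputs)

    def forward(self, x):            # x: (batch, time, n_features)
        out, _ = self.lstm(x)        # out: (batch, time, hidden)
        return self.head(out)        # per-time-step predictions

model = EMGRegressor()
criterion = nn.MSELoss()             # plain MSE as an example regression loss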
This is the result:
[Figure: the prediction for the validation set]
[Figure: the prediction for the training set]
As you can see from the figures, in both datasets the model predicts well when the ground truth is above 20 degrees. I have tried different loss functions, but the situation did not improve. Since I am just a beginner in this field, I hope someone can point out the problem in my method and how to solve it. Thank you!
I'm trying to train on a custom dataset using faster_rcnn with the PyTorch implementation of Detectron here. I have made changes to the dataset and configuration according to the guidelines in the repo.
The training process runs successfully, but the loss_cls and loss_bbox values are 0 from the beginning, and even though the training completes, the final output cannot be used for evaluation or inference.
I would like to know what these two losses mean and how to get their values to change during training. The exact model I'm using is e2e_faster_rcnn_R-50-FPN_1x.
Any help regarding this would be appreciated. I'm using Ubuntu 16.04 with Python 3.6 on Anaconda, CUDA 9, cuDNN 7.
What are the two losses?
When training a multi-object detector, you usually have (at least) two types of losses:
loss_bbox: a loss that measures how "tight" the predicted bounding boxes are to the ground truth objects (usually a regression loss: L1, smooth L1, etc.).
loss_cls: a loss that measures the correctness of the classification of each predicted bounding box: each box may contain an object class, or "background". This loss is usually a cross-entropy loss.
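A rough PyTorch illustration of these two loss types (a generic sketch, not Detectron's actual implementation):

import torch
import torch.nn.functional as F

# Toy tensors: 4 predicted boxes that have been matched to ground truth.
pred_boxes  = torch.rand(4, 4)             # predicted (x1, y1, x2, y2)
gt_boxes    = torch.rand(4, 4)             # matched ground-truth boxes
pred_logits = torch.randn(4, 3)            # class scores: background + 2 object classes
gt_labels   = torch.tensor([0, 1, 2, 1])   # 0 = "background"

loss_bbox = F.smooth_l1_loss(pred_boxes, gt_boxes)    # how "tight" the boxes are
loss_cls  = F.cross_entropy(pred_logits, gt_labels)   # correctness of the classification
loss      = loss_cls + loss_bbox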
Why are the losses always zero?
When training a detector, the model predicts quite a few (~1K) possible boxes per image. Most of them are empty (i.e. belong to the "background" class). The loss function associates each of the predicted boxes with the ground truth box annotations of the image.
If a predicted box has a significant overlap with a ground truth box then loss_bbox and loss_cls are computed to see how well the model is able to predict the ground truth box.
On the other hand, if a predicted box has no overlap with any ground truth box, then only loss_cls is computed, for the "background" class.
However, if a predicted box has only a very partial overlap with the ground truth, it is "discarded" and no loss is computed. I suspect that, for some reason, this is the case for your training session.
I suggest you check the parameters that determine the association between predicted boxes and ground truth annotations. Moreover, look at the parameters of your "anchors": these parameters determine the scales and aspect ratios of the predicted boxes.
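To make that association concrete, here is a toy sketch of the usual IoU-threshold logic (the 0.5/0.3 thresholds are illustrative defaults, not Detectron's exact configuration):

def iou(a, b):
    # a, b: boxes given as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def assign(pred_box, gt_boxes, fg_thresh=0.5, bg_thresh=0.3):
    best = max((iou(pred_box, g) for g in gt_boxes), default=0.0)
    if best >= fg_thresh:
        return "foreground"   # contributes to both loss_cls and loss_bbox
    if best < bg_thresh:
        return "background"   # contributes to loss_cls only
    return "ignored"          # partial overlap: no loss is computed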
I'm using naive Bayes for text classification and I have 100k records, of which 88k are positive-class records and 12k are negative-class records. I converted sentences to unigrams and bigrams using CountVectorizer, took 50 alpha values in the range [0, 10], and drew the plot.
With Laplace (additive) smoothing, if I keep increasing the alpha value then the accuracy on the cross-validation dataset also keeps increasing. My question is: is this trend expected or not?
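For context, the experiment looks roughly like this (a sketch only; the placeholder corpus and the exact alpha grid are assumptions):

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

texts  = ["good product", "bad service"] * 50   # placeholder for the real 100k records
labels = [1, 0] * 50                            # 1 = positive, 0 = negative

X = CountVectorizer(ngram_range=(1, 2)).fit_transform(texts)   # unigrams + bigrams

alphas = np.linspace(0.01, 10, 50)
cv_acc = [cross_val_score(MultinomialNB(alpha=a), X, labels, cv=5).mean() for a in alphas]
# plotting alphas vs. cv_acc gives the curve described above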
If you keep increasing the alpha value, the naive Bayes model will be biased towards the class that has more records and it becomes a dumb model (underfitting), so choosing a small alpha value is a good idea.
You have 88k positive points and 12k negative points, which means you have an unbalanced dataset.
You can add more negative points to balance the dataset: you can clone or replicate your negative points, which is called upsampling. After that, your dataset is balanced and you can apply naive Bayes with alpha and it will work properly; your model is no longer a dumb model. Earlier your model was dumb, and that is why increasing alpha increased your accuracy.
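A minimal sketch of that upsampling idea with sklearn (the column names and the tiny placeholder frame are assumptions):

import pandas as pd
from sklearn.utils import resample

# placeholder standing in for the real 88k positive / 12k negative dataset
df = pd.DataFrame({"text": ["good"] * 88 + ["bad"] * 12,
                   "label": [1] * 88 + [0] * 12})

pos = df[df.label == 1]
neg = df[df.label == 0]

# replicate (upsample) the minority class until both classes have the same size
neg_up = resample(neg, replace=True, n_samples=len(pos), random_state=42)
balanced = pd.concat([pos, neg_up]).sample(frac=1, random_state=42)   # shuffle rows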
I am currently using sklearn's LogisticRegression to work on a synthetic 2D problem. The dataset is shown below:
I'm basically plugging the data into sklearn's model, and this is what I'm getting (the light green; disregard the dark green):
The code for this is only two lines: model = LogisticRegression(); model.fit(tr_data, tr_labels). I've checked the plotting function; that's fine as well. I'm using no regularizer (should that affect it?)
It seems really strange to me that the boundaries behave in this way. Intuitively I feel they should be more diagonal, as the data is (mostly) located top-right and bottom-left, and from testing some things out it seems a few stray datapoints are what's causing the boundaries to behave in this manner.
For example, here's another dataset and its boundaries:
Would anyone know what might be causing this? From my understanding Logistic Regression shouldn't be this sensitive to outliers.
Your model is overfitting the data (the decision regions it found indeed perform better on the training set than the diagonal line you would expect).
The loss is optimal when all the data is classified correctly with probability 1. The distances to the decision boundary enter in the probability computation. The unregularized algorithm can use large weights to make the decision region very sharp, so in your example it finds an optimal solution, where (some of) the outliers are classified correctly.
With stronger regularization you prevent that, and the distances play a bigger role. Try different values for the inverse regularization strength C, e.g.
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(C=0.1)   # smaller C = stronger regularization
model.fit(tr_data, tr_labels)
Note: the default value C=1.0 already corresponds to a regularized version of logistic regression.
Let us further qualify why logistic regression overfits here: after all, there are just a few outliers, but hundreds of other data points. To see why, it helps to note that
logistic loss is kind of a smoothed version of hinge loss (used in SVM).
SVM does not 'care' about samples on the correct side of the margin at all: as long as they do not cross the margin, they inflict zero cost. Since logistic loss is a smoothed version of the hinge loss, the far-away samples do inflict a cost, but it is negligible compared to the cost inflicted by samples near the decision boundary.
So, unlike e.g. Linear Discriminant Analysis, samples close to the decision boundary have disproportionately more impact on the solution than far-away samples.
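To see the "smoothed hinge" point numerically, here is a small sketch comparing the two losses as a function of the signed margin:

import numpy as np

margins  = np.linspace(-3, 5, 9)            # y * (w.x + b), the signed margin
hinge    = np.maximum(0.0, 1.0 - margins)   # SVM: exactly zero beyond the margin
logistic = np.log1p(np.exp(-margins))       # logistic loss: small but never zero far away

for m, h, l in zip(margins, hinge, logistic):
    print(f"margin={m:5.1f}  hinge={h:5.2f}  logistic={l:6.3f}")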
I am trying to implement a shape detector in TensorFlow. For this, I have two classes: one is only vertical rectangles and the other is only right arrows, like the following images.
Training is done with 190 samples for each class.
I trained the model two times, one without ZCA whitened training data and another one with ZCA whitened training data, with the same network architecture and the same number of iterations.
When the following image of a down arrow is tested with the first model, it is predicted as a rectangle with 99.99 percent confidence, but when the same image is tested with the second model (trained on the ZCA-whitened samples), it is predicted as an arrow with 100 percent confidence.
I want to know how ZCA whitening changed the prediction that drastically, even though no data augmentation (such as rotation) was used for training.
Any kind of help would be greatly appreciated.
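For reference, the ZCA whitening step looks roughly like this (a NumPy sketch with a placeholder image size; Keras also exposes the same preprocessing via the zca_whitening flag of ImageDataGenerator):

import numpy as np

def zca_whiten(X, eps=1e-5):
    # X: (n_samples, n_pixels) flattened training images
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = np.cov(Xc, rowvar=False)                    # pixel covariance
    U, S, _ = np.linalg.svd(cov)                      # eigendecomposition of the covariance
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T     # ZCA whitening matrix
    return Xc @ W, mean, W                            # reuse mean and W on test images

X_train = np.random.rand(380, 28 * 28)                # placeholder for the 2 x 190 samples
X_white, mean, W = zca_whiten(X_train)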
I am using One-Class SVM for outlier detection. It appears that as the number of training samples increases, the sensitivity TP/(TP+FN) of the One-Class SVM detection results drops, while the classification rate and specificity both increase.
What's the best way of explaining this relationship in terms of hyperplane and support vectors?
Thanks
The more training examples you have, the less your classifier is able to detect true positives correctly.
It means that the new data does not fit correctly with the model you are training.
Here is a simple example.
Below you have two classes, and we can easily separate them using a linear kernel.
The sensitivity of the blue class is 1.
As I add more yellow training data near the decision boundary, the generated hyperplane can't fit the data as well as before.
As a consequence, we now see that there are two misclassified blue data points.
The sensitivity of the blue class is now 0.92.
As the number of training data points increases, the support vectors generate a somewhat less optimal hyperplane. Perhaps, because of the extra data, a linearly separable dataset becomes non-linearly separable. In such a case, trying a different kernel, such as an RBF kernel, can help.
EDIT: more information about the RBF kernel:
In this video you can see what happens with an RBF kernel.
The same logic applies: if the training data is not easily separable in n dimensions, you will get worse results.
You should try to select a better C using cross-validation.
In this paper, Figure 3 illustrates that the results can be worse if C is not properly selected:
"More training data could hurt if we did not pick a proper C. We need to cross-validate on the correct C to produce good results."
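As a rough illustration of that cross-validation advice (using a supervised SVC on placeholder data, since C is an SVC parameter; for OneClassSVM the analogous knobs are nu and gamma):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)   # placeholder data

grid = GridSearchCV(SVC(kernel="rbf"),
                    param_grid={"C": [0.1, 1, 10, 100],
                                "gamma": ["scale", 0.01, 0.1, 1]},
                    cv=5,
                    scoring="recall")        # recall is the sensitivity TP/(TP+FN)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)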