Time series classification and prediction of sensor data from a CAN bus

I am trying a time series classification task where I have to predict/classify a driving scenario in advance, making sure the predictions are accurate and also measuring how far ahead of time (or how delayed) each prediction is made. I am working with an LSTM model, but it overfits badly: the validation accuracy curves do not change, and the confusion matrix shows only one label being detected out of my 3 labels. I tried SMOTE and sklearn class weights, but nothing improved. My question is: does data augmentation exist for time series data too? Hints for the other problems would also be appreciated.
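Yes, data augmentation exists for time series as well. Below is a minimal sketch of two common transforms (jittering and magnitude scaling) applied to windowed sensor data; the array shapes, noise levels, and label handling are illustrative assumptions, not taken from the question.

```python
import numpy as np

def jitter(x, sigma=0.03):
    # Add small Gaussian noise to every timestep and channel.
    return x + np.random.normal(0.0, sigma, size=x.shape)

def scale(x, sigma=0.1):
    # Multiply each window/channel by a random factor near 1,
    # simulating sensor gain variation.
    factors = np.random.normal(1.0, sigma, size=(x.shape[0], 1, x.shape[2]))
    return x * factors

# Hypothetical batch of CAN-bus windows: 32 windows, 100 timesteps, 8 signals.
windows = np.random.randn(32, 100, 8)
labels = np.random.randint(0, 3, 32)

# Augmented copies keep the original labels.
X_aug = np.concatenate([windows, jitter(windows), scale(windows)], axis=0)
y_aug = np.concatenate([labels, labels, labels], axis=0)
```

Window slicing and time warping are other common options; augmenting only the minority classes is an alternative to SMOTE for the imbalance problem.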

Related

Overfitting and data augmentation in Random Forest prediction

I want to build a prediction model using Random Forest, but it overfits. I adjusted various parameters because of the overfitting, but there was no big change.
When I looked into the reason, I found an Internet post saying it could be caused by a small amount of data (1,000 samples). As you know, in image classification, data augmentation increases the amount of data by gradually transforming the shape and angle of the images.
Can the amount of data be increased like that for prediction problems too? We copied the entire dataset, making about three times as much data (three thousand samples), and this prevented the overfitting and increased accuracy.
But I'm not sure whether this is the right approach in terms of data science, which is why I'm asking.
In addition to these methods, I would like to ask how to avoid overfitting in prediction problems, or how to increase the amount of data.
Thank you!
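As an aside, exact copies of rows generally add no new information, and if duplicates end up on both sides of a train/test split the accuracy gain is illusory. Constraining tree complexity is the more standard remedy; here is a minimal scikit-learn sketch on synthetic data, with illustrative parameter values only:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a small (~1,000-row) dataset.
X, y = make_regression(n_samples=1000, n_features=20, noise=10.0, random_state=0)

# Shallower trees and larger leaves act as regularization.
model = RandomForestRegressor(
    n_estimators=300,
    max_depth=8,           # limit tree depth
    min_samples_leaf=10,   # require more samples per leaf
    max_features="sqrt",   # decorrelate the trees
    random_state=0,
)

# Cross-validation gives a less optimistic estimate than a single split.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```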

LSTM cannot predict well when the ground truth is near zero

While training the LSTM model, I encountered a problem I couldn't solve. To begin with, let me describe my model: I used a stacked LSTM model in PyTorch with 3 layers and 256 hidden units per layer to predict human joint torques and joint angles from EMG features. After training, the model predicts well when the ground truth is far from zero, but when the ground truth is near zero there is always an offset between the predicted value and the ground truth. My guess is that large ground-truth values have more impact on reducing the loss function during training.
Here are the results:
[Figure: predictions vs. ground truth on the validation set]
[Figure: predictions vs. ground truth on the training set]
As you can see from the figures, in both datasets the model predicts well when the ground truth is above 20 degrees. I have tried different loss functions, but the situation did not improve. Since I am just a beginner in this field, I hope someone can point out the problem in my method and how to solve it. Thank you!
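One way to test the guess that large targets dominate the loss is to reweight errors by target magnitude (or, alternatively, to standardize the targets). A minimal PyTorch sketch of such a weighted MSE; the weighting scheme and the numbers are assumptions for illustration, not the asker's setup:

```python
import torch

def magnitude_balanced_mse(pred, target, eps=1.0):
    # Weight each sample inversely to the target magnitude so that
    # errors near zero contribute as much as errors at large angles.
    weights = 1.0 / (target.abs() + eps)
    weights = weights / weights.mean()  # keep the overall loss scale comparable
    return (weights * (pred - target) ** 2).mean()

# Hypothetical joint-angle targets in degrees.
pred = torch.tensor([0.5, 1.0, 22.0, 30.0])
target = torch.tensor([0.0, 2.0, 20.0, 31.0])
print(magnitude_balanced_mse(pred, target))
```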

Keras accuracy plots flat while loss plots not flat

I am doing deep learning using a multi-layer perceptron for regression. The loss curve turns flat in the third epoch, yet the accuracy curve remains flat from the beginning. I wonder whether this makes sense.
Since you didn't provide the code, it is harder to narrow down the problem. That being said, here are some pointers that might help (a sketch addressing the first two follows below):
Your validation set is either small or a bad representation of your training set. (Bear in mind that if you use validation_split in the fit function, Keras takes only the last percentage of your training set and keeps it the same for all epochs.)
You are not using any regularization (dropout, weight regularization, constraints).
The model could be too small (in layers and neurons), so it is underfitting.
Hope these pointers help you with your problem.
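A minimal Keras sketch combining the first two pointers, an explicitly shuffled validation split and dropout, on placeholder regression data (layer sizes are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder regression data.
X = np.random.randn(1000, 20)
y = X.sum(axis=1) + 0.1 * np.random.randn(1000)

# Shuffle the split yourself instead of relying on validation_split,
# which always takes the last slice of the training data.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,)),
    layers.Dropout(0.3),  # regularization
    layers.Dense(64, activation="relu"),
    layers.Dense(1),
])
# For regression, track an error metric such as MAE rather than accuracy.
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X_tr, y_tr, validation_data=(X_val, y_val), epochs=20, batch_size=32)
```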

How to judge whether model is overfitting or not

I am doing video classification with a model combining a CNN and an LSTM.
On the training data the accuracy is 100%, but the accuracy on the test data is not nearly as good.
The amount of training data is small, about 50 samples per class.
In such a case, can I conclude that overfitting is occurring?
Or is there another cause?
Most likely you are indeed overfitting if the performance of your model is perfect on the training data yet poor on the test/validation set.
A good way of observing that effect is to evaluate your model on both training and validation data after each epoch of training. You may observe that while you train, the performance on your validation set improves initially and then starts to decrease. That is the moment your model starts to overfit, and where you can interrupt your training (automated in the sketch below).
[Figure: errors on the training set (blue) and validation set (red) over training epochs, demonstrating this phenomenon]
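In Keras, this watch-the-validation-set procedure can be automated with an EarlyStopping callback; a minimal sketch on placeholder data (model size and patience are illustrative):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder 3-class data.
X = np.random.randn(500, 10)
y = keras.utils.to_categorical(np.random.randint(0, 3, 500), 3)

model = keras.Sequential([
    layers.Dense(32, activation="relu", input_shape=(10,)),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Stop when validation loss stops improving and roll back to the best
# epoch, i.e. interrupt training where overfitting begins.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)
model.fit(X, y, validation_split=0.2, epochs=100, callbacks=[early_stop])
```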

Overfitting in convolutional neural network

I was applying a CNN for classification of hand gestures. I have 10 gestures and 100 images for each gesture. The model I constructed gives around 97% accuracy on the training data and 89% accuracy on the test data. Can I say that my model is overfitted, or is it acceptable to have such an accuracy graph (shown below)?
Add more data to the training set.
When your training set is large and covers all kinds of instances, even an overfitted model will generalize well.
Example: say you want to detect just one gesture, 'thumbs-up' (a binary classification problem), and you have created a positive training set of around 1,000 images in which the images are rotated, translated, scaled, in different colors, at different angles, with varied viewpoints and cluttered backgrounds, and so on (a sketch for generating such variants follows after this answer). If your training accuracy is 99%, your test accuracy will also be somewhere close.
Because the training set is big enough to cover all instances of the positive class, even if the model is overfitted it will perform well on the test set, since the test instances will only be slight variations of the training instances.
In your case your model is good, but if you can add some more data, you will get even better accuracy.
What kind of data should you add?
Manually go through the test samples the model got wrong and look for patterns. If you can figure out what kinds of samples are failing, add samples of that kind to your training set and retrain.
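Variants like the ones described (rotated, translated, scaled, color-shifted) can be generated automatically during training; a minimal Keras sketch, with illustrative parameter ranges and a hypothetical directory layout:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Generate rotated, shifted, zoomed and flipped variants of each
# gesture image on the fly during training.
datagen = ImageDataGenerator(
    rotation_range=15,      # degrees
    width_shift_range=0.1,  # fraction of image width
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    rescale=1.0 / 255,
)

# Hypothetical usage, assuming one folder per gesture class:
# train_gen = datagen.flow_from_directory("gestures/train",
#                                         target_size=(64, 64), batch_size=32)
# model.fit(train_gen, epochs=30)
```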
