I'm working on the MNIST dataset and was wondering about processing time. I've run KMeans together with a NN classifier and got a processing time of around 116 s. I didn't get to implement this yet, but if I now run a KNN algorithm within the respective clusters, will I get a slower or faster processing time, and why?
My thinking is that since we increase the number of neighbors, the processing time will be much faster, but the error rate will increase. Am I wrong?
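For concreteness, here is a rough sketch of the comparison as I understand it, using scikit-learn; the post doesn't say which implementation was used or exactly how KMeans and the NN classifier were combined, so both the baseline and the cluster-partitioned variant below are assumptions:

```python
# Sketch: time plain KNN against "KMeans partition + KNN inside the
# assigned cluster" on MNIST. scikit-learn is assumed; the numbers of
# clusters and neighbors are illustrative, not the original setup.
import time
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=10000)

# Baseline: brute-force KNN over the whole training set.
t0 = time.time()
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
acc_full = knn.score(X_test, y_test)
print(f"full KNN:    {time.time() - t0:.1f}s, acc={acc_full:.3f}")

# Partitioned variant: assign each test point to its nearest KMeans
# cluster, then run KNN only against the training points of that cluster.
t0 = time.time()
km = KMeans(n_clusters=50, n_init=1).fit(X_train)
train_assign = km.labels_
test_assign = km.predict(X_test)
correct = 0
for c in np.unique(test_assign):
    mask_tr = train_assign == c
    mask_te = test_assign == c
    if mask_tr.sum() == 0:
        continue
    local = KNeighborsClassifier(n_neighbors=int(min(5, mask_tr.sum())))
    local.fit(X_train[mask_tr], y_train[mask_tr])
    correct += (local.predict(X_test[mask_te]) == y_test[mask_te]).sum()
print(f"KMeans+KNN:  {time.time() - t0:.1f}s, acc={correct / len(y_test):.3f}")
```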
I was trying to train an emotion recognition model on the fer2013 dataset using the architecture proposed in this paper.
The paper uses a different dataset than mine, so I made some modifications to the stride and filter size.
After a couple of hours of training, the accuracy on both the training and test set suddenly drops.
After that, the accuracy just stays around 0.1-0.2 for both sets and never improves.
Does anybody know about this phenomenon?
In any neural network training, if both accuracies, i.e. training and validation, improve at first and then start decreasing, it is a sign that your network is failing to converge. More precisely, your optimizer has started overshooting.
The most likely reason for this is a high learning rate. Reduce your learning rate and then check your example again. Also, at least at first glance, I couldn't see a learning rate mentioned in your linked paper. Since your data is different from the paper's, the same learning rate might not work as well.
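As a hedged illustration of what "reduce your learning rate" could look like in practice (Keras/TensorFlow and all the concrete numbers below are my assumptions, not the paper's settings):

```python
# Sketch: training with a lower, decaying learning rate. The starting
# value of 1e-4 and the decay schedule are assumptions, not taken from
# the paper or the question.
import tensorflow as tf

# Stand-in for the CNN from the paper: any model with a softmax head works.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(48, 48, 1)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(7, activation="softmax"),  # 7 fer2013 classes
])

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,  # start an order of magnitude lower than before
    decay_steps=10_000,
    decay_rate=0.9,
)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```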
I am working on an anomaly detection problem and I need your help and expertise. I have a sensor that records episodic time series data. For example, once in a while, the sensor activates for 10 seconds and records values at millisecond intervals. My task is to identify whether the recorded pattern is abnormal. In other words, I need to detect anomalies in that pattern compared to the other recorded patterns.
What would be the state-of-the-art approaches to that?
After doing my own research, the following methods have proven to work very well in practice:
Variational Inference for On-line Anomaly Detection in High-Dimensional Time Series
Multivariate Industrial Time Series with Cyber-Attack Simulation: Fault Detection Using an LSTM-based Predictive Data Model
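To give a rough idea of the LSTM-based predictive approach in the second paper: train a model on episodes assumed to be normal to predict the next sample, then score new episodes by their prediction error. The sketch below uses Keras; the layer sizes, window length, toy data, and thresholding rule are all my own assumptions, not the paper's.

```python
# Sketch of an LSTM predictive model for anomaly scoring: learn to predict
# the next sample of a "normal" episode, then flag episodes whose
# prediction error is unusually large.
import numpy as np
import tensorflow as tf

WINDOW = 100  # number of past samples used to predict the next one

def make_windows(episode):
    """Slice one recorded episode (1-D array) into (input, target) pairs."""
    X = np.stack([episode[i:i + WINDOW] for i in range(len(episode) - WINDOW)])
    y = episode[WINDOW:]
    return X[..., None], y

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(WINDOW, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Train only on episodes known (or assumed) to be normal.
normal_episodes = [np.sin(np.linspace(0, 20, 10_000)) for _ in range(5)]  # toy data
X_train = np.concatenate([make_windows(e)[0] for e in normal_episodes])
y_train = np.concatenate([make_windows(e)[1] for e in normal_episodes])
model.fit(X_train, y_train, epochs=2, batch_size=256, verbose=0)

def anomaly_score(episode):
    """Mean squared prediction error over one episode."""
    X, y = make_windows(episode)
    return float(np.mean((model.predict(X, verbose=0).ravel() - y) ** 2))

# An episode is flagged when its score exceeds e.g. a high percentile of the
# scores observed on normal episodes (the exact threshold is an assumption).
```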
I'm training a neural network in TensorFlow (using tflearn) on data that I generate. From what I can tell, each epoch uses all of the training data. Since I can control how many examples I generate, it seems like it would be best to just generate more training data until one epoch is enough to train the network.
So my question is: Is there any downside to only using one epoch, assuming I have enough training data? Am I correct in assuming that 1 epoch of a million examples is better than 10 epochs of 100,000?
Following a discussion with #Prune:
Suppose you have the possibility to generate an infinite number of labeled examples, sampled from a fixed underlying probability distribution, i.e. from the same manifold.
The more examples the network sees, the better it will learn, and especially the better it will generalize. Ideally, if you train it long enough, it could reach 100% accuracy on this specific task.
The conclusion is that only running 1 epoch is fine, as long as the examples are sampled from the same distribution.
The limitations of this strategy could be:
if you need to store the generated examples, you might run out of memory
to handle unbalanced classes (cf. #jorgemf's answer), you just need to sample the same number of examples for each class;
e.g. if you have two classes, with a 10% chance of sampling the first one, you should create batches of examples with a 50% / 50% distribution (see the sketch after this list)
it's possible that running multiple epochs might make the network learn some uncommon cases better;
I disagree: reusing the same example multiple times is always worse than generating new, unseen examples. However, you might want to generate harder and harder examples over time to make your network better on uncommon cases.
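A minimal sketch of that 50% / 50% batch construction, assuming a plain Python generator and made-up class probabilities:

```python
# Sketch: drawing balanced batches from an imbalanced generator.
# The 10% / 90% class split and the batch size are illustrative only.
import numpy as np

def sample_example(rng):
    """Stand-in for the data generator: class 0 comes up only 10% of the time."""
    label = 0 if rng.random() < 0.1 else 1
    features = rng.normal(loc=label, size=4)
    return features, label

def balanced_batch(rng, batch_size=64, n_classes=2):
    """Keep sampling until each class has batch_size / n_classes examples."""
    per_class = batch_size // n_classes
    buckets = {c: [] for c in range(n_classes)}
    while any(len(b) < per_class for b in buckets.values()):
        x, y = sample_example(rng)
        if len(buckets[y]) < per_class:
            buckets[y].append(x)
    X = np.array([x for b in buckets.values() for x in b])
    y = np.array([c for c, b in buckets.items() for _ in b])
    return X, y

rng = np.random.default_rng(0)
X, y = balanced_batch(rng)
print(np.bincount(y))  # -> [32 32]
```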
You need training examples in order to make the network learn. Usually you don't have enough examples to make the network converge, so you need to run more than one epoch.
It is OK to use only one epoch if you have that many examples and they are similar. If you have 100 classes but some of them have only very few examples, you are not going to learn those classes with only one epoch. So you need balanced classes.
Moreover, it is a good idea to have a variable learning rate which decreases with the number of examples, so the network can fine-tune itself. It starts with a high learning rate and then decreases over time; if you only run for one epoch, you need to bear this in mind when tweaking the graph.
My suggestion is to run more than one epoch, mostly because the more examples you have, the more memory you need to store them. But if memory is fine and the learning rate is adjusted based on the number of examples rather than the number of epochs, then it is fine to run one epoch.
Edit: I am assuming you are using a learning algorithm which updates the weights of the network every batch or similar.
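A sketch of what "adjust the learning rate based on the number of examples rather than epochs" could look like; the constants and the halving schedule are purely illustrative assumptions:

```python
# Sketch: a learning rate keyed to the number of examples seen rather than
# to the epoch counter, so it behaves the same whether you make one pass
# over 1M generated examples or 10 passes over 100k of them.
def learning_rate(examples_seen,
                  base_lr=0.01,
                  decay_rate=0.5,
                  decay_every=200_000):
    """Halve the learning rate every `decay_every` examples."""
    return base_lr * decay_rate ** (examples_seen // decay_every)

# e.g. inside the training loop:
# for step, (x_batch, y_batch) in enumerate(batches):
#     lr = learning_rate(step * batch_size)
#     optimizer.learning_rate = lr  # however your framework exposes it
```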
I am writing my own neural net library with backpropagation, using GPU computing.
I want to make it universal, so that I don't have to check whether the training set fits into GPU memory.
How do you train a neural net when the training set is too large to fit in GPU memory?
I assume that it fits in the RAM of the host.
Do I have to do the training iteration on the first piece, then deallocate it on the device, send the second piece to the device and train on that, and so on ...
and then sum up the gradient results?
Isn't that too slow, when I have to push all the data through the PCIe bus?
Do you have a better idea?
Use minibatch gradient descent: in a loop,
send a batch of samples to the GPU
compute error, backprop gradient
adjust parameters.
Repeat this loop several times until the network converges.
This is not exactly equivalent to the naive batch learning algorithm (batch gradient descent); in fact, it usually converges faster than batch learning. It helps if you randomly shuffle the samples before each pass over the data. So you still have the memory transfers, but you don't need as many iterations and the algorithm will run faster.
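A minimal sketch of that loop, written in PyTorch only to make it concrete (in your own library the per-batch copy to the device corresponds to the host-to-GPU transfer over PCIe); the model, data sizes, and hyperparameters are placeholders:

```python
# Sketch: minibatch gradient descent where the full dataset stays in host
# RAM and only one batch at a time is copied to the GPU.
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for the full training set held in host RAM.
X = torch.randn(100_000, 784)
y = torch.randint(0, 10, (100_000,))
batch_size = 256

for epoch in range(5):
    perm = torch.randperm(len(X))          # shuffle before each pass
    for i in range(0, len(X), batch_size):
        idx = perm[i:i + batch_size]
        xb = X[idx].to(device)             # send only this batch to the GPU
        yb = y[idx].to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)      # compute error
        loss.backward()                    # backprop gradient for this batch
        optimizer.step()                   # adjust parameters immediately
```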
I wrote an image processing program that trains a classifier to recognize an object in the image. Now I want to test the response of my algorithm to noise; I would like the algorithm to have some robustness to noise.
My question is: should I train the classifier on a noisy version of the training dataset, or train it on the original dataset and then evaluate its performance on noisy data?
Thank you.
To show the robustness of a classifier, one might use highly noisy test data on the originally trained classifier. Depending on that performance, one can train again using noisy data and then test again. Obviously, for application development, if including extremely noisy samples increases accuracy, then that's the way to go. The literature says to have as large a range of training samples as possible; however, this sometimes degrades performance in specific cases.
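A small sketch of both evaluations side by side; Gaussian noise, the scikit-learn digits dataset, and an SVM are all my own assumptions here, chosen only to illustrate the protocol:

```python
# Sketch: test a classifier trained on clean data against increasingly
# noisy test sets, then retrain with noise augmentation and compare.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clean_clf = SVC().fit(X_train, y_train)
for sigma in (0.0, 0.5, 1.0, 2.0):
    noisy_test = X_test + np.random.normal(0, sigma, X_test.shape)
    print(f"clean-trained, sigma={sigma}: {clean_clf.score(noisy_test, y_test):.3f}")

# Retrain with noisy copies added to the training set (noise augmentation).
X_aug = np.vstack([X_train, X_train + np.random.normal(0, 1.0, X_train.shape)])
y_aug = np.concatenate([y_train, y_train])
aug_clf = SVC().fit(X_aug, y_aug)
for sigma in (0.0, 0.5, 1.0, 2.0):
    noisy_test = X_test + np.random.normal(0, sigma, X_test.shape)
    print(f"noise-trained, sigma={sigma}: {aug_clf.score(noisy_test, y_test):.3f}")
```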