Classifying spikes of different periodicity in data - signal-processing

I am trying to classify spikes in real data. There are mostly two classes of spikes in the data, which have slightly different frequencies and also different amplitudes.
Annoyingly, the frequency of each class is not fixed: it may vary a little randomly, or it might even suddenly increase or decrease, by (I hope at most) a factor of around 2. The amplitude might change as well, so it is hard to tell which spike belongs to which class.
A plot of the data and detected spikes:
[plot: detected spikes]
As you might see, it can happen that a spike is hiding behind another spike, and sometimes noise can be detected as a peak.
Do you have any idea how to approach this? Might autocorrelation work, even though the frequency can change a little over time?
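One way to make this concrete is a rough sketch (SciPy and scikit-learn assumed, with a 1-D NumPy array signal sampled at fs Hz; the prominence threshold is a placeholder to tune): detect the peaks, then cluster them on amplitude together with the local inter-spike interval, which tolerates slow frequency drift better than a single global autocorrelation.

```python
import numpy as np
from scipy.signal import find_peaks
from sklearn.cluster import KMeans

def classify_spikes(signal, fs):
    """Detect peaks, then split them into two classes using amplitude
    and the local inter-spike interval as features."""
    # candidate spikes; the prominence threshold is a placeholder to tune
    peaks, _ = find_peaks(signal, prominence=1.0)
    amplitudes = signal[peaks]

    # interval (in seconds) to the previous detected spike;
    # repeat the first value so every spike gets a feature
    intervals = np.diff(peaks) / fs
    intervals = np.insert(intervals, 0, intervals[0])

    # normalise both features before clustering into two groups
    feats = np.column_stack([amplitudes, intervals])
    feats = (feats - feats.mean(axis=0)) / feats.std(axis=0)
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(feats)
    return peaks, labels
```

A spike hidden behind another one then tends to show up as an interval roughly twice the usual spacing for its class, which you can flag after clustering.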

Related

using LSTM on time series with different intervals

I want to build a classifier to classify time series. For each point in a time series there are multiple features and a timestamp. Sometimes there is 1 second between two points, but sometimes there could be 1 minute between timestamps.
I thought of giving the time elapsed since the previous point as a feature.
Can an LSTM handle that?
Ultimately I think you are going to have to play with the data and see what works for your particular problem, but here are some thoughts:
I have done something similar. My data contained regular gaps during part of the day, and providing the time of day as a feature proved to be beneficial; however, in that case it was likely useful in more ways than just adjusting for the gaps.
If the size of the gap to the previous timestamp contains information that is useful to the network, then definitely include it. If the gap is because there is data missing, then that might not be very useful, but it's worth a try.
If the data at each point is statistically similar regardless of the size of the gap then you may be able to simply feed them in as if there are no gaps.
If the gaps are causing the data to be non-stationary, then that could make it harder for the network to learn. Which comes back to your question: can providing the gap size let the network correct for the non-stationary nature of the time series? It is possible, but probably not ideal.
You might also want to try interpolation to fill in the missing gaps, and re-sampling the data to the level of granularity that is actually important for your prediction.
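As a minimal sketch of the "gap as a feature" idea (PyTorch assumed; the model name, the log1p scaling of the gap and the hidden size are illustrative choices, not anything prescribed by the question):

```python
import torch
import torch.nn as nn

# Append the gap to the previous timestamp as an extra input feature,
# so the LSTM sees (features..., delta_t) at every step.
# timestamps: (batch, seq_len) in seconds; features: (batch, seq_len, n_feat).

def add_time_gap_feature(features, timestamps):
    delta_t = timestamps.diff(dim=1, prepend=timestamps[:, :1])  # gap to previous point
    delta_t = torch.log1p(delta_t)                               # compress 1 s vs 60 s gaps
    return torch.cat([features, delta_t.unsqueeze(-1)], dim=-1)

class GapAwareLSTM(nn.Module):
    def __init__(self, n_feat, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_feat + 1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, features, timestamps):
        x = add_time_gap_feature(features, timestamps)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1])          # classify the whole series
```

Whether the raw gap, a log-scaled gap, or the absolute time of day works best is exactly the kind of thing you have to try on your own data.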

For car detection, should the negative samples be the same size as the positive samples?

I adjusted all positive samples to be the same size, so should the negative samples also be the same size as the positive ones?
Generally, with object detection, you are sliding a search window of a fixed size across your image, producing feature responses. The classifier then compares the responses to a trained model and reports the proximity of the two. We are relying on the fact that the same kind of objects will produce similar feature responses. For this reason you want your positive data to be of the same size in each sliding window, otherwise the responses will be different and you won't get good matches.
When you are training on the negative data, you are giving the classifier examples of responses which generally won't have anything in common; this is how the algorithm learns to partition your data. It doesn't really matter what size your images are, because you will be using the same sliding window. What matters is the data captured by that window - it should represent the data you will use at runtime. What I mean is that the sliding window should not contain either too much or too little detail. You don't really want to take a full-landscape photo, reduce it to 320x240 and then train on it. Your sliding window will capture too much information. The same goes for taking a smaller subset of a scene and blowing it up to 1280x960. Now there's too little information.
With all that said, however, things are more complicated and simpler at the same time in the real world. You will encounter objects of different sizes; therefore you need to be able to handle them at different scales. So your classifier should be searching across multiple scales, thus making image sizes irrelevant. Remember, it's what's within the sliding window that counts. And: garbage in = garbage out. Make sure your data looks good.
Edit: http://docs.opencv.org/2.4/doc/user_guide/ug_traincascade.html
But each image should be (though not necessarily) larger than the training window size, because these images are used to subsample negative images down to the training size.
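To make the "fixed window, multiple scales" point concrete, here is a rough sketch (OpenCV and NumPy assumed; the 64x64 window, stride and scale list are arbitrary, and the classifier/hog names in the usage comment are hypothetical):

```python
import cv2

WIN_W, WIN_H = 64, 64          # fixed training-window size (illustrative)
SCALES = [1.0, 0.75, 0.5]      # search an image pyramid, not one fixed size
STEP = 16                      # sliding-window stride in pixels

def sliding_windows(image):
    """Yield (scale, x, y, patch) for a fixed-size window slid over several scales."""
    for scale in SCALES:
        resized = cv2.resize(image, None, fx=scale, fy=scale)
        h, w = resized.shape[:2]
        for y in range(0, h - WIN_H + 1, STEP):
            for x in range(0, w - WIN_W + 1, STEP):
                yield scale, x, y, resized[y:y + WIN_H, x:x + WIN_W]

# usage: every patch has the same size, so positives and negatives are compared
# through an identical window regardless of the source image size
# image = cv2.imread("frame.jpg")
# for scale, x, y, patch in sliding_windows(image):
#     score = classifier.predict(hog(patch))   # hypothetical classifier / feature step
```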

Will a larger batch size reduce computation time in machine learning?

I am trying to tune a hyperparameter, the batch size, for a CNN. I have a Core i7 machine with 12 GB of RAM, and I am training a CNN on the CIFAR-10 dataset, which can be found in this blog. First, here is what I have read and learnt about batch size in machine learning:
Let's first suppose that we're doing online learning, i.e. that we're using a mini-batch size of 1. The obvious worry about online learning is that using mini-batches which contain just a single training example will cause significant errors in our estimate of the gradient. In fact, though, the errors turn out to not be such a problem. The reason is that the individual gradient estimates don't need to be super-accurate. All we need is an estimate accurate enough that our cost function tends to keep decreasing. It's as though you are trying to get to the North Magnetic Pole, but have a wonky compass that's 10-20 degrees off each time you look at it. Provided you stop to check the compass frequently, and the compass gets the direction right on average, you'll end up at the North Magnetic Pole just fine.
Based on this argument, it sounds as though we should use online learning. In fact, the situation turns out to be more complicated than that. As we know, we can use matrix techniques to compute the gradient update for all examples in a mini-batch simultaneously, rather than looping over them. Depending on the details of our hardware and linear algebra library, this can make it quite a bit faster to compute the gradient estimate for a mini-batch of (for example) size 100, rather than computing the mini-batch gradient estimate by looping over the 100 training examples separately. It might take (say) only 50 times as long, rather than 100 times as long. Now, at first it seems as though this doesn't help us that much.
With our mini-batch of size 100 the learning rule for the weights looks like
$$w \rightarrow w' = w - \frac{\eta}{100} \sum_x \nabla C_x,$$
where the sum is over training examples in the mini-batch. This is versus
$$w \rightarrow w' = w - \eta \nabla C_x$$
for online learning.
Even if it only takes 50 times as long to do the mini-batch update, it still seems likely to be better to do online learning, because we'd be updating so much more frequently. Suppose, however, that in the mini-batch case we increase the learning rate by a factor of 100, so the update rule becomes
$$w \rightarrow w' = w - \eta \sum_x \nabla C_x.$$
That's a lot like doing 100 separate instances of online learning with a learning rate of η. But it only takes 50 times as long as doing a single instance of online learning. Still, it seems distinctly possible that using the larger mini-batch would speed things up.
Now I tried the MNIST digit dataset: I ran a sample program and set the batch size to 1 at first. I noted down the training time needed for the full dataset. Then I increased the batch size and noticed that training became faster.
But when training with this code and github link, changing the batch size does not decrease the training time. It stayed the same whether I used 30, 64 or 128. They say they got 92% accuracy, and above 40% accuracy after two or three epochs. But when I ran the code on my computer, changing nothing other than the batch size, I got worse results: only 28% after 10 epochs, and the test accuracy stayed stuck there in the following epochs. Then I thought that, since they used a batch size of 128, I needed to use that too. I did, and it got even worse, giving only 11% after 10 epochs and getting stuck there. Why is that?
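The vectorisation claim in the quoted passage is easy to check with a rough timing sketch (NumPy assumed; the layer sizes are arbitrary, and the measured ratio depends entirely on your hardware and BLAS library, so the "50 times" figure is only illustrative):

```python
import time
import numpy as np

# One matrix multiply over a mini-batch of 100 examples vs. a Python loop
# over the same 100 examples.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 784))      # a single weight matrix
X = rng.standard_normal((100, 784))      # mini-batch of 100 examples

t0 = time.perf_counter()
for _ in range(100):
    batched = X @ W.T                    # all 100 examples at once
t_batch = time.perf_counter() - t0

t0 = time.perf_counter()
for _ in range(100):
    looped = np.stack([x @ W.T for x in X])  # one example at a time
t_loop = time.perf_counter() - t0

print(f"batched: {t_batch:.3f}s, looped: {t_loop:.3f}s, ratio ~{t_loop / t_batch:.0f}x")
```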
Neural networks learn by gradient descent on an error function in weight space, which is parametrized by the training examples. This means the variables are the weights of the neural network. The function is "generic" and becomes specific when you use training examples. The "correct" way would be to use all training examples to build the specific function. This is called "batch gradient descent" and is usually not done, for two reasons:
It might not fit in your RAM (usually GPU memory, as for neural networks you get a huge boost when you use the GPU).
It is actually not necessary to use all examples.
In machine learning problems, you usually have several thousands of training examples. But the error surface might look similar when you only look at a few (e.g. 64, 128 or 256) examples.
Think of it as a photo: To get an idea of what the photo is about, you usually don't need a 2500x1800px resolution. A 256x256px image will give you a good idea what the photo is about. However, you miss details.
So imagine gradient descent to be a walk on the error surface: You start on one point and you want to find the lowest point. To do so, you walk down. Then you check your height again, check in which direction it goes down and make a "step" (of which the size is determined by the learning rate and a couple of other factors) in that direction. When you have mini-batch training instead of batch-training, you walk down on a different error surface. In the low-resolution error surface. It might actually go up in the "real" error surface. But overall, you will go in the right direction. And you can make single steps much faster!
Now, what happens when you make the resolution lower (the batch size smaller)?
Right, your image of what the error surface looks like gets less accurate. How much this affects you depends on factors like:
Your hardware/implementation
Dataset: How complex is the error surface, and how well is it approximated by only a small portion?
Learning: How exactly are you learning (momentum? newbob? rprop?)
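A toy sketch of that picture (NumPy; plain linear regression stands in for a neural network so it stays short, and the batch sizes, learning rate and step count are arbitrary): the smaller the batch, the noisier each gradient estimate, but on average every estimate still points downhill.

```python
import numpy as np

# At each step the gradient is estimated from a random mini-batch
# instead of the full training set ("low-resolution error surface").
rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 20))
true_w = rng.standard_normal(20)
y = X @ true_w + 0.1 * rng.standard_normal(10_000)

def sgd(batch_size, lr=0.05, steps=500):
    w = np.zeros(20)
    for _ in range(steps):
        idx = rng.integers(0, len(X), size=batch_size)   # the "low-res" sample
        xb, yb = X[idx], y[idx]
        grad = 2 * xb.T @ (xb @ w - yb) / batch_size     # noisy gradient estimate
        w -= lr * grad
    return np.mean((X @ w - y) ** 2)                     # error on the full data

for bs in (1, 32, 256):
    print(f"batch size {bs:>3}: final MSE {sgd(bs):.4f}")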
I'd like to add to what has already been said here that a larger batch size is not always good for generalization. I've seen such cases myself, where an increase in batch size hurt validation accuracy, particularly for a CNN working with the CIFAR-10 dataset.
From "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima":
The stochastic gradient descent (SGD) method and its variants are
algorithms of choice for many Deep Learning tasks. These methods
operate in a small-batch regime wherein a fraction of the training
data, say 32–512 data points, is sampled to compute an approximation
to the gradient. It has been observed in practice that when using a
larger batch there is a degradation in the quality of the model, as
measured by its ability to generalize. We investigate the cause for
this generalization drop in the large-batch regime and present
numerical evidence that supports the view that large-batch methods
tend to converge to sharp minimizers of the training and testing
functions—and as is well known, sharp minima lead to poorer
generalization. In contrast, small-batch methods consistently converge
to flat minimizers, and our experiments support a commonly held view
that this is due to the inherent noise in the gradient estimation. We
discuss several strategies to attempt to help large-batch methods
eliminate this generalization gap.
Bottom-line: you should tune the batch size, just like any other hyperparameter, to find an optimal value.
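A skeleton for that tuning loop might look like this (PyTorch assumed; random data and a tiny MLP stand in for CIFAR-10 and the real network, and the candidate batch sizes are arbitrary): time one epoch and record validation accuracy for each candidate, then pick the trade-off you can live with.

```python
import time
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Random data and a tiny MLP stand in for the real dataset/model.
X = torch.randn(8_000, 32)
y = (X.sum(dim=1) > 0).long()
train = TensorDataset(X[:6_000], y[:6_000])
val_X, val_y = X[6_000:], y[6_000:]

for batch_size in (16, 64, 256, 1024):
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loader = DataLoader(train, batch_size=batch_size, shuffle=True)

    start = time.perf_counter()
    for xb, yb in loader:                      # one epoch
        opt.zero_grad()
        nn.functional.cross_entropy(model(xb), yb).backward()
        opt.step()
    epoch_time = time.perf_counter() - start

    with torch.no_grad():
        acc = (model(val_X).argmax(dim=1) == val_y).float().mean()
    print(f"batch {batch_size:>4}: epoch {epoch_time:.2f}s, val acc {acc:.2f}")
```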
The 2018 opinion retweeted by Yann LeCun is the paper Revisiting Small Batch Training for Deep Neural Networks by Dominic Masters and Carlo Luschi, suggesting a good generic maximum batch size is:
32
With some interplay with choice of learning rate.
The earlier 2016 paper On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima gives some reasons for not using big batches, which I paraphrase badly: big batches are likely to get stuck in local ("sharp") minima, while small batches are not.

How to measure and compare performance for two different implementations?

I have two different algorithms and want to know which one performs better in OpenGL ES.
There's the Time Profiler tool in Instruments, which tells me what percentage of the overall processing time each line of code consumes, but this is always relative to that one algorithm.
How can I get an absolute value so I can compare which algorithm performs better? Actually, I just need a percentage of overall CPU occupation. I couldn't find that in Time Profiler, just percentages of consumed time but not the overall CPU workload.
There was also a WWDC session showing a nifty CPU tracker that displayed each core separately. Which performance instrument do I need, and which values must I look at for this comparison?
The situation you're talking about, optimizing OpenGL ES performance, is something that Time Profiler isn't well suited to help you with. Time Profiler simply measures CPU-side time spent in various functions and methods, not the actual load something places on the GPU when rendering. Also, the deferred nature of the iOS GPUs means that processing for draw calls can actually take place much later than you'd expect, causing certain functions to look like bottlenecks when they aren't. They just happen to be when actions queued up by earlier calls are finally executed.
As a suggestion, don't measure in frames per second, but instead report the time in milliseconds it takes from the start of your frame rendering to just after a glFinish() or -presentRenderbuffer: call. When you're profiling, you want to work directly with the time it takes to render, because it's easier to understand the impact you're having on that number than on its inverse, frames per second. Also, as you've found, iOS caps its display framerate at 60 FPS, but you can measure rendering times well below 16.7 ms to tell the difference between your two fast approaches.
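The bookkeeping itself is trivial in any language; here is a Python-flavoured sketch of the measurement, where render_frame is a stand-in for your own draw-plus-finish code (the 200-frame sample count is arbitrary):

```python
import time

# Time each frame in milliseconds from the start of rendering to just after
# the equivalent of glFinish()/-presentRenderbuffer:, rather than reporting FPS.
def time_frames(render_frame, n_frames=200):
    samples_ms = []
    for _ in range(n_frames):
        start = time.perf_counter()
        render_frame()                       # draw calls + blocking finish/present
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    avg = sum(samples_ms) / len(samples_ms)
    print(f"avg frame time {avg:.2f} ms  (~{1000.0 / avg:.0f} fps equivalent)")
    return samples_ms

# usage with a dummy workload standing in for a real renderer:
# time_frames(lambda: sum(i * i for i in range(50_000)))
```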
In addition to time-based measurements, look at the Tiler and Renderer Utilization statistics in the OpenGL ES Driver instrument to see the load you are placing on the vertex and fragment processing portions of the GPU. When combined with the overall CPU load of your application while rendering, this can give a reasonable representation of the efficiency of one approach vs. another.
To answer your last question, the Time Profiler instrument has the CPU strategy, which lets you view each CPU core separately. Above the instrument list are three small buttons, where the center one is initially selected.
Click the left button to show the CPU strategy.

Performance loss for using non-power-of-two textures

Is there any performance loss for using non-power-of-two textures under iOS? I have not noticed any in my quick benchmarks. I can save quite a bit of active memory by dumping them altogether, since there is a lot of wasted padding (despite texture packing). I don't care about older hardware that can't use them.
This can vary widely depending on the circumstances and your particular device. On iOS, the loss is smaller if you use NEAREST filtering rather than LINEAR, but it isn't huge to begin with (think 5-10%).
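To put a number on the "wasted padding" point from the question, a small sketch (plain Python; the 320x480 RGBA texture and 4 bytes per pixel are illustrative assumptions):

```python
# Compare the storage of an arbitrary texture with the same texture
# padded up to power-of-two sides.
def next_pow2(n):
    p = 1
    while p < n:
        p <<= 1
    return p

def texture_bytes(width, height, bytes_per_pixel=4):
    actual = width * height * bytes_per_pixel
    padded = next_pow2(width) * next_pow2(height) * bytes_per_pixel
    return actual, padded

# e.g. a 320x480 RGBA texture padded to 512x512:
actual, padded = texture_bytes(320, 480)
print(f"{actual / 1024:.0f} KiB needed, {padded / 1024:.0f} KiB if padded "
      f"({100 * (padded - actual) / padded:.0f}% wasted)")
```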
