Is there a way to summarize the features of many time series? - time-series

I'm trying to detect characteristics of the time series for a very big region composed of many smaller subregions (in my case pixels). I don't know much about this, so the only approach I can come up with is an averaged time series for the entire region, although I know that averaging would conceal many features.
I'm just wondering whether there are any widely used techniques that can detect the common features of a suite of time series, such as pattern recognition or time series classification?
Any ideas/suggestions are much appreciated!
Thanks!
Some extra explanation: I'm dealing with remote sensing images spanning several years with a time step of 7 days. So for each pixel there is an associated time series, with values extracted from that pixel on different dates. If I define a region consisting of many pixels, is there a way to detect or extract some common features characterizing all or most of the pixels' time series within this region, such as the shape of the time series, or a date around which there is an obvious increase in the values?

You could compute the correlation matrix for the pixels. This would simply be:
import numpy as np

# data is an (npix, ntime) array: one time series per pixel
corr = np.zeros((npix, npix))
for i in range(npix):
    for j in range(npix):
        corr[i, j] = np.sum(data[i, :] * data[j, :]) / np.sqrt(
            np.sum(data[i, :] ** 2) * np.sum(data[j, :] ** 2))
If you want more information, you can compute this as a function of time, i.e. divide your time series into blocks (say minutes) and compute the correlation for each of them. Then you can see how the correlation changes over time.
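For what it's worth, here is a minimal NumPy sketch of that blocked variant, assuming data is an (npix, ntime) array as in the snippet above and using the same non-centred correlation definition; the block length is up to you:

import numpy as np

def blockwise_correlation(data, block_len):
    # Split the series into consecutive blocks of block_len samples and
    # compute one (non-centred) correlation matrix per block, so you can
    # watch the correlation evolve over time.
    npix, ntime = data.shape
    nblocks = ntime // block_len
    corr = np.empty((nblocks, npix, npix))
    for b in range(nblocks):
        block = data[:, b * block_len:(b + 1) * block_len]
        norm = np.sqrt(np.sum(block ** 2, axis=1))
        corr[b] = (block @ block.T) / np.outer(norm, norm)
    return corr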
If the correlation changes a lot, you may be more interested in the cross-power spectrum of the pixels. This is defined as
cpow[i, j, :] = np.fft.fft(data[i, :]) * np.conj(np.fft.fft(data[j, :]))
This tells you how much pixels i and j tend to change together on various time-scales. For example, they could be moving in unison on time-scales of a second (1 Hz), but also have changes on a time-scale of, say, 10 seconds that are not correlated with each other.
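For completeness, a vectorised sketch of that definition (rfft is used because the pixel series are real-valued, and broadcasting replaces the explicit pixel-pair loop):

import numpy as np

def cross_power(data):
    # cpow[i, j, :] = fft(data[i]) * conj(fft(data[j])) for every pixel pair.
    spectra = np.fft.rfft(data, axis=1)                        # (npix, nfreq)
    return spectra[:, None, :] * np.conj(spectra[None, :, :])  # (npix, npix, nfreq)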
It all depends on what you need, really.

Related

How to Optimise OpenCV Histogram Comparison for Image Similarity?

I use example code to compare HSV histograms using EMD.
I want to find similar images in people's (mobile) picture library. It's quite common that people take several images of the same subject (in a row) with just slight changes: zooming in/out a bit, different angle, different exposure as a result of changing position, other pose, ....
I selected 4 sets of 4 similar images to test this algorithm. When comparing the images inside the sets, I get 22 EMD-L1 values between roughly 0.25 and 2.25 (average 1.47) and 2 outliers around 7.2.
When I cross-compare between sets, I get values between 2 and 15 with an average around 8.
Yes, there is a significant difference between the two result ranges. But I was disappointed that there was no gap between them; instead there is a small overlap, [2.0, 2.25]. I'm hoping to improve the algorithm.
How can I optimise my comparison for my particular use-case? There are various histogram forms, various histogram comparison algorithms, and then each has various parameters.
Does OpenCV implement the fastest known EMD algorithm? I was surprised that comparing some histograms took up to a second, especially given the relatively small bin counts.
Then, some cross-comparisons give good EMD results but have totally different RGB histograms. Here are two such images:
My current EMD-L1 score for them is 1.95, but their RGB histograms are totally different.
You have probably refined your comparison method already, but in case it is not obvious: you could divide each image into (overlapping) subregions and then compute the EMD for each of the four parts separately.
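If it helps, here is a rough Python/OpenCV sketch of that idea; the quadrant split, the HSV bin counts, and the histogram-to-signature conversion are my own assumptions, not taken from your code:

import cv2
import numpy as np

def hsv_signature(img_bgr, h_bins=8, s_bins=8):
    # Build an EMD signature (weight, h_bin, s_bin) from an L1-normalised
    # 2-D hue/saturation histogram; zero-weight bins are dropped.
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [h_bins, s_bins], [0, 180, 0, 256])
    hist = cv2.normalize(hist, hist, 1.0, 0.0, cv2.NORM_L1)
    sig = [[hist[h, s], h, s]
           for h in range(h_bins) for s in range(s_bins) if hist[h, s] > 0]
    return np.array(sig, dtype=np.float32)

def quadrants(img, overlap=0.25):
    # Four quadrants, each covering half the image plus some overlap.
    h, w = img.shape[:2]
    dh, dw = int(h * (0.5 + overlap / 2)), int(w * (0.5 + overlap / 2))
    return [img[:dh, :dw], img[:dh, w - dw:], img[h - dh:, :dw], img[h - dh:, w - dw:]]

def quadrant_emd(img_a, img_b):
    # EMD-L1 between corresponding quadrants of two images; you could then
    # combine the four scores (mean, max, ...) however suits your use-case.
    scores = []
    for qa, qb in zip(quadrants(img_a), quadrants(img_b)):
        emd, _, _ = cv2.EMD(hsv_signature(qa), hsv_signature(qb), cv2.DIST_L1)
        scores.append(emd)
    return scores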

Is it possible to use core motion for distance measurement [duplicate]

Possible Duplicate:
Getting displacement from accelerometer data with Core Motion
Android accelerometer accuracy (Inertial navigation)
I am trying to use Core Motion user acceleration values and double-integrate them to derive the distance covered. I move my iPhone linearly along its Y axis, against a 30 cm long ruler, on the table. First, I let the device rest for 10 seconds and calculate my offsets along the three axes by averaging the respective user acceleration values.
The X, Y and Z offsets are subtracted from the acceleration values when I calculate the distance covered. After offset subtraction, these values are passed through a low-pass filter and a median filter, separately of course. Both are simple sliding-window filters: the window size is the number of neighbouring values whose mean is taken in the low-pass filter, and whose median is taken in the median filter. I have experimented with values of this window size from 1 to 100. In the end, the filtered values are double-integrated using the trapezoidal rule to get distances. But the distance calculated is nowhere close to 30 cm. The closest value I got was about -22 cm (and I am wondering why I get negative values even though I move the device in the positive Y direction). I also came across this:
http://ajnaware.wordpress.com/2008/09/05/accelerating-iphones/
It's an old post about the same thing, which says that the accelerometer readings appeared to come in quanta of about 0.18 m/s^2 (i.e. about 0.018 g), resulting in a large cumulative error very quickly. Going by that, for this error not to matter, one would have to accelerate the device by almost 1.8 m/s^2, which is practically impossible for distance/length measurement purposes. For small movements, it does not look as though distances can be calculated using an optimal filter and a higher-order numerical integration method without an impractical velocity/acceleration constraint like that. Is it possible?
How about using my acceleration-vs-timestamp data to fit a polynomial that grows over time as I get more motion updates, approximating the acceleration-vs-time curve? Double integration of this polynomial would be a piece of cake. But for small distances the polynomial will have a big error component. Using a predictable, known motion that my device will be subjected to, I want to take a huge number of snapshots (calculated distance vs. actual known distance) to fit an error polynomial in a similar way, and then subtract it from the first polynomial. Can this work?
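For what it's worth, here is a minimal NumPy sketch of the pipeline described in the question (rest-period offset estimation, a box-car low-pass, trapezoidal double integration), assuming the user-acceleration samples for one axis have already been exported as an array; the function name and window size are made up for illustration:

import numpy as np

def integrate_distance(acc, dt, rest_samples, window=15):
    # acc: 1-D array of user-acceleration samples (m/s^2) along one axis,
    # dt: sample interval in seconds, rest_samples: samples in the rest period.
    bias = acc[:rest_samples].mean()                  # offset from the rest period
    centred = acc - bias
    kernel = np.ones(window) / window                 # simple box-car low-pass
    smoothed = np.convolve(centred, kernel, mode="same")
    velocity = np.concatenate(([0.0], np.cumsum((smoothed[1:] + smoothed[:-1]) * dt / 2)))
    position = np.concatenate(([0.0], np.cumsum((velocity[1:] + velocity[:-1]) * dt / 2)))
    return position[-1]                               # net displacement, in metres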
Although this does not fit StackOverflow, because it's not a question but a discussion, I'll try to sum up my thoughts about it.
As already said, the accelerometer is very inaccurate, and you would need very good accuracy for this kind of task, especially if you are trying to measure such short distances. On top of that, accelerometers differ from device to device, so you will get different results for the same movement on different devices, plus a very large random error.
My guess is that you can get rid of a large part of the randomness/error by calibrating the device and repeating the "measurement move" a number of times, say 10 times. After that you have enough data to compute an average that might come close to the real value.
Calibration is a key part here; you have to think of a clever way to calibrate, such as letting the user move the device over different distances at different speeds.
But all this is just theory. I would really like to see your results, but I doubt you can get it working well enough even with the best possible filters/algorithms, since there is just too much noise.

GPUImage Taking sum of columns of image

I'm using GPUImage in my project and I need an efficient way of taking column sums. The naive way would obviously be to retrieve the raw data and add up the values in every column. Can anybody suggest a faster way to do that?
One way to do this would be to use the approach I take with the GPUImageAverageColor class (as described in this answer), only instead of reducing the total size of each frame at each step, only do this for one dimension of the image.
The average color filter determines the average color of the overall image by stepping down in a factor of four in both X and Y, averaging 16 pixels into one at each step. If operating in a single direction, you should be able to use hardware interpolation to get an 18X reduction in a single direction per step with good performance. Your final step might either require a quick CPU-based iteration on the much smaller image or a tweaked version of this shader that pulls the last few pixels in a column together into the final result pixel for that column.
You'll notice that I've been talking about averaging here, because the output values of any OpenGL ES operation need to be expressed as colors, which only have a 0-255 range per channel. A sum would easily overflow this, but you could use an average as an approximation of your sum, with a more limited dynamic range.
If you only care about one color channel, you could possibly encode a larger value into the RGBA channels and maintain a 32-bit sum that way.
Beyond what I describe above, you could look at performing this sum with the help of the Accelerate framework. While probably not quite as fast as doing a shader-based reduction, it might be good enough for your needs.
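Purely to illustrate the arithmetic of that stepwise reduction (the real implementation would be a fragment shader, not NumPy), a rough CPU analogue might look like this; the edge-repeat padding is my own assumption:

import numpy as np

def column_average_by_reduction(img, factor=4):
    # Repeatedly average groups of `factor` rows until a single row remains,
    # giving the per-column average (a stand-in for the sum, which would
    # overflow 8-bit colour channels).
    data = img.astype(np.float32)
    while data.shape[0] > 1:
        pad = (-data.shape[0]) % factor               # make the row count divisible
        if pad:
            data = np.vstack([data, np.repeat(data[-1:], pad, axis=0)])  # repeat the edge row
        data = data.reshape(-1, factor, *data.shape[1:]).mean(axis=1)
    return data[0]                                    # one value per column (per channel)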

Image retrieval - edge histogram

My lecturer has slides on edge histograms for image retrieval, in which he states that one must first divide the image into 4x4 blocks and then check for edges at the horizontal, vertical, +45°, and -45° orientations. He then states that this is represented in a 14x1 histogram. I have no idea how he came about deciding that a 14x1 histogram must be created. Does anyone know how he came up with this value, or how to create an edge histogram?
Thanks.
The thing you are referring to is called a Histogram of Oriented Gradients (HoG). However, the math doesn't work out for your example. Normally you choose spatial binning parameters (the 4x4 blocks). For each block, you compute the gradient magnitude at some number of different orientations (in your case, the four listed: horizontal, vertical, +45° and -45°). So in each block you have N_{directions} measurements. Multiply this by the number of blocks (16 for you), and you see that you have 16*N_{directions} total measurements.
To form the histogram, you simply concatenate these measurements into one long vector. Any way to do the concatenation is fine as long as you keep track of the way you map the bin/direction combo into a slot in the 1-D histogram. This long histogram of concatenations is then most often used for machine learning tasks, like training a classifier to recognize some aspect of images based upon the way their gradients are oriented.
But in your case, the professor must be doing something special, because if you have 16 different image blocks (a 4x4 grid of image blocks), then you'd need to compute less than 1 measurement per block to end up with a total of 14 measurements in the overall histogram.
Alternatively, the professor might mean that you take the range of angles in between [-45,+45] and you divide that into 14 different values: -45, -45 + 90/14, -45 + 2*90/14, ... and so on.
If that is what the professor means, then in that case you get 14 orientation bins within a single block. Once everything is concatenated, you'd have one very long 14*16 = 224-component vector describing the whole image overall.
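As a concrete (if not 14-bin) illustration of the block-plus-orientation-bin concatenation described above, here is a small NumPy sketch using a 4x4 grid and 4 orientation bins, giving a 64-component vector; binning the angles into [0, 180) and weighting by gradient magnitude are my own choices:

import numpy as np

def block_edge_histogram(gray, grid=4, n_orient=4):
    # gray: 2-D greyscale image array.  Returns a vector of length
    # grid*grid*n_orient built by concatenating per-block orientation histograms.
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.degrees(np.arctan2(gy, gx)), 180.0)    # orientation in [0, 180)
    h, w = gray.shape
    bh, bw = h // grid, w // grid
    feats = []
    for i in range(grid):
        for j in range(grid):
            sl = (slice(i * bh, (i + 1) * bh), slice(j * bw, (j + 1) * bw))
            hist, _ = np.histogram(ang[sl], bins=n_orient, range=(0.0, 180.0),
                                   weights=mag[sl])
            feats.append(hist)
    return np.concatenate(feats)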
Incidentally, I have done a lot of testing with Python implementations of Histograms of Oriented Gradients, so you can see some of the work linked here or here. There is also some example code at that site, though a better-supported version of HoG appears in scikits.image.

Algorithm for detecting peaks from recorded, noisy data. Graphs inside

So I've recorded some data from an Android GPS, and I'm trying to find the peaks of these graphs, but I haven't been able to find anything specific, perhaps because I'm not too sure what I'm looking for. I have found some MATLAB functions, but I can't find the actual algorithms behind them. I need to do this in Java, but I should be able to translate code from other languages.
As you can see, there are lots of 'mini-peaks', but I just want the main ones.
Your solution depends on what you want to do with the data. If you want to do something very rigorous, then you should most likely use a (Fast) Fourier Transform and extract both the phase and frequency content from it. But that is computationally intensive and takes a while to program. If you just want to do something simple that doesn't require a lot of computational resources, here's a suggestion:
For that exact problem I implemented the algorithm below a few hours ago. I invented it myself, so I don't know if it already has a name, but it works great on very noisy data.
You need to determine the average peak-to-peak distance and call that PtP. Make that measurement any way you like; judging from the graph, in your case it appears to be about 35. In my code I have another algorithm I invented to do that automatically.
Then choose a random starting index on the graph. Poll every new data point from then on and wait until the graph has either risen or fallen from the starting level by about 70% of PtP. If it was a fall, that's a tock; if it was a rise, that's a tick. Store that level as the last tick or tock height, and produce a 'tick' or 'tock' event at this index.
Continue forward through the data. After a tick, if the data continues to rise, store that level as the new 'height-of-tick' but do not produce a new tick event. After a tock, if the data continues to fall, store that level as the new 'depth-of-tock' but do not produce a new tock event.
If the last event was a tock, wait for a tick; if the last event was a tick, wait for a tock.
Each time you detect a tick, that should be a peak. Good luck!
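In case it helps with the Java translation, here is a rough Python sketch of the tick/tock scheme as I read it above; the function name, the return value, and the handling of the very first event are my own choices:

def tick_tock_events(data, ptp, frac=0.7):
    # A 'tick' fires once the signal has risen by frac*ptp from the stored level,
    # a 'tock' once it has fallen by the same amount; between events the stored
    # level tracks the running extreme (height-of-tick / depth-of-tock).
    # Returns (tick_indices, tock_indices); per the answer, the ticks mark the peaks.
    ticks, tocks = [], []
    level = data[0]                  # starting level / running extreme
    last_event = None                # None, "tick" or "tock"
    for i in range(1, len(data)):
        v = data[i]
        if last_event == "tick":
            if v > level:
                level = v                                  # new height-of-tick, no event
            elif v <= level - frac * ptp:
                tocks.append(i); last_event, level = "tock", v
        elif last_event == "tock":
            if v < level:
                level = v                                  # new depth-of-tock, no event
            elif v >= level + frac * ptp:
                ticks.append(i); last_event, level = "tick", v
        else:                                              # before the first event
            if v >= level + frac * ptp:
                ticks.append(i); last_event, level = "tick", v
            elif v <= level - frac * ptp:
                tocks.append(i); last_event, level = "tock", v
    return ticks, tocks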
I think what you want to do is run this through some sort of low-pass filter. Depending on exactly what you want to get out of this dataset, a simple "box car" filter might be sufficient: at each point, take the mean of the N samples centred on that point and use that as the filtered value. The larger N is, the more aggressively smoothed the filtered data will be.
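A minimal sketch of such a box-car filter in Python (NumPy assumed; 'same'-mode convolution for the edges is my choice):

import numpy as np

def boxcar(x, n):
    # Average of the n samples centred on each point.
    return np.convolve(x, np.ones(n) / n, mode="same")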
I guess you have lots of points... Calculate the mean value of all of them, subtract it from every point's value, and then take the extreme value (the most negative or most positive point) within each run of points that keeps the same sign, until the sign changes. I hope I am clear...
With particularly nasty and noisy data I usually use smoothing. The easiest example of smoothing is a moving average; you can then find peaks on that moving average. Finally, you simply go back to your original data and take the peak closest to the one you found on the moving average.
I've done some looking into peak detection, and I can tell you that if your data doesn't behave, it can mess up your algorithm. Off the top of my head, you could try: pick a threshold, e.g. threshold = 250; whenever the data is above the threshold, find the maximum in that period. This assumes that your data has a mean of about 230. Not sure how fancy you want to get. Hope that helps.
