I need to find software to detect seasonalities in time series. I have detected a bi-weekly seasonality using Maple, but the result is only a graph, without any numbers.
I need something that not only shows the main seasonalities, but can also detect sub-seasonalities and (it would be really, really awesome) calculate seasonal indexes. Or maybe there is some other way to do it in Maple? I've already tried to find software, but the main problem was that it could calculate seasonality only if I gave it the period. I need something that can find that period.
One way to find the period is to use a Fast Fourier Transform (FFT) to move your data into the frequency domain. By plotting the power spectrum of the transformed data, you can find the dominant frequency and, by extension, the period (which is equal to 1/frequency). Here's an example in Maple working with a time series of sunspot data:
https://www.maplesoft.com/support/help/Maple/view.aspx?path=applications/SunspotPeriodicity
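If you'd rather get actual numbers outside Maple, here's a minimal Python sketch of the same FFT/power-spectrum idea (the data array `x` and its sample spacing `dt` are placeholders for your series):

```python
import numpy as np

def dominant_period(x, dt=1.0):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                        # remove the DC component
    spectrum = np.abs(np.fft.rfft(x)) ** 2  # power spectrum
    freqs = np.fft.rfftfreq(len(x), d=dt)   # cycles per unit of dt
    k = np.argmax(spectrum[1:]) + 1         # skip the zero-frequency bin
    return 1.0 / freqs[k]                   # period = 1 / frequency

# Example: a noisy 14-sample ("bi-weekly") cycle should come back as ~14
t = np.arange(365)
x = np.sin(2 * np.pi * t / 14) + 0.3 * np.random.randn(t.size)
print(dominant_period(x, dt=1.0))
```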
So far, I have calculated the evoked potentials. However, I would like to see if there is relatively more activity in the theta band with respect to the other bands. When I use mne.Evoked.filter, I get a plot that looks a lot like a sine wave and contains no useful information. Furthermore, the edge regions (time goes from -0.2 s to 1 s) are highly distorted.
Filtering will always result in edge artifacts, especially for low frequencies like theta (the lower the frequency, the longer the filter). To analyse low-frequency signals you should epoch your data into segments (epochs) that are longer than the time period you are interested in.
Also, if you are interested in theta oscillations, it is better to perform a time-frequency analysis than to filter the ERP. The ERP contains only time-locked activity, while a time-frequency representation lets you see theta even in time periods where it was not phase-aligned across trials. You may want to follow this tutorial, for example.
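For example, a rough sketch of that kind of time-frequency analysis with MNE (this assumes you already have an `epochs` object, epoched over a longer window than -0.2 to 1 s precisely so the edges can be cropped away afterwards):

```python
import numpy as np
from mne.time_frequency import tfr_morlet

# Assumed: `epochs` is an mne.Epochs object with generous padding, e.g. -1.0 to 2.0 s
freqs = np.arange(4.0, 31.0, 1.0)   # 4-30 Hz: theta through beta
n_cycles = freqs / 2.0              # fewer wavelet cycles at low frequencies

power = tfr_morlet(epochs, freqs=freqs, n_cycles=n_cycles, return_itc=False)

# Crop back to the window of interest only after the decomposition,
# so the wavelet edge artifacts fall outside it
power.crop(tmin=-0.2, tmax=1.0)
power.plot_topo(baseline=(-0.2, 0.0), mode='logratio', title='Induced power')
```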
Also make sure to check out the many rich tutorials and examples in the MNE docs.
If you have any further problems we use Discourse now: https://mne.discourse.group/
I'm new to machine learning. I understand that there are parameters and choices in the model you attach to a given set of inputs, and that these can be tuned/optimised, but those inputs tie back to fields you generated by slicing and dicing whatever source data you had in a way that made sense to you. What if the way you decided to model and cut up your source data, and therefore your training data, isn't optimal? Are there ways or tools that extend the power of machine learning beyond the model, to the way the training data was created in the first place?
Say you're analysing the accelerometer, GPS, heart-rate and surrounding topography data of someone moving. You want to determine where this person is likely to become exhausted and stop, assuming they'll continue moving in a straight line based on their trajectory, and that going up any hill will increase their heart rate to some point where they must stop. Whether they're running or walking obviously modifies these things.
So you cut up your data (feel free to correct how you'd do this, but it's less relevant to the main question):
Slice up raw accelerometer data along X, Y, Z axis for the past A number of seconds into B number of slices to try and profile it, probably applying a CNN to it, to determine if running or walking
Cut up the recent C seconds of raw GPS data into a sequence of D (Lat, Long) pairs, each pair representing the average of E seconds of raw data
Based on the previous sequence, determine speed and trajectory, and determine the upcoming slope by slicing the next F units of distance (or G seconds, another option to determine) into H slices, profiling each, etc...
You get the idea. How do you effectively determine A through H, some of which would completely change the number and behaviour of the model inputs? I want to take out any bias I may have about what's right and let the process determine it end-to-end. Are there practical solutions to this? Each time the data-creation parameters change, you'd go back, regenerate the training data, feed it into the model, train it, and tune it, over and over, until you get the best result.
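To make that concrete, the brute-force loop I'm imagining looks roughly like this toy sketch (the window length and slice count stand in for A and B; the features, labels and classifier are all placeholders):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def build_features(accel, labels, window, n_slices):
    """Slice the raw (N, 3) accelerometer stream into windows and summarise
    each slice by its mean magnitude; returns (X, y)."""
    X, y = [], []
    for start in range(0, len(accel) - window, window):
        seg = accel[start:start + window]
        mags = np.linalg.norm(seg, axis=1)
        slices = np.array_split(mags, n_slices)
        X.append([s.mean() for s in slices])
        y.append(labels[start + window - 1])   # label at window end
    return np.array(X), np.array(y)

# accel: (N, 3) raw samples; labels: per-sample 0/1 (walking/running) -- placeholders
accel = np.random.randn(5000, 3)
labels = np.random.randint(0, 2, 5000)

best = None
for window in (50, 100, 200):          # candidate values of "A"
    for n_slices in (5, 10):           # candidate values of "B"
        X, y = build_features(accel, labels, window, n_slices)
        score = cross_val_score(RandomForestClassifier(), X, y, cv=3).mean()
        if best is None or score > best[0]:
            best = (score, window, n_slices)
print(best)
```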
What you call your bias is actually the greatest strength you have: you can include your knowledge of the system. Machine learning, including glorious deep learning, is, to put it bluntly, stupid. Although it can figure out features for you, interpreting them will be difficult.
Deep learning in particular has a great capacity to memorise (not learn!) patterns, making it easy to overfit to the training data. Building machine learning models that generalise well in the real world is tough.
In most successful approaches (check what Master Kagglers do), people create features. In your case I'd probably want to calculate the magnitude and direction of the force. Depending on the scenario, I might transform (Lat, Long) into distance from a specific point (say, the point of origin/activation, or one re-established every minute), or maybe use a different coordinate system.
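For concreteness, a small sketch of those two hand-crafted features (the array names and shapes are assumptions, not your actual schema):

```python
import numpy as np

def accel_magnitude(accel):
    """Force magnitude from raw (N, 3) accelerometer samples."""
    return np.linalg.norm(accel, axis=1)

def distance_from_origin(lat, lon, lat0, lon0):
    """Approximate metres from a reference point using an equirectangular
    projection -- fine for the short distances involved here."""
    R = 6_371_000.0                                   # Earth radius in metres
    x = np.radians(lon - lon0) * np.cos(np.radians(lat0)) * R
    y = np.radians(lat - lat0) * R
    return np.hypot(x, y)
```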
Since your data is a time series, I'd probably use something well suited to time-series modelling that you can understand and troubleshoot. CNNs and the like are typically a last resort in the majority of cases.
If you really would like to automate it, check e.g. Auto Keras or ludwig. When it comes to learning which features matter most, I'd recommend gradient boosting (GBDT).
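If you do go the GBDT route, feature importances come essentially for free. A minimal scikit-learn sketch (the feature names and data here are made up):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

feature_names = ["accel_mag", "speed", "slope", "heartrate"]   # hypothetical
X = np.random.randn(1000, len(feature_names))                  # placeholder features
y = np.random.randint(0, 2, 1000)                              # 1 = stopped / exhausted

gbdt = GradientBoostingClassifier().fit(X, y)
for name, imp in zip(feature_names, gbdt.feature_importances_):
    print(f"{name}: {imp:.3f}")
```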
I'd also recommend reading this article from Airbnb, which takes a deeper dive into the journey of building such systems and into feature engineering.
I developed an app a few months back for iOS devices that generates harmonic-rich drones in real time. It works fine on newer devices, but it's running into buffer underruns on slower devices. I need to optimize this thing and need some mental help. Here's a super basic overview of what I'm currently doing:
Create an "Oscillator Bank" that consists of X number of harmonics (simply calculated from a given fundamental frequency. Nothing fancy here.)
Inside my DAC function that spits out samples to an iOS audio buffer, I call a "GetNextSample()" function that goes through the bank of sine oscillators, calculates the sample for each one and adds them up. Some simple additive synthesis.
Enjoy the beauty of the drone.
Again, it works great, until it doesn't. I'd like to optimize this thing so I'm not using brute-force additive synthesis of sine waves calculated in real time. If I limit the number of harmonics ("banks") to 2, it'll work on the older devices. Not cool. On the newer devices, it underruns at around 50 harmonics. Not too bad. But if I want to play multiple drones at once to create some chords, that's too much processing power... so...
Should I generate waveform tables to just loop through instead of constant calculation? (I assume yes...)
Should I convert my usage of double-precision floating point to integer based calculations? (I assume yes...)
And my big algorithmic question (being pretty non-mathematical):
If I use a waveform table, how do I accurately determine how long the wave/table should be? In my experience developing this app, if I just go to the end of a period (2*PI) and start over again, resetting the phase back to 0, I get an audible artifact, since I'm forcibly offsetting the phase. In other words, I can't guarantee that one period will give me the right results...
Maybe I'm overcomplicating things... What's the standard way of doing quick, processor-friendly, real-time synthesis of multiple summed sines?
I'll keep poking around in the meantime.
Thanks!
Have you tried (or can you; I'm not an iOS person) increasing the buffer size? That might give you enough headroom that you don't need any of this. Otherwise, yes, wavetable synthesis is a viable approach: you could recalculate the wavetable from the sum of all the harmonics only when a parameter changes.
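A rough sketch of that idea (Python for readability; on iOS you'd do the same thing in the C/Swift render callback): rebuild one single-cycle table whenever the harmonics change, then play it back with a phase accumulator that carries over between buffers, which avoids the wrap artifact described in the question.

```python
import numpy as np

SR = 44100
TABLE_LEN = 2048                      # one cycle of the composite waveform

def build_table(amplitudes):
    """Sum the harmonics once, whenever a parameter changes."""
    phase = np.linspace(0.0, 2 * np.pi, TABLE_LEN, endpoint=False)
    table = np.zeros(TABLE_LEN)
    for k, a in enumerate(amplitudes, start=1):   # k-th harmonic
        table += a * np.sin(k * phase)
    return table / np.max(np.abs(table))          # normalise

def render(table, f0, n_samples, phase=0.0):
    """Per-sample loop: advance the phase by f0 * TABLE_LEN / SR and linearly
    interpolate into the table. No discontinuity at the wrap, because the
    table holds exactly one period and the phase carries over between buffers."""
    out = np.empty(n_samples)
    inc = f0 * TABLE_LEN / SR
    for i in range(n_samples):
        i0 = int(phase)
        frac = phase - i0
        out[i] = (1 - frac) * table[i0] + frac * table[(i0 + 1) % TABLE_LEN]
        phase = (phase + inc) % TABLE_LEN
    return out, phase                             # keep phase for the next buffer

table = build_table([1.0 / k for k in range(1, 51)])   # 50 harmonics
buf, ph = render(table, f0=110.0, n_samples=512)
```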
I have written such a beast in golang on the server side... For starters, yes, use single-precision floating point.
To address table population, I would make sure your implementation is solid by having it synthesize a square wave. Visualize the output for each run as you add each additional frequency (with its corresponding amplitude and phase-shift parameters)... By definition a single cycle is enough, as long as the table contains enough samples to cover one full period.
It's important to leverage the fact that generating an output curve from an input set of sine waves (each with a frequency, amplitude and phase shift) lends itself to doing the reverse... namely, performing an FFT on that output curve so the API gives you back its version of the underlying sine waves (again, each with a frequency, amplitude and phase)... This will confirm your system is accurate.
The name of the process you are implementing is the inverse Fourier transform; there are libraries for this, though I too prefer rolling my own.
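For example, a quick numpy version of that square-wave check (the length is chosen so each harmonic lands exactly on an FFT bin):

```python
import numpy as np

SR, f0 = 44100, 110.0
N = SR                      # one second => 1 Hz bins, so 110 Hz falls exactly on a bin
t = np.arange(N) / SR

# Square wave via additive synthesis: odd harmonics at amplitude 1/k
square = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 40, 2))

spectrum = np.abs(np.fft.rfft(square)) / (N / 2)   # ~amplitude per harmonic
freqs = np.fft.rfftfreq(N, d=1 / SR)
print(freqs[spectrum > 0.01][:5])   # should print 110, 330, 550, 770, 990
```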
I am currently coding sensor fusion for the pose of a wheeled robot from GPS, lidar, vision and vehicle measurements. The model is basic kinematics using an EKF, with no discrimination between sensors, i.e. data comes in based on its timestamp.
I have difficulty fusing these sensors due to the following issue:
Sometimes, when the latest measurement comes from a different sensor than the one that produced the previous state, the latest pose of the robot lands behind the previous pose. As a result the fused trajectory is not smooth and zigzags.
I would like to discard data that plots behind/backwards of the previous pose and keep only data that lies forward/ahead of the previous state, even when the sensor providing the data changes between timestamp t and timestamp t+1. Since the data is in the global frame, I cannot simply rely on a negative x coordinate to achieve this.
Please let me know if you have any ideas on this. Thank you so much in advance.
Best,
Preliminary warning
Let me slip in a warning before suggesting possible solutions to your problem: be careful about discarding data based on your current estimate, since you never know whether the latest measurement is "pulling the pose back" or the previous one was wrong and caused your estimate to move forward too much.
Possible solutions
In a Kalman-like filter, observations are assumed to provide independent, uncorrelated information about the state vector variables. These observations are assumed to have a random error distributed as a zero-mean Gaussian variable. Real life is harder, though :-(
Sometimes, measurements are affected by a "bias" (a fixed term, similar to the Gaussian error having a non-zero mean). E.g. tropospheric perturbations are known to introduce a position error in GPS fixes that drifts slowly over time.
If you have several sensors observing the same variable, such as GPS and lidar for position, but they have different biases, your estimate will jump back and forth. Scaling problems can have a similar effect.
I will assume this is the root of your problem. If not, please refine your question.
How can you mitigate this problem? I see several alternatives:
Introduce a bias/scale correction term in your state vector to compensate for sensor bias/drift. This is a very common trick in EKFs for inertial sensor fusion (gyro/accelerometer) and can work nicely when tuned properly.
Apply some preprocessing to the sensor inputs to correct known problems. It can be difficult to tune a filter that estimates the state vector and sensor parameters at the same time.
Change how observations are interpreted. For example, use the difference between consecutive position observations, effectively creating a fake odometry sensor. This greatly reduces the drift problem.
Post-process your output. Instead of discarding observations, integrate them and keep the "jumping" state vector internally, but smooth the output vector to eliminate the jumps. This is done in some UAV autopilots because such jumps affect the performance of PID controllers.
Finally, the most obvious and simple approach: discard observations based on a statistical test. A chi-square test of the residual can be used to determine whether an observation is too far from the expected values and must be discarded (see the sketch after this list). Be careful with this option, though: observation rejection schemes must be combined with state vector reinitialization logic to result in stable behavior.
Almost all of these solutions require knowing the source of each observation, so you would no longer be able to treat observations interchangeably.
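For the last option, here is a hedged sketch of the chi-square (innovation) gate; all names are placeholders for whatever your EKF already computes:

```python
import numpy as np
from scipy.stats import chi2

def accept_measurement(z, z_pred, H, P, R, alpha=0.01):
    """Chi-square gate on the innovation y = z - h(x).
    S is the innovation covariance; reject if the squared Mahalanobis distance
    of the innovation exceeds the chi-square threshold for its dimension."""
    y = z - z_pred
    S = H @ P @ H.T + R
    d2 = float(y.T @ np.linalg.solve(S, y))      # squared Mahalanobis distance
    gate = chi2.ppf(1.0 - alpha, df=len(y))
    return d2 <= gate

# Inside the update step, per incoming observation (whatever its sensor):
#   if accept_measurement(z, H @ x, H, P, R):
#       ... do the normal EKF update ...
#   else:
#       ... skip it, and count consecutive rejections so the filter can be
#       re-initialised if too many observations in a row are rejected ...
```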
I have an array of soil water content sensors across several desert field sites. Their signals contain a lot of noise or bias (depending on who I talk to). I want to remove the junk while keeping as much of the signal as possible. I'm not a signal processing guy, so anything along the lines of "use an XYZ filter" or a particular algorithm or something would really help me.
I've posted a plot showing a year's worth of data from one probe. The signal is the "top"; all the junk is below the signal:
http://www.unm.edu/~hilton/swc.png
I've played around with lowess smoothing a lot; that works reasonably well except in places where there's a lot of bias below the signal (like roughly idx 1000 to 2000 and 15000 to 16000 in the example below).
I have access to Matlab's signal processing toolbox and I'm very comfortable in R and python; if there's a pre-packaged filter in one of those I could jump off from that would be great (but I'm open to coding something new).
Many thanks,
Tim
I'd start with a median filter. If I read your plot correctly you're sampling twice an hour and the data isn't too dynamic. Assuming that's correct, a median filter length of 47 or 49 would equate to a one-day window. In this data set you could probably crank that up to a week or more. In any case you should plot the unfiltered and filtered data on top of each other to make sure the filtered data passes the eyeball test (you'll know it when you see it). You may need to do the final clean-up by hand (hope you don't have thousands of sensors).
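If it helps, a minimal scipy sketch of that suggestion (the file name and the 48-samples-per-day assumption are placeholders):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import medfilt

# Placeholder: load the probe's readings however you normally do;
# assumes ~48 samples/day (one every 30 minutes)
swc = np.loadtxt("swc.txt")

day = medfilt(swc, kernel_size=49)            # ~1-day window (length must be odd)
week = medfilt(swc, kernel_size=7 * 48 + 1)   # ~1-week window

# The eyeball test: raw and filtered on top of each other
plt.plot(swc, alpha=0.4, label="raw")
plt.plot(day, label="median, ~1 day")
plt.plot(week, label="median, ~1 week")
plt.legend()
plt.show()
```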
(Also, I'd send an intern or grad student out to the field sites to find out what's wrong with sensors and fix them.)
It might be worth a quick try to implement some standard-deviation filtering of your data set. Split your data into N segments and, for each segment, calculate the standard deviation of the Y-values. Once you've got that, filter out data points whose Y-values lie more than 3 standard deviations from the segment mean (or however much you want). Of course, there is some manual work in figuring out exactly how many segments to use.
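A quick sketch of that idea (N and the 3-sigma cutoff are the knobs you'd tune; names are placeholders):

```python
import numpy as np

def sd_filter(y, n_segments=50, n_sigma=3.0):
    """Return a boolean mask keeping points within n_sigma standard
    deviations of their segment's mean."""
    y = np.asarray(y, dtype=float)
    keep = np.ones(len(y), dtype=bool)
    for seg in np.array_split(np.arange(len(y)), n_segments):
        mu, sd = y[seg].mean(), y[seg].std()
        keep[seg] = np.abs(y[seg] - mu) <= n_sigma * sd
    return keep

# usage: mask = sd_filter(swc); cleaned = np.where(mask, swc, np.nan)
```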