How to normalize data taken from Apple Watch accelerometer? - normalization

I have triaxial data (X, Y, and Z) collected from the accelerometer of an Apple Watch, and I want to use these data for analyses such as gait analysis and physical activity analysis.
I have observed that the X, Y, and Z data are on different scales, and I have the following questions:
Is it suitable to normalize my data before conducting a gait analysis?
If so, what is the proper way to normalize accelerometer data?
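For illustration only, here is a minimal sketch of two preprocessing options that are commonly applied to triaxial accelerometer data: per-axis standardization (z-score) and the orientation-independent vector magnitude. The sample values and function names are hypothetical, and whether either step suits gait analysis depends on the features used later, since per-axis standardization discards the relative amplitude between axes.

```python
import math
from statistics import mean, pstdev


def zscore(values):
    """Standardize one axis to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant signal has no spread; return zeros instead of dividing by 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def magnitude(xs, ys, zs):
    """Euclidean magnitude of each (x, y, z) sample."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(xs, ys, zs)]


# Hypothetical samples (assumed to be in units of g)
x = [0.0, 0.1, -0.1, 0.2]
y = [0.9, 1.0, 1.1, 1.0]
z = [0.3, 0.4, 0.3, 0.4]

x_std = zscore(x)          # X axis rescaled on its own
mag = magnitude(x, y, z)   # one value per sample, independent of axis scale
```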

Related

Unsupervised Learning for regression analysis

I am a geophysics student trying to predict shear wave velocity, which is numerical data. Since it is numerical, I believe this is a regression problem, but I don't have a shear wave log to use as a target, which makes the project unsupervised. How do I go about it, please?
I want to know whether it's possible to predict numerical data this way. I have tried picking out logs that I feel will predict it, but how do I check the accuracy?
The solution here is to create labeled data out of the signal data. I worked on a similar problem where I had to predict the intensity of a fall, and the data I got was signal data with x, y, and z axes. I solved it by first creating labels with a clustering methodology suited to my use case. Once I had supervised data, I proceeded with further analysis and predictions.
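As a rough sketch of the idea in this answer, clustering can assign pseudo-labels to unlabeled signal features, which can then serve as targets for a supervised model. The answer does not say which clustering method or features were used; the tiny 1-D k-means below and the `peaks` values are assumptions made purely for illustration.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Tiny 1-D k-means (requires k >= 2); returns (centroids, labels)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centroids spread evenly between min and max.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels


# Hypothetical feature: peak acceleration magnitude of each recorded event
peaks = [1.1, 1.3, 1.2, 3.8, 4.1, 3.9]
centroids, pseudo_labels = kmeans_1d(peaks, k=2)
# pseudo_labels can now be used as targets for a supervised model
```

Labels produced this way are only as meaningful as the clusters themselves, so they should be checked against domain knowledge before being trusted as ground truth.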

How do engineered features help when they are not present in the test data

I am trying to classify drones versus birds using machine learning. I have a large number of feature-vector samples from a radar, which generally consist of position (x, y, z), velocity (vx, vy, vz), acceleration (ax, ay, az), noise, SNR, and some more features. The actual classes are known for the samples. However, these basic features cannot distinguish between drones and birds for new (out-of-bag) samples, so I am going to try feature engineering to generate new features. One example is the standard deviation of speed, computed from the mean speed of a track and the differences between that mean and the speeds of the individual samples of the same track. Similarly, I generate other features from formulas that use the sum, difference, or deviation from the average of different samples from the same track.
After obtaining these features, we will use them to train a model that will be used for classification.
However, I can apply all of this feature engineering to the training dataset, whereas the same engineered features will not be present in the test data obtained in the operational scenario, where we get one sample after another. Also, in the operational scenario we do not know how many samples we will get for a track.
So, how can these engineered features be obtained in order to build a test feature vector in the actual operational scenario?
If they cannot be obtained at test time, how can the engineered features used for model training help solve the classification problem when we do not have them in the test data?
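One possible way to obtain per-track statistics such as the standard deviation of speed when samples arrive one at a time is to update them incrementally, for example with Welford's online algorithm. This is a sketch under that assumption, not an answer confirmed by the source; the class name `RunningSpeedStats` and the sample velocities are hypothetical.

```python
import math


class RunningSpeedStats:
    """Running mean and population std of speed for one track (Welford's method)."""

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running mean of speed
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, vx, vy, vz):
        speed = math.sqrt(vx * vx + vy * vy + vz * vz)
        self.n += 1
        delta = speed - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (speed - self.mean)

    @property
    def std(self):
        return math.sqrt(self._m2 / self.n) if self.n > 0 else 0.0


# Hypothetical velocity samples arriving one by one for a single track
track = RunningSpeedStats()
for vx, vy, vz in [(3.0, 4.0, 0.0), (6.0, 8.0, 0.0), (0.0, 0.0, 15.0)]:
    track.update(vx, vy, vz)
    # After every new sample, an up-to-date feature vector is available.
    features = (track.n, track.mean, track.std)
```

Because the statistics are defined for any number of samples, the same features can be computed on training tracks and on a partially observed operational track, although they are less reliable while only a few samples have been seen.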

What is the definition of the terms SPATIAL and TEMPORAL in terms of statistics, data science or machine learning

What is the exact definition of spatial and temporal? I saw in many places people use these two terms, e.g., spatial vector, temporal vector, temporal factor, spatial location.
I searched Stack Overflow and found this question: what's the difference between spatial and temporal characterization in terms of image processing?
What I understand so far is that the term spatial relates to space and the term temporal relates to time. Still, this is quite abstract to me, and I am not sure how the two terms are used. So, like the person who asked the question linked above, I want to ask: what do these two terms mean, and why do we care about them?
Spatial data have to do with location-aware information, in other words, data that have coordinates (x, y). A typical example of spatial data is latitude and longitude in geographic datasets. Spatial analyses are the techniques involved in analyzing spatial data. This is a significant component of GIS (Geographic Information Systems/Science)
Temporal data is time-series data. In other words, this is data that is collected as time progresses. Temporal analysis is also known as Time-Series analysis. These are the techniques for analyzing data units that change with time.
I hope this makes these concepts less abstract and more concrete.
Adding to Ekaba's answer, spatial data doesn't necessarily need to be two dimensional either. I'm going to take an example from a medical domain which would have both spatial and temporal elements of data.
If you consider magnetic resonance imaging, it is essentially a 3D volumetric view of an organ (let's say the brain, for clarity). So if you analyse a traditional MRI, it is spatial analysis, and you have 3 dimensions because the data are 3D. There is another MRI modality called DCE-MRI, which is essentially a sequence of MRI volumes captured over time. This is a typical example of a temporal sequence. Let's say a DCE-MRI sequence has 40 MRI volumes captured 20 s apart from each other. If you consider just one volume out of these 40 and analyse it, you are analysing it spatially, whereas if you consider all 40 (or a subset) of these volumes together, you are analysing the data spatially as well as temporally.
Hope that clarifies things.
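To make the distinction concrete, here is a small sketch with a made-up stand-in for a DCE-MRI sequence: 4 time points, each a 2x2x2 volume. The indexing convention `sequence[t][z][y][x]`, the helper names, and the intensity values are all assumptions for illustration.

```python
def make_volume(base):
    """Build a hypothetical 2x2x2 volume whose voxel values depend on position."""
    return [[[base + z + y + x for x in range(2)] for y in range(2)] for z in range(2)]


# sequence[t] is the volume captured at time point t
sequence = [make_volume(10 * t) for t in range(4)]


def spatial_mean(volume):
    """Spatial analysis: summarize all voxels of ONE volume (a single time point)."""
    voxels = [v for plane in volume for row in plane for v in row]
    return sum(voxels) / len(voxels)


def voxel_time_course(seq, z, y, x):
    """Temporal analysis: follow ONE voxel location across all time points."""
    return [volume[z][y][x] for volume in seq]


one_volume_mean = spatial_mean(sequence[0])        # uses space only
course = voxel_time_course(sequence, 0, 0, 0)      # uses time only
all_means = [spatial_mean(v) for v in sequence]    # uses space and time together
```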
Another similar medical example is ultrasound imaging of a beating heart (2D echocardiography), where the ultrasound image shows the opening and closing movement of the heart valves in real time and the volumetric movement of the heart chambers. With high temporal resolution (# 30 frames per second) it is easy to follow the valves opening and closing accurately. With high spatial resolution it is also easy to differentiate the borders of the heart chambers to provide accurate volumetric blood flow data.

State-of-art for sensor's anomaly detection

I am working on an anomaly detection problem and I need your help and expertise. I have a sensor that records episodic time series data. For example, once in a while, the sensor activates for 10 seconds and records values at millisecond intervals. My task is to identify whether a recorded pattern is abnormal. In other words, I need to detect anomalies in that pattern compared to other recorded patterns.
What would be the state-of-the-art approaches to that?
After doing my own research, I found that the following methods have proven to work very well in practice:
Variational Inference for On-line Anomaly Detection in High-Dimensional Time Series
Multivariate Industrial Time Series with Cyber-Attack Simulation: Fault Detection Using an LSTM-based Predictive Data Model

Training data and testing data from the same sensor

When using learning methods, we have training and testing data.
I'd like to confirm:
1) Must the training data and testing data be captured from the same sensor?
2) What if they are from different sensors?
3) If they must be captured from the same sensor, are there any methods to unify the data even when they are not from the same sensor?
Thank you.
Yes, you would need both train and test data from the same sensor because of measurement error and detection bias that are specific to that sensor. If the test data came from a sensor that is always different from the sensor the training data came from, you could have total system failure. Each sensor has its own precision, bias, detection limits, etc., so those characteristics have to be represented in both the training and testing data.
The idea with testing and training is not so much what you are thinking about, but rather that the objects used in testing were never used when training the algorithm. It's called selection bias. But you can have objects from the same sensor used in training or testing.
If, however, the measurement wavelength or angle (pitch) of each sensor is different, then you are dealing more with a problem requiring MUltiple SIgnal Classification (MUSIC) or Pisarenko harmonic decomposition.
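The point above that objects used in testing must never appear in training can be sketched as a split by object identifier rather than by individual sample. The function `split_by_object`, the tuple layout, and the sample data are hypothetical and only illustrate the idea; all samples here come from one assumed sensor "A", in line with the answer.

```python
import random


def split_by_object(samples, test_fraction=0.25, seed=0):
    """Split (object_id, sensor_id, value) tuples so no object is in both sets."""
    object_ids = sorted({s[0] for s in samples})
    rng = random.Random(seed)       # seeded for a reproducible split
    rng.shuffle(object_ids)
    n_test = max(1, int(len(object_ids) * test_fraction))
    test_ids = set(object_ids[:n_test])
    train = [s for s in samples if s[0] not in test_ids]
    test = [s for s in samples if s[0] in test_ids]
    return train, test


# Hypothetical data: 8 objects, 2 measurements each, all from sensor "A"
samples = [(obj, "A", obj * 0.5 + rep) for obj in range(8) for rep in range(2)]
train, test = split_by_object(samples)

train_objects = {s[0] for s in train}
test_objects = {s[0] for s in test}
```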
