I have a number of sensors measuring temperature (or some other physical quantity). Does anyone know of a clustering method that can tell which sensors show similar patterns and behaviors? My series show trends with cycles.
I am very new to Time series analysis.
Thank you,
Basic K-means clustering works fine for most kinds of sensor data. You will need to take time slices to avoid autoregressive issues. Check out the corresponding function in R.
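A minimal sketch of the idea in Python, assuming every sensor's readings share the same timestamps and sit in one row of a NumPy array. "Taking time slices" is interpreted here (an assumption) as averaging non-overlapping windows; each series is then z-normalized so sensors are grouped by the shape of their trends and cycles rather than by absolute level, and clustered with plain Lloyd's K-means. Function names and the toy data are hypothetical; in R, the clustering step is what `stats::kmeans()` does.

```python
import numpy as np

def slice_means(X, width):
    # Average non-overlapping time slices of `width` samples (trailing remainder dropped)
    X = np.asarray(X, dtype=float)
    n = X.shape[1] // width * width
    return X[:, :n].reshape(X.shape[0], -1, width).mean(axis=2)

def znorm(X):
    # Remove each sensor's level and scale so clustering compares shape, not magnitude
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, keepdims=True)
    return (X - mu) / np.where(sd > 0, sd, 1.0)

def kmeans(X, k, n_iter=100, seed=0):
    # Plain Lloyd's algorithm; one row of X per sensor
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster ends up empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical data: two weeks of hourly readings from six sensors
t = np.arange(24 * 14)
daily = np.sin(2 * np.pi * t / 24)
weekly = np.sin(2 * np.pi * t / (24 * 7))
sensors = np.vstack([
    20 + 5 * daily, 22 + 4 * daily, 18 + 6 * daily,     # daily cycle
    15 + 3 * weekly, 17 + 2 * weekly, 16 + 4 * weekly,  # weekly cycle
])
labels, _ = kmeans(znorm(slice_means(sensors, 3)), k=2)
print(labels)
```

The number of clusters `k` still has to be chosen; comparing within-cluster spread for a few values of `k` is a common way to pick it.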
Data Science
How to handle dimensionality differences over time or between subjects
Note: This question has in mind tabular data, rather than imaging/NLP.
When data is collected over long periods of time, instruments may change and collect more precise data, so the dimensionality of the data changes over time. In its simplest form, a feature called FeatureA is collected at first, but over time the instrument allows us to collect more detailed features, FeatureA1, FeatureA2 and FeatureA3, which are meant to replace FeatureA.
We probably don't want to throw away the data that only has the less precise FeatureA, so how do we incorporate these new features? If it were a straight replacement of one feature by one other feature, I might go for a time-varying multilevel model, but I can't see a way of applying this when the dimensionality increases, at least with most libraries.
Similarly, if sub-cohorts use slightly different instruments to detect the same thing, but with different dimensionality, how could we input them into the same model?
Encoder-decoder RNNs handle differing input sizes quite elegantly, so perhaps there is some inspiration there: maybe encode tabular inputs as tensors, as you would with word vectors.
Perhaps dimensionality reduction techniques like PCA/Autoencoders might work?
Does anyone have any suggestions?
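One hedged way to make the "tensors like word vectors" idea concrete is the padding-plus-mask trick used for variable-length sequences: map every record, from the old or the new instrument, onto the union of all feature names, fill features an instrument did not measure with 0, and append a 0/1 mask so the model can tell "not measured" from "measured as zero". The feature list and record layout below are hypothetical.

```python
import numpy as np

# Union of all features ever collected, in a fixed order (hypothetical names)
FEATURES = ["FeatureA", "FeatureA1", "FeatureA2", "FeatureA3", "FeatureB"]

def to_padded(records, features=FEATURES):
    # records: list of dicts mapping feature name -> value; a missing key means "not measured"
    values = np.zeros((len(records), len(features)))
    mask = np.zeros((len(records), len(features)))
    for i, rec in enumerate(records):
        for j, name in enumerate(features):
            if rec.get(name) is not None:
                values[i, j] = rec[name]
                mask[i, j] = 1.0
    # Values followed by the mask, so a model sees which entries were observed
    return np.hstack([values, mask])

old = {"FeatureA": 3.2, "FeatureB": 1.0}                                      # old instrument
new = {"FeatureA1": 1.1, "FeatureA2": 0.9, "FeatureA3": 1.3, "FeatureB": 0.8}  # new instrument
X = to_padded([old, new])
print(X.shape)  # (2, 10): 5 value columns + 5 mask columns
```

If FeatureA can be derived from FeatureA1 to FeatureA3 (an assumption that depends on the instrument), filling it in for new records as well gives old and new rows at least one directly comparable column.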
What is the exact definition of spatial and temporal? I have seen these two terms used in many places, e.g., spatial vector, temporal vector, temporal factor, spatial location.
While searching Stack Overflow, I found this question: what's the difference between spatial and temporal characterization in terms of image processing?
What I understand so far is that the term spatial relates to space and the term temporal relates to time. Still, it is quite abstract to me, and I am not sure how the two are used. So, like the person in the linked question, I want to ask: what do these two terms mean, and why do we care about them?
Spatial data has to do with location-aware information, in other words, data that has coordinates (x, y). A typical example of spatial data is latitude and longitude in geographic datasets. Spatial analysis comprises the techniques for analyzing spatial data, and it is a significant component of GIS (Geographic Information Systems/Science).
Temporal data is time-series data, in other words, data that is collected as time progresses. Temporal analysis is also known as time-series analysis: the techniques for analyzing data units that change with time.
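To make the distinction concrete: a spatial computation works across coordinates (here, the great-circle distance between two latitude/longitude points), while a temporal computation works along the time axis of one location (here, a moving average over consecutive readings). The coordinates and readings are made-up illustration values.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    # Spatial: great-circle distance between two (lat, lon) points given in degrees
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))

def moving_average(series, window):
    # Temporal: mean of each run of `window` consecutive readings
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="valid")

print(haversine_km(0.0, 0.0, 0.0, 1.0))                         # ≈ 111.19 km (1° of longitude at the equator)
print(moving_average(np.array([1.0, 2.0, 3.0, 4.0, 5.0]), 3))  # ≈ [2, 3, 4]
```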
I hope this makes these concepts less abstract and more concrete.
Adding to Ekaba's answer, spatial data doesn't necessarily need to be two-dimensional either. I'm going to take an example from the medical domain, which has both spatial and temporal elements.
Consider magnetic resonance imaging (MRI): it is essentially a 3D volumetric view of an organ (let's say the brain, for clarity). So if you analyse a traditional MRI, that is spatial analysis, and you have three dimensions because the volume is 3D. Another MRI modality, DCE-MRI, is essentially a sequence of MRI volumes captured over time, which makes it a typical example of a temporal sequence. Let's say a DCE-MRI sequence has 40 MRI volumes captured 20 s apart. If you consider just one volume out of these 40 and analyse it, you are analysing it spatially, whereas if you consider all 40 volumes (or a subset) at the same time, you are analysing the data both spatially and temporally.
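In array terms, the DCE-MRI example is a 4D array with one time axis and three spatial axes. A minimal NumPy sketch, assuming the sequence is already loaded with axes (t, x, y, z); only the 40 volumes 20 s apart come from the example, while the volume size and the random intensities are placeholders.

```python
import numpy as np

n_volumes, spacing_s = 40, 20.0               # 40 volumes captured 20 s apart
volume_shape = (32, 32, 16)                   # placeholder size of one 3D volume
rng = np.random.default_rng(0)
dce = rng.random((n_volumes, *volume_shape))  # placeholder intensities, axes (t, x, y, z)
times_s = np.arange(n_volumes) * spacing_s    # acquisition time of each volume

# Spatial: one volume on its own, analysed over x, y, z only
volume_10 = dce[10]

# Temporal: one voxel's intensity curve across all 40 time points
curve = dce[:, 16, 16, 8]

# Spatio-temporal: for every voxel, the time at which its intensity peaks
time_to_peak_s = times_s[dce.argmax(axis=0)]

print(volume_10.shape, curve.shape, time_to_peak_s.shape, times_s[-1])
```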
Hope that clarifies things.
Another similar medical example is ultrasound imaging of a beating heart (2D echocardiography), where the ultrasound image shows the opening and closing of the heart valves and the volumetric movement of the heart chambers in real time. With high temporal resolution (about 30 frames per second), it is easy to follow the valves opening and closing accurately. With high spatial resolution, it is also easy to differentiate the borders of the heart chambers to provide accurate volumetric blood-flow data.
I'm new to machine learning.
I've got a huge database of sensor data from weather stations. Those sensors can break or report odd values, and broken sensors distort the calculations that are done with that data.
The goal is to use machine learning to detect whether new sensor values are odd and, if so, mark them as broken. As I said, I'm new to ML. Can somebody point me in the right direction or give feedback on my approach?
The data has a datetime and a value. The sensor values are being pushed every hour.
I appreciate any kind of help!
Since the question is pretty general in nature, I will provide some basic thoughts. Maybe you are already slightly familiar with them.
Set up a dataset that contains readings from both broken and good sensors, labelled as such. That label is the dependent variable, Y. Alongside it, you also have some variables that might predict Y; let's call them X.
Train a model to learn the relationship between X and Y.
For new X values where you do not know the outcome, use the model to predict what Y will be.
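A minimal sketch of those three steps, assuming hourly readings that are already labelled 0 (good) or 1 (broken). The features (deviation from a trailing 24-hour median, hour-to-hour change) and the small hand-written logistic regression are illustrative choices only; in practice an off-the-shelf classifier would do step 2. All names and the synthetic data are hypothetical.

```python
import numpy as np

def make_features(values, window=24):
    # X: one row per hourly reading, built only from the raw series
    values = np.asarray(values, dtype=float)
    padded = np.pad(values, (window - 1, 0), mode="edge")
    trailing_median = np.array([np.median(padded[i:i + window]) for i in range(len(values))])
    deviation = values - trailing_median
    step = np.diff(values, prepend=values[0])
    return np.column_stack([deviation, np.abs(deviation), step])

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    # Step 2: learn the X -> Y relationship with batch gradient descent
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-9
    Xs = np.column_stack([np.ones(len(X)), (X - mu) / sd])
    w = np.zeros(Xs.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xs @ w))
        w -= lr * Xs.T @ (p - y) / len(y)
    return w, mu, sd

def predict_proba(model, X):
    # Step 3: probability that each new reading comes from a broken sensor
    w, mu, sd = model
    Xs = np.column_stack([np.ones(len(X)), (X - mu) / sd])
    return 1.0 / (1.0 + np.exp(-Xs @ w))

# Step 1: hypothetical labelled history, 30 days of hourly readings
rng = np.random.default_rng(1)
hours = np.arange(24 * 30)
temp = 15 + 5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.5, hours.size)
y = np.zeros(hours.size)                       # Y: 0 = good, 1 = broken
bad = rng.choice(np.arange(24, hours.size), size=20, replace=False)  # first day kept clean
temp[bad] = -40.0                              # hypothetical broken readings
y[bad] = 1.0
model = fit_logistic(make_features(temp), y)

# New readings with one odd value at hour 50
fresh = 15 + 5 * np.sin(2 * np.pi * np.arange(72) / 24)
fresh[50] = -40.0
p = predict_proba(model, make_features(fresh))
print(p[50] > 0.5, np.delete(p, 50).max() < 0.5)
```

The hard part in practice is usually step 1: getting enough trustworthy "broken" labels. Without them, the unsupervised detectors mentioned in the next answer are the more realistic starting point.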
Some useful insight on the basics is here:
https://www.youtube.com/watch?v=elojMnjn4kk&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A
Good Luck!
You could use Isolation Forest to detect abnormal readings.
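A minimal sketch with scikit-learn's `IsolationForest`, assuming hourly readings in a NumPy array. Each reading is described by its value and its change from the previous hour (an illustrative feature choice); `contamination` is the expected share of bad readings and is a guess you would tune. The data and the two faulty readings are synthetic.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
t = np.arange(24 * 30)                              # 30 days of hourly readings
values = 15 + 5 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 0.5, t.size)
values[[200, 450]] = [60.0, -35.0]                  # two hypothetical faulty readings

# Features per reading: raw value and change from the previous hour
X = np.column_stack([values, np.diff(values, prepend=values[0])])

forest = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
labels = forest.fit_predict(X)                      # -1 = anomaly, 1 = normal
print(np.flatnonzero(labels == -1))
```

The readings right after a fault also tend to be flagged, because their hour-to-hour change is extreme; dropping the change feature or post-filtering isolated flags are two ways to handle that.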
Twitter has also developed a useful algorithm, Seasonal Hybrid ESD (S-H-ESD), which builds on the generalized ESD (Extreme Studentized Deviate) test:
https://github.com/twitter/AnomalyDetection/
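Twitter's AnomalyDetection package is written in R. As a hedged Python illustration of the test it builds on, here is the plain generalized ESD test (Rosner's procedure). It is not Twitter's S-H-ESD, which additionally removes seasonality and uses robust statistics (median, MAD) before testing; the test below assumes roughly normal data once outliers are excluded. `max_outliers` is an upper bound you choose, and the readings are synthetic.

```python
import numpy as np
from scipy import stats

def generalized_esd(x, max_outliers, alpha=0.05):
    """Generalized ESD test; returns the sorted indices of detected outliers."""
    values = np.asarray(x, dtype=float)
    index = np.arange(values.size)
    n = values.size
    removed, n_outliers = [], 0
    for i in range(1, max_outliers + 1):
        # Test statistic R_i: largest absolute deviation in standard-deviation units
        deviation = np.abs(values - values.mean())
        j = int(np.argmax(deviation))
        r_i = deviation[j] / values.std(ddof=1)
        # Critical value lambda_i from the t distribution with n - i - 1 degrees of freedom
        p = 1 - alpha / (2 * (n - i + 1))
        t = stats.t.ppf(p, n - i - 1)
        lambda_i = (n - i) * t / np.sqrt((n - i - 1 + t ** 2) * (n - i + 1))
        if r_i > lambda_i:
            n_outliers = i          # outlier count = largest i with R_i > lambda_i
        removed.append(int(index[j]))
        values = np.delete(values, j)
        index = np.delete(index, j)
    return sorted(removed[:n_outliers])

hours = np.arange(200)
readings = 20 + 0.5 * np.sin(0.3 * hours)   # smooth synthetic signal
readings[50] = 35.0                          # one hypothetical faulty reading
print(generalized_esd(readings, max_outliers=5))  # → [50]
```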
However, a good EDA (exploratory data analysis) is needed first to define the types of abnormality that faulty sensors produce in the readings, for example:
1) A step change, where the value suddenly increases (or decreases) and stays at the new level
2) A gradual increase in the value compared with other sensors, followed by a sudden, very large increase
3) Intermittent spike in the data
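Each of these patterns can be turned into a simple rule once EDA has confirmed it. A hedged sketch: a rolling-median spike check for pattern 3, a before/after window comparison that locates a step change for pattern 1, and the slope of a sensor's gap from the median of its neighbouring sensors for the gradual part of pattern 2. Window sizes and thresholds are hypothetical and would need tuning on real data.

```python
import numpy as np

def spike_flags(x, window=5, k=6.0):
    # Pattern 3: points far from the centered rolling median, scaled by a robust MAD
    x = np.asarray(x, dtype=float)
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    med = np.array([np.median(padded[i:i + window]) for i in range(len(x))])
    resid = x - med
    scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))
    return np.abs(resid) > k * max(scale, 1e-9)

def step_location(x, window=24):
    # Pattern 1: index where the mean after differs most from the mean before
    x = np.asarray(x, dtype=float)
    scores = np.zeros(len(x))
    for t in range(window, len(x) - window + 1):
        before, after = x[t - window:t], x[t:t + window]
        pooled = np.sqrt((before.var(ddof=1) + after.var(ddof=1)) / 2) + 1e-9
        scores[t] = abs(after.mean() - before.mean()) / pooled
    return int(np.argmax(scores)), float(scores.max())

def drift_slope(sensor, neighbours):
    # Pattern 2: trend in the gap between this sensor and the median of its peers
    gap = np.asarray(sensor, float) - np.median(np.asarray(neighbours, float), axis=0)
    slope, _intercept = np.polyfit(np.arange(len(gap)), gap, 1)
    return slope

rng = np.random.default_rng(0)
base = 20 + rng.normal(0, 0.2, 200)                 # synthetic hourly readings
spiky = base.copy()
spiky[[30, 120]] += 8.0
print(np.flatnonzero(spike_flags(spiky)))
stepped = np.r_[base[:100], base[100:] + 5.0]
print(step_location(stepped))
print(drift_slope(base + 0.01 * np.arange(200), [base, base + 0.1, base - 0.1]))
```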
I am working on an anomaly detection problem and need your help and expertise. I have a sensor that records episodic time-series data. For example, once in a while, the sensor activates for 10 seconds and records values at millisecond intervals. My task is to identify whether a recorded pattern is abnormal; in other words, I need to detect anomalies in that pattern compared with the other recorded patterns.
What are the state-of-the-art approaches to this?
After doing my own research, I found that the following methods have proven to work very well in practice:
1) Variational Inference for On-line Anomaly Detection in High-Dimensional Time Series
2) Multivariate Industrial Time Series with Cyber-Attack Simulation: Fault Detection Using an LSTM-based Predictive Data Model
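Before reaching for the models in those papers, a simple baseline for the episodic setting in the question can be worth trying: resample every 10-second episode onto a common grid, take the pointwise median as a "typical" episode, and score each episode by its distance to it, scaled robustly across episodes. This is a generic baseline, not the method of either paper; the grid length, any threshold you apply to the scores, and the synthetic episodes are assumptions.

```python
import numpy as np

def resample(episode, length=200):
    # Linear interpolation onto a common grid so episodes of different length align
    episode = np.asarray(episode, dtype=float)
    old_grid = np.linspace(0.0, 1.0, len(episode))
    return np.interp(np.linspace(0.0, 1.0, length), old_grid, episode)

def episode_scores(episodes, length=200):
    # RMS distance of each episode to the pointwise median episode, as a robust z-score
    E = np.vstack([resample(e, length) for e in episodes])
    template = np.median(E, axis=0)
    dist = np.sqrt(((E - template) ** 2).mean(axis=1))
    mad = np.median(np.abs(dist - np.median(dist)))
    return (dist - np.median(dist)) / (1.4826 * mad + 1e-12)

rng = np.random.default_rng(0)
shape = np.sin(np.linspace(0, 3 * np.pi, 10_000))            # ~10 s sampled every ms
episodes = [shape + rng.normal(0, 0.05, shape.size) for _ in range(30)]
episodes.append(np.sin(np.linspace(0, 5 * np.pi, 10_000)))  # hypothetical odd episode
scores = episode_scores(episodes)
print(int(np.argmax(scores)))
```

If episodes can be shifted in time relative to each other, a shift-tolerant distance such as dynamic time warping would be needed instead of the pointwise RMS used here.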
I have streaming customer location data, which I need to analyze: for each event, I need to check whether the location is one the customer usually visits and, if it is not, generate an alert in real time.
I looked at various clustering algorithms but couldn't find a good one that works in real time.
K-means is too rigid about the number of centroids, and DBSCAN is heavyweight; I'm not sure it is fast enough to respond in real time.
Can you suggest one that suits real-time stream processing?
I believe DBSCAN is suitable enough. Its worst-case complexity is O(n^2), which is decent compared with other traditional algorithms such as hierarchical clustering. As for K-means, I believe it is applicable if you use an ST_Centroid function from a spatial database such as SpatiaLite or PostGIS (assuming you work with geographic data).
Between K-means and DBSCAN, I would choose DBSCAN, because I think a density-based approach is the answer to your problem with real-time data.
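One way to apply the density idea per event without re-running DBSCAN on the whole stream: keep each customer's recent locations and, for a new event, count how many lie within `eps_km`. If fewer than `min_samples` do, the new point does not sit in a dense neighbourhood (it would not qualify as a DBSCAN core point), so it is flagged. This is a simplification borrowed from DBSCAN's core-point definition, not full DBSCAN; the class name, `eps_km`, `min_samples`, and the coordinates are hypothetical. For long histories, a spatial index (for example scikit-learn's `BallTree` with the haversine metric, or a PostGIS query) would replace the brute-force distance computation.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat, lon, lats, lons):
    # Great-circle distance from one point to many (all inputs in degrees)
    lat, lon, lats, lons = map(np.radians, (lat, lon, np.asarray(lats), np.asarray(lons)))
    a = (np.sin((lats - lat) / 2) ** 2
         + np.cos(lat) * np.cos(lats) * np.sin((lons - lon) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

class UsualPlaces:
    """Per-customer location history; flags events outside any dense neighbourhood."""

    def __init__(self, eps_km=1.0, min_samples=5, max_history=1000):
        self.eps_km = eps_km
        self.min_samples = min_samples
        self.max_history = max_history
        self.history = {}                      # customer_id -> list of (lat, lon)

    def observe(self, customer_id, lat, lon):
        points = self.history.setdefault(customer_id, [])
        if len(points) < self.min_samples:
            is_unusual = False                 # not enough history to judge yet
        else:
            lats, lons = zip(*points)
            close = np.count_nonzero(haversine_km(lat, lon, lats, lons) <= self.eps_km)
            is_unusual = bool(close < self.min_samples)
        points.append((lat, lon))
        del points[:-self.max_history]         # keep only the most recent points
        return is_unusual

model = UsualPlaces(eps_km=1.0, min_samples=5)
for i in range(20):                            # hypothetical usual location
    model.observe("c1", 52.5200 + 0.001 * (i % 5), 13.4050)
print(model.observe("c1", 52.5205, 13.4052))   # near the usual place -> False
print(model.observe("c1", 48.8566, 2.3522))    # hundreds of km away -> True
```

Because every event is added to the history, a new place that the customer keeps visiting becomes "usual" after `min_samples` visits; whether that is desirable depends on the alerting policy.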