Anomaly detection with machine learning without labels - machine-learning

I am tracing multiple signals for a certain period of time and associating them with a timestamp like following:
t0 1 10 2 0 1 0 ...
t1 1 10 2 0 1 0 ...
t2 3 0 9 7 1 1 ... // pressed a button to change the mode
t3 3 0 9 7 1 1 ...
t4 3 0 8 7 1 1 ... // pressed button to adjust a certain characterstic like temperature (signal 3)
where t0 is the tamp stamp, 1 is the value for signal 1, 10 the value for signal 2 and so on.
That captured data during that certain period of time should be considered as the normal case. Now significant derivations should be detected from the normal case. With significant derivation I do NOT mean that one signal value just changes to a value that has not been seen during the tracing phase but rather that a lot of values change that have not yet been related to each other. I do not want to hardcode rules since in the future more signals might be added or removed and other "modi" that have other signal values might be implemented.
Can this be achieved via a certain Machine Learning algorithm? If a small derivation occurs I want the algorithm to first see it as a minor change to the training set and if it occurs multiple times in the future it should be "learned". The major goal is to detect the bigger changes / anomalies.
I hope I could explain my problem detailed enough. Thanks in advance.

you could just calculate the nearest neighbor in your feature space and set a threshold how far its allowed to be away from your test point to not be an anomaly.
Lets say you have 100 values in your "certain period of time"
so you use a 100 dimensional feature space with your training data (which doesn't contain anomalies)
If you get a new dataset you want to test, you calculate the (k) nearest neighbor(s) and calculate the (e.g. euclidean) distance in your featurespace.
If that distance is larger than a certain threshold it's an anomaly.
What you have to do in order to optimize is finding a good k and a good threshold. E.g. by Grid-search.
(1) Note that something like this probably only works well if your data has a fixed starting and ending point. Otherwise you would need a huge amount of data and even than it will not perform as good.
(2) Note It should be worth trying to create an own detector for every "mode" you have mentioned in your question.

Related

Algorithm for detecting the lowest level of EMG activity?

How, having only data from the EMG sensor, to determine whether a person is in the REM phase? In other words, I need to detect the lowest level of activity from the sensor's EMG. Well, or at least register the phase change ...
In more detail... I'm going to make a REM phase detector using an EMG (electromyography) sensor. There is already a sketch of the Android application on the github, if you are interested, I can post a link. Although there is still work to be done...)
The device should work based on the fact that in different states of the brain (wakefulness, slow sleep, REM sleep), different levels of activity will be recorded from the sensor. In REM sleep, this activity is minimal.
The Bluetooth sensor is attached to the body before going to bed, the Android program is launched, communicates with the sensor and sends the data read from it to the connected TCP client via WiFi. TCP client - python script running on a nettop. It receives data, and by design should determine in real time whether the current level of activity is the minimum for the entire observation period. If so, the script will tell the server (Android program) to turn on the hint - it can be a vibration on the phone or a fitness bracelet, playing an audio sample through a headphone, light flashes, a slight electric shock =), etc.
Because only one EMG sensor is used, I admit that it will not be possible to catch REM sleep phases with 100% accuracy, but this is not necessary. If the accuracy is 80% - it's already good. For starters, even an algorithm for detecting a change in the current activity level is suitable. - There will be something to experiment with and something to build on.
The problem is with the algorithm. I would not like to use fixed thresholds, because these thresholds will be different for different people, and even for the same person at different times and in different states they will differ. Therefore, I will be glad to ideas and tips from your side.
I found an interesting article "A Self-adaptive Threshold Method for Automatic Sleep Stage Classifi-
cation Using EOG and EMG" (https://www.researchgate.net/publication/281722966_A_Self-adaptive_Threshold_Method_for_Automatic_Sleep_Stage_Classification_Using_EOG_and_EMG):
But there remains incomprehensible, a few points. First, how is the energy calculated (Step 1-4: Energy)?
d = f.read(epoche_seconds*SAMPLE_RATE*2)
if not d:
break
print('epoche#{}'.format(i))
fmt = '<{}H'.format(len(d) // 2)
t = unpack(fmt, d)
d = list(t)
d = np.array(d)
d = d / (1 << 14) # 14 - bits per sample
df=pd.DataFrame({'signal': d, 'id': [x*2 for x in range(len(d))]})
df = df.set_index('id')
d = df.signal
# signal prepare:
d = butter_bandpass_filter(d, lowcut, highcut, SAMPLE_RATE, order=2)
# extract signal futures:
future_iv = np.mean(np.absolute(d))
print('iv={}'.format(future_iv))
future_var = np.var(d)
print('var={}'.format(future_var))
ws = 6
df = pd.DataFrame({'signal': np.absolute(d)})
future_e = df.rolling(ws).sum()[ws-1:].max().signal
print('E={}'.format(future_e))
-- Will it be right?
Secondly, can someone elaborate on this:
Step 2-1: normalized processed
EMG and EOG feature vectors were processed with
normalized function before involved into classification
steps. Feature vectors were normalized as the follow-
ing function (4):
Where, Xmax and Xmin were got by following
steps: first, sort x(i) vector, and set window length N
represents 50 ; then compare 2Nk (k, values from
10 to 1) with the length of x(i) , if 2Nk is bigger
than the later one, reduce k , until 2Nk is lower than
the length of x(i). If length of x(i) is greater than
2Nk, compute the mean of 50 larger values as
Xmax, and the average of 50 smaller values as Xmin.
?
If you are looking for REM (deep sleep); a spectogram will show you intensity and frequency information on a 1 page graph/chart. People of a certain age refer to spectograms as TFFT - the wiki link is... https://en.wikipedia.org/wiki/Spectrogram
Can you input the data into a spectogram display/plot. Use a large FFT window (you probably have hours of data) with a small overlap (15%). I would recommend starting with an FFT window around 1 second.

Prometheus query for last local peak value

What Prometheus query (PromQl) can be used to identify the last local peak value in the last X minutes in a graph?
A local peak is a point that is larger than its previous and next datapoint. (So ​​the current time is definitely not a local peak)
(p: peak point, i: cornjob interval, m: missed execuation)
I want this value to find an anomaly in the execution of a cron job. As you can see in the picture, I have written a query to calculate the elapsed time since the last execution of a job. Now to set an alert rule to calculate the elapsed time from the last successful execution and find missed execution, I need the amount of time that the last execution of the job occurred in that interval. This interval is unknown for the query (In other words, the interval of the job is specified by another program), so I can not compare elapsed time with a fixed time.
Use z-score to detecting anomalies
If you know the average value and standard deviation (σ) of a series, you can use any sample in the series to calculate the z-score. The z-score is measured in the number of standard deviations from the mean. So a z-score of 0 would mean the z-score is identical to the mean in a data set with a normal distribution, while a z-score of 1 is 1.0 σ from the mean, etc.
Calculate the average and standard deviation for the metric using data with large sample size.
# Long-term average value for the series
- record: job:cronjob_duration_time_seconds_count:rate10m:avg_over_time_1w
expr: avg_over_time(sum(rate(cronjob_duration_time_seconds_count[10m]))[1w:])
# Long-term standard deviation for the series
- record: job:cronjob_duration_time_seconds_count:rate5m:stddev_over_time_1w
expr: stddev_over_time(sum(rate(cronjob_duration_time_seconds_count[10m]))[1w:])
calculate the z-score for the Prometheus query once you have the average and standard deviation for the aggregation.
# Z-Score for aggregation
(
job:cronjob_duration_time_seconds_count:rate10m -
job:cronjob_duration_time_seconds_count:rate10m:avg_over_time_1w
) / stddev_over_time(sum(rate(cronjob_duration_time_seconds_count[10m]))[1w:])
Based on the statistical principles of normal distributions, you can assume that any value that falls outside of the range of roughly +1 to -1 is an anomaly. For example, you can get an alert when our aggregation is out of this range for more than five minutes.
If what you want is an alert to be fired when the elapsed time has been longer than a fixed duration, you can set an alert similar to the up alert, based on the changes > 0 expression, which is only true (i.e. > 0) when the job is running.
An example would be:
rules:
- alert: CronJobNotRunning
expr: |
changes(
sum(
rate(
cronjob_duration_time_seconds_count{
status="ok", namespace="<namespace>", exported_job="<job>"
}[1m]
)
)[1m:]
) == 0
for: <alert_duration>
Note that subqueries ([1m:]) are expensive, and introducing a recording rule there can help performance, especially in a dashboard.
Also, in your case, the time since the last time the second derivative was non-zero can be used too, as that happens when a job starts/finishes (the drops in the graph, or when it starts to rise).

change in value of one frequency bin affect FFT and IFFT values of non changing bins

I have a 3001x577 matrix. I want to apply a operation to the first 120 samples. I have applied to the first 120 samples which accounts to 20 Hz of frequency. The sampling rate is 2 msec. So I have Fnyq =250hz. Now I have taken out the first 120 samples. I noticed that after applying the filter and replacing it with the older 120 samples, the values of bins greater than 120 has changed after I applied an IFFT . And this is evident on my final result. I got the desired filter result but it ends up changing values of samples which i want untouched.
Can someone explain why change in value of few frequency bins affect the ifft or fft of non changing bins. I am using matlab. And how can i prevent it?
You took part of the spectrum (the first 120 samples), changed this part somehow and transformed the outcome back into the time domain by using an IFFT. It is to be expected that the signal has changed beyond the 120 samples since you manipulated frequency components which will alter all samples in the time domain. Think of it this way: You changed the amplitude (and phase) of 120 sinuses and then expect that the outcome to be limited to a certain time extent. Maybe you can post a new question where you describe what you actually want to achieve instead of the experiment you perform to get the job done.

Q-Learning: Inaccurate predictions

I recently started getting into Q-Learning. I'm currently writing an agent that should play a simple board game (Othello).
I'm playing against a random opponent. However it is not working properly. Either my agent stays around 50% winrate or gets even worse, the longer I train.
The output of my neural net is an 8x8 matrix with Q-values for each move.
My reward is as follows:
1 for a win
-1 for a loss
an invalid move counts as loss therefore the reward is -1
otherwise I experimented with a direct reward of -0.01 or 0.
I've made some observations I can't explain:
When I consider the prediction for the first move (my agent always starts) the predictions for the invalid moves get really close to -1 really fast. The predictions for the valid moves however seem to rise the longer I play. They even went above 1 (they were above 3 at some point), what shouldn't be possible since I'm updating my Q-values according to the Bellman-equation (with alpha = 1)
where Gamma is a parameter less than 1 (I use 0.99 for the most time). If a game lasts around 30 turns I would expect max/min values of +-0.99^30=+-0.73.
Also the predictions for the starting state seem to be the highest.
Another observation I made, is that the network seems to be too optimistic with its predictions. If I consider the prediction for a move, after which the game will be lost, the predictions for the invalid turns are again close to -1. The valid turn(s) however, often have predictions well above 0 (like 0.1 or even 0.5).
I'm somewhat lost, as I can't explain what could cause my problems, since I already double-checked my reward/target matrices. Any ideas?
i suspect your bellman calculation (specifically, Q(s', a')) does not check for valid moves as the game progresses. that would explain why "predictions for the invalid moves get really close to -1 really fast".

Association Rule - Non-Binary Items

I have studied association rules and know how to implement the algorithm on the classic basket of goods problem, such as:
Transaction ID Potatoes Eggs Milk
A 1 0 1
B 0 1 1
In this problem each item has a binary identifier. 1 indicates the basket contains the good, 0 indicates it does not.
But what would be the best way to model a basket which can contain many of the same good? E.g., take the below, very unrealistic example.
Transaction ID Potatoes Eggs Milk
A 5 0 178
B 0 35 7
Using binary indicators in this case would obviously be losing a lot of information and I am seeking a model which takes into account not only the presence of items in the basket, but also the frequency that the items occur.
What would be a suitable algorithm for this problem?
In my actual data there are over one hundred items and, based on the profile of a user's basket, I would like to calculate the probabilities of the customer consuming the other available items.
An alternative is to use binary indicators but constructing them in a more clever way.
The idea is to set the indicator when an amount is more than the central value, which means that it shall be significant. If everyone buys 3 breads on average, does it make sense to flag someone as a "bread-lover" for buying two or three?
Central value can a plain arithmetic mean, one with outliers removed, or the median.
Instead of:
binarize(x) = 0 if x = 0
1 otherwise
you can use
binarize*(x) = 0 if x <= central(X)
1 otherwise
I think if you really want to have probabilities is to encode your data in a probabilistic way. Bayesian or Markov networks might be a feasible way. Nevertheless without having a reasonable structure this will be computational extremely expansive. For three item types this, however, seems to be feasible
I would try to go for a Neural Network Autoencoder if you have many more item types. If there is some dependency in the data it will discover that.
For the above example you could use a network with three input, two hidden and three output neurons.
A little bit more fancy would be to use 3 fully connected layers with drop out in the middle layer.

Resources