I have the following metric in Prometheus; it counts memcached commands per second:
sum (irate (memcached_commands_total{instance="memcached-instance"}[5m])) by (command)
Result:
{command="delete"} 0
{command="flush"} 0
{command="get"} 62.733333333333334
{command="incr"} 0
{command="set"} 93.43333333333334
{command="touch"} NaN
{command="cas"} 0
{command="decr"} 0
I want the total commands per second, without a separate rate for each command. I have tried the following formula:
sum (irate (memcached_commands_total{instance="memcached-instance"}[5m]))
But the result is:
{} NaN
I expect about 155, but it is NaN. I suppose command="touch" is the culprit. Is it possible to exclude NaN from the sum?
I have figured it out:
sum (irate (memcached_commands_total{instance="memcached-instance"}[5m]) > 0)
returns the correct number. The > 0 filter does the trick: any comparison against NaN is false, so the NaN series is dropped (the zero-valued series are dropped too, but they contribute nothing to the sum anyway).
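This is expected IEEE 754 behaviour rather than anything Prometheus-specific; a quick Python illustration (sample numbers taken from the per-command output above):

```python
import math

# One NaN sample poisons a plain sum, which is why the aggregate was NaN:
samples = [62.73, 93.43, 0.0, float("nan")]
total = sum(samples)
print(math.isnan(total))  # True

# Dropping the NaN first (the effect of the > 0 comparison in PromQL,
# since any comparison against NaN is false) recovers the expected total:
filtered = [s for s in samples if s > 0]
print(sum(filtered))      # about 156.16
```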
The only way irate can produce a NaN is if it's being given a NaN as input. Given that irate is meant to work on counters, that should be impossible. I'd suggest looking into why memcached_commands_total is producing NaN.
I am working on a binary classification problem, and I am struggling with removing outliers and improving accuracy.
Ratings is one of my features; it looks like this:
0 0.027465
1 0.027465
2 0.027465
3 0.027465
4 0.027465
...
26043 0.027465
26044 0.027465
26045 0.102234
26046 0.027465
26047 0.027465
mean value of the data:
train.ratings.mean()
0.03871552285960927
std of the data:
train.ratings.std()
0.07585168664836195
I tried a log transformation, but accuracy did not improve:
train['ratings'] = np.log(train.ratings + 1)
My goal is to classify the data as true or false:
train.netgain
0 False
1 False
2 False
3 False
4 True
...
26043 True
26044 False
26045 True
26046 False
26047 False
One method I have used is to calculate the MAD (median absolute deviation) and then tag every outlier with a boolean flag; with that I can retrieve all the outliers.
Sample of MAD calculation:
def mad(x):
    return np.median(np.abs(x - np.median(x)))

def mad_ratio(x):
    mad_value = mad(x)
    if mad_value == 0:
        return 0
    x_mad = np.abs(x - np.median(x)) / mad_value
    return x_mad
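A usage sketch of the functions above, on synthetic ratings (the cutoff of 3 is a hypothetical choice, analogous to the 3-sigma rule discussed below):

```python
import numpy as np

def mad(x):
    return np.median(np.abs(x - np.median(x)))

def mad_ratio(x):
    mad_value = mad(x)
    if mad_value == 0:
        return 0
    return np.abs(x - np.median(x)) / mad_value

# Flag points more than 3 MADs from the median as outliers.
ratings = np.array([0.020, 0.025, 0.027, 0.029, 0.031, 0.200])
outliers = mad_ratio(ratings) > 3
print(outliers)  # only the 0.200 rating is flagged
```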
Assume that the rating feature is normally distributed, and convert it to the standard normal distribution.
For a normal distribution, we know that 99.7% of values fall within 3 standard deviations of the mean, so we can remove the values which are more than 3 standard deviations away.
See below for the Python code.
ratings_mean = train['ratings'].mean()  # mean of the ratings column
ratings_std = train['ratings'].std()    # standard deviation of the column
train['ratings'] = train['ratings'].map(lambda x: (x - ratings_mean) / ratings_std)
Ok, we have now converted our data to a standard normal distribution: its mean should be 0 and its standard deviation should be 1. From this we can find the values that are greater than 3 or less than -3, and remove those rows from the dataset.
train = train[np.abs(train['ratings']) < 3]
Now the train dataframe has the outliers removed.
**Note:** You can use 2 standard deviations as well, since 2 standard deviations cover 95% of the data. It all depends on domain knowledge and your data.
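A minimal end-to-end sketch of the 3-standard-deviation filter on synthetic data (the column name ratings is taken from the question; the numbers are made up):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data: mostly small ratings,
# plus two obvious outliers at the end.
rng = np.random.default_rng(0)
train = pd.DataFrame({"ratings": np.concatenate([rng.normal(0.04, 0.01, 1000),
                                                 [0.5, 0.6]])})

# Standardize, then keep only rows within 3 standard deviations of the mean.
z = (train["ratings"] - train["ratings"].mean()) / train["ratings"].std()
train_clean = train[np.abs(z) < 3]
print(len(train), len(train_clean))  # the two outliers are removed
```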
For my thesis I have to calculate the number of workers at risk of substitution by machines. I have calculated the probability of substitution (X) and the number of employees at risk (Y) for each occupation category. I have a dataset like this:
X Y
1 0.1300 0
2 0.1000 0
3 0.0841 1513
4 0.0221 287
5 0.1175 3641
....
700 0.9875 4000
I tried to plot a histogram with this command:
hist(dataset1$X,dataset1$Y,xlim=c(0,1),ylim=c(0,30000),breaks=100,main="Distribution",xlab="Probability",ylab="Number of employee")
But I get this error:
In if (freq) x$counts else x$density :
  the condition has length > 1 and only the first element will be used
Can someone tell me what the problem is and show me the right command?
Thank you!
It is worth pointing out that the message displayed is a warning, not an error, and should not prevent the results from being plotted. However, it does indicate that there are some issues.
The warning itself arises because hist() takes a single data vector: since breaks is already given by name, the second positional argument dataset1$Y is matched to the freq parameter, which expects a single logical value, hence "only the first element will be used". Beyond that, without the full dataset it is not 100% obvious what the problem is, but the data does not appear to be in the right format, with two potential issues. Firstly, some occupations have a Y value of 0, and these won't appear on the histogram. Secondly, the observations appear to be inconsistently spaced.
Histograms are best built from one of two kinds of input:
A dataframe that has already been aggregated into consistently sized bins.
A vector containing one value of X per observation.
I prefer the second technique. The expandRows() function in the splitstackshape package can be used to repeat each row of the dataframe according to its number of observations:
set.seed(123)
dataset1 <- data.frame(X = runif(900, 0, 1), Y = runif(900, 0, 1000))
library(splitstackshape)
dataset2 <- expandRows(dataset1, "Y")
hist(dataset2$X, xlim=c(0,1))
For the first technique, the X values could instead be aggregated into consistently sized bins with cut():
dataset1$bins <- cut(dataset1$X, breaks = seq(0,1,0.01), labels = FALSE)
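For illustration, the same row-expansion idea sketched in Python, with np.repeat playing the role of expandRows() (the numbers are the sample rows from the question):

```python
import numpy as np

# X = probability of substitution, Y = number of employees at risk.
X = np.array([0.1300, 0.1000, 0.0841, 0.0221, 0.1175])
Y = np.array([0, 0, 1513, 287, 3641])

# Repeat each probability once per employee; rows with Y = 0 vanish,
# which is why those occupations cannot appear on the histogram.
expanded = np.repeat(X, Y)
print(len(expanded))  # 1513 + 287 + 3641 = 5441

counts, edges = np.histogram(expanded, bins=100, range=(0, 1))
print(counts.sum())   # 5441: every employee lands in exactly one bin
```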
In this calculation:
months = (saved_cents / spend_cents).to_f.floor
I get the following error:
FloatDomainError: NaN
I think saved_cents and spend_cents are floats, which is why I don't understand this error.
What could be the reason for this error? And how can I fix it?
According to the documentation, a FloatDomainError is:
Raised when attempting to convert special float values (in particular Infinity or NaN) to numerical classes which don't support them.
In your code, a FloatDomainError occurs when both values are zero:
saved_cents = 0.0
spend_cents = 0.0
(saved_cents / spend_cents).floor
#=> FloatDomainError: NaN
Because zero divided by zero is NaN:
saved_cents / spend_cents
#=> NaN
and although NaN is a float, attempting to send it a floor message results in that error:
Float::NaN.floor
#=> FloatDomainError: NaN
What could be the reason for this error? And how can I fix it?
Double check your input. Maybe there's another bug which sets the values to 0.0 accidentally.
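One defensive sketch (the method name and the fallback of 0 are assumptions; adapt them to your domain): guard against a zero divisor before calling floor.

```ruby
# Hypothetical helper: bail out before 0.0 / 0.0 can produce NaN
# (or x / 0.0 produce Infinity), either of which makes floor raise
# FloatDomainError.
def months_to_save(saved_cents, spend_cents)
  return 0 if spend_cents.zero?
  (saved_cents / spend_cents).floor
end

puts months_to_save(100.0, 30.0)  # => 3
puts months_to_save(0.0, 0.0)     # => 0
```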
I was working on generating an XOR gate dataset with torch7, but when I printed the dataset I saw that the data was wrong, and I could not find the bug. There seems to be nothing wrong with the code, but I'm new to Torch, so mistakes can happen.
So, here is my code:
input = torch.Tensor(4,2)
input:random(0,1)
output = torch.Tensor(1)
dataset = {}
function dataset:size() return 4 end
for i=1,dataset:size() do
    if input[i][1] == input[i][2] then
        output[1] = 0
    else
        output[1] = 1
    end
    print("original")
    print(input[i][1].." "..input[i][2].." "..output[1]) -- the values that are going into the dataset
    dataset[i] = {input[i], output}
    print("dataset")
    print(dataset[i][1][1].." "..dataset[i][1][2].." "..dataset[i][2][1]) -- read back from the dataset to double-check
end
print("Why dataset is different now?")
for i=1,4 do
    print(dataset[i][1][1].." "..dataset[i][1][2].." "..dataset[i][2][1]) -- so why is this different?
end
As you can see, I printed the values being inserted into the dataset list and, to double-check, read them back from the dataset.
Finally, I checked the dataset after the full insertion, and it was different somehow. I ran it a couple of times; every time it was different, as if it was stuck on 1 or 0.
So here is my output
original
1 0 1
dataset
1 0 1
original
0 0 0
dataset
0 0 0
original
1 1 0
dataset
1 1 0
original
0 0 0
dataset
0 0 0
Why dataset is different now?
1 0 0
0 0 0
1 1 0
0 0 0
As you can see, the format is
input input output
I printed "original" when reading from input[i] and output, and "dataset" when reading back from the dataset right after insertion.
You can also see that the first set of values is different in the final printout: it should be 1 0 1, but it is 1 0 0.
I could not find the bug in my code. Can anyone help? If the question is not clear, please let me know.
The problem is here: dataset[i] = {input[i], output}
You are not saving the calculated result; you are saving a reference to a value that is changed by the subsequent XOR calculations. Naturally, when you read the result back, you always get the same number: the last value written to output[1].
To fix it, either change output so it stores an actual temporary value rather than a shared tensor, or read the actual value out of output when saving to dataset[i]; just saving the reference does not give you a deep copy.
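The aliasing problem is not Torch-specific; any mutable object stored by reference behaves this way. A Python sketch of the same bug, with a one-element list standing in for the output tensor:

```python
inputs = [(1, 0), (0, 0), (1, 1), (0, 1)]

# Buggy version: every dataset entry stores a reference to the SAME
# `output` object, so later writes overwrite what earlier entries hold.
output = [None]
dataset = []
for a, b in inputs:
    output[0] = 0 if a == b else 1
    dataset.append((a, b, output))
print([row[2][0] for row in dataset])  # [1, 1, 1, 1] -- all show the last write

# Fixed version: store a copy of the current value instead of the reference.
dataset_fixed = []
for a, b in inputs:
    output[0] = 0 if a == b else 1
    dataset_fixed.append((a, b, output.copy()))
print([row[2][0] for row in dataset_fixed])  # [1, 0, 0, 1] -- correct XOR
```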
I'm not sure if this is a bug or not, so I thought that maybe you folks might want to take a look.
The problem lies with this code:
for i=0,1,.05 do
print(i)
end
The output should be:
0
.05
.1
--snip--
.95
1
Instead, the output is:
0
.05
.1
--snip--
.95
This same problem happened with a while loop:
w = 0
while w <= 1 do
print(w)
w = w + .05
end
--output:
0
.05
.1
--snip--
.95
The value of w is 1, which can be verified by a print statement after the loop.
I have verified as much as possible that any step less than or equal to .05 will produce this error, and any step above .05 seems to be fine. I verified that a step of 1/19 (0.052631579) does print a 1. (Obviously, a divisor like 19.9 or 10.5 will not produce output covering [0,1] inclusive.) Is there a possibility that this is not an error in the language? Both the interpreter and a regular Lua file produce this behavior.
This is a rounding problem. The issue is that 0.05 is stored as a binary floating-point number, and it does not have an exact representation in binary: in base 2 it is a repeating fraction, much like 1/3 is in base 10. When added repeatedly, the rounding error accumulates into a number which is slightly more than 1. It is only very, very slightly more than 1, so printing it shows 1, but it is not exactly 1.
> x=0.05+0.05+0.05+0.05+0.05+0.05+0.05+0.05+0.05+0.05+0.05+0.05+0.05+0.05+0.05+0.05+0.05+0.05+0.05+0.05
> print(x)
1
> print(1==x)
false
> print(x-1)
2.2204460492503e-16
So, as you can see, although really close to 1, it is actually slightly more.
A similar situation can come up in decimal when we have repeating fractions. If we were to add together 1/3 + 1/3 + 1/3, but we had to round to six digits to work with, we would add 0.333333 + 0.333333 + 0.333333 and get 0.999999 which is not actually 1. This is an analogous case for binary math. 1/20 cannot be precisely represented in binary.
Note that the rounding is slightly different for multiplication so
> print(0.05*20-1)
0
> print(0.05*20==1)
true
As a result, you could rewrite your code to say
for i=0,20,1 do
print(i*0.05)
end
And it would work correctly. In general, it's advisable not to use floating point numbers (that is, numbers with decimal points) for controlling loops when it can be avoided.
This is a result of floating-point inaccuracy. A binary64 floating point number is unable to store 0.05 and so the result will be rounded to a number which is very slightly more than 0.05. This rounding error remains in the repeated sum, and eventually the final value will be slightly more than 1.0, and so will not be displayed.
This is a floating point thing. Computers don't represent floating point numbers exactly. Tiny rounding errors make it so that 20 additions of +0.05 does not result in precisely 1.0.
Check out this article: "What every programmer should know about floating-point arithmetic."
To get your desired behavior, you could loop i over the integers 0..20 and set f = i*0.05.
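Python uses the same binary64 doubles as Lua, so the two behaviours can be checked side by side:

```python
# Repeated addition drifts: twenty additions of 0.05 land slightly above 1.0.
total = 0.0
for _ in range(20):
    total += 0.05
print(total == 1.0)        # False
print(total - 1.0)         # a tiny positive residue, about 2.2e-16

# Scaling an integer counter avoids the drift: 0.05 * 20 rounds exactly to 1.0.
values = [i * 0.05 for i in range(21)]
print(values[-1] == 1.0)   # True
```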
This is not a bug in Lua. The same thing happens in the C program below. Like others have explained, it's due to floating-point inaccuracy, more precisely, to the fact that 0.05 is not a binary fraction (that is, does not have a finite binary representation).
#include <stdio.h>
int main(void)
{
double i;
for (i=0; i<=1; i+=0.05) printf("%g\n",i);
return 0;
}