Prometheus Union of Range Vectors

I have two range vectors (# of hits and misses) that I want to aggregate by their types. Some of the types have only hits, others only misses, and some have both. These are two independent metrics that I'm trying to take the union of, but the resulting vector doesn't make sense: it's missing some of the values, and I think it's because those types have either all hits or all misses. Am I doing this completely the wrong way?
sum by (type) (increase(metric_hit{}[24h])) + sum by (type) (increase(metric_miss{}[24h]))

First off, it's recommended to always initialise all your potential label values to avoid this sort of issue.
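For example, with an app instrumented via the official Python client (a sketch; the metric names mirror the question and the type values are made up), you can touch every known label combination once at startup so each series exists from the beginning:
from prometheus_client import Counter

hits = Counter("metric_hit", "Number of hits", ["type"])
misses = Counter("metric_miss", "Number of misses", ["type"])

# Touching a label combination creates that child series at 0, so
# increase() has data for every type even before any real traffic.
# Note the client exports counters with a _total suffix.
for t in ("alpha", "beta", "gamma"):  # hypothetical type values
    hits.labels(type=t)
    misses.labels(type=t)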
On the query side, this can be handled with the or operator: multiplying the other metric by 0 manufactures a zero-valued series carrying the label sets missing from the first operand, so types with only hits or only misses are no longer dropped from the sum:
sum by (type) (
(increase(metric_hit[1d]) or metric_miss * 0)
+
(increase(metric_miss[1d]) or metric_hit * 0)
)


how to select SpatRaster layers from their names?

I've got a SpatRaster of 150 x 150 x 1377 layers that shows the temporal evolution of precipitation. Each layer is a given hour in a 2-month interval, but some hours are missing, so the dataset isn't continuous. The layer names are strings of the form "YYYYMMDDhhmm".
I need the mean value over every three-hour window, whether the window is complete or has missing data. On complete windows I want to average three layers; on windows with one missing layer I would like to average the two remaining ones, or, if two are missing, to take the single remaining value as the mean.
How can I use the layer names to decide how to act?
I've already tried the code below, but it averages three consecutive layers by index, not by hour. How can I convert the names to a DateTime form (with "tidyverse") in order to use rollapply() to check whether two steps back I find the DateTime I am expecting? Is there any other method to do this?
files <- paste0(resfolder, c("HSAF_final1_5.tif", "HSAF_final6_10.tif", "HSAF_final11_15.tif",
                             "HSAF_final16_20.tif", "HSAF_final21_25.tif", "HSAF_final26_30.tif",
                             "HSAF_final31_N04.tif", "HSAF_finalN05_N08.tif", "HSAF_finalN09_N13.tif",
                             "HSAF_finalN14_N18.tif", "HSAF_finalN19_N23.tif", "HSAF_finalN24_N28.tif",
                             "HSAF_finalN29_N30.tif"))
HSAF <- rast(files)
index <- names(HSAF)
j <- 2
# First window: build the initial three-layer mean.
# Note: double brackets select layers by name; single brackets extract cell values.
for (i in seq(1, 3, by = 3)) {
  third_el  <- HSAF[[index[i + j]]]
  second_el <- HSAF[[index[i + j - 1]]]
  first_el  <- HSAF[[index[i + j - 2]]]
  newraster <- c(first_el, second_el, third_el)
  newraster <- mean(newraster, filename = paste0(tempfile(), ".tif"))
  names(newraster) <- paste0(index[i + j - 2], index[i + j - 1], index[i + j])
}
# Remaining windows: append each three-layer mean to the result.
for (i in seq(4, 1374, by = 3)) {
  third_el  <- HSAF[[index[i + j]]]
  second_el <- HSAF[[index[i + j - 1]]]
  first_el  <- HSAF[[index[i + j - 2]]]
  subraster <- c(first_el, second_el, third_el)
  subraster <- mean(subraster, filename = paste0(tempfile(), ".tif"))
  names(subraster) <- paste0(index[i + j - 2], index[i + j - 1], index[i + j])
  add(newraster) <- subraster
}

Automata and Computability

How long will it take for a program to print a truth table for n propositional symbols?
(Symbols: P1, P2, ..., Pn)
I can't seem to crack this question; I'm not quite sure how to calculate it for this instance.
It will take time proportional to at least n*2^n. Each of the propositional symbols can assume one of two values, true or false, so the portion of the truth table that lists all possible assignments of n such variables must have at least 2 * 2 * … * 2 (n times) = 2^n rows of n entries each - and that's not even counting the subexpressions that make up the rest of the table. This lower bound is tight: consider the proposition P1 and P2 and … and Pn, and the following procedure, which takes time Theta(n*2^n) to write out the answer:
fill up P1's column with 2^(n-1) TRUE and then 2^(n-1) FALSE
fill up P2's column with 2^(n-2) TRUE and then 2^(n-2) FALSE, alternating
…
fill up Pn's column with 1 TRUE and 1 FALSE, alternating
fill up the last column with a TRUE at the top and FALSE all the way down
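As an illustration of that procedure's cost (a minimal Python sketch, generating the rows with itertools rather than column by column), printing the table visits n+1 entries in each of 2^n rows:
from itertools import product

def print_truth_table(n):
    symbols = ["P%d" % i for i in range(1, n + 1)]
    print("\t".join(symbols + ["P1 and ... and Pn"]))
    # product() yields the 2^n rows; printing each one costs Theta(n),
    # so writing the whole table out takes Theta(n * 2^n) time.
    for row in product([True, False], repeat=n):
        print("\t".join(str(v) for v in row + (all(row),)))

print_truth_table(3)  # 8 rows of 4 entries each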
If you have more complicated propositions then you should probably take the number of subexpressions as another independent variable since that could have an asymptotically relevant effect (using n propositional symbols, you can have arbitrarily many unique subexpressions that must be given their own columns in a complete truth table).

What is a better way than a for loop to implement an algorithm that involves sets?

I'm trying to create an algorithm along these lines:
-Create 8 participants
-Each participant has a set of interests
-Pair each participant with another participant with whom they share the fewest interests
So what I've done so far is create 2 classes, Participant and Interest, where Interest is Hashable so that I can create a Set of them. I manually created 8 participants with different names and interests.
I've made an array of selected participants and used a basic for-in loop to roughly pair them together using the intersection() function on sets. Somehow my index always goes out of range, and I'm positive there's a better way of doing this, but it's just so messy and I don't know where to start.
for i in 0..<participantsSelected.count {
    if participantsSelected[i].interest.intersection(participantsSelected[i + 1].interest) == [] {
        participantsSelected.remove(at: i)
        participantsSelected.remove(at: i + 1)
        print(participantsSelected.count)
    }
}
My other issue is that using a for loop for this specific algorithm seems a bit off, too: what if they all share 1 interest, so the intersection never equals [] / nil?
Basically, the output I'm trying to get is to remove participants from the selected array once they're paired up, and for two participants to be paired they would have to share the fewest interests with each other.
EDIT: Updated code, here's my attempt to improve my algorithm logic
for participant in 0..<participantsSelected.count {
    var maxInterestIndex = 10
    var currentIndex = 1
    for _ in 0..<participantsSelected.count {
        print(participant)
        print(currentIndex)
        let score = participantsAvailable[participant].interest.intersection(participantsAvailable[currentIndex].interest)
        print("testing score, \(score.count)")
        if score.count < maxInterestIndex {
            maxInterestIndex = score.count
            print("test, \(maxInterestIndex)")
        } else {
            pairsCreated.append(participantsAvailable[participant])
            pairsCreated.append(participantsAvailable[currentIndex])
            break
            // participantsAvailable.remove(at: participant)
            // participantsAvailable.remove(at: pairing)
        }
        currentIndex = currentIndex + 1
    }
}
for i in 0..<pairsCreated.count {
    print(pairsCreated[i].name)
}
Here is a solution for the case where what you are looking for is to pair your participants (all of them) optimally with respect to your criterion:
The way to go is to find a perfect matching in a participant graph.
Create a graph with n vertices, n being the number of participants. We can denote by u_p the vertex corresponding to participant p.
Then, create weighted edges as follows:
For each pair of participants p, q (p != q), create the edge (u_p, u_q), and weight it with the number of interests these two participants have in common.
Then, run a minimum weight perfect matching algorithm on your graph, and the job is done. You will obtain an optimal result (meaning the best possible, or one among the best possible matchings) in polynomial time.
Minimum weight perfect matching algorithm: the problem is strictly equivalent to the maximum weight matching problem. Find the edge of maximum weight (let's denote its weight by C). Then replace the weight w of each edge by C-w, and run a maximum weight matching algorithm on the resulting graph.
I would suggest that you use Edmonds' blossom algorithm to find a perfect matching in your graph: first because it is efficient and well documented, second because you can find implementations in most mainstream languages, but also because it truly is a very, very beautiful algorithm (it isn't called blossom for nothing).
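As a sketch of that whole reduction in Python (networkx's max_weight_matching is a blossom-style implementation; the participants and interests below are invented for illustration):
from itertools import combinations
import networkx as nx

interests = {  # hypothetical example data
    "alice": {"chess", "hiking"},
    "bob": {"chess", "movies"},
    "carol": {"hiking", "cooking"},
    "dave": {"movies", "cooking"},
}

common = {(p, q): len(interests[p] & interests[q])
          for p, q in combinations(interests, 2)}
C = max(common.values())

G = nx.Graph()
for (p, q), w in common.items():
    # Re-weighting with C - w turns the minimum weight problem
    # into a maximum weight one.
    G.add_edge(p, q, weight=C - w)

# maxcardinality=True insists on a perfect matching when one exists.
print(nx.max_weight_matching(G, maxcardinality=True))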
Another possibility, if you are sure that your number of participants will be small (you mention 8), is to go for a brute-force algorithm, meaning to test all possible ways to pair the participants.
Then the algorithm would look like:
find_best_matching(participants, interests, pairs):
    if all participants have been paired:
        return sum(cost(p1, p2) for p1, p2 in pairs), pairs  // cost(p1, p2) = number of interests in common
    else:
        p = some participant which is not paired yet
        best_sum = +infinity
        best_pairs = empty_set
        for all participants q which have not been paired, q != p:
            add (p, q) to pairs
            current_sum, current_pairs = find_best_matching(participants, interests, pairs)
            if current_sum < best_sum:
                best_sum = current_sum
                best_pairs = current_pairs
            remove (p, q) from pairs
        return best_sum, best_pairs
Call it initially with find_best_matching(participants, interests, empty_set_of_pairs) to get the best way to match your participants.
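In Python, that brute force could look like the following sketch (the example data is invented):
def find_best_matching(interests, unpaired, pairs):
    if not unpaired:
        # cost of a pair = number of interests in common
        return sum(len(interests[p] & interests[q]) for p, q in pairs), list(pairs)
    p = next(iter(unpaired))  # some participant not paired yet
    best_sum, best_pairs = float("inf"), []
    for q in unpaired - {p}:
        pairs.append((p, q))
        current_sum, current_pairs = find_best_matching(interests, unpaired - {p, q}, pairs)
        if current_sum < best_sum:
            best_sum, best_pairs = current_sum, current_pairs
        pairs.pop()
    return best_sum, best_pairs

interests = {  # hypothetical example data
    "alice": {"chess", "hiking"},
    "bob": {"chess", "movies"},
    "carol": {"hiking", "cooking"},
    "dave": {"movies", "cooking"},
}
print(find_best_matching(interests, set(interests), []))  # here: two pairs with 0 shared interests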

how to get a random set of records from an index with cypher query

what's the syntax to get random records from a specific node_auto_index using cypher?
I suppose there is this example
START x=node:node_auto_index("uname:*") RETURN x SKIP somerandomNumber LIMIT 10;
Is there a better way that won't return a contiguous set?
There is no feature similar to SQL's RANDOM() in Neo4j.
You must either compute the random number for the SKIP clause before you run the Cypher query (in case you are not querying directly from the console and you use some host language with Neo4j)
- this will give you a random section of nodes that are contiguous in a row,
or you must retrieve all the nodes and then do your own random sampling across them in the host language - this will give you a truly random set of nodes.
Or, to fake a pseudorandom function in Cypher, we can try something like this:
START x=node:node_auto_index("uname:*")
WITH x, length(x.uname) as len
WHERE (Id(x) + len) % 3 = 0
RETURN x LIMIT 10
Or make a more sophisticated WHERE part in this query, based upon the total number of uname nodes or the ASCII values of the uname property, for example.

How to calculate mean based on number of votes/scores/samples/etc?

For simplicity say we have a sample set of possible scores {0, 1, 2}. Is there a way to calculate a mean based on the number of scores without getting into hairy lookup tables etc for a 95% confidence interval calculation?
dreeves posted a solution to this here: How can I calculate a fair overall game score based on a variable number of matches?
Now say we have 2 scenarios ...
Scenario A) 2 votes of value 2 give SE = 0, so the mean is 2
Scenario B) 10000 votes of value 2 give SE = 0, so the mean is 2
I wanted Scenario A to come out as some value less than 2 because of the low number of votes, but it doesn't seem like this solution handles that (dreeves's equations hold when you don't have all values in your set equal to each other). Am I missing something, or is there another algorithm I can use to calculate a better score?
The data available to me is:
n (number of votes)
sum (sum of votes)
{set of votes} (all vote values)
Thanks!
You could just give each item a weighted score when ranking results, as opposed to just displaying the average vote so far, by multiplying the average with some function of the number of votes.
An example in C# (because that's what I happen to know best...) that could easily be translated into your language of choice:
double avgScore = Math.Round((double)sum / n);  // cast avoids integer division if sum and n are ints
double rank = avgScore * Math.Log(n);
Here I've used the natural logarithm of n as the weighting function (Math.Log(n) with a single argument is base e in C#) - but it will only work well if the number of votes is neither too small nor too large. Exactly how large is "optimal" depends on how much you want the number of votes to matter.
If you like the logarithmic approach but the natural log doesn't really work with your vote counts, you could easily use another base. For example, to do it in base 3 instead:
double rank = avgScore * Math.Log(n, 3);
Which function you should use for weighing is probably best decided by the order of magnitude of the number of votes you expect to reach.
You could also use a custom weighting function by defining
double rank = avgScore * w(n);
where w(n) returns the weight value depending on the number of votes. You then define w(n) as you wish, for example like this:
double w(int n) {
    // caution! ugly example code ahead...
    // if you even want this approach, at least use a switch... :P
    if (n > 100) {
        return 10;
    } else if (n > 50) {
        return 8;
    } else if (n > 40) {
        return 6;
    } else if (n > 20) {
        return 3;
    } else if (n > 10) {
        return 2;
    } else {
        return 1;
    }
}
If you want to use the idea in my other referenced answer (thanks!) of using a pessimistic lower bound on the average, then I think some additional assumptions/parameters are going to need to be injected.
To make sure I understand: With 10000 votes, every single one of which is "2", you're very sure the true average is 2. With 2 votes, each a "2", you're very unsure -- maybe some 0's and 1's will come in and bring down the average. But how to quantify that, I think is your question.
Here's an idea: everyone starts with some "baggage": a single phantom vote of "1". The person with 2 true "2" votes will then have an average of (1+2+2)/3 = 1.67, where the person with 10000 true "2" votes will have an average of 20001/10001 = 1.9999. That alone may satisfy your criteria. Or to add the pessimistic lower bound idea, the person with 2 votes would have a pessimistic average score of 1.333 and the person with 10k votes would be 1.99948.
(To be absolutely sure you'll never have the problem of zero standard error, use two different phantom votes. Or perhaps use as many phantom votes as there are possible vote values, one vote with each value.)
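Here is that phantom-vote idea as a Python sketch (reading the pessimistic lower bound as mean minus one standard error, which reproduces the 1.333 figure above; the exact bound you use may differ):
import statistics

def phantom_score(votes, phantom=(1,)):
    v = list(phantom) + list(votes)  # prepend the phantom vote(s)
    mean = statistics.mean(v)
    se = statistics.stdev(v) / len(v) ** 0.5  # standard error of the mean
    return mean, mean - se  # (plain average, pessimistic lower bound)

print(phantom_score([2, 2]))       # about (1.667, 1.333)
print(phantom_score([2] * 10000))  # mean about 1.9999, bound just below it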
