Count availability of service in percents (GAUGE chart) - devops

I need to compute the percentage availability of a service over a given time range. I came up with this idea:
(scalar(count_over_time(sum(status_metric{id="$id", status="OK"}[$__range])) / (scalar(count_over_time(sum(status_metric{id="$id", status=~".+"}[$__range])))*100
I would like to use the variable [$__range] so that we can see the availability of the service over whatever time range we choose, e.g. 30d, 90d, 1y, etc.
My guess is that this query doesn't handle the case where the status is something other than "OK", e.g. "Not ok:)", so it will show either 100% or nothing. Any ideas?

You need to swap the sum() and count_over_time() functions:
sum(count_over_time(status_metric{id="$id", status="OK"}[5m])) / sum(count_over_time(status_metric{id="$id"}[5m])) * 100
Otherwise subquery mode is enabled, which may return unexpected results. See these docs on how subqueries work in general.
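To make the arithmetic behind the corrected query concrete, here is a minimal Python sketch of the same ratio computed over raw samples. The input layout (one status string per scraped sample) and the function name availability_percent are assumptions made for illustration; they are not part of Prometheus or Grafana.

```python
from typing import Optional, Sequence


def availability_percent(statuses: Sequence[str], ok_status: str = "OK") -> Optional[float]:
    """Return the share of samples whose status equals ok_status, in percent.

    Returns None when there are no samples at all (hypothetical convention
    chosen here to signal "no data").
    """
    total = len(statuses)
    if total == 0:
        return None
    ok = sum(1 for s in statuses if s == ok_status)
    return ok / total * 100


# Three OK samples out of four in the selected range.
samples = ["OK", "OK", "Not ok:)", "OK"]
print(availability_percent(samples))
```

The numerator counts only the "OK" samples and the denominator counts every sample regardless of status, which is what the two count_over_time() terms do in the query above.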

Related

Simple Cypher Query for apoc dijkstra taking FOREVER

Maybe I am very stupid or Neo4j is not supposed to be fast. (Disclaimer: I am a Neo4j noob)
I have the following simple Dijkstra query, which takes forever to run. I have to wait at least 5-10 minutes for it to execute. Sometimes my Chrome browser crashes because of it.
Sample Graph
Cypher Query
profile MATCH (startNode:Stop)--(st:Stoptime),
(endNode:Stop)--(et:Stoptime)
where endNode.name = 'Hauptbahnhof Süd' and
(startNode.name = 'Schlump' or startNode.name = 'U Schlump')
call apoc.algo.dijkstra(st, et, 'PRECEDES', 'weight') YIELD path, weight
return startNode, endNode, path, weight
limit 100;
Computer Config
I am using an Ubuntu VM on a Windows machine which has 24 GB of RAM and 6 CPUs.
Indexes
Sysinfo
When I run PROFILE on the above query, I get the following information:
Profile Information
For the love of God, I can't figure out where the bottleneck lies. I have checked all the other answers on this, but to no avail.
Since I don't have the data set to test out my suggestion with, I can only point you in the direction that I would look. Hopefully, it leads you to the answer.
In looking at the profile and query I see that startNode and endNode are both type :Stop and that the Stop.name property is indexed.
When looking for endNode.name = 'Hauptbahnhof Süd' there are 3 estimated rows and 3 rows are returned.
However when looking for (startNode.name = 'Schlump' or startNode.name = 'U Schlump') there are 6 estimated rows, but 14827 returned.
Are there indeed 14827 :Stop nodes that contain either 'Schlump' or 'U Schlump'?
Or is it the 6 estimated rows? If the latter is the case can you run the query without the OR:
where endNode.name = 'Hauptbahnhof Süd' and startNode.name = 'Schlump'
to see what the profiler comes up with.
If that performs as expected then the solution may be to rewrite the query to include that OR logic in a different format?
Perhaps
where endNode.name = 'Hauptbahnhof Süd' and startNode.name IN ['Schlump','U Schlump']
I also found this older answer indicating an issue with the OR operator and indexes in versions prior to 3.2.
I had remembered seeing another recent answer about some issue with OR, but can't seem to locate it now.
Good luck!

Esper very simple context and aggregation

I have a fairly simple problem to model, but I have no experience with Esper, so I may be headed the wrong way and would like some insight.
Here's the scenario: I have one stream of events, "ParkingEvent", with two types of events, "SpotTaken" and "SpotFree". So I have an Esper context that is both partitioned by id and bounded by a starting event of type "SpotTaken" and an ending event of type "SpotFree". The idea is to monitor a parking spot with a sensor and then aggregate the data to count the number of times the spot has been taken and the total time it was occupied.
That's it, no time window or anything else, so it seems quite simple, but I am struggling to aggregate the data. Here's the code I have so far:
create context ParkingSpotOccupation
context PartionBySource
partition by source from SmartParkingEvent,
context ContextBorders
initiated by SmartParkingEvent(
type = "SpotTaken") as startEvent
terminated by SmartParkingEvent(
type = "SpotFree") as endEvent;
@Name("measurement_occupation")
context ParkingSpotOccupation
insert into CreateMeasurement
select
e.source as source,
"ParkingSpotOccupation" as type,
{
"startDate", min(e.time),
"endDate", max(e.time),
"duration", dateDifferenceInSec(max(e.time), min(e.time))
} as fragments
from
SmartParkingEvent e
output
snapshot when terminated;
I get the same value for min and max, so I'm guessing I'm doing something wrong.
When I use context.ContextBorders.startEvent.time and context.ContextBorders.endEvent.time instead of min and max, the measurement_occupation statement is not triggered.
Given that measurements have already been computed by the EPL that you provided, this counts the number of times the spot has been taken (and freed) and totals up the duration:
select source, count(*), sum(duration) from CreateMeasurement group by source
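As a rough illustration of what the combined context and aggregation are meant to compute, here is a plain Python sketch that pairs each "SpotTaken" with the next "SpotFree" from the same source and then totals the results per source. The tuple layout (source, type, time in seconds), the function name summarize_occupation, and the assumption that events arrive in time order are all hypothetical; this is not Esper code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed event layout: (source, type, time in seconds), already in time order.
Event = Tuple[str, str, float]


def summarize_occupation(events: Iterable[Event]) -> Dict[str, Tuple[int, float]]:
    """Return {source: (times_taken, total_occupied_seconds)}."""
    open_since: Dict[str, float] = {}              # start time of the open interval per source
    counts: Dict[str, int] = defaultdict(int)      # completed occupations per source
    totals: Dict[str, float] = defaultdict(float)  # summed durations per source

    for source, kind, time in events:
        if kind == "SpotTaken" and source not in open_since:
            # Plays the role of the "initiated by" border for this source.
            open_since[source] = time
        elif kind == "SpotFree" and source in open_since:
            # Plays the role of the "terminated by" border: emit one measurement.
            start = open_since.pop(source)
            counts[source] += 1
            totals[source] += time - start

    return {source: (counts[source], totals[source]) for source in counts}


events = [
    ("A", "SpotTaken", 0), ("B", "SpotTaken", 5), ("B", "SpotFree", 35),
    ("A", "SpotFree", 60), ("A", "SpotTaken", 100), ("A", "SpotFree", 130),
]
print(summarize_occupation(events))
```

Each completed taken/free pair corresponds to one row in CreateMeasurement, and the final dictionary corresponds to the group-by query above.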

neo4j cypher - Differing query plan behavior

Nodes with the Location label have an index on Location.name.
Profiling the following query gives me a smart plan, with a NodeHashJoin between the two sides of the graph on either side of Trip nodes. Very clever. Works great.
PROFILE MATCH (rosen:Location)<-[:OCCURS_AT]-(ev:Event)<-[:HAS]-(trip:Trip)-[:OPERATES_ON]->(date:Date)
WHERE rosen.name STARTS WITH "U Rosent" AND
ev.scheduled_departure_time > "07:45:00" AND
date.date = '2015-11-20'
RETURN rosen.name, ev.scheduled_departure_time, trip.headsign
ORDER BY ev.scheduled_departure_time
LIMIT 20;
However, just changing one line of the query from:
WHERE rosen.name STARTS WITH "U Rosent" AND
to
WHERE id(rosen) = 4752371 AND
seems to alter the entire behavior of the query plan, which now appears to become more "sequential", losing the parallel execution of (Trip)-[:OPERATES_ON]->(Date)
Much slower. 6x more DB hits in total.
Question
Why does changing the retrieval of one, seemingly-unrelated Location node via a different index/mechanism alter the behavior of the whole query?
(I'm not sure how best to convey more information about the graph model, but please advise, and I'd be happy to add details that are missing)
Edit:
It gets better. Changing that query line from:
WHERE rosen.name STARTS WITH "U Rosent" AND
to
WHERE rosen.name = "U Rosenthaler Platz." AND
results in the same loss of parallelism in the query plan!
Seems odd that a LIKE query is faster than an = ?

How to find nodes being contained in a node's properties interval?

I'm currently developing some kind of a configurator using neo4j as a backend. Now I ran into a problem, I don't know how to solve best.
I've got nodes created like this:
(A:Product {name:'ProductA', minWidth:20, maxWidth:200, minHeight:10, maxHeight:400})
(B:Product {name:'ProductB', minWidth:40, maxWidth:100, minHeight:20, maxHeight:300})
...
There is an interface where the user can input a desired width and height, e.g. Width=30, Height=250. Now I'd like to check which products match the input criteria. As the input might be any long value, the approach used in http://neo4j.com/blog/modeling-a-multilevel-index-in-neoj4/ with dates doesn't seem suitable for me. How can I run a Cypher query that gives me all the nodes matching the input criteria?
I'm not sure I fully understand what you are asking for, but if I do, here is a simple query to get this:
Assuming the user wants width = 30 and height = 50
Match (p:Product)
WHERE
p.minWidth < 30 AND p.maxWidth > 30 AND
p.minHeight < 50 AND p.maxHeight > 50
RETURN
p
If this is not what you are looking for, feel free to say so in a comment.
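For comparison, the same interval check can be sketched in Python against the two products from the question. The Product class and the matching_products function are hypothetical names, and the bounds are treated as inclusive here, which is an assumption: the Cypher query above uses strict comparisons, so a width exactly equal to minWidth would not match there.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Product:
    name: str
    min_width: int
    max_width: int
    min_height: int
    max_height: int


def matching_products(products: Iterable[Product], width: int, height: int) -> List[str]:
    """Return names of products whose width and height ranges contain the input."""
    return [
        p.name
        for p in products
        # Inclusive bounds (assumption): min <= value <= max on both dimensions.
        if p.min_width <= width <= p.max_width
        and p.min_height <= height <= p.max_height
    ]


catalog = [
    Product("ProductA", 20, 200, 10, 400),
    Product("ProductB", 40, 100, 20, 300),
]
print(matching_products(catalog, width=30, height=250))
```

With the question's input (Width=30, Height=250) only ProductA qualifies, because ProductB requires a width of at least 40.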

specific query with cypher

I need help with specific query. I am using neo4j. My database consists of companies (nodes) and transactions between them(relationship). Each relationship(PAID) has properties:
amount- for amount of transaction
year - year of transaction
month - month of transaction
What I need is to find all cycles in the graph starting at node A. It must also be true that the transactions occurred one after another.
So a valid example would be A PAID B in March, B PAID C in April, C PAID A in June.
Is there any way to get all cycles from node A such that the transactions occur in chronological order?
You may want to set up a sample graph at Neo4j console to share or at least tell more about what version of Neo4j you are using, but if you're using 2.0 and if you store year and month as long or integer, then maybe you could try something like
MATCH a-[ab:PAID]->b-[bc:PAID]->c-[ca:PAID]->a
WHERE (ab.year + ab.month) > (bc.year + bc.month) > (ca.year + ca.month)
RETURN a,b,c
EDIT:
Actually that was hasty, the additions won't work that way of course, but the structure should be ok. Maybe
WHERE ((ab.year < bc.year) OR (ab.year = bc.year AND ab.month < bc.month))
AND ((bc.year < ca.year) OR (bc.year = ca.year AND bc.month < ca.month))
or
WHERE (ab.year * 12 + ab.month) < (bc.year * 12 + bc.month)
AND (bc.year * 12 + bc.month) < (ca.year * 12 + ca.month)
If you only use dates for this type of comparison, consider storing them as one property, perhaps as milliseconds since the epoch (1 January 1970 GMT). That makes comparisons very easy. But if you need to return and display dates frequently, then keeping them separate might make sense.
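To see why simply adding year and month does not order dates correctly while year * 12 + month does, here is a small Python check. The helper name month_index and the two sample dates are chosen for illustration only.

```python
def month_index(year: int, month: int) -> int:
    """Map (year, month) to a single integer that grows by one per calendar month."""
    return year * 12 + month


dec_2013 = (2013, 12)
jan_2014 = (2014, 1)

# Naive sum: 2013 + 12 = 2025 and 2014 + 1 = 2015, so December 2013
# wrongly appears to be later than January 2014.
naive_says_earlier = sum(dec_2013) < sum(jan_2014)

# Month index: 24168 and 24169, so the order is preserved.
index_says_earlier = month_index(*dec_2013) < month_index(*jan_2014)

print(naive_says_earlier, index_says_earlier)
```

A single numeric property such as a timestamp gives the same ordering guarantee without any arithmetic in the query.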
EDIT2:
I can't think of a way to build your condition of "r1.date < r2.date" into the pattern itself, which means matching all variable-depth cycles and then discarding some (most) of them. That is likely to become expensive in a large graph, and you may be better off building a traversal or server plugin, which can make complex iterative decisions during the traversal. In 2.0, thanks to Wes' elegant collection slicing, you could try something like this:
MATCH path=a-[ab:PAID*..10]->a
WHERE ALL (ix IN range(0,length(ab)-2)
WHERE ((ab[ix]).year * 12 +(ab[ix]).month)<((ab[ix+1]).year * 12 +(ab[ix+1]).month))
RETURN path
The same could probably be achieved in 1.9 with HEAD() and TAIL(). Again, share sample data in a console and maybe someone else can pitch in.
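The traversal idea mentioned above (deciding during the walk rather than filtering complete paths afterwards) could look roughly like the following Python sketch. The in-memory graph layout, the function name chronological_cycles, and the restriction to cycles that do not revisit intermediate nodes are all assumptions for illustration; this is not the Neo4j traversal API.

```python
from typing import Dict, List, Tuple

# Assumed layout: node -> list of outgoing PAID edges as (target, year, month).
Edge = Tuple[str, int, int]
Graph = Dict[str, List[Edge]]


def chronological_cycles(graph: Graph, start: str, max_depth: int = 10) -> List[List[str]]:
    """Find cycles from start whose payments are strictly increasing in time."""
    cycles: List[List[str]] = []

    def walk(node: str, last_key: int, path: List[str]) -> None:
        if len(path) - 1 >= max_depth:
            return  # already used max_depth relationships
        for target, year, month in graph.get(node, []):
            key = year * 12 + month
            if key <= last_key:
                continue  # prune early: this payment is not later than the previous one
            if target == start:
                cycles.append(path + [target])
            elif target not in path:
                walk(target, key, path + [target])

    walk(start, -1, [start])
    return cycles


graph: Graph = {
    "A": [("B", 2013, 3)],
    "B": [("C", 2013, 4), ("A", 2013, 2)],
    "C": [("A", 2013, 6)],
}
print(chronological_cycles(graph, "A"))
```

Because out-of-order payments are discarded as soon as they are seen, branches such as B paying A in February are never expanded, which is the saving a traversal has over matching every cycle first.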
