How to keep distinctness until a specific condition is true in Esper EPL

I have two event types (A and B). I would like to write a pattern that detects every B event that comes after an A event with the same id: every A -> B (A.id = B.id). However, the ids should be distinct. In other words, the pattern should ignore all A events with the same id after the first one until the above expression is true, that is, until a B event with the same id arrives.
For example, assume this is the event stream:
1. A (id: 1); 2. A(id: 2); 3. A (id: 3); 4. A(id: 1); 5. A (id: 2); 6. B (id: 1); 7. B (id: 2); 8. A (id: 1); 9. B(id: 3); 10. A (id: 1); 11. B (id: 1)
The pattern should ignore event No4 because it has the same id as event No1. When event No6 comes, the pattern should match 1. A (id: 1) -> 6. B (id: 1). After that, the pattern should accept a new A event with id=1, so event No8 should not be ignored, but event No10 should be. When event No11 comes, the pattern should match again: 8. A (id: 1) -> 11. B (id: 1).
Besides, event No2 should match with event No7 and event No3 should match event No9.
I have tried to use EVERY-DISTINCT(A.id) A -> B (A.id=B.id), but it ignores all A events with the same id after the first one. Then I tried EVERY (A -> B (A.id = B.id)), but it didn't work either as it ignores all A events regardless of id until B event with the same id comes.

You can keep the every A -> B (A.id = B.id) expression inside the PATTERN clause and additionally add @SuppressOverlappingMatches right after the PATTERN keyword.
The complete statement will be something like this:
SELECT b.id FROM PATTERN @SuppressOverlappingMatches [every a=A -> b=B (a.id = b.id)]
Reference: http://www.espertech.com/esper/release-5.5.0/esper-reference/html/event_patterns.html#patterns-howto-suppress
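For checking results against the stream in the question, here is a minimal Python sketch of the behaviour the question asks for. It is not Esper code and says nothing about how Esper evaluates the statement above; the function name and the (sequence number, type, id) tuple layout are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, str, int]  # (sequence number, event type, id) -- assumed layout


def match_first_a_then_b(events: Iterable[Event]) -> List[Tuple[int, int]]:
    """Pair each B with the first unmatched A of the same id."""
    pending: Dict[int, int] = {}  # id -> sequence number of the first pending A
    matches: List[Tuple[int, int]] = []
    for seq, kind, ident in events:
        if kind == "A":
            # Later A events with the same id are ignored while one is pending.
            pending.setdefault(ident, seq)
        elif kind == "B" and ident in pending:
            # The B closes the pending A, so a new A with this id is allowed again.
            matches.append((pending.pop(ident), seq))
    return matches


# The event stream from the question.
stream = [
    (1, "A", 1), (2, "A", 2), (3, "A", 3), (4, "A", 1), (5, "A", 2),
    (6, "B", 1), (7, "B", 2), (8, "A", 1), (9, "B", 3), (10, "A", 1),
    (11, "B", 1),
]
print(match_first_a_then_b(stream))  # [(1, 6), (2, 7), (3, 9), (8, 11)]
```

The pairs printed are the ones listed in the question: 1 -> 6, 2 -> 7, 3 -> 9 and 8 -> 11, with events 4, 5 and 10 ignored.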

Related

Esper statement to check, if A is followed by B without any other As in between

I have two events: A and B. Every time A occurs, B has to occur afterwards without any As in between.
Does anybody have an idea how to implement this? I thought about something like
pattern[every A -> A until B]
But this statement is true even if A is followed by B without any other As in between. It should only be true in the case of AAB or AAAAB and so on.
Thank you for your help.
One possible solution is the pattern
A -> (A and not B)
With this, the query is only true when the rule is violated. But if the rule is fulfilled, I don't get any hint.
Is there a better solution?
Match-recognize pattern matching has immediately-followed-by semantics. You could do something like this:
create schema A();
create schema B();
select * from pattern[every a=A or every b=B]
match_recognize (
measures p1 as a, p2 as b
pattern (p1 p2)
define
p1 as typeof(p1.a) = 'A',
p2 as typeof(p2.b) = 'B'
)
Or you could use an approach with insert-into.
insert into CombinedStream select id, 'a' as type from A;
insert into CombinedStream select id, 'b' as type from B;
select * from CombinedStream
match_recognize (
measures a as a, b as b
pattern (a b)
define
a as a.type = 'a',
b as b.type = 'b'
)
And if you want to go with the EPL pattern language, that can also work. EPL patterns always add to and remove from filter indexes, and that can be less performant depending on how many incoming events are matched and unmatched/discarded (i.e. per-event analysis versus needle-in-a-haystack).
every A -> (B and not A) // read: every A followed by B and not A
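To make the reading of every A -> (B and not A) concrete, here is a small Python model of it for a stream that contains only A and B events. It is only a sketch of the intended semantics, not Esper code; the function name and the use of list indexes as event positions are assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def a_then_b_without_a_between(types: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (index of A, index of B) for every A followed by B with no A in between."""
    matches: List[Tuple[int, int]] = []
    waiting_a: Optional[int] = None  # index of the A currently waiting for its B
    for index, kind in enumerate(types):
        if kind == "A":
            # A newer A ends the wait of the older one ("not A") and starts its own.
            waiting_a = index
        elif kind == "B" and waiting_a is not None:
            matches.append((waiting_a, index))
            waiting_a = None  # that sub-expression is complete
    return matches


print(a_then_b_without_a_between(list("AABAB")))  # [(1, 2), (3, 4)]
```

In AABAB the first A never matches because another A arrives before the B, which is the violation case from the question.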

Get shortest circle path without repeating any nodes

I need a query that will get me the shortest circle path between nodes (so if there are multiple paths just returns the shortest one). In addition, these paths shouldn't contain repeated nodes. Examples:
In this case, if I pass "Item B" as input, I should receive the path "Item B -> Item C -> Item E -> Item B", since the other path "Item B -> Item C -> Item A -> Item C -> Item E -> Item B" is not only longer but also contains a repeated node (Item C).
Using the same picture, if I pass "Item A" as input, I should receive the path "Item A -> Item C -> Item A"
In addition, it would be nice if the response could include all the nodes involved, without repeating the starting and final node that is the same in all cases.
Thanks in advance!
Try something like:
MATCH (n:Node{id:"a"})
MATCH p=(n)-[*..20]->(n)
WITH p, length(p) as len
ORDER by len ASC LIMIT 1
UNWIND nodes(p) as node
RETURN distinct node
Not sure how well it scales, though. Note that I added a bound so that only paths of 20 or fewer hops are checked.
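For comparison outside the database, the same idea can be sketched with a breadth-first search in Python. The edge list below is only inferred from the paths mentioned in the question (the picture itself is not available), and the function name is hypothetical. Because BFS finds a shortest way back to the start, the cycle it returns cannot contain a repeated node.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_cycle(graph: Dict[str, List[str]], start: str) -> Optional[List[str]]:
    """Return the nodes of a shortest directed cycle through start, start listed once."""
    parent: Dict[str, str] = {}  # node -> predecessor on a shortest path from start
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == start:
                # The cycle is closed: walk back through the predecessors.
                path = [node]
                while path[-1] != start:
                    path.append(parent[path[-1]])
                return path[::-1]
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no cycle passes through start


# Edges assumed from the paths described in the question.
graph = {"A": ["C"], "B": ["C"], "C": ["E", "A"], "E": ["B"]}
print(shortest_cycle(graph, "B"))  # ['B', 'C', 'E']
print(shortest_cycle(graph, "A"))  # ['A', 'C']
```

The result lists each node once, so the start node is not repeated at the end of the path.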

Neo4j - Find missing node to complete circle

I am trying to write a query that, starting from a node, returns the missing node that would complete a circle if a new relation to it were made. It should also return the node that, once the circle is closed, will have a relationship with the input node. Example:
Let's say I have B -> C and C -> A. In this case, if I pass A as input, I would like to receive { newRelationToMake: B, relationToInputNode: C } as a result, since connecting A -> B will result in a closed circle ABC, and the relation pointing at node A will come from C.
Ideally, this query should work up to a maximum depth n. For example, for a depth of 4, with relations B -> C, C -> D and D -> A, if I pass A as input, I would need to receive { newRelationToMake: C, relationToInputNode: D } (since connecting A -> C closes the ACD circle) but also { newRelationToMake: B, relationToInputNode: D } (since connecting A -> B closes the ABCD circle).
Is there any query to get this information?
Thanks in advance!
You are basically asking for all distinct nodes on paths leading to A, but which are not directly connected to A.
Here is one approach (assuming the nodes all have a Foo label and the relationships all have the BAR type):
MATCH (f:Foo)-[:BAR*2..]->(a:Foo)
WHERE a.id = 'A' AND NOT EXISTS((f)-[:BAR]->(a))
RETURN DISTINCT f AS missingNodes
The variable-length relationship pattern [:BAR*2..] looks for all paths of length 2 or more.
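The query above returns only the missing nodes. As a rough illustration of how the pair asked for in the question could be derived, here is a Python sketch that walks backwards from the input node. The function name, the edge-tuple representation and the returned (newRelationToMake, relationToInputNode) tuples are assumptions for illustration, not part of the Cypher answer.

```python
from typing import Dict, Iterable, List, Set, Tuple


def missing_links(edges: Iterable[Tuple[str, str]], target: str) -> List[Tuple[str, str]]:
    """Return (newRelationToMake, relationToInputNode) pairs for the target node."""
    preds: Dict[str, Set[str]] = {}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)
    direct = preds.get(target, set())  # nodes already pointing at the target
    results: Set[Tuple[str, str]] = set()
    for last in direct:
        seen = {last, target}
        stack = [last]
        while stack:
            node = stack.pop()
            for prev in preds.get(node, ()):
                if prev not in seen:
                    seen.add(prev)
                    stack.append(prev)
                    # prev reaches the target in 2+ hops; skip it if it also has a direct edge.
                    if prev not in direct:
                        results.add((prev, last))
    return sorted(results)


print(missing_links([("B", "C"), ("C", "D"), ("D", "A")], "A"))  # [('B', 'D'), ('C', 'D')]
```

For the depth-4 example from the question this yields B and C as candidates, both closing a circle that enters A through D.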

Neo4j mixing nodes with and without extended relationships

I have a simple social stream where few nodes and relationships look like the following:
row 1: (a) -> (stream) -> (d)
row 2: (a) -> (stream) -> (d) -> (source)
Basically I want to pull both rows but put some restrictions on the type of source, for example I currently use this:
MATCH (a)-[]->(stream)-[]->(d)
OPTIONAL MATCH (d)-[]->(source)
WHERE source.x = 3
RETURN stream, d, source
This works well. Almost. When source.x = 3, I get both the rows. Which is perfect. But when source.x != 3, I want the query to ignore the second row, but because of optional match the row still appears.
When source.x = 3
row 1. stream, d, null
row 2. stream, d, source
When source.x != 3
row 1. stream, d, null
row 2. stream, d, null
When source.x != 3, I want the query to ignore the second row. Because it contains the (source) node but not the one we want. The output should look like:
row 1. stream, d, null
Basically something like, if (source) does not exist show row.. if it does exist, show row only if source.x = 3
EDIT: As recommended in the comment, I have attached a simplified example.
You can pipe the results through WITH and use a WHERE clause to filter them before returning:
MATCH (me:User {name:'me'})-[:LIKES]->(transport)
OPTIONAL MATCH (transport)-[:HAS_DRIVER]->(driver:Driver {name:'Anne'})
WITH me, transport, driver
WHERE (NOT (transport)-[:HAS_DRIVER]->()) OR (NOT driver IS NULL)
RETURN me, transport, driver

NEO4J Find continual route by time

I have 40 stations, identified by ID, and about 30k relations between these stations. Each relation has time properties (arrival and departure time) and the name of the line.
I need to find a route between stations A and B, but within a specific time range.
For example:
there is no direct route between stations A and C, so you must use
A -> B -> C = means id: 1 -> 2 -> 3
I am using this query:
MATCH p=(s1:L2Station{id:1})-[r:RIDE*]->(s2:L2Station{id:3}) WHERE ALL(x in r where x.deptime>=1438605300 AND x.deptime<=1438691700)
WITH reduce(acc = [], route in rels(p)|
CASE
WHEN toInt(route.arrtime) < last(extract(b in acc| b.deptime)) THEN null
WHEN length(acc) > 0 AND last(extract(a in acc| a.rid)) = route.rid THEN acc + route
ELSE acc + route
END) as reducedRoutes
WHERE reducedRoutes is not null
return reducedRoutes, length(reducedRoutes) as len
order by len;
but this query took about 8 minutes :(
If I use this query:
MATCH p=(s1:L2Station{id:1})-[r:RIDE]->(s2:L2Station{id:3}) WHERE r.deptime>1438732800 AND r.deptime<1438819200
....
it returns nothing. I am only able to get stations with a direct route.
Can anybody help me?
Thanks
Ondra
You are probably seeing the impact of matching with an unbounded path length: [r:RIDE*]. Logically, this forces neo4j to follow every possible path that starts at station 1 to see which ones end at station 3. Probably only a few of the attempted paths will ultimately match, but neo4j is forced to follow every path to the bitter end.
If possible, you should try putting an upper bound on the path length. For example, to match paths up to length 3: [r:RIDE*..3].
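To show why the upper bound helps, here is a minimal Python sketch of a depth-limited route search that also requires each ride to depart no earlier than the previous one arrived. It is not how neo4j evaluates the query; the function name, the (from, to, deptime, arrtime) tuple layout and the sample rides are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Ride = Tuple[int, int, int, int]  # (from station, to station, deptime, arrtime) -- assumed layout


def find_routes(rides: List[Ride], start: int, goal: int, max_hops: int,
                earliest: int, latest: int) -> List[List[Ride]]:
    """Return all routes from start to goal with at most max_hops rides, shortest first."""
    by_origin: Dict[int, List[Ride]] = {}
    for ride in rides:
        by_origin.setdefault(ride[0], []).append(ride)
    routes: List[List[Ride]] = []

    def extend(station: int, ready_at: int, route: List[Ride]) -> None:
        if station == goal and route:
            routes.append(list(route))
            return
        if len(route) == max_hops:
            return  # the bound stops the search from following ever longer paths
        for ride in by_origin.get(station, ()):
            _, dest, dep, arr = ride
            # Stay inside the time window and never depart before arriving.
            if earliest <= dep <= latest and dep >= ready_at:
                route.append(ride)
                extend(dest, arr, route)
                route.pop()

    extend(start, earliest, [])
    return sorted(routes, key=len)


# Hypothetical sample data: station ids and times are made up.
rides = [(1, 2, 100, 110), (2, 3, 120, 130), (2, 3, 105, 115), (1, 3, 500, 510)]
print(find_routes(rides, 1, 3, max_hops=3, earliest=0, latest=200))
# [[(1, 2, 100, 110), (2, 3, 120, 130)]]
```

With max_hops in place, the search stops after a fixed number of rides per path instead of exploring every chain of rides, which is the effect the bound [r:RIDE*..3] aims for.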

Resources