Filtering by aggregate function - Esper

I am trying to raise an event when the average value of a field is over a threshold for a minute. I have the object defined as:
class Heartbeat
{
    public string Name;
    public int Heartbeat;
}
My condition is defined as:
select avg(Heartbeat), Name
from Heartbeat.std:groupwin(Name).win:time(60 sec)
having avg(Heartbeat) > 100
However, the event never gets fired despite the fact that I fire a number of events with the Heartbeat value over 100. Any suggestions on what I have done wrong?
Thanks in advance

It confuses many people, but since time is the same for all groups you can simplify the query and remove the groupwin. The documentation note in this section explains why: http://esper.codehaus.org/esper-4.11.0/doc/reference/en-US/html_single/index.html#view-std-groupwin
The semantics with or without groupwin are the same.
I think you want group-by (and not groupwin) since group-by controls the aggregation level and groupwin controls the data window level.
New query:
select avg(Heartbeat), Name from Heartbeat.win:time(60 sec) group by Name having avg(Heartbeat) > 100
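If no listener is attached to the statement, nothing is ever delivered, which can also look like "the event never fires". A minimal NEsper-style wiring sketch (hedged: API names assume the NEsper 5.x API, and the avgHb alias is illustrative):
using System;
using com.espertech.esper.client;

var config = new Configuration();
config.AddEventType<Heartbeat>();  // register the Heartbeat event type from the question
var epService = EPServiceProviderManager.GetDefaultProvider(config);
var stmt = epService.EPAdministrator.CreateEPL(
    "select avg(Heartbeat) as avgHb, Name " +
    "from Heartbeat.win:time(60 sec) group by Name " +
    "having avg(Heartbeat) > 100");
// The statement only delivers rows to attached listeners.
stmt.Events += (sender, args) =>
{
    if (args.NewEvents == null) return;  // only insert-stream rows are of interest here
    foreach (var row in args.NewEvents)
    {
        Console.WriteLine("{0}: avg={1}", row.Get("Name"), row.Get("avgHb"));
    }
};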

Related

Neo4j Cypher NULL values and descending sorting

I have auditing fields on all of my entities:
createDate
updateDate
I always initialize createDate during the entity creation but updateDate can contain NULL until the first update.
I have to implement sorting feature over these fields.
With createDate everything works fine but with updateDate I have issues.
With a mixed set of NULLs and dates in updateDate, a descending sort puts the NULLs first, and this is not what I'm expecting here.
I understand that, according to the Neo4j documentation, this is expected behavior: "When sorting the result set, null will always come at the end of the result set for ascending sorting, and first when doing descending sort." But I don't know how to implement proper sorting from the user's perspective, where the user sees the most recently updated documents at the top of the list. Some time ago I even created a GitHub issue for this feature: https://github.com/opencypher/openCypher/issues/238
One workaround I can see here is to also populate updateDate together with createDate during entity creation, but I really hate that solution.
Are there any other ways to implement this properly?
You can try using the coalesce() function. It will return the first non-null value in the list of expressions passed to it.
MATCH (n:Node)
RETURN n
ORDER BY coalesce(n.updateDate, 0) DESC
EDIT:
From comments:
on the database level it is something like this: "updateDate": "2017-09-07T22:27:11.012Z". On the SDN4 level it is a Java java.util.Date type
In this case you can replace the 0 with a start-of-time constant (like "1970-01-01T00:00:00.000Z").
MATCH (n:Node)
RETURN n
ORDER BY coalesce(n.updateDate, "1970-01-01T00:00:00.000Z") DESC
I'd just use the createDate as the updateDate when updateDate IS NULL:
MATCH (n:Node)
RETURN n
ORDER BY coalesce(n.updateDate, n.createDate) DESC
You may want to consider storing your ISO 8601 timestamp strings as (millisecond) integers instead. That could make most queries that involve datetime manipulations more efficient (or even possible), and would also use up less DB space compared to the equivalent string.
One way to do that conversion is to use the APOC function apoc.date.parse. For example, this converts 2017-09-07T22:27:11.012Z to an integer (in millisecond units):
apoc.date.parse('2017-09-07T22:27:11.012Z', 'ms', "yyyy-MM-dd'T'HH:mm:ss.SSSX")
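For example, a one-off migration along these lines (a sketch: it assumes APOC is installed, and the :Node label and updateDate property are placeholders for your own model):
MATCH (n:Node)
WHERE n.updateDate IS NOT NULL
SET n.updateDate = apoc.date.parse(n.updateDate, 'ms', "yyyy-MM-dd'T'HH:mm:ss.SSSX")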
With this change to your data model, you could also initialize updateDate to 0 at node creation time. This would allow you to avoid having to use COALESCE(n.updateDate, 0) for sorting purposes (as suggested by @Bruno Peres), and the 0 value would serve as an indication that the node was never updated.
(But the drawback would be that all nodes would have an updateDate property, even the ones that were never updated.)
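A sketch of that creation-time initialization (timestamp() returns the current epoch time in milliseconds; :Node is a placeholder label):
CREATE (n:Node {createDate: timestamp(), updateDate: 0})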

Esper very simple context and aggregation

I have a fairly simple problem to model and I don't have experience with Esper, so I may be headed the wrong way; I'd like some insight.
Here's the scenario: I have one stream of events, "ParkingEvent", with two types of events, "SpotTaken" and "SpotFree". So I have an Esper context both partitioned by id and bounded by a start event of type "SpotTaken" and an end event of type "SpotFree". The idea is to monitor a parking spot with a sensor and then aggregate the data to count the number of times the spot has been taken and also the occupation time.
That's it, no time window or anything, so it seems quite simple, but I'm struggling to aggregate the data. Here's the code I have so far:
create context ParkingSpotOccupation
    context PartitionBySource
        partition by source from SmartParkingEvent,
    context ContextBorders
        initiated by SmartParkingEvent(type = "SpotTaken") as startEvent
        terminated by SmartParkingEvent(type = "SpotFree") as endEvent;
#Name("measurement_occupation")
context ParkingSpotOccupation
insert into CreateMeasurement
select
e.source as source,
"ParkingSpotOccupation" as type,
{
"startDate", min(e.time),
"endDate", max(e.time),
"duration", dateDifferenceInSec(max(e.time), min(e.time))
} as fragments
from
SmartParkingEvent e
output
snapshot when terminated;
I get the same data for min and max, so I'm guessing I'm doing something wrong.
When I use context.ContextBorders.startEvent.time and context.ContextBorders.endEvent.time instead of min and max, the measurement_occupation statement is not triggered.
Given that measurements have already been computed by the EPL that you provided, this counts the number of times the spot has been taken (and freed) and totals up the duration:
select source, count(*), sum(duration) from CreateMeasurement group by source

Esper EPL statement each time a value has increased a multiple

I am looking for an EPL statement which fires an event each time a certain value has increased by a specified amount, with any number of events in between, for example:
Considering a stream, which continuously provides new prices.
I want to get a notification, e.g. if the price is greater than the first price + 100. Something like
select * from pattern[a=StockTick -> every b=StockTick(b.price>=a.price+100)];
But how can I get the next event(s) once the increase reaches >= 200, >= 300, and so forth?
Various tests with contexts and windows have not been successful so far, so I'd appreciate any help! Thanks!
Contexts would be the right way to go.
You could start by defining a start event like this:
create schema StartEvent(threshold int);
And then have a context that uses the start event:
create context ThresholdContext initiated by StartEvent as se
terminated after 5 years;
context ThresholdContext select * from pattern[a=StockTick -> every b=StockTick(b.price>=context.se.threshold)];
You can generate the StartEvent using "insert into" from the same pattern (you probably want to remove the "every"), have the listener send in a StartEvent, or declare another pattern that fires just once to create a StartEvent.
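For instance, a sketch of the insert-into variant, with the "every" removed so it fires only once (the +100 offset is taken from the question; adjust as needed):
// seed the threshold from the first StockTick seen
insert into StartEvent
select a.price + 100 as threshold
from pattern [a=StockTick];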

Esper Grouping Causes Duplicates

I have a basic Esper query as follows:
#Name("MyTestQuery")
#Description("My First Test Query")
select sum(qty), venue
from MyTestWindow
group by venue
The query seems to duplicate the results of my sum, i.e. if I send in events with a qty of 10, my query fires multiple times and outputs:
10, 20, 30, 40
However, if I remove the group by clause then it just outputs 10.
Is anyone able to advise why this might happen?
Typically you need to qualify the stream name (MyTestWindow) with a window, so it is "from MyTestWindow.win:time(1 sec)". You need to select an appropriate window type from the many that Esper offers, depending on your application.
This example:
select sum(qty), venue
from MyTestWindow.win:time_batch(1 sec)
group by venue
having sum(qty) is not null
You can run a simple test of this at http://esper-epl-tryout.appspot.com/epltryout/mainform.html
The best way of doing a group by is to trigger an artificial "event" after sending in all events. This way you can fully control what you want to output and not let Esper's engine run in real time.
You might have to use the "distinct" keyword in the select clause to avoid duplicates. Esper can sometimes create duplicate events when you aren't using trigger variables, so distinct will let you get rid of unwanted events.
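For example, a sketch of that suggestion (the window choice here is illustrative only):
select distinct sum(qty), venue
from MyTestWindow.win:time(1 sec)
group by venue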
You can use win:time_batch to get one update per specified time interval, and the coalesce function to handle null values:
select venue, sum(coalesce(qty, 0))
from MyTestWindow.win:time_batch(1 sec)
group by venue

How can I remove elements from a stream

I currently have an order object. We can assume it has three fields called orderId, state and filled.
class Order
{
    public int orderId;
    public String state;
    public int filled;
}
Throughout the lifetime of the order, the state and filled quantity will change. Each time there is a field change, we push it to the Esper runtime via:
Order o .....;
epService.EPRuntime.SendEvent(o);
Now, each time the order is added via SendEvent, it's a different object than the previous order object (i.e. not the same reference). This means the old order object should no longer be in the stream for statements to see.
I would like statements like the one below to operate only on the most recent version of the Order in the stream; conceptually there should be only one order object per physical order in the stream.
"select filled from OrderStream.win:keepall() where orderId = 1234"
Is there a way to remove old Order objects?
Can I use a reference so I just update the old order object and then push it again?
Is there another way??
I'm currently using Nesper
You could create a named window that holds unique events (older duplicates are evicted), something like:
"create window OrderWin.std:unique(orderId) as Order"
"insert into OrderWin select * from Order"
"select * from OrderWin where ..."
Another answer is to change the window. Keeping all events probably isn't what you wanted. Try using a different window, for example...
Let's say you only wanted the last trade per symbol. You could do the following:
select * from tradeEvent.std:unique(symbol)
This keeps only the last event for each event matching a given symbol.
