Esper EPL Query for Time(t) and Time(t-1) - esper

I am trying to implement an EPL query that can pick up the avg for Time(t) & Time(t-1).
For example:
a) In the first 5 seconds (seconds 0-5) there are 2 events with an avg of 12.
b) In the next 5 seconds (seconds 5-10) there are 3 events with an avg of 23, and in the EPL query that catches this information I am also able to see the avg of 12 from the previous 5-second window.
The idea I have is to stagger the objects/queries in such a way that the final EPL query has a snapshot of Time(t) and Time(t-1), as seen in the virtually created object ScoreInfoBeforeAfter. However, it's not working.
Any ideas would be greatly appreciated. Thanks.
// The object being published to the Esper stream:
class ScoreEvent { int score; ... }

It looks like the prior keyword is the solution.
http://esper.codehaus.org/esper-2.1.0/doc/reference/en/html/functionreference.html
See Section 7.1.9.
In terms of the example I described in the original post, here is the corresponding solution I found. It seems to work correctly.
INSERT INTO ScoreInfo
SELECT
'ScoreInfo' as a_Label,
average AS curAvg,
prior(1, average) AS prevAvg
FROM
ScoreEvent.win:time_batch(5 sec).stat:uni(score);
SELECT
*
FROM
ScoreInfo.win:length(1);
And then it's nice, because you can do stuff like this:
SELECT
'GT curAvg > prevAvg' as a_Label,
curAvg,
prevAvg
FROM
ScoreInfo.win:length(1)
WHERE
curAvg > prevAvg;
SELECT
'LTE curAvg <= prevAvg' as a_Label,
curAvg,
prevAvg
FROM
ScoreInfo.win:length(1)
WHERE
curAvg <= prevAvg;
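For readers without an Esper engine at hand, the time_batch plus prior(1) behaviour can be mimicked in plain Python. The sketch below is illustrative only (the function name and the event shape are made up, not Esper API): it batches scores into 5-second windows and pairs each window's average with the previous window's average, which is exactly the information ScoreInfo carries.

```python
# Mimic ScoreEvent.win:time_batch(5 sec).stat:uni(score) with prior(1, average).
# Hypothetical helper, not part of Esper.
def batched_averages(events, batch_seconds=5):
    """events: list of (timestamp_seconds, score) tuples, assumed sorted."""
    batches = {}
    for ts, score in events:
        # bucket each event into its 5-second batch
        batches.setdefault(int(ts // batch_seconds), []).append(score)
    rows, prev_avg = [], None
    for key in sorted(batches):
        cur_avg = sum(batches[key]) / len(batches[key])
        # curAvg is this batch's average; prevAvg plays the role of prior(1, average)
        rows.append({"curAvg": cur_avg, "prevAvg": prev_avg})
        prev_avg = cur_avg
    return rows
```

With the example from the question (two events averaging 12 in seconds 0-5, three events averaging 23 in seconds 5-10), the second row carries both curAvg 23 and prevAvg 12.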

Related

InfluxDB: How to create a continuous query to calculate delta values?

I'd like to calculate the delta values for a series of measurements stored in an InfluxDB. The values are readings from an electricity meter, taken every 5 minutes, and they increase over time. Here is a subset of the data to give you an idea (the commands shown below are executed in the InfluxDB CLI):
> SELECT "Haushaltstromzaehler - cnt" FROM "myhome_measurements" WHERE time >= '2018-02-02T10:00:00Z' AND time < '2018-02-02T11:00:00Z'
name: myhome_measurements
time Haushaltstromzaehler - cnt
---- --------------------------
2018-02-02T10:00:12.610811904Z 11725.638
2018-02-02T10:05:11.242021888Z 11725.673
2018-02-02T10:10:10.689827072Z 11725.707
2018-02-02T10:15:12.143326976Z 11725.736
2018-02-02T10:20:10.753357056Z 11725.768
2018-02-02T10:25:11.18448512Z 11725.803
2018-02-02T10:30:12.922032896Z 11725.837
2018-02-02T10:35:10.618788096Z 11725.867
2018-02-02T10:40:11.820355072Z 11725.9
2018-02-02T10:45:11.634203904Z 11725.928
2018-02-02T10:50:11.10436096Z 11725.95
2018-02-02T10:55:10.753853952Z 11725.973
Calculating the differences in the InfluxDB CLI is pretty straightforward with the difference() function. This gives me the electricity consumed within each 5-minute interval:
> SELECT difference("Haushaltstromzaehler - cnt") FROM "myhome_measurements" WHERE time >= '2018-02-02T10:00:00Z' AND time < '2018-02-02T11:00:00Z'
name: myhome_measurements
time difference
---- ----------
2018-02-02T10:05:11.242021888Z 0.03499999999985448
2018-02-02T10:10:10.689827072Z 0.033999999999650754
2018-02-02T10:15:12.143326976Z 0.02900000000045111
2018-02-02T10:20:10.753357056Z 0.0319999999992433
2018-02-02T10:25:11.18448512Z 0.03499999999985448
2018-02-02T10:30:12.922032896Z 0.033999999999650754
2018-02-02T10:35:10.618788096Z 0.030000000000654836
2018-02-02T10:40:11.820355072Z 0.03299999999944703
2018-02-02T10:45:11.634203904Z 0.028000000000247383
2018-02-02T10:50:11.10436096Z 0.02200000000084401
2018-02-02T10:55:10.753853952Z 0.02299999999922875
Where I struggle is getting this to work in a continuous query. Here is the command I used to set up the continuous query:
CREATE CONTINUOUS QUERY cq_Haushaltstromzaehler_cnt ON myhomedb
BEGIN
SELECT difference(sum("Haushaltstromzaehler - cnt")) AS "delta" INTO "Haushaltstromzaehler_delta" FROM "myhome_measurements" GROUP BY time(1h)
END
Looking in the InfluxDB log file I see that no data is written in the new 'delta' measurement from the continuous query execution:
...finished continuous query cq_Haushaltstromzaehler_cnt, 0 points(s) written...
After much troubleshooting and experimenting, I now understand why no data is generated. Setting up a continuous query requires the GROUP BY time() clause. This in turn requires using an aggregate function inside the difference() function. The problem is that the aggregate function returns only one value for the time period specified by GROUP BY time(), and obviously the difference() function cannot calculate a difference from just one value. Essentially, the continuous query executes a command like this:
> SELECT difference(sum("Haushaltstromzaehler - cnt")) FROM "myhome_measurements" WHERE time >= '2018-02-02T10:00:00Z' AND time < '2018-02-02T11:00:00Z' GROUP BY time(1h)
>
I'm now somewhat clueless as to how to make this work and appreciate any advice you might have.
Does it help to use the last() aggregate function? I have not tested this as a CQ yet.
SELECT difference(last(T1_Consumed)) AS T1_Delta, difference(last(T2_Consumed)) AS T2_Delta
FROM P1Data
WHERE time >= 1551648871000000000 GROUP BY time(1h)
DIFFERENCE() would calculate the delta from the "aggregated" value taken from the previous group, not within the current group.
So feel free to use a selector function there; since your counters seem to be cumulative, LAST() should work well.
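To see what difference(last(...)) computes, the logic can be sketched outside InfluxDB in plain Python. This is illustrative only (function name and sample data are invented): keep the last cumulative reading per time bucket, then subtract consecutive bucket values.

```python
# Mimic SELECT difference(last(value)) ... GROUP BY time(1h) on a cumulative counter.
# Hypothetical helper, not InfluxDB API.
def bucket_deltas(readings, bucket_seconds=3600):
    """readings: sorted list of (epoch_seconds, cumulative_value)."""
    last_per_bucket = {}
    for ts, value in readings:
        # later readings in the same bucket overwrite earlier ones -> last()
        last_per_bucket[ts // bucket_seconds] = value
    ordered = [last_per_bucket[k] for k in sorted(last_per_bucket)]
    # difference(): delta against the previous bucket's aggregated value
    return [round(b - a, 6) for a, b in zip(ordered, ordered[1:])]
```

Note that, as the answer says, each delta spans from the previous group's last reading to the current group's last reading, not within a single group.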

Find first/last event according to time tree in Neo4j

I created a time tree (Day-Month-Year) and assigned events to it. Now I am trying to find the first and the last event for a user who causes the events. This is my code to find the last event (assuming all events happen in the same month):
match (day:Day)<--(event:Event)-->(user:User{userID:"007"})
with MAX(day.Day) AS max
match (day) where day.Day=max
return day
But this query returns ALL days, not only the one with the highest .Day property.
After finding the node I will continue to process it, so solutions such as the following are not suitable:
RETURN ... ORDER BY ... DESC LIMIT 1
Thanks a lot!
Note: the time-tree model is designed as shown in the picture.
Source: graphaware.com
This works:
match (day:Day)<--(event:Event)-->(user:User{userID:"007"})
with MAX(day.Day) AS max, collect(day) as days
match (day) where day in days AND day.Day=max
return day
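The reason collect(day) is needed can be seen in a language-neutral sketch: an aggregation step keeps only the aggregate, so the candidate rows must be carried through explicitly and filtered afterwards. A hypothetical Python analogue (node shape invented for illustration):

```python
# Analogue of: WITH MAX(day.Day) AS max, collect(day) AS days
#              MATCH (day) WHERE day IN days AND day.Day = max
def latest_days(days):
    """Return every day node whose 'Day' property equals the maximum."""
    max_day = max(d["Day"] for d in days)   # the aggregate (MAX)
    collected = list(days)                  # the collected rows (collect)
    return [d for d in collected if d["Day"] == max_day]
```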

Query Execution Time Varies - IBM Informix - Data Studio

I am executing one SQL statement in Informix Data Studio 12.1. It takes around 50 to 60 ms to execute (for a one-day date range).
SELECT
sum( (esrt.service_price) * (esrt.confirmed_qty + esrt.pharmacy_confirm_quantity) ) AS net_amount
FROM
episode_service_rendered_tbl esrt,
patient_details_tbl pdt,
episode_details_tbl edt,
ms_mat_service_header_sp_tbl mmshst
WHERE
esrt.patient_id = pdt.patient_id
AND edt.patient_id = pdt.patient_id
AND esrt.episode_id = edt.episode_id
AND mmshst.material_service_sp_id = esrt.material_service_sp_id
AND mmshst.bill_heads_id = 1
AND esrt.delete_flag = 1
AND esrt.customer_sp_code != '0110000006'
AND pdt.patient_category_id IN(1001,1002,1003,1004,1005,1012,1013)
AND edt.episode_type ='ipd'
AND esrt.generated_date BETWEEN '2017-06-04' AND '2017-06-04';
When I try to execute the same query by wrapping it in a function, it takes around 35 to 40 seconds.
Please find the code below.
CREATE FUNCTION sb_pharmacy_account_summary_report_test1(START_DATE DATE,END_DATE DATE)
RETURNING VARCHAR(100),DECIMAL(10,2);
DEFINE v_sale_credit_amt DECIMAL(10,2);
BEGIN
SELECT
sum( (esrt.service_price) * (esrt.confirmed_qty +
esrt.pharmacy_confirm_quantity) ) AS net_amount
INTO
v_sale_credit_amt
FROM
episode_service_rendered_tbl esrt,
patient_details_tbl pdt,
episode_details_tbl edt,
ms_mat_service_header_sp_tbl mmshst
WHERE
esrt.patient_id = pdt.patient_id
AND edt.patient_id = pdt.patient_id
AND esrt.episode_id = edt.episode_id
AND mmshst.material_service_sp_id = esrt.material_service_sp_id
AND mmshst.bill_heads_id = 1
AND esrt.delete_flag = 1
AND esrt.customer_sp_code != '0110000006'
AND pdt.patient_category_id IN(1001,1002,1003,1004,1005,1012,1013)
AND edt.episode_type ='ipd'
AND esrt.generated_date BETWEEN START_DATE AND END_DATE;
RETURN 'SALE CREDIT','' with resume;
RETURN 'IP SB Credit Amount',v_sale_credit_amt;
END
END FUNCTION;
Can someone tell me what is the reason for this time variation?
In very simple words:
When you create a function, the SQL is parsed and stored in the database together with some optimization information. When you call the function, the optimizer already knows about the SQL and executes it. So the optimization is done only once, when you create the function.
When you run the plain SQL, the optimizer parses it, optimizes it and then executes it, every time you run the SQL.
This explains the time difference.
I would say the difference in time is due to the parameterized query.
The first SQL has hardcoded date values; the one in the SPL has parameters. That may cause a different query plan (e.g. which index to follow) to be applied to the query in the SPL than to the one executed from Data Studio.
You can try getting the query plan (using SET EXPLAIN) from the first SQL and then use directives in the SPL to force the engine to use that same path.
have a look at:
https://www.ibm.com/support/knowledgecenter/SSGU8G_12.1.0/com.ibm.perf.doc/ids_prf_554.htm
it explains how to use optimizer directives to speed up queries.

Selecting greatest date range count in a rails array

I have a database with a bunch of deviceapi entries that have a start_date and end_date (datetime in the schema). Typically these entries are no more than 20 seconds long (end_date - start_date). I have the following setup:
data = Deviceapi.all.where("start_date > ?", DateTime.now - 2.weeks)
I need to get the hour within data that had the highest number of Deviceapi entries. To make it a bit clearer, this was my latest try at it (code is approximated, don't mind typos):
runningtotal = 0
(2.weeks / 1.hour).to_i.times do |interval|
  current = data.select { |d| d.start_date > (start_date + (1.hour * (interval - 1))) }
                .select { |d| d.end_date < (start_date + (1.hour * interval)) }
                .count
  runningtotal = current if current > runningtotal
end
The problem: this code works just fine. So did about a dozen other incarnations of it, using .where, .select, SQL queries, etc. But it is too slow. Waaaaay too slow, because it has to loop through every hour within the 2 weeks, and this method itself might then need to be called dozens of times.
There has to be a faster way to do this, maybe a sort? I'm stumped, and I've been searching for hours with no luck. Any ideas?
To get adequate performance, you'll want to do everything in a single query, which will mean avoiding ActiveRecord functionality and doing a raw query (e.g. via ActiveRecord::Base.connection.execute).
I have no way to test it, since I have neither your data nor schema, but I think something along these lines will do what you are looking for:
select y.starting_hour, y.num_entries as max_entries
from
(
    select date_trunc('hour', start_date) as starting_hour,
           count(*) as num_entries
    from deviceapi
    group by 1
) as y
where y.num_entries =
(
    select max(num_entries)
    from
    (
        select count(*) as num_entries
        from deviceapi
        group by date_trunc('hour', start_date)
    ) as z
);
The logic of this is as follows, from the inner-most query out:
"Bucket" each starting time to the hour
From the resulting table of buckets, get the total number of entries in each bucket
Get the maximum number of entries from that table, and then use that number to match back to get the starting_hour itself.
If more than one bucket happens to have the same number of entries, you could determine a consistent way to pick one -- say min(starting_hour) or similar (since that would stay the same even as data gets added, assuming you are not deleting items).
If you wanted to limit the initial time slice -- I see 2 weeks referenced in your post -- you could do that in the inner-most query with a where clause bracketing the date range.
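The bucketing logic itself can be sketched in plain Python as well (illustrative only; the field name start_date follows the question's schema, and the tie-break rule is the suggested min(starting_hour)): truncate each timestamp to the hour, count per bucket, and take the busiest bucket.

```python
# Sketch of: bucket start_date to the hour, count per bucket, pick the max.
# Hypothetical helper for illustration.
from collections import Counter
from datetime import datetime

def busiest_hour(start_dates):
    """start_dates: list of datetime objects; returns (hour, count)."""
    # date_trunc('hour', ...) analogue
    buckets = Counter(d.replace(minute=0, second=0, microsecond=0)
                      for d in start_dates)
    # highest count wins; ties broken by the earliest hour, so the
    # result stays stable as data gets added
    return min(buckets.items(), key=lambda kv: (-kv[1], kv[0]))
```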

specific query with cypher

I need help with a specific query. I am using neo4j. My database consists of companies (nodes) and transactions between them (relationships). Each relationship (PAID) has the properties:
amount - the amount of the transaction
year - the year of the transaction
month - the month of the transaction
What I need is to find all cycles in the graph starting at node A, where each transaction occurred after the previous one.
So a valid example would be: A PAID B in March, B PAID C in April, C PAID A in June.
Is there any way to get all cycles from node A such that the transactions occur in chronological order?
You may want to set up a sample graph at the Neo4j console to share, or at least tell us more about which version of Neo4j you are using. But if you are on 2.0 and you store year and month as long or integer, then maybe you could try something like
MATCH a-[ab:PAID]->b-[bc:PAID]->c-[ca:PAID]->a
WHERE (ab.year + ab.month) > (bc.year + bc.month) > (ca.year + ca.month)
RETURN a,b,c
EDIT:
Actually that was hasty; the additions won't work that way, of course, but the structure should be ok. Maybe
WHERE ((ab.year > bc.year) or (ab.year = bc.year AND ab.month > bc.month))
AND ((bc.year > ca.year) OR (bc.year = ca.year AND bc.month > ca.month))
or
WHERE (ab.year * 12 + ab.month) > (bc.year * 12 + bc.month) > (ca.year * 12 + ca.month)
If you only use the dates for this type of comparison, consider storing them as one property, perhaps as milliseconds since the epoch (1 Jan 1970 GMT). That makes comparisons very easy. But if you need to return and display the dates frequently, then keeping them separate might make sense.
EDIT2:
I can't think of a way to build your condition of "r1.date < r2.date" into the pattern, which means matching all variable-depth cycles and then discarding some (most) of them. That is likely to become expensive in a large graph, and you may be better off building a traversal or server plugin, which can make complex iterative decisions during the traversal. In 2.0, thanks to Wes' elegant collection slicing, you could try something like this
MATCH path=a-[ab:PAID*..10]->a
WHERE ALL (ix IN range(0,length(ab)-2)
WHERE ((ab[ix]).year * 12 +(ab[ix]).month)<((ab[ix+1]).year * 12 +(ab[ix+1]).month))
RETURN path
The same could probably be achieved in 1.9 with HEAD() and TAIL(). Again, share sample data in a console and maybe someone else can pitch in.
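The year * 12 + month encoding used in the last query can be checked in isolation. This hypothetical Python sketch (names invented) mirrors the ALL(...) predicate: encode each payment's (year, month) as a single month index and require a strictly increasing sequence along the path.

```python
# Analogue of: ALL(ix IN range(...) WHERE encoded(ab[ix]) < encoded(ab[ix+1]))
def month_index(year, month):
    # collapses (year, month) into one comparable number, as in the query
    return year * 12 + month

def strictly_increasing(payments):
    """payments: list of (year, month) along the cycle, e.g. A->B->C->A."""
    idx = [month_index(y, m) for y, m in payments]
    return all(a < b for a, b in zip(idx, idx[1:]))
```

The encoding handles year boundaries correctly (December 2013 sorts before January 2014), which the naive year + month addition in the first attempt did not.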
