Esper Grouping Causes Duplicates - esper

I have a basic Esper query as follows:
@Name("MyTestQuery")
@Description("My First Test Query")
select sum(qty), venue
from MyTestWindow
group by venue
The query seems to duplicate the results of my sum i.e. if I send in a qty of 10 my query will fire multiple times and output:
10, 20, 30, 40
However, if I remove the group by function then it just outputs 10.
Is anyone able to advise why this might happen?

Typically you need to qualify the stream name (MyTestWindow) with a data window, for example
"from MyTestWindow.win:time(1 sec)". Esper offers many window types; choose the one that suits your application.
For example:
select sum(qty), venue
from MyTestWindow.win:time_batch(1 sec)
group by venue
having sum(qty) is not null
You can run a simple test of this at http://esper-epl-tryout.appspot.com/epltryout/mainform.html
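
One plausible reading of the 10, 20, 30, 40 output is that, with no data window, the per-venue sum is a running total that never expires old events. The following is only a plain-Python simulation of that behaviour, not Esper's API; the venue name "NYSE" and the helper `running_group_sums` are made up for illustration.

```python
from collections import defaultdict

def running_group_sums(events):
    """Yield (sum, venue) after every event, with no window bounding the state."""
    totals = defaultdict(int)
    for venue, qty in events:
        totals[venue] += qty  # state is never expired, so totals only grow
        yield totals[venue], venue

# Hypothetical input: the same venue reports qty=10 four times.
events = [("NYSE", 10)] * 4
outputs = [total for total, _ in running_group_sums(events)]
print(outputs)  # [10, 20, 30, 40]
```

A data window bounds how many events contribute to each sum, which is why the answer suggests adding one.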

The best way of doing a group by is to trigger an artificial "event" after sending in all events. This way you fully control what you want to output, rather than letting Esper's engine run in real time.

You might have to use the "distinct" feature in select to avoid duplicates. Esper can sometimes create duplicate events when you aren't using trigger variables, so distinct will allow you to get rid of unwanted events.

You can use win:time_batch to collect a specified time interval into one update, and the coalesce function to handle null values:
select venue, sum(coalesce(qty, 0))
from MyTestWindow.win:time_batch(1 sec)
group by venue
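
As a rough illustration of what batching plus coalesce does, here is a plain-Python sketch that groups events into fixed one-second batches and treats a missing qty as 0. It is a simulation under assumptions, not Esper code: the tuple layout `(timestamp, venue, qty)` and the helper name are hypothetical, and batches here are aligned to timestamp 0 for simplicity.

```python
from collections import defaultdict

def batch_group_sums(events, batch_seconds=1.0):
    """Sum qty per venue within fixed-length batches of (timestamp, venue, qty) events.

    A qty of None is counted as 0, mirroring coalesce(qty, 0).
    """
    batches = defaultdict(lambda: defaultdict(int))
    for ts, venue, qty in events:
        batch_index = int(ts // batch_seconds)
        batches[batch_index][venue] += qty if qty is not None else 0
    return {index: dict(sums) for index, sums in sorted(batches.items())}

# Hypothetical events: three arrive in the first second, one in the next.
events = [(0.1, "A", 10), (0.4, "A", None), (0.9, "B", 5), (1.2, "A", 10)]
result = batch_group_sums(events)
print(result)  # {0: {'A': 10, 'B': 5}, 1: {'A': 10}}
```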

Related

How to count the number of two events over a period of time?

I have two events: AppStartEvent and AppCrashEvent.
I need to count each of them over a period of time and then calculate count(AppStartEvent)/count(AppCrashEvent).
My EPL is:
create context ctx4NestInCR
context ctx4Time initiated @now and pattern [every timer:interval(1 minute)] terminated after 15 minutes,
context ctx4AppName partition by appName from AppStartEvent, appName from AppCrashEvent
<------------------->
context ctx4NestInCR select count(s),count(c) from AppStartEvent as s, AppCrashEvent as c output last when terminated
And it does not work
Error starting statement: Joins require that at least one view is specified for each stream, no view was specified for s
Your post doesn't have the join? It only has the context and that wouldn't produce the message. I would suggest to correct the post.
You can also join streams by merging the two streams and treating them as one.
insert into AppEvent select 'crash' as condition from AppCrashEvent;
insert into AppEvent select 'start' as condition from AppStartEvent;
select count(condition='crash')/count(condition='start') from AppEvent;
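
The merged-stream idea can be sketched in plain Python: tag each event with its condition, count per tag, then divide. This is an illustration only, not Esper code; the helper name is hypothetical, and it computes starts per crash, as the question asks, returning None when there are no crashes.

```python
from collections import Counter

def start_to_crash_ratio(conditions):
    """Return count('start') / count('crash'), or None if no crash was seen."""
    counts = Counter(conditions)
    if counts["crash"] == 0:
        return None  # avoid dividing by zero
    return counts["start"] / counts["crash"]

# Hypothetical merged stream of condition tags.
merged = ["start", "start", "crash", "start", "start"]
ratio = start_to_crash_ratio(merged)
print(ratio)  # 4.0
```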

Query with difference returns no data

I have a query that uses the difference function, and I can't understand why it returns no data.
The query is:
SELECT
difference(FIRST(grid_power_counter)) as grid_power_consumed
FROM homesolar.origin.main GROUP BY time(15m)
If I remove the difference function it returns data:
SELECT
FIRST(grid_power_counter) as grid_power_consumed
FROM homesolar.origin.main GROUP BY time(15m)
Also, I can get results if I add a where time > now()-24h to the select with difference function.
I really can't understand that behavior. Can someone help me?
Q: My query would only work if I add the where filter to it. Why is that so?
Quoted from influxdb's Groupby time doc:
Basic GROUP BY time() queries require an InfluxQL function in the
SELECT clause and a time range in the WHERE clause.
I suspect your first DIFFERENCE query didn't work because it was missing the mandatory WHERE filter for the Groupby time(...) function.
Without a time range, the GROUP BY time() clause could be returning no rows, and hence DIFFERENCE would have nothing to operate on.
This could potentially be a github issue for the influx team as I think their query parser should be complaining to you about the missing where filter for Group by time.
References:
https://docs.influxdata.com/influxdb/v1.5/query_language/data_exploration/#the-group-by-clause
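
To make the intended result concrete, here is a plain-Python sketch of what the query is meant to compute: take the first counter value in each 15-minute bucket, then subtract consecutive bucket values. It only simulates the semantics and is not an InfluxDB client call; timestamps are assumed to be integer seconds and the sample points are invented.

```python
BUCKET_SECONDS = 15 * 60

def first_per_bucket(points, bucket_seconds=BUCKET_SECONDS):
    """Return {bucket_start: first value} for (timestamp, value) points."""
    firsts = {}
    for ts, value in sorted(points):
        start = ts - ts % bucket_seconds  # floor to the bucket boundary
        firsts.setdefault(start, value)   # keep only the earliest value
    return firsts

def difference(values):
    """Subtract each value from the one that follows it."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

# Hypothetical counter readings: (seconds, grid_power_counter).
points = [(0, 100), (300, 104), (900, 110), (1500, 118), (1800, 125)]
firsts = first_per_bucket(points)
consumed = difference(list(firsts.values()))
print(firsts)    # {0: 100, 900: 110, 1800: 125}
print(consumed)  # [10, 15]
```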

InfluxDB mixing aggregation function with non-aggregate fields/values

I have the following issue:
I need to calculate the difference between consecutive points that share the same arbitrary ID. The following:
SELECT difference(value_field) FROM mesurementName WHERE "IdField" = '10'
Works, returns difference between each consecutive point with IdField BUT IdField is lost (only time is propagated to query result). In my case time is not unique (i.e. measurement may contain many points with same timestamp, but different IdField). So I tried:
SELECT difference(value_field), IdField FROM mesurementName WHERE "IdField" = '10'
which yields:
error parsing query: mixing aggregate and non-aggregate queries is not supported!!
My next attempt was using sub-query:
SELECT IdField, diff
FROM (
SELECT
difference(flow_val) as diff
FROM
mesurementA
WHERE "IdField" = '10'
)
Which resulted in always null value in IdField.
I'd like to ask for help or suggestions on how to solve this issue. By the way, we are using InfluxDB 1.3, which no longer supports JOIN.
If anyone gets stuck as I was, the solution is the following:
SELECT difference(value_field) FROM mesurementName GROUP BY "IdField"
The above somehow implicitly adds "IdField" to the result series, and it is propagated to the resulting measurement when using an INTO clause.
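
The effect of grouping by the tag can be sketched in plain Python: split the points into one series per IdField and take consecutive differences inside each series, so every result stays attached to its ID even when timestamps collide. This is a simulation, not InfluxDB code; the tuple layout and sample values are assumptions.

```python
from collections import defaultdict

def difference_by_tag(points):
    """points: (time, id_field, value) tuples. Returns {id_field: [(time, diff), ...]}."""
    series = defaultdict(list)
    for ts, tag, value in sorted(points):
        series[tag].append((ts, value))
    return {
        tag: [(t2, v2 - v1) for (_, v1), (t2, v2) in zip(rows, rows[1:])]
        for tag, rows in series.items()
    }

# Hypothetical points: two IDs share the same timestamps.
points = [(1, "10", 5.0), (1, "11", 7.0), (2, "10", 8.0), (2, "11", 7.5)]
diffs = difference_by_tag(points)
print(diffs)  # {'10': [(2, 3.0)], '11': [(2, 0.5)]}
```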

Esper very simple context and aggregation

I have a fairly simple problem to model, but I have no experience with Esper, so I may be heading the wrong way and would like some insight.
Here's the scenario: I have one stream of events "ParkingEvent", with two types of events "SpotTaken" and "SpotFree". So I have an Esper context both partitioned by id and bordered by a starting event of type "SpotTaken" and an end event of type "SpotFree". The idea is to monitor a parking spot with a sensor and then aggregate data to count the number of times the spot has been taken and also the time occupation.
That's it, no time window or anything, so it seems quite simple, but I struggle with aggregating the data. Here's the code I have so far:
create context ParkingSpotOccupation
context PartionBySource
partition by source from SmartParkingEvent,
context ContextBorders
initiated by SmartParkingEvent(
type = "SpotTaken") as startEvent
terminated by SmartParkingEvent(
type = "SpotFree") as endEvent;
@Name("measurement_occupation")
context ParkingSpotOccupation
insert into CreateMeasurement
select
e.source as source,
"ParkingSpotOccupation" as type,
{
"startDate", min(e.time),
"endDate", max(e.time),
"duration", dateDifferenceInSec(max(e.time), min(e.time))
} as fragments
from
SmartParkingEvent e
output
snapshot when terminated;
I get the same value for min and max, so I'm guessing I'm doing something wrong.
When I'm using context.ContextBorders.startEvent.time and context.ContextBorders.endEvent.time instead of min and max, the measurement_occupation statement is not triggered.
Given that measurements have already been computed by the EPL that you provided, this counts the number of times the spot has been taken (and freed) and totals up the duration:
select source, count(*), sum(duration) from CreateMeasurement group by source
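
The aggregation in that statement amounts to counting measurements and totalling durations per source. A plain-Python sketch of the same bookkeeping follows; it is not Esper code, and it assumes each measurement can be reduced to a hypothetical `(source, duration_in_seconds)` pair.

```python
from collections import defaultdict

def occupation_totals(measurements):
    """Return {source: {"count": n, "duration": total_seconds}}."""
    totals = defaultdict(lambda: {"count": 0, "duration": 0})
    for source, duration in measurements:
        totals[source]["count"] += 1            # one more occupation of the spot
        totals[source]["duration"] += duration  # accumulate occupied time
    return dict(totals)

# Hypothetical measurements for two parking spots.
measurements = [("spot-1", 120), ("spot-2", 45), ("spot-1", 300)]
totals = occupation_totals(measurements)
print(totals)
# {'spot-1': {'count': 2, 'duration': 420}, 'spot-2': {'count': 1, 'duration': 45}}
```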

SQLite slow query on iPad

I have a table with almost 300K records in it. I run a simple select statement with a where clause on an indexed column ('type' is indexed):
SELECT *
FROM Asset_Spec
WHERE type = 'County'
That query is fast - about 1 second. Additionally I want to test against status:
SELECT *
FROM Asset_Spec
WHERE type = 'County'
AND status = 'Active'
The second one is VERY slow (minutes). Status is NOT indexed and in this particular case 99.9% of values in the db ARE 'Active'.
Any ideas how I can get better performance? We are compiling our own version of SQLite so I can tweak many settings (FYI - same performance on iOS pre-canned SQLite)
I looked at the query plan and the estimate for number of rows was off by an astounding amount. Asset_Spec (~2 rows) - actual number of rows is almost 300,000. Ran 'ANALYZE' - now the same query runs in 16ms.
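
For anyone wanting to reproduce the ANALYZE step, here is a minimal sketch using Python's built-in sqlite3 module against an in-memory database. The column list, index name and row counts are assumptions made up for the demo, and the plan text printed will vary with the SQLite version, so no particular plan output is claimed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Asset_Spec (id INTEGER PRIMARY KEY, type TEXT, status TEXT)")
conn.execute("CREATE INDEX idx_asset_spec_type ON Asset_Spec(type)")  # hypothetical index name
conn.executemany(
    "INSERT INTO Asset_Spec (type, status) VALUES (?, ?)",
    [("County" if i % 10 == 0 else "City", "Active") for i in range(1000)],
)

query = "SELECT * FROM Asset_Spec WHERE type = 'County' AND status = 'Active'"

def show_plan(label):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    print(label, [row[-1] for row in rows])

show_plan("before ANALYZE:")
conn.execute("ANALYZE")  # gathers statistics into sqlite_stat1 for the planner
show_plan("after ANALYZE:")

stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
matching_rows = len(conn.execute(query).fetchall())
print(stats, matching_rows)
```

The statistics are stored in the database file, so ANALYZE can be run once before shipping the database rather than on the device.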
The first thing I would try is using a subquery:
SELECT * FROM
(SELECT *
FROM Asset_Spec
WHERE type = 'County')
WHERE status = 'Active'
As Robert suggests, adding an index on any column you want to filter by is a good idea. I'd also consider changing the Type and Status fields to something other than strings.
Any reason you need to select *?
Suggestions:
Do you need to retrieve multiple records? If all you need is the first record found, then add "limit 1" to the end of the query.
If you're just checking for the existence of a row, i.e. you only need to know that there is one row with status active, then "select 1" instead of "select *".
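
A short sketch of the existence check, using Python's built-in sqlite3 module. The table contents and the helper name `has_row` are invented for illustration; the point is only that "SELECT 1 ... LIMIT 1" lets SQLite stop at the first matching row instead of returning every column of every match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Asset_Spec (type TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO Asset_Spec VALUES (?, ?)",
    [("County", "Active"), ("County", "Retired"), ("City", "Active")],
)

def has_row(conn, asset_type, status):
    """Return True if at least one row matches, without fetching its columns."""
    row = conn.execute(
        "SELECT 1 FROM Asset_Spec WHERE type = ? AND status = ? LIMIT 1",
        (asset_type, status),
    ).fetchone()
    return row is not None

print(has_row(conn, "County", "Active"))  # True
print(has_row(conn, "City", "Retired"))   # False
```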
