How to Batching ESPER EPL events - esper

I'm trying to batch Notification events like this and I'm getting one Notifications event with a single notification event. Can anyone help me?
Thnks in advance.
Relevant Statements
INSERT INTO Notification SELECT d.id as id,a.stationId as stationId,d.firebaseToken as firebaseToken, d.position as devicePos,a.location as stationPos,a.levelNumber as levelNumber,a.levelName as levelName FROM AirQualityAlert.win:time(3sec) as a, device.win:time(3sec) as d WHERE d.position.distance(a.location) < 300
INSERT INTO Notifications SELECT * FROM Notification.std:groupwin(id).win:time_batch(20sec) for grouped_delivery(id)

This solution delivers a row per 'id' that contain a column with the list of events.
create context Batch20Sec start #now end after 20 sec;
context Batch20Sec select id, window(*) as data
from Notifications#keepall
group by id
output all when terminated;
I think that is what you want.

Related

Does anybody have a KSQL query that counts event in a topic on a per hour basis?

I am new to KSQL and I am trying to get the counts of the events in a topic group per hour. If not I would settle for counting the events in the topic. Then I could change the query to work in a windowing basis. The timestamp is the
To give more context let's assume my topic is called messenger and the events are in JSON format. And here is a sample message:
{"name":"Won","message":"This message is from Won","ets":1642703358124}
Partition:0 Offset:69 Timestamp:1642703359427
First create a stream over the topic:
CREATE STREAM my_stream (NAME VARCHAR, MESSAGE VARCHAR)
WITH (KAFKA_TOPIC='my_topic', FORMAT='JSON');
Then use a TUMBLING window aggregation and a dummy GROUP BY field:
SELECT TIMESTAMPTOSTRING(WINDOWSTART,'yyyy-MM-dd HH:mm:ss','Europe/London')
AS WINDOW_START_TS,
COUNT(*) AS RECORD_CT
FROM my_stream
WINDOW TUMBLING (SIZE 1 HOURS)
GROUP BY 1
EMIT CHANGES;
If you want to override the timestamp being picked up from the message timestamp and use a custom timestamp field (I can see ets in your sample) you would do that in the stream definition:
CREATE STREAM my_stream (NAME VARCHAR, MESSAGE VARCHAR, ETS BIGINT)
WITH (KAFKA_TOPIC='my_topic', FORMAT='JSON', TIMESTAMP='ets');
Ref: https://rmoff.net/2020/09/08/counting-the-number-of-messages-in-a-kafka-topic/

Esper - concatenate values from multiple rows to a list

I have an Esper query that returns multiple rows, but I'd like to instead get one row, where that row has a list (or concatenated string) of all of the values from the (corresponding columns of the) matching rows that my current query returns.
For example:
SELECT Name, avg(latency) as avgLatency
FROM MyStream.win:time(5 min)
GROUP BY Name
HAVING avgLatency / 1000 > 60
OUTPUT last every 5 min
Returns:
Name avgLatency
---- ----------
A 65
B 70
C 75
What I'd really like:
Name
----
{A, B, C}
Is this possible to do via the query itself? I tried to make this work using subqueries, but I'm not working with multiple streams. I can't find any aggregation functions or enumeration functions in the Esper documentation that fits what I'm trying to do either.
Thanks to anybody that has any insight or direction for me here.
EDIT:
If this can't be done via the query, I'm open to changing the subscriber, or anything else, if necessary.
You can have a subscriber or listener do the concat. There is a "Multi-Row Delivery" for subscribers. Or use a table like below.
// create table to hold aggregation result
create table LatencyTable(name string primary key, avgLatency avg(double));
// update aggregations in table from events coming in
into LatencyTable select name, avg(latency) as avgLatency from MyStream#time(5 min) group by name;
// do a select with the "aggregate" enumeration method
select (select * from LatencyTable where avgLatency > x).aggregate(....) from pattern[every timer:interval(5 min)]

Esper output clause with named windows

I have a question about using the output clause in combination with a named window, patterns and the insert into statement.
My goal is to detect the absence of an event, store it in a named window and when the events start coming again select and delete the row and use that as an "online" indicator (see Esper - detect event after absence)
I somehow want to be able limit the rate of events when there are multiple offline - online events in a short period of time (disable false positives). I thought the output clause could help here but when I use that on the insert into statement no events are stored in the named window. Is this the right approach or is there an other way to limit the events in this scenario?
This is my code in Esper EPL online:
create schema MonitorStats(id string, time string, host string, alert string);
create window MonitorWindow.win:keepall() as select id, alert, time, host from MonitorStats;
insert into MonitorWindow select a.id as id, 'offline' as alert, a.time as time, a.host as host from pattern
[every a=MonitorStats(id='1234') -> (timer:interval(25 sec) and not MonitorStats(id=a.id))];
on pattern[every b=MonitorStats(id='1234') -> (timer:interval(25 sec) and MonitorStats(id=b.id))]
select and delete b.id as id, 'online' as alert, b.time as time, b.host as host from MonitorWindow as win
where win.id = b.id;
You may insert output events into another stream and use "output first every".
insert into DetectedStream select ....;
select * from DetectedStream output first every 10 seconds;

Transform SQL JOIN SELECT to Esper EPL syntax

Let's consider a simple object with the same representation in a SQL database with properties(columns¨): Id, UserId,Ip.
I would like to prepare a query that would generate event in case that one user logs in from 2 IP adresses (or more) within 1 hour period.
My SQL looks like:
SELECT id,user_id,ip FROM w_log log
LEFT JOIN
(SELECT user_id, count(distinct ip) AS ip_count FROM w_log GROUP BY user_id) ips
ON log.user_id = ips.user_id
WHERE ips.ip_count > 1
Transformation to EPL:
SELECT * FROM LogEntry.win:time(1 hour) logs LEFT INNER join
(select UserId,count(distinct Ip) as IpCount FROM LogEntry.win:time(1 hour)) ips
ON logs.UserId = ips.UserId where ips.IpCount>1
Exception:
Additional information: Incorrect syntax near '(' at line 1 column 100,
please check the outer join within the from clause near reserved keyword 'select'
UPDATE:
I was successfuly able to create a schema, named window and insert data into it (or update it). I would like to increase the counter when a new LogEvent arrives in the .win:time(10 seconds) and decrease it when the event is leaving the 10 seconds window. Unfortunately the istream() doesn't seem to provide the true/false when the event is in remove stream.
create schema IpCountRec as (ip string, hitCount int)
create window IpCountWindow.win:time(10 seconds) as IpCountRec
on LogEvent.win:time(10 seconds) log
merge IpCountWindow ipc
where ipc.ip = log.ip
when matched and istream()
then update set hitCount = hitCount + 1
when matched and not istream()
then update set hitCount = hitCount - 1
when not matched
then insert select ip, 1 as hitCount
Is there something I missed?
In EPL I don't think it is possible to put a query into the from-part. You can change using "insert into". An EPL alternative is also a named window or table.

esper how to find ID that exists in streamA but NOT EXITS in streamB

The problem is pretty simple: extract only the not exists records from 2 different streams using Esper engine.
ID exists in streamA but NOT EXITS in streamB.
In SQL it would look like this:
SELECT *
FROM tableA
WHERE NOT EXISTS (SELECT *
FROM tableB
WHERE tableA.Id = tableB.Id)
I've tried it Esper style but it doesn't work:
SELECT *
FROM streamA.win:ext_timed(timestamp, 5 seconds) as stream_A
WHERE NOT EXSITS
(SELECT stream_B.Id
FROM streamB.win:ext_timed(timestamp, 5 seconds) as stream_B
WHERE stream_A.Id = stream_B.Id)
Sadly if stream_A.Id inserted before stream_B.id than it will answer the query conditions and the query won't work.
Any suggestions on how to identify "ID exists in streamA but NOT EXITS in streamB" using Esper?
One simple way is to time-order the stream, so that A and B are timestamp ordered before sending events in.
Or you could delay A such as this query:
select * from pattern [every a=streamA -> timer:interval(1 sec)] as delayed_a
where not exists (... where delayed_a.a.id = b.id)
There is no need for an externally timed window for streamA. For externally timed behavior in general use external timer events.

Resources