Transform SQL JOIN SELECT to Esper EPL syntax

Transform SQL JOIN SELECT to Esper EPL syntax - esper

Let's consider a simple object with the same representation in a SQL database with properties(columns¨): Id, UserId,Ip.
I would like to prepare a query that would generate event in case that one user logs in from 2 IP adresses (or more) within 1 hour period.
My SQL looks like:
SELECT id,user_id,ip FROM w_log log
LEFT JOIN
(SELECT user_id, count(distinct ip) AS ip_count FROM w_log GROUP BY user_id) ips
ON log.user_id = ips.user_id
WHERE ips.ip_count > 1
Transformation to EPL:
SELECT * FROM LogEntry.win:time(1 hour) logs LEFT INNER join
(select UserId,count(distinct Ip) as IpCount FROM LogEntry.win:time(1 hour)) ips
ON logs.UserId = ips.UserId where ips.IpCount>1
Exception:
Additional information: Incorrect syntax near '(' at line 1 column 100,
please check the outer join within the from clause near reserved keyword 'select'
UPDATE:
I was successfuly able to create a schema, named window and insert data into it (or update it). I would like to increase the counter when a new LogEvent arrives in the .win:time(10 seconds) and decrease it when the event is leaving the 10 seconds window. Unfortunately the istream() doesn't seem to provide the true/false when the event is in remove stream.
create schema IpCountRec as (ip string, hitCount int)
create window IpCountWindow.win:time(10 seconds) as IpCountRec
on LogEvent.win:time(10 seconds) log
merge IpCountWindow ipc
where ipc.ip = log.ip
when matched and istream()
then update set hitCount = hitCount + 1
when matched and not istream()
then update set hitCount = hitCount - 1
when not matched
then insert select ip, 1 as hitCount
Is there something I missed?

In EPL I don't think it is possible to put a query into the from-part. You can change using "insert into". An EPL alternative is also a named window or table.

Related

esper query is half executed - only part of it is actually running

we have a strange issue, where esper query is only partial executed...
select cseShutDownAlarm.INSTANCEID as INSTANCEID, swtDownAlarm.source
from Alarm(severity.getValue()=5, eventType.getValue() = 'SWT_SWITCH_DOWN').win:time(60 sec) as swtDownAlarm,
sql:stormdb['select id as INSTANCEID, source as SOURCE
from Alarm where EVENTTYPE_VALUE =\'cseShutDownNotify\' and source = ${swtDownAlarm.source} and severity !=5'] as cseShutDownAlarm,
sql:stormdb['insert into wcsdba.dyinggasp (parameter, value) VALUES (\'cseShutDownAlarm.SOURCE\', ${cseShutDownAlarm.SOURCE}) '],
sql:stormdb['insert into wcsdba.dyinggasp (parameter, value) VALUES (\'swtDownAlarm.source\', ${swtDownAlarm.source}) '] where swtDownAlarm.source = cseShutDownAlarm.SOURCE
and in the DB, we see that:
SQL> /
PARAMETER VALUE
cseShutDownAlarm.SOURCE 172.16.148.48
cseShutDownAlarm.SOURCE 172.16.148.48
SQL>
but second source (swtDownAlarm.source) is not printed...
If I switch the order then the other one will be inserted only.
any reason why it does not execute both inserts? also the rest of the condition is not checked... as the source on both are identical, but condition not fulfilled.
Thanks,
fcbman

Joins are meant to be the place for event stream joins against SQL database for purpose of querying. The "insert into" isn't a query and would never return rows. And if an inner join doesn't return rows it ends early (outer join would be right then)

Duplicates in the result of a subquery

I am trying to count distinct sessionIds from a measurement. sessionId being a tag, I count the distinct entries in a "parent" query, since distinct() doesn't works on tags.
In the subquery, I use a group by sessionId limit 1 to still benefit from the index (if there is a more efficient technique, I have ears wide open but I'd still like to understand what's going on).
I have those two variants:
> select count(distinct(sessionId)) from (select * from UserSession group by sessionId limit 1)
name: UserSession
time count
---- -----
0 3757
> select count(sessionId) from (select * from UserSession group by sessionId limit 1)
name: UserSession
time count
---- -----
0 4206
To my understanding, those should return the same number, since group by sessionId limit 1 already returns distinct sessionIds (in the form of groups).
And indeed, if I execute:
select * from UserSession group by sessionId limit 1
I have 3757 results (groups), not 4206.
In fact, as soon as I put this in a subquery and re-select fields in a parent query, some sessionIds have multiple occurrences in the final result. Not always, since there is 17549 rows in total, but some are.
This is the sign that the limit 1 is somewhat working, but some sessionId still get multiple entries when re-selected. Maybe some kind of undefined behaviour?

I can confirm that I get the same result.
In my experience using nested queries does not always deliver what you expect/want.
Depending on how you use this you could retrieve a list of all values for a tag with:
SHOW TAG VALUES FROM UserSession WITH KEY=sessionId
Or to get the cardinality (number of distinct values for a tag):
SHOW TAG VALUES EXACT CARDINALITY FROM UserSession WITH KEY=sessionId.
Which will return a single row with a single column count, containing a number. You can remove the EXACT modifier if you don't need to be exact about the result: SHOW TAG VALUES CARDINALITY on Influx Documentation.

Esper output clause with named windows

I have a question about using the output clause in combination with a named window, patterns and the insert into statement.
My goal is to detect the absence of an event, store it in a named window and when the events start coming again select and delete the row and use that as an "online" indicator (see Esper - detect event after absence)
I somehow want to be able limit the rate of events when there are multiple offline - online events in a short period of time (disable false positives). I thought the output clause could help here but when I use that on the insert into statement no events are stored in the named window. Is this the right approach or is there an other way to limit the events in this scenario?
This is my code in Esper EPL online:
create schema MonitorStats(id string, time string, host string, alert string);
create window MonitorWindow.win:keepall() as select id, alert, time, host from MonitorStats;
insert into MonitorWindow select a.id as id, 'offline' as alert, a.time as time, a.host as host from pattern
[every a=MonitorStats(id='1234') -> (timer:interval(25 sec) and not MonitorStats(id=a.id))];
on pattern[every b=MonitorStats(id='1234') -> (timer:interval(25 sec) and MonitorStats(id=b.id))]
select and delete b.id as id, 'online' as alert, b.time as time, b.host as host from MonitorWindow as win
where win.id = b.id;

You may insert output events into another stream and use "output first every".
insert into DetectedStream select ....;
select * from DetectedStream output first every 10 seconds;

bigQuery - Join

I'm trying to join two databases on the ID. The first database on price quotes does not have the data on websites, so I want to join it in from the logs database. However, in the logs database the ID is not unique, but the first chronological appearance of the ID - this is the right website.
When I run the query below, I get:
Resources exceeded during query execution.
Hence I don't know whether the problem is the code or something else.
Thanks
SELECT ID, user,busWeek, count(*) as num FROM [datastore.quotes]
Join (
select objectID, first(website) from (
select objectID, website, date from [datastore.allLogs]
order by date) group by objectID)
as Logs
on ID = objectID
group by ID,user,busWeek

Can you try:
SELECT ID, user,busWeek, count(*) as num FROM [datastore.quotes]
Join EACH (
select objectID, first(website) from (
select objectID, website, date from [datastore.allLogs]
order by date) group EACH by objectID)
as Logs
on ID = objectID
group by ID,user,busWeek
Note the 'EACH' - that keyword won't be needed in the future, but it's still useful today.

I think the issue is in ORDER BY. This brings all calculation to one node which causes "Resources Exceeded" message. I understand you need it to bring first (by date) website for each object.
Try to rewrite this select (inside join) to be partitioned.
For example using window functions with OVER(PARTITION BY ... ORDER BY)
In this case, I think, you have chance to make this in parallel
See below for reference
Window Functions

esper how to find ID that exists in streamA but NOT EXITS in streamB

The problem is pretty simple: extract only the not exists records from 2 different streams using Esper engine.
ID exists in streamA but NOT EXITS in streamB.
In SQL it would look like this:
SELECT *
FROM tableA
WHERE NOT EXISTS (SELECT *
FROM tableB
WHERE tableA.Id = tableB.Id)
I've tried it Esper style but it doesn't work:
SELECT *
FROM streamA.win:ext_timed(timestamp, 5 seconds) as stream_A
WHERE NOT EXSITS
(SELECT stream_B.Id
FROM streamB.win:ext_timed(timestamp, 5 seconds) as stream_B
WHERE stream_A.Id = stream_B.Id)
Sadly if stream_A.Id inserted before stream_B.id than it will answer the query conditions and the query won't work.
Any suggestions on how to identify "ID exists in streamA but NOT EXITS in streamB" using Esper?

One simple way is to time-order the stream, so that A and B are timestamp ordered before sending events in.
Or you could delay A such as this query:
select * from pattern [every a=streamA -> timer:interval(1 sec)] as delayed_a
where not exists (... where delayed_a.a.id = b.id)
There is no need for an externally timed window for streamA. For externally timed behavior in general use external timer events.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Transform SQL JOIN SELECT to Esper EPL syntax - esper

In EPL I don't think it is possible to put a query into the from-part. You can change using "insert into". An EPL alternative is also a named window or table.

Related

esper query is half executed - only part of it is actually running

Duplicates in the result of a subquery

Esper output clause with named windows

bigQuery - Join

esper how to find ID that exists in streamA but NOT EXITS in streamB

Categories

Resources