KSQL select one row per group that corresponds to having the least timestamp - ksqldb

In KSQL, is there a ROW_NUMBER-like function that can be used in combination with a TUMBLING WINDOW in order to group by and return only the event that has the least timestamp within each group?

No, there isn't.
You can use the LATEST_BY_OFFSET aggregation; it may do what you're after.
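For the "least timestamp" case specifically, a sketch using EARLIEST_BY_OFFSET (the counterpart of LATEST_BY_OFFSET) might look like the following; the stream `events`, key `device_id`, and column `ts` are hypothetical placeholders for your schema:

```sql
-- Hypothetical stream and column names; adjust to your schema.
-- EARLIEST_BY_OFFSET keeps the first value seen per group and window,
-- which approximates "least timestamp" when events arrive roughly in order.
CREATE TABLE first_event_per_window AS
  SELECT device_id,
         EARLIEST_BY_OFFSET(ts) AS min_ts
  FROM events
  WINDOW TUMBLING (SIZE 1 HOUR)
  GROUP BY device_id
  EMIT CHANGES;
```

Note this picks the earliest value by offset, not strictly by timestamp, so out-of-order events can still differ from a true ROW_NUMBER-over-timestamp result.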

Related

Updating a query for false positives

I work in a compliance role at a very small start-up and review a lot of information every day, for example bank transfers/direct deposits/ACHs. A report is pulled from BigQuery and exported to Google Sheets.
My problem is that there are a lot of false positives (basically, "posting data" that repeats often). I'm trying to eliminate them.
One idea was just to update the query with keywords:
WHERE postingdata LIKE 'PersonName%'
But it's tedious and time-consuming, and I feel there's a better way, perhaps 'filtering' the results and then feeding them back into the query. Any ideas or tips, or just general thoughts?
In this case you can use GROUP BY in your query. Here is how to use this clause. Start with this query:
SELECT account, TypeTransaction, amount, currency
FROM `tblBankTransaction`
The code returns this data, and some rows are repeated; for example, rows 1 and 7 with the account 894526972455, and it's a deposit.
In this case, I will use the group by clause.
SELECT account, TypeTransaction, amount, currency
FROM `tblBankTransaction`
GROUP BY account, TypeTransaction, amount, currency
And it returns this data:
You can see in this example that the account 894526972455 with a deposit now returns only 1 row. The same account returns a second row, but that is a transfer; it's a different type of transaction. It depends on the information you have and which columns you want to group by.
Within Google Sheets you can try UNIQUE, QUERY with a group by aggregation, or SORTN with mode 2 as the 3rd parameter.

Is it possible to retrieve only the timestamp from an InfluxDB query

Is it possible to pass the timestamp returned by an InfluxDB query to another query?
SELECT max("value")
FROM "temp" WHERE "floor" = '1';
Output
time max
---- ---
2020-01-17T00:00:00Z 573.44
Is it possible to pass the time from the result to another query?
You cannot do this with InfluxQL; it is not possible to nest queries in a way that passes the time range of the inner query to the outer query. It's another matter if you are using Flux (the new query language, still in beta at the time of writing).
In Flux this is possible, because you can access time as a column, which you can then use to query your other measurements as required. You can also use JOIN to do more advanced operations like cross measurement calculations etc.
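As a sketch of the Flux approach (the bucket name and the second measurement are hypothetical), you could extract the timestamp as a record field and reuse it:

```flux
// Hypothetical bucket "mybucket"; mirrors the InfluxQL query above.
maxRow = from(bucket: "mybucket")
    |> range(start: 0)
    |> filter(fn: (r) => r._measurement == "temp" and r.floor == "1")
    |> max()
    |> findRecord(fn: (key) => true, idx: 0)

// maxRow._time now holds the timestamp of the maximum and can feed
// another query, e.g. against a hypothetical second measurement:
from(bucket: "mybucket")
    |> range(start: maxRow._time)
    |> filter(fn: (r) => r._measurement == "humidity")
```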

How to measure throughput with dynamic interval in Grafana

We are measuring throughput using Grafana and InfluxDB. We would like to measure throughput in terms of approximately how many requests happen every second (rps).
The typical request is:
SELECT sum("count") / 10 FROM "http_requests" GROUP BY time(10s)
But with this we lose the ability to use the handy dynamic $__interval, which is very useful when the graph scope is large, e.g. a whole week. Whenever we change the interval, we also have to change the divisor in the SELECT expression.
SELECT sum("count") / $__interval FROM "http_requests" GROUP BY time($__interval)
But this approach does not work: it returns an empty result.
How to create request using dynamic $__interval for throughput measuring?
The reason you get no results is that $__interval is not a number but a string such as 10s, 1m, etc., which InfluxDB understands as a time range. So it is not possible to use it the way you are trying.
However, what you want to calculate is the mean which is available as a function in InfluxQL. The way to get the behavior that you want is with something like this.
SELECT mean("count") FROM "http_requests" GROUP BY time($__interval)
EDIT: On second thought, that is not quite what you want.
You'd probably need to use derivative. I'll come back to you on that one later.
Edit 2: Do you think this answers your question: Calculating request per second using InfluxDB on Grafana
Edit 3: Third edit's a charm.
We take your starting query and wrap it in another one, as such:
SELECT sum("rps") from (SELECT sum("count") / 10 as rps FROM "http_requests" GROUP BY time(10s)) GROUP BY time($__interval)
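If "count" were a cumulative counter rather than per-request increments, the derivative idea from the first edit would be the usual pattern; this is only a sketch under that assumption:

```sql
-- Sketch, assuming "count" is a monotonically increasing counter.
-- non_negative_derivative(..., 1s) normalizes the counter to a
-- per-second rate, so no divisor depends on $__interval.
SELECT non_negative_derivative(max("count"), 1s)
FROM "http_requests"
GROUP BY time($__interval)
```

With per-request increments (as in the original query), the nested-sum query above is the right tool instead.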

How to find last(sum(count)) in influxdb?

Here in my db measurement, the tags are "source", "edition", "date" and the field is "count".
Each source has multiple editions. I am pushing data on the basis of editions.
I want the sum of editions for each source. I can get it with a normal query.
But here I want this for multiple dates. This should also be possible using nested queries (a function of a function).
select mean(sum), last(sum)
from (select sum(count) from epaper_edition
      where (date='2017-11-03' or date='2017-11-04' or date='2017-11-05' or date='2017-11-06')
      group by date, source)
group by source
The last() function is based on timestamp, but last(sum(count)) returns an arbitrary sum(count), not the last one.
I have a workaround using a separate sum(count) query for the last date, but I need it in one query.
Thank you in advance for a better solution.
If you want sums of individual editions per source, you need to group by edition as well as source.
select mean(sum),last(sum)
from (select sum(count) from epaper_edition where (date='2017-11-03' or date='2017-11-04' or date='2017-11-05' or date='2017-11-06')
group by date, source, edition)
group by source, edition

abas ERP: Limit for table rows in additional database

Is there a limit for table rows in additional databases in abas erp?
If there is a limit: On which factor the limit is based, how can I calculate the limit and what happens if I try to add more lines by GUI, FO or EDP/EPI?
Can I find it documented in the abas online help? I haven't found it there.
Yes, there is a limit, and it is unfortunately not customizable.
You can see a full list of known limitations under help/hd/html/49B.1.4.html
In your specific case the limit of lines in additional databases is 65535.
If you reach the limit, the abas core will show an error message and terminate your current FOP. You can (and should) check the current number of rows by evaluating the variable tzeilen (currTabRow).
In this case I'm also not aware of any documentation other than the one you mentioned, but you can query ozeilen in a selection list (for master files, not e.g. for sales and purchasing, because the rows there aren't physically 'rows'). tzeilen (currTabRow) is buffer-related.
