Esper - concatenate values from multiple rows to a list - esper

I have an Esper query that returns multiple rows, but I'd like to instead get one row, where that row has a list (or concatenated string) of all of the values from the (corresponding columns of the) matching rows that my current query returns.
For example:
SELECT Name, avg(latency) as avgLatency
FROM MyStream.win:time(5 min)
GROUP BY Name
HAVING avgLatency / 1000 > 60
OUTPUT last every 5 min
Returns:
Name avgLatency
---- ----------
A 65
B 70
C 75
What I'd really like:
Name
----
{A, B, C}
Is this possible to do via the query itself? I tried to make this work using subqueries, but I'm not working with multiple streams. I can't find any aggregation functions or enumeration functions in the Esper documentation that fits what I'm trying to do either.
Thanks to anybody that has any insight or direction for me here.
EDIT:
If this can't be done via the query, I'm open to changing the subscriber, or anything else, if necessary.

You can have a subscriber or listener do the concat. There is a "Multi-Row Delivery" for subscribers. Or use a table like below.
// create table to hold aggregation result
create table LatencyTable(name string primary key, avgLatency avg(double));
// update aggregations in table from events coming in
into LatencyTable select name, avg(latency) as avgLatency from MyStream#time(5 min) group by name;
// do a select with the "aggregate" enumeration method
select (select * from LatencyTable where avgLatency > x).aggregate(....) from pattern[every timer:interval(5 min)]

Related

is it possible to get a single column output from a multi-column Query?

I'm using a QUERY function in Google Sheets. I have a named data range ("Contributions" in table on another sheet) that consists of many columns, but I'm only concerned with two of them. For simplicity sake, it looks something like this:
I have another table that contains the unique set of names (e.g.: "Fred", "Ginger", etc. each only once) and I want to extract the level # (column B) from the above table to insert the most recent (largest number) in this second table.
Right now, my query looks like this:
=QUERY(Contributions, "select B,C where C='"&A5&"' order by B desc limit 1",1)
The problem is, that it outputs both B & C data - e.g.:
11 Fred
But since I already have the name (in column A of this other table) I only want it to output the value from B - e.g.:
11
Is there a way to output only a subset (in this case 1 of 2) of the columns of output based on a directive within the query itself (as opposed to doing post-processing of the results)?
Outputting a Subset of Columns Used in Query
In order to output only certain columns of a query result, the query only needs to select the columns to be displayed while the constraints / conditions may utilize other columns of data.
For example (as an answer to my own question) - I have a table like this:
I needed to get the data from the row with a name matching another cell (on another sheet) and with the latest (largest) number - but I only want to output the number part.
My initial attempt was:
=QUERY(Contributions, "select B,C where C='"&A5&"' order by B desc limit 1",1)
But that output both B & C where I only wanted B. The answer (thanks to # Calculuswhiz) was to continue using C for the condition but only select on B:
=QUERY(Contributions, "select B where C='"&A5&"' order by B desc limit 1",1)

KSQL Group By to drop previous values and only use the LAST

I have a Kafka topic "events" which records user image votes and has json in the following structure:
{"category":"image","action":"vote","label":"amsterdam","ip":"1.1.1.1","value":2}
I need to receive on another topic the sum of all votes for the label (e.g. amsterdam) but drop any votes that came from the same IP address using only the last vote. This topic should have json in this format:
{label:”amsterdam”,SCORE:8,TOTAL:3}
SCORE is a sum of all votes and TOTAL is the number of votes counted.
The solution I made creates a stream from the topic events:
CREATE STREAM st_events
(CATEGORY STRING, ACTION STRING, LABEL STRING, VALUE BIGINT, IP STRING)
WITH (KAFKA_TOPIC='events', VALUE_FORMAT='JSON');
Then, I create a table tb_votes which calculates the score and total for each label and IP address:
CREATE TABLE tb_votes WITH (KAFKA_TOPIC='tb_votes', PARTITIONS=1, REPLICAS=1) AS SELECT
st_events.LABEL "label", SUM(st_events.VALUE-1) "score", CAST(COUNT(*) AS BIGINT) "total"
FROM st_events
WHERE
st_events.category='image' AND st_events.action='vote'
GROUP BY st_events.label, st_events.ip
EMIT CHANGES;
The problem is that instead of dropping all the previous votes coming from the same ip address for the same image, Kafka uses all of them. This makes sense as it is a GROUP BY.
Any idea how to "drop" all previous votes and only use the latest values for an image/ IP?
You need a two stage aggregation.
The first stage should build a table with a primary key containing both the ip and label and another column holding the value.
Build a second table from this first table to get the count and sum per-label that you need.
If another vote comes in from the same ip for the same label then the first table will be updated with the new value and the second table will be correctly updated. It will first remove the old value from the count and sum and then apply the new value.
ksqlDB does not yet support multiple primary key columns (though its coming VERY soon!). So when you group by two columns it just does a funky string concatenation. But we can work with that for now.
CREATE TABLE BY_IP_AND_LABEL AS
SELECT
label + '-' + ip AS ipAndLabel,
value
FROM st_events
GROUP BY ip + '#' + label;
CREATE TABLE BY_LABEL AS
SELECT
SUBSTRING(labelAndIp, INSTR(labelAndIp, '#')) AS label,
SUM(VALUE-1) AS score,
COUNT(*) AS total
FROM BY_IP_AND_LABEL
GROUP BY SUBSTRING(ipAndLabel, INSTR(ipAndLabel, '#'));
The first table creates a composite key with and # as the separator. The second table uses INSTR and SUBSTRING to find the separator and extract the label.
Note: I've not tested this - I could have some 'off-by-one' errors in the logic.
This should do what you need.

SSRS Calculated Conditional Field

I have a stored procedure that assigns a "totalstype_number" based on the field (1=AGS, 2=invoice count, etc). When I want to display a given totals type, I group by the totalstype, and filter it to totalstype ="1" (or whatever number I am trying to populate). What I need to do is create a field that takes totaltype = 1 and divide it by totalstype = 2. I am not really sure how to go about this, I have tried doing it at the tablix level, but I am not sure how to set my row group to group by. Column groups are by channel (# 1-5) and then 2 adjacent child groups underneath that are value1 (year1) and value2 (year2).Thanks!
You do not need to work with the SSRS, modify your query in your SP instead, add group by totalstype and count(*) as total_Number_this_salesType in your original SP, the easiest way is to have two more parameters to store the total number for each sales_type. Which would be like:
#parameter1 = (query to get total number of sales type 1)
#parameter2 = (query to get total number of sales type 2)
then populate #parameter1/#parameter2

How to get the number of entries in a measurement

I am a newbie to influxdb. I just started to read the influx documentation.
I cant seem to get the equivalent of 'select count(*) from table' to work in influx db.
I have a measurement called cart:
time status cartid
1456116106077429261 0 A
1456116106090573178 0 B
1456116106095765618 0 C
1456116106101532429 0 D
but when I try to do
select count(cartid) from cart
I get the error
ERR: statement must have at least one field in select clause
I suppose cartId is a tag rather than a field value? count() currently can't be used on tag and time columns. So if your status is a non-tag column (a field), do the count on that.
EDIT:
Reference
This works as long as no field or tag exists with the name count:
SELECT SUM(count) FROM (SELECT *,count::INTEGER FROM MyMeasurement GROUP BY count FILL(1))
If it does use some other name for the count field. This works by first selecting all entries including an unpopulated field (count) then groups by the unpopulated field which does nothing but allows us to use the fill operator to assign 1 to each entry for count. Then we select the sum of the count fields in a super query. The result should look like this:
name: MyMeasurement
----------------
time sum
0 47799
It's a bit hacky but it's the only way to guarantee a count of all entries when no field exists that is always present in all entries.

How to get row Count of the sqlite3_stmt *statement? [duplicate]

I want to get the number of selected rows as well as the selected data. At the present I have to use two sql statements:
one is
select * from XXX where XXX;
the other is
select count(*) from XXX where XXX;
Can it be realised with a single sql string?
I've checked the source code of sqlite3, and I found the function of sqlite3_changes(). But the function is only useful when the database is changed (after insert, delete or update).
Can anyone help me with this problem? Thank you very much!
SQL can't mix single-row (counting) and multi-row results (selecting data from your tables). This is a common problem with returning huge amounts of data. Here are some tips how to handle this:
Read the first N rows and tell the user "more than N rows available". Not very precise but often good enough. If you keep the cursor open, you can fetch more data when the user hits the bottom of the view (Google Reader does this)
Instead of selecting the data directly, first copy it into a temporary table. The INSERT statement will return the number of rows copied. Later, you can use the data in the temporary table to display the data. You can add a "row number" to this temporary table to make paging more simple.
Fetch the data in a background thread. This allows the user to use your application while the data grid or table fills with more data.
try this way
select (select count() from XXX) as count, *
from XXX;
select (select COUNT(0)
from xxx t1
where t1.b <= t2.b
) as 'Row Number', b from xxx t2 ORDER BY b;
just try this.
You could combine them into a single statement:
select count(*), * from XXX where XXX
or
select count(*) as MYCOUNT, * from XXX where XXX
To get the number of unique titles, you need to pass the DISTINCT clause to the COUNT function as the following statement:
SELECT
COUNT(DISTINCT column_name)
FROM
'table_name';
Source: http://www.sqlitetutorial.net/sqlite-count-function/
For those who are still looking for another method, the more elegant one I found to get the total of row was to use a CTE.
this ensure that the count is only calculated once :
WITH cnt(total) as (SELECT COUNT(*) from xxx) select * from xxx,cnt
the only drawback is if a WHERE clause is needed, it should be applied in both main query and CTE query.
In the first comment, Alttag said that there is no issue to run 2 queries. I don't agree with that unless both are part of a unique transaction. If not, the source table can be altered between the 2 queries by any INSERT or DELETE from another thread/process. In such case, the count value might be wrong.
Once you already have the select * from XXX results, you can just find the array length in your program right?
If you use sqlite3_get_table instead of prepare/step/finalize you will get all the results at once in an array ("result table"), including the numbers and names of columns, and the number of rows. Then you should free the result with sqlite3_free_table
int rows_count = 0;
while (sqlite3_step(stmt) == SQLITE_ROW)
{
rows_count++;
}
// The rows_count is available for use
sqlite3_reset(stmt); // reset the stmt for use it again
while (sqlite3_step(stmt) == SQLITE_ROW)
{
// your code in the query result
}

Resources