I'm using a QUERY function in Google Sheets with a named data range ("Contributions", a table on another sheet) that has many columns, but I'm only concerned with two of them: for simplicity's sake, say column B holds a level number and column C holds a name.
I have another table that contains the unique set of names (e.g. "Fred", "Ginger", etc., each appearing only once), and I want to pull the most recent (largest) level number from column B of the table above into this second table.
Right now, my query looks like this:
=QUERY(Contributions, "select B,C where C='"&A5&"' order by B desc limit 1",1)
The problem is that it outputs data from both B and C, e.g.:
11 Fred
But since I already have the name (in column A of the other table), I only want it to output the value from B, e.g.:
11
Is there a way to output only a subset (in this case, one of two) of the output columns based on a directive within the query itself (as opposed to post-processing the results)?
Outputting a Subset of Columns Used in Query
To output only certain columns of a query result, the query needs to select only the columns to be displayed; the constraints/conditions may still reference other columns of the data.
For example (as an answer to my own question): I have a table where column B holds a level number and column C holds a name.
I needed to get the data from the row with a name matching another cell (on another sheet) and with the latest (largest) number, but I only wanted to output the number part.
My initial attempt was:
=QUERY(Contributions, "select B,C where C='"&A5&"' order by B desc limit 1",1)
But that outputs both B and C, where I only wanted B. The answer (thanks to Calculuswhiz) was to keep using C in the condition but select only B:
=QUERY(Contributions, "select B where C='"&A5&"' order by B desc limit 1",1)
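A related note: if a name has no matching rows, QUERY returns an error ("Query completed with an empty output"); wrapping the call in the standard IFERROR function keeps the summary table clean (the blank fallback here is just an assumption about what you'd want):
=IFERROR(QUERY(Contributions, "select B where C='"&A5&"' order by B desc limit 1", 1), "")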
I have an Esper query that returns multiple rows, but I'd like to get a single row instead, containing a list (or concatenated string) of the values from the corresponding column of each matching row that my current query returns.
For example:
SELECT Name, avg(latency) as avgLatency
FROM MyStream.win:time(5 min)
GROUP BY Name
HAVING avgLatency / 1000 > 60
OUTPUT last every 5 min
Returns:
Name avgLatency
---- ----------
A 65
B 70
C 75
What I'd really like:
Name
----
{A, B, C}
Is this possible to do via the query itself? I tried to make this work using subqueries, but I'm not working with multiple streams. I can't find any aggregation functions or enumeration functions in the Esper documentation that fit what I'm trying to do either.
Thanks to anybody that has any insight or direction for me here.
EDIT:
If this can't be done via the query, I'm open to changing the subscriber, or anything else, if necessary.
You can have a subscriber or listener do the concatenation; there is a "Multi-Row Delivery" mode for subscribers. Or use a table, like below.
// create table to hold aggregation result
create table LatencyTable(name string primary key, avgLatency avg(double));
// update aggregations in table from events coming in
into LatencyTable select name, avg(latency) as avgLatency from MyStream#time(5 min) group by name;
// do a select with the "aggregate" enumeration method
select (select * from LatencyTable where avgLatency > x).aggregate(....) from pattern[every timer:interval(5 min)]
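For the subscriber route, a minimal sketch of the concatenation is below; the class name is made up, and the exact update(...) signature for multi-row delivery should be checked against the "Multi-Row Delivery" section of the Esper docs:
// Hypothetical subscriber: with multi-row delivery, the whole batch from
// "output last every 5 min" arrives in one call, each inner Object[]
// holding the selected columns (Name, avgLatency) in order.
public class NameListSubscriber {
    public void update(Object[][] insertStream, Object[][] removeStream) {
        if (insertStream == null) return;
        StringBuilder names = new StringBuilder("{");
        for (int i = 0; i < insertStream.length; i++) {
            if (i > 0) names.append(", ");
            names.append(insertStream[i][0]); // column 0 is Name
        }
        System.out.println(names.append("}")); // e.g. {A, B, C}
    }
}
// attach it with: statement.setSubscriber(new NameListSubscriber());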
I have data in a table (named TESTING) in dashDB on IBM Bluemix (Db2 Warehouse on Cloud), which looks like this:
ID TIMESTAMP NAME VALUE
abc 2017-12-21 19:55:38.762 test1 123
abc 2017-12-21 19:55:42.762 test2 456
abc 2017-12-21 19:57:38.762 test1 789
abc 2017-12-21 19:58:38.762 test3 345
def 2017-12-21 19:59:38.762 test1 678
I am looking for a query that:
1. samples the data (for each NAME) down to a given time resolution (e.g. a 1-minute timestamp);
2. averages the VALUEs that fall into the same time range (the same minute), leaving empty intervals as NULL.
For 1. and 2., something like this works (but only for a single NAME):
with dummy(temporaer) as (
select TIMESTAMP('2017-12-01') from SYSIBM.SYSDUMMY1
union all
select temporaer + 1 MINUTES from dummy where temporaer < TIMESTAMP('2018-02-01')
)
select temporaer, avg(VALUE) as test1 from dummy
LEFT OUTER JOIN TESTING ON temporaer=date_trunc('minute', TIMESTAMP) and ID='abc' and NAME='test1'
group by temporaer
ORDER BY temporaer ASC;
3. joins all the different NAMEs column-wise into a matrix (a conditional-aggregation sketch for this step follows the question), like:
TIMESTAMP test1 test2 test3
2017-12-01 00:00:00 null null null
...
2017-12-21 19:55:00 123 456 null
2017-12-21 19:56:00 null null null
2017-12-21 19:57:00 789 null null
2017-12-21 19:58:00 678 null 345
...
2018-01-31 23:59:00 null null null
4. exports the query result as a CSV, or returns it as a CSV string.
Does anybody know how this could be done in one query, or in a simple and fast way? Or is it necessary to store the data in another table format? Can you give me a hint?
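For step 3 on its own, one common approach is conditional aggregation. The following is only a sketch against the sample TESTING table above; it assumes VALUE is numeric and that the set of NAMEs is known in advance:
select date_trunc('minute', TIMESTAMP) as minute,
       avg(case when NAME = 'test1' then VALUE end) as test1,
       avg(case when NAME = 'test2' then VALUE end) as test2,
       avg(case when NAME = 'test3' then VALUE end) as test3
from TESTING
where ID = 'abc'
group by date_trunc('minute', TIMESTAMP)
order by minute;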
Here is a code snippet that does the job, but takes a very long time:
WITH
-- get all distinct names in table:
header(names) AS (SELECT DISTINCT name
FROM FIELDTEST
WHERE ID='7b9bbe44d45d8f2ac324849a4951da54' AND REGEXP_LIKE(trim(VALUE),'^\d+(\.\d*)?$') AND DATE(TIMESTAMP)>='2017-12-19' AND DATE(TIMESTAMP)<'2017-12-24'),
-- select data (names, numeric values only) from the table, truncating each timestamp to a coarser interval (here: minutes):
dummie(time, names, values) AS (SELECT date_trunc('minute', TIMESTAMP), NAME, VALUE
FROM FIELDTEST
WHERE ID='7b9bbe44d45d8f2ac324849a4951da54' AND REGEXP_LIKE(trim(VALUE),'^\d+(\.\d*)?$')),
-- generate a range of times from date to date in defined steps:
dummy(time, rangeEnd) AS (SELECT a, a + 1 MINUTE
FROM (VALUES(TIMESTAMP('2017-12-19'))) D(a)
UNION ALL
SELECT rangeEnd, rangeEnd + 1 MINUTE
FROM dummy
WHERE rangeEnd < TIMESTAMP('2017-12-24')),
-- add each name (from header) to each time/row (in dummy):
dumpy(time, names) AS (SELECT Dummy.time, Header.names
FROM Dummy
LEFT OUTER JOIN Header
ON Dummy.time IS NOT NULL),
-- average the values by name and time interval, aligned to the times in dummy:
dummj(time, names, avgvalues) AS (SELECT Dummy.time, Dummie.names, AVG(Dummie.values)
FROM Dummy
LEFT OUTER JOIN Dummie
ON Dummie.time = Dummy.time
GROUP BY Dummie.names, Dummy.time),
-- join the averaged values (by time and name) to the times and names in dumpy (use -9999 where a value is empty):
testo(time, names, avgvalues) AS (SELECT Dumpy.time, Dumpy.names, COALESCE(Dummj.avgvalues,-9999)
FROM Dumpy
LEFT OUTER JOIN Dummj
ON Dummj.time = Dumpy.time AND Dummj.names = Dumpy.names),
-- collapse the large number of rows into fewer rows with delimited strings:
test(time, names, avgvalues) AS (SELECT time, LISTAGG(names,';') WITHIN GROUP(ORDER BY names), LISTAGG(avgvalues,';') WITHIN GROUP(ORDER BY names)
FROM Testo
GROUP BY time)
SELECT * FROM test ORDER BY time ASC, names ASC;
The performance problem is in the "testo" subquery. Does anybody have an idea what the problem is here, or know how to improve the query?
Well, one problem I see is that you keep using functions on columns, but that shouldn't be too big a drain if id is reasonably unique. If this query is very common, it may also be worth it to permanently build and index the range table. Hmm, you probably need several indices (starting with FieldTest.id), but you might also try this version:
-- let's name things properly, too, to keep them straight.
WITH
-- generate a range of times from date to date in defined steps:
Range (rangeStart, rangeEnd) AS (SELECT a, a + 1 MINUTE
FROM (VALUES(TIMESTAMP('2017-12-19'))) D(a)
UNION ALL
SELECT rangeEnd, rangeEnd + 1 MINUTE
FROM Range
WHERE rangeEnd < TIMESTAMP('2017-12-24')),
-- get all distinct names in table:
Header(names) AS (SELECT DISTINCT name
FROM FieldTest
WHERE ID = '7b9bbe44d45d8f2ac324849a4951da54'
-- just make the white space check part of the regex
AND REGEXP_LIKE(VALUE, '^\s*\d+(\.\d*)?\s*$')
AND timestamp >= TIMESTAMP('2017-12-19')
AND timestamp < TIMESTAMP('2017-12-24')),
-- I'm assuming the (id, name) tuple is unique, which means we don't need to repeat the regex later
Data (rangeStart, name, averaged) AS (SELECT Range.rangeStart, Header.names, COALESCE(AVG(FieldTest.value), -9999)
FROM Range
CROSS JOIN Header
LEFT JOIN FieldTest
ON FieldTest.id = '7b9bbe44d45d8f2ac324849a4951da54'
AND FieldTest.name = Header.names
AND FieldTest.timestamp >= Range.rangeStart
AND FieldTest.timestamp < Range.rangeEnd
GROUP BY Range.rangeStart, Header.names)
-- I can't recall if DB2 allows using the new column name this way, you may need to wrap this again
SELECT rangeStart,
-- converts the high amount of rows to fewer rows with delimited strings:
LISTAGG(name, ';') WITHIN GROUP (ORDER BY name) AS names,
LISTAGG(averaged, ';') WITHIN GROUP (ORDER BY name) AS avgvalues
FROM Data
GROUP BY rangeStart
ORDER BY rangeStart, names
(not tested)
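As for permanently building and indexing the range table, a minimal sketch (the table and index names are made up, and this is just as untested on dashDB as the above):
-- materialize the minute grid once instead of recursing on every query
CREATE TABLE MinuteRange (rangeStart TIMESTAMP NOT NULL, rangeEnd TIMESTAMP NOT NULL);
INSERT INTO MinuteRange (rangeStart, rangeEnd)
WITH R(a, b) AS (SELECT TIMESTAMP('2017-12-19'), TIMESTAMP('2017-12-19') + 1 MINUTE
FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT b, b + 1 MINUTE FROM R WHERE b < TIMESTAMP('2017-12-24'))
SELECT a, b FROM R;
CREATE INDEX MinuteRange_Start ON MinuteRange (rangeStart);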
The CROSS JOIN was definitely a nice hint. However, I was not able to implement the LEFT JOIN exactly as you suggested, so I found a workaround which, I am sure, still leaves room for improvement, but is acceptable for me at the moment (a time saving of about a factor of 30 compared to my first query). Here is the actual code:
WITH
-- generate a range of times from date to date in defined steps:
TimeRange(rangeStart, rangeEnd) AS (SELECT a, a + 1 MINUTE
FROM (VALUES(TIMESTAMP('2017-12-19'))) D(a)
UNION ALL
SELECT rangeEnd, rangeEnd + 1 MINUTE
FROM TimeRange
WHERE rangeEnd < TIMESTAMP('2017-12-24')),
-- get all distinct names in table:
Header(names) AS (SELECT DISTINCT name
FROM FIELDTEST
WHERE ID = '7b9bbe44d45d8f2ac324849a4951da54'
AND REGEXP_LIKE(VALUE, '^\s*\d+(\.\d*)?\s*$')
AND timestamp >= TIMESTAMP('2017-12-19')
AND timestamp < TIMESTAMP('2017-12-24')),
-- select data (names, numeric values only) from the table, truncating each timestamp to a coarser interval (here: minutes):
rawData(time, names, values) AS (SELECT date_trunc('minute', TIMESTAMP), NAME, VALUE
FROM FIELDTEST
WHERE ID = '7b9bbe44d45d8f2ac324849a4951da54'
AND REGEXP_LIKE(VALUE, '^\s*\d+(\.\d*)?\s*$')),
-- I'm assuming the (id, name) tuple is unique, which means we don't need to repeat the regex later
Data(rangeStart, name, averaged) AS (SELECT TimeRange.rangeStart, Header.names, COALESCE(AVG(rawData.values), -9999)
FROM TimeRange
CROSS JOIN Header
LEFT JOIN rawData
ON rawData.names = Header.names
AND rawData.time = TimeRange.rangeStart
GROUP BY TimeRange.rangeStart, Header.names),
test(time, names, avgvalues) AS (SELECT Data.rangeStart,
LISTAGG(Data.name,';') WITHIN GROUP(ORDER BY name),
LISTAGG(Data.averaged,';') WITHIN GROUP(ORDER BY name)
FROM Data
GROUP BY Data.rangeStart)
-- build my own delimited export-string:
SELECT CONCAT(CONCAT(SUBSTR(REPLACE(time,'.',':'),1,19),';'), REPLACE(CAST(avgvalues AS VARCHAR(3980)),'-9999',''))
FROM test
UNION ALL
SELECT CONCAT(CAST('TIME;' AS VARCHAR(5)), CAST(LISTAGG(names,';') WITHIN GROUP(ORDER BY names) AS VARCHAR(3980)))
FROM Header;
I am a newbie to InfluxDB; I have just started reading the Influx documentation.
I can't seem to get the equivalent of 'select count(*) from table' to work in InfluxDB.
I have a measurement called cart:
time status cartid
1456116106077429261 0 A
1456116106090573178 0 B
1456116106095765618 0 C
1456116106101532429 0 D
but when I try to do
select count(cartid) from cart
I get the error
ERR: statement must have at least one field in select clause
I suppose cartid is a tag rather than a field value? count() currently can't be used on tag or time columns. So if your status is a non-tag column (i.e. a field), do the count on that.
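For example, assuming status really is a field here:
SELECT COUNT(status) FROM cart
Note that this counts only the points where that field is present, which is why the no-common-field case needs the workaround below.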
EDIT:
Reference
This works as long as no field or tag exists with the name count:
SELECT SUM(count) FROM (SELECT *,count::INTEGER FROM MyMeasurement GROUP BY count FILL(1))
If it does, use some other name for the count field. This works by first selecting all entries, including an unpopulated field (count), then grouping by that unpopulated field, which does nothing except let us use the fill operator to assign 1 to each entry's count. We then select the sum of the count fields in the outer query. The result should look like this:
name: MyMeasurement
----------------
time sum
0 47799
It's a bit hacky, but it's the only way to guarantee a count of all entries when no single field is present in every entry.
I have a table that gets populated every day with records from reporting systems.
I have a list of the serial numbers that I am interested in returning in an asset list.
How do I get Grails to return the records that match the maximum "epoch" entry for each asset? In SQL I would join the table back to itself after picking out the maximum, such as:
select a.*
from assetTable a
inner join (select sn, max(epoch) epoch
            from assetTable
            group by sn) b
  on a.sn = b.sn and a.epoch = b.epoch
but I cannot figure out how to get this done efficiently with Grails...
From a domain class perspective it is pretty simple. Consider, for the sake of example, that I have a single domain class AssetTable with Integer epoch, String sn, ...
Literally, all I want to do is get the latest entry (all fields) for a subset of serial numbers (sn) that I have in a List.
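For what it's worth, a minimal sketch of one way to express that self-join pattern in GORM is an HQL correlated subquery via executeQuery; the variable names here are made up:
// hypothetical list of serial numbers of interest
List<String> serials = ['SN001', 'SN002', 'SN003']

// latest entry per sn, restricted to the serials list
// (HQL version of the SQL self-join above)
List<AssetTable> latest = AssetTable.executeQuery('''
    select a from AssetTable a
    where a.sn in (:serials)
      and a.epoch = (select max(b.epoch) from AssetTable b where b.sn = a.sn)
''', [serials: serials])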