Query a materialized view with group by and latest_by_offset returns same key twice - ksqldb

I am following this link to query a materialized view, and I expected the GROUP BY to return only one row per key, but it does not (sensor-1 appears twice in the query below):
ksql> SELECT sensor,
> LATEST_BY_OFFSET(area) AS area,
> LATEST_BY_OFFSET(reading) AS last
> FROM readings
> GROUP BY sensor
> EMIT CHANGES;
+------------------------------------------------+------------------------------------------------+------------------------------------------------+
|SENSOR |AREA |LAST |
+------------------------------------------------+------------------------------------------------+------------------------------------------------+
|sensor-1 |wheel |45 |
|sensor-2 |motor |41 |
|sensor-1 |wheel |92 |
and I get the same result with the view materialized:
CREATE TABLE latest_readings AS
SELECT sensor,
LATEST_BY_OFFSET(area) AS area,
LATEST_BY_OFFSET(reading) AS last
FROM readings
GROUP BY sensor
EMIT CHANGES;
This seems to differ from robin-moffatt's answer in "Is it possible to get the latest value for a message key from kafka messages".
Did I miss something?

It is my understanding that EMIT CHANGES will push out updates as they occur; therefore, when the aggregate for a given key is updated, a new change is emitted for that key. The two sensor-1 rows are successive updates to the same key, not duplicates.
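To get exactly one row per key on demand, you can instead run a pull query against the materialized table. A minimal sketch, assuming the latest_readings table above and a ksqlDB version whose pull queries support a key lookup on the GROUP BY column:
-- Pull query: returns the table's current row for the key once, no EMIT CHANGES
SELECT sensor, area, last
FROM latest_readings
WHERE sensor = 'sensor-1';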

Related

Call data from SQL in ProcessMaker

I have a table in SQL which is like this:
| product code | weight |
| ------------ | ------ |
| 1235896      | 0.5    |
| 3256kms      | 1.5    |
| dk-kmsw      | 3      |
and the data type for [product code] is nvarchar.
Now I want to look up the weight by entering the product code in ProcessMaker.
The code that I wrote is this:
select [weight] from table where [product code] = ##textVar047
With this code I get nothing; I have tried changing the ## to ##, #= but it did not work.
How can I do this?
Any comment is appreciated.
When you use ## in the SQL of a control, it means you are referencing another control's value. If that's your scenario, I'd suggest first retrieving the full list of product codes in a Suggest control (instead of a Textbox) with this SQL:
select PRODUCT_CODE, PRODUCT_CODE FROM YOUR_TABLE_NAME
(you select product code twice since Suggest controls, like dropdowns, need two values to be filled: one for the id and one for the label)
Now that you have a way to obtain the actual code, saved in the Suggest control's id, you can make another field a dependent field with the SQL you were proposing:
select WEIGHT FROM YOUR_TABLE_NAME where PRODUCT_CODE = ##your_suggest_control_id
(## should work just fine as it just adds quotes to the variable)
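For instance, if the Suggest control holds the code 1235896 from the sample table, the SQL the engine effectively runs would be (illustrative substitution only; the table and column names are the placeholders from above):
select WEIGHT FROM YOUR_TABLE_NAME where PRODUCT_CODE = '1235896'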
You can also check the wiki page for an in-depth explanation of this: https://wiki.processmaker.com/3.1/Dependent_Fields
I hope this helps!
Alternatively, cast the weight to text when selecting it:
select CAST(weight AS nvarchar(max)) from table where [product code] = ##textVar047

Esper - concatenate values from multiple rows to a list

I have an Esper query that returns multiple rows, but I'd like to instead get one row, where that row has a list (or concatenated string) of all of the values from the (corresponding columns of the) matching rows that my current query returns.
For example:
SELECT Name, avg(latency) as avgLatency
FROM MyStream.win:time(5 min)
GROUP BY Name
HAVING avgLatency / 1000 > 60
OUTPUT last every 5 min
Returns:
Name avgLatency
---- ----------
A 65
B 70
C 75
What I'd really like:
Name
----
{A, B, C}
Is this possible to do via the query itself? I tried to make this work using subqueries, but I'm not working with multiple streams. I can't find any aggregation functions or enumeration functions in the Esper documentation that fit what I'm trying to do either.
Thanks to anybody that has any insight or direction for me here.
EDIT:
If this can't be done via the query, I'm open to changing the subscriber, or anything else, if necessary.
You can have a subscriber or listener do the concatenation; there is a "Multi-Row Delivery" option for subscribers. Or use a table, as below.
// create table to hold aggregation result
create table LatencyTable(name string primary key, avgLatency avg(double));
// update aggregations in table from events coming in
into table LatencyTable select name, avg(latency) as avgLatency from MyStream#time(5 min) group by name;
// do a select with the "aggregate" enumeration method
select (select * from LatencyTable where avgLatency > x).aggregate(....) from pattern[every timer:interval(5 min)]
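A hedged sketch of what the aggregate call could look like, building a comma-separated string of names (the threshold expression and the row property names are assumptions based on the schema above, not the poster's actual code):
// every 5 minutes, fold the matching table rows into one concatenated string
select (select * from LatencyTable where avgLatency / 1000 > 60)
    .aggregate('', (result, row) => result || ',' || row.name) as names
from pattern[every timer:interval(5 min)]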

Query the most recent timestamp (MAX/Last) for a specific key, in Influx

Using InfluxDB (v1.1), I have a requirement to get the last entry timestamp for a specific key, regardless of which measurement it is stored in and regardless of its value.
The setup is simple, where I have three measurements: location, network and usage.
There is only one key: device_id.
In pseudo-code, this would be something like:
# notice the lack of a FROM clause on measurement here...
SELECT MAX(time) WHERE 'device_id' = 'x';
The question: What would be the most efficient way of querying this?
The reason why I want this is that there will be a decentralised sync process. Some devices may have been updated in the last hour, whilst others haven't been updated in months. Being able to get a distinct "last updated on" timestamp for a device (key) would allow me to more efficiently store new points to Influx.
I've also noticed there is a similar discussion on InfluxDB's GitHub repo (#5793), but the question there is not filtering by any field/key. And this is exactly what I want: getting the 'last' entry for a specific key.
Unfortunately there won't be a single query that will get you what you're looking for. You'll have to do a bit of work client side.
The query that you'll want is
SELECT last(<field name>), time FROM <measurement> WHERE device_id = 'x'
You'll need to run this query for each measurement.
SELECT last(<field name>), time FROM location WHERE device_id = 'x'
SELECT last(<field name>), time FROM network WHERE device_id = 'x'
SELECT last(<field name>), time FROM usage WHERE device_id = 'x'
From there you'll take the one with the greatest timestamp:
> select last(value), time from location where device_id = 'x'; select last(value), time from network where device_id = 'x'; select last(value), time from usage where device_id = 'x';
name: location
time last
---- ----
1483640697584904775 3
name: network
time last
---- ----
1483640714335794796 4
name: usage
time last
---- ----
1483640783941353064 4
tl;dr:
The first() and last() selectors will NOT work consistently if the measurement has multiple fields and the fields have NULL values. The most efficient solution is to use these queries:
First:
SELECT * FROM <measurement> [WHERE <tag>=value] LIMIT 1
Last:
SELECT * FROM <measurement> [WHERE <tag>=value] ORDER BY time DESC LIMIT 1
Explanation:
If you have a single field in your measurement, then the suggested solutions will work. But if you have more than one field and values can be NULL, then the first() and last() selectors won't work consistently and may return different timestamps for each field. For example, let's say that you have the following data set:
time fieldKey_1 fieldKey_2 device
------------------------------------------------------------
2019-09-16T00:00:01Z NULL A 1
2019-09-16T00:00:02Z X B 1
2019-09-16T00:00:03Z Y C 2
2019-09-16T00:00:04Z Z NULL 2
In this case querying
SELECT first(fieldKey_1) FROM <measurement> WHERE device = '1'
will return
time fieldKey_1
---------------------------------
2019-09-16T00:00:02Z X
and the same query for first(fieldKey_2) will return a different time
time fieldKey_2
---------------------------------
2019-09-16T00:00:01Z A
A similar problem will happen when querying with last.
And in case you are wondering, querying first(*) wouldn't do either, since you'll get an epoch-0 time in the results, such as:
time first_fieldKey_1 first_fieldKey_2
-------------------------------------------------------------
1970-01-01T00:00:00Z X A
So, the solution would be querying using combinations of LIMIT and ORDER BY.
For instance, for the first time value you can use:
SELECT * FROM <measurement> [WHERE <tag>=value] LIMIT 1
and for the last one you can use
SELECT * FROM <measurement> [WHERE <tag>=value] ORDER BY time DESC LIMIT 1
It is safe and fast, as it will rely on indexes.
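For instance, instantiated with one of the measurements from the question (location and device_id = 'x' come from the setup above; run the equivalent query for network and usage and compare the timestamps client side):
-- Last point for this device in the location measurement
SELECT * FROM location WHERE device_id = 'x' ORDER BY time DESC LIMIT 1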
Curiously, this simpler approach was mentioned in the thread linked in the opening post, but was discarded. Maybe it was just overlooked.
There is also a thread on the InfluxData blog about the subject suggesting this approach.
I tried this and it worked for me in a single command:
SELECT last(<field name>), time FROM location, network, usage WHERE device_id = 'x'
The result I got :
name: location
time last
---- ----
1483640697584904775 3
name: network
time last
---- ----
1483640714335794796 4
name: usage
time last
---- ----
1483640783941353064 4

Does update change the order of records in a table in PostgreSQL?

My code depends on the order of records in the table. My assumption was that a table can be considered a list, so that the records maintain order. I have a small piece of update code, shown below, that updates the record at a particular index in the table.
p = pieces[index]
p.position = 0
p.save
I check the order of records before and after this update, and I see that after the update the updated record has moved to the end of the list (I print Piece.all to inspect the list). The order is maintained in MySQL, but when I deploy to Heroku, which uses PostgreSQL, the order is not maintained, so this was a surprising find for me.
Is there no guarantee of order in tables, and should one not depend on the order? Please correct my misunderstanding, and thanks for the clarification.
You should NEVER depend on the order in my honest opinion.
Rows are returned in an unspecified order, per the SQL spec, unless you add an ORDER BY clause. In Postgres, that means you'll get rows in, basically, the order that live rows are read from disk.
MySQL tends to return rows in the order they were inserted, and this is why you see the difference in behavior.
If you want them always returned in the order they were created, you can use Piece.order("created_at").
You state: "My assumption was that a table can be considered a list so that the records maintain order."
This is incorrect. A table represents an unordered set. There is no inherent ordering in the table. A result set similarly lacks ordering. The only way to guarantee the ordering of a result set is to use ORDER BY in the query.
So, an update changes values in one or more columns in one or more rows. It does not change the "ordering" of rows, because they are not ordered.
Note: Under some circumstances, a query may appear to return results in a particular order. You really should not depend on this behavior, unless the query has an explicit ORDER BY.
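For example, a minimal sketch against the question's model (assuming the Piece model maps to a pieces table; position is the column updated in the question):
-- Only an explicit ORDER BY guarantees a stable order, before and after any UPDATE
SELECT * FROM pieces ORDER BY position, id;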
Tables normally are unordered and should be presumed to be unordered. A CLUSTERed table has been physically rewritten in index order, which is useful to understand, but in PostgreSQL CLUSTER is a one-time operation, so even that is not a guarantee. In any case, what you receive back from a query, the result set, should be presumed to be unordered, because the join order is always undefined.
So if order matters always be explicit and use ORDER BY. Now for illustration let's have some fun.
CREATE TABLE bar ( qux serial PRIMARY KEY, asdf text );
INSERT INTO bar (asdf) VALUES ('z'),('x'),('g'),('a');
Now we've got this:
SELECT * FROM bar;
qux | asdf
-----+------
1 | z
2 | x
3 | g
4 | a
Now we create a CLUSTERed index,
CREATE INDEX asdfidx ON bar (asdf);
CLUSTER bar USING asdfidx;
Now the rows come back in index order (note: CLUSTER physically rewrites the table once; it is not an ongoing guarantee):
SELECT * FROM bar;
qux | asdf
-----+------
4 | a
3 | g
2 | x
1 | z
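Even so, if order matters, the only reliable way to get this result is to be explicit:
-- Same ordering as above, but guaranteed regardless of physical layout
SELECT * FROM bar ORDER BY asdf;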

How to show same column in dbgrid with different criteria

I need your help to finish my Delphi homework.
I use an MS Access database and show all the data in one DBGrid using SQL. I want to show the same columns twice, side by side, with a criterion of 50 records per column.
I want a select query to produce output like:
No | Name | No | Name |
1 | A | 51 | AA |
2 | B | 52 | BB |
3~50 | | 53~100| |
Is it possible ?
I can foresee issues if you choose to return a dataset with duplicate column names. To fix this, you must change your query to enforce strictly unique column names, using AS. For example...
select A.No as No, A.Name as Name, B.No as No2, B.Name as Name2 from TableA A
join TableB B on B.Something = A.Something
Just as a note, if you're using a TDBGrid, you can customize the column titles. Right-click the grid control at design time and select Columns Editor..., and a Collection window will appear. When adding a column, link it to a FieldName and then assign a value to Title.Caption. This also requires that you set up all columns; when you don't define any columns here, the grid automatically shows all columns from the query.
On the other hand, a SQL query may contain duplicate field names in the output, depending on how you structure the query. I know this is possible in SQL Server, but I'm not sure about MS Access. In any case, I recommend always returning a dataset with unique column names and then customizing the DBGrid's column titles. After all, it is also possible to connect to an Excel spreadsheet, which can very likely have identical column names. The problem arises when you try to read from one of those columns for another use.
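For the 50-records-per-column layout in the question itself, a hedged sketch using a self-join (the table name MyTable and a contiguous integer No key are assumptions; the aliases keep the duplicate column names unique, as above):
select A.No as No, A.Name as Name, B.No as No2, B.Name as Name2
from MyTable as A left join MyTable as B on B.No = A.No + 50
where A.No <= 50
order by A.No;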
