KSQLDB: Group By Concat Equivalent - ksqldb

I have a stream such as the following:
ksql> select * from customerstream;
+-------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------+
|EVENT |CONTENT |
+-------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------+
|create |{name=bob, location=NY, id=1} |
|update |{location=AM} |
|update |{location=BER} |
|update |{name=bob_new} |
|delete |{id=1} |
Now I would like to group the events by id and ignore customers that have already been deleted.
I'm looking for something like this:
select content['id'] from customer group by content['id'] HAVING 'delete' not in collect_set(event);

I found it!
select
content['id'],latest_by_offset(content['location']),collect_set(event)
from customerstream group by content['id'] HAVING NOT
ARRAY_CONTAINS(collect_set(event),'delete') emit changes;

Related

LibreOffice HSQLDB WHERE clause with LEFT JOIN and MAX?

I'm running macOS 11.6, LibreOffice 7.2.2.2, and HSQLDB (my understanding is that this is v1.8, but I don't know how to verify that).
I'm a newbie to SQL, and I'm trying to write a DB to maintain a club membership roster. I'm trying to find everyone in the DB to whom renewal letters should be sent. The quirk is, if a person has never paid in the past, they should be sent a renewal letter. Old members who haven't renewed recently don't get a renewal, and obviously, each individual should only get one letter. I've created a toy example to display the problem I'm having...
Members table:
Key (Integer, Primary key, Autoincrement)
Name (Varchar)
+-----+----------+
| Key | Name |
+-----+----------+
| 0 | Abby |
| 1 | Bob |
| 2 | Dave |
| 3 | Ellen |
+-----+----------+
Payments table:
Key (Integer, Primary Key, autoincrement)
MemberKey (Integer, foreign key to Member table)
Payment Date (Date)
+-----+-----------+--------------+
| Key | MemberKey | Payment Date |
+-----+-----------+--------------+
| 0 | 0 | 2020-05-23 |
| 1 | 0 | 2021-06-12 |
| 2 | 1 | 2016-05-28 |
| 3 | 2 | 2020-07-02 |
+-----+-----------+--------------+
The only way I've found to include everyone is with a LEFT JOIN. The only way I've found to pick the most recent payment is with MAX. The following query produces a list of everyone's most recent payments, including people who've never paid:
SELECT "Members"."Key", "Members"."Name", MAX( "Payments"."Payment Date" ) AS "Last Payment"
FROM { oj "Members" LEFT OUTER JOIN "Payments" ON "Members"."Key" = "Payments"."MemberKey" }
GROUP BY "Members"."Key", "Members"."Name"
It returns the result below, which includes all members only once (Abby has 2 payments but only appears once with the most recent payment). Unfortunately it still includes people like Bob who've been out of the club so long that we don't want to send them a renewal notice.
+-----+----------+--------------+
| Key | Name | Last Payment |
+-----+----------+--------------+
| 0 | Abby | 2021-06-12 |
| 1 | Bob | 2016-05-28 |
| 2 | Dave | 2020-07-02 |
| 3 | Ellen | |
+-----+----------+--------------+
Where I hit a wall is when I try to perform any kind of conditional operation on the Last Payment, to determine whether it's recent enough to include in the list of renewal notices. For instance, in HSQLDB, the query below returns the error, "The data content could not be loaded. Not a condition." The only change in this query from the 1st one is the addition of the WHERE clause.
SELECT "Members"."Key", "Members"."Name", MAX( "Payments"."Payment Date" ) AS "Last Payment"
FROM { oj "Members" LEFT OUTER JOIN "Payments" ON "Members"."Key" = "Payments"."MemberKey" }
WHERE "Last Payment" >= '2020-01-01'
GROUP BY "Members"."Key", "Members"."Name"
The desired output should look like this:
+-----+----------+--------------+
| Key | Name | Last Payment |
+-----+----------+--------------+
| 0 | Abby | 2021-06-12 |
| 2 | Dave | 2020-07-02 |
| 3 | Ellen | |
+-----+----------+--------------+
I've been digging around the web trying anything that looks relevant. I've tried "HAVING" clauses--I can make them work with a COUNT(*) function, but I can't make them work with a MAX(*) function. I've tried using my 1st query as a subquery, and applying the WHERE clause on "Last Payment" in the main query. I've tried solutions people say work in MySQL, but I can't get them to work in HSQLDB. I tried using the 1st query as a View, and writing a query against the View. I've tried a dozen other things I don't even remember. Everything past the 1st query above throws an error. I wanted to include my toy DB, but can't find a way to attach it to the post.
Can anyone help please?
This worked for me.
SELECT "Members"."Key", "Members"."Name", MAX( "Payments"."Payment Date" ) AS "Last Payment"
FROM {oj "Members" LEFT OUTER JOIN "Payments" ON "Members"."Key" = "Payments"."MemberKey"
WHERE "Payments"."Payment Date" >= '2020-01-01'
OR "Payments"."Payment Date" IS NULL}
GROUP BY "Members"."Key", "Members"."Name"
This works as well.
SELECT "Members"."Key", "Members"."Name", MAX( "Payments"."Payment Date" ) AS "Last Payment"
FROM { oj "Members" LEFT OUTER JOIN "Payments" ON "Members"."Key" = "Payments"."MemberKey" }
WHERE "Payments"."Payment Date" >= '2020-01-01'
OR "Payments"."Payment Date" IS NULL
GROUP BY "Members"."Key", "Members"."Name"
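Perhaps the problem you were having is that "Last Payment" is only an alias defined in the SELECT list, not the name of an actual column. The WHERE clause is evaluated before the rows are grouped and before that alias exists, so it cannot refer to it; filtering on "Payments"."Payment Date" avoids the problem.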
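For comparison, the same filter can be written against the aggregate itself with HAVING. The sketch below runs the toy data through SQLite via Python's sqlite3 module, purely as a stand-in: it has not been checked against the HSQLDB 1.8 engine embedded in LibreOffice, which may behave differently. The aliases m and p are introduced here for brevity.

```python
import sqlite3

# SQLite stands in for HSQLDB here (an assumption, not the asker's setup).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Members" ("Key" INTEGER PRIMARY KEY, "Name" TEXT);
    CREATE TABLE "Payments" (
        "Key" INTEGER PRIMARY KEY,
        "MemberKey" INTEGER REFERENCES "Members" ("Key"),
        "Payment Date" TEXT
    );
    INSERT INTO "Members" VALUES (0, 'Abby'), (1, 'Bob'), (2, 'Dave'), (3, 'Ellen');
    INSERT INTO "Payments" VALUES
        (0, 0, '2020-05-23'), (1, 0, '2021-06-12'),
        (2, 1, '2016-05-28'), (3, 2, '2020-07-02');
""")

# HAVING filters after grouping, so it can test the aggregate directly.
# The aggregate is repeated instead of using the "Last Payment" alias.
rows = conn.execute("""
    SELECT m."Key", m."Name", MAX(p."Payment Date") AS "Last Payment"
    FROM "Members" AS m
    LEFT OUTER JOIN "Payments" AS p ON m."Key" = p."MemberKey"
    GROUP BY m."Key", m."Name"
    HAVING MAX(p."Payment Date") >= '2020-01-01'
        OR MAX(p."Payment Date") IS NULL
    ORDER BY m."Key"
""").fetchall()

for row in rows:
    print(row)
```

With this data the query keeps Abby, Dave and Ellen (whose last payment is NULL) and drops Bob, matching the desired output in the question.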

Query only records with max value within a group

Say you have the following users table on PostgreSQL:
id | group_id | name | age
---|----------|---------|----
1 | 1 | adam | 10
2 | 1 | ben | 11
3 | 1 | charlie | 12 <-
3 | 2 | donnie | 20
4 | 2 | ewan | 21 <-
5 | 3 | fred | 30 <-
How can I query all columns only from the oldest user per group_id (those marked with an arrow)?
I've tried with group by, but keep hitting "users.id" must appear in the GROUP BY clause.
(Note: I have to work the query into a Rails AR model scope.)
After some digging: you can use PostgreSQL's DISTINCT ON (col):
select distinct on (users.group_id) users.*
from users
order by users.group_id, users.age desc;
-- you might want to add extra column in ordering in case 2 users have the same age for same group_id
Translated in Rails, it would be:
User
.select('DISTINCT ON (users.group_id) users.*')
.order('users.group_id, users.age DESC')
Some doc about DISTINCT ON: https://www.postgresql.org/docs/9.3/sql-select.html#SQL-DISTINCT
Working example: https://www.db-fiddle.com/f/t4jeW4Sy91oxEfjMKYJpB1/0
You could use the ROW_NUMBER window function (or RANK, if ties are possible):
SELECT *
FROM (SELECT *,ROW_NUMBER() OVER(PARTITION BY group_id ORDER BY age DESC) AS rn
FROM users) s
WHERE s.rn = 1;
You can use a subquery with the aggregated result in a join:
select m.*
from users m
inner join (
select group_id, max(age) max_age
from users
group by group_id
) AS t on (t.group_id = m.group_id and t.max_age = m.age)
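One thing worth checking with the join approach is how it treats ties. The sketch below runs it in SQLite through Python's sqlite3 module (a stand-in for PostgreSQL), on a hypothetical variant of the table in which donnie and ewan are both 21 and the ids are made unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, group_id INTEGER, name TEXT, age INTEGER);
    -- Hypothetical data: donnie's age is changed to 21 to create a tie in group 2.
    INSERT INTO users VALUES
        (1, 1, 'adam', 10), (2, 1, 'ben', 11), (3, 1, 'charlie', 12),
        (4, 2, 'donnie', 21), (5, 2, 'ewan', 21), (6, 3, 'fred', 30);
""")

# Join each user to the maximum age of their group.
# Every user whose age equals that maximum is returned, so ties yield several rows.
names = [name for (name,) in conn.execute("""
    SELECT m.name
    FROM users AS m
    INNER JOIN (
        SELECT group_id, MAX(age) AS max_age
        FROM users
        GROUP BY group_id
    ) AS t ON t.group_id = m.group_id AND t.max_age = m.age
    ORDER BY m.group_id, m.id
""")]

print(names)
```

Group 2 comes back with both donnie and ewan. If exactly one row per group is required, DISTINCT ON or ROW_NUMBER with an extra tie-breaking column in the ORDER BY is the safer choice.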

look up table names in PSQL

My DB has a lot of tables (Say 400+), and I only remember part of the name of the one I am looking for.
I know \d would show all the tables, but that's too much to look at. Is there some command to list all the tables whose names match the given regex?
Thanks
This is built into psql: you can use wildcards with \d, \dt, etc., e.g.:
craig=> \dt test*
List of relations
Schema | Name | Type | Owner
--------+-----------+-------+-------
public | test | table | craig
public | testtable | table | craig
public | testu | table | craig
public | testx | table | craig
(4 rows)
You'll want to use \dt, since \d will display details for each table rather than just listing the tables.
You can do this with schemas too, e.g.:
\dt *.sometable
will list all tables named sometable in any schema.
Much more convenient than writing queries against pg_class joined to pg_namespace, or querying information_schema.
The usual globbing syntax is accepted, where ? is any single character and * is zero or more characters. So \dt ???? would list all tables with four-character names.
Multiple wildcards are permitted, e.g.:
craig=> \dt public.*e?t*
List of relations
Schema | Name | Type | Owner
--------+--------------+-------+-------
public | exclude_test | table | craig
public | prep_test | table | craig
public | test | table | craig
public | testtable | table | craig
public | testu | table | craig
public | testx | table | craig
(6 rows)
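To get a feel for which names a given pattern would select, the two wildcards can be approximated outside psql with Python's fnmatch module. This is only a rough sketch: psql's real pattern rules also cover case folding, quoting and schema qualification, none of which are modelled here. The table list is the one from the sample output plus a hypothetical non-matching name, orders.

```python
from fnmatch import fnmatchcase

# Names from the sample output above, plus "orders" as a hypothetical non-match.
tables = ["exclude_test", "prep_test", "test", "testtable", "testu", "testx", "orders"]


def match_tables(pattern, names):
    """Return the names matching a glob where * is any run of characters and ? is one character."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(match_tables("test*", tables))   # names starting with "test"
print(match_tables("*e?t*", tables))   # "e", any one character, then "t", anywhere in the name
print(match_tables("????", tables))    # names exactly four characters long
```

The first two calls select the same names as the two \dt listings shown above, and the last one selects only test.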
This is not very convenient unless you wrap it in a function, but:
SELECT * FROM pg_tables WHERE SUBSTRING(tablename FROM '<regex>') <> '';
To make it more convenient, you can create and call a function like this:
CREATE FUNCTION ft(TEXT) RETURNS SETOF pg_tables AS
'SELECT * FROM pg_tables WHERE SUBSTRING(tablename from $1) <> '''';'
LANGUAGE SQL;
SELECT * FROM ft('.*oc.*') -- Gets all tables matching `.*oc.*`
An SQLfiddle to test both with.
There is a table called pg_tables which has all table names in it.

select distinct records based on one field while keeping other fields intact

I've got a table like this:
table: searches
+------------------------------+
| id | address | date |
+------------------------------+
| 1 | 123 foo st | 03/01/13 |
| 2 | 123 foo st | 03/02/13 |
| 3 | 456 foo st | 03/02/13 |
| 4 | 567 foo st | 03/01/13 |
| 5 | 456 foo st | 03/01/13 |
| 6 | 567 foo st | 03/01/13 |
+------------------------------+
And want a result set like this:
+------------------------------+
| id | address | date |
+------------------------------+
| 2 | 123 foo st | 03/02/13 |
| 3 | 456 foo st | 03/02/13 |
| 4 | 567 foo st | 03/01/13 |
+------------------------------+
But ActiveRecord seems unable to achieve this result. Here's what I'm trying:
Model has a 'most_recent' scope: scope :most_recent, order('date_searched DESC')
Model.most_recent.uniq returns the full set (SELECT DISTINCT "searches".* FROM "searches" ORDER BY date DESC) -- obviously the query is not going to do what I want, but neither is selecting only one column. I need all columns, but only rows where the address is unique in the result set.
I could do something like Model.select('distinct(address), date, id'), but that feels...wrong.
You could do a
select max(id), address, max(date) as latest
from searches
group by address
order by latest desc
According to sqlfiddle that does exactly what I think you want.
It's not quite the same as your requirement output, which doesn't seem to care about which ID is returned. Still, the query needs to specify something, which is here done by the "max" aggregate function.
I don't think you'll have any luck with ActiveRecord's autogenerated query methods for this case. So just add your own query method using that SQL to your model class. It's completely standard SQL that'll also run on basically any other RDBMS.
Edit: One big weakness of the query is that it doesn't necessarily return actual records. If the highest ID for a given address doesn't correlate with the highest date for that address, the resulting "record" will be different from the one actually stored in the DB. Depending on the use case that might matter or not. For MySQL, simply changing max(id) to id would fix that problem, but IIRC Oracle has a problem with that.
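The weakness described in the edit shows up with the question's own sample rows. The sketch below reproduces it in SQLite through Python's sqlite3 module, then shows one possible way to return only stored rows with a correlated subquery. It assumes the dates are stored as ISO strings (2013-03-01 and so on) so that they compare correctly, and that ties on date are broken by the lowest id; neither is stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE searches (id INTEGER PRIMARY KEY, address TEXT, date TEXT);
    -- Dates rewritten as ISO strings (an assumption) so text comparison orders them.
    INSERT INTO searches VALUES
        (1, '123 foo st', '2013-03-01'), (2, '123 foo st', '2013-03-02'),
        (3, '456 foo st', '2013-03-02'), (4, '567 foo st', '2013-03-01'),
        (5, '456 foo st', '2013-03-01'), (6, '567 foo st', '2013-03-01');
""")

# The aggregate query: MAX(id) and MAX(date) are computed independently per address.
aggregated = conn.execute("""
    SELECT MAX(id), address, MAX(date) AS latest
    FROM searches
    GROUP BY address
    ORDER BY address
""").fetchall()

# Any aggregated row that does not exist in the table is a "record" that was never stored.
stored = set(conn.execute("SELECT id, address, date FROM searches"))
phantom = [row for row in aggregated if row not in stored]

# Alternative: for each address keep the single stored row with the latest date,
# breaking ties by the lowest id.
actual = conn.execute("""
    SELECT s.id, s.address, s.date
    FROM searches AS s
    WHERE s.id = (
        SELECT s2.id
        FROM searches AS s2
        WHERE s2.address = s.address
        ORDER BY s2.date DESC, s2.id ASC
        LIMIT 1
    )
    ORDER BY s.id
""").fetchall()

print(phantom)
print(actual)
```

For 456 foo st the aggregate query pairs id 5 with the date of id 3, a combination that is not in the table, while the second query returns ids 2, 3 and 4, the rows listed in the question's desired result.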
To show unique addresses:
Searches.group(:address)
Then you can select columns if you want:
Searches.group(:address).select('id,date')

Rails ActiveRecord return records where id exists in related table

I have a Client model and a Product model, where a Client has many Products and a Product belongs to a Client.
I need to find a query that only returns Clients if they have a record in the Product table
clients table
id | name
--------------
1 | Company A
2 | Company B
3 | Company C
products table
id | name | client_id
---------------------------
1 | Product A | 1
2 | Product B | 1
3 | Product C | 3
4 | Product D | 3
5 | Product E | 1
I only need Clients 1 and 3.
For example something like
@clients = Client.where("client exists in products") # something to this effect
Simplest but not the fastest:
Client.where(:id => Product.select(:client_id).map(&:client_id))
SQL subquery (faster):
Client.where("EXISTS(SELECT 1 from products where clients.id = products.client_id)")
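To see what the subquery form does at the SQL level, here is a small sketch that runs the question's sample data in SQLite through Python's sqlite3 module (a stand-in for whatever database the Rails app uses). It compares EXISTS with an IN subquery; the IN form is roughly what ActiveRecord generates when a relation such as Product.select(:client_id) is passed to where without calling map on it first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, client_id INTEGER);
    INSERT INTO clients VALUES (1, 'Company A'), (2, 'Company B'), (3, 'Company C');
    INSERT INTO products VALUES
        (1, 'Product A', 1), (2, 'Product B', 1), (3, 'Product C', 3),
        (4, 'Product D', 3), (5, 'Product E', 1);
""")

# EXISTS: keep a client if at least one product row points at it.
with_exists = conn.execute("""
    SELECT id, name
    FROM clients
    WHERE EXISTS (SELECT 1 FROM products WHERE products.client_id = clients.id)
    ORDER BY id
""").fetchall()

# IN with a subquery: same result, and the ids never leave the database.
with_in = conn.execute("""
    SELECT id, name
    FROM clients
    WHERE id IN (SELECT client_id FROM products)
    ORDER BY id
""").fetchall()

print(with_exists)
print(with_in)
```

Both queries return Company A and Company C in a single round trip, whereas the map version first loads every client_id into Ruby and then sends them back as a literal list.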
Here's another solution. It's a subquery like Valery's second solution, but without writing out the sql:
Client.where(Product.where(client_id: Client.arel_table[:id]).exists)
Here is the solution which uses Where Exists gem (disclosure: I'm its author):
Client.where_exists(:products)
Another gem that exists to do that: activerecord_where_assoc (I'm the author)
With it:
Client.where_assoc_exists(:products)
If you also had to restrict it to some of the products, then you could do it like this:
Client.where_assoc_exists(:products, id: my_products.map(&:id))
Doing it without a gem makes it easy to make mistakes.
Read more in the documentation. Here is an introduction and examples.
Also not the fastest but is concise:
Client.where(:id => Product.pluck(:client_id))
