LibreOffice HSQLDB WHERE clause with LEFT JOIN and MAX?

I'm running macOS 11.6, LibreOffice 7.2.2.2, and HSQLDB (my understanding is that this is v1.8, but I don't know how to verify that).
I'm a newbie to SQL, and I'm trying to write a DB to maintain a club membership roster. I'm trying to find everyone in the DB to whom renewal letters should be sent. The quirk is, if a person has never paid in the past, they should be sent a renewal letter. Old members who haven't renewed recently don't get a renewal, and obviously, each individual should only get one letter. I've created a toy example to display the problem I'm having...
Members table:
Key (Integer, Primary key, Autoincrement)
Name (Varchar)
+-----+----------+
| Key | Name |
+-----+----------+
| 0 | Abby |
| 1 | Bob |
| 2 | Dave |
| 3 | Ellen |
+-----+----------+
Payments table:
Key (Integer, Primary Key, autoincrement)
MemberKey (Integer, foreign key to Member table)
Payment Date (Date)
+-----+-----------+--------------+
| Key | MemberKey | Payment Date |
+-----+-----------+--------------+
| 0 | 0 | 2020-05-23 |
| 1 | 0 | 2021-06-12 |
| 2 | 1 | 2016-05-28 |
| 3 | 2 | 2020-07-02 |
+-----+-----------+--------------+
The only way I've found to include everyone is with a LEFT JOIN. The only way I've found to pick the most recent payment is with MAX. The following query produces a list of everyone's most recent payments, including people who've never paid:
SELECT "Members"."Key", "Members"."Name", MAX( "Payments"."Payment Date" ) AS "Last Payment"
FROM { oj "Members" LEFT OUTER JOIN "Payments" ON "Members"."Key" = "Payments"."MemberKey" }
GROUP BY "Members"."Key", "Members"."Name"
It returns the result below, which includes all members only once (Abby has 2 payments but only appears once with the most recent payment). Unfortunately it still includes people like Bob who've been out of the club so long that we don't want to send them a renewal notice.
+-----+----------+--------------+
| Key | Name | Last Payment |
+-----+----------+--------------+
| 0 | Abby | 2021-06-12 |
| 1 | Bob | 2016-05-28 |
| 2 | Dave | 2020-07-02 |
| 3 | Ellen | |
+-----+----------+--------------+
Where I hit a wall is when I try to perform any kind of conditional operation on the Last Payment, to determine whether it's recent enough to include in the list of renewal notices. For instance, in HSQLDB, the query below returns the error, "The data content could not be loaded. Not a condition." The only change in this query from the 1st one is the addition of the WHERE clause.
SELECT "Members"."Key", "Members"."Name", MAX( "Payments"."Payment Date" ) AS "Last Payment"
FROM { oj "Members" LEFT OUTER JOIN "Payments" ON "Members"."Key" = "Payments"."MemberKey" }
WHERE "Last Payment" >= '2020-01-01'
GROUP BY "Members"."Key", "Members"."Name"
The desired output should look like this:
+-----+----------+--------------+
| Key | Name | Last Payment |
+-----+----------+--------------+
| 0 | Abby | 2021-06-12 |
| 2 | Dave | 2020-07-02 |
| 3 | Ellen | |
+-----+----------+--------------+
I've been digging around the web trying anything that looks relevant. I've tried "HAVING" clauses--I can make them work with a COUNT(*) function, but I can't make them work with a MAX(*) function. I've tried using my 1st query as a subquery, and applying the WHERE clause on "Last Payment" in the main query. I've tried solutions people say work in MySQL, but I can't get them to work in HSQLDB. I tried using the 1st query as a View, and writing a query against the View. I've tried a dozen other things I don't even remember. Everything past the 1st query above throws an error. I wanted to include my toy DB, but can't find a way to attach it to the post.
Can anyone help please?

This worked for me.
SELECT "Members"."Key", "Members"."Name", MAX( "Payments"."Payment Date" ) AS "Last Payment"
FROM {oj "Members" LEFT OUTER JOIN "Payments" ON "Members"."Key" = "Payments"."MemberKey"
WHERE "Payments"."Payment Date" >= '2020-01-01'
OR "Payments"."Payment Date" IS NULL}
GROUP BY "Members"."Key", "Members"."Name"
This works as well.
SELECT "Members"."Key", "Members"."Name", MAX( "Payments"."Payment Date" ) AS "Last Payment"
FROM { oj "Members" LEFT OUTER JOIN "Payments" ON "Members"."Key" = "Payments"."MemberKey" }
WHERE "Payments"."Payment Date" >= '2020-01-01'
OR "Payments"."Payment Date" IS NULL
GROUP BY "Members"."Key", "Members"."Name"
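Perhaps the problem you were having is that "Last Payment" is only a column alias defined in the SELECT list, not the name of an actual column, so it cannot be referenced in the WHERE clause. The WHERE clause also filters individual rows before they are grouped, so it has to test "Payments"."Payment Date" itself rather than the result of MAX.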
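For comparison, the same filter can be written against the aggregate itself with a HAVING clause. The sketch below is only an illustration: it runs the toy data from the question through SQLite via Python's sqlite3 module, not through HSQLDB, and the function name renewal_list and the table aliases m and p are made up for the example. Whether the embedded HSQLDB in LibreOffice accepts the same HAVING clause is not verified here.

```python
import sqlite3

def renewal_list(cutoff="2020-01-01"):
    """Return (Key, Name, Last Payment) for members who should get a letter."""
    con = sqlite3.connect(":memory:")
    # Toy data copied from the question; dates are stored as ISO text.
    con.executescript("""
        CREATE TABLE "Members" ("Key" INTEGER PRIMARY KEY, "Name" TEXT);
        CREATE TABLE "Payments" ("Key" INTEGER PRIMARY KEY,
                                 "MemberKey" INTEGER,
                                 "Payment Date" TEXT);
        INSERT INTO "Members" VALUES (0,'Abby'),(1,'Bob'),(2,'Dave'),(3,'Ellen');
        INSERT INTO "Payments" VALUES
            (0,0,'2020-05-23'),(1,0,'2021-06-12'),
            (2,1,'2016-05-28'),(3,2,'2020-07-02');
    """)
    rows = con.execute("""
        SELECT m."Key", m."Name", MAX(p."Payment Date") AS "Last Payment"
        FROM "Members" m
        LEFT OUTER JOIN "Payments" p ON m."Key" = p."MemberKey"
        GROUP BY m."Key", m."Name"
        -- HAVING filters whole groups, so the aggregate can be tested here
        HAVING MAX(p."Payment Date") >= ? OR MAX(p."Payment Date") IS NULL
        ORDER BY m."Key"
    """, (cutoff,)).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in renewal_list():
        print(row)
```

On this data both approaches keep Abby, Dave and Ellen and drop Bob. The difference is only where the test happens: WHERE discards old payment rows before grouping, while HAVING discards whole groups after MAX has been computed.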

Related

How to join exclusively by date range in Hive SQL?

I have two subqueries that I'd like to join only by the date range between the open and close dates from the first table.
First table example:
| id_original | open_datetime | close_datetime |
|-------------|-------------------|-------------------|
| 1 |2019-01-01 10:00:02|2019-01-02 11:00:21|
| 2 |2019-01-01 10:05:52|2019-01-05 16:45:12|
| 3 |2019-01-03 00:00:43|2019-01-03 23:12:44|
Second table example:
| category | all other columns...| open_date |
|----------|---------------------|-------------------|
| A | ... |2019-01-01 11:00:00|
| B | ... |2019-01-02 19:10:10|
| C | ... |2019-01-03 08:23:45|
| D | ... |2019-01-04 18:10:53|
Desired output:
| id_original | category | all other columns...| open_date |
|-------------|----------|---------------------|-------------------|
| 1 | A | ... |2019-01-01 11:00:00|
| 2 | A | ... |2019-01-01 11:00:00|
| 2 | B | ... |2019-01-02 19:10:10|
| 2 | C | ... |2019-01-03 08:23:45|
| 2 | D | ... |2019-01-04 18:10:53|
| 3 | C | ... |2019-01-03 08:23:45|
This is my code:
SELECT *
FROM (
SELECT id, open_datetime, close_datetime
FROM table1
WHERE id IN (list_of_ids)
) t1
LEFT JOIN (
SELECT *
FROM table2
WHERE other_conditions
) t2 ON t2.open_date >= t1.open_datetime AND t2.open_date <= t1.close_datetime
I know that Hive SQL doesn't support inequality as conditions for a JOIN. But how should I approach this matter?
Note: The join I need is exclusively for dates, there is no equal key from t1 and t2 that I can use to join them.
Thanks!
Move the join condition to the WHERE clause. Because you have no other join conditions, the LEFT JOIN becomes a CROSS JOIN: a join without conditions is a cross join. After the cross join, filter the rows in the WHERE clause. Be aware that a CROSS JOIN can cause serious performance problems if the rows cannot be filtered beforehand or joined on some other key to avoid the full cross product. If one of the tables is small enough to fit into memory, the CROSS JOIN will be executed as a map-join, which also helps performance.
set hive.auto.convert.join=true;
set hive.mapjoin.smalltable.filesize=512000000; --try to set it bigger and see if map-join works
--setting too big value may cause OOM exception
SELECT *
FROM (
SELECT id, open_datetime, close_datetime
FROM table1
WHERE id IN (list_of_ids)
) t1
CROSS JOIN
(
SELECT *
FROM table2
WHERE other_conditions
) t2
WHERE (t2.open_date >= t1.open_datetime AND t2.open_date <= t1.close_datetime)
OR t2.category is NULL --meant to mimic LEFT JOIN, but a CROSS JOIN never produces NULL-filled t2 rows, so t1 rows without any match are still dropped
;
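To check the logic of the cross join plus WHERE filter on the sample data, here is a small sketch that runs it in SQLite through Python's sqlite3 module rather than in Hive. The function name range_join is made up for the example, and the "all other columns" of the second table are left out. Timestamps are stored as ISO text, which compares correctly as strings.

```python
import sqlite3

def range_join():
    """Pair each table1 row with the table2 rows that fall inside its window."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (id_original INTEGER,
                             open_datetime TEXT, close_datetime TEXT);
        CREATE TABLE table2 (category TEXT, open_date TEXT);
        INSERT INTO table1 VALUES
            (1, '2019-01-01 10:00:02', '2019-01-02 11:00:21'),
            (2, '2019-01-01 10:05:52', '2019-01-05 16:45:12'),
            (3, '2019-01-03 00:00:43', '2019-01-03 23:12:44');
        INSERT INTO table2 VALUES
            ('A', '2019-01-01 11:00:00'),
            ('B', '2019-01-02 19:10:10'),
            ('C', '2019-01-03 08:23:45'),
            ('D', '2019-01-04 18:10:53');
    """)
    rows = con.execute("""
        SELECT t1.id_original, t2.category, t2.open_date
        FROM table1 t1
        CROSS JOIN table2 t2
        -- the range condition lives in WHERE instead of ON
        WHERE t2.open_date >= t1.open_datetime
          AND t2.open_date <= t1.close_datetime
        ORDER BY t1.id_original, t2.open_date
    """).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in range_join():
        print(row)
```

This reproduces the six rows of the desired output. A table1 row whose window contains no table2 row would simply be missing from the result, which is where this differs from a true LEFT JOIN.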

What is the best way to attach a running total to selected row data?

I have a table that looks like this:
Created at | Amount | Register Name
--------------+---------+-----------------
01/01/2019... | -150.01 | Front
01/01/2019... | 38.10 | Back
What is the best way to attach an ascending-by-date running total to each record which applies only to the register name the record has? I can do this in Ruby, but doing it in the database will be much faster as it is a web application.
The application is a Rails application running Postgres 10, although the answer can be Rails-agnostic of course.
Use the aggregate sum() as a window function, e.g.:
with my_table (created_at, amount, register_name) as (
values
('2019-01-01', -150.01, 'Front'),
('2019-01-01', 38.10, 'Back'),
('2019-01-02', -150.01, 'Front'),
('2019-01-02', 38.10, 'Back')
)
select
created_at, amount, register_name,
sum(amount) over (partition by register_name order by created_at)
from my_table
order by created_at, register_name;
created_at | amount | register_name | sum
------------+---------+---------------+---------
2019-01-01 | 38.10 | Back | 38.10
2019-01-01 | -150.01 | Front | -150.01
2019-01-02 | 38.10 | Back | 76.20
2019-01-02 | -150.01 | Front | -300.02
(4 rows)
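If you want to try the window function without a Postgres instance at hand, the following sketch runs the same query shape in SQLite through Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer, since older versions have no window functions. The function name running_totals and the column alias running_total are made up for the example.

```python
import sqlite3

def running_totals():
    """Return each row with a per-register running total, ordered by date."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE my_table (created_at TEXT, amount REAL, register_name TEXT);
        INSERT INTO my_table VALUES
            ('2019-01-01', -150.01, 'Front'),
            ('2019-01-01',   38.10, 'Back'),
            ('2019-01-02', -150.01, 'Front'),
            ('2019-01-02',   38.10, 'Back');
    """)
    rows = con.execute("""
        SELECT created_at, amount, register_name,
               -- one running sum per register, accumulated in date order
               SUM(amount) OVER (PARTITION BY register_name
                                 ORDER BY created_at) AS running_total
        FROM my_table
        ORDER BY created_at, register_name
    """).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in running_totals():
        print(row)
```

The amounts are stored as floating-point values here, so compare the totals with a tolerance or round them; for real money a NUMERIC/decimal column, as in Postgres, is the safer choice.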

Query only records with max value within a group

Say you have the following users table on PostgreSQL:
id | group_id | name | age
---|----------|---------|----
1 | 1 | adam | 10
2 | 1 | ben | 11
3 | 1 | charlie | 12 <-
3 | 2 | donnie | 20
4 | 2 | ewan | 21 <-
5 | 3 | fred | 30 <-
How can I query all columns only from the oldest user per group_id (those marked with an arrow)?
I've tried with group by, but keep hitting "users.id" must appear in the GROUP BY clause.
(Note: I have to work the query into a Rails AR model scope.)
After some digging, I found that you can use PostgreSQL's DISTINCT ON (col):
select distinct on (users.group_id) users.*
from users
order by users.group_id, users.age desc;
-- you might want to add extra column in ordering in case 2 users have the same age for same group_id
Translated in Rails, it would be:
User
.select('DISTINCT ON (users.group_id) users.*')
.order('users.group_id, users.age DESC')
Some doc about DISTINCT ON: https://www.postgresql.org/docs/9.3/sql-select.html#SQL-DISTINCT
Working example: https://www.db-fiddle.com/f/t4jeW4Sy91oxEfjMKYJpB1/0
You could use the ROW_NUMBER window function (or RANK, if ties are possible and every tied row should be returned):
SELECT *
FROM (SELECT *,ROW_NUMBER() OVER(PARTITION BY group_id ORDER BY age DESC) AS rn
FROM users) s
WHERE s.rn = 1;
You can use a subquery with the aggregated result in a join:
select m.*
from users m
inner join (
select group_id, max(age) max_age
from users
group by group_id
) AS t on (t.group_id = m.group_id and t.max_age = m.age)
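One thing to keep in mind with the join on max(age) is what happens on ties: every user who shares the maximum age in a group comes back. The sketch below illustrates that in SQLite through Python's sqlite3 module rather than PostgreSQL. The function name oldest_per_group, its extra_rows parameter and the user 'gina' are made up for the example; the base rows are copied as listed in the question, including the repeated id 3.

```python
import sqlite3

BASE_ROWS = [
    (1, 1, "adam", 10), (2, 1, "ben", 11), (3, 1, "charlie", 12),
    (3, 2, "donnie", 20), (4, 2, "ewan", 21), (5, 3, "fred", 30),
]

def oldest_per_group(extra_rows=()):
    """Return the oldest user(s) of each group via a join on max(age)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE users (id INTEGER, group_id INTEGER, name TEXT, age INTEGER)"
    )
    con.executemany("INSERT INTO users VALUES (?, ?, ?, ?)",
                    list(BASE_ROWS) + list(extra_rows))
    rows = con.execute("""
        SELECT m.id, m.group_id, m.name, m.age
        FROM users m
        INNER JOIN (
            SELECT group_id, MAX(age) AS max_age
            FROM users
            GROUP BY group_id
        ) AS t ON t.group_id = m.group_id AND t.max_age = m.age
        ORDER BY m.group_id, m.name
    """).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    print(oldest_per_group())
    # A second 30-year-old in group 3 makes the join return both of them.
    print(oldest_per_group([(6, 3, "gina", 30)]))
```

If exactly one row per group is required, DISTINCT ON or ROW_NUMBER with an extra tie-breaking column in the ordering is the better fit.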

select distinct records based on one field while keeping other fields intact

I've got a table like this:
table: searches
+------------------------------+
| id | address | date |
+------------------------------+
| 1 | 123 foo st | 03/01/13 |
| 2 | 123 foo st | 03/02/13 |
| 3 | 456 foo st | 03/02/13 |
| 4 | 567 foo st | 03/01/13 |
| 5 | 456 foo st | 03/01/13 |
| 6 | 567 foo st | 03/01/13 |
+------------------------------+
And want a result set like this:
+------------------------------+
| id | address | date |
+------------------------------+
| 2 | 123 foo st | 03/02/13 |
| 3 | 456 foo st | 03/02/13 |
| 4 | 567 foo st | 03/01/13 |
+------------------------------+
But ActiveRecord seems unable to achieve this result. Here's what I'm trying:
Model has a 'most_recent' scope: scope :most_recent, order('date_searched DESC')
Model.most_recent.uniq returns the full set (SELECT DISTINCT "searches".* FROM "searches" ORDER BY date DESC) -- obviously the query is not going to do what I want, but neither is selecting only one column. I need all columns, but only rows where the address is unique in the result set.
I could do something like Model.select('distinct(address), date, id'), but that feels...wrong.
You could do a
select max(id), address, max(date) as latest
from searches
group by address
order by latest desc
According to sqlfiddle that does exactly what I think you want.
It's not quite the same as your required output, which doesn't seem to care about which ID is returned. Still, the query needs to specify something, which is done here by the "max" aggregate function.
I don't think you'll have any luck with ActiveRecord's autogenerated query methods for this case. So just add your own query method using that SQL to your model class. It's completely standard SQL that'll also run on basically any other RDBMS.
Edit: One big weakness of the query is that it doesn't necessarily return actual records. If the highest ID for a given address doesn't correlate with the highest date for that address, the resulting "record" will be different from the one actually stored in the DB. Depending on the use case, that might or might not matter. For MySQL, simply changing max(id) to id would fix that problem, but IIRC Oracle has a problem with that.
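One way around that weakness, on databases that have window functions, is to number the rows per address and keep only the first one, so that id and date always come from the same stored record. The sketch below is an illustration only: it uses SQLite (3.25 or newer) through Python's sqlite3 module, the function name latest_per_address is made up, the date column is renamed to searched_on, and the dates are rewritten as ISO text so that they sort correctly.

```python
import sqlite3

def latest_per_address():
    """Return one stored row per address: the newest, lowest id on ties."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE searches (id INTEGER PRIMARY KEY,
                               address TEXT, searched_on TEXT);
        INSERT INTO searches VALUES
            (1, '123 foo st', '2013-03-01'),
            (2, '123 foo st', '2013-03-02'),
            (3, '456 foo st', '2013-03-02'),
            (4, '567 foo st', '2013-03-01'),
            (5, '456 foo st', '2013-03-01'),
            (6, '567 foo st', '2013-03-01');
    """)
    rows = con.execute("""
        SELECT id, address, searched_on
        FROM (SELECT s.*,
                     ROW_NUMBER() OVER (PARTITION BY address
                                        ORDER BY searched_on DESC, id) AS rn
              FROM searches s) ranked
        WHERE rn = 1
        ORDER BY id
    """).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in latest_per_address():
        print(row)
```

On the sample data this yields ids 2, 3 and 4, matching the result set asked for in the question. In a Rails model, the same SQL could be wrapped in a custom query method.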
To show unique addresses:
Searches.group(:address)
Then you can select columns if you want:
Searches.group(:address).select('id,date')

SelfJoin using Symfony 1.4/propel 1.4

I need to do self join using Symfony 1.4/Propel 1.4. My tables/db are too big to put here but an example table is given below to replicate the issue I'm facing.
Consider following example table with example data
Table Employee
----------------------------------------
|id | name | mid |
----------------------------------------
|1 | CEO |NULL |
|2 | CTO |1 |
|3 | CFO |1 |
|4 | PM1 |2 |
|5 | TL1 |4 |
----------------------------------------
Here the first column is the employee id, the second is the employee name and the third is the manager id. mid is a link to another row in the same table. For example, CTO(2) reports to CEO(1), so mid in the second row is 1.
I need following output:
---------------------
|ename | manager |
---------------------
|CTO | CEO |
|CFO | CEO |
|PM1 | CTO |
|TL1 | PM1 |
---------------------
The SQL query will be:
SELECT e.name,m.name
FROM employee e, employee m
WHERE e.mid=m.id
AND e.mid IS NOT NULL;
My problem is, how do I write the same query in Symfony/Propel 1.4? I tried the following:
$c = new Criteria();
$c->clearSelectColumns();
$c->addSelectColumn(EmployeePeer::NAME.' as ename');
$c->addSelectColumn(EmployeePeer::NAME.' as manager');
$c->setPrimaryTableName(EmployeePeer::TABLE_NAME);
$c->addJoin(EmployeePeer::MID, EmployeePeer::ID, Criteria::INNER_JOIN);
$c->add(EmployeePeer::MID, NULL, Criteria::EQUAL);
I know this query does not make any sense, and as expected, I got a PropelException.
But a self join is a common database operation and I'm sure Propel must support it. Can someone please tell me how to achieve the above requirement in Symfony/Propel 1.4?
According to this SQLFiddle, the SQL you want to perform is:
SELECT e.name as ename, m.name as manager
FROM employee e
LEFT JOIN employee m ON e.mid = m.id WHERE e.mid IS NOT NULL;
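As a quick sanity check of the self join itself, independent of Propel, here is a sketch that runs it against the example table in SQLite through Python's sqlite3 module. The function name employee_managers is made up for the example. It uses a plain inner JOIN, which gives the same rows as the LEFT JOIN plus IS NOT NULL filter as long as every non-NULL mid points at an existing employee.

```python
import sqlite3

def employee_managers():
    """Return (ename, manager) pairs by joining employee to itself."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, mid INTEGER);
        INSERT INTO employee VALUES
            (1, 'CEO', NULL),
            (2, 'CTO', 1),
            (3, 'CFO', 1),
            (4, 'PM1', 2),
            (5, 'TL1', 4);
    """)
    rows = con.execute("""
        SELECT e.name AS ename, m.name AS manager
        FROM employee e
        -- the same table under a second alias plays the manager role
        JOIN employee m ON e.mid = m.id
        ORDER BY e.id
    """).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for ename, manager in employee_managers():
        print(ename, manager)
```

The CEO row has a NULL mid, so it matches no manager row and drops out, which gives the four pairs from the question.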
Like YouthPark, I think addAlias is the solution, and I would do something like this:
$c = new Criteria();
$c->clearSelectColumns();
// 'c2' is a second alias of the employee table; it plays the manager role
$c->addAlias('c2', EmployeePeer::TABLE_NAME);
$c->addSelectColumn(EmployeePeer::NAME.' as ename');
$c->addSelectColumn(EmployeePeer::alias('c2', EmployeePeer::NAME).' as manager');
// join condition: employee.mid = c2.id
$c->addJoin(EmployeePeer::MID, EmployeePeer::alias('c2', EmployeePeer::ID), Criteria::LEFT_JOIN);
$c->add(EmployeePeer::MID, null, Criteria::ISNOTNULL);
I'm not sure about the addSelectColumn part, by the way.
I have never tried this, so I'm not sure whether it will help, but since there are no other answers, you might try the addAlias method, or search further on it, if you are stuck.
$notifCrit->addAlias("A", ThreadsPeer::TABLE_NAME);
$notifCrit->add("A.father_id", ThreadsPeer::FATHER_ID."=A.father_id", Criteria::CUSTOM);
Taken from the last comment on the old Symfony forums.
I'm not sure, but Propel 1.4 might not support a self join with its built-in methods, since a self join needs an alias. In that case you need a custom query as in the example above.
$c = new Criteria();
$c->addJoin(ArticlePeer::AUTHOR_ID, AuthorPeer::ID);
$c->add(AuthorPeer::NAME, 'John Doe');
$articles = ArticlePeer::doSelect($c);
