Kusto join two tables based on latest available date - join

Is there a way to join two tables on Kusto, and join values based on latest available date from the second table?
Let's say we get distinct names from first table, and want to join values from the second table based on latest available dates.
I would also only keep matches from left column.
table1
names
-----
Alex
John
Mary
table2
name weight date
----- ------ ------
Alex. 160. 2023-01-20
Alex. 168. 2023-01-28
Mary. 120. 2022-08-28
Mary. 140. 2020-09-17
Sample code:
table1
|distinct names
|join kind=inner table2 on $left.names==$right.name

let table1 = datatable(names:string)
[
"Alex"
,"John"
,"Mary"
];
let table2 = datatable(name:string, weight:real ,["date"]:datetime)
[
"Alex" ,160 ,datetime(2023-01-20)
,"Alex" ,168 ,datetime(2023-01-28)
,"Mary" ,120 ,datetime(2022-08-28)
,"Mary" ,140 ,datetime(2020-09-17)
];
table1
| distinct names
| join kind=inner (table2 | summarize arg_max(['date'], *) by name) on $left.names==$right.name
names
name
date
weight
Mary
Mary
2022-08-28T00:00:00Z
120
Alex
Alex
2023-01-28T00:00:00Z
168
Fiddle

Related

select join in the same table

I have a table like this
TRX_NUMBER is a invoice, and this field have a return number inside of the invoice.
And I want to select table and join to the same table use CUSTOMER_TRX_ID and PREVIOUS_CUSTOMER_TRX_ID as the connection (ON)
And the result what I want
Can you help me about this ?
Here's one option:
Sample data:
SQL> with invoice (customer_trx_id, trx_number, previous_customer_trx_id) as
2 (select 81196, 'ARR05-09', 22089 from dual union all
3 select 22089, 'IJU86-09', null from dual union all
4 select 13931, 'IJU07-09', null from dual
5 )
Query begins here:
6 select a.trx_number, b.trx_number as retur
7 from invoice a left join invoice b on a.customer_trx_id = b.previous_customer_trx_id
8 where not exists (select null
9 from invoice c
10 where c.customer_trx_id = a.previous_customer_trx_id);
TRX_NUMBER RETUR
--------------- --------
IJU86-09 ARR05-09
IJU07-09
SQL>
Use an alias to take the same table multiple times.
For example we have a table INVOICE:
SELECT t1.TRX_NUMBER AS TRX_NUMBER, t2.TRX_NUMBER AS RETUR
FROM INVOICE t1
LEFT JOIN INVOICE t2 ON t1.CUSTOMER_TRX_ID = t2.PREVIOUS_CUSTOMER_TRX_ID
if only one level then add a condition
where not exists (select null
from invoice t3
where t3.customer_trx_id = t1.previous_customer_trx_id)
or exclude everything where there is a previous number - it means they are already a level lower
where t1.PREVIOUS_CUSTOMER_TRX_ID is null

Many-to-many SELECT divided into multiple rows for some reason

I have two tables joined via third in a many-to-many relationship. To simplify:
Table A
ID-A (int)
Name (varchar)
Score (numeric)
Table B
ID-B (int)
Name (varchar)
Table AB
ID-AB (int)
A (foreign key ID-A)
B (foreign key ID-B)
What I want is to display the B-Name and a sum of the "Score" values of all the As belonging to the given B. However, the following code:
WITH "Data" AS(
SELECT "B."."Name" As "BName", "A"."Name", "Score"
FROM "AB"
LEFT OUTER JOIN "A" ON "AB"."A" = "A"."ID-A"
LEFT OUTER JOIN "B" ON "AB"."B" = "B"."ID-B")
SELECT "BName", SUM("Score") AS "Total"
FROM "Data"
GROUP BY "Name", "Score"
ORDER BY "Total" DESC
The results display several rows for every "BName" with the "score" divided into semingly random increments between these rows. For example, if the desired result for Johnny is 12 and for April it's 25, the query may shows something like:
Johnny | 7
Johnny | 3
Johnny | 2
April | 19
April | 5
April | 1
etc.
Even after trying to nest the query and doing another SELECT with SUM("Score"), the results are the same. I'm not sure what I'm doing wrong?
Remove Score from the GROUP BY clause:
SELECT BName, SUM(Score) AS Total
FROM Data
GROUP BY BName
ORDER BY Total DESC;
The purpose of your query is to summarize by name, so name alone should appear in the GROUP BY clause. By also including the score, you will get a record in the output for each unique name/score combination.
Okay, I figured out my problem. Indeed, I had to GROUP BY "Name" only, but Firebird I thought wasn't letting me do that. Turns out it was just a typo. Oops.

Hive outer join: how to change the default NULL value

For hive outer join, if a joining key does not exist in one table,hive will put NULL. Is that possible to use another value for this? For example:
Table1:
user_id, name, age
1 Bob 23
2 Jim 43
Table2:
user_id, txn_amt, date
1 20.00 2013-12-10
1 10.00 2014-07-01
If I do a LEFT OUTER JOIN on user_id:
INSERT INTO TABLE user_txn
SELECT
Table1.user_id,
Table1.name,
Table2.txn_amt,
Table2.date
FROM
Table2
LEFT OUTER JOIN
Table1
ON
Table1.user_id = Table2.user_id;
I want the output be like this:
user_id, name, tnx_amt, date
1 Bob 20.00 2013-12-10
1 Bob 10.00 2014-07-01
2 Jim 0.00 2099-12-31
Note the txn_amt and date columns for Jim. Is there any way in hive to define default values like this?
You can use COALESCE for this, instead of solely Table2.txn_amt
COALESCE(Table2.txn_amt, 0.0)
What this does is returns the first value that is not null. So, if txn_amt is null, it'll go to the second value in the list. 0.0 is never null, so it'll pick that. If txn_amt has a value in it, it'll return that value.

How to join with three tables with multiple conditions?

I want to join three tables with multiple conditions. Below is table structure and expected result
users_id users_first_name
1 rocky
2 James
3 john
meeting_details_id meeting_title users_id meeting_lead close_meeting (NO)
1 newmeet 3 1 No
2 testmeet 2 2 No
Attended_meetings
project_meeting_attendeeid meeting_details_id users_id access_type (attendee)
1 1 2 attendee
Expected output:
Query should check if meeting is attended OR users is meeting lead
meeting_title creator meeting_lead close_meeting (NO)
newmeet john rocky No
testmeet james james No
Try this:
SELECT M.meeting_title,U1.users_first_name as creator,U2.users_first_name as meeting_lead,M.close_meeting
FROM MeetingTable M LEFT OUTER JOIN
Users U1 ON M.creator=U1.users_id LEFT OUTER JOIN
Users U2 ON M.meeting_lead=U2.users_id
Result:
MEETING_TITLE CREATOR MEETING_LEAD CLOSE_MEETING
newmeet john rocky No
testmeet James James No
See result in SQL Fiddle.

sqlite2: Joining max values per column from another table (subquery reference)?

I'm using the following database:
CREATE TABLE datas (d_id INTEGER PRIMARY KEY, name_id numeric, countdata numeric);
INSERT INTO datas VALUES(1,1,20); //(NULL,1,20);
INSERT INTO datas VALUES(2,1,47); //(NULL,1,47);
INSERT INTO datas VALUES(3,2,36); //(NULL,2,36);
INSERT INTO datas VALUES(4,2,58); //(NULL,2,58);
INSERT INTO datas VALUES(5,2,87); //(NULL,2,87);
CREATE TABLE names (n_id INTEGER PRIMARY KEY, name text);
INSERT INTO names VALUES(1,'nameA'); //(NULL,'nameA');
INSERT INTO names VALUES(2,'nameB'); //(NULL,'nameB');
What I would like to do, is to select all values (rows) of names - to which all columns of datas will be appended, for the row where datas.countdata is maximum for n_id (and of course, where name_id = n_id).
I can somewhat get there with the following query:
sqlite> .header ON
sqlite> SELECT * FROM names AS n1
LEFT OUTER JOIN (
SELECT d_id, name_id, countdata FROM datas AS d1
WHERE d1.countdata IN (
SELECT MAX(countdata) FROM datas
WHERE name_id=1
)
) AS p1 ON n_id=name_id;
n1.n_id|n1.name|p1.d_id|p1.name_id|p1.countdata
1|nameA|2|1|47
2|nameB|||
... however - obviously - it only works for a single row (the one explicitly set by name_id=1).
The problem is, the SQL query fails whenever I try to somehow reference the "current" n_id:
sqlite> SELECT * FROM names AS n1
LEFT OUTER JOIN (
SELECT d_id, name_id, countdata FROM datas AS d1
WHERE d1.countdata IN (
SELECT MAX(countdata) FROM datas
WHERE name_id=n1.n_id
)
) AS p1 ON n_id=name_id;
SQL error: no such column: n1.n_id
Is there any way of achieving what I want in Sqlite2??
Thanks in advance,
Cheers!
Oh, well - that wasn't trivial at all, but here is a solution:
sqlite> SELECT * FROM names AS n1
LEFT OUTER JOIN (
SELECT d1.*
FROM datas AS d1, (
SELECT max(countdata) as countdata,name_id
FROM datas
GROUP BY name_id
) AS ttemp
WHERE d1.name_id = ttemp.name_id AND d1.countdata = ttemp.countdata
) AS p1 ON n1.n_id=p1.name_id;
n1.n n1.name p1.d_id p1.name_id p1.countdata
---- ------------ ---------- ---------- -----------------------------------
1 nameA 2 1 47
2 nameB 5 2 87
Well, hope this ends up helping someone, :)
Cheers!
Notes: note that just calling max(countdata) screws up competely d_id:
sqlite> select d_id,name_id,max(countdata) as countdata from datas group by name_id;
d_id name_id countdata
---- ------------ ----------
3 2 87
1 1 47
so to get correct corresponding d_id, we must do max() on datas separately - and then perform sort of an intersect with the full datas (except that intersect in sqlite requires that there are equal number of columns in both datasets, which is not the case here - and even if we made it that way, as seen above d_id will be wrong, so intersect will not work).
One way to do that is in using a sort of a temporary table, and then utilize a multiple table SELECT query so as to set conditions between full datas and the subset returned via max(countdata), as shown below:
sqlite> CREATE TABLE ttemp AS SELECT max(countdata) as countdata,name_id FROM datas GROUP BY name_id;
sqlite> SELECT d1.*, ttemp.* FROM datas AS d1, ttemp WHERE d1.name_id = ttemp.name_id AND d1.countdata = ttemp.countdata;
d1.d d1.name_id d1.countda ttemp.coun ttemp.name_id
---- ------------ ---------- ---------- -----------------------------------
2 1 47 47 1
5 2 87 87 2
sqlite> DROP TABLE ttemp;
or, we can rewrite the above so a SELECT subquery (sub-select?) is used, like this:
sqlite> SELECT d1.* FROM datas AS d1, (SELECT max(countdata) as countdata,name_id FROM datas GROUP BY name_id) AS ttemp WHERE d1.name_id = ttemp.name_id AND d1.countdata = ttemp.countdata;
d1.d d1.name_id d1.countda
---- ------------ ----------
2 1 47
5 2 87

Resources