create view of count based on latest timestamp for a particular id - mysql-5.1

I have been provided with the following code to run a query that counts the number of connector_pks grouped by group_status based on the latest timestamp:
SELECT
`group_status`,COUNT(*) 'Count of status '
FROM
(SELECT `connector_pk`, `group_status`, `status_timestamp`
FROM connector_status_report t1
WHERE `status_timestamp` = (SELECT MAX(`status_timestamp`)
FROM connector_status_report t2 WHERE t2.`connector_pk` = t1.`connector_pk`))
t3
GROUP BY `group_status`
Unfortunately this takes about 30 minutes to run so I was hoping for an optimised solution.
Example table
connector_pk | group_status | status_timestamp
1 | Available | 2020-02-11 19:14:45
1 | Charging | 2020-02-11 19:18:45
2 | Available | 2020-02-11 19:15:45
2 | Not Available | 2020-02-11 19:18:45
3 | Not Available | 2020-02-11 19:14:45
The desired output would look like this:
group_status | Count of status
Available | 0
Charging | 1
Not Available | 2
For my original question I was pointed to the following question (and answers):
Get records with max value for each group of grouped SQL results
I would like to create a view with the output.
Is it possible to also add the following to the query, so that it is included in the view?
SELECT status,
IF(status = 'charging', 'Charging',
IF(status = 'Not Occupied', 'Available', 'Occupied')) AS group_status
FROM connector_status_report

I managed to speed up the query using the following:
CREATE VIEW statuscount AS
Select group_status, COUNT(*) 'Count of status'
FROM
(SELECT tt.*
FROM connector_status_report tt
INNER JOIN
(SELECT connector_pk, MAX(status_timestamp) AS MaxDateTime
FROM connector_status_report
GROUP BY connector_pk) groupedtt
ON tt.connector_pk = groupedtt.connector_pk
AND tt.status_timestamp = groupedtt.MaxDateTime) t3
GROUP BY group_status
If anybody can help with adding the expression that derives the group_status column to this query, it would be much appreciated.
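One way the status mapping could be folded into the latest-row-per-connector query is sketched below. It is an illustration only: it runs on SQLite through Python's sqlite3 module rather than MySQL, the raw status values in the sample rows are hypothetical, and CASE is used in place of nested IF() because CASE is standard SQL and is also accepted by MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE connector_status_report (
    connector_pk     INTEGER,
    status           TEXT,   -- hypothetical raw values, not from the question
    status_timestamp TEXT
);
INSERT INTO connector_status_report VALUES
    (1, 'Not Occupied', '2020-02-11 19:14:45'),
    (1, 'charging',     '2020-02-11 19:18:45'),
    (2, 'Not Occupied', '2020-02-11 19:15:45'),
    (2, 'Occupied',     '2020-02-11 19:18:45'),
    (3, 'Occupied',     '2020-02-11 19:14:45');

CREATE VIEW statuscount AS
SELECT group_status, COUNT(*) AS status_count
FROM (
    -- Derive group_status only for each connector's latest row.
    SELECT CASE
               WHEN tt.status = 'charging'     THEN 'Charging'
               WHEN tt.status = 'Not Occupied' THEN 'Available'
               ELSE 'Occupied'
           END AS group_status
    FROM connector_status_report tt
    INNER JOIN (
        SELECT connector_pk, MAX(status_timestamp) AS max_ts
        FROM connector_status_report
        GROUP BY connector_pk
    ) latest
      ON tt.connector_pk = latest.connector_pk
     AND tt.status_timestamp = latest.max_ts
) t3
GROUP BY group_status;
""")

rows = conn.execute(
    "SELECT group_status, status_count FROM statuscount ORDER BY group_status"
).fetchall()
print(rows)  # [('Charging', 1), ('Occupied', 2)]
```

MySQL versions before 5.7.7 do not allow a subquery in the FROM clause of a view, so on such a server the two inner SELECTs would probably have to be created as separate views and referenced by name.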

Related

select join in the same table

I have a table like this
TRX_NUMBER is an invoice number; the same field also holds the return number that belongs to an invoice.
I want to select from the table and join it to itself, using CUSTOMER_TRX_ID and PREVIOUS_CUSTOMER_TRX_ID as the join condition (ON).
This is the result I want:
Can you help me with this?
Here's one option:
Sample data:
SQL> with invoice (customer_trx_id, trx_number, previous_customer_trx_id) as
(select 81196, 'ARR05-09', 22089 from dual union all
select 22089, 'IJU86-09', null from dual union all
select 13931, 'IJU07-09', null from dual
)
Query begins here:
select a.trx_number, b.trx_number as retur
from invoice a left join invoice b on a.customer_trx_id = b.previous_customer_trx_id
where not exists (select null
from invoice c
where c.customer_trx_id = a.previous_customer_trx_id);
TRX_NUMBER RETUR
--------------- --------
IJU86-09 ARR05-09
IJU07-09
SQL>
Use aliases to reference the same table more than once in a query.
For example, with a table INVOICE:
SELECT t1.TRX_NUMBER AS TRX_NUMBER, t2.TRX_NUMBER AS RETUR
FROM INVOICE t1
LEFT JOIN INVOICE t2 ON t1.CUSTOMER_TRX_ID = t2.PREVIOUS_CUSTOMER_TRX_ID
If you only need one level, add a condition:
where not exists (select null
from invoice t3
where t3.customer_trx_id = t1.previous_customer_trx_id)
Or exclude every row that has a previous number, since such rows are already one level lower:
where t1.PREVIOUS_CUSTOMER_TRX_ID is null
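The self-join with the IS NULL filter can be tried on the three sample rows from the first answer. The sketch below uses SQLite through Python's sqlite3 module purely for convenience (the question itself is about Oracle), and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (
    customer_trx_id          INTEGER,
    trx_number               TEXT,
    previous_customer_trx_id INTEGER
);
INSERT INTO invoice VALUES
    (81196, 'ARR05-09', 22089),
    (22089, 'IJU86-09', NULL),
    (13931, 'IJU07-09', NULL);
""")

rows = conn.execute("""
    SELECT t1.trx_number, t2.trx_number AS retur
    FROM invoice t1
    LEFT JOIN invoice t2
           ON t1.customer_trx_id = t2.previous_customer_trx_id
    WHERE t1.previous_customer_trx_id IS NULL   -- keep top-level invoices only
    ORDER BY t1.trx_number
""").fetchall()
print(rows)  # [('IJU07-09', None), ('IJU86-09', 'ARR05-09')]
```

The invoice without a return still appears, with None in the second column, because the LEFT JOIN keeps unmatched rows from t1.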

JOIN ON second highest value (Impala)

I don't know how or even if this is possible.... I am trying to JOIN tables on the second highest value. I tried rowNumber, lag, lead & rank but haven't been able to get any of them to do what I need. To summarize, I'm just trying to shift the activitydate table down one row to join on rollDate minus 1 (but can't use -1 because they are not consistent dates, there are days missing.)
Does anyone know a good way to do this? Any suggestions are appreciated!
Select
ds.activitydate
,sum(ws.weeklyTotals / ds.daysBetween) as newRunRates -- getting an average of daily activity from weekly totals
from
(select
fsc.activitydate
,fsc.weekstart
,max(fsc.activitydate) OVER (partition by fsc.weekstart) as rollUpDate
,datediff(to_date(max(fsc.activitydate) OVER (partition by fsc.weekstart)), to_date(fsc.weekstart)) + 1 as daysBetween
from fiscalcalendar fsc
) ds -- used this to get a week-ending date bc that is what I need to join on. I only have a week start in this table
left join
(select
activitydate_iso
,count(distinct assignedmaincomponentid) as weeklyTotals
from activityTable
group by 1
) ws -- weeklySplits -- this gives me my weekly totals by a week ending date
on ds.rollUpDate = ws.activitydate_iso
-- need this join logic to actually be
-- on ds.rollUpDate = (max(ws.activitydate_iso) where activitydate_iso < rollUpDate)
where activitydate between '2020-05-22' and '2020-06-15'
group by 1,2
order by 1,2
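The join condition described in the comment above (match each rollUpDate to the latest activity date strictly before it) can be written as a correlated subquery in the ON clause. The sketch below is only an illustration on SQLite via Python's sqlite3 module; it has not been checked against Impala, which may restrict subqueries in join conditions, and the table names, column names, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-ins for the "ds" and "ws" subqueries in the question.
CREATE TABLE week_ends (roll_up_date TEXT);
CREATE TABLE weekly_totals (activity_date TEXT, weekly_total INTEGER);

INSERT INTO week_ends VALUES
    ('2020-05-22'), ('2020-05-29'), ('2020-06-05'), ('2020-06-12');
-- Dates are deliberately irregular, so "roll_up_date - 7 days" would not work.
INSERT INTO weekly_totals VALUES
    ('2020-05-22', 10), ('2020-05-29', 14), ('2020-06-04', 21);
""")

rows = conn.execute("""
    SELECT ds.roll_up_date, ws.weekly_total
    FROM week_ends ds
    LEFT JOIN weekly_totals ws
           ON ws.activity_date = (
                  SELECT MAX(w2.activity_date)
                  FROM weekly_totals w2
                  WHERE w2.activity_date < ds.roll_up_date
              )
    ORDER BY ds.roll_up_date
""").fetchall()
print(rows)
# [('2020-05-22', None), ('2020-05-29', 10), ('2020-06-05', 21), ('2020-06-12', 21)]
```

ISO-formatted date strings sort in chronological order, which is why the plain string comparison is enough here.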

Which SQL Join do I use to see results from one table that are not present in set from the same table?

I have a massive table wherein records are generated each month, and the results tagged with month_of column. I need to compare these month_of result sets each month to find new activations (new records that are present this month that weren't there last month)
The goal:
Get the set of results from the CURRENT MONTH whose unique ids are not present in the PREVIOUS MONTH.
Explained:
Last month (March), I had 10 records marked status="ACTIVE" with month_of "MARCH"
This month (April), I have 11 records marked status="ACTIVE" with month_of "APRIL"
I've already tried something like this,
where ta1 = the current month's report and ta2 = last month's report:
SELECT id FROM table ta1
LEFT OUTER JOIN table ta2
ON ta1.status = ta2.status
WHERE ta1.month_of = #{current_month}
AND ta2.month_of = #{last_month}
AND ta1.status = 'ACTIVE'
AND ta2.id IS null
I need the query that would return the 1 new record with month_of "APRIL" that isn't present in the month_of "MARCH" results.
Can anyone point me at the right join to use in order to get what I'm looking for? This solution is going to apply to a table with almost a billion records.
select id
from ta1
where month_of = (current_month)
and id not in (select id from ta1 where month_of = (last_month))
and status in ('Active')
or you could do:
select a.id
from (select id, month_of from ta1 where month_of = (current_month) and status = 'Active'
) a
left join (select id, month_of from ta1 where month_of = (last_month)
) b on a.id = b.id
where b.id is null
THIS ARTICLE was immensely helpful in my understanding of joins and how to use them.
I actually ended up using a LEFT OUTER JOIN as follows:
SELECT se1.id FROM service_effectivenesses se1
LEFT OUTER JOIN service_effectivenesses se2
ON se1.vehicle_id = se2.vehicle_id
AND se1.dealership_id = se2.dealership_id
AND se2.month_of = DATE('#{month_of.strftime("%Y-%m-%d")}') - interval '1 month'
WHERE se1.dealership_id = #{dealership_id}
AND se2.id IS NULL
AND se1.month_of = DATE('#{month_of.strftime("%Y-%m-%d")}')
AND se1.status = 'ACTIVE'
The #{} variables are from our Rails app, so ignore those.
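The anti-join pattern can be checked on a small data set. The following is a minimal sketch using SQLite through Python's sqlite3 module; the table name reports, the column unit_id, and the month values are hypothetical. In the LEFT JOIN form, the previous-month filter sits in the ON clause: a row with no match has NULL in every column of the joined side, so the same filter in WHERE would discard exactly the rows being looked for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (unit_id INTEGER, month_of TEXT, status TEXT)")

# Hypothetical data: units 1-3 active in March, units 1-4 active in April.
march = [(i, "2021-03-01", "ACTIVE") for i in (1, 2, 3)]
april = [(i, "2021-04-01", "ACTIVE") for i in (1, 2, 3, 4)]
conn.executemany("INSERT INTO reports VALUES (?, ?, ?)", march + april)

left_join_sql = """
    SELECT cur.unit_id
    FROM reports cur
    LEFT JOIN reports prev
           ON prev.unit_id = cur.unit_id
          AND prev.month_of = :last_month      -- filter on the joined side goes in ON
    WHERE cur.month_of = :current_month
      AND cur.status = 'ACTIVE'
      AND prev.unit_id IS NULL                 -- keep rows with no previous-month match
"""

not_exists_sql = """
    SELECT cur.unit_id
    FROM reports cur
    WHERE cur.month_of = :current_month
      AND cur.status = 'ACTIVE'
      AND NOT EXISTS (
              SELECT 1 FROM reports prev
              WHERE prev.unit_id = cur.unit_id
                AND prev.month_of = :last_month
          )
"""

params = {"current_month": "2021-04-01", "last_month": "2021-03-01"}
new_via_join = conn.execute(left_join_sql, params).fetchall()
new_via_not_exists = conn.execute(not_exists_sql, params).fetchall()
print(new_via_join, new_via_not_exists)  # [(4,)] [(4,)]
```

On a table with close to a billion rows, an index covering the month and id columns would likely matter more than the choice between the two forms, though that depends on the database engine.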

Find records with ID in array of IDS and keep the order of records matching that of IDs [duplicate]

I have a simple SQL query in PostgreSQL 8.3 that grabs a bunch of comments. I provide a sorted list of values to the IN construct in the WHERE clause:
SELECT * FROM comments WHERE (comments.id IN (1,3,2,4));
This returns comments in an arbitrary order, which in my case happens to be ids like 1,2,3,4.
I want the resulting rows sorted like the list in the IN construct: (1,3,2,4).
How to achieve that?
You can do it quite easily with VALUES (), () (introduced in PostgreSQL 8.2).
Syntax will be like this:
select c.*
from comments c
join (
values
(1,1),
(3,2),
(2,3),
(4,4)
) as x (id, ordering) on c.id = x.id
order by x.ordering
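When the id list comes from application code, the (id, ordering) pairs do not have to be typed by hand. The sketch below builds them from a Python list and binds them as parameters; it uses SQLite through Python's sqlite3 module rather than PostgreSQL, and the comments table contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO comments VALUES (?, ?)",
    [(i, f"comment {i}") for i in range(1, 6)],
)

wanted = [1, 3, 2, 4]  # desired output order

# One "(?, ?)" placeholder pair per id: (id, position in the list).
values_sql = ", ".join("(?, ?)" for _ in wanted)
params = []
for position, comment_id in enumerate(wanted, start=1):
    params += [comment_id, position]

sql = f"""
    WITH x(id, ordering) AS (VALUES {values_sql})
    SELECT c.id
    FROM comments c
    JOIN x ON c.id = x.id
    ORDER BY x.ordering
"""
ordered_ids = [row[0] for row in conn.execute(sql, params)]
print(ordered_ids)  # [1, 3, 2, 4]
```

Only the placeholders are interpolated into the SQL text; the ids themselves are passed as bound parameters.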
In Postgres 9.4 or later, this is simplest and fastest:
SELECT c.*
FROM comments c
JOIN unnest('{1,3,2,4}'::int[]) WITH ORDINALITY t(id, ord) USING (id)
ORDER BY t.ord;
WITH ORDINALITY was introduced in Postgres 9.4.
No need for a subquery, we can use the set-returning function like a table directly. (A.k.a. "table-function".)
A string literal to hand in the array instead of an ARRAY constructor may be easier to implement with some clients.
For convenience (optionally), copy the column name we are joining to ("id" in the example), so we can join with a short USING clause to only get a single instance of the join column in the result.
Works with any input type. If your key column is of type text, provide something like '{foo,bar,baz}'::text[].
Detailed explanation:
PostgreSQL unnest() with element number
Because this is so difficult to find, it deserves to be spread: in MySQL this can be done much more simply, though I don't know whether it works in other SQL dialects.
SELECT * FROM `comments`
WHERE `comments`.`id` IN ('12','5','3','17')
ORDER BY FIELD(`comments`.`id`,'12','5','3','17')
With Postgres 9.4 this can be done a bit shorter:
select c.*
from comments c
join (
select *
from unnest(array[43,47,42]) with ordinality
) as x (id, ordering) on c.id = x.id
order by x.ordering;
Or a bit more compact without a derived table:
select c.*
from comments c
join unnest(array[43,47,42]) with ordinality as x (id, ordering)
on c.id = x.id
order by x.ordering
Removing the need to manually assign/maintain a position to each value.
With Postgres 9.6 this can be done using array_position():
with x (id_list) as (
values (array[42,48,43])
)
select c.*
from comments c, x
where id = any (x.id_list)
order by array_position(x.id_list, c.id);
The CTE is used so that the list of values only needs to be specified once. If that is not important this can also be written as:
select c.*
from comments c
where id in (42,48,43)
order by array_position(array[42,48,43], c.id);
I think this way is better :
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY id=1 DESC, id=3 DESC, id=2 DESC, id=4 DESC
Another way to do it in Postgres would be to use the idx function.
SELECT *
FROM comments
ORDER BY idx(array[1,3,2,4], comments.id)
Don't forget to create the idx function first, as described here: http://wiki.postgresql.org/wiki/Array_Index
In Postgresql:
select *
from comments
where id in (1,3,2,4)
order by position(id::text in '1,3,2,4')
On researching this some more I found this solution:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY CASE "comments"."id"
WHEN 1 THEN 1
WHEN 3 THEN 2
WHEN 2 THEN 3
WHEN 4 THEN 4
END
However this seems rather verbose and might have performance issues with large datasets.
Can anyone comment on these issues?
To do this, I think you should probably have an additional "ORDER" table that maps IDs to their position (effectively what your response to your own question does). You can then join it in as an extra column in your select and sort on that column.
In that way, you explicitly describe the ordering you desire in the database, where it should be.
Without a SEQUENCE (requires 8.4 or later):
select * from comments c
join
(
select id, row_number() over() as id_sorter
from (select unnest(ARRAY[1,3,2,4]) as id) as y
) x on x.id = c.id
order by x.id_sorter
SELECT * FROM "comments" JOIN (
SELECT 1 as "id",1 as "order" UNION ALL
SELECT 3,2 UNION ALL SELECT 2,3 UNION ALL SELECT 4,4
) j ON "comments"."id" = j."id" ORDER BY j.ORDER
or if you prefer evil over good:
SELECT * FROM "comments" WHERE ("comments"."id" IN (1,3,2,4))
ORDER BY POSITION(','+"comments"."id"+',' IN ',1,3,2,4,')
And here's another solution that works and uses a constant table (http://www.postgresql.org/docs/8.3/interactive/sql-values.html):
SELECT * FROM comments AS c,
(VALUES (1,1),(3,2),(2,3),(4,4) ) AS t (ord_id,ord)
WHERE (c.id IN (1,3,2,4)) AND (c.id = t.ord_id)
ORDER BY ord
But again I'm not sure that this is performant.
I've got a bunch of answers now. Can I get some voting and comments so I know which is the winner!
Thanks All :-)
create sequence serial start 1;
select * from comments c
join (select unnest(ARRAY[1,3,2,4]) as id, nextval('serial') as id_sorter) x
on x.id = c.id
order by x.id_sorter;
drop sequence serial;
[EDIT]
unnest is not yet built in on 8.3, but you can create one yourself (the beauty of any*):
create function unnest(anyarray) returns setof anyelement
language sql as
$$
select $1[i] from generate_series(array_lower($1,1),array_upper($1,1)) i;
$$;
That function works with any element type:
select unnest(array['John','Paul','George','Ringo']) as beatle;
select unnest(array[1,3,2,4]) as id;
Slight improvement over the version that uses a sequence I think:
CREATE OR REPLACE FUNCTION in_sort(anyarray, out id anyelement, out ordinal int)
LANGUAGE SQL AS
$$
SELECT $1[i], i FROM generate_series(array_lower($1,1),array_upper($1,1)) i;
$$;
SELECT
*
FROM
comments c
INNER JOIN (SELECT * FROM in_sort(ARRAY[1,3,2,4])) AS in_sort
USING (id)
ORDER BY in_sort.ordinal;
select * from comments where comments.id in
(select unnest(ids) from bbs where id=19795)
order by array_position((select ids from bbs where id=19795),comments.id)
Here, bbs is the main table, which has a field called ids,
and ids is the array that stores the comments.id values.
Tested on PostgreSQL 9.6.
Let's get a visual impression of what has already been said. For example, you have a table with some tasks:
SELECT a.id,a.status,a.description FROM minicloud_tasks as a ORDER BY random();
id | status | description
----+------------+------------------
4 | processing | work on postgres
6 | deleted | need some rest
3 | pending | garden party
5 | completed | work on html
And you want to order the list of tasks by its status.
The status is a list of string values:
(processing, pending, completed, deleted)
The trick is to give each status value an integer and order the list numerically:
SELECT a.id,a.status,a.description FROM minicloud_tasks AS a
JOIN (
VALUES ('processing', 1), ('pending', 2), ('completed', 3), ('deleted', 4)
) AS b (status, id) ON (a.status = b.status)
ORDER BY b.id ASC;
Which leads to:
id | status | description
----+------------+------------------
4 | processing | work on postgres
3 | pending | garden party
5 | completed | work on html
6 | deleted | need some rest
Credit: @user80168
I agree with all other posters that say "don't do that" or "SQL isn't good at that". If you want to sort by some facet of comments, add another integer column to one of your tables to hold your sort criteria and sort by that value, e.g. "ORDER BY comments.sort DESC". If you want to sort these in a different order every time, then SQL won't be for you in this case.

How to use joins and averages together in Hive queries

I have two tables in hive:
Table1: uid,txid,amt,vendor
Table2: uid,txid
Now I need to join the tables on txid which basically confirms a transaction is finally recorded. There will be some transactions which will be present only in Table1 and not in Table2.
I need to find the average rate of transaction matches per user (uid) per vendor. Then I need the average of these averages: add up all the per-user averages for a vendor and divide by the number of unique users for that vendor.
Let's say I have the data:
Table1:
u1,120,44,vend1
u1,199,33,vend1
u1,100,23,vend1
u1,101,24,vend1
u2,200,34,vend1
u2,202,32,vend2
Table2:
u1,100
u1,101
u2,200
u2,202
Example For vendor vend1:
u1 -> Avg transaction find rate = 2 (matches found in both Table1 and Table2) / 4 (total occurrences in Table1) = 0.5
u2 -> Avg transaction find rate = 1/1 = 1
Avg of avgs = (0.5 + 1) (sum of avgs) / 2 (total unique users) = 0.75
Required output:
vend1,0.75
vend2,1
I can't seem to get both the count of matches and the count of occurrences in Table1 alone, per user per vendor, in a single Hive query. I have got as far as this query and can't see how to take it further.
SELECT A.vendor,A.uid,count(*) as totalmatchesperuser FROM Table1 A JOIN Table2 B ON A.uid = B.uid AND B.txid =A.txid group by vendor,A.uid
Any help would be great.
I think you are running into trouble with your JOIN. When you JOIN by txid and uid, you are losing the total number of uid's per group. If I were you I would assign a column of 1's to table2 and name the column something like success or transaction and do a LEFT OUTER JOIN. Then in your new table you will have a column with the number 1 in it if there was a completed transaction and NULL otherwise. You can then do a case statement to convert these NULLs to 0
Query:
select vendor
,(SUM(avg_uid) / COUNT(uid)) as avg_of_avgs
from (
select vendor
,uid
,AVG(complete) as avg_uid
from (
select uid
,txid
,amt
,vendor
,case when success is null then 0
else success
end as complete
from (
select A.*
,B.success
from table1 as A
LEFT OUTER JOIN table2 as B
ON B.txid = A.txid
) x
) y
group by vendor, uid
) z
group by vendor
Output:
vend1 0.75
vend2 1.0
B.success in line 17 is the column of 1's that I put into table2 before the JOIN. If you are curious about case statements in Hive you can find them here
Amazing and precise answer by GoBrewers14!! Thank you so much. I was looking at it from a wrong perspective.
I made little changes in the query to get things finally done.
I didn't need to add a "success" column to table2. Instead of B.success, I checked B.txid in the query above: B.txid is NULL when no match is found and holds a value when a match is found, so it covers the success and failure cases without a new column. In the enclosing select I then map NULL to 0 and any non-NULL value to 1. I also renamed some columns, because Hive was reporting them as ambiguous.
The final query looks like this:
select vendr
,(SUM(avg_uid) / COUNT(usrid)) as avg_of_avgs
from (
select vendr
,usrid
,AVG(complete) as avg_uid
from (
select usrid
,txnid
,amnt
,vendr
,case when success is null then 0
else 1
end as complete
from (
select A.uid as usrid,A.vendor as vendr,A.amt as amnt,A.txid as txnid
,B.txid as success
from Table1 as A
LEFT OUTER JOIN Table2 as B
ON B.txid = A.txid
) x
) y
group by vendr, usrid
) z
group by vendr;
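The same average-of-averages logic can be checked against the sample data from the question. The sketch below runs on SQLite through Python's sqlite3 module rather than Hive, and collapses the nested selects into two levels; the alias names per_user and user_rate are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (uid TEXT, txid INTEGER, amt INTEGER, vendor TEXT);
CREATE TABLE table2 (uid TEXT, txid INTEGER);

INSERT INTO table1 VALUES
    ('u1', 120, 44, 'vend1'),
    ('u1', 199, 33, 'vend1'),
    ('u1', 100, 23, 'vend1'),
    ('u1', 101, 24, 'vend1'),
    ('u2', 200, 34, 'vend1'),
    ('u2', 202, 32, 'vend2');
INSERT INTO table2 VALUES
    ('u1', 100), ('u1', 101), ('u2', 200), ('u2', 202);
""")

rows = conn.execute("""
    SELECT vendor, AVG(user_rate) AS avg_of_avgs
    FROM (
        -- Per (vendor, user): share of Table1 transactions that also appear in Table2.
        SELECT a.vendor,
               a.uid,
               AVG(CASE WHEN b.txid IS NULL THEN 0.0 ELSE 1.0 END) AS user_rate
        FROM table1 a
        LEFT OUTER JOIN table2 b ON b.txid = a.txid
        GROUP BY a.vendor, a.uid
    ) per_user
    GROUP BY vendor
    ORDER BY vendor
""").fetchall()
print(rows)  # [('vend1', 0.75), ('vend2', 1.0)]
```

AVG over the per-user rows is equivalent to SUM(avg_uid) / COUNT(uid) in the queries above, because the inner query yields exactly one row per user per vendor.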

Resources