Duplicate values returned on SQL INNER JOIN - psql

I'm getting duplicate values when I do an inner join to the m_change_management table. It returns three records, but I only want the one with the most recent cmp.id.
SELECT
cmp.id,
cr.id,
coalesce(cmp.effort, 0.00) AS "Effort"
FROM
m_change_request cr
INNER JOIN (select max(id) as id, change_request_fk, effort from m_change_management group by id, change_request_fk, effort) as cmp ON cmp.change_request_fk = cr.id
WHERE
cr.release_fk=509
I need it to return only the most recent record, by max(cmp.id). Any ideas how I can fix this?

Found the solution:
SELECT
cmp.id,
cr.id,
cr.number AS "PSL #"
FROM
m_change_request cr
LEFT JOIN m_change_management cmp ON cr.id = cmp.change_request_fk
LEFT JOIN m_change_management cmp2 ON cr.id = cmp2.change_request_fk AND cmp.id < cmp2.id
WHERE
cr.release_fk=509 AND cmp2.change_request_fk IS NULL
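The original subquery groups by id (together with change_request_fk and effort), so every row is its own group and max(id) is simply that row's id; that is why all three rows come back. The solution above is an anti-join: a cmp row survives only when no cmp2 row for the same change request has a larger id. A minimal sketch of the same query, run with Python's built-in sqlite3 module against an in-memory database; the table and column names follow the question, but the sample rows are made up:

```python
import sqlite3


def latest_change_management(conn, release_fk):
    """Return (cmp.id, cr.id) pairs, keeping only the highest cmp.id per change request."""
    sql = """
        SELECT cmp.id, cr.id
        FROM m_change_request cr
        LEFT JOIN m_change_management cmp
               ON cr.id = cmp.change_request_fk
        -- cmp2 looks for a newer row belonging to the same change request
        LEFT JOIN m_change_management cmp2
               ON cr.id = cmp2.change_request_fk AND cmp.id < cmp2.id
        WHERE cr.release_fk = ?
          AND cmp2.change_request_fk IS NULL  -- keep cmp only if no newer row exists
        ORDER BY cr.id
    """
    return conn.execute(sql, (release_fk,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE m_change_request (id INTEGER PRIMARY KEY, release_fk INTEGER);
    CREATE TABLE m_change_management (
        id INTEGER PRIMARY KEY, change_request_fk INTEGER, effort REAL);
    INSERT INTO m_change_request VALUES (1, 509), (2, 509), (3, 100);
    -- three rows for change request 1, one each for requests 2 and 3
    INSERT INTO m_change_management VALUES
        (10, 1, 1.5), (11, 1, 2.0), (12, 1, 0.5), (20, 2, 3.0), (30, 3, 1.0);
""")

print(latest_change_management(conn, 509))  # [(12, 1), (20, 2)]
```

A change request with no m_change_management rows still appears, with a NULL cmp.id, because both joins are LEFT JOINs; making the first join an INNER JOIN would drop those rows.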

Related

Hive: How to join tables with OR clause in ON statement

I've got the following problem. In my Oracle database I have this query:
select * from table1 t1
inner join table2 t2 on
(t1.id_1= t2.id_1 or t1.id_2 = t2.id_2)
and it works perfectly.
Now I need to rewrite the query in Hive. I've seen that OR doesn't work in JOIN conditions in Hive (error: 'OR not supported in JOIN').
Is there any workaround other than splitting the query into two separate queries and unioning them?
Another way is to union two joins, e.g.,
select * from table1 t1
inner join table2 t2 on
(t1.id_1= t2.id_1)
union all
select * from table1 t1
inner join table2 t2 on
(t1.id_2 = t2.id_2)
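One difference from the OR join is worth checking: a pair of rows that satisfies both conditions comes back twice from UNION ALL but only once from the OR join. The sketch below shows this with Python's sqlite3 module and made-up rows (SQLite stands in for Hive here, so Hive itself is not tested), plus one possible fix: the second branch skips pairs the first branch already returned, with explicit NULL checks so that rows with a NULL id_1 are not lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id_1 INTEGER, id_2 INTEGER);
    CREATE TABLE table2 (id_1 INTEGER, id_2 INTEGER);
    INSERT INTO table1 VALUES (1, 100), (2, 200), (3, 300);
    -- (1, 100) matches table1's first row on both keys; (9, 200) matches on id_2 only
    INSERT INTO table2 VALUES (1, 100), (9, 200);
""")

OR_JOIN = """
    SELECT * FROM table1 t1
    INNER JOIN table2 t2 ON (t1.id_1 = t2.id_1 OR t1.id_2 = t2.id_2)
"""

UNION_ALL = """
    SELECT * FROM table1 t1 INNER JOIN table2 t2 ON (t1.id_1 = t2.id_1)
    UNION ALL
    SELECT * FROM table1 t1 INNER JOIN table2 t2 ON (t1.id_2 = t2.id_2)
"""

# The WHERE belongs to the second SELECT only: it drops pairs already matched on id_1.
UNION_ALL_DEDUP = """
    SELECT * FROM table1 t1 INNER JOIN table2 t2 ON (t1.id_1 = t2.id_1)
    UNION ALL
    SELECT * FROM table1 t1 INNER JOIN table2 t2 ON (t1.id_2 = t2.id_2)
    WHERE t1.id_1 IS NULL OR t2.id_1 IS NULL OR t1.id_1 <> t2.id_1
"""


def rows(sql):
    """Run a query and return its rows sorted, so results can be compared."""
    return sorted(conn.execute(sql).fetchall())


print(rows(OR_JOIN))          # [(1, 100, 1, 100), (2, 200, 9, 200)]
print(rows(UNION_ALL))        # (1, 100, 1, 100) appears twice
print(rows(UNION_ALL_DEDUP))  # same rows as OR_JOIN
```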
Older Hive versions do not support non-equi joins (complex join expressions were added in Hive 2.2.0). A common approach is to move the join condition from ON to the WHERE clause. In the worst case this becomes a CROSS JOIN plus a WHERE filter, like this:
select *
from table1 t1
cross join table2 t2
where (t1.id_1= t2.id_1 or t1.id_2 = t2.id_2)
This may be slow, because the CROSS JOIN multiplies the rows.
You can try two LEFT JOINs instead of the CROSS JOIN and filter out the rows where both conditions are false (which gives the INNER JOIN behavior of your query). This may perform faster than a cross join because it does not multiply all the rows. Columns selected from the second table can then be calculated using NVL() or coalesce().
select t1.*,
nvl(t2.col1, t3.col1) as t2_col1 -- take from t2; if NULL, take from t3
-- ... calculate all other columns from the second table in the same way (each preceded by a comma)
from table1 t1
left join table2 t2 on t1.id_1 = t2.id_1
left join table2 t3 on t1.id_2 = t3.id_2
where (t1.id_1 = t2.id_1 or t1.id_2 = t3.id_2) -- only joined records allowed, like in your INNER JOIN
As you asked, no UNION is necessary.
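A quick check of the two-LEFT-JOIN rewrite against the OR join, as a sketch with Python's sqlite3 module and made-up rows; COALESCE is used because SQLite has no NVL (Hive accepts both):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id_1 INTEGER, id_2 INTEGER);
    CREATE TABLE table2 (id_1 INTEGER, id_2 INTEGER, col1 TEXT);
    INSERT INTO table1 VALUES (1, 100), (2, 200), (3, 300);
    -- 'a' matches table1 row 1 via id_1; 'b' matches table1 row 2 via id_2
    INSERT INTO table2 VALUES (1, 111, 'a'), (9, 200, 'b');
""")

TWO_LEFT_JOINS = """
    SELECT t1.*,
           COALESCE(t2.col1, t3.col1) AS t2_col1  -- take from t2; if NULL, from t3
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.id_1 = t2.id_1
    LEFT JOIN table2 t3 ON t1.id_2 = t3.id_2
    WHERE (t1.id_1 = t2.id_1 OR t1.id_2 = t3.id_2)  -- keep only matched rows
    ORDER BY t1.id_1
"""

OR_JOIN = """
    SELECT t1.*, t2.col1 AS t2_col1
    FROM table1 t1
    INNER JOIN table2 t2 ON (t1.id_1 = t2.id_1 OR t1.id_2 = t2.id_2)
    ORDER BY t1.id_1
"""

print(conn.execute(TWO_LEFT_JOINS).fetchall())  # [(1, 100, 'a'), (2, 200, 'b')]
print(conn.execute(OR_JOIN).fetchall())         # [(1, 100, 'a'), (2, 200, 'b')]
```

The two queries agree here because each table1 row matches at most one table2 row through each key. If one row matched different table2 rows through t2 and t3, the LEFT JOIN version would return the combinations of those matches and COALESCE would always prefer t2's value, so the results can differ from the OR join in that case.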

LEFT JOIN using _PARTITIONDATE

I'm currently using Standard SQL in BigQuery. I'm trying to join two tables, one of which is partitioned by day using the _PARTITIONDATE pseudo-column.
I tried this query:
SELECT
DISTINCT DATE(create_time) AS date,
user_id,
city_name,
transaction_id,
price
FROM
table_1 a
LEFT JOIN (SELECT user_id, city_name FROM table_2) b
ON (a.user_id = b.user_id AND DATE(create_time) = _PARTITIONDATE)
I've used this kind of JOIN (on _PARTITIONDATE) before and it worked, but for this particular query I get an error message:
Unrecognized name: _PARTITIONDATE
Can anyone tell me why this happened, and how could I solve this? Thanks in advance.
The issue is that the subquery does not select the _PARTITIONDATE pseudo-column from table_2, so the outer query can't recognize it:
SELECT user_id, city_name FROM table_2
To solve it, select it in the subquery:
SELECT
DISTINCT DATE(create_time) AS date,
a.user_id,
city_name,
transaction_id,
price
FROM
table_1 a
LEFT JOIN (SELECT _PARTITIONDATE AS pd, user_id, city_name FROM table_2) b
ON (a.user_id = b.user_id AND DATE(create_time) = pd)
Note that you need an alias such as pd, since it's a pseudo-column.
It probably worked before if you were joining the two tables directly, as in the following (you don't get the selectivity benefits in that case):
FROM
table_1 a
LEFT JOIN table_2 b
ON (a.user_id = b.user_id AND DATE(create_time) = _PARTITIONDATE)
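BigQuery can't be run locally, so this is only an analogy using Python's sqlite3 module: an ordinary column, part_date (a hypothetical name), stands in for the _PARTITIONDATE pseudo-column, which SQLite does not have. It shows the same scoping rule the answer relies on: a column the derived table b does not select cannot be referenced in the outer ON clause, while selecting it under an alias such as pd works. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (user_id INTEGER, create_time TEXT,
                          transaction_id INTEGER, price REAL);
    -- part_date is an ordinary column standing in for BigQuery's _PARTITIONDATE
    CREATE TABLE table_2 (user_id INTEGER, city_name TEXT, part_date TEXT);
    INSERT INTO table_1 VALUES (1, '2020-01-02 10:00:00', 7, 9.5);
    INSERT INTO table_2 VALUES (1, 'Jakarta', '2020-01-01'), (1, 'Bandung', '2020-01-02');
""")


def run(subquery, date_col):
    """Join table_1 to the derived table b, matching on user and date."""
    sql = f"""
        SELECT DISTINCT DATE(a.create_time) AS date, a.user_id, b.city_name,
               a.transaction_id, a.price
        FROM table_1 a
        LEFT JOIN ({subquery}) b
          ON (a.user_id = b.user_id AND DATE(a.create_time) = {date_col})
    """
    return conn.execute(sql).fetchall()


# Fails: b only exposes user_id and city_name, so part_date is out of scope.
error_message = None
try:
    run("SELECT user_id, city_name FROM table_2", "part_date")
except sqlite3.OperationalError as exc:
    error_message = str(exc)
print(error_message)

# Works: select the column inside the subquery and refer to it by its alias.
print(run("SELECT part_date AS pd, user_id, city_name FROM table_2", "pd"))
```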

How to select multiple distinct columns from a partitioned table in DolphinDB database

I have a partitioned table in a DolphinDB database. Two of the columns are Symbol and Name, and each symbol corresponds to a unique name. I need to select the distinct (Symbol, Name) pairs from the partitioned table.
I used the following script:
t0 = select distinct(Symbol) as Symbol from t order by Symbol
t0 = select Symbol, Name from lj(t0, t, `Symbol)
but got the following error message:
execution was completed with exception
A regular left table can't perform left join (lj), sorted left join (slj), full join (fj), asof join (aj), or window join (pwj, wj) with another distributed or segmented table.
In your case you can get around the problem with an equi join (ej). In DolphinDB, an equi join can be performed between a regular table and a partitioned table.
t0 = select distinct(Symbol) as Symbol from t order by Symbol
t0 = select Symbol, Name from ej(t0, t, `Symbol)
t0 = select * from t0 where prev(Symbol) ne Symbol
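The last line keeps a row only when its Symbol differs from the previous row's Symbol, so it removes duplicates only when equal symbols are adjacent, that is, when the rows are ordered by Symbol. A plain-Python sketch of that filter (an illustration of the logic, not DolphinDB code; the first row is assumed to be kept, and the sample rows are made up):

```python
def drop_consecutive_duplicates(rows):
    """Keep a (symbol, name) row only if its symbol differs from the previous row's.

    Mirrors the idea of `where prev(Symbol) ne Symbol`; the first row is always kept.
    """
    kept = []
    prev_symbol = None
    for symbol, name in rows:
        if symbol != prev_symbol:
            kept.append((symbol, name))
        prev_symbol = symbol
    return kept


rows = [("AAPL", "Apple"), ("IBM", "IBM Corp"), ("AAPL", "Apple"), ("IBM", "IBM Corp")]
print(drop_consecutive_duplicates(rows))          # unsorted input: all four rows survive
print(drop_consecutive_duplicates(sorted(rows)))  # [('AAPL', 'Apple'), ('IBM', 'IBM Corp')]
```

If the rows coming out of ej are not ordered by Symbol, sorting them before applying the prev() filter would be the safer choice.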

PSQL - Select size of tables for both partitioned and normal

Thanks in advance for any help with this; it is highly appreciated.
I have a Greenplum database and I want to select the table sizes of the 10 largest tables. This isn't a problem using the query below:
select
sotaidschemaname schema_name
,sotaidtablename table_name
,pg_size_pretty(sotaidtablesize) table_size
from gp_toolkit.gp_size_of_table_and_indexes_disk
order by 3 desc
limit 10
;
However, I have several partitioned tables in my database, and with the above SQL these show up as all of their child tables, split into small fragments (though I know together they make up the largest 2 tables). Is there a way to write a script that selects tables (partitioned or not) with their total size?
Note: I'd be happy to include some sort of join where I specify the partitioned table names explicitly, as there are only 2 partitioned tables. However, I would still need to take the top 10 (where I cannot assume the partitioned tables are among them), and I cannot list any other table names since there are nearly a thousand of them.
Thanks again,
Vinny.
Your friends here are the pg_relation_size() function, which returns a relation's size, and the pg_class, pg_namespace and pg_partitions catalogs, joined together like this:
select schemaname,
tablename,
sum(size_mb) as size_mb,
sum(num_partitions) as num_partitions
from (
select coalesce(p.schemaname, n.nspname) as schemaname,
coalesce(p.tablename, c.relname) as tablename,
1 as num_partitions,
pg_relation_size(n.nspname || '.' || c.relname)/1000000. as size_mb
from pg_class as c
inner join pg_namespace as n on c.relnamespace = n.oid
left join pg_partitions as p on c.relname = p.partitiontablename and n.nspname = p.partitionschemaname
) as q
group by 1, 2
order by 3 desc
limit 10;
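To see how the query folds child partitions into their parent, here is a sketch with Python's sqlite3 module. SQLite has no pg_class, pg_namespace, pg_partitions or pg_relation_size(), so two hypothetical tables stand in for them: relations (one row per relation with a precomputed size) and partitions (mapping each child to its parent, with the same column names as pg_partitions). The sizes are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical stand-in for pg_class/pg_namespace plus pg_relation_size()
    CREATE TABLE relations (nspname TEXT, relname TEXT, size_bytes INTEGER);
    -- hypothetical stand-in for Greenplum's pg_partitions view
    CREATE TABLE partitions (schemaname TEXT, tablename TEXT,
                             partitionschemaname TEXT, partitiontablename TEXT);
    INSERT INTO relations VALUES
        ('public', 'small_table', 5000000),
        ('public', 'sales_1_prt_1', 4000000),
        ('public', 'sales_1_prt_2', 4000000),
        ('public', 'sales', 0);
    INSERT INTO partitions VALUES
        ('public', 'sales', 'public', 'sales_1_prt_1'),
        ('public', 'sales', 'public', 'sales_1_prt_2');
""")

TOP_TABLES = """
    SELECT schemaname, tablename,
           SUM(size_mb) AS size_mb,
           SUM(num_partitions) AS num_partitions
    FROM (
        -- a child partition is reported under its parent's name;
        -- any other relation keeps its own name
        SELECT COALESCE(p.schemaname, r.nspname) AS schemaname,
               COALESCE(p.tablename, r.relname) AS tablename,
               1 AS num_partitions,
               r.size_bytes / 1000000.0 AS size_mb
        FROM relations AS r
        LEFT JOIN partitions AS p
               ON r.relname = p.partitiontablename
              AND r.nspname = p.partitionschemaname
    ) AS q
    GROUP BY schemaname, tablename
    ORDER BY size_mb DESC
    LIMIT 10
"""

print(conn.execute(TOP_TABLES).fetchall())
# [('public', 'sales', 8.0, 3), ('public', 'small_table', 5.0, 1)]
```

In this mock data the parent table also has its own row, so num_partitions counts it too; sales reports 3 for two child partitions.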
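Alternatively, the following query sums the partition sizes per parent table and unions them with the sizes of the tables that are neither partitions nor partitioned parents:
select * from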
(
select schemaname,tablename,
pg_relation_size(schemaname||'.'||tablename) as Size_In_Bytes
from pg_tables
where schemaname||'.'||tablename not in (select schemaname||'.'||partitiontablename from pg_partitions)
and schemaname||'.'||tablename not in (select distinct schemaname||'.'||tablename from pg_partitions )
union all
select schemaname,tablename,
sum(pg_relation_size(schemaname||'.'||partitiontablename)) as Size_In_Bytes
from pg_partitions
group by 1,2) as foo
where Size_In_Bytes >= '0' order by 3 desc;

Error in Hive Query while joining tables

I can't get the join conditions in the Hive query below to work.
I have 3 tables and I want to join them. I'm trying the following, but I get an error:
FAILED: Error in semantic analysis: Line 3:40 Both left and right aliases encountered in JOIN 'visit_date'
select t1.*, t99.* from table1 t1 JOIN
(select v3.*, t3.* from table2 v3 JOIN table3 t3 ON
( v3.AS_upc= t3.upc_no AND v3.start_dt <= t3.visit_date AND v3.end_dt >= t3.visit_date AND v3.adv_price <= t3.comp_price ) ) t99 ON
(t1.comp_store_id = t99.cpnumber AND t1.AS_store_nbr = t99.store_no);
EDITED based on help from FuzzyTree:
1st:
We tried editing the above query to use BETWEEN in a WHERE clause, but the query returns no output.
But if we remove the date BETWEEN clause, we get some output based on "v3.adv_price <= t3.comp_price", just without the date filter.
select t1.*, t99.* from table1 t1 JOIN
(select v3.*, t3.* from table2 v3 JOIN table3 t3 on (v3.AS_upc= t3.upc_no)
where v3.adv_price <= t3.comp_price
) t99 ON
(t1.comp_store_id = t99.cpnumber AND t1.AS_store_nbr = t99.store_no);
2nd:
Next we tried filtering on only one date:
select t1.*, t99.* from table1 t1 JOIN
(select v3.*, t3.* from table2 v3 JOIN table3 t3 on (v3.AS_upc= t3.upc_no)
where v3.adv_price <= t3.comp_price and v3.start_dt <= t3.visit_date
) t99 ON
(t1.comp_store_id = t99.cpnumber AND t1.AS_store_nbr = t99.store_no);
Now it shows some results, but if we apply both the start and end date filters, it shows no results.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
Only equality joins, outer joins, and left semi joins are supported in
Hive. Hive does not support join conditions that are not equality
conditions as it is very difficult to express such conditions as a
map/reduce job.
Try moving your inequalities to the WHERE clause:
select t1.*, t99.* from table1 t1 JOIN
(select v3.*, t3.* from table2 v3 JOIN table3 t3 on (v3.AS_upc= t3.upc_no)
where t3.visit_date between v3.start_dt and v3.end_dt
and v3.adv_price <= t3.comp_price
) t99 ON
(t1.comp_store_id = t99.cpnumber AND t1.AS_store_nbr = t99.store_no);
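A sketch of the suggested query with Python's sqlite3 module, using made-up rows, where only the equality condition stays in ON and the date range and price check move to WHERE. One assumption worth checking in the real data: the dates here are ISO 'YYYY-MM-DD' strings, which compare correctly as strings; if the Hive columns are strings in another format (for example 'MM/DD/YYYY'), BETWEEN compares them character by character and can silently filter out rows that should match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (comp_store_id INTEGER, AS_store_nbr INTEGER);
    CREATE TABLE table2 (AS_upc TEXT, start_dt TEXT, end_dt TEXT,
                         adv_price REAL, cpnumber INTEGER);
    CREATE TABLE table3 (upc_no TEXT, visit_date TEXT, comp_price REAL, store_no INTEGER);
    INSERT INTO table1 VALUES (10, 5);
    INSERT INTO table2 VALUES ('u1', '2014-01-01', '2014-01-31', 2.0, 10);
    INSERT INTO table3 VALUES
        ('u1', '2014-01-15', 2.5, 5),  -- inside the date range, price check passes
        ('u1', '2014-02-15', 2.5, 5),  -- outside the date range
        ('u1', '2014-01-20', 1.5, 5);  -- inside the range, but comp_price too low
""")

QUERY = """
    SELECT t1.*, t99.*
    FROM table1 t1
    JOIN (
        SELECT v3.*, t3.*
        FROM table2 v3
        JOIN table3 t3 ON (v3.AS_upc = t3.upc_no)              -- equality only in ON
        WHERE t3.visit_date BETWEEN v3.start_dt AND v3.end_dt  -- inequalities in WHERE
          AND v3.adv_price <= t3.comp_price
    ) t99
      ON (t1.comp_store_id = t99.cpnumber AND t1.AS_store_nbr = t99.store_no)
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # one row, for the 2014-01-15 visit
```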
