TERADATA Compare tables for matches - comparison

I have a question about comparing data between tables.
I want to compare data from the same table with different filter conditions:
First Version:
select *
from PPT_TIER4_FTSRB.AUTO_SOURCE_ACCOUNT
WHERE BUSINESS_DATE = DATE '2022-05-31'
AND GRAIN ='ACCOUNT'
AND LAYER = 'IDL'
AND SOURCE_CD = 'MTMB'
Second Version:
select *
from PPT_TIER4_FTSRB.AUTO_SOURCE_ACCOUNT
WHERE BUSINESS_DATE = DATE '2022-05-31'
AND GRAIN ='ACCOUNT'
AND LAYER = 'ACQ'
AND SOURCE_CD = 'MTMB'
As you can see, the only difference between the two is LAYER = 'IDL' in the first version and LAYER = 'ACQ' in the second.
I want to see which records match between the two, excluding the LAYER column (because it will always be different).
I tried an inner join, but it runs for a very long time:
SELECT *
FROM
( select *
from PPT_TIER4_FTSRB.AUTO_SOURCE_ACCOUNT
WHERE BUSINESS_DATE = DATE '2022-05-31'
AND GRAIN ='ACCOUNT'
AND LAYER = 'IDL'
AND SOURCE_CD = 'MTMB'
) A
INNER JOIN
( select *
from PPT_TIER4_FTSRB.AUTO_SOURCE_ACCOUNT
WHERE BUSINESS_DATE = DATE '2022-05-31'
AND GRAIN ='ACCOUNT'
AND LAYER = 'ACQ'
AND SOURCE_CD = 'MTMB'
) B
ON A.BUSINESS_DATE = B.BUSINESS_DATE
AND A.GRAIN =B.GRAIN
AND A.SOURCE_CD = B.SOURCE_CD

This is because a join for your purposes would need a 1:1 relationship between the rows being joined. You don't appear to have that, and haven't given any example data for us to derive one.
For example:
sample 1 has rows 1, 2, 3
sample 2 has rows a, b, c
your join returns 1a, 1b, 1c, 2a, 2b, 2c, 3a, 3b, 3c
That's effectively a CROSS JOIN, which happens because the columns you're joining on are always the same on every row.
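The row multiplication is easy to reproduce on a toy table. The sketch below uses Python's built-in sqlite3 module rather than Teradata, and a hypothetical account_id column standing in for whatever actually identifies a row. It counts the pairs produced by the question's join keys, then the pairs produced when that distinguishing key is added.

```python
import sqlite3

# Hypothetical miniature of PPT_TIER4_FTSRB.AUTO_SOURCE_ACCOUNT; account_id is
# an assumed column standing in for whatever actually identifies an account.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE auto_source_account ("
    "business_date TEXT, grain TEXT, layer TEXT, source_cd TEXT, account_id INTEGER)"
)
con.executemany(
    "INSERT INTO auto_source_account VALUES (?, ?, ?, ?, ?)",
    [("2022-05-31", "ACCOUNT", layer, "MTMB", acct)
     for layer in ("IDL", "ACQ") for acct in (1, 2, 3)],
)

JOIN_SQL = """
    SELECT COUNT(*)
    FROM (SELECT * FROM auto_source_account WHERE layer = 'IDL') a
    JOIN (SELECT * FROM auto_source_account WHERE layer = 'ACQ') b
      ON a.business_date = b.business_date
     AND a.grain = b.grain
     AND a.source_cd = b.source_cd
"""

# Join keys from the question: identical on every row, so 3 x 3 = 9 pairs.
cross_count = con.execute(JOIN_SQL).fetchone()[0]
# Adding a column that distinguishes rows gives one pair per account.
keyed_count = con.execute(JOIN_SQL + " AND a.account_id = b.account_id").fetchone()[0]
print(cross_count, keyed_count)  # 9 3
```

On a real table with thousands of rows per layer, the same effect produces millions of pairs, which explains why the original query runs for so long.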
My advice would be to select all the rows in question and sort them, then look for any patterns you want to analyse with joins or aggregates:
SELECT *
FROM ppt_tier4_ftsrb.auto_source_account
WHERE business_date = DATE '2022-05-31'
AND grain ='ACCOUNT'
AND layer IN ('ACQ', 'IDL')
AND source_cd = 'MTMB'
ORDER BY layer -- , followed by whichever other columns you want to compare
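If the goal is only to list rows that are identical apart from LAYER, a set operator avoids the join altogether. You select every column except LAYER from each version and INTERSECT the two results; EXCEPT (MINUS in Teradata) lists rows present in one version only. This is an alternative to the sorting approach above. It is shown here as a minimal sketch with Python's sqlite3 module, and the account_id and balance columns and sample rows are hypothetical. INTERSECT and EXCEPT also remove duplicate rows from their results.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE auto_source_account (business_date TEXT, grain TEXT, "
    "layer TEXT, source_cd TEXT, account_id INTEGER, balance INTEGER)"
)
# Hypothetical sample rows: account 1 matches, account 2 differs, 3 and 4 are one-sided.
con.executemany(
    "INSERT INTO auto_source_account VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2022-05-31", "ACCOUNT", "IDL", "MTMB", 1, 100),
        ("2022-05-31", "ACCOUNT", "IDL", "MTMB", 2, 200),
        ("2022-05-31", "ACCOUNT", "IDL", "MTMB", 3, 300),
        ("2022-05-31", "ACCOUNT", "ACQ", "MTMB", 1, 100),
        ("2022-05-31", "ACCOUNT", "ACQ", "MTMB", 2, 250),
        ("2022-05-31", "ACCOUNT", "ACQ", "MTMB", 4, 400),
    ],
)

# Every column except LAYER, filtered the same way as in the question.
VERSION = """
    SELECT business_date, grain, source_cd, account_id, balance
    FROM auto_source_account
    WHERE business_date = '2022-05-31' AND grain = 'ACCOUNT'
      AND source_cd = 'MTMB' AND layer = ?
"""

# Rows identical in both versions (ignoring LAYER).
matches = sorted(con.execute(f"{VERSION} INTERSECT {VERSION}", ("IDL", "ACQ")))
# Rows in the IDL version with no identical ACQ row.
only_idl = sorted(con.execute(f"{VERSION} EXCEPT {VERSION}", ("IDL", "ACQ")))
print(matches)                       # [('2022-05-31', 'ACCOUNT', 'MTMB', 1, 100)]
print([row[3] for row in only_idl])  # [2, 3]
```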

Related

I cannot load data into a VirtualStringTree

I have a form with a TPageControl that has two tabs. Each tab contains a TVirtualStringTree, and I have defined these two structures:
typedef struct tagTTreeMun
{
AnsiString Municipio;
int Padron;
int Censo;
double Relacion;
int Codigo;
} TTreeMun, *PTreeMun;
typedef struct tagTTreePro
{
AnsiString Proceso;
int Padron;
int Censo;
double Relacion;
int Codigo;
}TTreePro, *PTreePro;
I know they are almost identical; I'll explain why later. The first tree is loaded from four nested queries without any problem, but the second one simply won't load.
To load the second tree I need two queries:
SELECT DISTINCT Date FROM Elections ORDER BY Date DESC
The Date field contains only the year, and this query runs without any problem.
SELECT A.Codigo, B.Name, SUM (C.Padron) Padron, SUM (C.Censo) Census, A.Closed
FROM Elections A, Process B, HisElec C
WHERE A.CodPrv = (SELECT Literal FROM Installation WHERE Label = 'Province')
AND A.CodPrv = B.CodPrv AND B.Codigo = A.Process AND A.Closed = 1
AND A.CodPrv = C.CodPrv AND A.Codigo = C.Election
AND A.Date = :Date
GROUP BY 1, 2, 5
UNION
SELECT A.Codigo, B.Name, SUM (C.Padron) Padron,
(SELECT SUM (Census) FROM Tables WHERE CodPrv = (SELECT Literal FROM Installation WHERE Label = 'Province')) Census,
A.Closed
FROM Elections A, Process B, Dl01 C
WHERE A.CodPrv = (SELECT Literal FROM Installation WHERE Label = 'Province')
AND A.CodPrv = B.CodPrv AND A.Process = B.Code AND A.Closed = 0
AND A.Date = :Date
GROUP BY 1, 2, 5
ORDER BY 1 DESC, 3
This query also runs without errors. The problem appears when I try to load that data into the corresponding TVirtualStringTree:
PTreePro DatPro;
PVirtualNode Node1, Node2, Node3, Node4;
LisPro->NodeDataSize = sizeof(TTreePro);
LisPro->BeginUpdate();
LisPro->Clear();
// One top-level node per year returned by the first query (qTemp1)
for (; !qTemp1->Eof; qTemp1->Next())
{
    Node1 = LisPro->AddChild(NULL);
    DatPro = (PTreePro)LisPro->GetNodeData(Node1);
    DatPro->Proceso = IntToStr(qTemp1->FieldByName("Date")->AsInteger);
    // Re-open the detail query (qTemp2) for this year
    qTemp2->Close();
    qTemp2->ParamByName("Date")->AsInteger = qTemp1->FieldByName("Date")->AsInteger;
    qTemp2->Open();
    // Child nodes come from qTemp2, so the inner loop advances qTemp2
    for (; !qTemp2->Eof; qTemp2->Next())
    {
        Node2 = LisPro->AddChild(Node1);
        DatPro = (PTreePro)LisPro->GetNodeData(Node2);
        DatPro->Proceso = qTemp2->FieldByName("Name")->AsString;
        [...]
    }
}
When creating Node1, the lines Node1 = ... and DatPro = (PTreePro) ... execute without any obvious problem, except that Node1 is NULL after AddChild returns. From then on everything fails: as soon as I try to assign a value to the node data, I get a runtime error.
I have tried loading each tree in a separate function to isolate the code; I have tried using a single structure for both trees (in the end they are identical) as well as two structures, as in the example; and I have tried changing the order of execution. Whatever I try, I cannot load both trees: LisPro ALWAYS fails the same way.
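The loading code follows a master-detail pattern. It creates one top-level node per row of the year query, then re-opens the second query for that year to create the child nodes. The sketch below shows that loop shape in Python, using a hypothetical Cursor class that imitates Eof/Next() and some made-up rows. Each loop advances only its own query. The sketch illustrates the nesting only: it does not model VirtualTreeView and does not explain why AddChild returns NULL.

```python
class Cursor:
    """Hypothetical stand-in for a TQuery-like dataset: eof flag, next(), current row."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.pos = 0

    @property
    def eof(self):
        return self.pos >= len(self.rows)

    def next(self):
        self.pos += 1

    def current(self):
        return self.rows[self.pos]


def build_tree(years_query, open_processes):
    """Return [(year_label, [child names])], one top-level node per year row."""
    tree = []
    while not years_query.eof:               # outer loop walks the year query only
        year = years_query.current()
        node1 = (str(year), [])              # like IntToStr(...) on the Date field
        tree.append(node1)
        processes = open_processes(year)     # like Close / ParamByName / Open on qTemp2
        while not processes.eof:             # inner loop walks the detail query only
            node1[1].append(processes.current())
            processes.next()
        years_query.next()
    return tree


# Made-up sample rows, purely for illustration.
by_year = {2019: ["Process A", "Process B"], 2015: ["Process C"]}
tree = build_tree(Cursor([2019, 2015]), lambda y: Cursor(by_year[y]))
print(tree)  # [('2019', ['Process A', 'Process B']), ('2015', ['Process C'])]
```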

PSQL query to compare data from two instances of a database

I have two instances of the same database from different days. All tables from one day are called tableA* and those from the other day tableB*. I would like to compare the data to see what has changed, selecting all rows that don't match exactly. For example, if one value differs between tableA1 and tableB1, I would like to select the corresponding row from table A marked as 'new' and the one from table B marked as 'deleted'. I tried a query like this:
SELECT 'new', ta1.name, ta2.name, ta3.name, ta4.name, ta5.name
FROM tableA1 ta1
LEFT JOIN tableA2 ta2 ON ta1.ta2_id = ta2.id
LEFT JOIN tableA3 ta3 ON ta1.ta3_id = ta3.id
LEFT JOIN tableA4 ta4 ON ta1.ta4_id = ta4.id
LEFT JOIN tableA5 ta5 ON ta5.ta1_id = ta1.id WHERE NOT EXISTS
(SELECT tb1.name, tb2.name, tb3.name, tb4.name, tb5.name
FROM tableB1 tb1
LEFT JOIN tableB2 tb2 ON tb1.tb2_id = tb2.id
LEFT JOIN tableB3 tb3 ON tb1.tb3_id = tb3.id
LEFT JOIN tableB4 tb4 ON tb1.tb4_id = tb4.id
LEFT JOIN tableB5 tb5 ON tb5.tb1_id = tb1.id WHERE
tb1.name = ta1.name AND
tb2.name = ta2.name AND
tb3.name = ta3.name AND
tb4.name = ta4.name AND
tb5.name = ta5.name)
UNION
SELECT 'deleted', tb1.name, tb2.name, tb3.name, tb4.name, tb5.name
FROM tableB1 tb1
LEFT JOIN tableB2 tb2 ON tb1.tb2_id = tb2.id
LEFT JOIN tableB3 tb3 ON tb1.tb3_id = tb3.id
LEFT JOIN tableB4 tb4 ON tb1.tb4_id = tb4.id
LEFT JOIN tableB5 tb5 ON tb5.tb1_id = tb1.id WHERE NOT EXISTS
(SELECT ta1.name, ta2.name, ta3.name, ta4.name, ta5.name
FROM tableA1 ta1
LEFT JOIN tableA2 ta2 ON ta1.ta2_id = ta2.id
LEFT JOIN tableA3 ta3 ON ta1.ta3_id = ta3.id
LEFT JOIN tableA4 ta4 ON ta1.ta4_id = ta4.id
LEFT JOIN tableA5 ta5 ON ta5.ta1_id = ta1.id WHERE
tb1.name = ta1.name AND
tb2.name = ta2.name AND
tb3.name = ta3.name AND
tb4.name = ta4.name AND
tb5.name = ta5.name)
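I hoped that by building the same structure on both sides and comparing all the values, I would get the expected result. But even when the two databases are identical, a lot of rows are selected.
Found the problem. Comparing two NULL values with = does not return TRUE (it returns NULL, which WHERE treats as not satisfied), so rows containing NULLs, including those produced by the LEFT JOINs, never matched. The query itself should otherwise be fine; I needed to add conditions that handle NULL values, for example by using IS NOT DISTINCT FROM instead of =.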
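The NULL effect can be reproduced with two identical single-table snapshots. This sketch uses Python's sqlite3 module and hypothetical tables table_a and table_b in place of the five-way join. SQLite writes null-safe equality as IS, which corresponds to PostgreSQL's IS NOT DISTINCT FROM.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table_a (name TEXT, extra TEXT)")
con.execute("CREATE TABLE table_b (name TEXT, extra TEXT)")
snapshot = [("alice", "x"), ("bob", None)]  # identical data, one NULL value
con.executemany("INSERT INTO table_a VALUES (?, ?)", snapshot)
con.executemany("INSERT INTO table_b VALUES (?, ?)", snapshot)


def new_rows(op):
    """Rows of table_a with no matching table_b row, comparing with `op`."""
    return con.execute(f"""
        SELECT 'new', a.name, a.extra FROM table_a a
        WHERE NOT EXISTS (SELECT 1 FROM table_b b
                          WHERE b.name {op} a.name AND b.extra {op} a.extra)
    """).fetchall()


plain = new_rows("=")       # NULL = NULL is not true, so 'bob' looks new
null_safe = new_rows("IS")  # NULL IS NULL is true, so nothing differs
print(plain)      # [('new', 'bob', None)]
print(null_safe)  # []
```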

Multiply after joining data in Pig

I am trying to multiply two fields and take their sum after joining three tables in Pig. However, I keep getting this error:
<file loyalty_program.pig, line 30, column 74> (Name: Multiply Type: null Uid: null)incompatible types in Multiply Operator left hand side:bag :tuple(new_details1::new_details::potential_customers::num_of_orders:long) right hand side:bag :tuple(products::price:int)
-- load the data sets
orders = LOAD '/dualcore/orders' AS (order_id:int,
cust_id:int,
order_dtm:chararray);
details = LOAD '/dualcore/order_details' AS (order_id:int,
prod_id:int);
products = LOAD '/dualcore/products' AS (prod_id:int,
brand:chararray,
name:chararray,
price:int,
cost:int,
shipping_wt:int);
recent = FILTER orders by order_dtm matches '2012-.*$';
customer = GROUP recent by cust_id;
cust_orders = FOREACH customer GENERATE group as cust_id, (int)COUNT(recent) as num_of_orders;
potential_customers = FILTER cust_orders by num_of_orders>=5;
new_details = join potential_customers by cust_id, recent by cust_id;
new_details1 = join new_details by order_id, details by order_id;
new_details2 = join new_details1 by prod_id, products by prod_id;
--DESCRIBE new_details2;
final_details = FOREACH new_details2 GENERATE potential_customers::cust_id, potential_customers::num_of_orders as num_of_orders,recent::order_id as order_id,recent::order_dtm,details::prod_id,products::brand,products::name,products::price as price,products::cost,products::shipping_wt;
grouped_data = GROUP final_details by cust_id;
member = FOREACH grouped_data GENERATE SUM(final_details.num_of_orders * final_details.price) ;
lim = limit member 10;
dump lim;
I even cast the result of COUNT to int, but it still throws this error. I have no idea how to fix it.
OK. As I understand it, you first want to multiply the number of orders by the price of each product, and then take the total SUM of those multiplied values.
It is a slightly strange requirement, but you can use the approach below.
The error occurs because, after GROUP, final_details.num_of_orders and final_details.price are bags, and the * operator cannot multiply two bags. All you need to do is compute the multiplication per row in the final_details FOREACH statement itself, and then apply SUM to that multiplied field.
Based on your LOAD statements, I created the following input files:
main_orders.txt
6666,100,2012-01-01
7777,101,2012-09-02
8888,100,2012-01-09
9999,101,2012-12-08
6666,101,2012-09-02
9999,100,2012-07-12
9999,100,2012-08-01
6666,100,2012-01-02
7777,100,2012-09-09
orders_details.txt
6666,6000
7777,7000
8888,8000
9999,9000
main_products.txt
6000,Nike,Shoes,3000,3000,1
7000,Adidas,Cap,1000,1000,1
8000,Rebook,Shoes,4000,4000,1
9000,Puma,Shoes,25000,2500,1
Here is the code:
orders = LOAD '/user/cloudera/inputfiles/main_orders.txt' USING PigStorage(',') AS (order_id:int,cust_id:int,order_dtm:chararray);
details = LOAD '/user/cloudera/inputfiles/orders_details.txt' USING PigStorage(',') AS (order_id:int,prod_id:int);
products = LOAD '/user/cloudera/inputfiles/main_products.txt' USING PigStorage(',') AS(prod_id:int,brand:chararray,name:chararray,price:int,cost:int,shipping_wt:int);
recent = FILTER orders by order_dtm matches '2012-.*';
customer = GROUP recent by cust_id;
cust_orders = FOREACH customer GENERATE group as cust_id, (int)COUNT(recent) as num_of_orders;
potential_customers = FILTER cust_orders by num_of_orders>=5;
new_details = join potential_customers by cust_id, recent by cust_id;
new_details1 = join new_details by order_id, details by order_id;
new_details2 = join new_details1 by prod_id, products by prod_id;
DESCRIBE new_details2;
-- multiplication is done in the last field (multiplied_price)
final_details = FOREACH new_details2 GENERATE potential_customers::cust_id, potential_customers::num_of_orders as num_of_orders,recent::order_id as order_id,recent::order_dtm,details::prod_id,products::brand,products::name,products::price as price,products::cost,products::shipping_wt, (potential_customers::num_of_orders * products::price) as multiplied_price;
dump final_details;
grouped_data = GROUP final_details by cust_id;
member = FOREACH grouped_data GENERATE SUM(final_details.multiplied_price) ;
lim = limit member 10;
dump lim;
For clarity, I am also dumping the output of the final_details FOREACH statement:
(100,6,6666,2012-01-01,6000,Nike,Shoes,3000,3000,1,18000)
(100,6,6666,2012-01-02,6000,Nike,Shoes,3000,3000,1,18000)
(100,6,7777,2012-09-09,7000,Adidas,Cap,1000,1000,1,6000)
(100,6,8888,2012-01-09,8000,Rebook,Shoes,4000,4000,1,24000)
(100,6,9999,2012-07-12,9000,Puma,Shoes,25000,2500,1,150000)
(100,6,9999,2012-08-01,9000,Puma,Shoes,25000,2500,1,150000)
The final output is:
(366000)
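To cross-check the arithmetic, you can replay the same pipeline in plain Python on the sample rows above. The steps are: count 2012 orders per customer, keep customers with at least 5 orders, multiply num_of_orders by each ordered product's price, and sum per customer. This is only a sketch of the logic, not Pig code. The dictionaries stand in for the inner joins, which works only because order_id and prod_id are unique in the sample files.

```python
from collections import Counter, defaultdict

# Rows copied from main_orders.txt: (order_id, cust_id, order_dtm)
orders = [
    (6666, 100, "2012-01-01"), (7777, 101, "2012-09-02"), (8888, 100, "2012-01-09"),
    (9999, 101, "2012-12-08"), (6666, 101, "2012-09-02"), (9999, 100, "2012-07-12"),
    (9999, 100, "2012-08-01"), (6666, 100, "2012-01-02"), (7777, 100, "2012-09-09"),
]
# orders_details.txt: order_id -> prod_id
details = {6666: 6000, 7777: 7000, 8888: 8000, 9999: 9000}
# main_products.txt: prod_id -> price
prices = {6000: 3000, 7000: 1000, 8000: 4000, 9000: 25000}

recent = [o for o in orders if o[2].startswith("2012-")]         # FILTER ... matches '2012-.*'
num_of_orders = Counter(cust for _, cust, _ in recent)          # GROUP + COUNT
potential = {c: n for c, n in num_of_orders.items() if n >= 5}  # FILTER num_of_orders >= 5

totals = defaultdict(int)
for order_id, cust, _ in recent:                                # stands in for the three joins
    if cust in potential and order_id in details:
        # multiply per row first (multiplied_price), then SUM per customer
        totals[cust] += potential[cust] * prices[details[order_id]]
print(dict(totals))  # {100: 366000}
```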
This code may help, but please clarify your requirement.

How can this SQL subquery be expressed using Squeel/ActiveRecord?

I'm having a bit of brain fade today and can't figure out how I should express this SQL query correctly using ActiveRecord/Squeel/ARel:
SELECT `d1`.* FROM `domain_names` d1
WHERE `d1`.`created_at` = (
SELECT MAX(`d2`.`created_at`)
FROM `domain_names` d2
WHERE `d2`.`owner_type` = `d1`.`owner_type`
AND `d2`.`owner_id` = `d1`.`owner_id`
AND `d2`.`key` = `d1`.`key`
)
Any ideas?
Background: The DomainName model has a polymorphic owner as well as a "key" field that allows owners to have many different types of domain name. The query above fetches the latest domain name for each unique [owner_type, owner_id, key] tuple.
Edit:
Here's the same query using JOIN:
SELECT `d1`.* FROM `domain_names` d1
JOIN (
SELECT `owner_type`, `owner_id`, `key`, MAX(`created_at`) max_created_at
FROM `domain_names`
GROUP BY `owner_type`, `owner_id`, `key`
) d2
ON `d2`.`owner_type` = `d1`.`owner_type`
AND `d2`.`owner_id` = `d1`.`owner_id`
AND `d2`.`key` = `d1`.`key`
WHERE `d1`.`created_at` = `d2`.`max_created_at`
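Both forms should return the latest row for each (owner_type, owner_id, key) tuple. The minimal check below uses Python's sqlite3 module and made-up rows to run the two queries side by side. The backticks are dropped, and key is double-quoted because it is an SQL keyword. The check does not show the ActiveRecord/Squeel translation itself. If two rows in a group share the same maximum created_at, both forms return both rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE domain_names (id INTEGER, owner_type TEXT, '
            'owner_id INTEGER, "key" TEXT, created_at TEXT)')
# Made-up rows: id 2 supersedes id 1 for the same (owner_type, owner_id, key).
con.executemany("INSERT INTO domain_names VALUES (?, ?, ?, ?, ?)", [
    (1, "User", 1, "primary", "2013-01-01"),
    (2, "User", 1, "primary", "2013-02-01"),
    (3, "User", 1, "alias", "2013-01-15"),
    (4, "Company", 1, "primary", "2013-01-10"),
])

correlated = """
    SELECT d1.id FROM domain_names d1
    WHERE d1.created_at = (
      SELECT MAX(d2.created_at) FROM domain_names d2
      WHERE d2.owner_type = d1.owner_type
        AND d2.owner_id = d1.owner_id
        AND d2."key" = d1."key")
"""

joined = """
    SELECT d1.id FROM domain_names d1
    JOIN (SELECT owner_type, owner_id, "key", MAX(created_at) AS max_created_at
          FROM domain_names
          GROUP BY owner_type, owner_id, "key") d2
      ON d2.owner_type = d1.owner_type
     AND d2.owner_id = d1.owner_id
     AND d2."key" = d1."key"
    WHERE d1.created_at = d2.max_created_at
"""

ids_correlated = sorted(row[0] for row in con.execute(correlated))
ids_joined = sorted(row[0] for row in con.execute(joined))
print(ids_correlated, ids_joined)  # [2, 3, 4] [2, 3, 4]
```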

JPQL join tables: duplicate column names

I have multiple tables in a join, and every table has an ID column, so the resulting join contains many ID columns. How can I access a specific ID column with the Criteria API?
ParameterExpression<A> idParam = criteriaBuilder.parameter(A.class, PARAM_NAME);
Subquery<B> sq = query.subquery(B.class);
Root<B> root = sq.from(B.class);
Join<C, D> joinTogether = root.join("memberX").join("memberY");
sq.select(root);
sq.where(criteriaBuilder.and(criteriaBuilder.equal(joinTogether.get("id"), idParam), criteriaBuilder.equal(parentQuery.get("id"), root.get("id"))));
The problem is that the resulting SQL contains:
SELECT 1 FROM E t6, B t5, C t4, D t3 WHERE ((( = paramName) AND (t0.ID = t5.ID)) AND (((t6.memberZ = t5.ID) AND (t4.ID = t6.memberX)) AND (t3.ID = t4.memberY))))
Table E (t6) is an additional join table between tables B and C, and t0 refers to the parent query. Instead of t3.ID = :paramName, EclipseLink generates nothing before the first equals sign (paramName is the value of the constant PARAM_NAME). My guess is that the "id" column could refer to any of the tables, so EclipseLink cannot decide which table I mean.
How can I change that?
Thank you
André
