I have similar query:
select
a,
b,
c,
d,
e,
(select somevalue from differenttable where report.a = two.a) as two,
(select somevalue from differenttable2 where report.a = three.a) as three,
(select sum(kol*CAST(price AS BIGINT)) from "otherreport"('01.01.2010', '01.10.2010') as sum
from "report"('01.01.2010', '01.10.2010')
The query works in its original form, but the problem is that in some of the databases(old versions) that i query some of the fields a, b, c, d, e are missing in the result of 'report' stored procedure and i get an error.What would be the best way to handle this ?
I was thinking of select * from reports, but i can not add the subqueries. May be i can not find the right syntaxis to do this.Other way would be to check if the columns exists before
making the select...
There are several things you can do:
Use an aliased SELECT * followed by the subqueries. Instead of doing a simple SELECT * you alias the selectable stored procedure and use myAlias.*:
SELECT
r.*,
(select somevalue from differenttable where report.a = two.a) as two,
...
from "report"('01.01.2010', '01.10.2010') r
Note that this has its own downsides, like moving the "missing columns" problem from the query to the consumer of the query.
Upgrade your databases so that the stored procedure matches your expectations.
Dynamically build the query from the procedure metadata. This is pretty complex, and I currently don't have time to give a full example of this.
Related
I am trying to build bigquery stored procedure where I need to pass the table name as a parameter. My code is:
CREATE OR REPLACE PROCEDURE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES` (table_name STRING)
BEGIN
----step 1
CREATE OR REPLACE TABLE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES_01` AS
SELECT DISTINCT XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT
FROM table_name XX
LEFT JOIN
(
SELECT DISTINCT HH_ID, ECR_PRTY_ID
FROM `analytics-mkt-cleanroom.Master.EDW_ECR_ECR_MAPPING`
WHERE HH_ID NOT LIKE 'U%'
AND ECR_PRTY_ID IS NOT NULL
)A
ON XX.HH_ID = A.HH_ID----one (HH) to many (ecr)
;
END;
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES(`analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample`);
I followed a couple of similar questions here and here, tried writing an EXECUTE IMMEDIATE version of the above but not able to work out the right syntax.
I think issue is; the SELECT statement in my code is selecting multiple columns XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT and the EXECUTIVE IMMEDIATE setup is meant to work only for one column. But I'm not sure. Please advise. Thank you.
I am basically trying to write stored procedures for data pipeline building.
Hope below is helpful.
pass a parameter as a string.
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES(`analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample`);
-->
CALL MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES('analytics-mkt-cleanroom.MKT_DS.Home_Services_Multi_Class_Aesthetic_Baseline_Final_Training_Sample');
use EXECUTE IMMEDIATE since a table name can't be parameterized as a variable in a query.
----step 1
EXECUTE IMMEDIATE FORMAT("""
CREATE OR REPLACE TABLE `MKT_DS.PXV2DWY_CREATE_PROPERTY_FEATURES_01` AS
SELECT DISTINCT XX.HH_ID, A.ECR_PRTY_ID, XX.ANCHOR_DT
FROM `%s` XX
LEFT JOIN
(
SELECT DISTINCT HH_ID, ECR_PRTY_ID
FROM `analytics-mkt-cleanroom.Master.EDW_ECR_ECR_MAPPING`
WHERE HH_ID NOT LIKE 'U%%'
AND ECR_PRTY_ID IS NOT NULL
)A
ON XX.HH_ID = A.HH_ID----one (HH) to many (ecr)
;
""", table_name);
escape % in a format string with additional %
LIKE 'U%'
-->
LIKE 'U%%'
see PARSE_DATE not working in FORMAT() in BigQuery
I have a select that return a single column, using as where clause 4 columns, and I created a specific index for this query. When I do this select using Dbeaver, the result return in 30~50 milliseconds.
SELECT
column1
FROM
myTable
WHERE
column2 = a
AND column3 = b
AND (column4 = c AND column5 = d )
FOR READ ONLY WITH UR;
Now I created a procedure with this same simple select. The proc declare a P1, a 'cursor with return', do the select, open the cursor and close the P1. When I use the same dbeaver connection the result is returning between 1.2 ~ 1.6 seconds.
CREATE PROCEDURE myProc (
IN a DECIMAL(10),
IN b DECIMAL(10),
IN c DECIMAL(10),
IN d DECIMAL(10)
)
LANGUAGE SQL
DYNAMIC RESULT SETS 1
SPECIFIC myProc
P1: BEGIN
DECLARE cursor1 CURSOR WITH RETURN for
SELECT
column1
FROM
myTable
WHERE
column2 = a
AND column3 = b
AND (column4 = c AND column5 = d )
FOR READ ONLY WITH UR;
OPEN cursor1;
END P1
Is this huge return difference correct? If wrong, Is there something wrong in my procedure that justifies this return time difference? Or could be something wrong in the DB configuration or the server that justifies this difference, something like few resources in the DB server or something like the proc not using the index( I don't know if procs in the DB2 use the index by default like the queries )?
I am new in DB2 and new in procedures creation. Thank you in advance.
Best regards,
Luis
I don't know if is the best way, but I solved the problem using a solution that I read for SQL Server. The solution is create local variables that will receive the parameters values, and use the variable in the queries, to guarantee always the best execution plan. I didn't knew if this was valid to DB2, but my proc now has almost the same response time compared to query. Worked!
Link of SQL Server post: SQL Server: Query fast, but slow from procedure
In this link above a user called #Jake give this explanation:
"The reason this happens is because the procedures query plan is being cached, along with the parameters that were passed to it. On subsequent calls, this query plan generated will be reused with new parameters. This can cause problems because if the data is unevenly distributed, one parameter can generate a sub-optimal plan vs. another. Using local variables essentially does the same as OPTIMIZE FOR UNKNOWN because local variables cannot be sniffed."
I think that is the same for DB2, because worked. After I change these old procedures to use local variables my execution plan begun to use the indexes recently created
According to Hive's documentation it supports NOT IN subqueries in a WHERE clause, provided that the subquery is an uncorrelated subquery (does not reference columns from the main query).
However, when I attempt to run the trivial query below, I get an error FAILED: SemanticException Cartesian products are disabled for safety reasons.
-- sample data
CREATE TEMPORARY TABLE foods (name STRING);
CREATE TEMPORARY TABLE vegetables (name STRING);
INSERT INTO foods VALUES ('steak'), ('eggs'), ('celery'), ('onion'), ('carrot');
INSERT INTO vegetables VALUES ('celery'), ('onion'), ('carrot');
-- the problematic query
SELECT *
FROM foods
WHERE foods.name NOT IN (SELECT vegetables.name FROM vegetables)
Note that if I use an IN clause instead of a NOT IN clause, it actually works fine, which is perplexing because the query evaluation structure should be the same in either case.
Is there a workaround for this, or another way to filter values from a query based on their presence in another table?
This is Hive 2.3.4 btw, running on an Amazon EMR cluster.
Not sure why you would get that error. One work around is to use not exists.
SELECT f.*
FROM foods f
WHERE NOT EXISTS (SELECT 1
FROM vegetables v
WHERE v.name = f.name)
or a left join
SELECT f.*
FROM foods f
LEFT JOIN vegetables v ON v.name = f.name
WHERE v.name is NULL
You got cartesian join because this is what Hive does in this case. vegetables table is very small (just one row) and it is being broadcasted to perform the cross (most probably map-join, check the plan) join. Hive does cross (map) join first and then applies filter. Explicit left join syntax with filter as #VamsiPrabhala said will force to perform left join, but in this case it works the same, because the table is very small and CROSS JOIN does not multiply rows.
Execute EXPLAIN on your query and you will see what is exactly happening.
Folks,
I've had a pretty thorough search before posting and couldn't see this answered anywhere previously. Perhaps it isn't possible.... I'm using SQL server 2008 R2
Anyway, thanks in advance for looking/helping.
I have two tables that I'd like to join.
Table1 (t1):
Account------Name--------Amount
12345-------account1-----10000.00
12346-------account2-----20000.00
Table2 (t2):
ID-----Account---extraData
10-----12345-----ZZ100
20-----12345-----ZZ250
30-----12345-----ZZ400
10-----12346-----ZZ150
20-----12346-----ZZ200
I'm trying to return the following from the above tables:
t1.Account---t1.Name------ID1(t2.ID=10)---ID2(td.ID=20)----SUM(Amount)
12345--------account1-------ZZ100------------ZZ250-------------10000.00
12346--------account2-------ZZ150------------ZZ200-------------20000.00
I have tried various joins of sorts and a union, but can't seem to get the results above. Most result in either nothing, or the Amount column returning as double the required result.
My starting point is:
Select t1.Account, t1.Name, t2A.extraData, t2B.extraData, SUM(t1.AMOUNT)
from table1 t1
join table2 t2A on t1.Account = t2A.Account and t2A.ID = '10'
join table2 t2B on t1.Account = t2B.Account and t2B.ID = '20'
Group by t1.Account, t1.Name, t2A.extraData, t2B.extraData
I've reduced the code and complexity of the query for this thread, but the problem is as above. I have no control over the table structure as they form part of an accounting system that I can't amend (I could, but I'd upset one or two people!).
Hopefully I've explained the issue clearly enough. It seems like it should be simple, but I can't seem to fathom it - perhaps I've just been staring too long. Anyway, thanks in advance for your assistance.
Edit: to change the code to reflect the first response highlighting a mistake in my posting.
Please try this. I think this helps you to achieve your result.
DECLARE #ids varchar(max)
SELECT #ids=STUFF((SELECT DISTINCT ', [' + CAST(ID AS VARCHAR(10))+']'
FROM t2
FOR XML PATH(''), TYPE)
.value('.','NVARCHAR(MAX)'),1,2,' ')
SELECT #ids
EXECUTE ('SELECT
Account,Name,'+#ids+',Amount
FROM
(SELECT t1.Account,Name,ID,ExtraData,SUM(Amount) AS Amount
FROM t1 t1 INNER JOIN t2 t2 ON t1.Account=t2.Account
GROUP BY t1.Account,Name,ID,ExtraData) AS SourceTable
PIVOT
(
MAX(ExtraData)
FOR ID IN ('+#ids+')
) AS PivotTable;')
I have two datasets DS1 and DS2. DS1 is 100,000rows x 40cols, DS2 is 20,000rows x 20cols. I actually need to pull COL1 from DS1 if some fields match DS2.
Since I am very-very new to SAS, I am trying to stick to SQL logic.
So basically I did (shot version)
proc sql;
...
SELECT DS1.col1
FROM DS1 INNER JOIN DS2
on DS1.COL2=DS2.COL3
OR DS1.COL3=DS2.COL3
OR DS1.COL4=DS2.COL2
...
After an hour or so, it was still running, but I was getting emails from SAS that I am using 700gb or so. Is there a better and faster SAS-way of doing this operation?
I would use 3 separate queries and use a UNION
proc sql;
...
SELECT DS1.col1
FROM DS1 INNER JOIN DS2
on DS1.COL2=DS2.COL3
UNION
SELECT DS1.col1
FROM DS1 INNER JOIN DS2
On DS1.COL3=DS2.COL3
UNION
SELECT DS1.col1
FROM DS1 INNER JOIN DS2
ON DS1.COL4=DS2.COL2
...
You may have null or blank values in the columns you are joining on. Your query is probably matching all the null/blank columns together resulting in a very large result set.
I suggest adding additional clauses to exclude null results.
Also - if the same row happens to exist in both tables, then you should also prevent the row from joining to itself.
Either of these could effectively result in a cartesian product join (or something close to a cartesian product join).
EDIT : By the way - a good way of debugging this type of problem is to limit both datasets to a certain number of rows - say 100 in each - and then running it and checking the output to make sure it's expected. You can do this using the SQL options inobs=, outobs=, and loops=. Here's a link to the documentation.
First sort the datasets that you are trying to merge using proc sort. Then merge the datasets based on id.
Here is how you can do it.
I have assumed you match field as ID
proc sort data=DS1;
by ID;
proc sort data=DS2;
by ID;
data out;
merge DS1 DS2;
by ID;
run;
You can use proc sort for Ds3 and DS4 and then include them in merge statement if you need to join them as well.