I'm working on a project that uses SQL Anywhere and I found this query:
update MY_TABLE table1
set table1.column1 = table3.id
from MY_TABLE table2, MY_OTHER_TABLE table3
where table2.some_col = table3.some_col and table2.other_col is null;
The problem is that table1, which is updated, and table2/table3, which are joined, do not have any link, no constraint. Table1 is completely independent from the other two.
So as far as I can understand it, if the condition in the last line is met for at least one row, then ALL rows of table1 will be updated, because the join condition is then always true.
Am I right or am I missing something?
Short answer: Yes. I agree ;)
Long answer: It really looks like there is no connection between table1 and tables 2 and 3, so based on your input above, I'd expect the behaviour you describe.
Also, I'd remove the implicit join here, as it might cause confusion.
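For illustration, here is the same statement with an explicit JOIN (a sketch assuming SQL Anywhere's UPDATE ... FROM syntax, with the semantics unchanged):

UPDATE MY_TABLE table1
SET table1.column1 = table3.id
FROM MY_TABLE table2
JOIN MY_OTHER_TABLE table3 ON table2.some_col = table3.some_col
WHERE table2.other_col IS NULL;

Written this way it is much easier to spot that nothing in the JOIN or WHERE references table1, so as you suspected, as soon as the join produces at least one row, every row of table1 gets updated.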
Let me begin by apologizing for what may have been a confusing title. I am just beginning my data analyst journey. I am working in BigQuery with an Extreme Storm dataset (TABLE1) that has fields for LAT, LONG, and STATE. There are null values in the latitude and longitude fields that I want to replace with general LAT/LONG values from a State Information dataset (TABLE2), also containing LAT, LONG, and STATE values. In TABLE1 each record is given a unique EVENT_ID, and there are 1.4m rows. In TABLE2 each STATE is a unique record.
I've tried:
UPDATE TABLE1
SET TABLE1.BEGIN_LAT = TABLE2.latitude
FROM TABLE1
INNER JOIN TABLE2
ON TABLE1.STATE = TABLE2.STATE
WHERE TABLE1.BEGIN_LAT IS NULL
I am getting an error because TABLE1 contains multiple rows with the same STATE and I am trying to use it as my primary key. I know what I am doing wrong but can't figure out how to do it the correct way. Is what I am trying to do possible in BigQuery?
Any help would be appreciated. Even advice on how to ask questions! :)
Thank you.
I believe you need one alias for TABLE1 in the UPDATE and another for TABLE1 in the FROM. Then you can add a condition to the WHERE clause to also match on EVENT_ID, like this:
UPDATE TABLE1 TABLE1_U
SET TABLE1_U.BEGIN_LAT = TABLE2.latitude
FROM TABLE1 TABLE1_F
INNER JOIN TABLE2
ON TABLE1_F.STATE = TABLE2.STATE
WHERE TABLE1_U.BEGIN_LAT IS NULL AND TABLE1_U.EVENT_ID = TABLE1_F.EVENT_ID
Also, I would prefer to run a SELECT query instead of an UPDATE and save the query results to a new table.
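A minimal sketch of that SELECT approach, assuming the tables live in a dataset I'll call mydataset and that TABLE2's latitude column is named latitude as in your SET clause (the longitude column would be handled the same way):

CREATE TABLE mydataset.TABLE1_FILLED AS
SELECT
  t1.EVENT_ID,
  t1.STATE,
  -- keep the original latitude, fall back to the state-level value when NULL
  COALESCE(t1.BEGIN_LAT, t2.latitude) AS BEGIN_LAT
FROM mydataset.TABLE1 t1
LEFT JOIN mydataset.TABLE2 t2
  ON t1.STATE = t2.STATE;

The LEFT JOIN keeps events whose STATE has no match in TABLE2, and COALESCE only substitutes the state-level latitude where BEGIN_LAT is NULL.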
Is it possible to do an UPDATE on a table based on a JOIN with an existing table in BigQuery?
When I try this statement on the following database (https://bigquery.cloud.google.com/dataset/pfamdb:pfam31),
UPDATE pfam31.uniprot
SET uniprot.auto_architecture = uniprot_architecture.auto_architecture
INNER JOIN
pfam31.uniprot_architecture using(uniprot_acc)
I get errors relating to the INNER JOIN, with WHERE being expected instead. How should I be doing this (if it's possible)?
UPDATE `pfam31.uniprot` a
SET a.auto_architecture = b.auto_architecture
FROM `pfam31.uniprot_architecture` b
WHERE a.uniprot_acc = b.uniprot_acc
Please refer to the UPDATE statement syntax. There is even an example of UPDATE with JOIN. You need to correlate the tables, either with a FROM clause (as in the other answer) or with a subquery like this:
UPDATE pfam31.uniprot
SET uniprot.auto_architecture =
  (SELECT uniprot_architecture.auto_architecture
   FROM pfam31.uniprot_architecture
   WHERE uniprot.uniprot_acc = uniprot_architecture.uniprot_acc)
WHERE true;  -- BigQuery requires a WHERE clause on UPDATE
This assumes that there is a 1:1 relationship between the uniprot_acc values in the two tables. If that isn't the case, you will need to use LIMIT 1 in the subquery, for instance.
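A sketch of what that would look like, with LIMIT 1 picking an arbitrary match when there are several:

UPDATE pfam31.uniprot
SET uniprot.auto_architecture =
  (SELECT uniprot_architecture.auto_architecture
   FROM pfam31.uniprot_architecture
   WHERE uniprot.uniprot_acc = uniprot_architecture.uniprot_acc
   LIMIT 1)
WHERE true;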
So I have a weird situation.
Whenever I run sqlContext.sql with an inner join statement, I actually get an error, but when I read the error it looks like Spark has already automatically joined my two separate tables by the time it tries to execute the ON clause.
Table1:
patient_id, code
Table2:
patient_id, date
Select code, date
from Table1
inner join Table2
on Table1.patient_id = Table2.patient_id <- exception shows the table is joined already by this point.
Any ideas about this behavior?
The error looks something like this:
org.apache.spark.sql.AnalysisException: cannot resolve 'Table2.patient_id' given input columns [patient_id, code, date]
I think that you have a typo in your program.
However, what you can do is the following:
tableOneDF.join(tableTwoDF, tableOneDF("patient_id") === tableTwoDF("patient_id"), "inner").select("code", "date")
where tableOneDF and tableTwoDF are two DataFrames created on top of the two tables.
Just try it out, and see if it still happens.
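If it still fails, one more low-tech check (an assumption on my part, not something visible in your post) is to qualify the columns through short aliases, so every reference is explicit:

SELECT t1.code, t2.date
FROM Table1 t1
INNER JOIN Table2 t2
ON t1.patient_id = t2.patient_id

If the aliased query resolves, the original failure was most likely about the name Table2 itself, e.g. the temp table being registered under a different name, rather than about the join.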
Folks,
I've had a pretty thorough search before posting and couldn't see this answered anywhere previously. Perhaps it isn't possible... I'm using SQL Server 2008 R2.
Anyway, thanks in advance for looking/helping.
I have two tables that I'd like to join.
Table1 (t1):
Account------Name--------Amount
12345-------account1-----10000.00
12346-------account2-----20000.00
Table2 (t2):
ID-----Account---extraData
10-----12345-----ZZ100
20-----12345-----ZZ250
30-----12345-----ZZ400
10-----12346-----ZZ150
20-----12346-----ZZ200
I'm trying to return the following from the above tables:
t1.Account---t1.Name------ID1(t2.ID=10)---ID2(t2.ID=20)----SUM(Amount)
12345--------account1-------ZZ100------------ZZ250-------------10000.00
12346--------account2-------ZZ150------------ZZ200-------------20000.00
I have tried various joins of sorts and a union, but can't seem to get the results above. Most attempts return either nothing, or the Amount column at double the required value.
My starting point is:
Select t1.Account, t1.Name, t2A.extraData, t2B.extraData, SUM(t1.AMOUNT)
from table1 t1
join table2 t2A on t1.Account = t2A.Account and t2A.ID = '10'
join table2 t2B on t1.Account = t2B.Account and t2B.ID = '20'
Group by t1.Account, t1.Name, t2A.extraData, t2B.extraData
I've reduced the code and complexity of the query for this thread, but the problem is as above. I have no control over the table structure as they form part of an accounting system that I can't amend (I could, but I'd upset one or two people!).
Hopefully I've explained the issue clearly enough. It seems like it should be simple, but I can't seem to fathom it - perhaps I've just been staring too long. Anyway, thanks in advance for your assistance.
Edit: to change the code to reflect the first response highlighting a mistake in my posting.
Please try this. I think it will help you achieve your result.
-- Build a comma-separated, bracketed list of the distinct IDs, e.g. "[10], [20]"
DECLARE @ids varchar(max)
SELECT @ids = STUFF((SELECT DISTINCT ', [' + CAST(ID AS VARCHAR(10)) + ']'
                     FROM t2
                     FOR XML PATH(''), TYPE)
                    .value('.', 'NVARCHAR(MAX)'), 1, 2, ' ')
SELECT @ids
-- Pivot extraData into one column per ID; Amount survives as a grouping column
EXECUTE ('SELECT
    Account, Name, ' + @ids + ', Amount
FROM
    (SELECT t1.Account, Name, ID, ExtraData, SUM(Amount) AS Amount
     FROM t1 t1 INNER JOIN t2 t2 ON t1.Account = t2.Account
     GROUP BY t1.Account, Name, ID, ExtraData) AS SourceTable
PIVOT
(
    MAX(ExtraData)
    FOR ID IN (' + @ids + ')
) AS PivotTable;')
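Since your example only needs the two known IDs 10 and 20, a static version without dynamic SQL would also do; a sketch, assuming one t1 row per Account as in your sample data:

SELECT Account, Name, [10] AS ID1, [20] AS ID2, Amount
FROM
    (SELECT t1.Account, t1.Name, t2.ID, t2.extraData, t1.Amount
     FROM t1 INNER JOIN t2 ON t1.Account = t2.Account) AS SourceTable
PIVOT
(
    MAX(extraData)
    FOR ID IN ([10], [20])
) AS PivotTable;

Rows with other IDs (like 30) feed neither pivoted column, so they drop out without doubling the Amount.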
I need to generate such SQL using Propel build criteria:
"SELECT *
FROM `table1`
LEFT JOIN table2 ON ( table1.OBJECT_ID = table2.ID )
LEFT JOIN table3 ON ( table1.OBJECT_ID = table3.ID )
LEFT JOIN table4 ON ( table4.USER_ID = table2.ID
OR table4.USER_ID = table3.AUTHOR_ID )"
Is it possible to make a join with an OR condition? Or maybe some other way?
Propel 1.5
Table1Query::create()
->leftJoinTable2()
->leftJoinTable3()
->useTable2Query()
->leftJoinTable4()
->endUse()
->condition('cond1', Table4::USER_ID . ' = ' . Table2::ID)
->condition('cond2', Table4::USER_ID . ' = ' . Table3::AUTHOR_ID)
->combine(array('cond1', 'cond2'), Criteria::LOGICAL_OR, 'onClause')
->setJoinCondition('Table4', 'onClause')
->find();
useTable2Query() is necessary because your information seems to imply that Table4 is related to Table2 and not to Table1, and so joining Table4 directly to Table1 will result in a series of fatal Propel errors. The "use" functionality bridges that relationship.
The first two joins (table2, table3) are easy, if I recall correctly. Just make table1.OBJECT_ID non-required in your schema, and the left join will be used automatically.
Not immediately sure about the OR join. If you get stuck, one way to do it is to use the above in a raw query, and then "hydrate" objects from the resultset. Another way (very good for complex queries that are a pain to express in an ORM) is to create a database view for the above, and then add a new table in your schema for the view. It's cheating a bit for sure, but for some really complex master-detail things I did in a large symfony project, it was great - and it made query debugging really easy as well.
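To illustrate the view idea, a sketch (the view name table1_joined is made up, and the selected columns depend on what you actually need):

CREATE VIEW table1_joined AS
SELECT table1.*, table4.USER_ID AS table4_user_id
FROM table1
LEFT JOIN table2 ON table1.OBJECT_ID = table2.ID
LEFT JOIN table3 ON table1.OBJECT_ID = table3.ID
LEFT JOIN table4 ON table4.USER_ID = table2.ID
                 OR table4.USER_ID = table3.AUTHOR_ID;

You would then declare table1_joined as a read-only table in your Propel schema and query it like any other model.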