Let me begin by apologizing for what may have been a confusing title. I an just beginning my data analyst journey. I am working in BIGQUERY with a Extreme Storm dataset (TABLE1) that has fields for LAT,LONG, and STATE. There are null values in the latitude and longitude fields that I want to replace with general LAT/LONG values from a State Information dataset(TABLE2) also containing LAT,LONG and STATE values. In TABLE1 each record is given a unique EVENT_ID and there are 1.4m rows. In TABLE2 each STATE is a unique record.
I've tried:
Update TABLE1
SET TABLE1.BEGIN_LAT=TABLE2.latitude
From TABLE1
INNER JOIN TABLE2
ON TABLE1.STATE = TABLE2.STATE
WHERE TABLE1.BEGIN_LAT IS NULL
I am getting an error because TABLE1 contains multiple rows with the same STATE and I am trying to use it as my primary key. I know what I am doing wrong but can't figure out how to do it the correct way. Is what I am trying to do possible in BigQuery?
Any help would be appreciated. Even advice on how to ask questions! :)
Thank you.
I believe you have in your query some alias for TABLE1 in Update and for TABLE1 in From. In this case you can add condition to the WHERE clause to also match on EVENT_ID. Like this:
UPDATE TABLE1 TABLE1_U
SET TABLE1_U.BEGIN_LAT=TABLE2.latitude
FROM TABLE1 TABLE1_F
INNER JOIN TABLE2
ON TABLE1_F.STATE = TABLE2.STATE
WHERE TABLE1_U.BEGIN_LAT IS NULL AND TABLE1_U.EVENT_ID = TABLE1_F.EVENT_ID
Also, I would prefer to do SELECT query instead of update and save query results to the new table.
Is it possible to do an UPDATE on a table based on a JOIN with an existing table in BigQuery?
When I try this statement on the following database (https://bigquery.cloud.google.com/dataset/pfamdb:pfam31),
UPDATE pfam31.uniprot
SET uniprot.auto_architecture = uniprot_architecture.auto_architecture
INNER JOIN
pfam31.uniprot_architecture using(uniprot_acc)
I get errors relating to the INNER JOIN, with WHERE being expected instead. How should I be doing this (if it's possible)?
UPDATE `pfam31.uniprot` a
SET a.auto_architecture = b.auto_architecture
FROM `pfam31.uniprot_architecture` b
WHERE a.uniprot_acc = b.uniprot_acc
Please refer to the UPDATE statement syntax. There is even an example on UPDATE with JOIN. You need to use a FROM clause, and your query should be something like this:
UPDATE pfam31.uniprot
SET uniprot.auto_architecture =
(SELECT uniprot_architecture.auto_architecture
FROM pfam31.uniprot_architecture
WHERE uniprot.uniprot_acc = auto_architecture.uniprot_acc);
This assumes that there is a 1:1 relationship between the uniprot_acc values in the tables. If that isn't the case, you will need to use LIMIT 1, for instance.
I am writing a hive query to join two tables; table1 and table2. In the result I just need all columns from table1 and no columns from table2.
I know the solution where I can select all the columns manually by specifying table1.column1, table1.column2.. and so on in the select statement. But I have about 22 columns in table 1. Also, I have to do the same for multiple other tables ans its painful process.
I tried using "SELECT table1.*", but I get a parse exception.
Is there a better way to do it?
Hive 0.13 onwards the following query syntax works:
SELECT a.* FROM a JOIN b ON (a.id = b.id)
This query will select all columns from a. So instead of typing all the column names (making the query cumbersome), it is a better idea to use tablealias.*
I have a sqlite database that I'm trying to build a query. The table column I need to retrieve is iEDLID from the table below :
Right now all I have to go on is a known iEventID from the table below :
And the the nClientLocationID from the table below.
So the requirements are I need to get current iEDLID to write, lookup from tblEventDateLocations for dEventDate and the tblLocation.nClientLocationID based on the tblLocations.iLocationID I already have and event selected on this screen.
So I would need a query that does a "SELECT DISTINCT table EventDateLocations.iEDLID FROM tblEventDateLocations ...."
So basically from another query I have the iEventID I need, and I have the event ID i need but where the dEventDate=(select date('now')) I need to retrieve the iEventDateID from table EventDates.iEventDateID to use on the table EventDateLocations
this is the point where I'm trying to wrap my head around the joins for this query and the syntax...
It seems like you want this:
select distinct edl.iEDLDID
from
tblEventDateLocations edl
join tblEventDates ed on edl.EventDateId = ed.EventDateId
where
ed.EventId = ?
and ed.dEventDate = date('now')
and edl.nClientLocationID = ?
where the ? of course represent the known event ID and location ID parameters.
Since nClientLocationId appears on table tblEventDateLocations you do not need to join table tblLocations unless you want to filter out results whose location ID does not appear in that table.
I'm trying to convert the following SQL statement to Core Data:
delete from SomeTable
where someID not in (
select someID
from SomeTable
group by property1, property2, property3
)
Basically, I want to retrieve and delete possible duplicates in a table where a record is deemed a duplicate if property1, property2 and property3 are equal to another record.
How can I do that?
PS: As the title says, I'm trying to convert the above SQL statement into iOS Core Data methods, not trying to improve, correct or comment on the above SQL, that is beyond the point.
Thank you.
It sounds like you are asking for SQL to accomplish your objective. Your starting query won't do what you describe, and most databases wouldn't accept it at all on account of the aggregate subquery attempting to select a column that is not a function of the groups.
UPDATE
I had initially thought the request was to delete all members of each group containing dupes, and wrote code accordingly. Having reinterpreted the original SQL as MySQL would do, it seems the objective is to retain exactly one element for each combination of (property1, property2, property3). I guess that makes more sense anyway. Here is a standard way to do that:
delete from SomeTable st1
where someID not in (
select min(st2.someId)
from SomeTable st2
group by property1, property2, property3
)
That's distinguished from the original by use of the min() aggregate function to choose a specific one of the someId values to retain from each group. This should work, too:
delete from SomeTable st1
where someID in (
select st3.someId
from SomeTable st2
join SomeTable st3
on st2.property1 = st3.property1
and st2.property2 = st3.property2
and st2.property3 = st3.property3
where st2.someId < st3.someId
)
These two queries will retain the same rows. I like the second better, even though it's longer, because the NOT IN operator is kinda nasty for choosing a small number of elements from a large set. If you anticipate having enough rows to be concerned about scaling, though, then you should try both, and perhaps look into optimizations (for example, an index on (property1, property2, property3)) and other alternatives.
As for writing it in terms of Core Data calls, however, I don't think you exactly can. Core Data does support grouping, so you could write Core Data calls that perform the subquery in the first alternative and return you the entity objects or their IDs, grouped as described. You could then iterate over the groups, skip the first element of each, and call Core Data deletion methods for all the rest. The details are out of scope for the SO format.
I have to say, though, that doing such a job in Core Data is going to be far more costly than doing it directly in the database, both in time and in required memory. Doing it directly in the database is not friendly to an ORM framework such as Core Data, however. This sort of thing is one of the tradeoffs you've chosen by going with an ORM framework.
I'd recommend that you try to avoid the need to do this at all. Define a unique index on SomeTable(property1, property2, property3) and do whatever you need to do to avoid trying to creating duplicates or to gracefully recover from a (failed) attempt to do so.
DELETE SomeTable
FROM SomeTable
LEFT OUTER JOIN (
SELECT MIN(RowId) as RowId, property1, property2, property3
FROM SomeTable
GROUP BY property1, property2, property3
) as KeepRows ON
SomeTable.RowId = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL
A few pointers for doing this in iOS: Before iOS 9 the only way to delete objects is individually, ie you will need to iterate through an array of duplicates and delete each one. (If you are targeting iOS9, there is a new NSBatchDeleteRequest which will help delete them all in one go - it does act directly on the store but also does some cleanup to eg. ensure relationships are updated where necessary).
The other problem is identifying the duplicates. You can configure a fetch to group its results (see the propertiesToGroupBy of NSFetchRequest), but you will have to specify NSDictionaryResultType (so the results are NOT the objects themselves, just the values from the relevant properties.) Furthermore, CoreData will not let you fetch properties (other than aggregates) that are not specified in the GROUP BY. So the suggestion (in the other answer) to use min(someId) will be necessary. (To fetch an expression such as this, you will need to use an NSExpression, embed it in an NSExpressionDescription and pass the latter in propertiesToFetch of the fetch request).
The end result will be an array of dictionaries, each holding the someId value of your prime records (ie the ones you don't want to delete), from which you have then got to work out the duplicates. There are various ways, but none will be very efficient.
So as the other answer says, duplicates are better avoided in the first place. On that front, note that iOS 9 allows you to specify attributes that you would like to be unique (individually or collectively).
Let me know if you would like me to elaborate on any of the above.
Group-wise Maximum:
select t1.someId
from SomeTable t1
left outer join SomeTable t2
on t1.property1 = t2.property1
and t1.property2 = t2.property2
and t1.property3 = t2.property3
and t1.someId < t2.someId
where t2.someId is null;
So, this could be the answer
delete SomeTable
where someId not in
(select t1.someId
from SomeTable t1
left outer join SomeTable t2
on t1.property1 = t2.property1
and t1.property2 = t2.property2
and t1.property3 = t2.property3
and t1.someId < t2.someId
where t2.someId is null);
Sqlfiddle demo
You can use exists function to check for each row if there is another row that exists whose id is not equal to the current row and all other properties that define the duplicate criteria of each row are equal to all the properties of the current row.
delete from something
where
id in (SELECT
sm.id
FROM
sometable sm
where
exists( select
1
from
sometable sm2
where
sm.prop1 = sm2.prop1
and sm.prop2 = sm2.prop2
and sm.prop3 = sm2.prop3
and sm.id != sm2.id)
);
I think you could easily handle this by creating a derived duplicate_flg column and set it to 1 when all three property values are equal. Once that is done, you could just delete those records where duplicate_flg = 1. Here is a sample query on how to do this:
--retrieve all records that has same property values (property1,property2 and property3)
SELECT *
FROM (
SELECT someid
,property1
,property2
,property3
,CASE
WHEN property1 = property2
AND property1 = property3
THEN 1
ELSE 0
END AS duplicate_flg
FROM SomeTable
) q1
WHERE q1.duplicate_flg = 1;
Here is a sample delete statement:
DELETE
FROM something
WHERE someid IN (
SELECT someid
FROM (
SELECT someid
,property1
,property2
,property3
,CASE
WHEN property1 = property2
AND property1 = property3
THEN 1
ELSE 0
END AS duplicate_flg
FROM SomeTable
) q1
WHERE q1.duplicate_flg = 1
);
Simply, if you want to remove duplicate from table you can execute below Query :
delete from SomeTable
where rowid not in (
select max(rowid)
from SomeTable
group by property1, property2, property3
)
if you want to delete all duplicate records try the below code
WITH tblTemp as
(
SELECT ROW_NUMBER() Over(PARTITION BY Property1,Property2,Property3 ORDER BY Property1) As RowNumber,* FROM Table_1
)
DELETE FROM tblTemp where RowNumber >1
Hope it helps
Use the below query to delete the duplicate data from that table
delete from SomeTable where someID not in
(select Min(someID) from SomeTable
group by property1+property2+property3)