Let me begin by apologizing for what may have been a confusing title. I an just beginning my data analyst journey. I am working in BIGQUERY with a Extreme Storm dataset (TABLE1) that has fields for LAT,LONG, and STATE. There are null values in the latitude and longitude fields that I want to replace with general LAT/LONG values from a State Information dataset(TABLE2) also containing LAT,LONG and STATE values. In TABLE1 each record is given a unique EVENT_ID and there are 1.4m rows. In TABLE2 each STATE is a unique record.
I've tried:
Update TABLE1
SET TABLE1.BEGIN_LAT=TABLE2.latitude
From TABLE1
INNER JOIN TABLE2
ON TABLE1.STATE = TABLE2.STATE
WHERE TABLE1.BEGIN_LAT IS NULL
I am getting an error because TABLE1 contains multiple rows with the same STATE and I am trying to use it as my primary key. I know what I am doing wrong but can't figure out how to do it the correct way. Is what I am trying to do possible in BigQuery?
Any help would be appreciated. Even advice on how to ask questions! :)
Thank you.
I believe you have in your query some alias for TABLE1 in Update and for TABLE1 in From. In this case you can add condition to the WHERE clause to also match on EVENT_ID. Like this:
UPDATE TABLE1 TABLE1_U
SET TABLE1_U.BEGIN_LAT=TABLE2.latitude
FROM TABLE1 TABLE1_F
INNER JOIN TABLE2
ON TABLE1_F.STATE = TABLE2.STATE
WHERE TABLE1_U.BEGIN_LAT IS NULL AND TABLE1_U.EVENT_ID = TABLE1_F.EVENT_ID
Also, I would prefer to do SELECT query instead of update and save query results to the new table.
I'm trying to convert the following SQL statement to Core Data:
delete from SomeTable
where someID not in (
select someID
from SomeTable
group by property1, property2, property3
)
Basically, I want to retrieve and delete possible duplicates in a table where a record is deemed a duplicate if property1, property2 and property3 are equal to another record.
How can I do that?
PS: As the title says, I'm trying to convert the above SQL statement into iOS Core Data methods, not trying to improve, correct or comment on the above SQL, that is beyond the point.
Thank you.
It sounds like you are asking for SQL to accomplish your objective. Your starting query won't do what you describe, and most databases wouldn't accept it at all on account of the aggregate subquery attempting to select a column that is not a function of the groups.
UPDATE
I had initially thought the request was to delete all members of each group containing dupes, and wrote code accordingly. Having reinterpreted the original SQL as MySQL would do, it seems the objective is to retain exactly one element for each combination of (property1, property2, property3). I guess that makes more sense anyway. Here is a standard way to do that:
delete from SomeTable st1
where someID not in (
select min(st2.someId)
from SomeTable st2
group by property1, property2, property3
)
That's distinguished from the original by use of the min() aggregate function to choose a specific one of the someId values to retain from each group. This should work, too:
delete from SomeTable st1
where someID in (
select st3.someId
from SomeTable st2
join SomeTable st3
on st2.property1 = st3.property1
and st2.property2 = st3.property2
and st2.property3 = st3.property3
where st2.someId < st3.someId
)
These two queries will retain the same rows. I like the second better, even though it's longer, because the NOT IN operator is kinda nasty for choosing a small number of elements from a large set. If you anticipate having enough rows to be concerned about scaling, though, then you should try both, and perhaps look into optimizations (for example, an index on (property1, property2, property3)) and other alternatives.
As for writing it in terms of Core Data calls, however, I don't think you exactly can. Core Data does support grouping, so you could write Core Data calls that perform the subquery in the first alternative and return you the entity objects or their IDs, grouped as described. You could then iterate over the groups, skip the first element of each, and call Core Data deletion methods for all the rest. The details are out of scope for the SO format.
I have to say, though, that doing such a job in Core Data is going to be far more costly than doing it directly in the database, both in time and in required memory. Doing it directly in the database is not friendly to an ORM framework such as Core Data, however. This sort of thing is one of the tradeoffs you've chosen by going with an ORM framework.
I'd recommend that you try to avoid the need to do this at all. Define a unique index on SomeTable(property1, property2, property3) and do whatever you need to do to avoid trying to creating duplicates or to gracefully recover from a (failed) attempt to do so.
DELETE SomeTable
FROM SomeTable
LEFT OUTER JOIN (
SELECT MIN(RowId) as RowId, property1, property2, property3
FROM SomeTable
GROUP BY property1, property2, property3
) as KeepRows ON
SomeTable.RowId = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL
A few pointers for doing this in iOS: Before iOS 9 the only way to delete objects is individually, ie you will need to iterate through an array of duplicates and delete each one. (If you are targeting iOS9, there is a new NSBatchDeleteRequest which will help delete them all in one go - it does act directly on the store but also does some cleanup to eg. ensure relationships are updated where necessary).
The other problem is identifying the duplicates. You can configure a fetch to group its results (see the propertiesToGroupBy of NSFetchRequest), but you will have to specify NSDictionaryResultType (so the results are NOT the objects themselves, just the values from the relevant properties.) Furthermore, CoreData will not let you fetch properties (other than aggregates) that are not specified in the GROUP BY. So the suggestion (in the other answer) to use min(someId) will be necessary. (To fetch an expression such as this, you will need to use an NSExpression, embed it in an NSExpressionDescription and pass the latter in propertiesToFetch of the fetch request).
The end result will be an array of dictionaries, each holding the someId value of your prime records (ie the ones you don't want to delete), from which you have then got to work out the duplicates. There are various ways, but none will be very efficient.
So as the other answer says, duplicates are better avoided in the first place. On that front, note that iOS 9 allows you to specify attributes that you would like to be unique (individually or collectively).
Let me know if you would like me to elaborate on any of the above.
Group-wise Maximum:
select t1.someId
from SomeTable t1
left outer join SomeTable t2
on t1.property1 = t2.property1
and t1.property2 = t2.property2
and t1.property3 = t2.property3
and t1.someId < t2.someId
where t2.someId is null;
So, this could be the answer
delete SomeTable
where someId not in
(select t1.someId
from SomeTable t1
left outer join SomeTable t2
on t1.property1 = t2.property1
and t1.property2 = t2.property2
and t1.property3 = t2.property3
and t1.someId < t2.someId
where t2.someId is null);
Sqlfiddle demo
You can use exists function to check for each row if there is another row that exists whose id is not equal to the current row and all other properties that define the duplicate criteria of each row are equal to all the properties of the current row.
delete from something
where
id in (SELECT
sm.id
FROM
sometable sm
where
exists( select
1
from
sometable sm2
where
sm.prop1 = sm2.prop1
and sm.prop2 = sm2.prop2
and sm.prop3 = sm2.prop3
and sm.id != sm2.id)
);
I think you could easily handle this by creating a derived duplicate_flg column and set it to 1 when all three property values are equal. Once that is done, you could just delete those records where duplicate_flg = 1. Here is a sample query on how to do this:
--retrieve all records that has same property values (property1,property2 and property3)
SELECT *
FROM (
SELECT someid
,property1
,property2
,property3
,CASE
WHEN property1 = property2
AND property1 = property3
THEN 1
ELSE 0
END AS duplicate_flg
FROM SomeTable
) q1
WHERE q1.duplicate_flg = 1;
Here is a sample delete statement:
DELETE
FROM something
WHERE someid IN (
SELECT someid
FROM (
SELECT someid
,property1
,property2
,property3
,CASE
WHEN property1 = property2
AND property1 = property3
THEN 1
ELSE 0
END AS duplicate_flg
FROM SomeTable
) q1
WHERE q1.duplicate_flg = 1
);
Simply, if you want to remove duplicate from table you can execute below Query :
delete from SomeTable
where rowid not in (
select max(rowid)
from SomeTable
group by property1, property2, property3
)
if you want to delete all duplicate records try the below code
WITH tblTemp as
(
SELECT ROW_NUMBER() Over(PARTITION BY Property1,Property2,Property3 ORDER BY Property1) As RowNumber,* FROM Table_1
)
DELETE FROM tblTemp where RowNumber >1
Hope it helps
Use the below query to delete the duplicate data from that table
delete from SomeTable where someID not in
(select Min(someID) from SomeTable
group by property1+property2+property3)
I am doing calculations in my KLOG table.
However, my PRICES table has the data I need for the calculations in the KLOG table.
Example :
KLOG table has PRICE_ID field (integer). So does the PRICES table.
So I am trying to do something like this (oncalculatefields of the KLOG table ) :
if KLOG.FieldByName('PRICE_ID') = 1 then begin
KLOG.FieldByName('calculated_field_value_1').Value := KLOG.FieldByName('calculated_field_value_2').Value +5;
This (+5) however is a field value (BONUS) in my PRICES table where PRICE_ID =1.
So how can I reference this BONUS field in my oncalculate event of the KLOG table?
Can you use SELECT ? Something like :
KLOG.FieldByName('calculated_field_value_1').Value := KLOG.FieldByName('calculated_field_value_2').Value + (select BONUS from PRICES where PRICES.PRICE_ID = KLOG.PRICE_ID);
Not sure I am writing this properly.
You are not telling us what data access component you are using.
The recommended way to do this is using a Query component (TSQLQuery, TADOQuery, TIBQuery,...) , then have the SQL for that query retrieve all necessary information. All the fields that you want access to in you OnCalcFields() should be in that query.
It would be something like
select * from KLOG, PRICES
where KLOG.PRICE_ID=PRICES.PRICE_ID
Replace the * with the field names you actually want.
If you want to, you can even do simple calculation in the query already, e.g. something like
select [..], PRICES.PRICE*KLOG_QUANTITY as TotalPrice from KLOG, PRICES
where KLOG.PRICE_ID=PRICES.PRICE_ID
Using query components instead of table components is good practice anyway; (together with TClientDataSet...) it will generally allow you to retrieve smaller amounts of data than an entire table.
Here is the query (MS SQL):
use DB_432799_satv
select [DB_432799_satv].[dbo].ac_Orders.OrderNumber, OrderDate, PaymentDate,
CompletedDate, ShipDate,DateTimeFinishedPicking,DateTimeShipped,
ShipMethodName, PaymentMethodName
from [DB_432799_satv].[dbo].ac_Orders
Inner join [DB_432799_satv].[dbo].ac_Payments on
[DB_432799_satv].[dbo].ac_Orders.OrderId =
[DB_432799_satv].[dbo].ac_Payments.OrderId
Inner join [DB_432799_satv].[dbo].ac_OrderShipments on
[DB_432799_satv].[dbo].ac_Orders.OrderId =
[DB_432799_satv].[dbo].ac_OrderShipments.OrderId
Inner join [DB_432799_satv].[dbo].[ac_Transactions] on
[DB_432799_satv].[dbo].ac_Payments.paymentid =
[DB_432799_satv].[dbo].ac_Transactions.paymentid
Inner join [SuperATV].[dbo].[tblPartsBoxHeader] on
[DB_432799_satv].[dbo].[ac_transactions].[ProviderTransactionId] =
[SuperATV].[dbo].[tblPartsBoxHeader].[ordernumber]
How would I change this into a stored procedure?
I suspect what you want is what a "view" will give you. Specifically, to store that query in the database and select from it like:
SELECT * FOM myLongQueryView WHERE OrderNumber = 1234;
I don't develop MSSQL, but a quick search turned up this doc page on msdn. I suppose it would go something like this:
CREATE VIEW [DB_432799_satv].myLongQueryView
AS
select [DB_432799_satv].[dbo].ac_Orders.OrderNumber, OrderDate, PaymentDate,
CompletedDate, ShipDate,DateTimeFinishedPicking,DateTimeShipped,
ShipMethodName, PaymentMethodName
from [DB_432799_satv].[dbo].ac_Orders
...
Simply prepend your query with CREATE VIEW [DB_432799_satv].myLongQueryView AS, and use the docs page to decide if any options are desired.
It's important to note that query is stored, not it's current results. So it stays up to date with your data. It is joinable, aggregatable, etc...
I have two tables - tool_downloads and tool_configurations. I am trying to retrieve the most recent build date for each tool in my database. The layout of the DB is simple. One table called tool_downloads keeps track of when a tool is downloaded. Another table is called tool_configurations and stores the actual data about the tool. They are linked together by the tool_conf_id.
If I run the following query which omits dates, I get back 200 records.
SELECT DISTINCT a.tool_conf_id, b.tool_conf_id
FROM tool_downloads a
JOIN tool_configurations b
ON a.tool_conf_id = b.tool_conf_id
ORDER BY a.tool_conf_id
When I try to add in date information I get back hundreds of thousands of records! Here is the query that fails horribly.
SELECT DISTINCT a.tool_conf_id, max(a.configured_date) as config_date, b.configuration_name
FROM tool_downloads a
JOIN tool_configurations b
ON a.tool_conf_id = b.tool_conf_id
ORDER BY a.tool_conf_id
I know the problem has something to do with group-bys/aggregate data and joins. I can't really search google since I don't know the name of the problem I'm encountering. Any help would be appreciated.
Solution is:
SELECT b.tool_conf_id, b.configuration_name, max(a.configured_date) as config_date
FROM tool_downloads a
JOIN tool_configurations b
ON a.tool_conf_id = b.tool_conf_id
GROUP BY b.tool_conf_id, b.configuration_name