How to convert SQL statement "delete from TABLE where someID not in (select someID from Table group by property1, property2) - ios

I'm trying to convert the following SQL statement to Core Data:
delete from SomeTable
where someID not in (
select someID
from SomeTable
group by property1, property2, property3
)
Basically, I want to retrieve and delete possible duplicates in a table where a record is deemed a duplicate if property1, property2 and property3 are equal to another record.
How can I do that?
PS: As the title says, I'm trying to convert the above SQL statement into iOS Core Data methods, not trying to improve, correct or comment on the above SQL, that is beyond the point.
Thank you.

It sounds like you are asking for SQL to accomplish your objective. Your starting query won't do what you describe, and most databases wouldn't accept it at all on account of the aggregate subquery attempting to select a column that is not a function of the groups.
UPDATE
I had initially thought the request was to delete all members of each group containing dupes, and wrote code accordingly. Having reinterpreted the original SQL as MySQL would do, it seems the objective is to retain exactly one element for each combination of (property1, property2, property3). I guess that makes more sense anyway. Here is a standard way to do that:
delete from SomeTable st1
where someID not in (
select min(st2.someId)
from SomeTable st2
group by property1, property2, property3
)
That's distinguished from the original by use of the min() aggregate function to choose a specific one of the someId values to retain from each group. This should work, too:
delete from SomeTable st1
where someID in (
select st3.someId
from SomeTable st2
join SomeTable st3
on st2.property1 = st3.property1
and st2.property2 = st3.property2
and st2.property3 = st3.property3
where st2.someId < st3.someId
)
These two queries will retain the same rows. I like the second better, even though it's longer, because the NOT IN operator is kinda nasty for choosing a small number of elements from a large set. If you anticipate having enough rows to be concerned about scaling, though, then you should try both, and perhaps look into optimizations (for example, an index on (property1, property2, property3)) and other alternatives.
As for writing it in terms of Core Data calls, however, I don't think you exactly can. Core Data does support grouping, so you could write Core Data calls that perform the subquery in the first alternative and return you the entity objects or their IDs, grouped as described. You could then iterate over the groups, skip the first element of each, and call Core Data deletion methods for all the rest. The details are out of scope for the SO format.
I have to say, though, that doing such a job in Core Data is going to be far more costly than doing it directly in the database, both in time and in required memory. Doing it directly in the database is not friendly to an ORM framework such as Core Data, however. This sort of thing is one of the tradeoffs you've chosen by going with an ORM framework.
I'd recommend that you try to avoid the need to do this at all. Define a unique index on SomeTable(property1, property2, property3) and do whatever you need to do to avoid trying to creating duplicates or to gracefully recover from a (failed) attempt to do so.

DELETE SomeTable
FROM SomeTable
LEFT OUTER JOIN (
SELECT MIN(RowId) as RowId, property1, property2, property3
FROM SomeTable
GROUP BY property1, property2, property3
) as KeepRows ON
SomeTable.RowId = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL

A few pointers for doing this in iOS: Before iOS 9 the only way to delete objects is individually, ie you will need to iterate through an array of duplicates and delete each one. (If you are targeting iOS9, there is a new NSBatchDeleteRequest which will help delete them all in one go - it does act directly on the store but also does some cleanup to eg. ensure relationships are updated where necessary).
The other problem is identifying the duplicates. You can configure a fetch to group its results (see the propertiesToGroupBy of NSFetchRequest), but you will have to specify NSDictionaryResultType (so the results are NOT the objects themselves, just the values from the relevant properties.) Furthermore, CoreData will not let you fetch properties (other than aggregates) that are not specified in the GROUP BY. So the suggestion (in the other answer) to use min(someId) will be necessary. (To fetch an expression such as this, you will need to use an NSExpression, embed it in an NSExpressionDescription and pass the latter in propertiesToFetch of the fetch request).
The end result will be an array of dictionaries, each holding the someId value of your prime records (ie the ones you don't want to delete), from which you have then got to work out the duplicates. There are various ways, but none will be very efficient.
So as the other answer says, duplicates are better avoided in the first place. On that front, note that iOS 9 allows you to specify attributes that you would like to be unique (individually or collectively).
Let me know if you would like me to elaborate on any of the above.

Group-wise Maximum:
select t1.someId
from SomeTable t1
left outer join SomeTable t2
on t1.property1 = t2.property1
and t1.property2 = t2.property2
and t1.property3 = t2.property3
and t1.someId < t2.someId
where t2.someId is null;
So, this could be the answer
delete SomeTable
where someId not in
(select t1.someId
from SomeTable t1
left outer join SomeTable t2
on t1.property1 = t2.property1
and t1.property2 = t2.property2
and t1.property3 = t2.property3
and t1.someId < t2.someId
where t2.someId is null);
Sqlfiddle demo

You can use exists function to check for each row if there is another row that exists whose id is not equal to the current row and all other properties that define the duplicate criteria of each row are equal to all the properties of the current row.
delete from something
where
id in (SELECT
sm.id
FROM
sometable sm
where
exists( select
1
from
sometable sm2
where
sm.prop1 = sm2.prop1
and sm.prop2 = sm2.prop2
and sm.prop3 = sm2.prop3
and sm.id != sm2.id)
);

I think you could easily handle this by creating a derived duplicate_flg column and set it to 1 when all three property values are equal. Once that is done, you could just delete those records where duplicate_flg = 1. Here is a sample query on how to do this:
--retrieve all records that has same property values (property1,property2 and property3)
SELECT *
FROM (
SELECT someid
,property1
,property2
,property3
,CASE
WHEN property1 = property2
AND property1 = property3
THEN 1
ELSE 0
END AS duplicate_flg
FROM SomeTable
) q1
WHERE q1.duplicate_flg = 1;
Here is a sample delete statement:
DELETE
FROM something
WHERE someid IN (
SELECT someid
FROM (
SELECT someid
,property1
,property2
,property3
,CASE
WHEN property1 = property2
AND property1 = property3
THEN 1
ELSE 0
END AS duplicate_flg
FROM SomeTable
) q1
WHERE q1.duplicate_flg = 1
);

Simply, if you want to remove duplicate from table you can execute below Query :
delete from SomeTable
where rowid not in (
select max(rowid)
from SomeTable
group by property1, property2, property3
)

if you want to delete all duplicate records try the below code
WITH tblTemp as
(
SELECT ROW_NUMBER() Over(PARTITION BY Property1,Property2,Property3 ORDER BY Property1) As RowNumber,* FROM Table_1
)
DELETE FROM tblTemp where RowNumber >1
Hope it helps

Use the below query to delete the duplicate data from that table
delete from SomeTable where someID not in
(select Min(someID) from SomeTable
group by property1+property2+property3)

Related

Access 'Not Equal To' Join

I have two tables, neither with a primary id. The same combination of fields uniquely identifies the records in each and makes the records between the two tables relate-able (I think).
I need a query to combine all the records from one table and only the records from the second not already included from the first table. How do I do this using 'not equal to' joins on multiple fields? My results so far only give me the records of the first table, or no records at all.
Try the following:
SELECT ECDSlides.[Supplier Code], ECDSlides.[Supplier Name], ECDSlides.Commodity
FROM ECDSlides LEFT JOIN (ECDSlides.Commodity = [Mit Task Details2].Commodity) AND (ECDSlides.[Supplier Code] = [Mit Task Details2].[Supplier Code])
WHERE [Mit Task Details2].Commodity Is Null;
This might be what you are looking for
SELECT fieldA,fieldB FROM tableA
UNION
SELECT fieldA,fieldB FROM tableB
Union should remove automatically. 'Union All' would not.
If, for some reason, you get perfect duplicates and they are not removed, you could try this :
SELECT DISTINCT * FROM (
SELECT fieldA,fieldB FROM tableA
UNION
SELECT fieldA,fieldB FROM tableB
) AS subquery

SQLite select distinct join query how to

I have a sqlite database that I'm trying to build a query. The table column I need to retrieve is iEDLID from the table below :
Right now all I have to go on is a known iEventID from the table below :
And the the nClientLocationID from the table below.
So the requirements are I need to get current iEDLID to write, lookup from tblEventDateLocations for dEventDate and the tblLocation.nClientLocationID based on the tblLocations.iLocationID I already have and event selected on this screen.
So I would need a query that does a "SELECT DISTINCT table EventDateLocations.iEDLID FROM tblEventDateLocations ...."
So basically from another query I have the iEventID I need, and I have the event ID i need but where the dEventDate=(select date('now')) I need to retrieve the iEventDateID from table EventDates.iEventDateID to use on the table EventDateLocations
this is the point where I'm trying to wrap my head around the joins for this query and the syntax...
It seems like you want this:
select distinct edl.iEDLDID
from
tblEventDateLocations edl
join tblEventDates ed on edl.EventDateId = ed.EventDateId
where
ed.EventId = ?
and ed.dEventDate = date('now')
and edl.nClientLocationID = ?
where the ? of course represent the known event ID and location ID parameters.
Since nClientLocationId appears on table tblEventDateLocations you do not need to join table tblLocations unless you want to filter out results whose location ID does not appear in that table.

How to get row Count of the sqlite3_stmt *statement? [duplicate]

I want to get the number of selected rows as well as the selected data. At the present I have to use two sql statements:
one is
select * from XXX where XXX;
the other is
select count(*) from XXX where XXX;
Can it be realised with a single sql string?
I've checked the source code of sqlite3, and I found the function of sqlite3_changes(). But the function is only useful when the database is changed (after insert, delete or update).
Can anyone help me with this problem? Thank you very much!
SQL can't mix single-row (counting) and multi-row results (selecting data from your tables). This is a common problem with returning huge amounts of data. Here are some tips how to handle this:
Read the first N rows and tell the user "more than N rows available". Not very precise but often good enough. If you keep the cursor open, you can fetch more data when the user hits the bottom of the view (Google Reader does this)
Instead of selecting the data directly, first copy it into a temporary table. The INSERT statement will return the number of rows copied. Later, you can use the data in the temporary table to display the data. You can add a "row number" to this temporary table to make paging more simple.
Fetch the data in a background thread. This allows the user to use your application while the data grid or table fills with more data.
try this way
select (select count() from XXX) as count, *
from XXX;
select (select COUNT(0)
from xxx t1
where t1.b <= t2.b
) as 'Row Number', b from xxx t2 ORDER BY b;
just try this.
You could combine them into a single statement:
select count(*), * from XXX where XXX
or
select count(*) as MYCOUNT, * from XXX where XXX
To get the number of unique titles, you need to pass the DISTINCT clause to the COUNT function as the following statement:
SELECT
COUNT(DISTINCT column_name)
FROM
'table_name';
Source: http://www.sqlitetutorial.net/sqlite-count-function/
For those who are still looking for another method, the more elegant one I found to get the total of row was to use a CTE.
this ensure that the count is only calculated once :
WITH cnt(total) as (SELECT COUNT(*) from xxx) select * from xxx,cnt
the only drawback is if a WHERE clause is needed, it should be applied in both main query and CTE query.
In the first comment, Alttag said that there is no issue to run 2 queries. I don't agree with that unless both are part of a unique transaction. If not, the source table can be altered between the 2 queries by any INSERT or DELETE from another thread/process. In such case, the count value might be wrong.
Once you already have the select * from XXX results, you can just find the array length in your program right?
If you use sqlite3_get_table instead of prepare/step/finalize you will get all the results at once in an array ("result table"), including the numbers and names of columns, and the number of rows. Then you should free the result with sqlite3_free_table
int rows_count = 0;
while (sqlite3_step(stmt) == SQLITE_ROW)
{
rows_count++;
}
// The rows_count is available for use
sqlite3_reset(stmt); // reset the stmt for use it again
while (sqlite3_step(stmt) == SQLITE_ROW)
{
// your code in the query result
}

How to get records from multiple condition from a same column through associated table

Let say a book model HABTM categories, for an example book A has categories "CA" & "CB". How can i retrieve book A if I query using "CA" & "CB" only. I know about the .where("category_id in (1,2)") but it uses OR operation. I need something like AND operation.
Edited
And also able to get books from category CA only. And how to include query criteria such as .where("book.p_year = 2012")
ca = Category.find_by_name('CA')
cb = Category.find_by_name('CB')
Book.where(:id => (ca.book_ids & cb.book_ids)) # & returns elements common to both arrays.
Otherwise you'd need to abuse the join table directly in SQL, group the results by book_id, count them, and only return rows where the count is at least equal to the number of categories... something like this (but I'm sure it's wrong so double check the syntax if you go this route. Also not sure it would be any faster than the above):
SELECT book_id, count(*) as c from books_categories where category_id IN (1,2) group by book_id having count(*) >= 2;

ASP.NET MVC & EF4 Entity Framework - Are there any performance concerns in using the entities vs retrieving only the fields i need?

Lets say we have 3 tables, Users, Products, Purchases.
There is a view that needs to display the purchases made by a user.
I could lookup the data required by doing:
from p in DBSet<Purchases>.Include("User").Include("Product") select p;
However, I am concern that this may have a performance impact because it will retrieve the full objects.
Alternatively, I could select only the fields i need:
from p in DBSet<Purchases>.Include("User").Include("Product") select new SimplePurchaseInfo() { UserName = p.User.name, Userid = p.User.Id, ProductName = p.Product.Name ... etc };
So my question is:
Whats the best practice in doing this?
== EDIT
Thanks for all the replies.
[QUESTION 1]: I want to know whether all views should work with flat ViewModels with very specific data for that view, or should the ViewModels contain the entity objects.
Real example: User reviews Products
var query = from dr in productRepository.FindAllReviews()
where dr.User.UserId = 'userid'
select dr;
string sql = ((ObjectQuery)query).ToTraceString();
SELECT [Extent1].[ProductId] AS [ProductId],
[Extent1].[Comment] AS [Comment],
[Extent1].[CreatedTime] AS [CreatedTime],
[Extent1].[Id] AS [Id],
[Extent1].[Rating] AS [Rating],
[Extent1].[UserId] AS [UserId],
[Extent3].[CreatedTime] AS [CreatedTime1],
[Extent3].[CreatorId] AS [CreatorId],
[Extent3].[Description] AS [Description],
[Extent3].[Id] AS [Id1],
[Extent3].[Name] AS [Name],
[Extent3].[Price] AS [Price],
[Extent3].[Rating] AS [Rating1],
[Extent3].[ShopId] AS [ShopId],
[Extent3].[Thumbnail] AS [Thumbnail],
[Extent3].[Creator_UserId] AS [Creator_UserId],
[Extent4].[Comment] AS [Comment1],
[Extent4].[DateCreated] AS [DateCreated],
[Extent4].[DateLastActivity] AS [DateLastActivity],
[Extent4].[DateLastLogin] AS [DateLastLogin],
[Extent4].[DateLastPasswordChange] AS [DateLastPasswordChange],
[Extent4].[Email] AS [Email],
[Extent4].[Enabled] AS [Enabled],
[Extent4].[PasswordHash] AS [PasswordHash],
[Extent4].[PasswordSalt] AS [PasswordSalt],
[Extent4].[ScreenName] AS [ScreenName],
[Extent4].[Thumbnail] AS [Thumbnail1],
[Extent4].[UserId] AS [UserId1],
[Extent4].[UserName] AS [UserName]
FROM [ProductReviews] AS [Extent1]
INNER JOIN [Users] AS [Extent2] ON [Extent1].[UserId] = [Extent2].[UserId]
LEFT OUTER JOIN [Products] AS [Extent3] ON [Extent1].[ProductId] = [Extent3].[Id]
LEFT OUTER JOIN [Users] AS [Extent4] ON [Extent1].[UserId] = [Extent4].[UserId]
WHERE N'615005822' = [Extent2].[UserId]
or
from d in productRepository.FindAllProducts()
from dr in d.ProductReviews
where dr.User.UserId == 'userid'
orderby dr.CreatedTime
select new ProductReviewInfo()
{
product = new SimpleProductInfo() { Id = d.Id, Name = d.Name, Thumbnail = d.Thumbnail, Rating = d.Rating },
Rating = dr.Rating,
Comment = dr.Comment,
UserId = dr.UserId,
UserScreenName = dr.User.ScreenName,
UserThumbnail = dr.User.Thumbnail,
CreateTime = dr.CreatedTime
};
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Name] AS [Name],
[Extent1].[Thumbnail] AS [Thumbnail],
[Extent1].[Rating] AS [Rating],
[Extent2].[Rating] AS [Rating1],
[Extent2].[Comment] AS [Comment],
[Extent2].[UserId] AS [UserId],
[Extent4].[ScreenName] AS [ScreenName],
[Extent4].[Thumbnail] AS [Thumbnail1],
[Extent2].[CreatedTime] AS [CreatedTime]
FROM [Products] AS [Extent1]
INNER JOIN [ProductReviews] AS [Extent2] ON [Extent1].[Id] = [Extent2].[ProductId]
INNER JOIN [Users] AS [Extent3] ON [Extent2].[UserId] = [Extent3].[UserId]
LEFT OUTER JOIN [Users] AS [Extent4] ON [Extent2].[UserId] = [Extent4].[UserId]
WHERE N'userid' = [Extent3].[UserId]
ORDER BY [Extent2].[CreatedTime] ASC
[QUESTION 2]: Whats with the ugly outer joins?
In general, only retrieve what you need, but keep in mind to retrieve enough information so your application is not too chatty, so if you can batch a bunch of things together, do so, otherwise you'll pay network traffic cost everytime you need to go back to the database and retrieve some more stuffs.
In this case, assuming you will only need those info, I would go with the second approach (if that's what you really need).
Eager loading with .Include doesn't really play nice when you want filtering (or ordering for that matter).
That first query is basically this:
select p.*, u.*, p2.*
from products p
left outer join users u on p.userid = u.userid
left outer join purchases p2 on p.productid = p2.productid
where u.userid == #p1
Is that really what you want?
There is a view that needs to display the purchases made by a user.
Well then why are you including "Product"?
Shouldn't it just be:
from p in DBSet<Purchases>.Include("User") select p;
Your second query will error. You must project to an entity on the model, or an anonymous type - not a random class/DTO.
To be honest, the easiest and most well performing option in your current scenario is to query on the FK itself:
var purchasesForUser = DBSet<Purchases>.Where(x => x.UserId == userId);
That should produce:
select p.*
from products p
where p.UserId == #p1
The above query of course requires you to include the foreign keys in the model.
If you don't have the FK's in your model, then you'll need more LINQ-Entities trickery in the form of anonymous type projection.
Overall, don't go out looking to optimize. Create queries which align with the scenario/business requirement, then optimize if necessary - or look for alternatives to LINQ-Entities, such as stored procedures, views or compiled queries.
Remember: premature optimization is the root of all evil.
*EDIT - In response to Question Update *
[QUESTION 1]: I want to know whether all views should work with flat ViewModels with very specific data for that view, or should the ViewModels contain the entity objects.
Yes - ViewModel's should only contain what is required for that View. Otherwise why have the ViewModel? You may as well bind straight to the EF model. So, setup the ViewModel which only the fields it needs for the view.
[QUESTION 2]: What's with the ugly outer joins?
That is default behaviour for .Include. .Include always produces a left outer join.
I think the second query will throw exception because you can't map result to unmapped .NET type in Linq-to-entities. You have to return annonymous type and map it to your object in Linq-to-objects or you have to use some advanced concepts for projections - QueryView (projections in ESQL) or DefiningQuery (custom SQL query mapped to new readonly entity).
Generally it is more about design of your entities. If you select single small entity it is not a big difference to load it all instead of projection. If you are selecting list of entities you should consider projections - expecially if tables contains columns like nvarchar(max) or varbinar(max) which are not needed in your result!
Both create almost the same query: select from one table, with two inner joins. The only thing that changes from a database perspective is the amount of fields returned, but that shouldn't really matter that much.
I think here DRY wins from a performance hit (if it even exists): so my call is go for the first option.

Resources