Platform: Ruby on Rails with PostgreSQL database.
Problem:
We are doing some backfilling to migrate our data to a new structure. It's created a rather convoluted situation, and we'd like to handle it as efficiently as possible. It's partially addressed with SQL similar to this:
with rows as (
insert into responses (prompt_id, answer, received_at, user_id, category_id)
select prompt_id, null as answer, received_at, user_id, category_id
from prompts
where user_status = 0 and skipped is not true
returning id, category_id
)
insert into category_responses (category_id, response_id)
select category_id, id as response_id
from rows;
The tables and columns have been obfuscated/simplified so the reasoning behind it may not be as clear, but category_responses is a many-to-many join table. What we're doing is grabbing existing prompts, and creating a set of empty responses (answer is NULL) for each.
The missing piece is associating the records in prompts with the newly created responses. Is there a way to do this within the query? I would like to avoid adding a prompt_id column to responses if possible, but I am guessing that would be one way to handle it: include the column in the returning clause, then issue a second query to update the prompts table. That said, I'm not even sure you can run more than one query against the results of a single with clause.
What's the best way to accomplish this?
I have settled on adding the needed column, and updated the query as follows:
with tab1 as (
insert into responses (prompt_id, answer, received_at, user_id, category_id)
select prompt_id, null as answer, received_at, user_id, category_id
from prompts
where user_status = 0 and skipped is not true
returning id, category_id, prompt_id
),
tab2 as (
update prompts
set response_id = tab1.id,
category_id = tab1.category_id
from tab1
where prompts.id = tab1.prompt_id
returning prompts.response_id as response_id, prompts.category_id as category_id
)
insert into category_responses (category_id, response_id)
select category_id, response_id
from tab2;
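The flow described above (insert the empty responses, link each prompt back to its new response, then fill the join table) can be tried out on a small scale. The following is only a sketch under stated assumptions: it uses Python's sqlite3 module with an in-memory database instead of PostgreSQL, so the single statement with chained CTEs is replaced by three statements inside one transaction, prompts.id is assumed to be the value stored in responses.prompt_id, and the sample rows are made up.

```python
import sqlite3

# Simplified, hypothetical versions of the three tables from the question.
SCHEMA = """
create table prompts (
    id integer primary key, received_at text, user_id integer,
    category_id integer, user_status integer, skipped integer,
    response_id integer
);
create table responses (
    id integer primary key, prompt_id integer, answer text,
    received_at text, user_id integer, category_id integer
);
create table category_responses (category_id integer, response_id integer);
"""


def backfill(conn):
    """Create one empty response per eligible prompt and link everything up."""
    with conn:  # one transaction: commits on success, rolls back on error
        # Step 1: an empty response (answer is NULL) for each eligible prompt.
        conn.execute("""
            insert into responses (prompt_id, answer, received_at, user_id, category_id)
            select id, null, received_at, user_id, category_id
            from prompts
            where user_status = 0 and skipped is not 1
        """)
        # Step 2: point each prompt at the response created for it.
        conn.execute("""
            update prompts
            set response_id = (select r.id from responses r
                               where r.prompt_id = prompts.id)
            where exists (select 1 from responses r
                          where r.prompt_id = prompts.id)
        """)
        # Step 3: fill the many-to-many join table from the linked prompts.
        conn.execute("""
            insert into category_responses (category_id, response_id)
            select category_id, response_id
            from prompts
            where response_id is not null
        """)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "insert into prompts (id, received_at, user_id, category_id, user_status, skipped) "
        "values (?, ?, ?, ?, ?, ?)",
        [
            (1, "2020-01-01", 7, 10, 0, None),  # eligible
            (2, "2020-01-02", 7, 10, 0, 1),     # skipped
            (3, "2020-01-03", 8, 20, 1, 0),     # wrong user_status
            (4, "2020-01-04", 8, 20, 0, 0),     # eligible
        ],
    )
    backfill(conn)
    return conn
```

Calling demo() returns a connection in which only prompts 1 and 4 have a response_id, each pointing at a response whose prompt_id points back at it.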
I have a table that already has values in it. The value I want to update is g_fuel_prft.billed_qty. I need to multiply a number from this table by a number from another table to get the value.
The tables are inv_header, which contains inv_header.rpt_factor, and g_fuel_prft, which contains g_fuel_prft.billed_qty. The criteria are inv_header.link = g_fuel_prft.lnk AND inv_header.rpt_factor = 0.
Once I have those rows selected, I want to update them to billed_qty * rpt_factor:
update g_fuel_prft
set billed_qty = (inv_header.rpt_factor * g_fuel_prft.billed_qty)
where exists
(select billed_qty,ivh_rpt_factor from g_fuel_prft,inv_header
where g_fuel_prft.prodlnk = inv_header.ivh_link
and inv_header.ivh_rpt_factor = 0)
I am getting a 201 syntax error.
You can't refer to inv_header.rpt_factor like that. You'll need to place it into a sub-query.
UPDATE g_fuel_prft
SET billed_qty = ((SELECT inv_header.rpt_factor
FROM inv_header
WHERE g_fuel_prft.prodlnk = inv_header.ivh_link) *
g_fuel_prft.billed_qty)
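WHERE EXISTS (SELECT *
FROM inv_header
WHERE g_fuel_prft.prodlnk = inv_header.ivh_link
AND inv_header.ivh_rpt_factor = 0)
With an EXISTS query, the select-list doesn't matter and * is conventional. The EXISTS sub-query is correlated with the row being updated: it refers to g_fuel_prft from the outer UPDATE instead of listing the table again, because otherwise it would be true for every row as soon as any match existed. You might need more restrictions in the sub-select within the SET clause. It depends on what sort of relationship there is between the joining columns (1:1, 1:N, N:1, M:N).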
Given the minimal schema below, the query above is syntactically valid. It's hard to test it without valid sample data.
DROP TABLE IF EXISTS g_fuel_prft;
CREATE TABLE g_fuel_prft
(
prodlnk INTEGER NOT NULL,
billed_qty DECIMAL(8,2) NOT NULL
);
DROP TABLE IF EXISTS inv_header;
CREATE TABLE inv_header
(
rpt_factor DECIMAL(8,4) NOT NULL,
ivh_link INTEGER NOT NULL,
ivh_rpt_factor DECIMAL(8,4) NOT NULL
);
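For lack of real sample data, here is a small sketch that exercises a correlated form of the UPDATE against the minimal schema above. It runs in SQLite through Python's sqlite3 module, which is an assumption (it is not the asker's engine, so error numbers and type handling will differ), and the rows are invented: only link 1 has ivh_rpt_factor = 0.

```python
import sqlite3


def run_update():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE g_fuel_prft (
            prodlnk    INTEGER NOT NULL,
            billed_qty DECIMAL(8,2) NOT NULL
        );
        CREATE TABLE inv_header (
            rpt_factor     DECIMAL(8,4) NOT NULL,
            ivh_link       INTEGER NOT NULL,
            ivh_rpt_factor DECIMAL(8,4) NOT NULL
        );
        -- Invented sample rows.
        INSERT INTO g_fuel_prft VALUES (1, 10.5), (2, 20.0), (3, 30.0);
        INSERT INTO inv_header VALUES (2.0, 1, 0), (5.0, 2, 1);
    """)
    # Both sub-queries refer to the outer g_fuel_prft row, so each row is
    # tested and updated on its own.
    conn.execute("""
        UPDATE g_fuel_prft
        SET billed_qty = (SELECT h.rpt_factor
                          FROM inv_header h
                          WHERE h.ivh_link = g_fuel_prft.prodlnk) * billed_qty
        WHERE EXISTS (SELECT *
                      FROM inv_header h
                      WHERE h.ivh_link = g_fuel_prft.prodlnk
                        AND h.ivh_rpt_factor = 0)
    """)
    return conn.execute(
        "SELECT prodlnk, billed_qty FROM g_fuel_prft ORDER BY prodlnk"
    ).fetchall()
```

With this data only the row with prodlnk 1 changes; the row whose header has a non-zero ivh_rpt_factor and the row with no header at all keep their billed_qty.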
We have an existing PropertyType called IsPublic which uses a Umbraco.TrueFalse property editor.
Requirements have changed and this value now needs to be represented by multiple checkboxes that are driven from an Enum with the Values Public, Group1, Group2.
This all works as expected, but with tens of thousands of documents we want to save our content editors from manually populating them all.
Saving a document in Umbraco, I can see that it creates an entry in the table cmsPropertyData with the value [ "Public", "Group1", "Group2" ] in the dataNvarchar column.
I've written a script to insert a row into this table based on the value of the original IsPublic flag.
However, after running the script, the changes aren't displayed when opening a document in Umbraco.
The script used to update is
DECLARE @HasPublicFlag NVARCHAR(50) = '[ "Public", "Group1", "Group2" ]'
DECLARE @NoPublicFlag NVARCHAR(50) = '[ "Group1", "Group2" ]'
DECLARE @feature INT = (SELECT nodeId FROM cmsContentType WHERE Alias = 'Feature')
--Existing IsPublic flag
DECLARE @featureIsPublic INT = (SELECT id FROM cmsPropertyType WHERE Alias = 'IsPublic' AND contentTypeId = @feature)
--New PropertyType
DECLARE @featureRoleRestriction INT = (SELECT id FROM cmsPropertyType WHERE Alias = 'documentRoleRestriction' AND contentTypeId = @feature)
--Get feature document versions that are either newest version or published
;WITH FeatureDocumentsToUpdate AS
(
SELECT d.*, pd.dataInt
FROM cmsDocument d
JOIN cmsPropertyData pd ON pd.versionId = d.versionId
LEFT JOIN cmsPropertyData pd2 ON pd2.versionId = d.versionId AND pd2.propertytypeid = @featureRoleRestriction
WHERE (d.newest = 1 OR d.Published = 1) AND pd.propertytypeid = @featureIsPublic AND pd2.id IS NULL
)
--INSERT INTO cmsPropertyData based on value of existing flag
INSERT INTO cmsPropertyData(contentNodeId, versionId, propertytypeid, dataNvarchar)
SELECT s.nodeId, versionId, @featureRoleRestriction,
CASE WHEN s.dataInt = 0 THEN @NoPublicFlag ELSE @HasPublicFlag END AS NewValue
FROM FeatureDocumentsToUpdate s
Are there other tables that will need updating, or is there a better way to do this?
My guess would be that you need to republish all of the affected pages for the caches etc to update and populate with the new values properly.
With 10,000 plus documents, doing a full republish of everything might be quite slow.
You could also try updating the XML for each page in the cmsContentXml table to have the correct values, and then rebuild the Examine indexes for the site, which should do the trick and be a bit quicker. This is because the contents of this table are used to rebuild the indexes, which saves time.
Another option would be to write an API Controller task that you can run once and then remove to update all of the values using the Umbraco Services, but again, that'll be quite slow I think on the volume of pages you're talking about.
I am new to Gupta SQLBase. I would like to know how to get the last inserted record in Gupta SQLBase.
If you are using SYSDBSequence.NextVal to generate your primary key, either within the INSERT statement or prior to it, then you can retrieve the row immediately after the INSERT by selecting where [Primary Key] = SYSDBSequence.Currval, e.g.
Select Name from Patient Where Patient_Id = SYSDBSequence.Currval
Alternatively, if your primary key column has been defined as AUTO_INCREMENT, you can select it back after the INSERT using MAX([Primary Key]), e.g.
Select Name from Patient Where Patient_Id = (Select MAX( Patient_Id) from Patient )
If neither of the above applies, write an insert trigger either to return the primary key or to store it in a table, so you will always have the latest primary key recorded for you.
You may like to join the Gupta users forum, where there is also much archived information.
I'm trying to convert the following SQL statement to Core Data:
delete from SomeTable
where someID not in (
select someID
from SomeTable
group by property1, property2, property3
)
Basically, I want to retrieve and delete possible duplicates in a table where a record is deemed a duplicate if property1, property2 and property3 are equal to another record.
How can I do that?
PS: As the title says, I'm trying to convert the above SQL statement into iOS Core Data methods, not trying to improve, correct or comment on the above SQL; that is beside the point.
Thank you.
It sounds like you are asking for SQL to accomplish your objective. Your starting query won't do what you describe, and most databases wouldn't accept it at all on account of the aggregate subquery attempting to select a column that is not a function of the groups.
UPDATE
I had initially thought the request was to delete all members of each group containing dupes, and wrote code accordingly. Having reinterpreted the original SQL as MySQL would do, it seems the objective is to retain exactly one element for each combination of (property1, property2, property3). I guess that makes more sense anyway. Here is a standard way to do that:
delete from SomeTable st1
where someID not in (
select min(st2.someId)
from SomeTable st2
group by property1, property2, property3
)
That's distinguished from the original by use of the min() aggregate function to choose a specific one of the someId values to retain from each group. This should work, too:
delete from SomeTable st1
where someID in (
select st3.someId
from SomeTable st2
join SomeTable st3
on st2.property1 = st3.property1
and st2.property2 = st3.property2
and st2.property3 = st3.property3
where st2.someId < st3.someId
)
These two queries will retain the same rows. I like the second better, even though it's longer, because the NOT IN operator is kinda nasty for choosing a small number of elements from a large set. If you anticipate having enough rows to be concerned about scaling, though, then you should try both, and perhaps look into optimizations (for example, an index on (property1, property2, property3)) and other alternatives.
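To check the claim that both statements keep the same rows, here is a minimal sketch using Python's sqlite3 module. SQLite and the sample rows are assumptions made purely for illustration, and the table aliases on the DELETE target are dropped because not every engine accepts them.

```python
import sqlite3

# Made-up rows: (someID, property1, property2, property3).
ROWS = [
    (1, "a", "x", 1),
    (2, "a", "x", 1),
    (3, "b", "y", 2),
    (4, "a", "x", 1),
    (5, "b", "y", 2),
    (6, "c", "z", 3),
]

# First alternative: keep the smallest someID of each group.
DELETE_NOT_IN_MIN = """
delete from SomeTable
where someID not in (
    select min(someID)
    from SomeTable
    group by property1, property2, property3
)
"""

# Second alternative: delete every row that has a twin with a smaller someID.
DELETE_SELF_JOIN = """
delete from SomeTable
where someID in (
    select st3.someID
    from SomeTable st2
    join SomeTable st3
      on st2.property1 = st3.property1
     and st2.property2 = st3.property2
     and st2.property3 = st3.property3
    where st2.someID < st3.someID
)
"""


def surviving_ids(delete_sql):
    """Load the sample rows into a fresh database, run one delete, list what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table SomeTable (someID integer primary key, "
        "property1 text, property2 text, property3 integer)"
    )
    conn.executemany("insert into SomeTable values (?, ?, ?, ?)", ROWS)
    conn.execute(delete_sql)
    return [row[0] for row in conn.execute("select someID from SomeTable order by someID")]
```

On this data, surviving_ids(DELETE_NOT_IN_MIN) and surviving_ids(DELETE_SELF_JOIN) both leave the lowest someID of each (property1, property2, property3) combination.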
As for writing it in terms of Core Data calls, however, I don't think you exactly can. Core Data does support grouping, so you could write Core Data calls that perform the subquery in the first alternative and return you the entity objects or their IDs, grouped as described. You could then iterate over the groups, skip the first element of each, and call Core Data deletion methods for all the rest. The details are out of scope for the SO format.
I have to say, though, that doing such a job in Core Data is going to be far more costly than doing it directly in the database, both in time and in required memory. Doing it directly in the database is not friendly to an ORM framework such as Core Data, however. This sort of thing is one of the tradeoffs you've chosen by going with an ORM framework.
I'd recommend that you try to avoid the need to do this at all. Define a unique index on SomeTable(property1, property2, property3) and do whatever you need to do to avoid creating duplicates or to gracefully recover from a (failed) attempt to do so.
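As a rough illustration of that recommendation, the sketch below creates the unique index and treats a rejected insert as "already present". It assumes SQLite via Python's sqlite3 module; the index name and the helper function are hypothetical. Note that in most engines rows containing NULL in any of the indexed columns are not considered equal, so they would not be rejected.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table SomeTable (someID integer primary key, "
        "property1 text, property2 text, property3 text)"
    )
    # The unique index makes the database itself refuse duplicates.
    conn.execute(
        "create unique index SomeTable_props "
        "on SomeTable (property1, property2, property3)"
    )
    return conn


def insert_unless_duplicate(conn, p1, p2, p3):
    """Return True if the row was stored, False if the index rejected it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "insert into SomeTable (property1, property2, property3) "
                "values (?, ?, ?)",
                (p1, p2, p3),
            )
        return True
    except sqlite3.IntegrityError:
        # Graceful recovery: the combination already exists, nothing to do.
        return False
```

Inserting the same three values twice stores one row: the first call returns True and the second returns False.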
DELETE SomeTable
FROM SomeTable
LEFT OUTER JOIN (
SELECT MIN(RowId) as RowId, property1, property2, property3
FROM SomeTable
GROUP BY property1, property2, property3
) as KeepRows ON
SomeTable.RowId = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL
A few pointers for doing this in iOS: before iOS 9, the only way to delete objects is individually, i.e. you will need to iterate through an array of duplicates and delete each one. (If you are targeting iOS 9, there is a new NSBatchDeleteRequest which will help delete them all in one go. It acts directly on the store but also does some cleanup, e.g. to ensure relationships are updated where necessary.)
The other problem is identifying the duplicates. You can configure a fetch to group its results (see the propertiesToGroupBy of NSFetchRequest), but you will have to specify NSDictionaryResultType (so the results are NOT the objects themselves, just the values from the relevant properties.) Furthermore, CoreData will not let you fetch properties (other than aggregates) that are not specified in the GROUP BY. So the suggestion (in the other answer) to use min(someId) will be necessary. (To fetch an expression such as this, you will need to use an NSExpression, embed it in an NSExpressionDescription and pass the latter in propertiesToFetch of the fetch request).
The end result will be an array of dictionaries, each holding the someId value of your prime records (ie the ones you don't want to delete), from which you have then got to work out the duplicates. There are various ways, but none will be very efficient.
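One simple way to work out the duplicates from that array is a set difference against the full list of IDs, which needs a second fetch. The sketch below shows only the logic, in plain Python standing in for the Swift or Objective-C code; it does not use any Core Data API, and the dictionary key minSomeId and the sample values are hypothetical.

```python
def ids_to_delete(all_ids, grouped_rows, keep_key="minSomeId"):
    """Return the ids that are not the chosen survivor of any group."""
    keep = {row[keep_key] for row in grouped_rows}
    return sorted(i for i in all_ids if i not in keep)


# Made-up stand-in for the dictionaries returned by the grouped fetch:
# one entry per (property1, property2, property3) combination.
grouped_rows = [
    {"property1": "a", "property2": "x", "property3": 1, "minSomeId": 1},
    {"property1": "b", "property2": "y", "property3": 2, "minSomeId": 3},
    {"property1": "c", "property2": "z", "property3": 3, "minSomeId": 6},
]

# Made-up stand-in for a second fetch that returns every someId in the store.
all_ids = [1, 2, 3, 4, 5, 6]
```

Here ids_to_delete(all_ids, grouped_rows) yields the IDs 2, 4 and 5, which would then be fetched and deleted one by one or handed to a batch delete.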
So as the other answer says, duplicates are better avoided in the first place. On that front, note that iOS 9 allows you to specify attributes that you would like to be unique (individually or collectively).
Let me know if you would like me to elaborate on any of the above.
Group-wise Maximum:
select t1.someId
from SomeTable t1
left outer join SomeTable t2
on t1.property1 = t2.property1
and t1.property2 = t2.property2
and t1.property3 = t2.property3
and t1.someId < t2.someId
where t2.someId is null;
So, this could be the answer
delete SomeTable
where someId not in
(select t1.someId
from SomeTable t1
left outer join SomeTable t2
on t1.property1 = t2.property1
and t1.property2 = t2.property2
and t1.property3 = t2.property3
and t1.someId < t2.someId
where t2.someId is null);
Sqlfiddle demo
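A small sketch of the anti-join, assuming SQLite through Python's sqlite3 module and invented rows. Because the join condition looks for a larger someId, this variant keeps the highest someId in each group, unlike the min() based queries, which keep the lowest.

```python
import sqlite3

# Rows with no larger someId in the same group are the ones to keep.
KEEP_QUERY = """
select t1.someId
from SomeTable t1
left outer join SomeTable t2
  on t1.property1 = t2.property1
 and t1.property2 = t2.property2
 and t1.property3 = t2.property3
 and t1.someId < t2.someId
where t2.someId is null
"""


def dedupe_keep_max():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table SomeTable (someId integer primary key, "
        "property1 text, property2 text, property3 integer)"
    )
    conn.executemany(
        "insert into SomeTable values (?, ?, ?, ?)",
        [  # invented sample data
            (1, "a", "x", 1),
            (2, "a", "x", 1),
            (3, "b", "y", 2),
            (4, "a", "x", 1),
            (5, "b", "y", 2),
            (6, "c", "z", 3),
        ],
    )
    kept_by_query = sorted(row[0] for row in conn.execute(KEEP_QUERY))
    # Delete everything the anti-join did not select.
    conn.execute("delete from SomeTable where someId not in (" + KEEP_QUERY + ")")
    remaining = [row[0] for row in conn.execute("select someId from SomeTable order by someId")]
    return kept_by_query, remaining
```

Both lists returned by dedupe_keep_max() contain the same IDs, one per group.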
You can use EXISTS to check, for each row, whether another row exists with a different id but the same values for all the properties that define a duplicate.
delete from sometable
where
id in (SELECT
sm.id
FROM
sometable sm
where
exists( select
1
from
sometable sm2
where
sm.prop1 = sm2.prop1
and sm.prop2 = sm2.prop2
and sm.prop3 = sm2.prop3
and sm.id != sm2.id)
);
I think you could easily handle this by creating a derived duplicate_flg column and set it to 1 when all three property values are equal. Once that is done, you could just delete those records where duplicate_flg = 1. Here is a sample query on how to do this:
--retrieve all records that have the same property values (property1, property2 and property3)
SELECT *
FROM (
SELECT someid
,property1
,property2
,property3
,CASE
WHEN property1 = property2
AND property1 = property3
THEN 1
ELSE 0
END AS duplicate_flg
FROM SomeTable
) q1
WHERE q1.duplicate_flg = 1;
Here is a sample delete statement:
DELETE
FROM SomeTable
WHERE someid IN (
SELECT someid
FROM (
SELECT someid
,property1
,property2
,property3
,CASE
WHEN property1 = property2
AND property1 = property3
THEN 1
ELSE 0
END AS duplicate_flg
FROM SomeTable
) q1
WHERE q1.duplicate_flg = 1
);
If you simply want to remove duplicates from the table, you can execute the query below:
delete from SomeTable
where rowid not in (
select max(rowid)
from SomeTable
group by property1, property2, property3
)
If you want to delete all duplicate records, try the code below:
WITH tblTemp as
(
SELECT ROW_NUMBER() Over(PARTITION BY Property1,Property2,Property3 ORDER BY Property1) As RowNumber,* FROM Table_1
)
DELETE FROM tblTemp where RowNumber >1
Hope it helps
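Deleting through a CTE as above is not available in every engine. A sketch of the same ROW_NUMBER() idea, assuming SQLite 3.25 or later (window function support) through Python's sqlite3 module: the row numbers are computed in a sub-query and the delete goes by key. The id column and the sample rows are hypothetical.

```python
import sqlite3

# Number the rows inside each group, then delete everything after the first.
DELETE_EXTRA_ROWS = """
delete from Table_1
where id in (
    select id
    from (
        select id,
               row_number() over (
                   partition by Property1, Property2, Property3
                   order by id
               ) as RowNumber
        from Table_1
    ) as numbered
    where RowNumber > 1
)
"""


def dedupe_with_row_number():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table Table_1 (id integer primary key, "
        "Property1 text, Property2 text, Property3 integer)"
    )
    conn.executemany(
        "insert into Table_1 values (?, ?, ?, ?)",
        [  # invented sample data
            (1, "a", "x", 1),
            (2, "a", "x", 1),
            (3, "b", "y", 2),
            (4, "a", "x", 1),
            (5, "b", "y", 2),
            (6, "c", "z", 3),
        ],
    )
    conn.execute(DELETE_EXTRA_ROWS)
    return [row[0] for row in conn.execute("select id from Table_1 order by id")]
```

Ordering by id inside each partition makes the surviving row predictable: the lowest id of each group is the one numbered 1.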
Use the below query to delete the duplicate data from that table
delete from SomeTable where someID not in
(select Min(someID) from SomeTable
group by property1, property2, property3)