I have a collection of ids from a checkbox list (the checked items), let's say: 1, 2, 3, 4, 5
In my database I have, let's say: 3, 4, 6
Now, the old approach that I have is:
1. Delete 3, 4, 6 from the database
2. Add 1, 2, 3, 4, 5
This is OK for a small number of records.
I think the best approach would be to separate them into:
1. New records: 1, 2
2. Current records: 3, 4
3. Deleted records: 6
Therefore:
(1) Insert the new records into the database
(2) Leave them as they are
(3) Delete the records
Is this the best approach?
Thanks
How many records are we talking here? Your first approach is probably the easiest and fastest even up to a pretty big recordset.
Another option would be to pull down the entire recordset with one query. Use code to find out which ones you need to insert, update, or remove, and then commit the changes in a transaction.
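For illustration, a minimal SQL sketch of that diff-and-commit idea, assuming a hypothetical user_tags link table keyed by (user_id, tag_id) and a temporary checked_ids table holding the ids from the checkbox list (none of these names come from the question):

BEGIN;
-- remove ids that exist in the database but are no longer checked (6 in the example)
DELETE FROM user_tags
WHERE user_id = 42
  AND tag_id NOT IN (SELECT tag_id FROM checked_ids);
-- insert ids that are checked but not yet stored (1 and 2 in the example)
INSERT INTO user_tags (user_id, tag_id)
SELECT 42, c.tag_id
FROM checked_ids c
WHERE c.tag_id NOT IN (SELECT tag_id FROM user_tags WHERE user_id = 42);
COMMIT;

The ids already present (3 and 4) are simply left untouched, which is the "current record" case from the question.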
I assume you are worried about the amount of data you have to transfer?
Your simple approach requires a 'delete all', which is a constant amount of data transfer (you don't need to know exactly which rows are already present), followed by sending the entire new list.
Your more complex approach first requires you to either fetch all records from the database, or to send all records to the database and get the database to tell you the difference. Then after that you need three separate commands to update the database. This is more data transfer in total than the simple method.
On the other hand, if your rows have foreign key constraints, deleting them and creating new rows might give them new IDs which could break some constraints and cause other problems.
The first method is definitely simpler. Use that unless you want to avoid changing the row IDs.
It might not be the prettiest solution, but I think you could stay with your old solution. The reasoning behind this is that your page should never have a lot of checkboxes (hundreds).
delete all previous and add 1, 2, 3, 4, 5
easy and fail-safe ;)
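As a sketch, using the same hypothetical user_tags table as above:

DELETE FROM user_tags WHERE user_id = 42;
INSERT INTO user_tags (user_id, tag_id)
VALUES (42, 1), (42, 2), (42, 3), (42, 4), (42, 5);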
I'm wondering about something that doesn't seem efficient to me.
I have 2 tables, one very large table DATA (millions of rows and hundreds of cols), with an id as primary key.
I then have another table, NEW_COL, with a variable number of rows (1 to millions) but always 2 cols: id and new_col_name.
I want to update the first table, adding the new data to it.
Of course, I know how to do it with a PROC SQL/left join, or a data step/merge.
Yet it seems inefficient: as far as I can tell from the execution time (which may be wrong), both of these approaches rewrite the huge table completely, even when the new data is only 1 row (almost 1 min).
I tried doing it in 2 SQL steps, with ALTER TABLE ADD COLUMN and then UPDATE, but it's way too slow, as an UPDATE with a join doesn't seem efficient at all.
So, is there an efficient way to "add a column" to an existing table WITHOUT rewriting this huge table?
Thanks!
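(For clarity, the 2-step SQL attempt I mention is roughly this, as a sketch, assuming the new column is numeric:)

proc sql;
  alter table data add new_col_name num;
  update data
    set new_col_name = (select n.new_col_name
                        from new_col as n
                        where n.id = data.id);
quit;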
SAS datasets are row stores and not columnar stores like tables in other databases. As such, adding rows is far easier and efficient than adding columns. A key joined view could be argued as the most 'efficient' way to add a column to a data rectangle.
If you are adding columns so often that the 1 min resource incursion is a problem, you may need to upgrade hardware with faster drives, a less contentious operating environment, or more memory and SASFILE if the new columns are frequent yet temporary in nature.
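A minimal sketch of such a key-joined view, using the table and column names from the question (the view name itself is made up):

proc sql;
  create view data_plus_new as
  select d.*, n.new_col_name
  from data as d
  left join new_col as n
    on d.id = n.id;
quit;

The view costs almost nothing to create; the join is resolved each time the view is read, so the large table is never rewritten.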
@Richard's answer is perfect. If you are adding columns on a regular basis then there is a problem with your design. You need to give more details on what you are doing so someone can suggest an alternative.
I would try a hash join; you can find code for a simple hash join. This is an efficient way of joining because in your case you have one large table and one small table: if the small table fits into memory, it is much better than a left join. I have done various joins this way and the query run times were considerably lower (on the order of 10x).
With the ALTER TABLE approach you are rewriting the table, and it also takes a lock on the table so nobody else can use it.
You should perform these joins when the workload is lower, which means outside office hours; you may need to schedule the jobs at night, when more SAS resources are available.
Thanks for your answers guys.
To add some information, I don't have any constraints around table locking, load balancing or anything else, as it's a "project tool" script I use.
The goal, in the data prep step 'starting point data generator', is to recompute an already existing column, or add a new one (less often, but still quite regularly). Thus, I just don't want to "lose" time waiting for the whole table to be rewritten when I only need to update one column for specific rows.
When I monitor the server, the computation of the data and the joining step are very fast. But when I want to update only 1 row, I see the whole table being rewritten. It seems a waste of resources to me.
But it seems it's a mandatory step, so I can't do much about it.
Too bad.
I have a column in my Postgres table whose value I want to remove for expired rows. What's the best way to do this securely? It's my understanding that simply writing 0s to that column is ineffective, because Postgres creates a new row version on UPDATE and merely marks the old row as dead.
Is the best way to set the column to null and manually vacuum to clean up the old records?
I will first say that it is bad practice to alter data like this - you are changing history. Also the below is only ONE way to do this (a quick and dirty way and not to be recommended):
1. Back up your database first.
2. Open PgAdmin, select the database, open the Query Editor and run a query.
3. It would be something like this:
UPDATE <table_name> SET <column_name>=<new value (eg null)>
WHERE <record is dead>
The WHERE part is for you to figure out, based on how you identify which rows are dead (e.g. is_removed=true or is_deleted=true are common for identifying soft-deleted records).
Obviously you would have to run this script regularly. The better way would be to update your application to do this job instead.
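As a concrete sketch (the table, column, and flag names here are hypothetical):

UPDATE subscriptions SET secret_token = NULL WHERE expired = true;
VACUUM FULL subscriptions;

Plain VACUUM only marks the dead row versions' space as reusable; VACUUM FULL rewrites the table into a new file, so the old values no longer exist in it, at the cost of an exclusive lock while it runs.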
I have a table with a bunch of rows that I need to update (sometimes a single row, sometimes many at once). However, before updating, I would like to copy values from 2 columns into 2 other columns.
The columns are: id, current_tag, current_serial, previous_tag, previous_serial
I need to copy current_tag -> previous_tag and current_serial -> previous_serial,
and then update current_tag and current_serial.
Is there an elegant/fast approach for doing that?
I am also not married to the idea of having previous_tag and previous_serial columns, but I do need a way to preserve the previous values in case the user needs to do a rollback.
I would consider this to be a solved problem by using something like the paper_trail gem. It would provide all of the functionality you require with very elegant rollback functionality (even beyond simply the last change made, you can have a complete history).
I've found it very easy to integrate with existing apps.
It's available here https://github.com/airblade/paper_trail
I note that this isn't the only solution; there are many others here: https://www.ruby-toolbox.com/categories/Active_Record_Versioning
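If you do keep the previous_* columns, note that in plain SQL the copy and the overwrite can happen in a single statement; in Postgres-style SQL the right-hand sides of SET read the pre-update row values, so the old tag and serial are captured before being replaced. A minimal sketch (the table name and new values are hypothetical):

UPDATE devices
SET previous_tag    = current_tag,
    previous_serial = current_serial,
    current_tag     = 'NEW-TAG-123',
    current_serial  = 'NEW-SERIAL-456'
WHERE id IN (1, 2, 3);

The previous_* assignments are listed first so the statement also behaves the same on databases that evaluate SET clauses left to right.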
I'm just trying to load 5 random objects in a Rails controller.
Thing.all(:limit => 5, :order => "RANDOM()")
Is that the least expensive way to do it?
Short answer: no.
What you have asked the db to do is order the entire thing table randomly and then grab five of them. If your thing table has a lot of rows, that's a very expensive operation.
A better option (if the ids are auto-increment and thus likely contiguous) is to generate a set of random ids within the id range of your thing table and fetch those individual things by id.
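A rough sketch of that idea in SQL (Postgres syntax; the things table name is hypothetical, and it assumes the ids are reasonably dense):

SELECT *
FROM things
WHERE id IN (
  -- 20 random ids within the id range, oversampled to survive gaps and duplicates
  SELECT (random() * (SELECT max(id) FROM things))::int
  FROM generate_series(1, 20)
)
LIMIT 5;

This only touches the handful of rows whose ids were generated, instead of sorting the whole table.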
This is the best way:
Thing.all.sample(5)
I have a view where one of the joined columns is nullable but is often the only distinguishing item between two rows. I see that EF built a primary key out of all the non-nullable columns in the view. I've noticed that when I pull from the view, this nullable column does not always get returned correctly, and I read that it has to do with the way rows map to that key: EF will return the same row if it sees the key already exists.
Ideally the best solution would be to make my column not-nullable, but I can't do that without causing larger problems.
The other idea was to use ROW_NUMBER() to make a primary key. I am unsure whether that may cause similar issues (if the context isn't refreshed between calls, would it go solely off that key, or is it smart enough to realize that the queries are different?). I also worry about the performance cost of needing an ORDER BY for the function and how that would affect dynamic ordering of the rows.
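For example, something like this, just as a sketch (the view, table, and key names are made up):

CREATE VIEW MyView AS
SELECT ROW_NUMBER() OVER (ORDER BY A, B, C) AS RowKey,
       A, B, C
FROM SourceTable;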
What is the best way to ensure all my rows are returned exactly as they appear through the SQL query with the least hit to performance?
Thank you.
Example:
view: A int, B int, C int?
SQL Results:
1, 2, null
1, 3, 10
1, 3, 11
EF will return something like:
1, 2, null
1, 3, 10
1, 3, 10
I need to get that 11, too.
This happens due to the identity map pattern. By default EF keeps track of already loaded entities (identified by the entity key). If the result set contains a repeated entity key, EF thinks it is the same entity as the one already loaded and doesn't create a new entity instance for those repeated records; instead it uses the instance created for the first record with that key. This is necessary for change tracking and for the ability to save changes back to the database.
In your case you most probably don't want to save changes back to the database, because these records don't give you the information necessary to do that. So load the records without change tracking; that skips the identity map pattern and generates a new entity instance for every record in the result set:
context.YourEntitySet.MergeOption = MergeOption.NoTracking;
// Now execute your query