How can I add a number column that tracks deletions? - ruby-on-rails

Is there a gem or some database logic which I can use to add a number column to my database that tracks adds and deletes?
For example, GitHub has issues. An issue has a database ID. But it also has a number which is like a human readable identifier. If an issue is deleted, the number continues to increase. And repo A can have an issue with a number, and that doesn’t conflict with repo B.

Add a column to the table, such as deleteCount. Every time the delete method succeeds, add a line in the success block that increments deleteCount (deleteCount += 1). The same goes for the add method.
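A plain-Ruby sketch of that counter idea (the hash stands in for a table row; the column names here are invented for illustration):

```ruby
# A "row" with two counter columns, as described above.
repo = { add_count: 0, delete_count: 0 }

# Increment the counter in the success path of each operation.
def record_add(repo)
  # ...create the record here...
  repo[:add_count] += 1
end

def record_delete(repo)
  # ...delete the record here...
  repo[:delete_count] += 1
end

2.times { record_add(repo) }
record_delete(repo)
repo  # => { add_count: 2, delete_count: 1 }
```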

I believe you are overthinking it. The id column in rails (which all models possess by default) works the way you are thinking.
If you want a more human readable number as well, I would look at this gem (it has a ton of potential uses):
https://github.com/norman/friendly_id
Edit:
Looks like you might actually be looking for, essentially, the number of children a parent has. The logic looks like this:
1. When a parent is created, its child_count column is set to 0.
2. Whenever a child is created for that parent, it increments the parent's child_count and saves the result (this must be done atomically to avoid problems), which returns the current child_count.
3. Set that child_count as the child's parent_child_id.
The tricky bit is that step 2 has to be done atomically. So lock the row, update the column, then unlock the row.
Code roughly looks like:
# In Child model
after_commit :add_parent_child_id, on: :create

def add_parent_child_id
  parent.with_lock do
    new_child_count = parent.child_count + 1
    parent.child_count = new_child_count
    parent.save!
    update!(parent_child_id: new_child_count)
  end
end

Related

locking rows on rails update to avoid collisions. (Postgres back end)

So I have a method on my model object that creates a unique sequence number when a boolean field in the row is updated from null to true. It's implemented like this:
class AnswerHeader < ApplicationRecord
  before_save :update_survey_complete_sequence, if: :survey_complete_changed?

  def update_survey_complete_sequence
    maxval = AnswerHeader.maximum('survey_complete_sequence')
    self.survey_complete_sequence = maxval + 1
  end
end
My question is what do I need to lock so two rows being updated at the same time don't end up with two rows having the same survey_complete_sequence?
If it is possible to lock a single row rather than the whole table, that would be good, because this table is often accessed by users.
If you want to handle this in application logic itself, instead of letting the database handle it, you can make use of Rails' with_lock method, which creates a transaction and acquires a row-level DB lock on the selected rows (in your case, a single row).
What you need to lock
In your case you have to lock the row containing the maximum survey_complete_sequence, since this is the row every query will look for while getting the value you require.
maxval = AnswerHeader.maximum('survey_complete_sequence')
Is it possible to lock a single row rather than whole table
There is no such specific lock for your scenario. But you can make use of Postgresql's SELECT FOR UPDATE row-level locking.
To acquire an exclusive row-level lock on a row without actually modifying the row, select the row with SELECT FOR UPDATE.
And you can use pessimistic locking in rails and specify which lock you will use.
Call lock('some locking clause') to use a database-specific locking clause of your own such as 'LOCK IN SHARE MODE' or 'FOR UPDATE NOWAIT'
Here's an example of how to achieve that from rails official guide itself
Item.transaction do
  i = Item.lock("LOCK IN SHARE MODE").find(1)
  ...
end
Relations using lock are usually wrapped inside a transaction for preventing deadlock conditions.
So what you need to do is:
1. Apply a SELECT FOR UPDATE lock to the row containing maximum('survey_complete_sequence').
2. Get the value you require from that row.
3. Update your AnswerHeader with the value received.
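That read-max-then-increment pattern can be sketched in plain Ruby, with an in-process Mutex standing in for the row lock (illustration only; an in-process lock does not replace FOR UPDATE across separate processes):

```ruby
SEQUENCE_LOCK = Mutex.new

# Each "row" is a hash with a :survey_complete_sequence value.
# Reading the max and appending the next value happens under one lock,
# so no two threads can compute the same sequence number.
def next_sequence(rows)
  SEQUENCE_LOCK.synchronize do
    max = rows.map { |r| r[:survey_complete_sequence] }.max || 0
    rows << { survey_complete_sequence: max + 1 }
    max + 1
  end
end

rows = []
10.times.map { Thread.new { next_sequence(rows) } }.each(&:join)
# Every row now holds a distinct, consecutive sequence number.
```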
I believe you should give advisory locks a look. It makes sure the same block of code isn't executed on two machines simultaneously, while still keeping the table open for other business.
It uses the database, but it doesn't lock your tables.
You can use the gem called "with_advisory_lock" like this:
Model.with_advisory_lock("ADVISORY_LOCK_NAME") do
  # Your code
end
https://github.com/ClosureTree/with_advisory_lock
It doesn't work with SQLite.
If you are using Postgres, maybe Sequenced can help you out without defining a sequence at the DB level.
Is there a reason survey_complete_sequence should be incremental? If not, maybe randomize a bigint?
You probably don't want to lock the table, and even if you lock the row you're currently updating, the row you're basing your maxval on will still be readable by another update, which can then generate the same sequence number.
Unless you have a huge table and lots of updates every millisecond (on the order of thousands), this shouldn't be an issue in real life. But if the idea bothers you, you can go ahead and add a unique index to the table on the "survey_complete_sequence" column. The DB error will propagate to a Rails exception you can deal with within the application.
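A toy version of that fallback: here a Set stands in for the unique index, raising on duplicates, and the application rescues and retries with a fresh max (all names invented for illustration):

```ruby
require "set"

class DuplicateSequence < StandardError; end

# "db" stands in for the table; the membership check stands in for the index.
def insert_sequence(db, seq)
  raise DuplicateSequence if db.include?(seq)
  db.add(seq)
  seq
end

# On a collision, recompute the max and try again.
def save_with_retry(db)
  seq = (db.max || 0) + 1
  insert_sequence(db, seq)
rescue DuplicateSequence
  retry
end

db = Set.new
3.times { save_with_retry(db) }
db.sort  # => [1, 2, 3]
```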

prevent certain column values from returning when calling an active record instance (without limiting access to them in future)

I have a database which is postgis enabled which has some records (in a given table, there are many) that have enormous :spatial column values.
I often use the awesome_print gem to view records quickly while working. It colorizes and nicely displays a given record's (or records') information for quick review. The problem in this case is that 99% of the terminal display is devoted to showing these spatial columns' multi-page-long lists of coordinates in WKT format.
I'd like ActiveRecord to not return these objects when viewing records with the ap (awesome_print) command. Is there any way to do that without breaking something else? Can I instruct ActiveRecord to hide these columns' values unless specifically requested, or is that too much to ask?
One way or another you need to designate which fields to print, or not to print. For instance, you could define a helper for this and put it, say, in your console config file, e.g.:
def ap_article(article, cols = %w[col1 col2 col3])
  ap article.attributes.slice(*cols)
end
or perhaps something like this if you just want to ignore the spatial columns
def ap_article(article)
  cols = article.class.columns.select { |c| c.type != :spatial }.map(&:name)
  ap article.attributes.slice(*cols)
end

Display a record sequentially with every refresh

I have a Rails 3 application that currently shows a single "random" record with every refresh, however, it repeats records too often, or will never show a particular record. I was wondering what a good way would be to loop through each record and display them such that all get shown before any are repeated. I was thinking somehow using cookies or session_ids to sequentially loop through the record id's, but I'm not sure if that would work right, or exactly how to go about that.
The database consists of a single table with a single column, and currently only about 25 entries, but more will be added. ID's are generated automatically and are sequential.
Some suggestions would be appreciated.
Thanks.
The funny thing about 'random' is that it doesn't usually feel random when you get the same answer twice in short succession.
The usual answer to this problem is to generate a queue of responses, and make sure when you add entries to the queue that they aren't already on the queue. This can either be a queue of entries that you will return to the user, or a queue of entries that you have already returned to the user. I like your idea of using the record ids, but with only 25 entries, that repeating loop will also be annoying. :)
You could keep track of the queue of previous entries in memcached if you've already got one deployed or you could stuff the queue into the session (it'll probably just be five or six integers, not too excessive data transfer) or the database.
I think I'd avoid the database, because it sure doesn't need to be persistent, it doesn't need to take database bandwidth or compute time, and using the database just to keep track of five or six integers seems silly. :)
UPDATE:
In one of your controllers (maybe ApplicationController), add something like this to a method that you run in a before_filter:
class ApplicationController < ActionController::Base
  before_filter :find_quip

  def find_quip
    last_quip_id = session[:quip_id] || Quips.first.id
    new_quip = Quips.find_by_id(last_quip_id + 1) || Quips.first
    session[:quip_id] = new_quip.id
  end
end
I'm not so happy with the code to wrap around when you run out of quips; it'll completely screw up if there is ever a hole in the sequence. Which is probably going to happen someday. And it will start on number 2. But I'm getting too tired to sort it out. :)
If there are only going to be a few, as you say, you could store the entire array of IDs as a session variable, with another variable for the current index, and loop through them sequentially, incrementing the index.
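That session-index idea, sketched outside Rails (a plain hash stands in for the session; the IDs are made up):

```ruby
# Cycle through a fixed list of record IDs, wrapping around at the end.
def next_record_id(ids, session)
  index = session[:index] || 0
  session[:index] = (index + 1) % ids.size
  ids[index]
end

ids = [3, 7, 12, 25]   # e.g. the table's IDs, shuffled once per visitor
session = {}
5.times.map { next_record_id(ids, session) }
# => [3, 7, 12, 25, 3]
```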

Queue management in Rails

I am planning to have something like this for a website that is on Ruby on Rails. User comes and enters a bunch of names in a text field, and a queue gets created from all the names. From there the website keeps asking more details for each one from the queue until the queue finishes.
Is there any queue management gem available in Ruby or I have to just create an array and keep incrementing the index in session variable to emulate a queue behaviour?
The easiest thing is probably to use the push and shift methods of ruby arrays.
Push sticks things on the end of the array, shift will return and remove the first element.
As you receive data about each of the names, you could build a second list of names: a done array. Or, if you're not concerned with that and just want to save each entry and move on, store the array in the session (assuming it's not going to be massive) and work through it.
If your array is massive, consider storing the names to be added in temporary rows in a table then removing them when necessary. If this is the route you take, be sure to have a regularly running cleanup routine that removes entries that were never filled out.
References
http://apidock.com/ruby/Array/push
http://apidock.com/ruby/Array/shift
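In plain Ruby, the push/shift queue described above looks like this:

```ruby
# Build a queue from the names the user entered, then work through it.
queue = []
%w[alice bob carol].each { |name| queue.push(name) }

done = []
until queue.empty?
  name = queue.shift   # removes and returns the first element
  # ...ask the user for more details about `name` here...
  done.push(name)
end
# queue is now empty; done == ["alice", "bob", "carol"]
```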
Try modeling a Queue with ActiveRecord:
Queue has_many :tasks (attributes: name, id, timestamps)
Task belongs_to :queue (attributes: name, id, position, timestamps, completed)
Use timestamps to set the initial position. Once a task is completed, set its position to [highest position] + 1 (assuming the lower the position number, the higher up the queue). Completed tasks will sink to the bottom of the queue and a new task will rise to the top.
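A hypothetical plain-Ruby sketch of that reposition rule (a Struct stands in for the Task model):

```ruby
Task = Struct.new(:name, :position, :completed)

# Completing a task sinks it: its position becomes [highest position] + 1,
# so the next lowest-position task rises to the top of the queue.
def complete!(task, tasks)
  task.completed = true
  task.position = tasks.map(&:position).max + 1
end

tasks = [Task.new("a", 1, false), Task.new("b", 2, false), Task.new("c", 3, false)]
complete!(tasks.min_by(&:position), tasks)   # completes "a"
tasks.min_by(&:position).name                # => "b"
```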
Hope this helps!

Can one rely on the auto-incrementing primary key in your database?

In my present Rails application, I am resolving scheduling conflicts by sorting the models by the "created_at" field. However, I realized that when inserting multiple models from a form that allows this, all of the created_at times are exactly the same!
This is more a question of best programming practices: Can your application rely on your ID column in your database to increment greater and greater with each INSERT to get their order of creation? To put it another way, can I sort a group of rows I pull out of my database by their ID column and be assured this is an accurate sort based on creation order? And is this a good practice in my application?
The generated identification numbers will be unique, regardless of whether you use sequences, as in PostgreSQL and Oracle, or another mechanism such as MySQL's auto-increment.
However, sequence numbers are often acquired in bulk, for example 20 at a time.
So with PostgreSQL you cannot determine which row was inserted first, and there may even be gaps in the ids of inserted records.
Therefore you shouldn't use a generated id field for a task like that; it relies on database implementation details.
Generating a created or updated field at write time is much better for sorting by creation or update time later on.
For example:
INSERT INTO A (data, created) VALUES ('something', NOW());
UPDATE A SET data = 'something', updated = NOW();
That depends on your database vendor.
MySQL, I believe, absolutely orders auto-increment keys. I'm not certain about SQL Server, but I believe it does as well.
Where you'll run into problems is with databases that don't support this functionality, most notably Oracle that uses sequences, which are roughly but not absolutely ordered.
An alternative might be to go for created time and then ID.
I believe the answer to your question is yes. Reading between the lines, I think you are concerned that the system may re-use ID numbers that are 'missing' in the sequence. If you had used 1, 2, 3, 5, 6, 7 as ID numbers, then in every implementation I know of the next ID will always be 8 (or possibly higher); I don't know of any DB that would try to figure out that ID #4 is missing and re-use it.
Though I am most familiar with SQL Server, I don't know of any vendor that would try to fill the gaps in a sequence: think of the overhead of keeping a list of unused IDs, as opposed to just tracking the last ID used and adding 1.
I'd say you could safely rely on the next ID assigned number always being higher than the last - not just unique.
Yes the id will be unique and no, you can not and should not rely on it for sorting - it is there to guarantee row uniqueness only. The best approach is, as emktas indicated, to use a separate "updated" or "created" field for just this information.
For setting the creation time, you can just use a default value like this
CREATE TABLE foo (
  id INTEGER UNSIGNED AUTO_INCREMENT NOT NULL,
  created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  updated TIMESTAMP,
  PRIMARY KEY (id)
) ENGINE=InnoDB; ## whatever :P
Now, that takes care of creation time. For update time I would suggest a BEFORE UPDATE trigger like this one (an AFTER UPDATE trigger cannot modify NEW values; and of course you can do it in a separate query, but the trigger, in my opinion, is a better solution: more transparent):
DELIMITER $$
CREATE TRIGGER foo_b_upd BEFORE UPDATE ON foo
FOR EACH ROW BEGIN
  SET NEW.updated = NOW();
END;
$$
DELIMITER ;
And that should do it.
EDIT:
Woe is me. Foolishly, I didn't specify that this is for MySQL; there might be some differences in the function names (namely NOW) and other subtle itty-bitty details.
One caveat to EJB's answer:
SQL does not give any guarantee of ordering if you don't specify an ORDER BY column. E.g. if you delete some early rows and then insert new ones, the new rows may end up living in the same place in the db that the old ones did (albeit with new IDs), and that's what the engine may use as its default sort.
FWIW, I typically use order by ID as an effective version of order by created_at. It's cheaper in that it doesn't require adding an index to a datetime field (which is bigger and therefore slower than a simple integer primary key index), guaranteed to be different, and I don't really care if a few rows that were added at about the same time sort in some slightly different order.
This is probably DB-engine dependent. I would check how your DB implements sequences, and if there are no documented problems I would decide to rely on the ID.
E.g. Postgresql sequence is OK unless you play with the sequence cache parameters.
There is a possibility that other programmer will manually create or copy records from different DB with wrong ID column. However I would simplify the problem. Do not bother with low probability cases where someone will manually destroy data integrity. You cannot protect against everything.
My advice is to rely on sequence generated IDs and move your project forward.
In theory, yes: the highest id number is the last created. Remember, though, that databases have the ability to temporarily turn off inserting the autogenerated value, insert some records manually, and then turn it back on. Such inserts are not typically done on a production system but can happen occasionally when moving a large chunk of data from another system.
