Queue management in Rails - ruby-on-rails

I am planning to have something like this for a website that is on Ruby on Rails. User comes and enters a bunch of names in a text field, and a queue gets created from all the names. From there the website keeps asking more details for each one from the queue until the queue finishes.
Is there any queue management gem available in Ruby, or do I have to just create an array and keep incrementing an index in a session variable to emulate queue behaviour?

The easiest thing is probably to use the push and shift methods of Ruby arrays.
push sticks things on the end of the array; shift returns and removes the first element.
As you receive data about each of the names, you could construct a second list of the names - a done array. Or, if you're not concerned about that and just want to save and move on with them, just store the array in the session (assuming it's not going to be massive) and move on.
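For example, here's a minimal sketch of a session-backed queue built with push and shift; the controller, action, and parameter names are made up for illustration:

class NamesController < ApplicationController
  # Build the queue from the submitted text field, one name per line.
  def create
    session[:name_queue] = params[:names].to_s.lines.map(&:strip).reject(&:empty?)
    session[:done] = []
    redirect_to action: :ask
  end

  # Take the next name off the front of the queue and ask for its details.
  def ask
    @name = session[:name_queue].shift
    render :finished if @name.nil?
  end

  # Record a completed name on the "done" list, then move on to the next one.
  def complete
    session[:done].push(params[:name])
    redirect_to action: :ask
  end
end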
If your array is massive, consider storing the names to be added in temporary rows in a table then removing them when necessary. If this is the route you take, be sure to have a regularly running cleanup routine that removes entries that were never filled out.
References
http://apidock.com/ruby/Array/push
http://apidock.com/ruby/Array/shift

Try modeling a Queue with ActiveRecord
Queue.has_many :tasks
attributes: name, id, timestamps
Task.belongs_to :queue
attributes: name, id, position, timestamps, completed
Use timestamps to set initial position. Once a task is completed, set position to [highest position]+1 (assuming the lower the position number, the higher up on the queue). Completed tasks will sink to the bottom of the queue and a new task will rise to the top.
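A rough sketch of that schema and behaviour, assuming Rails-style migrations (the class names follow the answer, and the completion logic is only illustrative, not a prescribed implementation):

class CreateQueuesAndTasks < ActiveRecord::Migration
  def change
    create_table :queues do |t|
      t.string :name
      t.timestamps
    end

    create_table :tasks do |t|
      t.string     :name
      t.integer    :position
      t.boolean    :completed, default: false
      t.references :queue
      t.timestamps
    end
  end
end

class Queue < ActiveRecord::Base
  has_many :tasks
end

class Task < ActiveRecord::Base
  belongs_to :queue

  # Mark the task done and sink it to the bottom of its queue.
  def complete!
    update_attributes(completed: true,
                      position: queue.tasks.maximum(:position).to_i + 1)
  end
end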
Hope this helps!

Related

locking rows on rails update to avoid collisions. (Postgres back end)

So I have a method on my model object which creates a unique sequence number when a binary field in the row is updated from null to true. It's implemented like this:
class AnswerHeader < ApplicationRecord
  before_save :update_survey_complete_sequence, if: :survey_complete_changed?

  def update_survey_complete_sequence
    maxval = AnswerHeader.maximum('survey_complete_sequence')
    self.survey_complete_sequence = maxval + 1
  end
end
My question is what do I need to lock so two rows being updated at the same time don't end up with two rows having the same survey_complete_sequence?
If it is possible to lock a single row rather than the whole table, that would be good, because this table is often accessed by users.
If you want to handle this in application logic itself, instead of letting the database handle it, you can make use of Rails' with_lock method, which creates a transaction and acquires a row-level DB lock on the selected rows (in your case a single row).
What you need to lock
In your case you have to lock the row containing the maximum survey_complete_sequence, since this is the row every query will look for while getting the value you require.
maxval = AnswerHeader.maximum('survey_complete_sequence')
Is it possible to lock a single row rather than the whole table
There is no specific lock for exactly your scenario, but you can make use of PostgreSQL's SELECT FOR UPDATE row-level locking.
To acquire an exclusive row-level lock on a row without actually modifying the row, select the row with SELECT FOR UPDATE.
And you can use pessimistic locking in Rails and specify which lock you will use.
Call lock('some locking clause') to use a database-specific locking clause of your own such as 'LOCK IN SHARE MODE' or 'FOR UPDATE NOWAIT'
Here's an example of how to achieve that, from the official Rails guide itself:
Item.transaction do
  i = Item.lock("LOCK IN SHARE MODE").find(1)
  ...
end
Relations using lock are usually wrapped inside a transaction to prevent deadlock conditions.
So what you need to do is:
Apply a SELECT FOR UPDATE lock to the row holding maximum('survey_complete_sequence')
Get the value you require from that row
Update your AnswerHeader with the value received
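A rough sketch of those steps using Rails' with_lock; this is only one way to arrange it, and the model code below is an assumption rather than the asker's actual implementation:

class AnswerHeader < ApplicationRecord
  before_save :update_survey_complete_sequence, if: :survey_complete_changed?

  def update_survey_complete_sequence
    # Step 1: lock the row currently holding the highest sequence number.
    current_max = AnswerHeader.where.not(survey_complete_sequence: nil)
                              .order(survey_complete_sequence: :desc).first

    if current_max
      current_max.with_lock do
        # Steps 2 and 3: read the value under the lock and derive the next one.
        self.survey_complete_sequence = current_max.survey_complete_sequence + 1
      end
    else
      self.survey_complete_sequence = 1
    end
  end
end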
I believe you should give advisory locks a look. It makes sure the same block of code isn't executed on two machines simultaneously, while still keeping the table open for other business.
It uses the database, but it doesn't lock your tables.
You can use the gem called "with_advisory_lock" like this:
Model.with_advisory_lock("ADVISORY_LOCK_NAME") do
  # Your code
end
https://github.com/ClosureTree/with_advisory_lock
It doesn't work with SQLite.
If you are using Postgres, maybe Sequenced can help you out without defining a sequence at the DB level.
Is there a reason survey_complete_sequence should be incremental? If not, maybe randomize a bigint?
You probably don't want to lock the table, and even if you lock the row you're currently updating, the row you're basing your maxval on will still be available for another update to read and use to generate its sequence number.
Unless you have a huge table and lots of updates every millisecond (on the order of thousands), this shouldn't be an issue in real life. But if the idea bothers you, you can go ahead and add a unique index to the table on the "survey_complete_sequence" column. The DB error will propagate to a Rails exception you can deal with within the application.
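A minimal sketch of that safeguard; the migration class name and the retry policy here are illustrative, not part of the original question:

# Enforce uniqueness at the database level.
class AddUniqueIndexToAnswerHeaders < ActiveRecord::Migration[5.2]
  def change
    add_index :answer_headers, :survey_complete_sequence, unique: true
  end
end

# A collision then surfaces as ActiveRecord::RecordNotUnique, which the
# application can rescue and retry with a freshly computed sequence number.
begin
  answer_header.save!   # some AnswerHeader instance being marked complete
rescue ActiveRecord::RecordNotUnique
  attempts = (attempts || 0) + 1
  retry if attempts < 3
  raise
end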

What's the best pattern for logging data on a Stateful Object?

Currently I'm thinking about adding a json array column (I'm using Postgres) and just pumping log messages for the object into this attribute. I want to log progress: the object is an import report that does a lot of work and takes a while, so it's useful to have a sense of what's currently happening (how many rows have been imported, how many rows have been normalized, etc.).
The other option is to add one of the gems that let you see logs streamed in a view, but I think this isn't as useful, since what I'm looking for is something where I can see the history of this specific object.
Using a json column or json[] (PostgreSQL array of json) is a very bad idea for logging.
Each time you update it, the whole column contents must be read, modified in memory, and written out again in their entirety.
Instead, create a table used for logs for objects of this kind, with a FK to the table being logged and a timestamp for each entry. Insert a row for each log entry.
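For example, a rough sketch of that shape; the table, model, and method names here are assumptions, as is the Rails 5.2 migration superclass:

class CreateImportReportLogs < ActiveRecord::Migration[5.2]
  def change
    create_table :import_report_logs do |t|
      t.references :import_report, index: true   # FK back to the object being logged
      t.string     :message
      t.datetime   :created_at, null: false      # timestamp for each entry
    end
  end
end

class ImportReportLog < ActiveRecord::Base
  belongs_to :import_report
end

class ImportReport < ActiveRecord::Base
  has_many :import_report_logs

  # Append a progress entry; each message is its own cheap INSERT.
  def log_progress(message)
    import_report_logs.create!(message: message)
  end
end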
BTW, if the report runs in a single transaction, other clients won't be able to see any of the log rows until the whole transaction commits, in which case it won't be good for progress monitoring - but neither will your original idea. You'll need to use NOTICE messages instead.

How should I auto-expire entries in an ETS table, while also limiting its total size?

I have a lot of analytics data which I'm looking to aggregate every so often (let's say one minute.) The data is being sent to a process which stores it in an ETS table, and every so often a timer sends it a message to process the table and remove old data.
The problem is that the amount of data that comes in varies wildly, and I basically need to do two things to it:
If the amount of data coming in is too big, drop the oldest data and push the new data in. This could be viewed as a fixed size queue, where if the amount of data hits the limit, the queue would start dropping things from the front as new data comes to the back.
If the queue isn't full, but the data has been sitting there for a while, automatically discard it (after a fixed timeout.)
If these two conditions are kept, I could basically assume the table has a constant size, and everything in it is newer than X.
The problem is that I haven't found an efficient way to do these two things together. I know I could use match specs to delete all entries older than X, which should be pretty fast if the index is the timestamp. Though I'm not sure if this is the best way to periodically trim the table.
The second problem is keeping the total table size under a certain limit, which I'm not really sure how to do. One solution that comes to mind is to use an auto-increment field with each insert, and when the table is being trimmed, look at the first and the last index, calculate the difference, and again use match specs to delete everything below the threshold.
Having said all this, it feels like I might be using the ETS table for something it wasn't designed to do. Is there a better way to store data like this, or am I approaching the problem correctly?
You can determine the amount of memory occupied using ets:info(Tab, memory). The result is in number of words, but there is a catch: if you are storing binaries, only heap binaries are included. So if you are storing mostly normal Erlang terms you can use it, and with a timestamp as you described, it is a way to go. For size in bytes, just multiply by erlang:system_info(wordsize).
I haven't used ETS for anything like this, but in other NoSQL DBs (DynamoDB) an easy solution is to use multiple tables: If you're keeping 24 hours of data, then keep 24 tables, one for each hour of the day. When you want to drop data, drop one whole table.
I would do the following: create a server responsible for
receiving all the data storage messages. These messages should be timestamped by the client process (so it doesn't matter if they wait a little in the message queue). The server will then store them in the ETS table, configured as an ordered_set and using the timestamp, converted to an integer, as the key (if the timestamps are delivered by erlang:now in one single VM they will be different; if you are using several nodes, then you will need to add some information such as the node name to guarantee uniqueness).
receiving a tick (using for example timer:send_interval) and then processing the messages received in the last N µsec (using Key = current time - N), walking them with ets:next(Table, Key), and continuing to the last message. Finally you can discard all the messages via ets:delete_all_objects(Table). If you had to add information such as a node name, it is still possible to use the next function (for example, if the keys are {TimeStamp:int(), Node:atom()} you can compare to {Time:int(), 0}, since a number is smaller than any atom).

Is it possible in Ruby to set a specific Active Record call to read dirty

I am looking at a rather large database. Let's say I have an exported flag on the product records.
If I want an estimate of how many products I have with the flag set to false, I can do a call something like this
Product.where(:exported => false).count
The problem I have is even the count takes a long time, because the table of 1 million products is being written to. More specifically exports are happening, and the value I'm interested in counting is ever changing.
So I'd like to do a dirty read on the table... Not a dirty read always. And I 100% don't want all subsequent calls to the database on this connection to be dirty.
But for this one call, dirty is what I'd like.
Oh, I should mention: Ruby 1.9.3, Heroku, and PostgreSQL.
Now.. if I'm missing another way to get the count, I'd be excited to try that.
OH SNOT one last thing.. this example is contrived.
PostgreSQL doesn't support dirty reads.
You might want to use triggers to maintain a materialized view of the count - but doing so will mean that only one transaction at a time can insert a product, because they'll contend for the lock on the product count in the summary table.
Alternately, use system statistics to get a fast approximation.
Or, on PostgreSQL 9.2 and above, ensure there's a primary key (and thus a unique index) and make sure vacuum runs regularly. Then you should be able to do quite a fast count, as PostgreSQL should choose an index-only scan on the primary key.
Note that even if Pg did support dirty reads, the read would still not return perfectly up-to-date results, because rows would sometimes be inserted behind the read pointer in a sequential scan. The only way to get a perfectly up-to-date count is to prevent concurrent inserts: LOCK TABLE thetable IN EXCLUSIVE MODE.
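For the "fast approximation" route, here's a sketch of reading PostgreSQL's planner statistics from Rails; note the estimate covers the whole table rather than the exported = false subset, and the table name is an assumption:

# Approximate row count from PostgreSQL's statistics: cheap, but only as
# fresh as the last ANALYZE / autovacuum run.
estimated_total = ActiveRecord::Base.connection.select_value(
  "SELECT reltuples::bigint FROM pg_class WHERE relname = 'products'"
)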
As soon as a query begins to execute, it's running against a frozen read-only snapshot, because that's what MVCC is all about. The values are not changing in that snapshot, only in subsequent amendments to that state. It doesn't matter if your query takes an hour to run; it is operating on data that's locked in time.
If your queries are taking a very long time, it sounds like you need an index on your exported column, or whatever values you use in your conditions, as a COUNT against an indexed column is usually very fast.
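For instance, a minimal migration along those lines (the table name is assumed from the Product model in the question):

class AddIndexOnExportedToProducts < ActiveRecord::Migration
  def change
    # Lets the planner satisfy the exported-flag condition from an index.
    add_index :products, :exported
  end
end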

Display a record sequentially with every refresh

I have a Rails 3 application that currently shows a single "random" record with every refresh, however, it repeats records too often, or will never show a particular record. I was wondering what a good way would be to loop through each record and display them such that all get shown before any are repeated. I was thinking somehow using cookies or session_ids to sequentially loop through the record id's, but I'm not sure if that would work right, or exactly how to go about that.
The database consists of a single table with a single column, and currently only about 25 entries, but more will be added. ID's are generated automatically and are sequential.
Some suggestions would be appreciated.
Thanks.
The funny thing about 'random' is that it doesn't usually feel random when you get the same answer twice in short succession.
The usual answer to this problem is to generate a queue of responses, and make sure when you add entries to the queue that they aren't already on the queue. This can either be a queue of entries that you will return to the user, or a queue of entries that you have already returned to the user. I like your idea of using the record ids, but with only 25 entries, that repeating loop will also be annoying. :)
You could keep track of the queue of previous entries in memcached if you've already got one deployed or you could stuff the queue into the session (it'll probably just be five or six integers, not too excessive data transfer) or the database.
I think I'd avoid the database, because it sure doesn't need to be persistent, it doesn't need to take database bandwidth or compute time, and using the database just to keep track of five or six integers seems silly. :)
UPDATE:
In one of your controllers (maybe ApplicationController), add something like this to a method that you run in a before_filter:
class ApplicationController < ActionController::Base
  before_filter :find_quip

  def find_quip
    last_quip_id = session[:quip_id] || Quips.first.id
    # find_by_id returns nil (instead of raising) when we run off the end,
    # so we wrap around to the first quip.
    next_quip = Quips.find_by_id(last_quip_id + 1) || Quips.first
    session[:quip_id] = next_quip.id
  end
end
I'm not so happy with the code to wrap around when you run out of quips; it'll misbehave if there is ever a hole in the ID sequence, which is probably going to happen someday. And it will start on number 2. But I'm getting too tired to sort it out. :)
If, as you say, there are only ever going to be a handful of records, you could store the entire array of IDs as a session variable, with another variable for the current index, and loop through them sequentially, incrementing the index.
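A quick sketch of that idea as a before_filter; the Record model name and session keys are made up, and pluck assumes Rails 3.2 or later. Shuffling the IDs keeps the order feeling random while still showing every record before any repeat:

def next_record
  # Rebuild and reshuffle the ID list once we've walked all the way through it.
  if session[:record_ids].blank? || session[:record_index].to_i >= session[:record_ids].size
    session[:record_ids]   = Record.order(:id).pluck(:id).shuffle
    session[:record_index] = 0
  end

  @record = Record.find(session[:record_ids][session[:record_index]])
  session[:record_index] += 1
end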
