I have a Resque queue of jobs. Each job has a batch of events to be processed. A worker will process the jobs and count the number of events that occurred in each minute. It uses ActiveRecord to get a "datapoint" for that minute and add the number of events to it and save.
When I have multiple workers processing that queue, I believe there is a concurrency issue. I think there is a race condition between getting the datapoint from the database, adding the correct amount, and updating to the new value. I looked into Transactions, but I think that only helps if the query would fail.
My current workaround is only using 1 Resque Worker. I'd like to scale and process jobs faster, though. Any ideas?
Edit: I originally had trouble finding key words to search Google for, but thanks to Robin, I found the answer to my question here.
The correct answer is to use increment_counter, or update_counters if you need to increment multiple attributes or by a value other than +1. Both are class methods on the model.
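To make the difference concrete, here is a minimal pure-Ruby sketch of the race, not the actual ActiveRecord code. FakeTable, lost_update and atomic_update are hypothetical names made up for the illustration; FakeTable#increment stands in for what update_counters does, which is to push the arithmetic into a single UPDATE statement so that the read and the write cannot be separated.

```ruby
# Hypothetical in-memory stand-in for the datapoints table (not ActiveRecord).
class FakeTable
  def initialize
    @rows = Hash.new(0)
    @lock = Mutex.new
  end

  def read(id)
    @rows[id]
  end

  def write(id, value)
    @rows[id] = value
  end

  # Analogous to one "UPDATE ... SET count = count + by" statement:
  # the read and the write happen as a single indivisible step.
  def increment(id, by)
    @lock.synchronize { @rows[id] += by }
  end
end

# Read-modify-write with an unlucky interleaving:
# both workers read the old value before either one writes.
def lost_update(table, id, a_events, b_events)
  a_seen = table.read(id)
  b_seen = table.read(id)
  table.write(id, a_seen + a_events)
  table.write(id, b_seen + b_events) # overwrites worker A's result
  table.read(id)
end

# Same two workers, but each one asks the store to do the addition.
def atomic_update(table, id, a_events, b_events)
  table.increment(id, a_events)
  table.increment(id, b_events)
  table.read(id)
end

racy_total = lost_update(FakeTable.new, :minute_1, 3, 4)
safe_total = atomic_update(FakeTable.new, :minute_1, 3, 4)
puts "read-modify-write: #{racy_total}, atomic increment: #{safe_total}"
```

In a Rails app the atomic version would look roughly like Datapoint.update_counters(datapoint.id, :events_count => n), where Datapoint and events_count are placeholder names for whatever the model and column are actually called.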
Given the following use case:
(:Product)-[:HAS_PRICE]->(:Price)-[:HAS_CURRENCY]->(:Currency)
Say there are 1000 products and only one (1) supported currency, (:Currency {code:'USD'}).
To insert a Price for each Product, a thousand (1000) relationships will be created against the node (:Currency {code:'USD'}).
With 3 worker threads receiving the prices and setting them up, the (:Currency {code:'USD'}) node is locked for the other two workers while one worker creates the -[:HAS_CURRENCY]-> edge.
Implementing a RETRY/BACK-OFF approach avoids some failures, but the RETRY threshold must be high enough (about 100 in my case) to avoid all deadlocks, and setting a long BACK-OFF delay is not worthwhile.
Chris Vest commented about modifications in locking and transaction isolation behavior in a previous post.
Is there anything we can do to avoid this issue? Any tips around server configuration, data modeling, etc.?
Thanks in advance.
Note - This question expands on an answer to another question here.
I'm importing a file into my DB by chunking it up into smaller groups and spawning background jobs to import each chunk (of 100 rows).
I want a way to track progress of how many chunks have been imported so far, so I had planned on each job incrementing a DB field by 1 when it's done so I know how many have processed so far.
There is a potential race here: two parallel jobs could increment the DB field by 1 simultaneously and overwrite each other.
What's the best way to avoid this condition and ensure an atomic parallel operation? The linked post above suggests using Redis, which is one good approach. For the purposes of this question I'm curious if there is an alternate way to do it using persistent storage.
I'm using ActiveRecord in Rails with Postgres as my DB.
Thanks!
I suggest NOT incrementing a DB field by 1. Instead, create a DB record for each job, keyed by the job ID. There are two benefits:
You can count the number of records to let you know how many have processed without worrying about parallel operations.
You can also add some necessary logs into each job record and easily debug when any of the jobs fails when importing.
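A rough pure-Ruby sketch of the record-per-job idea follows. ImportLog is a hypothetical stand-in for a table with a unique key on the job ID, and the Mutex only plays the role the database would play for concurrent inserts. Progress is simply the number of rows, so no job ever has to read and rewrite a shared counter.

```ruby
# Hypothetical stand-in for a table with one row per finished job,
# keyed by job id (think: unique index on job_id).
class ImportLog
  def initialize
    @rows = {}
    @lock = Mutex.new
  end

  # Each job only ever writes its own row.
  # A repeated job id keeps the first row instead of adding a second one.
  def record(job_id, note)
    @lock.synchronize { @rows[job_id] ||= note }
  end

  # Progress is just the number of rows.
  def processed_count
    @lock.synchronize { @rows.size }
  end

  def note_for(job_id)
    @lock.synchronize { @rows[job_id] }
  end
end

log = ImportLog.new

# Twenty "background jobs" finishing in parallel.
workers = (1..20).map do |job_id|
  Thread.new { log.record(job_id, "chunk #{job_id} imported") }
end
workers.each(&:join)

# A job that gets retried does not inflate the count.
log.record(7, "chunk 7 imported again")

processed = log.processed_count
puts "#{processed} chunks processed"
```

With ActiveRecord this would be something like one create! per finished job and a count on the table, with the model name being whatever you choose to call it.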
I suggest you use a PostgreSQL sequence.
See CREATE SEQUENCE and Sequence Manipulation.
Especially nextval():
Advance the sequence object to its next value and return that value. This is done atomically: even if multiple sessions execute nextval concurrently, each will safely receive a distinct sequence value.
I am trying to count the number of Postgres statements my Ruby on Rails application is performing against our database. I found this entry on Stack Overflow, but it counts transactions. We have several transactions that make very large numbers of statements, so that doesn't give a good picture. I am hoping the data is available from PG itself, rather than trying to parse a log.
https://dba.stackexchange.com/questions/35940/how-many-queries-per-second-is-my-postgres-executing
I think you are looking for ActiveSupport instrumentation. Part of Rails, this framework is used throughout Rails applications to publish certain events. For example, there's an sql.active_record event type that you can subscribe to in order to count your queries.
counter = 0
ActiveSupport::Notifications.subscribe "sql.active_record" do |*args|
  # Called once for every SQL statement ActiveRecord sends to the database.
  counter += 1
end
You could put this in config/initializers/ (to count across the app) or in one of the various before_ hooks of a controller (to count statements for a single request).
(The fine print: I have not actually tested this snippet, but that's how it should work AFAIK.)
PostgreSQL provides a few facilities that will help.
The main one is pg_stat_statements, an extension you can install to collect statement statistics. I strongly recommend this extension; it's very useful. It can tell you which statements run most often, which take the longest, etc. You can query it to add up the number of queries for a given database.
To get a rate over time you should have a script sample pg_stat_statements regularly, creating a table with the values that changed since last sample.
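The sampling arithmetic might look something like this in Ruby. It is only a sketch: calls_per_second is a made-up helper, and each sample is assumed to be a hash from statement identifier to cumulative call count, as you could build from the queryid and calls columns of pg_stat_statements. Fetching the samples from the database is left out.

```ruby
# previous and current map a statement identifier to its cumulative call count.
# elapsed_seconds is the time between the two samples.
def calls_per_second(previous, current, elapsed_seconds)
  delta = current.sum do |query_id, calls|
    before = previous.fetch(query_id, 0)
    # A counter lower than last time suggests the statistics were reset,
    # so count that statement from zero instead of going negative.
    calls >= before ? calls - before : calls
  end
  delta / elapsed_seconds.to_f
end

earlier = { 101 => 500, 102 => 40 }
later   = { 101 => 560, 102 => 40, 103 => 30 }

rate = calls_per_second(earlier, later, 30)
puts "#{rate} statements per second"
```

Statements that appear only in the later sample are counted in full, which is an assumption that they first ran between the two samples.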
The pg_stat_database view tracks values including the transaction rate. It does not track number of queries.
There's pg_stat_user_tables, pg_stat_user_indexes, etc, which provide usage statistics for tables and indexes. These track individual index scans, sequential scans, etc done by a query, but again not the number of queries.
I have a couple of jobs that are set via the after_create hook of an object. During my tests, I have been saving the object so one job fires around 5 minutes later and one job 10 minutes later (using two attributes of the model that are datetimes). The problem is, both seem to execute right away. If I create the object with the two date values set 24 hours in the future, it seems to wait. So I'm wondering if the time the worker thinks it is differs from the server's time. Is there a way to make sure the delayed_job worker is in sync?
Here's the code for queueing the jobs:
Delayed::Job.enqueue(CharityProductCreateJob.new(self.id, shop.id), 0, self.start_date)
Delayed::Job.enqueue(CharityProductCreateJob.new(self.id, shop.id), 0, self.end_date)
This may answer your question http://www.gregbenedict.com/2009/08/19/is-delayed-job-run_at-datetime-giving-you-fits/
I have a Rails 3 application that currently shows a single "random" record with every refresh; however, it repeats records too often, or never shows a particular record. I was wondering what a good way would be to loop through each record and display them such that all get shown before any are repeated. I was thinking of somehow using cookies or session_ids to sequentially loop through the record IDs, but I'm not sure if that would work right, or exactly how to go about that.
The database consists of a single table with a single column, and currently only about 25 entries, but more will be added. IDs are generated automatically and are sequential.
Some suggestions would be appreciated.
Thanks.
The funny thing about 'random' is that it doesn't usually feel random when you get the same answer twice in short succession.
The usual answer to this problem is to generate a queue of responses, and make sure when you add entries to the queue that they aren't already on the queue. This can either be a queue of entries that you will return to the user, or a queue of entries that you have already returned to the user. I like your idea of using the record ids, but with only 25 entries, that repeating loop will also be annoying. :)
You could keep track of the queue of previous entries in memcached if you've already got one deployed or you could stuff the queue into the session (it'll probably just be five or six integers, not too excessive data transfer) or the database.
I think I'd avoid the database, because it sure doesn't need to be persistent, it doesn't need to take database bandwidth or compute time, and using the database just to keep track of five or six integers seems silly. :)
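Here's roughly what the "queue of already-returned entries" could look like in plain Ruby. It is a sketch under assumptions: next_random_id and the memory size of 5 are made up for the example, and in a Rails app the recent list would live in the session or memcached rather than in a local variable.

```ruby
# Pick a random id that is not among the most recently shown ones.
# Returns the chosen id and the updated list of recent ids.
def next_random_id(all_ids, recent_ids, memory: 5, rng: Random.new)
  candidates = all_ids - recent_ids
  # If everything was shown recently, fall back to the full list.
  candidates = all_ids if candidates.empty?
  chosen = candidates.sample(random: rng)
  # Remember only the last `memory` ids.
  updated_recent = (recent_ids + [chosen]).last(memory)
  [chosen, updated_recent]
end

ids = (1..25).to_a
recent = []
shown = []

12.times do
  id, recent = next_random_id(ids, recent)
  shown << id
end

puts shown.inspect
```

With a memory of 5, no id can come back until at least five others have been shown, which is usually enough for it to feel random.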
UPDATE:
In one of your controllers (maybe ApplicationController), add something like this to a method that you run in a before_filter:
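class ApplicationController < ActionController::Base
  before_filter :find_quip

  def find_quip
    # Start from the quip shown last time, or from the first quip in a new session.
    last_quip_id = session[:quip_id] || Quips.first.id
    # find_by_id returns nil (instead of raising) when id + 1 does not exist,
    # in which case we wrap around to the first quip.
    next_quip = Quips.find_by_id(last_quip_id + 1) || Quips.first
    session[:quip_id] = next_quip.id
  end
end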
I'm not so happy with the code to wrap around when you run out of quips; it'll completely screw up if there is ever a hole in the sequence. Which is probably going to happen someday. And it will start on number 2. But I'm getting too tired to sort it out. :)
If there aren't going to be too many, as you say, you could store the entire array of IDs as a session variable, with another variable for the current index, and loop through them sequentially, incrementing the index.
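Something along these lines, as a sketch. The session here is a plain Hash standing in for the Rails session, and :quip_ids and :quip_index are key names I made up. Shuffling the list each time it is rebuilt is my own addition so that each pass comes out in a different order; drop the shuffle for a strictly sequential loop.

```ruby
# session is any Hash-like store; :quip_ids and :quip_index are hypothetical keys.
def next_in_rotation(session, all_ids)
  # (Re)build the list on first use, or once every id has been shown.
  if session[:quip_ids].nil? || session[:quip_index] >= session[:quip_ids].size
    session[:quip_ids] = all_ids.shuffle
    session[:quip_index] = 0
  end

  id = session[:quip_ids][session[:quip_index]]
  session[:quip_index] += 1
  id
end

session = {}
ids = [3, 7, 8, 12] # gaps in the id sequence are fine

first_pass  = Array.new(ids.size) { next_in_rotation(session, ids) }
second_pass = Array.new(ids.size) { next_in_rotation(session, ids) }

puts first_pass.inspect
puts second_pass.inspect
```

Every id is shown exactly once per pass, and because the list holds the real ids rather than assuming id + 1 exists, holes in the sequence don't matter.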