I have a couple of jobs that are set via the after_create hook of an object. During my tests, I have been saving the object so that one job fires around 5 minutes later and one job 10 minutes later (using two datetime attributes of the model). The problem is, both seem to execute right away. If I create the object with the two date values set 24 hours in the future, it seems to wait. So I'm wondering if the time the worker thinks it is differs from what the server has. Is there a way to make sure the delayed_job worker is in sync?
Here's the code for queueing the jobs:
Delayed::Job.enqueue(CharityProductCreateJob.new(self.id, shop.id), 0, self.start_date)
Delayed::Job.enqueue(CharityProductCreateJob.new(self.id, shop.id), 0, self.end_date)
This may answer your question: http://www.gregbenedict.com/2009/08/19/is-delayed-job-run_at-datetime-giving-you-fits/
Note - This question expands on an answer to another question here.
I'm importing a file into my DB by chunking it up into smaller groups and spawning background jobs to import each chunk (of 100 rows).
I want a way to track progress of how many chunks have been imported so far, so I had planned on each job incrementing a DB field by 1 when it's done so I know how many have processed so far.
This has a potential situation of two parallel jobs incrementing the DB field by 1 simultaneously and overwriting each other.
What's the best way to avoid this condition and ensure an atomic parallel operation? The linked post above suggests using Redis, which is one good approach. For the purposes of this question I'm curious if there is an alternate way to do it using persistent storage.
I'm using ActiveRecord in Rails with Postgres as my DB.
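To make the hazard concrete, here is a minimal pure-Ruby sketch (no database involved) of the interleaving I'm worried about: both jobs read the counter before either writes, so one increment is lost.

```ruby
# Deterministic illustration of the lost-update race: two "jobs" each read
# the counter, then each writes back its own read + 1.
counter = 0

# Both jobs read the current value before either writes.
read_by_job_a = counter
read_by_job_b = counter

# Each job writes back its own read + 1; job B overwrites job A's write.
counter = read_by_job_a + 1
counter = read_by_job_b + 1

counter  # => 1, not 2: job A's increment was silently lost
```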
Thanks!
I suggest NOT incrementing a DB field by 1. Instead, create a DB record for each job, keyed by the job id. There are two benefits:
You can count the number of records to know how many chunks have been processed, without worrying about parallel operations.
You can also attach useful logs to each job record, which makes it easy to debug when any of the jobs fails during the import.
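As a rough sketch of the idea, with plain Ruby threads standing in for background jobs and an array standing in for the table (the `CompletedChunk` model name in the comment is made up):

```ruby
require "thread"

# Each finished chunk appends one "record" instead of mutating a shared counter.
completed_chunks = []
mutex = Mutex.new  # the database gives you this isolation for free on INSERT

threads = 4.times.map do |worker_id|
  Thread.new do
    25.times do |chunk|
      # In Rails this would be something like:
      #   CompletedChunk.create!(job_id: ..., log: ...)   (hypothetical model)
      mutex.synchronize { completed_chunks << { worker: worker_id, chunk: chunk } }
    end
  end
end
threads.each(&:join)

completed_chunks.size  # => 100; progress is just a count, no read-modify-write
```

Each insert is independent, so there is nothing for two parallel jobs to overwrite.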
I suggest you use a PostgreSQL sequence.
See CREATE SEQUENCE and Sequence Manipulation.
Especially nextval():
Advance the sequence object to its next value and return that value. This is done atomically: even if multiple sessions execute nextval concurrently, each will safely receive a distinct sequence value.
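That contract can be mirrored in plain Ruby; Postgres implements it inside the server, and the Mutex below just stands in for that guarantee (the sequence name in the comments is made up):

```ruby
require "thread"

# A tiny stand-in for a Postgres sequence. In the real setup you would run
#   CREATE SEQUENCE chunks_done_seq;   -- once, e.g. in a migration
# and fetch values from ActiveRecord with something like
#   ActiveRecord::Base.connection.select_value("SELECT nextval('chunks_done_seq')")
# The property being mirrored: concurrent callers each get a distinct value.
class FakeSequence
  def initialize
    @mutex = Mutex.new
    @value = 0
  end

  # Like nextval(): atomic, never hands out duplicates.
  def nextval
    @mutex.synchronize { @value += 1 }
  end
end

seq = FakeSequence.new
values = []
collector = Mutex.new

threads = 10.times.map do
  Thread.new do
    100.times do
      v = seq.nextval
      collector.synchronize { values << v }
    end
  end
end
threads.each(&:join)

values.uniq.size  # 1000 distinct values, despite 10 concurrent "sessions"
```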
In my API, I'd like to run a rake task every 10 minutes (for instance at 8:00, 8:10, ...). This task can take up to 10 minutes to execute (thus, the next task can start executing before the first one ends).
Since this task executes a lot of time-consuming queries at the beginning, I thought about splitting my rake task in two:
The first will execute these time-consuming queries. I'll fire this task every 10 minutes (at 8:05, 8:15, ...).
I'd like to use the result of the queries executed in the first rake task in the second one (which will be fired at 8:10, 8:20, ...).
I was wondering if there is a way to store the result of the first rake task and use it in the second one? Or is there another way to do what I am planning to build?
FYI, I don't want to use database storage since writing/reading takes a lot of time.
This makes no sense on the face of it. If your second task is dependent on your first, and your first only prepares data for the second, you either need to store the data in the database, or (more sensibly) just make it one task. Whatever other non-database storage solutions you come up with aren't going to be any better than a database.
If you need it to be two tasks, then modularize things: use three tasks, where the first two can be run independently and the third invokes the first and passes the result to the second.
Use database tables or memcache/redis for storing the results.
For example, create a table called cached_results (don't use this exact name; pick one closer to your business logic) and store the results with a timestamp. Similarly, you can use memcache/redis with some sort of TTL.
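To illustrate the timestamp-plus-TTL shape, here is an in-process sketch with a made-up class. For two separate rake processes you would back this with the table or memcache/redis, since plain Ruby objects don't survive across processes:

```ruby
# Minimal TTL cache: values are stored with a write timestamp and treated as
# missing once they are older than the TTL. The `now:` parameter exists only
# to make the example deterministic; memcached/redis expire entries for you.
class CachedResults
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @store = {}
  end

  def write(key, value, now: Time.now)
    @store[key] = { value: value, written_at: now }
  end

  def read(key, now: Time.now)
    entry = @store[key]
    return nil unless entry
    return nil if now - entry[:written_at] > @ttl  # expired
    entry[:value]
  end
end

cache = CachedResults.new(600)  # results live 10 minutes
t0 = Time.now
cache.write(:heavy_query, [1, 2, 3], now: t0)
cache.read(:heavy_query, now: t0 + 300)  # => [1, 2, 3] (still fresh)
cache.read(:heavy_query, now: t0 + 700)  # => nil (expired)
```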
In Gemfire, my records should expire n seconds after the last time the record was read.
entry-idle-time seems to fit that description but I cannot get it to work.
I tried
<gfe:entry-tti action="DESTROY" timeout="120"/>
This works fine when I have just one server, but when I have 2 servers with redundant copies = 1, my entries get removed even when my test program has been querying them every few seconds.
I tried action="LOCAL_DESTROY", but then the server does not start up at all.
How can I get entries to stay alive as long as someone is querying them?
Thanks
I think the problem is that the last modified time is only updated in whichever copy the query goes to. The other copy then deletes the entry when it expires. If you run with redundancy 0, I think you will get the expiration behavior you expect.
The gemfire docs have a note about not using entry-idle-time with PRs, I think maybe for this reason:
"For partitioned regions, to ensure reliable read behavior, use the time-to-live attributes, not the idle-time attributes. In addition, you cannot use local-destroy or local-invalidate expiration actions in partitioned regions."
http://gemfire.docs.pivotal.io/latest/userguide/index.html#developing/expiration/configuring_data_expiration.html
I have a Resque queue of jobs. Each job has a batch of events to be processed. A worker will process the jobs and count the number of events that occurred in each minute. It uses ActiveRecord to get a "datapoint" for that minute and add the number of events to it and save.
When I have multiple workers processing that queue, I believe there is a concurrency issue. I think there is a race condition between getting the datapoint from the database, adding the correct amount, and updating to the new value. I looked into Transactions, but I think that only helps if the query would fail.
My current workaround is only using 1 Resque Worker. I'd like to scale and process jobs faster, though. Any ideas?
Edit: I originally had trouble finding key words to search Google for, but thanks to Robin, I found the answer to my question here.
The correct answer is to use increment_counter, or update_counters if you need to increment multiple attributes or increment by a value other than +1. Both are class methods on your model.
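For reference, the calls look roughly like this (the `Datapoint` model and column names are made up for illustration; this fragment assumes a Rails app with that model):

```ruby
# Atomic in the database: generates
#   UPDATE datapoints SET events_count = events_count + 1 WHERE id = ...
# so concurrent workers cannot overwrite each other's increments.
Datapoint.increment_counter(:events_count, datapoint_id)

# Several counters at once, or a step other than +1:
Datapoint.update_counters(datapoint_id, events_count: 5, errored_count: 1)
```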
I use delayed_job for background processing. I have an 8-CPU server with MySQL, and I start 7 delayed_job processes:
RAILS_ENV=production script/delayed_job -n 7 start
Q1:
I'm wondering whether it's possible that 2 or more delayed_job processes start processing the same job (the same row in the delayed_jobs table). I checked the code of the delayed_job plugin but cannot find the locking I would expect (no table lock or SELECT ... FOR UPDATE).
I think each process should lock the table before executing an UPDATE on the locked_by column. As it stands, they lock the record simply by updating the locked_by field (UPDATE delayed_jobs SET locked_by ...). Is that really enough? No locking needed? Why? I know that UPDATE has higher priority than SELECT, but I don't think that matters in this case.
My understanding of the multi-process situation is:
Process1: Get waiting job X. [OK]
Process2: Get waiting jobs X. [OK]
Process1: Update locked_by field. [OK]
Process2: Update locked_by field. [OK]
Process1: Get waiting job X. [Already processed]
Process2: Get waiting jobs X. [Already processed]
I think in some cases more than one worker could fetch the same job and both could start processing it.
Q2:
Are 7 delayed_job workers a good number for an 8-CPU server? Why or why not?
Thx 10x!
I think the answer to your question is in line 168 of 'lib/delayed_job/job.rb':
self.class.update_all(["locked_at = ?, locked_by = ?", now, worker], ["id = ? and (locked_at is null or locked_at < ?)", id, (now - max_run_time.to_i)])
Here the row is only updated if no other worker has already locked the job, and the check and the update happen in the same statement. A table lock or similar (which, by the way, would massively reduce the performance of your app) is not needed, since your DBMS ensures that a single UPDATE statement executes atomically, isolated from the effects of other queries. In your example, Process2 can't get the lock for job X, since its UPDATE matches the row if and only if it was not locked before; the worker then checks how many rows were affected to learn whether it won.
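The pattern can be mimicked in plain Ruby with a made-up in-memory table; in the real plugin the atomicity comes from the database executing the conditional UPDATE as a single statement, which the Mutex here merely stands in for:

```ruby
require "thread"

# Optimistic "lock by conditional update": the lock is just an UPDATE whose
# WHERE clause requires the row to be unlocked. Only the caller whose update
# actually changed a row owns the job.
class JobsTable
  def initialize
    @mutex = Mutex.new
    @rows = { 1 => { locked_by: nil } }
  end

  # Mimics: UPDATE delayed_jobs SET locked_by = ? WHERE id = ? AND locked_by IS NULL
  # Returns the number of rows changed, like ActiveRecord's update_all does.
  def try_lock(id, worker)
    @mutex.synchronize do
      row = @rows[id]
      if row && row[:locked_by].nil?
        row[:locked_by] = worker
        1
      else
        0
      end
    end
  end
end

table = JobsTable.new
first  = table.try_lock(1, "worker-1")  # => 1: worker-1 acquired the job
second = table.try_lock(1, "worker-2")  # => 0: update matched nothing, no lock
```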
To your second question: it depends. On an 8-CPU server dedicated to this work, 8 workers are a good starting point; since workers are single-threaded, you can run one per core. Depending on your setup, more or fewer workers may be better, and it heavily depends on your jobs. Do your jobs take advantage of multiple cores? Or do they spend most of their time waiting on external resources? You'll have to experiment with different settings and keep an eye on all the resources involved.