I'm working in Ruby on Rails (Rails 5.0).
I have a grouped Sidekiq worker (GoalWorker) that groups every 100 jobs; let the queue name be "goal_queue". Now I have another job (say GoalEmailJob) relevant to the same model, and I want to use the same queue name for it, but without configuring it for grouping.
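The setup described can be sketched roughly as below. This assumes the sidekiq-grouping gem's option names (batch_flush_size; check your gem's README), and a tiny stub stands in for Sidekiq::Worker so the sketch is self-contained:

```ruby
# Minimal stand-in for Sidekiq::Worker so this sketch runs on its own;
# in a real app you would just `include Sidekiq::Worker`.
module Sidekiq
  module Worker
    def self.included(base)
      base.extend(ClassMethods)
    end

    module ClassMethods
      attr_reader :sidekiq_opts
      def sidekiq_options(opts = {})
        @sidekiq_opts = opts
      end
    end
  end
end

class GoalWorker
  include Sidekiq::Worker
  # Grouped: jobs on goal_queue are flushed to perform in batches of 100
  # (option name from the sidekiq-grouping gem).
  sidekiq_options queue: 'goal_queue', batch_flush_size: 100
  def perform(grouped_args); end
end

class GoalEmailJob
  include Sidekiq::Worker
  sidekiq_options queue: 'goal_queue'  # same queue name, no grouping options
  def perform(goal_id); end
end
```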
Now the documentation says
Jobs will be combined when queue size exceeds 30
Will the number of GoalEmailJob jobs in the queue count toward the grouping criteria, since they share the queue? Or will only the number of GoalWorker jobs be considered?
So the initial problem is that I have 2 products, one individual and one standard. After the separate products are produced, they are sent out, but the logistics department has only 1 worker. So how do I prioritize the individual product? The worker should always send out the individual product before the standard one.
I'm stuck because I have no idea how to set up the queue, whether by agent comparison or priority-based, and how does the queue block know which product is which?
Thx
Easiest approach:
add a parameter "myPriority" to your product agent type (integer)
when creating an individualized one, set it to 10; else to 1
in your queue, set it up as below. This will ensure the queue always moves higher-priority agents to the front. Make sure your queue block expects agents of your Product type
Also check example models and the help :)
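For reference, the relevant Queue block properties would look something like this (a sketch; myPriority is the parameter suggested above, and the priority field takes an AnyLogic Java expression):

```
Queue block properties:
  Queuing:         Priority-based
  Agent priority:  agent.myPriority
```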
My concern is that I have various queues of different priorities (e.g. Q1, Q2, Q3).
If I have a job in a lower-priority queue (say Q2), after some time I want to move it to a higher-priority queue (say Q1).
I am using sidekiq "cross-queue" uniqueness to check whether a duplicate job already exists in some other queue. If it exists, I want to change its priority.
The event is cron-triggered every 1 minute.
I need help on how to implement this.
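The promotion step can be sketched in plain Ruby, using arrays as stand-ins for the Redis-backed queues. With the real Sidekiq API you would iterate Sidekiq::Queue.new('q2'), call job.delete on the match, and re-enqueue it with Sidekiq::Client.push onto 'q1'; the job class and queue names here are hypothetical:

```ruby
# In-memory stand-ins for the two Redis-backed queues.
QUEUES = { 'q1' => [], 'q2' => [] }

def enqueue(queue, klass, args)
  QUEUES[queue] << { 'class' => klass, 'args' => args }
end

# Move every job matching klass/args from a lower-priority queue to a
# higher-priority one -- the cron-triggered step described above.
def promote(from, to, klass, args)
  matches = QUEUES[from].select { |j| j['class'] == klass && j['args'] == args }
  matches.each do |job|
    QUEUES[from].delete(job)  # job.delete against the real Sidekiq::Queue
    QUEUES[to] << job         # Sidekiq::Client.push('queue' => to, ...)
  end
end

enqueue('q2', 'GoalWorker', [42])
promote('q2', 'q1', 'GoalWorker', [42])
QUEUES  # => {"q1"=>[{"class"=>"GoalWorker", "args"=>[42]}], "q2"=>[]}
```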
Note - This question expands on an answer to another question here.
I'm importing a file into my DB by chunking it up into smaller groups and spawning background jobs to import each chunk (of 100 rows).
I want a way to track the progress of how many chunks have been imported so far, so I had planned on having each job increment a DB field by 1 when it's done, so I know how many have been processed.
This creates a potential race condition: two parallel jobs could increment the DB field simultaneously and overwrite each other's update.
What's the best way to avoid this condition and ensure an atomic parallel operation? The linked post above suggests using Redis, which is one good approach. For the purposes of this question I'm curious if there is an alternate way to do it using persistent storage.
I'm using ActiveRecord in Rails with Postgres as my DB.
Thanks!
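The lost-update pattern the question worries about can be shown in plain Ruby with threads. The Mutex here plays the role of whatever atomic mechanism (Redis, an atomic UPDATE, a sequence) you pick on the database side: with it, read-add-write happens as one indivisible step and no increment is lost.

```ruby
counter = 0
lock = Mutex.new

threads = 10.times.map do
  Thread.new do
    1_000.times do
      # Without the lock, two threads could read the same value and each
      # write value + 1, losing one increment -- the race described above.
      lock.synchronize { counter += 1 }
    end
  end
end
threads.each(&:join)

counter  # => 10_000, every increment preserved
```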
I suggest NOT incrementing a DB field by 1; instead, create a DB record for each job, keyed by a job id. There are two benefits:
You can count the number of records to know how many have been processed, without worrying about parallel operations.
You can also add any necessary logs to each job record, and easily debug when one of the jobs fails during the import.
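A pure-Ruby analog of the record-per-job idea: each job inserts its own row (here, pushing onto a thread-safe Queue stands in for inserting into a hypothetical import_chunk_logs table), and progress is just a count, so there is no shared counter to race on.

```ruby
completed = Queue.new  # stands in for the DB table; Queue#push is thread-safe

jobs = 50.times.map do |i|
  Thread.new do
    # ... import chunk i here ...
    completed << { job_id: i, status: 'done' }  # ImportChunkLog.create! in Rails
  end
end
jobs.each(&:join)

completed.size  # => 50, i.e. SELECT COUNT(*) in the real schema
```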
I suggest you use a PostgreSQL sequence.
See CREATE SEQUENCE and Sequence Manipulation.
Especially nextval():
Advance the sequence object to its next value and return that value. This is done atomically: even if multiple sessions execute nextval concurrently, each will safely receive a distinct sequence value.
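Concretely, the chunk-progress counter could look like this (a sketch; the sequence name chunks_done_seq is made up):

```sql
-- Create the sequence once, e.g. in a migration:
CREATE SEQUENCE chunks_done_seq;

-- Each job calls this when its chunk finishes; concurrent sessions are
-- guaranteed distinct values, so nothing is ever overwritten:
SELECT nextval('chunks_done_seq');

-- Progress so far (the last value handed out):
SELECT last_value FROM chunks_done_seq;
```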
I have a Resque queue of jobs. Each job has a batch of events to be processed. A worker will process the jobs and count the number of events that occurred in each minute. It uses ActiveRecord to get a "datapoint" for that minute and add the number of events to it and save.
When I have multiple workers processing that queue, I believe there is a concurrency issue: a race condition between reading the datapoint from the database, adding the correct amount, and writing the new value back. I looked into transactions, but I think those only help if the query fails.
My current workaround is only using 1 Resque Worker. I'd like to scale and process jobs faster, though. Any ideas?
Edit: I originally had trouble finding key words to search Google for, but thanks to Robin, I found the answer to my question here.
The correct answer is to use increment_counter, or update_counters if you need to increment multiple attributes or increment by a value other than +1. Both are ActiveRecord model class methods; for example, Datapoint.increment_counter(:events_count, datapoint.id) issues a single UPDATE ... SET events_count = events_count + 1 style statement, which the database applies atomically.
Hey. I use delayed_job for background processing. I have an 8-CPU server, MySQL, and I start 7 delayed_job processes:
RAILS_ENV=production script/delayed_job -n 7 start
Q1:
I'm wondering: is it possible that 2 or more delayed_job processes start processing the same job (the same record/row in the delayed_jobs table)? I checked the code of the delayed_job plugin but cannot find a lock directive where I would expect one (no table lock and no SELECT ... FOR UPDATE).
I think each process should lock the table before executing an UPDATE on the locked_by column. Instead, they lock the record simply by updating the locked_by field (UPDATE delayed_jobs SET locked_by...). Is that really enough? No locking needed? Why? I know that UPDATE has a higher priority than SELECT, but I don't think that has any effect in this case.
My understanding of the multi-threaded situation is:
Process1: Get waiting job X. [OK]
Process2: Get waiting job X. [OK]
Process1: Update locked_by field. [OK]
Process2: Update locked_by field. [OK]
Process1: Get waiting job X. [Already processed]
Process2: Get waiting job X. [Already processed]
I think in some cases two or more workers can fetch the same job and start processing the same work.
Q2:
Are 7 delayed_job workers a good number for an 8-CPU server? Why or why not?
Thx 10x!
I think the answer to your question is in line 168 of 'lib/delayed_job/job.rb':
self.class.update_all(["locked_at = ?, locked_by = ?", now, worker], ["id = ? and (locked_at is null or locked_at < ?)", id, (now - max_run_time.to_i)])
Here the update of the row is performed only if no other worker has already locked the job, and that check happens in the same statement that updates the table. A table lock or similar (which, by the way, would massively reduce the performance of your app) is not needed, since your DBMS ensures that the execution of a single query is isolated from the effects of other queries. In your example, Process2 can't get the lock for job X, since the UPDATE touches the row if and only if it was not locked before.
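The compare-and-set behaviour of that conditional UPDATE can be modelled in plain Ruby (a sketch, not delayed_job's actual code). The Mutex stands in for the DBMS executing one UPDATE statement atomically, and the nil check mirrors the "locked_at is null or locked_at < ?" condition:

```ruby
job = { id: 42, locked_by: nil }
row_latch = Mutex.new

# Returns 1 if "the UPDATE matched a row" (lock acquired), 0 otherwise --
# the same affected-row count that update_all returns.
def try_lock(job, worker, row_latch)
  row_latch.synchronize do
    return 0 unless job[:locked_by].nil?  # WHERE clause fails: 0 rows updated
    job[:locked_by] = worker              # SET locked_by = ?
    1
  end
end

results = 7.times.map { |i|
  Thread.new { try_lock(job, "delayed_job.#{i}", row_latch) }
}.map(&:value)

results.sum  # => 1: exactly one worker wins the job, the rest skip it
```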
To your second question: it depends. On an 8-CPU server dedicated to this work, 8 workers are a good starting point; since workers are single-threaded, you should run one for every core. Depending on your setup, more or fewer workers may be better. It heavily depends on your jobs: do they take advantage of multiple cores, or do they spend most of their time waiting on external resources? Experiment with different settings and keep an eye on all the involved resources.