rails cron job overlap - ruby-on-rails

I am using Heroku Scheduler to run a task every 10 minutes. The task does something like the code below. How can I prevent the next run from overlapping with the current one? Is there anything that prevents this cron job overlap problem?
task force_close: :environment do
#get all unvoted wine_question
questions = Question.where(closed: false)
puts "Total #{questions.count} of wine_question will be closed"
finish_count = 0
questions.each do |question|
begin
question.force_close!
finish_count += 1
rescue StandardError => bang
puts "question #{question.id} error when running #{bang}"
end
end
puts "Total #{finish_count} of question was closed"
end

There are two ways.
First,
add a flag column to Question, for example force_closed:boolean,
and set that flag whenever the force_close! method is called,
so the task can select only unprocessed questions with where(closed: false, force_closed: false).
Second,
create a batch history table with task_name and run_at columns,
and save the run time of each task in that table:
# last line of task
BatchHistory.create(task_name: 'force_close', run_at: Time.now)
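The batch history idea can be extended so that a run is skipped while an earlier run of the same task is still open. The following is only a minimal sketch in plain Ruby: an in-memory array stands in for the table, and `BatchHistory.running?`, `BatchHistory.start`, and `run_exclusively` are hypothetical names that are not part of Rails or Heroku. In a real database the check and the insert would also have to be atomic (for example through a unique index or a row lock), which this sketch does not attempt.

```ruby
# In-memory stand-in for a batch history table (hypothetical; a real app
# would likely use a model with started_at / finished_at columns).
class BatchHistory
  Record = Struct.new(:task_name, :started_at, :finished_at)

  @records = []

  class << self
    attr_reader :records

    # True while some run of this task has started but not finished.
    def running?(task_name)
      records.any? { |r| r.task_name == task_name && r.finished_at.nil? }
    end

    # Records the start of a run and returns the new record.
    def start(task_name)
      record = Record.new(task_name, Time.now, nil)
      records << record
      record
    end
  end
end

# Runs the block unless another run of the same task is still open.
# Returns :ran or :skipped.
def run_exclusively(task_name)
  return :skipped if BatchHistory.running?(task_name)

  record = BatchHistory.start(task_name)
  begin
    yield
    :ran
  ensure
    record.finished_at = Time.now # mark finished even if the block raised
  end
end
```

Inside the rake task, the body of force_close would be wrapped as `run_exclusively("force_close") { ... }`, so an overlapping run returns :skipped instead of processing the same questions again.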

Related

Ruby on Rails best way to update 100k records

I need to update more than 100k records in the database as efficiently as possible. Please see my code below:
namespace :order do
desc "update confirmed at field for Payments::Order"
task set_confirmed_at: :environment do
puts "==> Updating confirmed_at for orders starts ...".blue
Payments::Order.find_each(batch_size: 10000) do |orders|
order_action = orders.actions.where("sender LIKE ?", "%ConfirmJob%").first if orders.actions
if !order_action.blank?
orders.update_attribute(:confirmed_at, order_action.created_at)
puts "order id = #{orders.id} has been updated.".green
end
end
puts "== completed ==".blue
end
end
Here I split the records into batches of 10,000 and then update each record based on some conditions. Could anyone suggest a more efficient way to do the same task?
Thank you in advance!
You can try update_all:
Payments::Order.joins(:actions).where(Payments::OrderAction.arel_table[:sender].matches("%ConfirmJob%")).update_all("confirmed_at = actions.created_at")
So your code will look like this:
namespace :order do
desc "update confirmed at field for Payments::Order"
task set_confirmed_at: :environment do
puts "==> Updating confirmed_at for orders starts ...".blue
Payments::Order.joins(:actions).where(Payments::OrderAction.arel_table[:sender].matches("%ConfirmJob%")).update_all("confirmed_at = actions.created_at")
puts "== completed ==".blue
end
end
Update:
I've investigated the issue and found that bulk updates with a joined table are a long-standing problem in Rails.
Since the SET part is passed through as a raw string, I suggest adding a FROM clause there.
namespace :order do
desc "update confirmed at field for Payments::Order"
task set_confirmed_at: :environment do
puts "==> Updating confirmed_at for orders starts ...".blue
Payments::Order.joins(:actions).
where(Order::Action.arel_table[:sender].matches("%ConfirmJob%")).
update_all("confirmed_at = actions.created_at FROM actions")
puts "== completed ==".blue
end
end
You are calling Payments::Order.find_each, so your solution loops over every Payments::Order when you only need the ones that have an action with sender LIKE '%ConfirmJob%'. I would go with this solution:
Payments::Order
  .joins(:actions)
  # bind the whole pattern; a ? placed inside quotes would be quoted a second time
  .where("actions.sender LIKE ?", "%ConfirmJob%")
  .find_each do |order|
    # look up the matching action again, since an order can have other actions too
    order_action = order.actions.where("sender LIKE ?", "%ConfirmJob%").first
    order.update!(confirmed_at: order_action.created_at)
  end
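When a LIKE pattern is passed as a bind value, the wildcard characters belong in the value rather than around the `?`. The sketch below uses a simplified, hypothetical `bind` helper to show why; it is not ActiveRecord's actual implementation (real quoting is adapter-specific), but it mirrors the basic behaviour of replacing each `?` with a quoted value.

```ruby
# Simplified stand-in for placeholder substitution (hypothetical helper).
# Each "?" is replaced by the next value wrapped in single quotes.
def bind(sql, *values)
  remaining = values.dup
  sql.gsub("?") do
    escaped = remaining.shift.to_s.gsub("'", "''")
    "'#{escaped}'"
  end
end

# Placeholder inside quotes: the value gets its own quotes, breaking the SQL.
broken = bind("actions.sender LIKE '%?%'", "ConfirmJob")
# => "actions.sender LIKE '%'ConfirmJob'%'"

# Wildcards inside the bound value: a single, well-formed string literal.
valid = bind("actions.sender LIKE ?", "%ConfirmJob%")
# => "actions.sender LIKE '%ConfirmJob%'"
```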

Automatically Generating Daily Posts For A Blog With Ruby On Rails

Currently I have a rake task which I will run daily with the Heroku Scheduler.
Each time the rake task runs, it generates a new post for the user, as long as today's date is on or after the "start date" of the user's account.
This is the code for the rake task:
namespace :abc do
desc "Used to generate a new daily log"
task :create_post => :environment do
User.find_each do |currentUser|
starting_date = currentUser.start_date
Post.create!(content: "RAKED", user: currentUser, status: "new") if Date.today >= starting_date && Date.today.on_weekday?
end
puts "It worked yo"
end
end
My problem is that if someone creates an account and sets their start date in the past (so they can fill in old posts), my current rake task will not generate the backdated daily posts. Does anyone have an idea how to resolve this so that the rake task still performs its current job but also handles this case?
namespace :abc do
desc "Used to generate a new daily log"
task :create_post => :environment do
User.find_each do |currentUser|
starting_date = currentUser.start_date
if Date.today >= starting_date && Date.today.on_weekday?
if currentUser.posts.count.zero?
starting_date.upto(Date.today) { |date| currentUser.generate_post if date.on_weekday? }
else
currentUser.generate_post
end
end
end
puts "It actually worked yo!"
end
end
In the User model:
def generate_post
posts.create!(content: "RAKED", status: "new")
end
Your logic remains the same; I just loop from the starting date to the current date to create the backdated posts. Checking that the post count is zero ensures the backfill runs only for a new user, or a user who has no posts yet.
Hope it helps.
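A possible variation, sketched under assumptions: instead of checking whether the user has zero posts, compute which weekdays between the start date and today still lack a post, so the task also fills gaps if a daily run was missed. `weekday?` and `missing_post_dates` are hypothetical helpers written in plain Ruby (so ActiveSupport's `on_weekday?` is not needed), and the sketch assumes each post records the date it belongs to.

```ruby
require "date"

# Plain-Ruby weekday check (Monday through Friday).
def weekday?(date)
  !(date.saturday? || date.sunday?)
end

# Weekdays from start_date through today that have no post yet.
# existing_dates holds the dates the user already has posts for.
def missing_post_dates(start_date, today, existing_dates)
  (start_date..today).select { |date| weekday?(date) } - existing_dates
end

# Example: account started on a Monday, one post already exists.
start_date = Date.new(2024, 1, 1)
today      = Date.new(2024, 1, 8)
to_create  = missing_post_dates(start_date, today, [Date.new(2024, 1, 2)])
to_create.each { |date| puts "would create post for #{date}" }
```

In the rake task, each returned date would be passed to the code that creates the post, which also allows storing the backdated date on the post itself.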

Post Scheduling in Ruby Blog

I'm working on a post scheduling model using the "whenever" gem, with the statuses published_at, schedule and drafts. The problem is that at the given time the post does not change its status from Schedule to Published_at.
#schedule.rb
every 1.minute do
rake 'scheduler'
end
#example.rake
task scheduler: :environment do
time = Time.zone.now
posts = Post.scheduled.where(published_at: (time))
posts.update_all(status: "Published")
end
task scheduler: :environment do
Post.scheduled.publish_now!
end
In your model post.rb, add this method:
def self.publish_now!
where(published_at: Time.now).update_all(status: "Published")
end
I think it is too risky to search by Time.now, because it matches the exact time down to the second, while your scheduler runs only every minute, so you can miss posts that fall into the 60-second gaps. It is better to query like this:
def self.publish_now!
where("status != ? AND published_at <= now()", "Published").update_all(status: "Published")
end
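The difference between an exact-time match and a "due by now" match can be seen with plain Ruby objects. This is only an illustrative sketch: `Post` here is a hypothetical Struct rather than the ActiveRecord model, `due_posts` and `publish_due!` are made-up names, and the status strings are assumed from the question.

```ruby
# Hypothetical stand-in for the Post model.
Post = Struct.new(:title, :status, :published_at)

# Posts still marked "Scheduled" whose publish time has been reached.
def due_posts(posts, now)
  posts.select { |post| post.status == "Scheduled" && post.published_at <= now }
end

# Marks every due post as published and returns the posts that changed.
def publish_due!(posts, now)
  due_posts(posts, now).each { |post| post.status = "Published" }
end

# The scheduler fires 30 seconds after the intended publish time.
now   = Time.utc(2024, 1, 1, 12, 0, 30)
posts = [
  Post.new("late",   "Scheduled", Time.utc(2024, 1, 1, 12, 0, 0)),
  Post.new("future", "Scheduled", Time.utc(2024, 1, 1, 13, 0, 0))
]

exact_matches = posts.select { |post| post.published_at == now } # finds nothing
published     = publish_due!(posts, now)                         # catches "late"
```

Because the comparison is `<=` and the status is checked, a post missed by one run is still picked up by the next one, and already published posts are not touched again.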

How to continue indexing documents in elasticsearch(rails)?

I ran the command rake environment elasticsearch:import:model CLASS='AutoPartsMapper' FORCE=true to index documents in Elasticsearch. My database has 10,000,000 records, so indexing takes (I think) about one day. While indexing was running, my computer turned off after 2,000,000 documents had been indexed. Is it possible to continue indexing from where it stopped?
If you use Rails 4.2+, you can use ActiveJob to schedule the indexing and leave it running. First, generate a job:
bin/rails generate job elastic_search_index
This gives you a class with a perform method:
class ElasticSearchIndexJob < ApplicationJob
def perform
# implement the indexing here
# note: force: true deletes and recreates the index, so this starts from scratch
AutoPartsMapper.__elasticsearch__.create_index! force: true
AutoPartsMapper.__elasticsearch__.import
end
end
Set Sidekiq as your Active Job backend and start the job from the console:
ElasticSearchIndexJob.perform_later
This enqueues the job, which runs as soon as a worker is free, and returns control of your console immediately. You can leave it running and check on the process from bash later:
ps aux | grep side
This prints something like: sidekiq 4.1.2 app[1 of 12 busy]
Have a look at this post, which explains the setup:
http://ruby-journal.com/how-to-integrate-sidekiq-with-activejob/
Hope it helps
As far as I know there is no such functionality in elasticsearch-rails, but you could write a simple task to do that.
namespace :es do
task :populate, [:start_id] => :environment do |_, args|
start_id = args[:start_id].to_i
AutoPartsMapper.where('id > ?', start_id).order(:id).find_each do |record|
puts "Processing record ##{record.id}"
record.__elasticsearch__.index_document
end
end
end
Start it with bundle exec rake es:populate[<start_id>] passing the id of the record from which to start the next batch.
Note that this is a simplistic solution which will be much slower than batch indexing.
UPDATE
Here is a batch indexing task. It is much faster and automatically detects the record from which to continue. It does make an assumption that previously imported records were processed in increasing id order and without gaps. I haven't tested it but most of the code is from a production system.
namespace :es do
task :populate_auto => :environment do |_, args|
start_id = get_max_indexed_id
# find_in_batches must come last and already iterates in primary-key order
AutoPartsMapper.where('id > ?', start_id).find_in_batches(batch_size: 1000) do |records|
elasticsearch_bulk_index(records)
end
end
def get_max_indexed_id
AutoPartsMapper.search(aggs: {max_id: {max: {field: :id }}}, size: 0).response[:aggregations][:max_id][:value].to_i
end
def elasticsearch_bulk_index(records)
return if records.empty?
klass = records.first.class
klass.__elasticsearch__.client.bulk({
index: klass.__elasticsearch__.index_name,
type: klass.__elasticsearch__.document_type,
body: elasticsearch_records_to_index(records)
})
end
def elasticsearch_records_to_index(records)
records.map do |record|
payload = { _id: record.id, data: record.as_indexed_json }
{ index: payload }
end
end
end
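The resume logic itself does not depend on Elasticsearch: skip everything up to the last indexed id, then walk the rest in ascending id order in fixed-size batches. The sketch below shows just that part in plain Ruby; `each_remaining_batch` and the `Row` Struct are hypothetical, and an array stands in for the database table.

```ruby
# Hypothetical stand-in for a database row.
Row = Struct.new(:id)

# Yields batches of records whose id is greater than last_indexed_id,
# in ascending id order. records is any array of objects responding to #id.
def each_remaining_batch(records, last_indexed_id, batch_size: 1000)
  records
    .select { |record| record.id > last_indexed_id }
    .sort_by(&:id)
    .each_slice(batch_size) { |batch| yield batch }
end

# Example: ids 1..10 exist, ids up to 4 were indexed before the crash.
rows    = (1..10).map { |id| Row.new(id) }
batches = []
each_remaining_batch(rows, 4, batch_size: 3) { |batch| batches << batch.map(&:id) }
batches.each { |ids| puts "would bulk index ids #{ids.inspect}" }
```

As in the task above, this only resumes correctly if earlier records were indexed in increasing id order without gaps.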

Sending emails based on intervals using Ruby on Rails

I would like to be able to send a string of emails at a determined interval to different recipients.
I assign to each Contact this series of Emails called a Campaign, where Campaign has Email1, Email2, etc. Each Contact has a Contact.start_date. Each Email has email.days which stores the number of days since a Contact's start-date to send the email.
For example: Email1.days=5, Email2.days=7, Email3.days=11
Contact1.start_date = 4/10/2010; contact2.start_date = 4/08/2010
IF today is 4/15, then Contact1 receives Email 1 (4/15-4/10 = 5 days)
IF today is 4/15, then Contact2 receives Email 2 (4/15 - 4/8 = 7 days).
What's a good action to run every day using a cron job that would then follow these rules to send out emails using ActionMailer?
NOTE: The question isn't about using ActionMailer. It is about doing the "math" as well as the execution. Which email to send to whom? I am guessing it has to do with some version of Date - Contact[x].start_date and then compare against email[x].days but I'm not exactly clear how. Thanks.
I'd like guidance on whether to use Date.today versus Time.now as well.
Note: the intent is that an individual person may need to schedule individual follow-up on a consistent basis. Rather than having to remember when to follow up which email with whom, it would just follow a pre-determined campaign and send for that person.
So it's not a "bulk mail" -- it's really automating the follow-up for individual correspondence.
I would use DelayedJob for this (assuming you are not sending a very large number of emails, i.e. hundreds of thousands per day).
class Email < ActiveRecord::Base
belongs_to :campaign
after_create :schedule_email_dispatch
def schedule_email_dispatch
send_at(campaign.created_at + self.days.days, :send_email)
end
def send_email
end
end
Run the workers using the rake task:
rake jobs:work
Every time a new Email object is created a delayed job item is added to the queue. At the correct interval the email will be sent by the worker.
@campaign = Campaign.new(...)
@campaign.emails.build(:days => 1)
@campaign.emails.build(:days => 2)
@campaign.save # now the delay
In the example above, two delayed job entries will be created after saving the campaign. They are executed 1 and 2 days after the creation date of the campaign.
This solution ensures emails are sent close to the expected schedule times. In a cron-based solution, dispatching happens only at the cron intervals, so there can be a delay of several hours between the intended dispatch time and the actual dispatch time.
If you want to use the cron approach do the following:
class Email < ActiveRecord::Base
def self.dispatch_emails
# find the emails due for dispatch
Email.all(:conditions => ["created_at <= DATE_SUB(?, INTERVAL days DAY)",
Time.now]).each do |email|
email.send_email
end
end
end
In this solution, most of the processing is done by the DB.
Add email.rake file in lib/tasks directory:
task :dispatch_emails => :environment do
Email.dispatch_emails
end
Configure cron to execute rake dispatch_emails at regular intervals (in your case, less than 24 hours apart).
I would create a rake task in RAILS_ROOT/lib/tasks/email.rake
namespace :email do
desc "send emails to contacts"
task :send => :environment do
Email.all.each do |email|
# if start_date is a datetime or timestamp column
contacts = Contact.all(:conditions => ["DATE(start_date) = ?", email.days.days.ago.to_date])
# if start_date is a date column
contacts = Contact.all(:conditions => { :start_date => email.days.days.ago.to_date })
contacts.each do |contact|
#code to send the email
end
end
end
end
Then I would use a cronjob to call this rake task every day at 3 a.m.:
0 3 * * * app_user cd RAILS_APP_FOLDER && RAILS_ENV=production rake email:send
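The date matching in the rake task above comes down to one subtraction per contact: an email is due when the number of whole days since the contact's start date equals the email's days value. Below is a minimal plain-Ruby sketch of that arithmetic using the numbers from the question; the `Email` and `Contact` Structs and the `emails_due` helper are hypothetical stand-ins, not the ActiveRecord models.

```ruby
require "date"

# Hypothetical stand-ins for the models.
Email   = Struct.new(:name, :days)
Contact = Struct.new(:name, :start_date)

# Returns [contact, email] pairs due on the given day: an email is due when
# exactly email.days whole days have passed since the contact's start_date.
def emails_due(contacts, emails, today)
  contacts.flat_map do |contact|
    elapsed = (today - contact.start_date).to_i # Date - Date gives days
    emails.select { |email| email.days == elapsed }
          .map { |email| [contact, email] }
  end
end

emails   = [Email.new("Email1", 5), Email.new("Email2", 7), Email.new("Email3", 11)]
contacts = [
  Contact.new("Contact1", Date.new(2010, 4, 10)),
  Contact.new("Contact2", Date.new(2010, 4, 8))
]

due = emails_due(contacts, emails, Date.new(2010, 4, 15))
due.each { |contact, email| puts "#{contact.name} receives #{email.name}" }
```

Working with Date rather than Time keeps the subtraction in whole days, so the result does not depend on the hour at which the cron job happens to run.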
I think it would be much easier and more secure (you don't have to worry about authentication and so on) to create a rake task to send the emails. You also don't have to worry about a possibly very long-running request. Just create a file RAILS_ROOT/lib/tasks/email.rake
namespace :email do
desc "Sends scheduled emails"
task :send_scheduled => :environment do
Email.send_scheduled_emails
end
end
and in RAILS_ROOT/app/models/email.rb
class Email < ActiveRecord::Base
# ...
def self.send_scheduled_emails
#send your emails ...
end
end
Then create a cron job
0 0 * * * user cd /your/rails/app/ && RAILS_ENV=production rake email:send_scheduled
to send the emails every night at midnight.
I am using rufus-scheduler for scheduled email and Twitter updates. You should check it out.
I use ar_mailer gem
http://seattlerb.rubyforge.org/ar_mailer/
http://github.com/adzap/ar_mailer
http://blog.segment7.net/articles/2006/08/15/ar_mailer
