Currently I am running on Rails 3.1 RC4 and using Redis and Resque for queueing the creation of Rackspace servers.
The Rackspace gem that I am using, cloudservers, tells you when your server is done being set up via the status method.
What I am trying to do with the code below is execute the code in the elsif only after the server is active and ready to be used.
class ServerGenerator
  @queue = :servers_queue

  def self.perform(current_id)
    current_user = User.find(current_id)
    cs = CloudServers::Connection.new(:username => "***blocked for security***", :api_key => "***blocked for security***")
    image = cs.get_image(49)  # Set the Linux distro
    flavor = cs.get_flavor(1) # Use the 256 MB of RAM instance
    newserver = cs.create_server(:name => "#{current_user.name}", :imageId => image.id, :flavorId => flavor.id)

    if newserver.status == "BUILD"
      newserver.refresh
    elsif newserver.status == "ACTIVE"
      # Do stuff here; I generated another server with a different, static name
      # so that I could see if it was working
      cs = CloudServers::Connection.new(:username => "***blocked for security***", :api_key => "***blocked for security***")
      image = cs.get_image(49)
      flavor = cs.get_flavor(1)
      newserver = cs.create_server(:name => "working", :imageId => image.id, :flavorId => flavor.id)
    end
  end
end
When I ran the above, it only generated the first server, the one that uses current_user.name as its name. Would a loop around the if statement help? Also, this seems like a poor way of queueing tasks.
Should I enqueue a new task that just checks whether the server is ready or not?
Thanks a bunch!
Based upon what you've written, I'm assuming that cs.create_server is non-blocking. In that case, yes, you would need to wrap your check in a loop do ... end or some similar construct. Otherwise you're checking the value precisely once and then exiting the perform method.
If you're going to loop in the method, you should add in sleep calls, otherwise you're going to burn a lot of CPU cycles doing nothing. Whether to loop or call a separate job is ultimately up to you and whether your workers are mostly idle. Put another way, if it takes 5 min. for your server to come up, and you just loop, that worker is not going to be able to process any other jobs for 5 min. If that's acceptable, it's certainly the easiest thing to do. If not acceptable, you'll probably want another job that accepts your server ID and makes an API call to see if it's available.
That process itself can be tricky though. If your server never comes online for whatever reason, you could find yourself creating jobs waiting for its status ad infinitum. So, you probably want to pass some sort of execution count around too, or keep track in redis, so you stop trying after X number of tries. I'd also check out resque-scheduler so you can exert control over when your job gets executed in this case.
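If you go the looping route, the poll-with-a-cap idea can be sketched like this. wait_until_active and the fake status source are illustrative names only, not part of the cloudservers gem; in real code the block would call newserver.refresh and return newserver.status:

```ruby
# Poll until the status becomes "ACTIVE", with a cap on attempts so a server
# that never comes up doesn't spin forever.
def wait_until_active(max_attempts: 60, delay: 10)
  max_attempts.times do
    return true if yield == "ACTIVE"  # the block fetches the current status
    sleep delay                       # don't burn CPU between checks
  end
  false  # never became ACTIVE; the caller decides what to do (alert, re-enqueue, ...)
end

# Demo with a fake status source that turns ACTIVE on the third check:
statuses = ["BUILD", "BUILD", "ACTIVE"]
ready = wait_until_active(delay: 0) { statuses.shift || "ACTIVE" }
```

The same attempt counter works if you instead re-enqueue a checking job: pass the count as a job argument and stop re-enqueueing once it hits the cap.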
Related
So I've been looking for the simplest way to send an e-mail when the X column of the Payments table in the database == 'condition'. Basically, I want to add a payment and set a date, say 6 months out. When the 6 months have passed, I want to send the mail. I've seen many solutions, like using Whenever cron jobs and others, but I want to know the absolute simplest way (perhaps using Rails only, without relying on an outside source) to keep my application light and clean. I was thinking I could use the auto-generated created_at to evaluate when X time has passed.
Since you have a column in your db for the time to send the email, make it a datetime datatype; then you can set the email date as soon as the payment event is created. After that, you can have a rake task where
range = Time.now.beginning_of_day..Time.now.end_of_day
Payment.where(your_datetime_custom_column: range).each do |payment|
  payment.user.send_email
end
and you can run this task every day from the scheduler.
The "easiest" way is to use Active Job in conjunction with a state machine:
EmailJob.set(wait: 6.months).perform_later(user.id) if user.X_changed?
The problem with this is that the queue will accumulate jobs since jobs don't get handled right away. This may lead to other performance issues since there are now more jobs to scan and they're taking up more memory.
Cron jobs are well suited for this kind of thing. Depending on your hosting platform, there may be various other ways to handle this; for example, Heroku has Heroku Scheduler.
There are likely other ways to schedule repeating tasks without cron, such as this SO answer.
edit: I did use a gem once called 'fist_of_fury', but it's not currently maintained and I'm not sure how it would perform in a production environment. Below are some snippets showing how I used it in a Rails project:
in Gemfile
gem 'fist_of_fury'
in config/initializers/fist_of_fury.rb
# Ensure the jobs run only in a web server.
if defined?(Rails::Server)
  FistOfFury.attack! do
    ObserveAllJob.recurs { minutely(1) }
  end
end
in app/jobs/observe_all_job.rb
class ObserveAllJob
  include SuckerPunch::Job
  include FistOfFury::Recurrent

  def perform
    ::Task.all.each(&:observe)
  end
end
I have a "Play" button in my app that checks a stock value from an API and creates a Position object that holds that value. This action creates a background job using Resque and Redis in the following way:
Controller - stock_controller.rb:
def start_tracking
  @stock = Stock.find(params[:id])
  Resque.enqueue(StockChecker, @stock.id)
  redirect_to :back
end
Worker:
class StockChecker
  @queue = :stock_checker_queue

  def self.perform(stock_id)
    stock = Stock.find_by(id: stock_id)
    stock.start_tracking_position
  end
end
Model - stock.rb:
def start_tracking_position
  # A Position instance that holds the stock value is created here
end
I now want this to happen every 15 minutes for every Stock object. I looked at the scheduling section on the Ruby Toolbox website and am having a hard time deciding what fits my needs and how to start implementing it.
My concern is that my app will create tons of Position objects, so I need something that is simple, uses Resque, and can withstand this kind of object creation without overloading the app.
What gem should I use and what is the simplest way to make my Resque Job happen every 15 minutes when the start_tracking action happens on a Stock object?
I've found resque scheduler to be useful: https://github.com/resque/resque-scheduler.
Configure the schedule.yml for 15 minutes.
The biggest issue I found was ensuring it's running after releases etc. In the end I set up God to shut it down and restart it.
In terms of load, I'm not sure I follow: the scheduler will trigger events, but the load is determined by the number of workers you have and how you decide to implement the creation. You can set the priority of the queues, and the workers for each queue, but if you don't process jobs in a timely way you get a backlog; is that acceptable? Normally you would run workers on a separate server, minimising the impact on the front end.
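For reference, a resque-scheduler entry for a 15-minute job might look like the fragment below. The file path and job names are assumptions; StockCheckSweep stands for a small job whose perform enqueues a StockChecker for each tracked stock, since the schedule itself can't enumerate stock ids:

```yaml
# config/resque_schedule.yml (path and names are illustrative)
stock_check_sweep:
  every: 15m
  class: StockCheckSweep
  queue: stock_checker_queue
  description: "Enqueues a StockChecker job for every tracked Stock"
```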
I'm looking for a solution which enables:
Repetitive executing of a scraping task (nokogiri)
Changing the time interval via http://www.myapp.com/interval (example)
What is the best solution/way to get this done?
Options I know about
Custom Rake task
Rufus Scheduler
Current situation
In ./config/initializers/task_scheduler.rb I have:
require 'nokogiri'
require 'open-uri'
require 'rufus-scheduler'
require 'rake'

scheduler = Rufus::Scheduler.new

scheduler.every "1h" do
  puts "BEGIN SCHEDULER at #{Time.now}"
  @url = "http://www.marktplaats.nl/z/computers-en-software/apple-ipad/ipad-mini.html?query=ipad+mini&categoryId=2722&priceFrom=100%2C00&priceTo=&startDateFrom=always"
  @doc = Nokogiri::HTML(open(@url))
  @title = @doc.at_css("title").text
  @number = 0

  2.times do
    @doc.css(".defaultSnippet.group-#{@number}").each do |listing|
      @listing_title = listing.at_css(".mp-listing-title").text
      @listing_subtitle = listing.at_css(".mp-listing-description").text
      @listing_price = listing.at_css(".price").text
      @listing_priority = listing.at_css(".mp-listing-priority-product").text
      listing = Listing.create(title: "#{@listing_title}", subtitle: "#{@listing_subtitle}", price: "#{@listing_price}")
    end
    @number += 1
  end
  puts "END SCHEDULER at #{Time.now}"
end
Is it working?
Yes, the current setup is working. However, I don't know how to enable changing the interval time via http://www.myapp.com/interval (example).
Changing scheduler.every "1h" do to scheduler.every "#{@interval}" do does not work.
In what file do I have to define @interval for it to work in task_scheduler.rb?
I'm not very familiar with Rufus Scheduler, but it appears it will be difficult to achieve both of your goals (regular heartbeat, dynamic rescheduling) with it. In order for it to work, you'll have to capture the job_id that it returns, use that job_id to stop the job if a rescheduling event occurs, and then create the new job. Rufus also points out that it's an in-memory application whose jobs will disappear when the process disappears -- reboot the server, restart the application, etc. and you've got to reschedule from scratch.
I'd consider two things. First, I'd consider creating a model that wraps the screen-scraping that you want to do. At a minimum you'd capture the url and the interval. The model may wrap up the code for processing the html response (basically what's wrapped up in the 2.times block) as instance methods that you trigger based on the URL. You may also capture this in a text column and use eval on it, assuming that only "good guys" get access to this part of the system. This has a couple of advantages: you can quickly expand to scraping other sites and you can sanitize the interval sent back by the user.
Second, something like Delayed::Job may better suit your needs. Delayed::Job allows you to specify a time for the job's execution which you could fill in by reading the model and converting the interval to a time. The key to this approach is that the job must schedule the next iteration of itself before it exits.
This won't be as rock-steady as something like cron but it does seem to better address the rescheduling need.
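The reschedule-itself pattern can be sketched independently of Delayed::Job. In the simulation below the queue is a plain array standing in for Delayed::Job.enqueue(job, run_at: interval.seconds.from_now); in a real app the interval would be re-read from the database on every pass, so a change made through the web UI takes effect on the next iteration:

```ruby
queue = []  # stand-in for the delayed_jobs table
runs  = 0

# The job re-enqueues itself before exiting, which is the key to the pattern.
scrape_job = lambda do
  runs += 1            # stands in for the actual nokogiri scraping work
  queue << scrape_job  # schedule the next iteration before this one finishes
end

queue << scrape_job            # kick off the first run
3.times { queue.shift.call }   # the worker loop draining the queue
```

After three worker passes, three scrapes have run and a fourth is already queued; the chain keeps itself alive as long as each run completes.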
First off: your rufus-scheduler code is in an initializer, which is fine, but it is executed when the Rails process boots, and only then. So, in the initializer you have no access to any variable @interval you could set, for instance, in a controller.
Possible options, instead of a variable:
read it from a config file
read it from a database (but you will have to set up your own connection; in the initializer, ActiveRecord is not started, IMHO)
And... if you change the value, you will have to restart your Rails process for it to take effect again.
So an alternative approach, where your Rails process handles the interval of the scheduled job, is to use a recurring background job. At the end of the background job, it reschedules itself with the interval that is active at that moment. I would propose fetching the interval from the database.
Any background job handler could do this. Check Ruby Toolbox; I vote for resque or delayed_job.
My Survey model has about 2500 instances and I need to apply the set_state method to each instance twice. I need to apply it the second time only after every instance has had the method applied to it once. (The state of an instance can depend on the state of other instances.)
I'm using delayed_job to create delayed jobs and workless to automatically scale up/down my worker dynos as required.
The set_state method typically takes about a second to execute. So I've run the following at the heroku console:
2.times do
  Survey.all.each do |survey|
    survey.delay.set_state
    sleep(4)
  end
end
Shouldn't be any issues with overloading the API, right?
And yet I'm still seeing the following in my logs for each delayed job:
Heroku::API::Errors::ErrorWithResponse: Expected(200) <=> Actual(429 Unknown)
I'm not seeing any infinite loops -- it just returns this message as soon as I create the delayed job.
How can I avoid blowing Heroku's API rate limits?
Reviewing workless, it looks like it incurs an API call per delayed job to check the worker count and potentially a second API call to scale up/down. So if you are running 5000 (2500x2) jobs within a short period, you'll end up with 5000+ API calls, which would be well in excess of the limit of 1200 requests per hour. I've commented over there to hopefully help toward reducing the overall API usage (https://github.com/lostboy/workless/issues/33#issuecomment-20982433), but I think we can offer a more specific solution for you.
In the meantime, especially if your workload is pretty predictable (like this), I'd recommend skipping workless and doing that portion yourself, i.e. it sounds like you already know WHEN the scaling needs to happen (scale up right before the loop above, scale down right after). If that is the case, you could do something like this to emulate the behavior of workless:
require 'heroku-api'

heroku = Heroku::API.new(:api_key => ENV['HEROKU_API_KEY'])
heroku.post_ps_scale(ENV['APP_NAME'], 'worker', Survey.count)

2.times do
  Survey.all.each do |survey|
    survey.delay.set_state
    sleep(4)
  end
end

min_workers = ENV['WORKLESS_MIN_WORKERS'].present? ? ENV['WORKLESS_MIN_WORKERS'].to_i : 0
heroku.post_ps_scale(ENV['APP_NAME'], 'worker', min_workers)
Note that you'll need to remove workless from these jobs also. I didn't see a particular way to do this JUST for certain jobs though, so you might want to ask on that project if you need that. Also, if this needs to be 2 pass (the first time through needs to finish before the second), the 4 second sleep may in some cases be insufficient but that is a different can of worms.
I hope that helps narrow in on what you needed, but I'm certainly happy to discuss further and/or elaborate on the above as needed. Thanks!
I want to give my users the option of receiving a daily summary of their account statistics at a specific (user-given) time...
Let's say we have the following model:
class DailySummery < ActiveRecord::Base
  # attributes:
  #   send_at
  #   => 10:00 (hour)
  #   last_sent_at
  #   => Time of the last sent summary
end
Is there a best practice for how to send these account summaries via email at the specified time?
At the moment I have an infinite rake task running which constantly checks whether emails are available for sending, and I would like to put the daily-summary generation and sending into this rake task.
I had a thought that I could solve this with following pseudo-code:
while true
  User.all.each do |u|
    u.generate_and_deliver_dailysummery if u.last_sent_at < Time.now - 24.hours
  end
  sleep 60
end
But I'm not sure if this has some hidden caveats...
Notice: I don't want to use queues like Resque or Redis or anything like that!
EDIT: Added sleep (have it already in my script)
EDIT: It's a time-critical service (notification of trade rates), so it should be as fast as possible. That's the background on why I don't want to use a queue or job-based system. And I use Monit to manage this rake task, which works really well.
There are only really two main ways you can do delayed execution. You run the script when a user on your site hits a page, which is inefficient and not entirely accurate. Or you use some sort of background process, whether it's a cron job or resque/delayed_job/etc.
While your method of having a rake process run forever will work fine, it's inefficient because you're iterating over users 24/7, starting over as soon as a pass finishes. Something like:
while true
  User.where("last_sent_at <= ? OR last_sent_at IS NULL", 24.hours.ago).each do |u|
    u.generate_and_deliver_dailysummery
  end
  sleep 3600
end
This would run once an hour and only pull the users that need an email sent, which is a bit more efficient. The best practice, though, would be to use a cron job that runs your rake task.
Running a task periodically is what cron is for. The whenever gem (https://github.com/javan/whenever) makes it simple to configure cron definitions for your app.
As your app scales, you may find that the rake task takes too long to run and that the queue is useful on top of cron scheduling. You can use cron to control when deliveries are scheduled but have them actually executed by a worker pool.
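For illustration, a whenever config/schedule.rb along those lines might look like the fragment below (the schedule time and the rake task name are assumptions; whenever translates this DSL into a crontab entry when you run `whenever --update-crontab`):

```ruby
# config/schedule.rb (whenever DSL; task name is illustrative)
every 1.day, at: '4:30 am' do
  rake 'summaries:deliver'
end
```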
I see two possibilities for running a task at a specific time.
Background process / Worker / ...
It's what you have already done. I refactored your example, because there were two bad things:
Check conditions directly in your database; it's more efficient than loading potentially useless data.
Load users in batches. Imagine your database contains millions of users... I'm pretty sure you would be happy, but not Rails... not at all. :)
Besides the code, I see another problem: how are you going to manage this background job on your production server? If you don't want to use Resque or something else, you should consider managing it another way. There are Monit and God, which are both process monitors.
while true
  # Check the condition in the database
  users = User.where(['last_sent_at < ? OR last_sent_at IS NULL', 24.hours.ago])

  # Load in batches of 1000
  users.find_each(:batch_size => 1000) do |u|
    u.generate_and_deliver_dailysummery
  end

  sleep 60
end
Cron jobs / Scheduled task / ...
The second possibility is to schedule your task on a recurring basis, for instance each hour or half-hour. Correct me if I'm wrong, but do your users really need to schedule the delivery at 10:39am? I think letting them choose the hour is enough.
With that approach, a job fired each hour is better than an infinite task querying your database every single minute. Moreover, it's really easy to do, because you don't need to set anything up.
There is a good gem for managing cron tasks with Ruby syntax. More info here: Whenever
You can do that; you'll need to also check for the time you want to send at. So, starting with your pseudo-code and adding to it:
while true
  User.all.each do |u|
    if u.last_sent_at < Time.now - 24.hours && Time.now.hour >= u.send_at
      u.generate_and_deliver_dailysummery
      # the next 2 lines are only needed if generate_and_deliver_dailysummery doesn't set last_sent_at already
      u.last_sent_at = Time.now
      u.save
    end
  end
  sleep 900
end
I've also added the sleep so you don't needlessly hammer your database. You might also want to look into limiting that loop to just the set of users you need to send to; a query similar to what Zachary suggests would be much more efficient than what you have.
If you don't want to use a queue, consider delayed_job (sort of a poor man's queue); it runs as a rake task similar to what you are doing.
https://github.com/collectiveidea/delayed_job
http://railscasts.com/episodes/171-delayed-job
It stores all tasks in a jobs table. Usually, when you add a task, it is queued to run as soon as possible; however, you can override this to delay it until a specific time.
You could convert your DailySummary class to a DailySummaryJob, and once complete it could re-queue a new instance of itself for the next day's run.
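A sketch of that conversion, assuming the delayed_job custom-job style (DailySummaryJob and next_run_at are hypothetical names; with delayed_job you would re-queue via Delayed::Job.enqueue(DailySummaryJob.new(user.id), run_at: job.next_run_at) once perform completes):

```ruby
# Custom job object: anything responding to #perform can be enqueued by delayed_job.
DailySummaryJob = Struct.new(:user_id) do
  def perform
    # generate and deliver the summary for user_id here
  end

  # When the next instance should run: 24 hours from now. A real version
  # would compute tomorrow at the user's chosen send_at hour instead.
  def next_run_at
    Time.now + 24 * 60 * 60
  end
end

job = DailySummaryJob.new(42)
```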
How do you update the last_sent_at attribute?
If you use
  last_sent_at += 24.hours
initialized with
  last_sent_at = Time.now.at_beginning_of_day + send_at
then it will all be fine.
Don't use last_sent_at = Time.now: there may be some delay before the job actually runs, and that would make the last_sent_at attribute drift later and later.
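A small illustration of the point, using plain Ruby Time arithmetic (the dates are arbitrary):

```ruby
require 'time'

send_hour    = 10
last_sent_at = Time.parse('2013-01-01') + send_hour * 3600  # first send at 10:00

3.times do
  # ... the job runs here, possibly a few minutes late each day ...
  last_sent_at += 24 * 3600  # advance by a fixed 24h step, NOT Time.now
end

puts last_sent_at.hour  # => 10: still 10 o'clock three days later
```

Had each iteration set last_sent_at = Time.now instead, every late run would push the next day's threshold a little later, and the send time would creep forward indefinitely.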