I have just installed the whenever gem (https://github.com/javan/whenever) to run my rake tasks, which are nokogiri / feedzilla dependent scraping tasks.
E.g. my tasks are called grab_bbc, grab_guardian, etc.
My question: as I update my site, I keep adding more tasks to scheduler.rake.
What should I write in my config/schedule.rb to make all rake tasks run, no matter what they are called?
Would something like this work?
every 12.hours do
rake:task.each do |task|
runner task
end
end
I am new to cron and am using RoR 4.
namespace :sc do
desc 'All'
task all: [:create_categories, :create_subcategories]
desc 'Create categories'
task create_categories: :environment do
# your code
end
desc 'Create subcategories'
task create_subcategories: :environment do
# your code
end
end
Then, in the console, run $ rake sc:all
Write a separate rake task for each scraping job, then write an aggregate task that runs all of those scraping tasks.
desc "scrape nytimes"
task :scrape_nytimes do
# scraping method
end
desc "scrape guardian"
task :scrape_guardian do
# scraping method
end
desc "perform all scraping"
task :scrape do
Rake::Task[:scrape_nytimes].execute
Rake::Task[:scrape_guardian].execute
end
Then call the rake task as:
rake scrape
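With an aggregate task like this, config/schedule.rb only needs to reference that one task. A minimal sketch, assuming whenever's built-in `rake` job type and the `scrape` task name used above (the 12-hour interval is taken from the question):

```ruby
# config/schedule.rb (whenever DSL, evaluated by the whenever gem,
# not a standalone Ruby script)
every 12.hours do
  # Runs `rake scrape`, the aggregate task. New scrapers are added to
  # that task, so this file does not need to change.
  rake "scrape"
end
```

After editing the schedule, `whenever --update-crontab` writes the resulting entry to the crontab.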
Make sure you have a unique namespace with all the tasks in it, like:
namespace :scrapers do
desc "Scraper Number 1"
task :scrape_me do
# Your code here
end
desc "Scraper Number 2"
task :scrape_it do
# Your code here
end
end
You could then run all tasks of that namespace with a task outside of that namespace:
task :run_all_scrapers do
Rake.application.tasks.each do |task|
task.invoke if task.name.start_with?("scrapers:")
end
end
That said, I'm pretty sure that this is not how you should run a set of scrapers. If for any reason the if condition returns true for the wrong task, you might unintentionally run tasks like rake db:drop.
Either "manually" maintaining schedule.rb or a master task seems like a better option to me.
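One way to reduce that risk is to list the matching task names before invoking anything. The following is only a sketch: the `scrapers_demo` namespace, the task names and the `SELECTED_NAMES` constant are made up for illustration.

```ruby
require "rake"

namespace :scrapers_demo do
  task(:scrape_me) { puts "scraping me" }
  task(:scrape_it) { puts "scraping it" }
end
task(:unrelated_demo) { puts "should never be selected" }

# Collect the names first so the selection can be inspected
# before any task is actually run.
SELECTED_NAMES = Rake.application.tasks
                     .map(&:name)
                     .select { |name| name.start_with?("scrapers_demo:") }

puts SELECTED_NAMES
```

Only once the printed list looks right would the names be passed to `Rake::Task[name].invoke`.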
The aggregated task can be concise:
namespace :scrape do
desc "scrape nytimes"
task :nytimes do
# scraping method
end
desc "scrape guardian"
task :guardian do
# scraping method
end
end
desc "perform all scraping"
task scrape: ['scrape:nytimes', 'scrape:guardian']
Namespaces are also a good practice.
Use namespace and in_namespace to run all tasks dynamically.
I prefer this method because it keeps things clean and saves you from having to remember to update your "parent" task if any of your namespace tasks change.
Note, the example was borrowed from Dmitry Shvetsov's excellent answer.
namespace :scrape do
desc "scrape nytimes"
task :nytimes do
# scraping method
end
desc "scrape guardian"
task :guardian do
# scraping method
end
end
desc "perform all scraping"
task :scrape do
Rake.application.in_namespace( :scrape ){ |namespace| namespace.tasks.each( &:invoke ) }
end
Related
Somebody has asked a similar question here:
https://github.com/jimweirich/rake/issues/257
The answer from the maintainer was:
I am going to reject this since it allows you to use tasks in non-rake-like ways.
So what is the correct way of using rake if a task depends on other tasks?
task 'succeed' => ['db:drop','stats'] do something end
This displays the results of stats even if Postgres threw an error and db:drop failed because of active connections.
If rake is not suitable for system maintenance, what tools should I use?
I need to be able to run a backup of a database, then do some tests, then drop the database and finally restore from backup.
To help you understand my problem, look at the following fragment:
namespace :experiment do
desc "TODO"
task 'succeed' => ['stopme', 'stats'] do
puts 'this and stats task should not run'
end
desc "TODO"
task stopme: :environment do
Rake::Task['db:drop'].invoke
end
end
You can invoke tasks manually like this:
task :stats => :environment do
Rake::Task['db:drop'].invoke rescue nil
# do something
end
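For comparison, here is a small sketch of what Rake does when a prerequisite actually raises: the remaining prerequisites and the dependent task's own body are skipped. The `demo` namespace, the task names and the `EVENTS` constant are hypothetical, and the failure is simulated with a plain `raise`. A task that rescues its own error internally would not stop the chain this way.

```ruby
require "rake"

EVENTS = [] # records which task bodies actually ran

namespace :demo do
  task(:failing_drop) { raise "simulated failure" }
  task(:stats)        { EVENTS << :stats }

  # Prerequisites run in order; an exception in one aborts the rest.
  task(strict: [:failing_drop, :stats]) { EVENTS << :strict }
end

begin
  Rake::Task["demo:strict"].invoke
rescue RuntimeError => e
  EVENTS << "aborted: #{e.message}"
end

puts EVENTS.inspect
```

Neither `:stats` nor `:strict` is recorded, because the exception from `failing_drop` propagates out of `invoke`.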
Say I have one rake task which invokes multiple other rake tasks.
I start the parent task with
rake myapp:main
Will each invoke inside that task load the environment again, or is that a one-time activity done when running rake myapp:main?
namespace :myapp do
desc "Main Run"
task :main => :environment do
Rake::Task['myapp:task1'].invoke
Rake::Task['myapp:task2'].invoke
end
task :task1 => :environment do
# Does the first task
end
task :task2 => :environment do
# Does the second task
end
end
Adding details to @Shadwell's answer:
The => :environment is specifying that the :environment task (defined by rails) is a dependency of your tasks and must be invoked before your tasks are.
You can see :environment task's definition here
https://github.com/rails/rails/blob/d70ba48c4dd6b57d8f38612ea95a3842337c1419/railties/lib/rails/application.rb#L428-432
Rake keeps track of which tasks have been invoked, though, and when it reaches a dependency that has already been invoked it knows it can skip it.
https://github.com/jimweirich/rake/blob/5e59bccecaf480d1de565ab34fd15e54ff667660/lib/rake/task.rb#L195-204
# Invoke all the prerequisites of a task.
def invoke_prerequisites(task_args, invocation_chain) # :nodoc:
if application.options.always_multitask
invoke_prerequisites_concurrently(task_args, invocation_chain)
else
prerequisite_tasks.each { |p|
prereq_args = task_args.new_scope(p.arg_names)
p.invoke_with_call_chain(prereq_args, invocation_chain)
}
end
end
Rake maintains an instance variable @already_invoked to know if a task has already been called. The same can be seen in the method below:
https://github.com/jimweirich/rake/blob/5e59bccecaf480d1de565ab34fd15e54ff667660/lib/rake/task.rb#L170-184
def invoke_with_call_chain(task_args, invocation_chain) # :nodoc:
new_chain = InvocationChain.append(self, invocation_chain)
@lock.synchronize do
if application.options.trace
application.trace "** Invoke #{name} #{format_trace_flags}"
end
return if @already_invoked
@already_invoked = true
invoke_prerequisites(task_args, new_chain)
execute(task_args) if needed?
end
rescue Exception => ex
add_chain_to(ex, new_chain)
raise ex
end
The environment would only be set up once.
The => :environment is specifying that the :environment task (defined by rails) is a dependency of your tasks and must be invoked before your tasks are. Rake keeps track of which tasks have been invoked, though, and when it reaches a dependency that has already been invoked it knows it can skip it.
(Aside: this can cause problems if you actually want the dependency to be invoked multiple times)
You could also define your main task using dependencies:
task :main => [:task1, :task2] do
# Blank
end
When you run rake myapp:main it will look at the dependencies and invoke task1 and then task2. Because task1 has a dependency environment it will invoke that first too. It'll skip the environment dependency on task2 though.
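This behaviour can be checked outside Rails with a stand-in for the environment task. In this sketch, `fake_environment`, the `myapp_demo` namespace and the `LOAD_CALLS` constant are hypothetical names used only to count how often the shared prerequisite runs.

```ruby
require "rake"

LOAD_CALLS = [] # one entry per time the shared prerequisite runs

# Stand-in for Rails' :environment task.
task(:fake_environment) { LOAD_CALLS << :loaded }

namespace :myapp_demo do
  task(task1: :fake_environment) { puts "Executing..1" }
  task(task2: :fake_environment) { puts "Executing..2" }

  task main: :fake_environment do
    Rake::Task["myapp_demo:task1"].invoke
    Rake::Task["myapp_demo:task2"].invoke
  end
end

Rake::Task["myapp_demo:main"].invoke
puts "prerequisite ran #{LOAD_CALLS.size} time(s)"
```

All three tasks depend on the stand-in, yet it is recorded only once for the whole run.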
The answer is no: the environment is not loaded again when another Rake task is executed from the parent task. The code below gives a simple demonstration:
namespace :myapp do
desc "Main Run"
task :main => :environment do
puts "Start Time : #{Time.now.to_i}"
Rake::Task['myapp:task1'].invoke
puts "End Time1 : #{Time.now.to_i}"
Rake::Task['myapp:task2'].invoke
puts "End Time2 : #{Time.now.to_i}"
end
task :task1 => :environment do
# Does the first task
puts "Executing..1"
end
task :task2 => :environment do
# Does the second task
puts "Executing..2"
end
end
That said, it is not good practice to invoke two or more rake tasks from another task. To achieve the same thing, you can modularize the code into two functions and call those functions instead.
Is there any way in rails to call a method, such as before, automatically when running a rake task I've built?
Let's say we have
namespace :migrate do
def before
# do this before all tasks
end
desc 'migrate authors from legacy database'
task :authors => :environment do
# some code here
end
end
I want the before method to run every time a task runs.
See if this helps: http://www.rubyflow.com/items/4104
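One possible approach, sketched here under assumptions rather than taken from the linked page, is to turn `before` into a task and attach it as a prerequisite to every other task in the namespace with `enhance`. The `migrate_demo` namespace, the `books` task and the `HOOK_LOG` constant are hypothetical.

```ruby
require "rake"

HOOK_LOG = [] # records the order in which task bodies run

namespace :migrate_demo do
  task(:before)  { HOOK_LOG << :before }
  task(:authors) { HOOK_LOG << :authors }
  task(:books)   { HOOK_LOG << :books }
end

# Attach the hook as a prerequisite of every other task in the namespace.
Rake.application.tasks.each do |rake_task|
  next unless rake_task.name.start_with?("migrate_demo:")
  next if rake_task.name == "migrate_demo:before"
  rake_task.enhance(["migrate_demo:before"])
end

Rake::Task["migrate_demo:authors"].invoke
Rake::Task["migrate_demo:books"].invoke
puts HOOK_LOG.inspect
```

Because the hook is itself a task, Rake runs it only once per process, even when several hooked tasks are invoked. If it has to run before each task individually, calling a plain method at the top of every task body may be the simpler option.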
I have a Rails 2.2 project in which I want to override the functionality of the rake db:test:prepare task. I thought this would work, but it doesn't:
#lib/tasks/db.rake
namespace :db do
namespace :test do
desc "Overridden version of rails' standard db:test:prepare task since the schema dump used in that can't handle DB enums"
task :prepare => [:environment] do
puts "doing db:structure:dump"
Rake::Task['db:structure:dump'].invoke
puts "doing db:test:clone_structure"
Rake::Task['db:test:clone_structure'].invoke
end
end
end
I get the standard task's behaviour. If I change the name of the task to :prepare2 and then do rake db:test:prepare2, then it works fine. The natural conclusion I draw from this is that my rake tasks are being defined before the built-in Rails ones, so mine is overridden by the standard :prepare task.
Can anyone see how I can fix this? I'd rather override it than have to use a new task. Thanks, max
If you define a rake task that already exists, its execution gets appended to the original task's execution; both tasks will be executed.
If you want to redefine a task you need to clear the original task first:
Rake::Task["db:test:prepare"].clear
It's also useful to note that once a task has been executed in rake, it won't execute again even if you call it again. This is by design, but you can call .reenable on a task to allow it to be run again.
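The run-once behaviour can be seen in a short sketch using `Rake::Task#reenable`. The `once_demo` task and the `RUNS` constant are hypothetical names.

```ruby
require "rake"

RUNS = [] # one entry per time the task body runs

task(:once_demo) { RUNS << :ran }

Rake::Task[:once_demo].invoke
Rake::Task[:once_demo].invoke   # skipped: the task is already marked as invoked
puts "after two invokes: #{RUNS.size}"

Rake::Task[:once_demo].reenable # clear the "already invoked" flag
Rake::Task[:once_demo].invoke
puts "after reenable:    #{RUNS.size}"
```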
You have to remove the default task before adding your own:
Rake.application.instance_variable_get('@tasks').delete('db:test:prepare')
namespace 'db' do
namespace 'test' do
task 'prepare' do
# ...
end
end
end
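The same override can also be sketched with the public `clear` method instead of reaching into `@tasks`. Here the `db_demo` namespace and the `PREPARE_LOG` constant are made up so that the effect of each step is visible outside Rails.

```ruby
require "rake"

PREPARE_LOG = [] # records which task bodies ran, in order

namespace(:db_demo) { task(:prepare) { PREPARE_LOG << :original } }

# Defining the same task again appends a second action to it.
namespace(:db_demo) { task(:prepare) { PREPARE_LOG << :appended } }
Rake::Task["db_demo:prepare"].invoke

# clear drops the existing actions and prerequisites,
# so the next definition replaces them.
Rake::Task["db_demo:prepare"].clear
namespace(:db_demo) { task(:prepare) { PREPARE_LOG << :override } }

Rake::Task["db_demo:prepare"].reenable # allow a second invoke in this process
Rake::Task["db_demo:prepare"].invoke
puts PREPARE_LOG.inspect
```

The first invoke runs both the original and the appended action, while the invoke after `clear` runs only the override.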
A fairly popular idiom is to create a convenience method called remove_task like so:
Rake::TaskManager.class_eval do
def remove_task(task_name)
@tasks.delete(task_name.to_s)
end
end
def remove_task(task_name)
Rake.application.remove_task(task_name)
end
(Source: drnic/newgem)
Create a new project.rake file in lib/tasks/, and paste the code below into it.
namespace :mv do
desc "Display hint and info for your rails 4 project"
task info: :environment do
puts 'Run rake test to test'
end
end
task(:default).clear.enhance ['mv:info']
inspired by Krasimir Angelov's blog
In Rake task definition, like following:
desc 'SOME description'
task :some_task => :environment do
# DO SOMETHING
end
What does the :some_task in task :some_task => :environment mean?
Is it the method name which will be invoked in the DO SOMETHING part?
Can :some_task be any arbitrary string which describe the task?
In fact, when you're creating a rake task, :some_task is the name of the task you are calling.
For instance, in this case, you will call rake some_task
You could also define namespaces for your tasks:
namespace :my_tasks do
desc "My first task"
task :first_task => :environment do
# DO SOMETHING
end
end
And then you will call rake my_tasks:first_task in your console.
Hope it will help you,
Edit:
As explained by Holger Just, => :environment executes the "environment" task which, if you are on Rails, loads the environment. This can take a long time, but it also helps if your tasks work with the database.
With your example, you define a task called some_task which can be invoked by calling rake some_task on the command line.
It depends on the environment task, which will be run before your new some_task. In Rails, the environment task sets up the Rails environment (loading libraries, preparing the database connection, ...), which is quite expensive and thus optional.
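The naming and dependency relationship can be inspected directly. This is a sketch only: `fake_environment` is a hypothetical stand-in for the Rails environment task, and the `_demo` names are made up.

```ruby
require "rake"

task(:fake_environment) { } # stand-in for Rails' :environment

desc "SOME description"
task :some_task_demo => :fake_environment do
  puts "doing something"
end

namespace :my_tasks_demo do
  # A string works as a task name just as well as a symbol.
  task "first_task" => :fake_environment
end

some_task = Rake::Task[:some_task_demo]
puts some_task.name                                   # the name used on the command line
puts some_task.prerequisite_tasks.map(&:name).inspect # tasks that run before it
puts Rake::Task["my_tasks_demo:first_task"].name      # namespaced name
```

The name is only a label used to look the task up; the work itself lives in the block passed to `task`.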