How to design a rake task that calls multiple class methods? - ruby-on-rails

I have a rake task called pull_orders which calls methods of a RemoteDbConnector class to do things like establish a connection to an external db, generate raw SQL queries, execute queries and store records in the local db.
While trying to test it, I stumbled upon this answer which got me thinking whether my design is flawed.
Should rake tasks really be one liners? If so, where should I put all these method calls, given that they need to be called in a particular sequence?
My task looks like this:
namespace :db do
  desc 'finds and populates data from remote db'
  task pull_orders: :environment do
    ...
    columns = ...
    table = ...
    options = ...
    column_mappings = ...
    RemoteDbConnector.generate_query(...)
    RemoteDbConnector.execute_query(...)
    RemoteDbConnector.map_column_names(...)
    Order.create(...) # creates records based on hash generated by RemoteDbConnector
    ...
  end
end

Reasonable people will probably not enforce a strict rule of no more than one line per rake task, but it is definitely nicer to move large blocks of code out of .rake files into classes:
Short rake tasks make the structure of the .rake file clearer.
Large blocks of code should usually be broken up into multiple small methods with names that explain what they do. You could just do that in the .rake file, but then you'd have a mixture of rake tasks and methods, which would not be as easy to read as a uniform set of rake tasks.
Normal classes are easier to test.
So,
Extract classes out of your big rake tasks. You can initially define such a class in the same .rake file as the task you extracted it from as an intermediate step.
Move the classes from .rake files to your project's lib directory, not in lib/tasks (which is for .rake files) but in the root of lib or in a subdirectory of lib corresponding to their namespace if they have one.
require the classes in the rake tasks that use them. (Don't configure your app to autoload lib; that would load unwanted code and its dependencies in your production application.)
Test the classes as you would any classes.
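As a rough sketch of the extraction described above: the sequence of calls moves into one class under lib, and the task shrinks to a single call. The class name OrderPuller, its keyword arguments, and the argument lists passed to the RemoteDbConnector methods are assumptions made for illustration; the question elides the real signatures.

```ruby
# lib/order_puller.rb (hypothetical file and class name)
# Runs the steps from the original task in their required order.
class OrderPuller
  # connector and order_model are injected so the class can be tested
  # with simple stand-ins instead of a real remote database.
  def initialize(connector:, order_model:)
    @connector = connector
    @order_model = order_model
  end

  # The argument lists below are assumed; adapt them to the real methods.
  def call(columns:, table:, column_mappings:)
    query = @connector.generate_query(columns, table)
    rows = @connector.execute_query(query)
    rows.map do |row|
      attributes = @connector.map_column_names(row, column_mappings)
      @order_model.create(attributes)
    end
  end
end

# The rake task then stays short (shown as a comment because it needs Rails):
#
#   require 'order_puller'
#   namespace :db do
#     desc 'finds and populates data from remote db'
#     task pull_orders: :environment do
#       OrderPuller.new(connector: RemoteDbConnector, order_model: Order)
#                  .call(columns: ..., table: ..., column_mappings: ...)
#     end
#   end
```

Because the collaborators are passed in, a test can hand the class a fake connector and a fake model and check the created records without touching either database.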

I don't agree with the post you've linked and think your approach is fine. It's not usually possible or practical for rake tasks to be one-liners. Rake tasks are often used for one-off jobs like bulk migrating or updating your database (by migrating I mean data migrations, not schema migrations, as is probably the case here).
In such cases it wouldn't be reasonable to update the code base to include code that's only intended to be run once, and there's no problem in handling logic in rake tasks.

Related

Why are rake tasks stored in lib/tasks/?

My understanding of the lib/ directory in rails is that it stores non-domain specific code as a best practice.
However, my Rake scripts are very specific to my domain. They do things like create new models.
So is there a better place than lib/tasks/ to store domain-specific rake scripts, or am I missing something here?
I like this idea, and I agree - lib at one point was very much a junk drawer, and as a Rails community we've moved some of the junk away, but yes Rake tasks are usually very specific application logic.
In your Rakefile, all you have to do is load your new rake files (exercise for the reader: iterate over the files in the folder instead of specifying each one explicitly).
Example:
require File.expand_path('../config/application', __FILE__)
Rails.application.load_tasks
load('app/tasks/my_task.rake') # <--- my custom task!!!
The above is correct, but you should add the following to prevent Zeitwerk from creating the Tasks constant (ignore this if you are not using the Zeitwerk loader).
Rails.autoloaders.main.ignore('app/tasks')
You can check this in the console by evaluating Tasks; it should not be defined.
Another option might be to put rake tasks in /tasks instead of app/tasks.
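The "exercise for the reader" mentioned above could be approached roughly as follows. This is only a sketch: the helper name load_rake_files is made up, and it assumes the custom tasks sit directly inside one folder.

```ruby
# Loads every .rake file found directly inside dir, in alphabetical order,
# and returns the list of paths that were loaded.
def load_rake_files(dir)
  Dir.glob(File.join(dir, '*.rake')).sort.each { |path| load path }
end

# In the Rakefile this might be used as (app/tasks as in the answer above):
#
#   Rails.application.load_tasks
#   load_rake_files(File.expand_path('app/tasks', __dir__))
```

Sorting the paths keeps the load order predictable across machines, since the order returned by Dir.glob is not something to rely on.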

Define a custom rake task for each of the existing rake tasks

I need to create a custom task for each of the existing tasks I have. For example, in /lib/tasks I have files with tasks: update.rake, service.rake, etc., and each of these files contains several tasks, such as rake update:users. Now I want a rake custom_behaviour: task for each of the tasks I have defined, for example rake custom_behaviour:update:users. I found the following code:
Rake::Task.tasks.each do |task|
  task "custom_behaviour:#{task.name}" do
    puts "Custom Behaviour Task: rake #{task.name}"
  end
end
And this works properly if the file where I'm storing this code starts with "z", because it is then the last file to be loaded. But if I name the file custom_behaviour.rake, the tasks defined in files that sort after "c" are not "custom_behavioured", because those tasks are not loaded yet...
So my question is: What's the proper way of doing this? Where should I put the "custom_behaviour" code so when it's executed "all the tasks" are loaded?
I'd use the filename convention. This happens with other stuff that's order dependent, like Rails initializers.
An alternative that might let you accomplish what you need (I'm assuming you want some custom behavior before/after the standard task) is rake_hooks: https://github.com/guillermo/rake-hooks
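Another way to sidestep the file-name ordering, sketched here under the assumption that the wrapping code may live in a helper that is called once every task file has been loaded (for instance at the end of the Rakefile) rather than in a file under lib/tasks. The helper name define_custom_behaviour_tasks is made up for illustration.

```ruby
require 'rake'

# Defines "<prefix>:<name>" for every task known at the moment of the call.
# Call it only after all .rake files have been loaded.
def define_custom_behaviour_tasks(prefix = 'custom_behaviour')
  Rake::Task.tasks.each do |existing|
    next if existing.name.start_with?("#{prefix}:") # skip our own wrappers
    Rake::Task.define_task("#{prefix}:#{existing.name}") do
      puts "Custom Behaviour Task: rake #{existing.name}"
    end
  end
end

# In a Rails Rakefile (assumed placement):
#
#   Rails.application.load_tasks
#   define_custom_behaviour_tasks
```

The block variable is named existing rather than task so it does not shadow Rake's task method, and the prefix check keeps a second call from wrapping the wrappers.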

What is the scope of the code in db/seeds.rb in Ruby on Rails?

Question 1
Suppose I define some variables in db/seeds.rb, e.g.: user = User.create(...).
What is the scope of these variables?
Question 2
If I have a large amount of code in db/seeds.rb, is it recommended to put it in a class?
The variables are in the scope of the rake instance that has been started.
So they would be in scope for other tasks if multiple tasks were started at once.
For example
rake db:seed custom:sometask
Instance variables defined in db:seed could be accessed in 'sometask'
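A small experiment can make the distinction between local and instance variables concrete. It assumes the seeds file is evaluated with Kernel#load, which is how I understand db:seed to work; the stand-in file and the variable names are made up.

```ruby
require 'tmpdir'

# Writes a stand-in seeds file, loads it, and reports what is visible afterwards.
def inspect_seed_scope
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'seeds.rb')
    File.write(path, "seed_user = 'local'\n@seed_user = 'ivar'\n")
    load path
    {
      # Local variables stay private to the loaded file.
      local_visible: TOPLEVEL_BINDING.local_variable_defined?(:seed_user),
      # Instance variables end up on the top-level object (main).
      ivar_value: TOPLEVEL_BINDING.receiver.instance_variable_get(:@seed_user)
    }
  end
end

p inspect_seed_scope
```

Under that assumption, a plain local such as user = User.create(...) is gone once the seeds file finishes, whereas @user would still be reachable from a task that runs later in the same rake process.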
If the seeds file is too big because it adds many records, you could move the data to be inserted into a YAML file; that could make your seeds file cleaner than creating a class would.
Seed data is anything that must be loaded for an application to work properly. An application needs its seed data loaded in order to run in development, test, and production.
Seed data is mostly unchanging. It typically won’t be edited in your application. But requirements can and do change, so seed data may need to be reloaded on deployed applications.
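Moving the records into YAML might look roughly like this. It is a sketch only: the helper name seed_from_yaml and the file name in the usage comment are hypothetical, and the YAML is assumed to be a plain list of attribute hashes.

```ruby
require 'yaml'

# Parses YAML that holds a list of attribute hashes and creates one record
# per entry. model is anything that responds to create
# (e.g. an ActiveRecord class).
def seed_from_yaml(yaml_text, model)
  records = YAML.safe_load(yaml_text) || []
  records.map { |attributes| model.create(attributes) }
end

# In db/seeds.rb this might look like (file name is hypothetical):
#
#   seed_from_yaml(File.read(Rails.root.join('db/seeds/users.yml')), User)
```

The seeds file then only says which YAML file feeds which model, and the data itself can be edited without touching Ruby code.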
Answer for your second question
The number of lines of code in seeds.rb doesn't affect performance; the basic task of the seed file is to initialize the database with predefined records. Keep one thing in mind: parent records must be created before their children.
Here are some references that might help you
ASCIICasts
Rail Spikes

How to handle periodically changing database data in your Rails app?

EDIT: I have totally rewritten this question for clarity. I got no comments and no answers earlier.
I am maintaining a 2.x Rails app with plenty of statistical data. Some data is real and some is estimated for the future years. Every year I need to update estimated data with real data and calculate new estimates.
I have been using BIG yml-files and migrations for loading the data into the app every year. My migrations are full of estimation calculations and data corrections.
Problem
My migrations are full of non-schema-related material and I can't even dream of doing db:migrate:reset without waiting a few hours (if it even works). I'd love to see my migrations nice and clean, with only schema-related modifications. But how am I supposed to update the data every year if not with migrations?
Help needed
I'd like to hear your comments and answers. I'm not looking for a silver bullet - more like best practises and ideas how people are handling similar situation.
It sounds like you have a large operation (data load using yml files) once a year but smaller operations once a month.
From my experience with statistical data you will probably end up doing more and more of these operations to clean and add more data.
I would use a job processing framework like resque and resque scheduler.
You can schedule the jobs to run once a month, year, day or constantly running. A job is something like loading yml files (or sets of yml files) or cleaning up data. You can control parameters to send to your job so you can use one class but alternate how it updates or cleans your data based on the way you enqueue or schedule the job.
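To illustrate the idea of one job class whose behaviour depends on how it is enqueued, here is a sketch in the shape Resque expects (a queue name plus a class-level perform method). The class name, queue name, action names, and the file path in the usage comment are all hypothetical, and the handlers only return strings where real code would load or clean data.

```ruby
# A job class in the shape Resque expects: a queue name and a class-level
# perform method. Class, queue, and action names are hypothetical.
class DataMaintenanceJob
  @queue = :data_maintenance

  # Placeholder handlers; real ones would load YAML files or clean up rows.
  ACTIONS = {
    'load_yml' => ->(target) { "loading #{target}" },
    'clean'    => ->(target) { "cleaning #{target}" }
  }.freeze

  # action selects the behaviour; target is e.g. a file name or a year.
  def self.perform(action, target)
    handler = ACTIONS.fetch(action) { raise ArgumentError, "unknown action: #{action}" }
    handler.call(target)
  end
end

# With Resque loaded, a job would be queued roughly like this:
#
#   Resque.enqueue(DataMaintenanceJob, 'load_yml', 'db/data/2010.yml')
```

Keeping the arguments to simple strings suits a queue that serializes them, and it lets the same class serve the yearly load and the smaller clean-up runs.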
First of all, I have to say that this is a very interesting question. As far as I know, it isn't a good idea to load data from migrations. Generally speaking, you should use db/seeds.rb for loading data into your db, and I think it could be a good idea to write a small helper class, put it in your lib dir, and call it from db/seeds.rb. I imagine you could organize your files in the following way:
lib/data_loader.rb
lib/years/2009.rb
lib/years/2010.rb
Obviously, you should clean up your migrations and write the code for lib/data_loader.rb however you prefer; I was only trying to offer a general idea of how I'd organize my code if I had to face a problem like that.
I'm not sure I've replied to your question in a way that helps but I hope it does.
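One possible shape for such a helper, assuming each file under lib/years inserts the data for its own year when loaded. The method names are a guess at what the class could offer, not something the answer specifies.

```ruby
# lib/data_loader.rb (contents are a guess at what such a helper could do)
class DataLoader
  def initialize(years_dir)
    @years_dir = years_dir
  end

  # Years that have a file, e.g. lib/years/2009.rb => 2009.
  def available_years
    Dir.glob(File.join(@years_dir, '*.rb'))
       .map { |path| File.basename(path, '.rb').to_i }
       .sort
  end

  # Loads the file for one year; that file is expected to insert its own data.
  def load_year(year)
    path = File.join(@years_dir, "#{year}.rb")
    raise ArgumentError, "no data file for #{year}" unless File.exist?(path)
    load path
  end

  # Loads every available year, oldest first.
  def load_all
    available_years.each { |year| load_year(year) }
  end
end

# db/seeds.rb could then contain:
#
#   require 'data_loader'
#   DataLoader.new(Rails.root.join('lib/years').to_s).load_all
```

Adding a new year then means adding one file, and a single year can be reloaded on its own with load_year.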
If I were you, I would create a custom rake task. You will have access to all your models and ActiveRecord connections, and once a year you will end up running:
rake calculate
I have a situation where I need to load data from CSV files that change infrequently and update data from the Internet daily. I'll include a somewhat complete example on how to do the former.
First I have a rake file in lib/tasks/update.rake:
require 'update/from_csv_files.rb'
namespace :update do
  task :csvfiles => :environment do
    Dir.glob('db/static_data/*.csv') do |file|
      Update::FromCsvFiles.load(file)
    end
  end
end
The => :environment means we will have access to the database via the usual models.
Then I have code in the lib/update/from_csv_files.rb file to do the actual work:
require 'csv'
module Update
  module FromCsvFiles
    # Creates or updates one Statistic per CSV row.
    # Column order: id, survey_area, nr_of_space_men.
    def self.load(file)
      # CSV.foreach opens the file and closes it again when done.
      CSV.foreach(file) do |row|
        id = row[0]
        s = Statistic.find_by_id(id)
        if s.nil?
          s = Statistic.new
          s.id = id
        end
        s.survey_area = row[1]
        s.nr_of_space_men = row[2]
        s.save
      end
    end
  end
end
Then I can just run rake update:csvfiles whenever my CSV files change to load the new data. I also have another task that is set up in a similar way to update my daily data.
In your case you should be able to write some code to load your YML files or make your calculations directly. To handle your smaller corrections you could make a generic method for loading YML files and call it with specific files from the rake task. That way you only need to include the YML file and update the rake file with a new task. To handle execution order you can make a rake task that calls the other rake tasks in the appropriate order. I'm just throwing around some ideas now, you know better than me.
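The last idea, one task that runs the others in a fixed order, can be sketched with plain Rake prerequisites. The task names corrections, estimates, and all are hypothetical, the task bodies only record that they ran, and in a Rails app each would also depend on :environment.

```ruby
require 'rake'

# Records the order in which the tasks ran, for demonstration only.
RUN_ORDER = []

module UpdateTasks
  extend Rake::DSL

  namespace :update do
    task(:csvfiles)    { RUN_ORDER << 'csvfiles' }
    task(:corrections) { RUN_ORDER << 'corrections' }
    task(:estimates)   { RUN_ORDER << 'estimates' }

    desc 'Runs the update tasks in a fixed order'
    task all: %i[csvfiles corrections estimates]
  end
end

Rake::Task['update:all'].invoke
p RUN_ORDER
```

Rake invokes the prerequisites of an ordinary task in the order they are listed, so update:all acts as the single entry point while each step stays runnable on its own.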

Populate a constant values table

In a Rails application, I need a table in my database to contain constant data.
This table content is not intended to change for the moment but I do not want to put the content in the code, to be able to change it whenever needed.
I tried filling this table in the migration that created it, but this does not seem to work with the test environment and breaks my unit tests. In test environment, my model is never able to return any value while it is ok in my development environment.
Is there a way to fill that database correctly even in the test environment? Is there another way of handling this kind of data that should not be in code?
edit
Thanks all for your answers and especially Vlad R for explaining the problem.
I now understand why my data are not loaded in test. This is because the test environment uses the db:load rake command which directly loads the schema instead of running the migrations. Having put my values in the migration only and not in the schema, these values are not loaded for test.
What you are probably observing is that the test framework is not running the migrations (db:migrate), but loading db/schema.rb directly instead (the stock task for this is db:schema:load; it is referred to as db:load here).
You have two options:
continue to use the migration for production and development; for the test environment, add your constant data to the corresponding yml files in db/fixtures
leave the existing db/fixtures files untouched, and create another set of yml files (containing the constant data) in the same vein as db/fixtures, but usable by both test and production/development environments when doing a rake db:load schema initialization
To cover those scenarios that use db:load instead of db:migrate (e.g. test, or bringing up a new database on a new development machine using the faster db:load), create a drop-in rakefile in RAILS_APP/lib/tasks that augments the db:load task by loading your constant initialization data from "seed" yml files (one for each model) into the database.
Use the db:seed rake task as an example. Put your seed data in db/seeds/.yml
# the command is: rake db:load
namespace :db do
  desc 'Initialize data from YAML.'
  task :load => :environment do
    require 'active_record/fixtures'
    Dir.glob(RAILS_ROOT + '/db/seeds/*.yml').each do |file|
      Fixtures.create_fixtures('db/seeds', File.basename(file, '.*'))
    end
  end
end
To cover the incremental scenarios (db:migrate), define one migration that does the same thing as the task defined above.
If your seed data ever changes, you will need to add another migration to remove the old seed data and load the new one instead, which may be non-trivial in case of foreign-key dependencies etc.
Take a look at my article on loading seed data.
There are a number of ways to do this. I like a rake task called db:populate, which lets you specify your fixed data in normal ActiveRecord create statements. For getting the data into tests, I've just been loading this populate file in my test_helper. However, I think I am going to switch to a test database that already has the seed data populated.
There's also a plugin called SeedFu that helps with this problem.
Whatever you do, I recommend against using fixtures for this, because they don't validate your data, so it's very easy to create invalid records.
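To show what validated, repeatable population could look like, here is a sketch that goes through the model instead of fixtures. It assumes an ActiveRecord version that offers find_or_initialize_by and update!; the helper name populate_constants and the Currency model in the usage comment are hypothetical.

```ruby
# Creates or updates one record per row, looked up by key, and raises if a
# record is invalid. model is expected to behave like an ActiveRecord class
# (find_or_initialize_by on the class, update! on the instance).
def populate_constants(model, rows, key: :code)
  rows.map do |attributes|
    record = model.find_or_initialize_by(key => attributes.fetch(key))
    record.update!(attributes)
    record
  end
end

# Hypothetical use inside a db:populate task or db/seeds.rb:
#
#   populate_constants(Currency, [
#     { code: 'EUR', name: 'Euro' },
#     { code: 'USD', name: 'US Dollar' }
#   ])
```

Because rows are looked up by key before saving, running the task twice updates the existing records rather than duplicating them, and update! surfaces invalid data immediately.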
