I have a large database with 4+ million addresses/records.
The rake command below worked fine when the database was a small test set, but with the large database it simply stalls.
rake geocode:all CLASS=YourModel
2 questions:
1. Is there a simple way to have geocoder fill in a null/nil lat and long on the fly, when the records are loaded? I have a feeling that this would be hard.
2. Has anyone else had problems geocoding a large dataset with the rake command?
Thanks!
Update:
I created a pull request based on this answer, and now you can use batches in geocoder:
rake geocode:all CLASS=YourModel SLEEP=0.25 BATCH=100
I would use this solution for a large database; I took it from the geocoder gem's rake task:
You can refine this for your needs.
An example rake task:
namespace :geocode_my_data do
  desc "Geocode all objects in my database."
  task all: :environment do
    klass = User
    # find_each takes batch_size (not limit) to set how many records are loaded per query.
    klass.where(geocoded: false).find_each(batch_size: 100) do |obj|
      obj.geocode
      obj.save
    end
  end
end
$> rake geocode_my_data:all
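The batching idea can be tried outside Rails with a plain-Ruby sketch. FakeRecord and geocode_in_batches are hypothetical names made up for illustration, and the geocode method here only assigns dummy coordinates instead of calling a real lookup service. Note that this version pauses once per batch, which is an assumption of the sketch and not necessarily how the gem's task behaves.

```ruby
# Plain-Ruby sketch of batched, throttled processing.
# FakeRecord and geocode_in_batches are hypothetical, for illustration only.
FakeRecord = Struct.new(:address, :latitude, :longitude) do
  def geocoded?
    !latitude.nil? && !longitude.nil?
  end

  # Stand-in for a real geocoding lookup: assigns dummy coordinates.
  def geocode
    self.latitude = 0.0
    self.longitude = 0.0
  end
end

# Processes only records that lack coordinates, one batch at a time,
# pausing between batches. Returns the number of records geocoded.
def geocode_in_batches(records, batch_size: 100, pause: 0)
  processed = 0
  records.reject(&:geocoded?).each_slice(batch_size) do |batch|
    batch.each do |record|
      record.geocode
      processed += 1
    end
    sleep(pause) if pause > 0
  end
  processed
end

records = Array.new(250) { |i| FakeRecord.new("Address #{i}") }
records.first.geocode # pretend one record already has coordinates
puts geocode_in_batches(records, batch_size: 100, pause: 0) # prints 249
```

Skipping records that already have coordinates means the task can be re-run after a stall without redoing finished work.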
I used the code below and put it into my lib/tasks folder as geocode_my_data.rake.
To run:
rake geocode_my_data:all CLASS=YourModel
Works great!
namespace :geocode_my_data do
  desc "Geocode all objects without coordinates."
  task :all => :environment do
    class_name = ENV['CLASS'] || ENV['class']
    sleep_timer = ENV['SLEEP'] || ENV['sleep']
    raise "Please specify a CLASS (model)" unless class_name
    klass = class_from_string(class_name)
    klass.not_geocoded.find_each(batch_size: 100) do |obj|
      obj.geocode; obj.save
      sleep(sleep_timer.to_f) unless sleep_timer.nil?
    end
  end
end
##
# Get a class object from the string given in the shell environment.
# Similar to ActiveSupport's +constantize+ method.
#
def class_from_string(class_name)
  parts = class_name.split("::")
  constant = Object
  parts.each do |part|
    constant = constant.const_get(part)
  end
  constant
end
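For anyone curious how the constant lookup behaves, here is a small plain-Ruby sketch. Shop::Customer is a hypothetical class used only for the demonstration; the helper is a condensed version of class_from_string above. On Ruby 2.0 and later, Object.const_get also accepts a namespaced string directly.

```ruby
# Hypothetical namespaced class, standing in for a model such as YourModel.
module Shop
  class Customer; end
end

# Walks the namespace one segment at a time, starting from Object.
def class_from_string(class_name)
  class_name.split("::").inject(Object) { |constant, part| constant.const_get(part) }
end

puts class_from_string("Shop::Customer") == Shop::Customer # prints true
puts Object.const_get("Shop::Customer") == Shop::Customer  # prints true
```

An unknown name raises NameError, so a typo in CLASS fails right away instead of geocoding the wrong model.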
I am creating a model whose name is the input argument in a rake task. After the rake task, I wish to use the model to insert data.
So for example, I call my rake task with input Apple and the model Apple is created. Then I wish to do Apple.insert_all([{name: x},{name: y}...]) in another rake task but I get NameError: uninitialized constant Apple
Here's a better picture of the flow of what I'm doing
Rake::Task["create:fruit"].invoke("Apple") # create model here
Rake::Task["create:insert"].invoke("Apple") # insert data here but getting error
This is how I process the input in the second rake task:
task :insert, [:name] do |t, args|
  fruit = args.name
  fruit.classify.constantize.insert_all(xxx)
end
Any suggestions for how to go about this?
I created a new project and tried your code. I think the problem is in this line
fruit.classify.constantize.insert_all(xxx)
The code below works and creates new records. I use a simple rake command to run it.
create.rake file
namespace :create do
  desc "TODO"
  task :insert, [:name] do |t, args|
    klass = Object.const_get(args.name)
    klass.create([{name: 'x'},{name: 'y'}])
    p klass.count # testing new records have been saved
  end
end
Rakefile file
require File.expand_path('../config/application', __FILE__)
Rails.application.load_tasks
task :default do
  Rake::Task["create:insert"].invoke("Apple")
end
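The same flow can be reproduced without Rails, as a rough sketch. The Apple class below is a hypothetical stand-in for the generated model (it just collects rows in an array), and insert_demo is a made-up task name. Rake::Task.define_task is used here in place of the `task` DSL call so the snippet runs as a standalone script.

```ruby
require 'rake'

# Hypothetical stand-in for an ActiveRecord model; rows are kept in memory.
class Apple
  @rows = []
  class << self
    attr_reader :rows

    def create(attrs_list)
      @rows.concat(attrs_list)
    end
  end
end

# Equivalent of `task :insert_demo, [:name] do |t, args| ... end` in a Rakefile.
Rake::Task.define_task(:insert_demo, [:name]) do |_task, args|
  klass = Object.const_get(args[:name]) # raises NameError if the constant is not loaded yet
  klass.create([{ name: 'x' }, { name: 'y' }])
end

Rake::Task[:insert_demo].invoke("Apple")
puts Apple.rows.size # prints 2
```

The lookup only succeeds because Apple is already defined when the task runs; if the model file was generated earlier in the same process and never loaded, const_get raises the NameError from the question.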
I have a rake task that invokes other rake tasks, so my development data can be easily reset.
The first rake task (lib/tasks/populate.rake)
# Rake task to populate development database with test data
# Run it with "rake db:populate"
namespace :db do
  desc 'Erase and fill database'
  task populate: :environment do
    ...
    Rake::Task['test_data:create_company_plans'].invoke
    Rake::Task['test_data:create_companies'].invoke
    Rake::Task['test_data:create_users'].invoke
    ...
  end
end
the second rake task (lib/tasks/populate_sub_scripts/create_company_plans.rake)
namespace :test_data do
  desc 'Create Company Plans'
  task create_company_plans: :environment do
    Company::ProfilePlan.create!(name: 'Basic', trial_period_days: 30, price_monthly_cents: 4000)
    Company::ProfilePlan.create!(name: 'Professional', trial_period_days: 30, price_monthly_cents: 27_500)
    Company::ProfilePlan.create!(name: 'Enterprise', trial_period_days: 30, price_monthly_cents: 78_500)
  end
end
When I run bin/rake db:populate, I get this error:
rake aborted! LoadError: Unable to autoload constant
Company::ProfilePlan, expected
/home/.../app/models/company/profile_plan.rb to define it
But when I run the second rake task independently, it works well.
The model (path: /home/.../app/models/company/profile_plan.rb)
class Company::ProfilePlan < ActiveRecord::Base
  # == Constants ============================================================
  # == Attributes ===========================================================
  # == Extensions ===========================================================
  monetize :price_monthly_cents
  # == Relationships ========================================================
  has_many :profile_subscriptions
  # == Validations ==========================================================
  # == Scopes ===============================================================
  # == Callbacks ============================================================
  # == Class Methods ========================================================
  # == Instance Methods =====================================================
end
Rails 5.0.1
Ruby 2.4.0
The App was just upgraded from 4.2 to 5
It works when I require the whole path:
require "#{Rails.root}/app/models/company/profile_plan.rb"
But this seems strange to me, because in the error message rails has the correct path to the Model. Does someone know why I have to require the file when invoked from another rake task?
Thank you very much
Well, it seems that rake doesn't eager load, so when you call the create_company_plans.rake alone it loads the referred objects, however when you invoke it from another rake, it doesn't know you will need them and so they are not loaded.
You can take a look at this other QA which was similar to yours.
I think maybe you don't need to require the whole path, just:
require 'models/company/profile_plan'
From what I understand, you can probably overcome the problem by re-enabling and then invoking each task, as given below. Pardon me if this doesn't work.
['test_data:create_company_plans', 'test_data:create_companies'].each do |task|
  Rake::Task[task].reenable
  Rake::Task[task].invoke
end
There is more info in this Stack Overflow question: how-to-run-rake-tasks-from-within-rake-tasks.
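To illustrate what reenable does: Rake marks a task as invoked after its first run, so a second invoke in the same process is silently skipped. The sketch below uses a made-up task name (seed_demo) and an array as a run counter; whether this addresses the autoload error above is a separate matter.

```ruby
require 'rake'

RUN_LOG = []

# Hypothetical task that just records each run.
Rake::Task.define_task(:seed_demo) { RUN_LOG << :ran }

demo = Rake::Task[:seed_demo]
demo.invoke   # runs the action
demo.invoke   # skipped: the task is already marked as invoked
demo.reenable # clear the "already invoked" flag
demo.invoke   # runs the action again
puts RUN_LOG.size # prints 2
```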
I have a number of rake tasks for which I would like to implement around-hook-like behavior. Specifically, I'm looking for a way to ensure that all of my Rake tasks execute in a particular (complicated, derived) Time.use_zone block.
For analogy, I have this in my ApplicationController:
around_filter :use_time_zone
def use_time_zone
  time_zone = non_trivial_derivation
  Time.use_zone(time_zone) { yield }
end
And now all of my controller actions will appropriately execute in the specified time zone. I would like some mechanism like this for Rake. I'd be willing to change or modify the dependency chain for my rake tasks, but I don't want to insert the actual time zone derivation code at the top of each rake task, out of concerns that that would lead to maintenance fragility. I'm pretty sure that Rake dependencies hold the solution--after all, Rake dependencies allow me to execute code in the context of my Rails application. But I can't figure out how to get that done for this use case.
I came up with a simple solution that doesn't require any external dependencies or gems such as rake-hooks:
desc "rake around hook"
task :use_timezone, [:subtask] => :environment do |name, args|
  puts "using timezone"
  Rake::Task[args[:subtask]].invoke
  puts "end using timezone"
end
task :testing do
  puts "testing"
end
The idea is that you execute the main use_timezone task and pass in your actual task as an argument:
$ rake use_timezone[testing]
That outputs:
> using timezone
> testing
> end using timezone
For your case you can write it like this:
task :use_timezone, [:subtask] => :environment do |name, args|
  time_zone = non_trivial_derivation
  Time.use_zone(time_zone) { Rake::Task[args[:subtask]].invoke }
end
And use it like this:
$ rake use_timezone[your_task]
Hope that helps.
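Here is a runnable sketch of the same wrapper pattern without Rails. with_context is a hypothetical stand-in for Time.use_zone (it only records when the block starts and ends), and the task names wrap_demo and testing_demo are made up. The ensure clause is an addition of this sketch, so the closing step still runs if the wrapped task raises.

```ruby
require 'rake'

EVENTS = []

# Hypothetical stand-in for Time.use_zone(zone) { ... }.
def with_context(label)
  EVENTS << "begin #{label}"
  yield
ensure
  EVENTS << "end #{label}"
end

Rake::Task.define_task(:testing_demo) { EVENTS << "testing" }

# The wrapper task receives the real task's name as an argument
# and invokes it inside the block.
Rake::Task.define_task(:wrap_demo, [:subtask]) do |_task, args|
  with_context("UTC") { Rake::Task[args[:subtask]].invoke }
end

Rake::Task[:wrap_demo].invoke("testing_demo")
puts EVENTS.inspect # prints ["begin UTC", "testing", "end UTC"]
```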
In my rails project (Rails 3.1, Ruby 1.9.3) there are around 40 rake tasks defined. The requirement is that I should be able to create an entry (the rake details) in a database table right when we start each rake. The details I need are the rake name, arguments, start time and end time. For this purpose, I don't want the rake files to be updated with the code. Is it possible to do this outside the scope of the rake files?
Any help would be greatly appreciated!!
Try this
https://github.com/guillermo/rake-hooks
For example in your Rakefile
require 'rake/hooks'
task :say_hello do
  puts "Good Morning !"
end
before :say_hello do
  puts "Hi !"
end
# For multiple tasks
namespace :greetings do
  task :hola do puts "Hola!" end
  task :bonjour do puts "Bonjour!" end
  task :gday do puts "G'day!" end
end
before "greetings:hola", "greetings:bonjour", "greetings:gday" do
  puts "Hello!"
end
rake greetings:hola # => "Hello! Hola!"
This seems a bit awkward, but it may help others.
Rake.application.top_level_tasks
returns an array of the task strings given on the command line, including the rake name and its arguments.
An example session is below.
pry(main)> a = Rake.application.top_level_tasks
=> ["import_data[client1,", "data.txt]"]
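One possible way to record the name, arguments, and start/end times without editing the existing rake files is to wrap tasks with Rake's own enhance. This is only a sketch under assumptions: instrument, TASK_LOG, and import_demo are hypothetical names, and an in-memory array stands in for the database table.

```ruby
require 'rake'

TASK_LOG = []

# Wraps one task: a generated prerequisite records the start,
# and an appended action records the finish together with the arguments.
def instrument(task)
  starter = "log_start_#{task.name}"
  Rake::Task.define_task(starter) { TASK_LOG << [:start, task.name, Time.now] }
  task.enhance([starter]) do |_t, args|
    TASK_LOG << [:finish, task.name, args.to_hash, Time.now]
  end
end

# Hypothetical existing task that we do not want to edit.
Rake::Task.define_task(:import_demo, [:client, :file]) { TASK_LOG << [:work] }

# In a real Rakefile this could be applied to each entry of
# Rake.application.tasks once all task files are loaded.
instrument(Rake::Task[:import_demo])

Rake::Task[:import_demo].invoke("client1", "data.txt")
puts TASK_LOG.map(&:first).inspect # prints [:start, :work, :finish]
```

The start entry is written after any prerequisites the task already had, because enhance appends the new prerequisite to the end of the list.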
When you create a rake task, you can pass a parent task which will run before your task:
task my_task: :my_parent_task do
  # ...
end
If your task depends on more than one task, you can pass an array of parent tasks:
task my_task: [:my_prev_task, :my_another_prev_task] do
  # ...
end
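A quick way to see the order in which parent tasks run is the standalone sketch below. The task names are made up, and Rake::Task.define_task stands in for the `task` call you would write in a Rakefile.

```ruby
require 'rake'

ORDER = []

Rake::Task.define_task(:prev_demo)    { ORDER << :prev_demo }
Rake::Task.define_task(:another_demo) { ORDER << :another_demo }

# Same as `task main_demo: [:prev_demo, :another_demo] do ... end` in a Rakefile.
Rake::Task.define_task(main_demo: [:prev_demo, :another_demo]) { ORDER << :main_demo }

Rake::Task[:main_demo].invoke
puts ORDER.inspect # prints [:prev_demo, :another_demo, :main_demo]
```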
I want to create schema.sql instead of schema.rb. After googling around I found that it can be done by setting sql schema format in application.rb. So I set following in application.rb
config.active_record.schema_format = :sql
But if I set schema_format to :sql, neither schema.rb nor schema.sql is created at all. If I comment out the line above, it creates schema.rb, but I need schema.sql. I am assuming that it will have the database structure dumped in it.
I know that the database structure can be dumped using
rake db:structure:dump
But I want it to be done automatically when database is migrated.
Is there anything I am missing or assuming wrongly?
Five months after the original question the problem still exists. The answer is that you did everything correctly, but there is a bug in Rails.
Even in the guides it looks like all you need is to change the format from :ruby to :sql, but the migrate task is defined like this (activerecord/lib/active_record/railties/databases.rake line 155):
task :migrate => [:environment, :load_config] do
  ActiveRecord::Migration.verbose = ENV["VERBOSE"] ? ENV["VERBOSE"] == "true" : true
  ActiveRecord::Migrator.migrate(ActiveRecord::Migrator.migrations_paths, ENV["VERSION"] ? ENV["VERSION"].to_i : nil)
  db_namespace["schema:dump"].invoke if ActiveRecord::Base.schema_format == :ruby
end
As you can see, nothing happens unless the schema_format equals :ruby.
Automatic dumping of the schema in SQL format was working in Rails 1.x. Something has changed in Rails 2, and has not been fixed.
The problem is that even if you manage to create the schema in SQL format, there is no task to load this into the database, and the task rake db:setup will ignore your database structure.
The bug has been noticed recently: https://github.com/rails/rails/issues/715 (and issues/715), and there is a patch at https://gist.github.com/971720
You may want to wait until the patch is applied to Rails (the edge version still has this bug), or apply the patch yourself (you may need to do it manually, since line numbers have changed a little).
Workaround:
With bundler it's relatively hard to patch the libraries (upgrades are so easy that they are done very often, and the paths are polluted with strange numbers, at least if you use edge rails ;-). So, instead of patching the file directly, you may want to create two files in your lib/tasks folder:
lib/tasks/schema_format.rake:
import File.expand_path(File.dirname(__FILE__)+"/schema_format.rb")
# Loads the *_structure.sql file into current environment's database.
# This is a slightly modified copy of the 'test:clone_structure' task.
def db_load_structure(filename)
  abcs = ActiveRecord::Base.configurations
  case abcs[Rails.env]['adapter']
  when /mysql/
    ActiveRecord::Base.establish_connection(Rails.env)
    ActiveRecord::Base.connection.execute('SET foreign_key_checks = 0')
    IO.readlines(filename).join.split("\n\n").each do |table|
      ActiveRecord::Base.connection.execute(table)
    end
  when /postgresql/
    ENV['PGHOST'] = abcs[Rails.env]['host'] if abcs[Rails.env]['host']
    ENV['PGPORT'] = abcs[Rails.env]['port'].to_s if abcs[Rails.env]['port']
    ENV['PGPASSWORD'] = abcs[Rails.env]['password'].to_s if abcs[Rails.env]['password']
    `psql -U "#{abcs[Rails.env]['username']}" -f #{filename} #{abcs[Rails.env]['database']} #{abcs[Rails.env]['template']}`
  when /sqlite/
    dbfile = abcs[Rails.env]['database'] || abcs[Rails.env]['dbfile']
    `sqlite3 #{dbfile} < #{filename}`
  when 'sqlserver'
    `osql -E -S #{abcs[Rails.env]['host']} -d #{abcs[Rails.env]['database']} -i #{filename}`
    # There was a relative path. Is that important? : db\\#{Rails.env}_structure.sql`
  when 'oci', 'oracle'
    ActiveRecord::Base.establish_connection(Rails.env)
    IO.readlines(filename).join.split(";\n\n").each do |ddl|
      ActiveRecord::Base.connection.execute(ddl)
    end
  when 'firebird'
    set_firebird_env(abcs[Rails.env])
    db_string = firebird_db_string(abcs[Rails.env])
    sh "isql -i #{filename} #{db_string}"
  else
    raise "Task not supported by '#{abcs[Rails.env]['adapter']}'"
  end
end
namespace :db do
  namespace :structure do
    desc "Load development_structure.sql file into the current environment's database"
    task :load => :environment do
      file_env = 'development' # From which environment you want the structure?
                               # You may use a parameter or define different tasks.
      db_load_structure "#{Rails.root}/db/#{file_env}_structure.sql"
    end
  end
end
and lib/tasks/schema_format.rb:
def dump_structure_if_sql
  Rake::Task['db:structure:dump'].invoke if ActiveRecord::Base.schema_format == :sql
end
Rake::Task['db:migrate'     ].enhance do dump_structure_if_sql end
Rake::Task['db:migrate:up'  ].enhance do dump_structure_if_sql end
Rake::Task['db:migrate:down'].enhance do dump_structure_if_sql end
Rake::Task['db:rollback'    ].enhance do dump_structure_if_sql end
Rake::Task['db:forward'     ].enhance do dump_structure_if_sql end
Rake::Task['db:structure:dump'].enhance do
  # If not reenabled, then in db:migrate:redo task the dump would be called only once,
  # and would contain only the state after the down-migration.
  Rake::Task['db:structure:dump'].reenable
end
# The 'db:setup' task needs to be rewritten.
Rake::Task['db:setup'].clear.enhance(['environment']) do # see the .clear method invoked?
  Rake::Task['db:create'].invoke
  Rake::Task['db:schema:load'].invoke if ActiveRecord::Base.schema_format == :ruby
  Rake::Task['db:structure:load'].invoke if ActiveRecord::Base.schema_format == :sql
  Rake::Task['db:seed'].invoke
end
Having these files, you have monkeypatched rake tasks, and you still can easily upgrade Rails. Of course, you should monitor the changes introduced in the file activerecord/lib/active_record/railties/databases.rake and decide whether the modifications are still necessary.
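The three Rake mechanisms this workaround relies on (enhance, reenable, and clear) can be seen in isolation in the sketch below. The task names migrate_demo and dump_demo are hypothetical stand-ins for db:migrate and db:structure:dump, and an array records what ran.

```ruby
require 'rake'

STEPS = []

Rake::Task.define_task(:migrate_demo) { STEPS << :migrate }
Rake::Task.define_task(:dump_demo)    { STEPS << :dump }

# enhance with a block appends an action that runs after the task's own actions.
Rake::Task[:migrate_demo].enhance { Rake::Task[:dump_demo].invoke }

# Re-enable dump_demo at the end of each run so a later invoke runs it again.
Rake::Task[:dump_demo].enhance { Rake::Task[:dump_demo].reenable }

Rake::Task[:migrate_demo].invoke
Rake::Task[:migrate_demo].reenable
Rake::Task[:migrate_demo].invoke # dump_demo runs a second time as well

# clear removes the existing actions and prerequisites, so the
# following enhance effectively redefines the task.
Rake::Task[:migrate_demo].clear.enhance { STEPS << :replaced }
Rake::Task[:migrate_demo].reenable
Rake::Task[:migrate_demo].invoke

puts STEPS.inspect # prints [:migrate, :dump, :migrate, :dump, :replaced]
```

Without the reenable inside dump_demo, the second migrate_demo run would skip the dump, which is the db:migrate:redo situation described in the code comment above.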
I'm using rails 2.3.5 but this may apply to 3.0 as well:
rake db:structure:dump does the trick for me.
It's possible you need to delete the schema.rb for the schema.sql to be created.