I have a rake task I need to run in order to sanitize (remove forward slashes) some data in the database. Here's the task:
namespace :db do
desc "Remove slashes from old-style URLs"
task :substitute_slashes => :environment do
puts "Starting"
contents = Content.all
contents.each do |c|
if c.permalink != nil
c.permalink.gsub!("/","")
c.save!
end
end
puts "Finished"
end
end
Which allows me to run rake db:substitute_slashes --trace
If I do puts c.permalink after the gsub! I can see it's setting the attribute properly. However the save! doesn't seem to be working because the data is not changed. Can someone spot what the issue may be?
Another thing, I have paperclip installed and this task is triggering [paperclip] Saving attachments. which I would rather avoid.
try this:
namespace :db do
desc "Remove slashes from old-style URLs"
task :substitute_slashes => :environment do
puts "Starting"
contents = Content.all
contents.each do |c|
unless c.permalink.nil?
c.permalink = c.permalink.gsub(/\//,'')
c.save!
end
end
puts "Finished"
end
end
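1.) Change != nil to unless c.permalink.nil? (I don't know if it makes a difference, but I've never used != nil. Judging by your code, you may also want to use .blank?)
2.) I wrote the gsub pattern as a regex (/\//, where the \ escapes the /). gsub also accepts a plain string pattern such as "/", so either form works.
3.) Bang (!) updates the object in place. I think your biggest issue may be that gsub! mutates the string without going through the attribute setter, so the record may not be marked as changed and save! has nothing to write. Assign the result of gsub back to the attribute instead.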
4.) You're also making this very inefficient... You are looking at every record and updating every record. Rails isn't always the best option. Learn SQL and do this in one line:
"UPDATE contents SET permalink = replace(permalink, '/', '');"
If you MUST use Rails:
ActiveRecord::Base.connection.execute "UPDATE contents SET permalink = replace(permalink, '/', '');"
Wow! One query. Amazing! :)
The next thing I would try would be
c.permalink = c.permalink.gsub("/","")
As for saving without callbacks, this stackoverflow page has some suggestions.
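To illustrate the in-place mutation point: a change tracker that relies on the setter never sees a gsub! applied to the value returned by the getter. The sketch below uses a hypothetical plain-Ruby TrackedRecord class as a stand-in (it is not ActiveRecord, and newer Rails versions may detect in-place changes); it only shows why assigning through the setter is the safer form.

```ruby
# Hypothetical stand-in for a model whose change tracking relies on the setter.
# This is NOT ActiveRecord; it only mimics setter-based dirty tracking.
class TrackedRecord
  attr_reader :permalink

  def initialize(permalink)
    @permalink = permalink
    @changed = false
  end

  # Only assignments through the setter mark the record as changed.
  def permalink=(value)
    @changed = true unless value == @permalink
    @permalink = value
  end

  def changed?
    @changed
  end
end

in_place = TrackedRecord.new(+"/old/style/url")
in_place.permalink.gsub!("/", "")   # mutates the string, bypasses the setter
puts in_place.changed?              # => false

assigned = TrackedRecord.new(+"/old/style/url")
assigned.permalink = assigned.permalink.gsub("/", "")  # goes through the setter
puts assigned.changed?              # => true
```

Both records end up holding "oldstyleurl", but only the second one knows it has something to save.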
In my rake task I have:
namespace :example do
desc "this does something"
task :something, [:arg1] => :environment do |t, args|
(some_irrelevant_code)
print 'YES/ NO : '
choice = STDIN.gets.chomp.upcase
case choice
when 'YES'
do_something
break
when 'NO'
break
end
end
end
In my spec I have:
require "spec_helper"
require "rake"
feature "Example" do
before do
load File.expand_path("../../../lib/tasks/example.rake", __FILE__)
Rake::Task.define_task(:environment)
end
scenario "something" do
Rake.application.invoke_task("example:something[rake_args_here]")
end
end
All is working fine, although I am having troubles finding a way to avoid having to type the user input in the console when running the test.
Basically I want the test to run and assume that the user is going to type "YES".
Please let me know if you have a solution for this or point me in the right direction.
Thanks in advance.
If you use STDIN, you're stuck, because that's a constant. Using STDIN directly is not recommended precisely because of this limitation.
If you use $stdin, the global variable equivalent and modern replacement, you can reassign it:
require 'stringio'
$stdin = StringIO.new("fake input")
$stdin.gets.chomp.upcase
# => "FAKE INPUT"
That means you can, for testing purposes, rework $stdin. You'll want to put it back, though, which means you need a wrapper like this:
def with_stdin(input)
prev = $stdin
$stdin = StringIO.new(input)
yield
ensure
$stdin = prev
end
So in practice:
with_stdin("fake input") do
puts $stdin.gets.chomp.upcase
end
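Applied to the YES/NO prompt from the question, the same wrapper lets a test supply the answer without anyone typing in the console. This is a sketch that assumes the task reads from $stdin instead of STDIN; confirmed? is a hypothetical helper standing in for the prompt logic.

```ruby
require 'stringio'

# Temporarily replace $stdin with canned input, restoring it afterwards.
def with_stdin(input)
  prev = $stdin
  $stdin = StringIO.new(input)
  yield
ensure
  $stdin = prev
end

# Hypothetical helper mirroring the task's prompt; reads from $stdin, not STDIN.
def confirmed?
  print 'YES/ NO : '
  $stdin.gets.to_s.chomp.upcase == 'YES'
end

yes_result = with_stdin("yes\n") { confirmed? }
no_result  = with_stdin("no\n")  { confirmed? }
puts
puts "yes -> #{yes_result}, no -> #{no_result}"
```

In the spec, the call to Rake.application.invoke_task would go inside the with_stdin block.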
You can stub the STDIN object like this:
STDIN.stub(gets: 'test')
or, with the newer RSpec syntax:
allow(STDIN).to receive(:gets).and_return('test')
If neither of them works, try:
allow(Kernel).to receive(:gets).and_return('test')
I need to update more than 100k records in the database as efficiently as possible. Here is my code:
namespace :order do
desc "update confirmed at field for Payments::Order"
task set_confirmed_at: :environment do
puts "==> Updating confirmed_at for orders starts ...".blue
Payments::Order.find_each(batch_size: 10000) do |orders|
order_action = orders.actions.where("sender LIKE ?", "%ConfirmJob%").first if orders.actions
if !order_action.blank?
orders.update_attribute(:confirmed_at, order_action.created_at)
puts "order id = #{orders.id} has been updated.".green
end
end
puts "== completed ==".blue
end
end
Here I split the records into batches of 10,000 and update each record based on some conditions. Could anyone suggest a more efficient way to do the same task?
Thank you in advance!
You can try update_all:
Payments::Order.joins(:actions).where(Payments::OrderAction.arel_table[:sender].matches("%ConfirmJob%")).update_all("confirmed_at = actions.created_at")
So your code will look like this:
namespace :order do
desc "update confirmed at field for Payments::Order"
task set_confirmed_at: :environment do
puts "==> Updating confirmed_at for orders starts ...".blue
Payments::Order.joins(:actions).where(Payments::OrderAction.arel_table[:sender].matches("%ConfirmJob%")).update_all("confirmed_at = actions.created_at")
puts "== completed ==".blue
end
end
Update:
I investigated the issue and found that bulk updates with a joined table are a long-standing problem in Rails.
Since the SET part takes the string parameter as is, I suggest adding a FROM clause there.
namespace :order do
desc "update confirmed at field for Payments::Order"
task set_confirmed_at: :environment do
puts "==> Updating confirmed_at for orders starts ...".blue
Payments::Order.joins(:actions).
where(Payments::OrderAction.arel_table[:sender].matches("%ConfirmJob%")).
update_all("confirmed_at = actions.created_at FROM actions")
puts "== completed ==".blue
end
end
You are calling Payments::Order.find_each, so your solution loops over every Payments::Order when you only want the ones whose actions have a sender like '%ConfirmJob%'. I would go with this solution:
Payments::Order
.includes(:actions)
.joins(:actions)
.where("actions.sender LIKE ?", "%ConfirmJob%")
.find_each do |order|
order_action = order.actions.first
order.update!(confirmed_at: order_action.created_at)
end
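For comparison, the whole update can also be written as one SQL statement with a correlated subquery, which avoids loading any records into Ruby. The sketch below only builds the statement as a string. The table names payments_orders and payments_order_actions, the order_id foreign key, and the choice of the earliest matching action are all assumptions, since the question does not show the schema.

```ruby
# Builds a single UPDATE statement that copies the earliest matching action's
# created_at into confirmed_at. All table and column names besides
# confirmed_at, created_at and sender are assumptions about the schema.
def confirmed_at_update_sql(orders: "payments_orders",
                            actions: "payments_order_actions",
                            pattern: "%ConfirmJob%")
  quoted = "'" + pattern.gsub("'", "''") + "'" # naive quoting, fine for a sketch
  matching = "#{actions}.order_id = #{orders}.id AND #{actions}.sender LIKE #{quoted}"
  <<~SQL
    UPDATE #{orders}
    SET confirmed_at = (SELECT MIN(#{actions}.created_at)
                        FROM #{actions}
                        WHERE #{matching})
    WHERE EXISTS (SELECT 1 FROM #{actions} WHERE #{matching})
  SQL
end

sql = confirmed_at_update_sql
puts sql
# Inside the rake task this could be run with:
#   ActiveRecord::Base.connection.execute(sql)
```

The WHERE EXISTS clause keeps orders without a matching action untouched, which mirrors the blank check in the original loop.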
I'm using raw sql bulk updates (for performance reasons) in the context of a rake task. Something like the following:
update_sql = Book.connection.execute("UPDATE books AS b SET
stock = vs.stock,
promotion = vs.promotion,
sales = vs.sales
FROM (values #{values_string}) AS vs
(stock, promotion, sales) WHERE b.id = vs.id;")
While everything is "transparent" in local development, if this SQL fails in production during the execution of the rails task (for example because the promotion column is nil and the statement becomes invalid), no error is logged.
I can manually log this with catching the exception, like below, however some option that would allow for automatic logging would be better.
begin
...
rescue ActiveRecord::StatementInvalid => e
Rails.logger.fatal "Books update: ActiveRecord::StatementInvalid: "+ e.to_s
end
You can make your own custom class in your model folder:
app/models/custom_sql_logger.rb :
class CustomSqlLogger
def self.debug(msg=nil)
@custom_log ||= Logger.new("#{Rails.root}/log/custom_sql.log")
@custom_log.debug(msg) unless msg.nil?
end
end
Then go to the rake task where you would like to log the updated fields, for example lib/tasks/calculate_averages.rake, and call your custom logger:
CustomSqlLogger.debug "The field was successfully updated into DB"
Example from my project:
require 'rake'
task :calculate_averages => :environment do
products = Product.all
products.each do |product|
puts "Calculating average rating for #{product.name}..."
product.update_attribute(:average_rating, product.reviews.average("rating"))
CustomSqlLogger.debug "#{product.name} was successfully updated into DB"
end
end
The custom logger creates a new file, log/custom_sql.log, and writes all messages there. Keep an eye on the size of the log file over time.
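To get closer to the automatic logging the question asks for, the rescue from the question can be moved into one small wrapper that logs any failure and then re-raises it, so the task still fails visibly. This is a minimal sketch: log_failures is a hypothetical helper, and a StringIO-backed Logger is used only to keep the example self-contained (in a Rails task it would more likely be Rails.logger or a file logger).

```ruby
require 'logger'
require 'stringio'

# Runs the block; on any StandardError, logs it with a label and re-raises
# so the caller still sees the failure. `log_failures` is a hypothetical helper.
def log_failures(logger, label)
  yield
rescue StandardError => e
  logger.fatal("#{label}: #{e.class}: #{e.message}")
  raise
end

# A StringIO target keeps this sketch self-contained.
log_output = StringIO.new
logger = Logger.new(log_output)

begin
  log_failures(logger, "Books update") do
    raise ArgumentError, "simulated invalid statement"
  end
rescue ArgumentError
  # Swallowed here only so the sketch can print the captured log line.
end

puts log_output.string
```

Wrapping the Book.connection.execute call in such a block would log an ActiveRecord::StatementInvalid the same way, without repeating the begin/rescue in every task.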
So I stumbled across this: https://github.com/typhoeus/typhoeus
I'm wondering if this is what I need to speed up my rake task:
Event.all.each do |row|
begin
url = urlhere + row.first + row.second
doc = Nokogiri::HTML(open(url))
doc.css('.table__row--event').each do |tablerow|
table = tablerow.css('.table__cell__body--location').css('h4').text
next unless table == row.eventvenuename
tablerow.css('.table__cell__body--availability').each do |button|
buttonurl = button.css('a')[0]['href']
if buttonurl.include? '/checkout/external'
else
row.update(row: buttonurl)
end
end
end
rescue Faraday::ConnectionFailed
puts "connection failed"
next
end
end
I'm wondering whether this would speed it up, or whether it wouldn't because I'm using .each.
If it would, could you provide an example?
Sam
If you set up Typhoeus::Hydra to run parallel requests, you might be able to speed up your code, assuming that the Kernel#open calls are what's slowing you down. Before you optimize, you might want to run benchmarks to validate this assumption.
If it is true, and parallel requests would speed it up, you would need to restructure your code to load events in batches, build a queue of parallel requests for each batch, and then handle them after they execute. Here's some sketch code.
class YourBatchProcessingClass
def initialize(batch_size: 200)
@batch_size = batch_size
@hydra = Typhoeus::Hydra.new(max_concurrency: @batch_size)
end
def perform
# Get an array of records
Event.find_in_batches(batch_size: @batch_size) do |batch|
# Store all the requests so we can access their responses later.
requests = batch.map do |record|
request = Typhoeus::Request.new(your_url_build_logic(record))
@hydra.queue request
request
end
@hydra.run # Run requests in parallel
# Process responses from each request
requests.each do |request|
your_response_processing(request.response.body)
end
end
rescue WhateverError => e
puts e.message
end
private
def your_url_build_logic(event)
# TODO
end
def your_response_processing(response_body)
# TODO
end
end
# Run the service by calling this in your Rake task definition
YourBatchProcessingClass.new.perform
Ruby can be used for pure scripting, but it functions best as an object-oriented language. Decomposing your processing work into clear methods can help clarify your code and help you catch things like Tom Lord mentioned in the comments on your question. Also, instead of wrapping your whole script in a begin..rescue block, you can use method-level rescues as in #perform above, or just wrap @hydra.run.
As a note, .all.each is a memory hog, and is thus considered a bad solution to iterating over records: .all loads all of the records into memory before iterating over them with .each. To save memory, it's better to use .find_each or .find_in_batches, depending on your use case. See: http://api.rubyonrails.org/classes/ActiveRecord/Batches.html
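On the benchmarking suggestion: before restructuring anything, it can help to time the fetch and parse stages separately to see where the time actually goes. The sketch below uses the monotonic clock from Ruby's core library. The timed helper and the fake_fetch / fake_parse lambdas are hypothetical stand-ins for the real open and Nokogiri calls.

```ruby
# Times a block with the monotonic clock and returns [result, seconds].
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

# Hypothetical stand-ins for the real work: fetching a page and parsing it.
fake_fetch = ->(url) { sleep 0.05; "<html>#{url}</html>" }
fake_parse = ->(html) { html.length }

timings = Hash.new(0.0)
%w[a b c].each do |url|
  html, seconds = timed { fake_fetch.call(url) }
  timings[:fetch] += seconds
  _, seconds = timed { fake_parse.call(html) }
  timings[:parse] += seconds
end

timings.each { |stage, seconds| puts format("%-5s %.3fs", stage, seconds) }
```

If the fetch stage dominates, as it does with these stand-ins, parallel requests are likely to help; if parsing or the database updates dominate, they probably will not.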
AllegroAPI is a class in the /models directory that calls an external API. It works as expected when I test it outside of a rake task.
Example working code:
require "./AllegroAPI"
allegro = AllegroAPI.new(login: 'LOGIN',
password: File.read('XXXX.txt'),
webapikey: File.read('XXX.txt')
)
puts allegro.do_search({"search-string"=>"nokia",
"search-price-from"=>300.0,
"search-price-to"=>500.0,
"search-limit"=>50}).to_s
As I've said it works correctly. It calls the API and prints out the result.
File allegro.rb is also in the models directory and it's a file I'm executing by running this task:
namespace :data do
desc "Update auctions table in database"
task update_auctions: :environment do
Allegro.check_for_new_auctions
end
end
allegro.rb:
module Allegro
require 'AllegroAPI'
def self.check_for_new_auctions
allegro = AllegroAPI.new(login: 'LOGIN',
password: File.read('app/models/ignore/XXXX.txt'),
webapikey: File.read('app/models/ignore/XXX.txt')
)
looks = Look.all
looks.each do |l|
hash_to_ask = ActiveSupport::JSON.decode(l[:look_query]).symbolize_keys
hash_to_ask = hash_to_ask.each_with_object({}) do |(k,v), h|
if v.is_number?
h[k.to_s.split('_').join('-')] = v.to_f
else
h[k.to_s.split('_').join('-')] = v
end
end
results = allegro.do_search(hash_to_ask)
#do something with data
end
end
end
The problem is that it doesn't return anything. The results variable is not nil, but it does not hold anything.
When I debug it and call the API from inside the do_search function, the API is called and no error is raised, but the response is empty. AllegroAPI works correctly. There is no problem with the hash_to_ask variable; it's exactly the same hash as in the working example.
EDIT:
I commented out check_for_new_auctions and used "puts", and it works fine when I run the rake task. Then I used exactly the same code that ran properly in the standalone file:
class Allegro
def self.check_for_new_auctions
allegro = AllegroAPI.new(login: 'LOGIN',
password: File.read('app/models/ignore/XXXX.txt'),
webapikey: File.read('app/models/ignore/XXXX.txt')
)
hash_to_ask = {"search-string"=>"nokia",
"search-price-from"=>300.0,
"search-price-to"=>500.0,
"search-limit"=>50}
allegro.do_search(hash_to_ask).to_s
end
end
It did not work ;/ The value returned from allegro.do_search(hash_to_ask) is a hash, not empty and not nil, but when I try to print it, nothing appears.
EDIT:
Everything was working properly; I wasted about 15 hours in total debugging a problem that did not exist. I'm not sure why, but the result could not be printed to the console after converting it to a string, so I tried writing it to a file blindly. What did I find in the text file? Data.
I don't know why it couldn't print everything to the console.
In the IRB script that you show, you have some puts statement that is not in your rake task. So for debugging, I would add puts ... to your Rake task, e.g.:
namespace :data do
desc "Update auctions table in database"
task update_auctions: :environment do
puts "Start Auctions..."
results = Allegro.check_for_new_auctions
puts "Results: #{results}"
end
end
Now, when you run:
rake data:update_auctions
You should get some output. Otherwise rinse-and-repeat by adding puts statements in the method that you are calling.