Rake task w/ splat arguments - ruby-on-rails

I'm attempting to create a rake task that takes a required first argument, and then any number of additional arguments which I want to lump together into an array:
rake course["COURSE NAME", 123, 456, 789]
I've tried the following, but args[:numbers] is just the string "123" instead of all of the numbers.
task :course, [:name, *:numbers] => :environment do |t, args|
  puts args # {:name=>"COURSE NAME", :numbers=>"123"}
end

Starting with rake 10.1.0 you can use Rake::TaskArguments#extras:
task :environment
task :course, [:name] => :environment do |t, args|
  name = args[:name]
  numbers = args.extras
  puts "name = #{name}"
  puts "numbers = #{numbers.join ','}"
end
Output:
$ rake "course[COURSE NAME, 123, 456, 789]"
name = COURSE NAME
numbers = 123,456,789
For rake < 10.1.0 you could create a sufficiently large argument list.
Here's a workaround for up to 26 numbers:
task :course, [:name, *:a..:z] => :environment do |t, args|
  name = args[:name]
  numbers = args.values_at(*:a..:z).compact
  puts "name = #{name}"
  puts "numbers = #{numbers.join ','}"
end
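To see what extras returns without running rake from the shell, the argument object can be built by hand. This is a minimal sketch, assuming rake 10.1.0 or later. The conversion with Integer() is an addition for illustration, since rake hands every argument to the task as a string.

```ruby
require 'rake'

# Build the same kind of argument object rake passes to the task block for
#   rake "course[COURSE NAME, 123, 456, 789]"
# Only :name is declared, so the remaining values end up in #extras.
args = Rake::TaskArguments.new([:name], ["COURSE NAME", "123", "456", "789"])

name = args[:name]
# Arguments arrive as strings; Integer() raises ArgumentError on bad input.
numbers = args.extras.map { |n| Integer(n) }

puts "name = #{name}"
puts "numbers = #{numbers.inspect}"
```

Converting inside the task keeps the command line simple and fails loudly if a non-numeric value slips in.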

Related

SQL execution time in Rake tasks

I've various rake tasks inside my rails app. One simple example is shown below.
desc "Simple rake task"
task :test_rake do |task|
  first_sql_query = FirstModel.find(10)
  SecondModel.create(:name => 'Test 101', :email => 'abc@def.co')
  final_query = SecondModel.find(900)
end
In the rake task above, we're making three database calls, which take x, y, and z seconds respectively.
Is there any way to find out the total time spent on DB operations (x + y + z seconds) for a given rake task?
Use Benchmark:
require 'benchmark'
task :test_rake do |task|
  # Wall-clock time for all three calls together
  time = Benchmark.realtime do
    first_sql_query = FirstModel.find(10)
    SecondModel.create(:name => 'Test 101', :email => 'abc@def.co')
    final_query = SecondModel.find(900)
  end
  puts time
end
To get separate benchmarks:
puts Benchmark.measure { FirstModel.find(10) }
puts Benchmark.measure { SecondModel.create(:name => 'Test 101', :email => 'abc@def.co') }
puts Benchmark.measure { final_query = SecondModel.find(900) }
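If the calls are scattered through the task rather than sitting in one block, the individual timings can be added up instead. Below is a minimal sketch of that idea. BlockTimer is a hypothetical helper, not part of Rails or Benchmark, and sleep stands in for the model calls so the snippet runs on its own. Note that it measures everything inside each block (Ruby overhead included), not only the SQL itself.

```ruby
require 'benchmark'

# Hypothetical helper: adds up the wall-clock time of every block passed to #time.
class BlockTimer
  attr_reader :total

  def initialize
    @total = 0.0
  end

  # Runs the block, adds its elapsed time to the total, returns the block's value.
  def time
    result = nil
    @total += Benchmark.realtime { result = yield }
    result
  end
end

timer = BlockTimer.new
# sleep stands in for FirstModel.find, SecondModel.create and SecondModel.find.
first  = timer.time { sleep 0.01; :first_record }
second = timer.time { sleep 0.01; :created_record }
third  = timer.time { sleep 0.01; :last_record }

puts format("total: %.4fs", timer.total)
```

In a real task the three blocks would wrap the actual model calls, and timer.total would approximate x + y + z.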
Query timing is included in the log file:
❯❯❯ rake db:migrate:status
database: ml_development
Status Migration ID Migration Name
--------------------------------------------------
up 20180612055823 ********** NO FILE **********
then:
❯❯❯ cat log/development.log
[DEBUG] (0.4ms) SELECT "schema_migrations"."version" FROM "schema_migrations" ORDER BY "schema_migrations"."version" ASC
@pattu Rails internally uses Benchmark to check execution time; see say_with_time.
You can add the same function to a module and include it in your rake task to find the execution time of SQL queries.
These two functions are needed:
def say(message, subitem = false)
  puts "#{subitem ? " ->" : "--"} #{message}"
end
def say_with_time(message = "")
  say(message)
  result = nil
  time = Benchmark.measure { result = yield }
  say "%.4fs" % time.real, :subitem
  say("#{result} rows", :subitem) if result.is_a?(Integer)
  result
end
Use this in your rake task like so:
desc "Simple rake task"
task :test_rake do |task|
  say_with_time do
    first_sql_query = FirstModel.find(10)
    SecondModel.create(:name => 'Test 101', :email => 'abc@def.co')
    final_query = SecondModel.find(900)
  end
end

Rake task not saving or creating new record in database

I've created a ruby script that executes fine if I run it from Console.
The script fetches some information from various websites and saves it to my database table.
However, when I want to turn the code into a rake task, the code still runs, but it does not save any new records. I don't get any errors from the rake either.
# Add your own tasks in files placed in lib/tasks ending in .rake,
# for example lib/tasks/capistrano.rake, and they will automatically be available to Rake.
require File.expand_path('../config/application', __FILE__)
Rails.application.load_tasks
require './crawler2.rb'
task :default => [:crawler]
task :crawler do
### ###
require 'rubygems'
require 'nokogiri'
require 'open-uri'
start = Time.now
$a = 0
sites = ["http://www.nytimes.com","http://www.news.com"]
for $a in 0..sites.size-1
url = sites[$a]
$i = 75
$error = 0
avoid_these_links = ["/tv", "//www.facebook.com/"]
doc = Nokogiri::HTML(open(url))
links = doc.css("a")
hrefs = links.map {|link| link.attribute('href').to_s}.uniq.sort.delete_if {|href| href.empty?}.delete_if {|href| avoid_these_links.any? { |w| href =~ /#{w}/ }}.delete_if {|href| href.size < 10 }
#puts hrefs.length
#puts hrefs
for $i in 0..hrefs.length
begin
#puts hrefs[60] #for debugging)
#file = open(url)
#doc = Nokogiri::HTML(file) do
if hrefs[$i].downcase().include? "http://"
doc = Nokogiri::HTML(open(hrefs[$i]))
else
doc = Nokogiri::HTML(open(url+hrefs[$i]))
end
image = doc.at('meta[property="og:image"]')['content']
title = doc.at('meta[property="og:title"]')['content']
article_url = doc.at('meta[property="og:url"]')['content']
description = doc.at('meta[property="og:description"]')['content']
category = doc.at('meta[name="keywords"]')['content']
newspaper_id = 1
puts "\n"
puts $i
#puts "Image: " + image
#puts "Title: " + title
#puts "Url: " + article_url
#puts "Description: " + description
puts "Category: " + category
Article.create({
:headline => title,
:caption => description,
:thumbnail_url => image,
:category_id => 3,
:status => true,
:journalist_id => 2,
:newspaper_id => newspaper_id,
:from_crawler => true,
:description => description,
:original_url => article_url}) unless Article.exists?(original_url: article_url)
$i +=1
#puts $i #for debugging
rescue
#puts "Error here: " + url+hrefs[$i] if $i < hrefs.length
$i +=1 # do_something_* again, with the next i
$error +=1
end
end
puts "Page: " + url
puts "Articles: " + hrefs.length.to_s
puts "Errors: " + $error.to_s
$a +=1
end
finish = Time.now
diff = ((finish - start)/60).to_s
puts diff + " Minutes"
### ###
end
The code executes fine if I save it to a file and load it in the console with load './crawler2.rb'. When I use the exact same code in a rake task, I get no new records.
I figured out what was wrong.
I needed to remove:
require './crawler2.rb'
task :default => [:crawler]
and instead make the task depend on :environment, so that the Rails app and its models are loaded:
task :crawler => :environment do
Now the crawler runs every ten minutes with a bit of help from Heroku scheduler :-)
Thanks for the help guys - and sorry for the bad formatting. Hope this answer may help others.
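For anyone wondering what the => :environment part changes: it declares a prerequisite, so rake runs that task before the body of :crawler. A minimal sketch in plain rake, outside Rails, follows. The :environment task here is only a stand-in for the one Rails defines, and the order array exists purely to record what ran.

```ruby
require 'rake'
extend Rake::DSL # makes `task` available when this runs as a plain script

order = []

# Stand-in for the :environment task that Rails defines to load the app and its models.
task(:environment) { order << :environment }

# Declaring the prerequisite makes rake run :environment before the task body.
task(:crawler => :environment) { order << :crawler }

Rake::Task[:crawler].invoke
puts order.inspect
```

Without the prerequisite, the task body runs in a bare Ruby process where the application and its models have not been loaded.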

Relationships created by a rake task are not persisted though the rails server

I'm working on my first project using Neo4j. I'm parsing Wikipedia's page and pagelinks dumps to create a graph where the nodes are pages and the edges are links.
I've defined some rake tasks that download the dumps, parse the data, and save it in a Neo4j database. At the end of the rake task I print the number of pages and links created, and some of the pages with the most links. Here is the output of the rake task for the zawiki.
$ rake wiki[zawiki]
[ omitted ]
...
:: Done parsing zawiki
:: 1984 pages
:: 2144 links
:: The pages with the most links are:
9625.0 - Emijrp/List_of_Wikipedians_by_number_of_edits_(bots_included): 40
1363.0 - Gvangjsih_Bouxcuengh_Swcigih: 30
9112.0 - Fuzsuih: 27
1367.0 - Cungzcoj: 26
9279.0 - Vangz_Yenfanh: 19
It looks like pages and links are being created, but when I start a rails console or the server, the links aren't found.
$ rails c
jruby-1.7.5 :013 > Pages.all.count
=> 1984
jruby-1.7.5 :003 > Pages.all.reduce(0) { |count, page| count + page.links.count}
=> 0
jruby-1.7.5 :012 > Pages.all.sort_by { |p| p.links.count }.reverse[0...5].map { |p| p.links.count }
=> [0, 0, 0, 0, 0]
Here is the rake task, and this is the project's GitHub page. Can anyone tell me why the links aren't saved?
DUMP_DIR = Rails.root.join('lib','assets')
desc "Download wiki dumps and parse them"
task :wiki, [:wiki] => 'wiki:all'
namespace :wiki do
task :all, [:wiki] => [:get, :parse] do |t, args|
# Print info about the newly created pages and links.
link_count = 0
Pages.all.each do |page|
link_count += page.links.count
end
indent "Done parsing #{args[:wiki]}"
indent "#{Pages.count} pages"
indent "#{link_count} links"
indent "The pages with the most links are:"
Pages.all.sort_by { |a| a.links.count }.reverse[0...5].each do |page|
puts "#{page.page_id} - #{page.title}: #{page.links.count}"
end
end
desc "Download wiki page and page links database dumps to /lib/assets"
task :get, :wiki do |t, args|
indent "Downloading dumps"
sh "#{Rails.root.join('lib', "get_wiki").to_s} #{args[:wiki]}"
indent "Done"
end
desc "Parse all dumps"
task :parse, [:wiki] => 'parse:all'
namespace :parse do
task :all, [:wiki] => [:pages, :pagelinks]
desc "Read wiki page dumps from lib/assets into the database"
task :pages, [:wiki] => :environment do |t, args|
parse_dumps('page', args[:wiki]) do |obj|
page = Pages.create_from_dump(obj)
end
indent "Created #{Pages.count} pages"
end
desc "Read wiki pagelink dumps from lib/assets into the database"
task :pagelinks, [:wiki] => :environment do |t, args|
errors = 0
parse_dumps('pagelinks', args[:wiki]) do |from_id, namespace, to_title|
from = Pages.find(:page_id => from_id)
to = Pages.find(:title => to_title)
if to.nil? || from.nil?
errors = errors.succ
else
from.links << to
from.save
end
end
end
end
end
def indent *args
print ":: "
puts args
end
def parse_dumps(dump, wiki_match, &block)
wiki_match ||= /\w+/
DUMP_DIR.entries.each do |file|
file, wiki = *(file.to_s.match(Regexp.new "(#{wiki_match})-#{dump}.sql"))
if file
indent "Parsing #{wiki} #{dump.pluralize} from #{file}"
each_value(DUMP_DIR.join(file), &block)
end
end
end
def each_value(filename)
f = File.open(filename)
num_read = 0
begin # read file until line starting with INSERT INTO
line = f.gets
end until line.match /^INSERT INTO/
begin
line = line.match(/\(.*\)[,;]/)[0] # ignore beginning of line until (...) object
begin
yield line[1..-3].split(',').map { |e| e.match(/^['"].*['"]$/) ? e[1..-2] : e.to_f }
num_read = num_read.succ
line = f.gets.chomp
end while(line[0] == '(') # until next insert block, or end of file
end while line.match /^INSERT INTO/ # Until line doesn't start with (...
f.close
end
app/models/pages.rb
class Pages < Neo4j::Rails::Model
include Neo4j::NodeMixin
has_n(:links).to(Pages)
property :page_id
property :namespace, :type => Fixnum
property :title, :type => String
property :restrictions, :type => String
property :counter, :type => Fixnum
property :is_redirect, :type => Fixnum
property :is_new, :type => Fixnum
property :random, :type => Float
property :touched, :type => String
property :latest, :type => Fixnum
property :length, :type => Fixnum
property :no_title_convert, :type => Fixnum
def self.create_from_dump(obj)
# TODO: I wonder if there is a way to combine these calls
page = {}
# order of this array is important, it corresponds to the data in obj
attrs = [:page_id, :namespace, :title, :restrictions, :counter, :is_redirect,
:is_new, :random, :touched, :latest, :length, :no_title_convert]
attrs.each_index { |i| page[attrs[i]] = obj[i] }
page = Pages.create(page)
return page
end
end
I must admit that I have no idea how Neo4j works.
Going by experience with other databases, though, I'd guess that either a validation is failing or something is misconfigured in how you use the database. I can't advise on where to look for the latter, but if it's a validation problem, you can inspect Pages#errors or try calling Pages#save! and see what it raises.
One crazy idea that came to mind looking at this example: maybe the relation needs a back reference, too, to be configured properly.
Maybe has_n(:links).to(Pages, :links) will help you. Or, if that doesn't work:
has_n(:links_left).to(Pages, :links_right)
has_n(:links_right).from(Pages, :links_left)
The more I look at this, the more I think the back reference to the same class is not configured properly and thus won't validate.

How to initialize a method in a rake file?

Apologies for the probably noobie question:
I have a rake task that is designed to take data from a site and save it as a RateData object.
rs.each do |market,url|
doc = Nokogiri::HTML(open(url))
doc.xpath("//table/tr").each do |item|
provider = "rs"
market = market
rate = item.xpath('td[1]').text.gsub!(/[^0-9\.]/, '')
volume = item.xpath('td[2]').text.gsub(/[^k0-9\.]/, '')
volume = volume.gsub(/\.(?=.k)/, '')
volume = volume.gsub(/k/, '00')
volume = volume.to_f
rate = rate.to_f
RateData.create(:provider => provider, :market => market, :rate => rate, :volume => volume, :bid_ask => 1)
end
end
The RateData.create method is in the rate_data_controller and is accessible when I call it in the rails console. How can I make it available in this rake task?
Many thanks!
You need to make the task depend on :environment, so that your Rails models are loaded:
task :your_task, [] => :environment do
or, with arguments:
task :your_task, [:foo] => :environment do |task, args|
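A minimal sketch of the second form in plain rake, showing how the argument reaches the block. The :environment task below is a stand-in for the one Rails provides, the seen hash only records what happened, and the default value is an illustrative assumption.

```ruby
require 'rake'
extend Rake::DSL # makes `task` available when this runs as a plain script

seen = {}

# Stand-in for the Rails :environment task.
task(:environment) { seen[:environment] = true }

task :your_task, [:foo] => :environment do |task, args|
  # Fall back to a default when the argument is omitted on the command line.
  args.with_defaults(:foo => "default")
  seen[:foo] = args[:foo]
end

# Equivalent to running: rake "your_task[bar]"
Rake::Task[:your_task].invoke("bar")
puts seen.inspect
```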

how to import data into rails?

I have a Rails 3 application with a User class, and a tab-delimited file of users that I want to import.
How do I get access to the Active Record model outside the rails console, so that I can write a script to do
require "???active-record???"
File.open("users.txt", "r").each do |line|
name, age, profession = line.strip.split("\t")
u = User.new(:name => name, :age => age, :profession => profession)
u.save
end
Do I use the "ar-extensions" gem, or is there another way? (I don't particularly care about speed right now, I just want something simple.)
You can write a rake task to do so.
Add this to a my_rakes.rake file in your app's lib/tasks folder:
desc "Import users."
task :import_users => :environment do
  File.open("users.txt", "r").each do |line|
    name, age, profession = line.strip.split("\t")
    u = User.new(:name => name, :age => age, :profession => profession)
    u.save
  end
end
And then call $ rake import_users from the root folder of your app in Terminal.
Use the activerecord-import gem for bulk importing.
Install via your Gemfile:
gem 'activerecord-import'
Collect your users and import:
desc "Import users."
task :import_users => :environment do
  users = File.open("users.txt", "r").map do |line|
    name, age, profession = line.strip.split("\t")
    User.new(:name => name, :age => age, :profession => profession)
  end
  User.import users
end
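Either way, the line parsing can be tried on its own before any records are written. This is a minimal sketch under assumptions: parse_user_line is a hypothetical helper and the sample data is made up. It uses chomp rather than strip so that a leading or trailing tab marking an empty field is not lost.

```ruby
# Hypothetical helper: turns one tab-delimited line into the attribute hash
# that would be passed to User.new.
def parse_user_line(line)
  # chomp removes only the line ending; the -1 limit keeps trailing empty fields.
  name, age, profession = line.chomp.split("\t", -1)
  { :name => name, :age => Integer(age), :profession => profession }
end

# Made-up sample standing in for the contents of users.txt.
sample = "Alice\t34\tEngineer\nBob\t41\tTeacher\n"
rows = sample.each_line.map { |line| parse_user_line(line) }

puts rows.inspect
```

Inside the rake task, each hash could then be handed to User.new in place of the inline split.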
