How to initialize a method in a rake file? - ruby-on-rails

Apologies for the probably noobie question:
I have a rake task that is designed to take data from a site and save it as a RateData object.
rs.each do |market,url|
doc = Nokogiri::HTML(open(url))
doc.xpath("//table/tr").each do |item|
provider = "rs"
market = market
rate = item.xpath('td[1]').text.gsub(/[^0-9\.]/, '')
volume = item.xpath('td[2]').text.gsub(/[^k0-9\.]/, '')
volume = volume.gsub(/\.(?=.k)/, '')
volume = volume.gsub(/k/, '00')
volume = volume.to_f
rate = rate.to_f
RateData.create(:provider => provider, :market => market, :rate => rate, :volume => volume, :bid_ask => 1)
end
end
The RateData.create method is in the rate_data_controller and is accessible when I call it in the rails console. How can I make it available in this rake task?
Many thanks!

You need to make the task depend on :environment, which loads the Rails application (and with it your models) before the task body runs:
task :your_task, [] => :environment do
or with args
task :your_task, [:foo] => :environment do |task, args|
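To see the ordering this relies on, here is a minimal sketch that runs outside Rails. It assumes a stand-in :environment task (Rails defines the real one, which boots the app so that models such as RateData resolve); the task name import_rates_demo and the loaded array are hypothetical.

```ruby
require 'rake'

loaded = []

# Stand-in for the :environment task that Rails defines. In a real app it
# boots the application so models such as RateData can be resolved.
Rake::Task.define_task(:environment) do
  loaded << :environment
end

# Hypothetical task name. Prerequisites run before the task body, so the
# body can rely on whatever :environment loaded.
Rake::Task.define_task(:import_rates_demo, [:provider] => :environment) do |_task, args|
  loaded << "import for #{args[:provider]}"
end

Rake::Task[:import_rates_demo].invoke('rs')
p loaded
```

The prerequisite is recorded first, then the task body with its argument. In a Rakefile the same thing is written with the `task` DSL shown above.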

Related

SQL execution time in Rake tasks

I've various rake tasks inside my rails app. One simple example is shown below.
desc "Simple rake task"
task :test_rake do |task|
first_sql_query = FirstModel.find(10)
SecondModel.create(:name => 'Test 101', :email => 'abc@def.co')
final_query = SecondModel.find(900)
end
The rake task above makes three database calls; suppose they take x, y, and z seconds respectively.
Is there any way to find the total time spent on database operations (x + y + z seconds) for a given rake task?
Use Benchmark:
task :test_rake do |task|
time = Benchmark.realtime {
first_sql_query = FirstModel.find(10)
SecondModel.create(:name => 'Test 101', :email => 'abc@def.co')
final_query = SecondModel.find(900)
}
puts time
end
To get separate benchmarks:
puts Benchmark.measure { FirstModel.find(10) }
puts Benchmark.measure { SecondModel.create(:name => 'Test 101', :email => 'abc@def.co') }
puts Benchmark.measure { final_query = SecondModel.find(900) }
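If you want both the per-call figures and their sum, the separate measurements can be collected and added up. This is a minimal sketch, not a Rails-specific tool: the three lambdas are hypothetical stand-ins for the database calls, and Benchmark.realtime reports wall-clock time for the whole block, so it includes Ruby overhead as well as time spent in the database.

```ruby
require 'benchmark'

# Hypothetical stand-ins for the three database calls in the task.
steps = {
  first_find:  -> { sleep 0.01 },
  create:      -> { sleep 0.01 },
  second_find: -> { sleep 0.01 }
}

# Time each step on its own, then add the results up.
timings = steps.transform_values { |work| Benchmark.realtime(&work) }
total = timings.values.sum

timings.each { |name, secs| puts format('%-12s %.4fs', name, secs) }
puts format('%-12s %.4fs', 'total', total)
```

In the real task, each lambda body would be replaced by the corresponding model call.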
Query timing is included in the log file:
❯❯❯ rake db:migrate:status
database: ml_development
Status Migration ID Migration Name
--------------------------------------------------
up 20180612055823 ********** NO FILE **********
then:
❯❯❯ cat log/development.log
[DEBUG] (0.4ms) SELECT "schema_migrations"."version" FROM "schema_migrations" ORDER BY "schema_migrations"."version" ASC
@pattu Rails internally uses Benchmark to check execution time; see say_with_time.
You can add the same functions to a module and include it in the rake task to find the execution time of SQL queries.
These two functions are needed:
def say(message, subitem = false)
puts "#{subitem ? " ->" : "--"} #{message}"
end
def say_with_time(message = "")
say(message)
result = nil
time = Benchmark.measure { result = yield }
say "%.4fs" % time.real, :subitem
say("#{result} rows", :subitem) if result.is_a?(Integer)
result
end
Use it in your rake task as follows:
desc "Simple rake task"
task :test_rake do |task|
say_with_time do
first_sql_query = FirstModel.find(10)
SecondModel.create(:name => 'Test 101', :email => 'abc@def.co')
final_query = SecondModel.find(900)
end
end

How to speed up sitemap_generator with parallel gem

I am trying to speed up sitemap_generator by adding parallelization via the parallel gem. I have the following code but my groups aren't getting written to the public/sitemaps directory. I am thinking it's due to lambdas getting executed in a different space in parallel. Any feedback would be helpful. Thanks!
#!/usr/bin/env ruby
require 'rubygems'
require 'sitemap_generator'
require 'benchmark'
require 'parallel'
require 'random-word'
SitemapGenerator::Sitemap.default_host = "http://localhost"
a = lambda {
SitemapGenerator::Sitemap.group(:filename => :biz, :sitemaps_path => 'sitemaps/biz/') do
(1..1000).each do |index|
url = "/#{RandomWord.adjs.next}/#{RandomWord.nouns.next}"
add url, :priority => 0.8
end
end
}
b = lambda {
SitemapGenerator::Sitemap.group(:filename => :wedding_ugc, :sitemaps_path => 'sitemaps/ugc') do
(1..1000).each do |index|
url = "/#{RandomWord.adjs.next}/#{RandomWord.nouns.next}"
add url, :priority => 0.8
end
end
}
#working example
# SitemapGenerator::Sitemap.default_host = "http://localhost"
# SitemapGenerator::Sitemap.create(:compress => false) do
# group(:filename => :biz, :sitemaps_path => 'sitemaps/biz/') do
# (1..1000).each do |index|
# url = "/#{RandomWord.adjs.next}/#{RandomWord.nouns.next}"
# add url, :priority => 0.8
# end
# end
# end
puts Time.now
Parallel.each([a,b]){|job| job.call()}
puts Time.now
I got this working and posted the solution on GitHub.
Here is the code in case the URL breaks.
SitemapGenerator::Sitemap.create(:compress => false, :create_index => false) do
group1 = lambda {
group = sitemap.group(:filename => :group1, :sitemaps_path => 'sitemaps/group1') do
Record.find_each do |record|
add '/record/path'
end
end
group.sitemap.write unless group.sitemap.written? #write if not full
}
# group2 like above...
Parallel.each([group1, group2], :in_processes => 8) do |group|
group.call
end
end
#regenerate the index sitemap xml file because I couldn't figure out how to track it with multiple processes
SitemapGenerator::Sitemap.create(:compress => false) do
Dir.chdir(sitemap.public_path.to_s)
xml_files = File.join("**", "sitemaps", "**", "*.xml")
xml_file_paths = Dir.glob(xml_files)
xml_file_paths.each do |file|
add file
end
end
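One way to see why the index had to be rebuilt from the files: with :in_processes, each job runs in a forked child process, so anything it changes in memory stays in that child, while files it writes are visible to the parent. The sketch below shows this with plain fork instead of the parallel gem (an assumption made to keep it dependency-free; it needs a platform that supports fork). The names entries and written are hypothetical.

```ruby
require 'tmpdir'

entries = []   # in-memory state held by the parent, like a sitemap index
written = []

Dir.mktmpdir do |dir|
  pids = %w[group1 group2].map do |name|
    fork do
      entries << name   # changes only the child's copy of the array
      File.write(File.join(dir, "#{name}.xml"), '<urlset/>')
      exit!(0)          # leave without running at_exit hooks inherited from the parent
    end
  end
  pids.each { |pid| Process.wait(pid) }

  # Files on disk are shared, so the parent can rebuild its list from them.
  written = Dir.glob(File.join(dir, '*.xml')).map { |f| File.basename(f) }.sort
end

p entries
p written
```

The parent's entries array stays empty even though both children appended to their copies, which is why the second create block above globs the written XML files to build the index.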

Relationships created by a rake task are not persisted though the rails server

I'm working on my first project using Neo4j. I'm parsing Wikipedia's page and pagelinks dumps to create a graph where the nodes are pages and the edges are links.
I've defined some rake tasks that download the dumps, parse the data, and save it in a Neo4j database. At the end of the rake task I print the number of pages and links created, and some of the pages with the most links. Here is the output of the rake task for the zawiki.
$ rake wiki[zawiki]
[ omitted ]
...
:: Done parsing zawiki
:: 1984 pages
:: 2144 links
:: The pages with the most links are:
9625.0 - Emijrp/List_of_Wikipedians_by_number_of_edits_(bots_included): 40
1363.0 - Gvangjsih_Bouxcuengh_Swcigih: 30
9112.0 - Fuzsuih: 27
1367.0 - Cungzcoj: 26
9279.0 - Vangz_Yenfanh: 19
It looks like pages and links are being created, but when I start a Rails console or the server, the links aren't found.
$ rails c
jruby-1.7.5 :013 > Pages.all.count
=> 1984
jruby-1.7.5 :003 > Pages.all.reduce(0) { |count, page| count + page.links.count}
=> 0
jruby-1.7.5 :012 > Pages.all.sort_by { |p| p.links.count }.reverse[0...5].map { |p| p.links.count }
=> [0, 0, 0, 0, 0]
Here is the rake task; the project is also on GitHub. Can anyone tell me why the links aren't saved?
DUMP_DIR = Rails.root.join('lib','assets')
desc "Download wiki dumps and parse them"
task :wiki, [:wiki] => 'wiki:all'
namespace :wiki do
task :all, [:wiki] => [:get, :parse] do |t, args|
# Print info about the newly created pages and links.
link_count = 0
Pages.all.each do |page|
link_count += page.links.count
end
indent "Done parsing #{args[:wiki]}"
indent "#{Pages.count} pages"
indent "#{link_count} links"
indent "The pages with the most links are:"
Pages.all.sort_by { |a| a.links.count }.reverse[0...5].each do |page|
puts "#{page.page_id} - #{page.title}: #{page.links.count}"
end
end
desc "Download wiki page and page links database dumps to /lib/assets"
task :get, :wiki do |t, args|
indent "Downloading dumps"
sh "#{Rails.root.join('lib', "get_wiki").to_s} #{args[:wiki]}"
indent "Done"
end
desc "Parse all dumps"
task :parse, [:wiki] => 'parse:all'
namespace :parse do
task :all, [:wiki] => [:pages, :pagelinks]
desc "Read wiki page dumps from lib/assets into the database"
task :pages, [:wiki] => :environment do |t, args|
parse_dumps('page', args[:wiki]) do |obj|
page = Pages.create_from_dump(obj)
end
indent "Created #{Pages.count} pages"
end
desc "Read wiki pagelink dumps from lib/assets into the database"
task :pagelinks, [:wiki] => :environment do |t, args|
errors = 0
parse_dumps('pagelinks', args[:wiki]) do |from_id, namespace, to_title|
from = Pages.find(:page_id => from_id)
to = Pages.find(:title => to_title)
if to.nil? || from.nil?
errors = errors.succ
else
from.links << to
from.save
end
end
end
end
end
def indent *args
print ":: "
puts args
end
def parse_dumps(dump, wiki_match, &block)
wiki_match ||= /\w+/
DUMP_DIR.entries.each do |file|
file, wiki = *(file.to_s.match(Regexp.new "(#{wiki_match})-#{dump}.sql"))
if file
indent "Parsing #{wiki} #{dump.pluralize} from #{file}"
each_value(DUMP_DIR.join(file), &block)
end
end
end
def each_value(filename)
f = File.open(filename)
num_read = 0
begin # read file until line starting with INSERT INTO
line = f.gets
end until line.match /^INSERT INTO/
begin
line = line.match(/\(.*\)[,;]/)[0] # ignore beginning of line until (...) object
begin
yield line[1..-3].split(',').map { |e| e.match(/^['"].*['"]$/) ? e[1..-2] : e.to_f }
num_read = num_read.succ
line = f.gets.chomp
end while(line[0] == '(') # until next insert block, or end of file
end while line.match /^INSERT INTO/ # Until line doesn't start with (...
f.close
end
app/models/pages.rb
class Pages < Neo4j::Rails::Model
include Neo4j::NodeMixin
has_n(:links).to(Pages)
property :page_id
property :namespace, :type => Fixnum
property :title, :type => String
property :restrictions, :type => String
property :counter, :type => Fixnum
property :is_redirect, :type => Fixnum
property :is_new, :type => Fixnum
property :random, :type => Float
property :touched, :type => String
property :latest, :type => Fixnum
property :length, :type => Fixnum
property :no_title_convert, :type => Fixnum
def self.create_from_dump(obj)
# TODO: I wonder if there is a way to combine these calls
page = {}
# order of this array is important, it corresponds to the data in obj
attrs = [:page_id, :namespace, :title, :restrictions, :counter, :is_redirect,
:is_new, :random, :touched, :latest, :length, :no_title_convert]
attrs.each_index { |i| page[attrs[i]] = obj[i] }
page = Pages.create(page)
return page
end
end
I must admit that I have no idea how Neo4j works.
Going by experience with other databases, though, I'd guess that either a validation is failing or something is misconfigured in how you use the database. I can't advise on where to look for the latter, but if it is a validation problem, you can inspect Page#errors or try calling Page#save! and see what it raises.
One crazy idea that came to mind while looking at this example: for that relation to be configured properly, you may need a back reference too.
Maybe has_n(:links).to(Page, :links) will help you. Or, if that doesn't work:
has_n(:links_left).to(Page, :links_right)
has_n(:links_right).from(Page, :links_left)
The more I look at this, the more I think the back reference to the same table is not configured properly and thus won't validate.

Rake task w/ splat arguments

I'm attempting to create a rake task that takes a required first argument, and then any number of additional arguments which I want to lump together into an array:
rake course["COURSE NAME", 123, 456, 789]
I've tried the following but args[:numbers] is simply a string w/ 123 instead of all of the numbers.
task :course, [:name, *:numbers] => :environment do |t, args|
puts args # {:name=>"COURSE NAME", :numbers=>"123"}
end
Starting with rake 10.1.0 you can use Rake::TaskArguments#extras:
task :environment
task :course, [:name] => :environment do |t, args|
name = args[:name]
numbers = args.extras
puts "name = #{name}"
puts "numbers = #{numbers.join ','}"
end
Output:
$ rake "course[COURSE NAME, 123, 456, 789]"
name = COURSE NAME
numbers = 123,456,789
For rake < 10.1.0 you could create a sufficiently large argument list.
Here's a workaround for up to 26 numbers:
task :course, [:name, *:a..:z] => :environment do |t, args|
name = args[:name]
numbers = args.values_at(*:a..:z).compact
puts "name = #{name}"
puts "numbers = #{numbers.join ','}"
end
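Note that rake passes every argument as a string, so the extras need converting if they are meant to be numbers. A minimal sketch, using a hypothetical task name course_demo with no :environment dependency, invoked programmatically rather than from the shell:

```ruby
require 'rake'

captured = {}

# Only :name is declared; everything after it ends up in args.extras.
Rake::Task.define_task(:course_demo, [:name]) do |_task, args|
  captured[:name] = args[:name]
  # Integer() raises on non-numeric input instead of silently returning 0.
  captured[:numbers] = args.extras.map { |n| Integer(n) }
end

Rake::Task[:course_demo].invoke('COURSE NAME', '123', '456', '789')
p captured
```

Using Integer() rather than to_i is a design choice: a typo in the argument list fails loudly instead of turning into 0.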

Inputting scraped data into database

Heyo,
So I built a working scraper and added the file to my app. I am now trying to take the information in the scraper and place it in my database. I am attempting to use the find_or_create method but I keep getting the following error.
breads_scraper.rb:49:in `block in summary': uninitialized constant Scraper::Bread (NameError)
from /Users/Cameron/.rvm/gems/ruby-1.9.3-p392/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:239:in `block in each'
from /Users/Cameron/.rvm/gems/ruby-1.9.3-p392/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:238:in `upto'
from /Users/Cameron/.rvm/gems/ruby-1.9.3-p392/gems/nokogiri-1.5.9/lib/nokogiri/xml/node_set.rb:238:in `each'
from breads_scraper.rb:24:in `map'
from breads_scraper.rb:24:in `summary'
from breads_scraper.rb:57:in `<class:Scraper>'
from breads_scraper.rb:9:in `<main>'
My code looks like the following. My theory is that I am using find_or_create incorrectly, or the file doesn't know how to reach the bread method and controller.
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'uri'
require 'json'
url = Nokogiri::HTML(open("http://en.wikipedia.org/wiki/List_of_breads"))
class Scraper
def initialize
@url = "http://en.wikipedia.org/wiki/List_of_breads"
@nodes = Nokogiri::HTML(open(@url))
end
def summary
bread_data = @nodes
breads = bread_data.css('div.mw-content-ltr table.wikitable tr')
bread_data.search('sup').remove
bread_hashes = breads.map {|x|
if content = x.css('td')[0]
name = content.text
end
if content = x.css('td a.image').map {|link| link['href']}
image = content[0]
end
if content = x.css('td')[2]
type = content.text
end
if content = x.css('td')[3]
country = content.text
end
if content = x.css('td')[4]
description = content.text
end
{
:name => name,
:image => image,
:type => type,
:country => country,
:description => description,
}
Bread.find_or_create(:title => name, :description => description, :image_url => image, :country_origin => country, :type => type)
}
end
bready = Scraper.new
bready.summary
puts "atta boy"
end
Thanks!
Invoke the scraper from a rake task.
lib/tasks/scraper.rake
namespace :app do
desc "Scrape breads"
task :scrape_breads => :environment do
Scraper.new.summary
end
end
Now, you can run the rake task as follows:
rake app:scrape_breads
It looks like the Bread class is not loaded.
