How to speed up sitemap_generator with parallel gem

How to speed up sitemap_generator with parallel gem - ruby-on-rails

I am trying to speed up sitemap_generator by adding parallelization via the parallel gem. I have the following code but my groups aren't getting written to the public/sitemaps directory. I am thinking it's due to lambdas getting executed in a different space in parallel. Any feedback would be helpful. Thanks!
#!/usr/bin/env ruby
require 'rubygems'
require 'sitemap_generator'
require 'benchmark'
require 'parallel'
require 'random-word'
SitemapGenerator::Sitemap.default_host = "http://localhost"
a = lambda {
SitemapGenerator::Sitemap.group(:filename => :biz, :sitemaps_path => 'sitemaps/biz/') do
(1..1000).each do |index|
url = "/#{RandomWord.adjs.next}/#{RandomWord.nouns.next}"
add url, :priority => 0.8
end
end
}
b = lambda {
SitemapGenerator::Sitemap.group(:filename => :wedding_ugc, :sitemaps_path => 'sitemaps/ugc') do
(1..1000).each do |index|
url = "/#{RandomWord.adjs.next}/#{RandomWord.nouns.next}"
add url, :priority => 0.8
end
end
}
#working example
# SitemapGenerator::Sitemap.default_host = "http://localhost"
# SitemapGenerator::Sitemap.create(:compress => false) do
# group(:filename => :biz, :sitemaps_path => 'sitemaps/biz/') do
# (1..1000).each do |index|
# url = "/#{RandomWord.adjs.next}/#{RandomWord.nouns.next}"
# add url, :priority => 0.8
# end
# end
# end
puts Time.now
Parallel.each([a,b]){|job| job.call()}
puts Time.now

I got this working and posted the solution on github here
Here is the code incase the url gets broken.
SitemapGenerator::Sitemap.create(:compress => false, :create_index => false) do
group1 = lambda {
group = sitemap.group(:filename => :group1, :sitemaps_path => 'sitemaps/group1') do
Record.find_each do |record|
add '/record/path'
end
end
group.sitemap.write unless group.sitemap.written? #write if not full
}
# group2 like above...
Parallel.each([group1, group2], :in_processes => 8) do |group|
group.call
end
end
#regenerate the index sitemap xml file because I couldn't figure out how to track it with multiple processes
SitemapGenerator::Sitemap.create(:compress => false) do
Dir.chdir(sitemap.public_path.to_s)
xml_files = File.join("**", "sitemaps", "**", "*.xml")
xml_file_paths = Dir.glob(xml_files)
xml_file_paths.each do |file|
add file
end
end

Related

What could cause performance issue(s) when using roadie-rails for our case?

Originally posted as
https://github.com/Mange/roadie-rails/issues/75
We are seeing performance issue for our daily email jobs
By using NewRelic custom instrumentation,
we found out that most time is spent in calling Roadies
Screenshot of our NewRelic data for an example worker:
The integration code:
# frozen_string_literal: true
require "rails"
require "action_controller"
require "contracts"
require "memoist"
require "roadie"
require "roadie-rails"
require "new_relic/agent/method_tracer"
module Shared::MailerMixins
module WithRoadieIntegration
# I don't want to include the constants into the class as well
module Concern
def self.included(base)
base.extend ClassMethods
end
include ::NewRelic::Agent::MethodTracer
def mail(*args, &block)
super.tap do |m|
options = roadie_options
next unless options
trace_execution_scoped(
[
[
"WithRoadieIntegration",
"Roadie::Rails::MailInliner.new(m, options).execute",
].join("/"),
],
) do
Roadie::Rails::MailInliner.new(m, options).execute
end
end
end
private
def roadie_options
::Rails.application.config.roadie.tap do |options|
options.asset_providers = [UserAssetsProvider.new]
options.external_asset_providers = [UserAssetsProvider.new]
options.keep_uninlinable_css = false
options.url_options = url_options.slice(*[
:host,
:port,
:path,
:protocol,
:scheme,
])
end
end
add_method_tracer(
:roadie_options,
"WithRoadieIntegration/roadie_options",
)
end
class UserAssetsProvider
extend(
::Memoist,
)
include(
::Contracts::Core,
::Contracts::Builtin,
)
include ::NewRelic::Agent::MethodTracer
ABSOLUTE_ASSET_PATH_REGEXP = /\A#{Regexp.escape("//")}.+#{Regexp.escape("/assets/")}/i
Contract String => Maybe[Roadie::Stylesheet]
def find_stylesheet(name)
return nil unless file_exists?(name)
Roadie::Stylesheet.new("whatever", stylesheet_content(name))
end
add_method_tracer(
:find_stylesheet,
"UserAssetsProvider/find_stylesheet",
)
Contract String => Roadie::Stylesheet
def find_stylesheet!(name)
stylesheet = find_stylesheet(name)
if stylesheet.nil?
raise Roadie::CssNotFound.new(
name,
"does not exists",
self,
)
end
stylesheet
end
add_method_tracer(
:find_stylesheet!,
"UserAssetsProvider/find_stylesheet!",
)
private
def file_exists?(name)
if assets_precompiled?
File.exists?(local_file_path(name))
else
sprockets_asset(name)
end
end
memoize :file_exists?
# If on-the-fly asset compilation is disabled, we must be precompiling assets.
def assets_precompiled?
!Rails.configuration.assets.compile
rescue
false
end
def local_file_path(name)
asset_path = asset_path(name)
if asset_path.match(ABSOLUTE_ASSET_PATH_REGEXP)
asset_path.gsub!(ABSOLUTE_ASSET_PATH_REGEXP, "assets/")
end
File.join(Rails.public_path, asset_path)
end
memoize :local_file_path
add_method_tracer(
:local_file_path,
"UserAssetsProvider/local_file_path",
)
def sprockets_asset(name)
asset_path = asset_path(name)
if asset_path.match(ABSOLUTE_ASSET_PATH_REGEXP)
asset_path.gsub!(ABSOLUTE_ASSET_PATH_REGEXP, "")
end
# Strange thing is since rails 4.2
# name is passed in like
# `/assets/mailer-a9c96bd713d0b091297b82053ccd9155b933c00a53595812d755825d1747f42d.css`
# Before any processing
# And since `sprockets_asset` is used for preview
# We just "fix" the name by removing the
#
# Regexp taken from gem `asset_sync`
# https://github.com/AssetSync/asset_sync/blob/v1.2.1/lib/asset_sync/storage.rb#L142
#
# Modified to match what we need here (we need `.css` suffix)
if asset_path =~ /-[0-9a-fA-F]{32,}\.css$/
asset_path.gsub!(/-[0-9a-fA-F]{32,}\.css$/, ".css")
end
Rails.application.assets.find_asset(asset_path)
end
add_method_tracer(
:sprockets_asset,
"UserAssetsProvider/sprockets_asset",
)
def asset_path(name)
name.gsub(%r{^[/]?assets/}, "")
end
Contract String => String
def stylesheet_content(name)
if assets_precompiled?
File.read(local_file_path(name))
else
# This will compile and return the asset
sprockets_asset(name).to_s
end.strip
end
memoize :stylesheet_content
add_method_tracer(
:stylesheet_content,
"UserAssetsProvider/stylesheet_content",
)
end
end
end

I would like to report my own findings
With NewRelic data, we think most of the time is spent on
Roadies::Inliner/selector_elements => Roadie::Inliner/elements_matching_selector
And it seems a stylesheet with more style rules will make the style inlining takes longer
Benchmark code will be something like:
# frozen_string_literal: true
require "benchmark/ips"
class TestMailer < ::ActionMailer::Base
def show(benchmark_file_path:)
return mail(
from: "somewhere#test.com",
to: ["somewhere#test.com"],
subject: "some subject",
# This is trying to workaround a strange bug in `mail` gem
# https://github.com/mikel/mail/issues/912#issuecomment-156186383
content_type: "text/html",
) do |format|
format.html do
render(
file: benchmark_file_path,
layout: false,
)
end
end
end
end
Benchmark.ips do |x|
x.warmup = 5
x.time = 60
options = Roadie::Rails::Options.new(
# Use your own provider or use built-in providers
# I use a custom provider which can be used inside a rails app,
# See https://github.com/Mange/roadie for built-in providers
#
# options.asset_providers = [UserAssetsProvider.new]
# options.external_asset_providers = [UserAssetsProvider.new]
options.keep_uninlinable_css = false
)
# Need to prepare html_file yourself with
# different stylesheet tag pointing to two different stylesheet files
x.report("fat") do
message = ::TestMailer.
show(
benchmark_file_path: "benchmark-fat-stylesheet.html",
).message.tap do |m|
Roadie::Rails::MailInliner.new(m, options).execute
end
if message.body.to_s =~ /stylesheet/
raise "stylesheet not processed"
end
end
x.report("slim") do
message = ::TestMailer.
show(
benchmark_file_path: "benchmark-slim-stylesheet.html",
).message
if message.body.to_s =~ /stylesheet/
raise "stylesheet not processed"
end
end
# Compare the iterations per second of the various reports!
x.compare!
end

Ruby on Rails, Zendesk API integration not loading the client

I am trying to set up the Zendesk API in my app, I have decided to go with the API that was built by Zendesk
I have set up the initializer object to load the client.
config/initializers/zendesk.rb
require 'zendesk_api'
client = ZendeskAPI::Client.new do |config|
# Mandatory:
config.url = Rails.application.secrets[:zendesk][:url]
# Basic / Token Authentication
config.username = Rails.application.secrets[:zendesk][:username]
config.token = Rails.application.secrets[:zendesk][:token]
# Optional:
# Retry uses middleware to notify the user
# when hitting the rate limit, sleep automatically,
# then retry the request.
config.retry = true
# Logger prints to STDERR by default, to e.g. print to stdout:
require 'logger'
config.logger = Logger.new(STDOUT)
# Changes Faraday adapter
# config.adapter = :patron
# Merged with the default client options hash
# config.client_options = { :ssl => false }
# When getting the error 'hostname does not match the server certificate'
# use the API at https://yoursubdomain.zendesk.com/api/v2
end
This is pretty much copy paste from the site, but I have decided on using the token + username combination.
I then created a service object that I pass a JSON object and have it construct tickets. This service object is called from a controller.
app/services/zendesk_notifier.rb
class ZendeskNotifier
attr_reader :data
def initialize(data)
#data = data
end
def create_ticket
options = {:comment => { :value => data[:reasons] }, :priority => "urgent" }
if for_operations?
options[:subject] = "Ops to get additional info for CC"
options[:requester] = { :email => 'originations#testing1.com' }
elsif school_in_usa_or_canada?
options[:subject] = "SRM to communicate with student"
options[:requester] = { :email => 'srm#testing2.com' }
else
options[:subject] = "SRM to communicate with student"
options[:requester] = { :email => 'srm_row#testing3.com' }
end
ZendeskAPI::Ticket.create!(client, options)
end
private
def for_operations?
data[:delegate] == 1
end
def school_in_usa_or_canada?
data[:campus_country] == "US" || "CA"
end
end
But now I am getting
NameError - undefined local variable or method `client' for #<ZendeskNotifier:0x007fdc7e5882b8>:
app/services/zendesk_notifier.rb:20:in `create_ticket'
app/controllers/review_queue_applications_controller.rb:46:in `post_review'
I thought that the client was the same one defined in my config initializer. Somehow I think this is a different object now. I have tried looking at their documentation for more information but I am lost as to what this is?

If you want to use the client that is defined in the initializer you would need to make it global by changing it to $client. Currently you have it setup as a local variable.

I used a slightly different way of initializing the client, copying from this example rails app using the standard Zendesk API gem:
https://github.com/devarispbrown/zendesk_help_rails/blob/master/app/controllers/application_controller.rb
As danielrsmith noted, the client variable is out of scope. You could instead have an initializer like this:
config/initializers/zendesk_client.rb:
class ZendeskClient < ZendeskAPI::Client
def self.instance
#instance ||= new do |config|
config.url = Rails.application.secrets[:zendesk][:url]
config.username = Rails.application.secrets[:zendesk][:username]
config.token = Rails.application.secrets[:zendesk][:token]
config.retry = true
config.logger = Logger.new(STDOUT)
end
end
end
Then return the client elsewhere by client = ZendeskClient.instance (abridged for brevity):
app/services/zendesk_notifier.rb:
class ZendeskNotifier
attr_reader :data
def initialize(data)
#data = data
#client = ZendeskClient.instance
end
def create_ticket
options = {:comment => { :value => data[:reasons] }, :priority => "urgent" }
...
ZendeskAPI::Ticket.create!(#client, options)
end
...
end
Hope this helps.

how do i get message list with Gmail Api?

i want to access list of messages
Object:
2.0.0-p481 :008 > g.gmail_api.users.messages.list
=> # < Google::APIClient::Method:0x41c948c ID:gmail.users.messages.list >
i'm new in this API and unable to get how do i use Gmail API.
Thanks

# Google
gem "omniauth-google-oauth2"
gem "google-api-client"
in my model
def query_google( email )
self.refresh_token_from_google if self.expires_at.to_i < Time.now.to_i
#google_api_client = Google::APIClient.new(
application_name: 'Joggle',
application_version: '1.0.0'
)
#google_api_client.authorization.access_token = self.access_key
#gmail = #google_api_client.discovered_api('gmail', "v1")
# https://developers.google.com/gmail/api/v1/reference/users/messages/list
# Now that we instantiated gmail, we can take the category (messages) and the method
# You can also add parameters if you wish to do so:
# #calendar_events = google_api_client.execute(
# :api_method => #calendar.events.list,
# :parameters => {
# "calendarId" => current_user.email,
# 'timeMin' => date.to_s,
# 'timeMax' => max_date.to_s
# # 'items' => [{'id' => current_user.email}]
# },
# :headers => {'Content-Type' => 'application/json'}
# )
#emails = #google_api_client.execute(
api_method: #gmail.users.messages.list,
parameters: {
userId: "me",
# searching messages based on number of queries:
# https://developers.google.com/gmail/api/guides/filtering
q: "from:" + email.to_s
},
headers: {'Content-Type' => 'application/json'}
)
count = #emails.data.messages.count
Rails.logger.error count
{count: count, last_emails: get_three_emails} if count > 0
end
for reference : https://github.com/google/google-api-ruby-client/issues/135

Not a ruby expert but that looks like a method, can you call () it? I imagine you've gone through steps to setup authentication, etc? the messages.list() function should be able to be called without any parameters generally (well, you may need to specify a userId, which you can just use "me" for to get the authenticated user).

How do you output call tree profiling for KCacheGrind with ruby-prof for a Rails app?

According to the documentation, you can profile Rails apps
http://ruby-prof.rubyforge.org/
I added this to my config.ru
if Rails.env.development?
use Rack::RubyProf, :path => 'tmp/profile'
end
However it only outputs the following files
users-1-call_stack.html
users-1-flat.txt
users-1-graph.html
users-1-graph.txt
The output is completely incomprehensible. So I downloaded QCacheGrind for Windows.
http://sourceforge.net/projects/qcachegrindwin/?source=recommended
It won't read any of those files. The ruby-prof docs says that you can generate KCacheGrind files
RubyProf::CallTreePrinter - Creates a call tree report compatible with KCachegrind.
But it won't say how to do it with Rails. I looked at the page for RubyProf, but it was empty.
http://ruby-prof.rubyforge.org/classes/Rack/RubyProf.html
Ruby 2.0.0, Rails 4.0.3

helper_method :profile
around_action :profile, only: [:show]
def profile(prefix = "profile")
result = RubyProf.profile { yield }
# dir = File.join(Rails.root, "tmp", "profile", params[:controller].parameterize)
dir = File.join(Rails.root, "tmp", "profile")
FileUtils.mkdir_p(dir)
file = File.join(dir, "callgrind.%s.%s.%s" % [prefix.parameterize, params[:action].parameterize, Time.now.to_s.parameterize] )
open(file, "w") {|f| RubyProf::CallTreePrinter.new(result).print(f, :min_percent => 1) }
end

Change config.ru
use Rack::RubyProf, :path => ::File.expand_path('tmp/profile'),
:printers => {::RubyProf::FlatPrinter => 'flat.txt',
::RubyProf::GraphPrinter => 'graph.txt',
::RubyProf::GraphHtmlPrinter => 'graph.html',
::RubyProf::CallStackPrinter => 'call_stack.html',
::RubyProf::CallTreePrinter => 'call_grind.txt',
}

Relationships created by a rake task are not persisted though the rails server

I'm working my first project using Neo4j. I'm parsing wikipedia's page and pagelinks dumps to create a graph where the nodes are pages and the edges are links.
I've defined some rake tasks that download the dumps, parse the data, and save it in a Neo4j database. At the end of the rake task I print the number of pages and links created, and some of the pages with the most links. Here is the output of the raks task for the zawiki.
$ rake wiki[zawiki]
[ omitted ]
...
:: Done parsing zawiki
:: 1984 pages
:: 2144 links
:: The pages with the most links are:
9625.0 - Emijrp/List_of_Wikipedians_by_number_of_edits_(bots_included): 40
1363.0 - Gvangjsih_Bouxcuengh_Swcigih: 30
9112.0 - Fuzsuih: 27
1367.0 - Cungzcoj: 26
9279.0 - Vangz_Yenfanh: 19
It looks like pages and links are being created, but when I start a rails console, or the server the links aren't found.
$ rails c
jruby-1.7.5 :013 > Pages.all.count
=> 1984
jruby-1.7.5 :003 > Pages.all.reduce(0) { |count, page| count + page.links.count}
=> 0
jruby-1.7.5 :012 > Pages.all.sort_by { |p| p.links.count }.reverse[0...5].map { |p| p.links.count }
=> [0, 0, 0, 0, 0]
Here is the rake task, and this is the projects github page. Can anyone tell me why the links aren't saved?
DUMP_DIR = Rails.root.join('lib','assets')
desc "Download wiki dumps and parse them"
task :wiki, [:wiki] => 'wiki:all'
namespace :wiki do
task :all, [:wiki] => [:get, :parse] do |t, args|
# Print info about the newly created pages and links.
link_count = 0
Pages.all.each do |page|
link_count += page.links.count
end
indent "Done parsing #{args[:wiki]}"
indent "#{Pages.count} pages"
indent "#{link_count} links"
indent "The pages with the most links are:"
Pages.all.sort_by { |a| a.links.count }.reverse[0...5].each do |page|
puts "#{page.page_id} - #{page.title}: #{page.links.count}"
end
end
desc "Download wiki page and page links database dumps to /lib/assets"
task :get, :wiki do |t, args|
indent "Downloading dumps"
sh "#{Rails.root.join('lib', "get_wiki").to_s} #{args[:wiki]}"
indent "Done"
end
desc "Parse all dumps"
task :parse, [:wiki] => 'parse:all'
namespace :parse do
task :all, [:wiki] => [:pages, :pagelinks]
desc "Read wiki page dumps from lib/assests into the database"
task :pages, [:wiki] => :environment do |t, args|
parse_dumps('page', args[:wiki]) do |obj|
page = Pages.create_from_dump(obj)
end
indent = "Created #{Pages.count} pages"
end
desc "Read wiki pagelink dumps from lib/assests into the database"
task :pagelinks, [:wiki] => :environment do |t, args|
errors = 0
parse_dumps('pagelinks', args[:wiki]) do |from_id, namespace, to_title|
from = Pages.find(:page_id => from_id)
to = Pages.find(:title => to_title)
if to.nil? || from.nil?
errors = errors.succ
else
from.links << to
from.save
end
end
end
end
end
def indent *args
print ":: "
puts args
end
def parse_dumps(dump, wiki_match, &block)
wiki_match ||= /\w+/
DUMP_DIR.entries.each do |file|
file, wiki = *(file.to_s.match(Regexp.new "(#{wiki_match})-#{dump}.sql"))
if file
indent "Parsing #{wiki} #{dump.pluralize} from #{file}"
each_value(DUMP_DIR.join(file), &block)
end
end
end
def each_value(filename)
f = File.open(filename)
num_read = 0
begin # read file until line starting with INSERT INTO
line = f.gets
end until line.match /^INSERT INTO/
begin
line = line.match(/\(.*\)[,;]/)[0] # ignore begining of line until (...) object
begin
yield line[1..-3].split(',').map { |e| e.match(/^['"].*['"]$/) ? e[1..-2] : e.to_f }
num_read = num_read.succ
line = f.gets.chomp
end while(line[0] == '(') # until next insert block, or end of file
end while line.match /^INSERT INTO/ # Until line doesn't start with (...
f.close
end
app/models/pages.rb
class Pages < Neo4j::Rails::Model
include Neo4j::NodeMixin
has_n(:links).to(Pages)
property :page_id
property :namespace, :type => Fixnum
property :title, :type => String
property :restrictions, :type => String
property :counter, :type => Fixnum
property :is_redirect, :type => Fixnum
property :is_new, :type => Fixnum
property :random, :type => Float
property :touched, :type => String
property :latest, :type => Fixnum
property :length, :type => Fixnum
property :no_title_convert, :type => Fixnum
def self.create_from_dump(obj)
# TODO: I wonder if there is a way to compine these calls
page = {}
# order of this array is important, it corresponds to the data in obj
attrs = [:page_id, :namespace, :title, :restrictions, :counter, :is_redirect,
:is_new, :random, :touched, :latest, :length, :no_title_convert]
attrs.each_index { |i| page[attrs[i]] = obj[i] }
page = Pages.create(page)
return page
end
end

I must admit that I have no idea of how Neo4j works.
Transferring from other databases though, I too assume that either some validation is wrong, or maybe even something is misconfigured in your use of the database. The latter I can't give any advice on where to look, but if it's about validation, you can look at Page#errors or try calling Page#save! and see what it raises.
One crazy idea that just came to mind looking at this example is that maybe for that relation to be configured properly, you need a back reference, too.
Maybe has_n(:links).to(Page, :links) will help you. Or, if that doesn't work:
has_n(:links_left).to(Page, :links_right)
has_n(:links_right).from(Page, :links_left)
The more I look at this, the more I think the back reference to the same table is not configured properly and thus won't validate.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

How to speed up sitemap_generator with parallel gem - ruby-on-rails

Related

What could cause performance issue(s) when using roadie-rails for our case?

Ruby on Rails, Zendesk API integration not loading the client

how do i get message list with Gmail Api?

How do you output call tree profiling for KCacheGrind with ruby-prof for a Rails app?

Relationships created by a rake task are not persisted though the rails server

Categories

Resources