When I use forkmanager with Ruby, I get the error below.
Ruby version:
ruby 2.4.1p111 (2017-03-22 revision 58053) [x64-mingw32]
System version:
Windows 7, 64-bit
Uncaught exception: fork() function is unimplemented on this machine
D:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/parallel-forkmanager-2.0.1/lib/parallel/forkmanager.rb:525:in `fork'
D:/Ruby24-x64/lib/ruby/gems/2.4.0/gems/parallel-forkmanager-2.0.1/lib/parallel/forkmanager.rb:525:in `start'
#!/usr/bin/env ruby
#encoding: UTF-8
# Count the links on each site's home page
require 'rubygems'
require 'ap'
require 'json'
require 'net/http'
require 'nokogiri'
require 'forkmanager'
require 'beanstalk-client'
class MultipleCrawler
class Crawler
def initialize(user_agent, redirect_limit=1)
@user_agent = user_agent
@redirect_limit = redirect_limit
@timeout = 20
end
attr_accessor :user_agent, :redirect_limit, :timeout
def fetch(website)
print "Pid:#{Process.pid}, fetch: #{website}\n"
redirect, url = @redirect_limit, website
start_time = Time.now
redirecting = false
begin
begin
uri = URI.parse(url)
req = Net::HTTP::Get.new(uri.path)
req.add_field('User-Agent', @user_agent)
res = Net::HTTP.start(uri.host, uri.port) do |http|
http.read_timeout = @timeout
http.request(req)
end
if res.header['location'] # on a redirect, set url to the location and fetch again
url = res.header['location']
redirecting = true
end
redirect -= 1
end while redirecting and redirect>=0
opened_time = (Time.now - start_time).round(4) # time taken to open the site
encoding = res.body.scan(/<meta.+?charset=["'\s]*([\w-]+)/i)[0]
encoding = encoding ? encoding[0].upcase : 'GB18030'
html = 'UTF-8'==encoding ? res.body : res.body.force_encoding('GB2312'==encoding || 'GBK'==encoding ? 'GB18030' : encoding).encode('UTF-8')
doc = Nokogiri::HTML(html)
processed_time = (Time.now - start_time - opened_time).round(4) # time taken to parse the links; on Ruby 1.8.7 use ('%.4f' % float).to_f instead of round(4)
[opened_time, processed_time, doc.css('a[href]').size, res.header['server']]
rescue =>e
e.message
end
end
end
def initialize(websites, beanstalk_jobs, pm_max=1, user_agent='', redirect_limit=1)
@websites = websites # array of site URLs
@beanstalk_jobs = beanstalk_jobs # beanstalk server address and tube parameters
@pm_max = pm_max # maximum number of parallel processes
@user_agent = user_agent # user agent, to pose as a browser
@redirect_limit = redirect_limit # maximum number of redirects allowed
@ipc_reader, @ipc_writer = IO.pipe # IPC pipe that buffers the results
end
attr_accessor :user_agent, :redirect_limit
def init_beanstalk_jobs # prepare the beanstalk jobs
beanstalk = Beanstalk::Pool.new(*@beanstalk_jobs)
# clear any leftover messages from the beanstalk queue
begin
while job = beanstalk.reserve(0.1)
job.delete
end
rescue Beanstalk::TimedOut
print "Beanstalk queues cleared!\n"
end
@websites.size.times{|i| beanstalk.put(i)} # enqueue all the jobs
beanstalk.close
rescue => e
puts e
exit
end
def process_jobs # process the jobs
start_time = Time.now
pm = Parallel::ForkManager.new(@pm_max)
@pm_max.times do |i|
# after starting a child, `next` immediately without waiting for it to finish, so the children run in parallel
pm.start(i) and next
beanstalk = Beanstalk::Pool.new(*@beanstalk_jobs)
# close the read end of the pipe; the child only sends data back
@ipc_reader.close
loop{
begin
# the reserve timeout is 0.1 seconds because the jobs were enqueued beforehand
job = beanstalk.reserve(0.1)
index = job.body
job.delete
website = @websites[index.to_i]
result = Crawler.new(@user_agent).fetch(website)
@ipc_writer.puts( ({website=>result}).to_json )
rescue Beanstalk::DeadlineSoonError, Beanstalk::TimedOut, SystemExit, Interrupt
break
end
}
@ipc_writer.close
pm.finish(0)
end
@ipc_writer.close
begin
# wait for all child processes to finish
pm.wait_all_children
# on an interrupt, print a message
rescue SystemExit, Interrupt
print "Interrupt wait all children!\n"
ensure
results = read_results
# print the results
ap results, :indent => -4 , :index=>false
print "Process end, total: #{@websites.size}, crawled: #{results.size}, time: #{'%.4f' % (Time.now - start_time)}s.\n"
end
end
def read_results # read the data returned by the child processes through the pipe
results = {}
while result = @ipc_reader.gets
results.merge! JSON.parse(result)
end
@ipc_reader.close
results
end
def run # entry point
init_beanstalk_jobs
process_jobs
end
end
websites = %w(
http://www.51buy.com/ http://www.360buy.com/ http://www.tmall.com/ http://www.taobao.com/
http://china.alibaba.com/ http://www.paipai.com/ http://shop.qq.com/ http://www.lightinthebox.com/
http://www.amazon.cn/ http://www.newegg.com.cn/ http://www.vancl.com/ http://www.yihaodian.com/
http://www.dangdang.com/ http://www.m18.com/ http://www.suning.com/ http://www.hstyle.com/
)
beanstalk_jobs = [['127.0.0.1:11300'],'crawler-jobs']
user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
pm_max = 10
MultipleCrawler.new(websites, beanstalk_jobs, pm_max, user_agent).run
You appear to be running this on a Windows PC.
fork is a POSIX/Unix system call and is therefore only available on POSIX/Unix systems.
A possible solution would be to use Cygwin on your Windows machine.
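As a minimal sketch of a portable alternative (not something forkmanager itself provides), a script can check whether fork is available and fall back to threads when it is not. The helper name run_in_parallel is hypothetical, and it only collects the return value of the block for each item:

```ruby
# Hypothetical helper: runs the block once per item, using forked child
# processes where fork exists and threads elsewhere (for example on Windows).
def run_in_parallel(items, &block)
  if Process.respond_to?(:fork)
    # POSIX path: one child per item; each result comes back through a pipe.
    readers = items.map do |item|
      reader, writer = IO.pipe
      reader.binmode
      writer.binmode
      fork do
        reader.close
        writer.write(Marshal.dump(block.call(item)))
        writer.close
        exit!(0) # leave the child without running the parent's at_exit hooks
      end
      writer.close
      reader
    end
    results = readers.map do |reader|
      data = reader.read
      reader.close
      Marshal.load(data)
    end
    Process.waitall
    results
  else
    # No fork: threads still help with I/O-bound work such as HTTP requests.
    items.map { |item| Thread.new { block.call(item) } }.map(&:value)
  end
end

lengths = run_in_parallel(%w[a bb ccc]) { |s| s.length }
p lengths
```

Process.respond_to?(:fork) returns false on platforms where fork is unusable, so the same script can run on both Windows and Unix-like systems.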
See section 3.2 of the Rails testing guide for how to switch the test workers from processes, which are created with fork, to threads, which run in parallel without it. Also look at the beginning of section 3, which describes how the tests are run.
I have used WSL2 to run Rails. However, my local tests needed JRuby to run in some form.
To change the parallelization method to use threads instead of forks, put the following in your test_helper.rb:
class ActiveSupport::TestCase
parallelize(workers: :number_of_processors, with: :threads)
end
Then specify in the rails test command how many workers you need, for example PARALLEL_WORKERS=1 rails test or PARALLEL_WORKERS=15 rails test.
I'm trying to attach all pictures from Microsoft Graph to my new Rails application using Active Storage and the rest-client gem.
It works for a single user when I do it like this:
User.find_by_email("user.email@domain.com").avatar.attach io: StringIO.open(image.to_blob), filename: "avatar.jpg", content_type: metadata["mimeType"], identify: false
But in a batch loop, it doesn't work.
class RestController < ApplicationController
require 'rest-client'
def sync_azure_picture
@token = RestController.get_azure_token
User.find_each do |currentUser|
request_url = 'https://graph.microsoft.com/v1.0/users/'+currentUser[:id_azure]+'/photo/$value'
puts request_url
resp = RestClient.get(request_url,'Authorization' => @token)
image = MiniMagick::Image.read(resp.body)
metadata = image.data
currentUser.avatar.attach io:StringIO.open(image.to_blob), filename: "avatar.jpg", content_type: metadata["mimeType"], identify: false
end
end
end
The error I'm getting is:
RestClient::NotFound
Probably not all users have a photo. You can log the errors and then check those URLs:
User.find_each do |user|
# ...
rescue RestClient::NotFound => e
Rails.logger.error("#{request_url}:")
Rails.logger.error(e)
end
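A minimal sketch of that idea, kept free of Rails and rest-client so it runs on its own: PhotoNotFound stands in for RestClient::NotFound, and fetch_photo and the user hashes are hypothetical placeholders for the Graph request and the User records. Rescuing inside the loop lets the batch continue and leaves a list of users to check afterwards.

```ruby
# Stand-in for RestClient::NotFound (assumption: the real code would rescue that).
class PhotoNotFound < StandardError; end

# Hypothetical fetcher: raises when the user has no photo, like a 404 from the API.
def fetch_photo(user)
  raise PhotoNotFound, "no photo for #{user[:id_azure]}" unless user[:has_photo]
  "image-bytes-for-#{user[:id_azure]}"
end

users = [
  { id_azure: "a1", has_photo: true },
  { id_azure: "b2", has_photo: false },
  { id_azure: "c3", has_photo: true }
]

attached = []
missing = []

users.each do |user|
  begin
    attached << fetch_photo(user)
  rescue PhotoNotFound => e
    # Record the failure and move on instead of aborting the whole batch.
    missing << user[:id_azure]
    warn e.message
  end
end

puts "attached: #{attached.size}, missing: #{missing.inspect}"
```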
EDIT: The same thing happens when I fork a process manually.
I'm getting some weird behavior with a Rails Job that calls a module of mine called RedisService.
I've added lib/modules to my autoload_paths but the TextService module that calls the RedisService one loses reference to it, sometimes immediately, sometimes 3 or 4 job calls in...
I've even required the module in my TextService to no avail, even added some puts to check that always show the module is defined and responds to the method I'm calling...!
Something escapes me...
Here's a gist to the backtrace
Repo: https://gitlab.com/thomasbromehead/snmp-simulator-ruby-manager.
ruby --version: 2.6.5
rails version: 6.1.3.1
My "service" objects:
Module that calls RedisService
require_relative 'redis_service'
module TextService
def self.write_to_file(dataObject, redis, path: "./")
begin
file_with_path = path + dataObject.filename
# Store all lines prior to the one being modified, File.read closes the file
f = File.read(file_with_path)
new_content = f.gsub(dataObject.old_set_value, dataObject.new_set_value)
# File.open closes the file when passed a block
File.open(file_with_path, "w") { |file| file.puts new_content }
puts "Redis is: #{redis}" # => RedisService
puts "Redis responds to multi: #{redis.respond_to?(:multi)}" # => true
redis.multi do
redis.zrem("#{dataObject.name}-sorted-set", dataObject.old_set_value)
redis.hset("#{dataObject.name}-offsets", "#{dataObject.start_index}:#{dataObject.oid}:#{dataObject.end_index}", dataObject.new_set_value)
redis.zadd("#{dataObject.name}-sorted-set", dataObject.start_index, dataObject.new_set_value)
end
rescue EOFError
end
end
Variation class called from VariateJob
require_relative '../../../lib/modules/redis_service'
module Snmp
class Variation
include ActiveModel::Model
attr_reader :oid, :type, :duration, :to, :from, :filename, :redis
def initialize(oid:nil, duration:nil, type:nil, to:nil, filename: nil, from:nil)
@to = to
@from = from
@oid = oid
@type = type
@filename = filename
@redis = RedisService
end
def run(data)
current_value, new_set_value, start_index, end_index = prepare_values(JSON.parse(data))
transferData = Snmp::TransferData.new({
filename: @filename,
old_set_value: current_value,
new_set_value: new_set_value,
start_index: start_index,
end_index: end_index,
name: @name,
oid: oid
})
TextService.write_to_file(transferData, @redis)
end
VariateJob
class VariateJob < ApplicationJob
queue_as :default
def perform(dumped_variation, data)
Marshal.load(dumped_variation).run(Marshal.load(data))
end
end
VariationsController
class VariationsController < ApplicationController
before_action :set_file_name, only: :start
def start
if params["linear"]
type = :linear
elsif params["random"]
type = :random
end
data = redis.hscan_each("#{@name}-offsets", match: "*:#{params["snmp_variation"]["oid"]}*")
# data is an Enumerator, transform it to an array and dump to JSON
variation = Snmp::Variation.new(params_to_keywords(params["snmp_variation"]).merge({type: type}))
VariateJob.perform_later(Marshal.dump(variation), Marshal.dump(JSON.generate(data.to_a.first)))
end
RedisService
require 'redis'
module RedisService
include GlobalID::Identification
[...]
def self.multi(&block)
@redis.multi { block.call }
end
[...]
end
You are not losing the reference to the RedisService, but to Redis in your RedisService. Probably because you use a server or worker that forks new processes and you don't initialize a new connection after the fork.
To fix this issue I would replace this method
def self.start(host,port)
@redis ||= Redis.new(host: host, port: port)
self
end
with
def self.redis
@redis ||= Redis.new(host: ::Snmpapp.redis_config[:host], port: ::Snmpapp.redis_config[:port])
end
Then I would replace every direct use of @redis with a call to the new redis method.
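To illustrate why a lazy accessor helps, here is a minimal sketch that runs without Redis. FakeConnection is a hypothetical stand-in for Redis.new, and the PID check is an extra assumption of mine rather than part of the fix above: it rebuilds the connection when the accessor is called in a process other than the one that created it, such as a forked worker.

```ruby
# Hypothetical stand-in for a Redis client; it only remembers which process created it.
class FakeConnection
  attr_reader :owner_pid

  def initialize
    @owner_pid = Process.pid
  end
end

module ConnectionHolder
  # Lazily build the connection, and rebuild it if the current process
  # is not the one that created it (that is, we are in a forked child).
  def self.connection
    if @connection.nil? || @connection.owner_pid != Process.pid
      @connection = FakeConnection.new
    end
    @connection
  end
end

first = ConnectionHolder.connection
second = ConnectionHolder.connection
puts first.equal?(second) # the same object is reused within one process
```

Because nothing is connected until the first call, a connection opened before the fork is never handed to a child by accident.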
Originally posted as
https://github.com/Mange/roadie-rails/issues/75
We are seeing a performance issue in our daily email jobs.
Using NewRelic custom instrumentation, we found that most of the time is spent in calls to Roadie.
Screenshot of our NewRelic data for an example worker:
The integration code:
# frozen_string_literal: true
require "rails"
require "action_controller"
require "contracts"
require "memoist"
require "roadie"
require "roadie-rails"
require "new_relic/agent/method_tracer"
module Shared::MailerMixins
module WithRoadieIntegration
# I don't want to include the constants into the class as well
module Concern
def self.included(base)
base.extend ClassMethods
end
include ::NewRelic::Agent::MethodTracer
def mail(*args, &block)
super.tap do |m|
options = roadie_options
next unless options
trace_execution_scoped(
[
[
"WithRoadieIntegration",
"Roadie::Rails::MailInliner.new(m, options).execute",
].join("/"),
],
) do
Roadie::Rails::MailInliner.new(m, options).execute
end
end
end
private
def roadie_options
::Rails.application.config.roadie.tap do |options|
options.asset_providers = [UserAssetsProvider.new]
options.external_asset_providers = [UserAssetsProvider.new]
options.keep_uninlinable_css = false
options.url_options = url_options.slice(*[
:host,
:port,
:path,
:protocol,
:scheme,
])
end
end
add_method_tracer(
:roadie_options,
"WithRoadieIntegration/roadie_options",
)
end
class UserAssetsProvider
extend(
::Memoist,
)
include(
::Contracts::Core,
::Contracts::Builtin,
)
include ::NewRelic::Agent::MethodTracer
ABSOLUTE_ASSET_PATH_REGEXP = /\A#{Regexp.escape("//")}.+#{Regexp.escape("/assets/")}/i
Contract String => Maybe[Roadie::Stylesheet]
def find_stylesheet(name)
return nil unless file_exists?(name)
Roadie::Stylesheet.new("whatever", stylesheet_content(name))
end
add_method_tracer(
:find_stylesheet,
"UserAssetsProvider/find_stylesheet",
)
Contract String => Roadie::Stylesheet
def find_stylesheet!(name)
stylesheet = find_stylesheet(name)
if stylesheet.nil?
raise Roadie::CssNotFound.new(
name,
"does not exists",
self,
)
end
stylesheet
end
add_method_tracer(
:find_stylesheet!,
"UserAssetsProvider/find_stylesheet!",
)
private
def file_exists?(name)
if assets_precompiled?
File.exist?(local_file_path(name))
else
sprockets_asset(name)
end
end
memoize :file_exists?
# If on-the-fly asset compilation is disabled, we must be precompiling assets.
def assets_precompiled?
!Rails.configuration.assets.compile
rescue
false
end
def local_file_path(name)
asset_path = asset_path(name)
if asset_path.match(ABSOLUTE_ASSET_PATH_REGEXP)
asset_path.gsub!(ABSOLUTE_ASSET_PATH_REGEXP, "assets/")
end
File.join(Rails.public_path, asset_path)
end
memoize :local_file_path
add_method_tracer(
:local_file_path,
"UserAssetsProvider/local_file_path",
)
def sprockets_asset(name)
asset_path = asset_path(name)
if asset_path.match(ABSOLUTE_ASSET_PATH_REGEXP)
asset_path.gsub!(ABSOLUTE_ASSET_PATH_REGEXP, "")
end
# Strange thing is since rails 4.2
# name is passed in like
# `/assets/mailer-a9c96bd713d0b091297b82053ccd9155b933c00a53595812d755825d1747f42d.css`
# Before any processing
# And since `sprockets_asset` is used for preview
# We just "fix" the name by removing the
#
# Regexp taken from gem `asset_sync`
# https://github.com/AssetSync/asset_sync/blob/v1.2.1/lib/asset_sync/storage.rb#L142
#
# Modified to match what we need here (we need `.css` suffix)
if asset_path =~ /-[0-9a-fA-F]{32,}\.css$/
asset_path.gsub!(/-[0-9a-fA-F]{32,}\.css$/, ".css")
end
Rails.application.assets.find_asset(asset_path)
end
add_method_tracer(
:sprockets_asset,
"UserAssetsProvider/sprockets_asset",
)
def asset_path(name)
name.gsub(%r{^[/]?assets/}, "")
end
Contract String => String
def stylesheet_content(name)
if assets_precompiled?
File.read(local_file_path(name))
else
# This will compile and return the asset
sprockets_asset(name).to_s
end.strip
end
memoize :stylesheet_content
add_method_tracer(
:stylesheet_content,
"UserAssetsProvider/stylesheet_content",
)
end
end
end
I would like to report my own findings
Based on the NewRelic data, we think most of the time is spent in
Roadies::Inliner/selector_elements => Roadie::Inliner/elements_matching_selector
It also seems that a stylesheet with more style rules makes the style inlining take longer.
Benchmark code will be something like:
# frozen_string_literal: true
require "benchmark/ips"
class TestMailer < ::ActionMailer::Base
def show(benchmark_file_path:)
return mail(
from: "somewhere@test.com",
to: ["somewhere@test.com"],
subject: "some subject",
# This is trying to workaround a strange bug in `mail` gem
# https://github.com/mikel/mail/issues/912#issuecomment-156186383
content_type: "text/html",
) do |format|
format.html do
render(
file: benchmark_file_path,
layout: false,
)
end
end
end
end
Benchmark.ips do |x|
x.warmup = 5
x.time = 60
options = Roadie::Rails::Options.new
# Use your own provider or use built-in providers
# I use a custom provider which can be used inside a rails app,
# See https://github.com/Mange/roadie for built-in providers
#
# options.asset_providers = [UserAssetsProvider.new]
# options.external_asset_providers = [UserAssetsProvider.new]
options.keep_uninlinable_css = false
# Need to prepare html_file yourself with
# different stylesheet tag pointing to two different stylesheet files
x.report("fat") do
message = ::TestMailer.
show(
benchmark_file_path: "benchmark-fat-stylesheet.html",
).message.tap do |m|
Roadie::Rails::MailInliner.new(m, options).execute
end
if message.body.to_s =~ /stylesheet/
raise "stylesheet not processed"
end
end
x.report("slim") do
message = ::TestMailer.
show(
benchmark_file_path: "benchmark-slim-stylesheet.html",
).message.tap do |m|
Roadie::Rails::MailInliner.new(m, options).execute
end
if message.body.to_s =~ /stylesheet/
raise "stylesheet not processed"
end
end
# Compare the iterations per second of the various reports!
x.compare!
end
I am trying to access the AdSense Management API using ruby. They recommend using their generic Google-API client library:
http://code.google.com/p/google-api-ruby-client/#Google_AdSense_Management_API
This hasn't been very helpful and I have run into errors:
Faraday conflicts in google_drive and google-api-client
Where should I start in order to get access to my AdSense data?
Thanks in advance.
Unfortunately, we haven't prepared any sample code for the AdSense Management API... yet! As you point out, though, the client library is generic, and should work with any of the newer Google APIs, so some of the other samples may help.
If you're running into any specific issues, please create a question focused on those and point me to it, and I'll do my best to help.
If you want a quick sample to get started, I can cook that up for you, but we should make sure the issues you're running into have to do with the AdSense Management API itself and not just the client library, as in the question you linked to.
[Edit]
Here's a quick sample, based on Sinatra:
#!/usr/bin/ruby
require 'rubygems'
require 'sinatra'
require 'google/api_client'
FILENAME = 'auth.obj'
OAUTH_CLIENT_ID = 'INSERT_OAUTH2_CLIENT_ID_HERE'
OAUTH_CLIENT_SECRET = 'INSERT_OAUTH2_CLIENT_SECRET_HERE'
before do
@client = Google::APIClient.new
@client.authorization.client_id = OAUTH_CLIENT_ID
@client.authorization.client_secret = OAUTH_CLIENT_SECRET
@client.authorization.scope = 'https://www.googleapis.com/auth/adsense'
@client.authorization.redirect_uri = to('/oauth2callback')
@client.authorization.code = params[:code] if params[:code]
# Load the access token here if it's available
if File.exist?(FILENAME)
serialized_auth = IO.read(FILENAME)
@client.authorization = Marshal::load(serialized_auth)
end
if @client.authorization.refresh_token && @client.authorization.expired?
@client.authorization.fetch_access_token!
end
@adsense = @client.discovered_api('adsense', 'v1.1')
unless @client.authorization.access_token || request.path_info =~ /^\/oauth2/
redirect to('/oauth2authorize')
end
end
get '/oauth2authorize' do
redirect @client.authorization.authorization_uri.to_s, 303
end
get '/oauth2callback' do
@client.authorization.fetch_access_token!
# Persist the token here
serialized_auth = Marshal::dump(@client.authorization)
File.open(FILENAME, 'w') do |f|
f.write(serialized_auth)
end
redirect to('/')
end
get '/' do
call = {
:api_method => @adsense.reports.generate,
:parameters => {
'startDate' => '2011-01-01',
'endDate' => '2011-08-31',
'dimension' => ['MONTH', 'CUSTOM_CHANNEL_NAME'],
'metric' => ['EARNINGS', 'TOTAL_EARNINGS']
}
}
response = @client.execute(call)
output = ''
if response && response.data && response.data['rows'] &&
!response.data['rows'].empty?
result = response.data
output << '<table><tr>'
result['headers'].each do |header|
output << '<td>%s</td>' % header['name']
end
output << '</tr>'
result['rows'].each do |row|
output << '<tr>'
row.each do |column|
output << '<td>%s</td>' % column
end
output << '</tr>'
end
output << '</table>'
else
output << 'No rows returned'
end
output
end
I have a class which is currently being executed via delayed job. One of the tasks is to execute "rake spec" and redirect the output.
I do this as such:
class Executor
def execute_command(cmd, &block)
STDOUT.sync = true # That's all it takes...
IO.popen(cmd + " 2>&1") do |pipe| # Redirection is performed using operators
pipe.sync = true
while str = pipe.gets
block.call str # This is synchronous!
end
end
return $?.success?
end
end
However, none of the output appears, and it doesn't even appear to execute the unit tests correctly.
Capistrano works and it works on OSX. My server is Ubuntu running Passenger.
Anyone have any ideas why the output wouldn't be redirecting?
Thanks
Ben
Try without the file-descriptor redirection (2>&1) on cmd.
Here is what I used and what I got:
class Executor
def execute_command(cmd, &block)
STDOUT.sync = true # That's all it takes...
IO.popen(cmd) do |pipe| # Redirection is performed using operators
pipe.sync = true
while str = pipe.gets
block.call str # This is synchronous!
end
end
return $?.success?
end
end
Tested with:
ex = Executor.new
ex.execute_command "ps aux" do |str|
p str
end
result:
"USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND\n"
"mitch 423 3.4 1.0 2750692 159552 ?? S 23Apr12 19:41.59 /Users/mitch/iTerm.app/Contents/MacOS/iTerm -psn_0_40970\n"
"_windowserver 90 3.1 1.8 3395124 301576 ?? Ss 20Apr12 75:19.86 /System/Library/Frameworks/ApplicationServices.framework/Frameworks/CoreGraphics.framework/Resources/WindowServer -daemon\n"
"mitch 78896 2.0 0.8 1067248 136088 ?? R Thu03PM 37:46.59 /Applications/Spotify.app/Contents/MacOS/Spotify -psn_0_4900012\n"
"mitch 436 1.8 1.0 1063952 169320 ?? S 23Apr12 100:23.87 /Applications/Skype.app/Contents/MacOS/Skype -psn_0_90134\n"
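If stderr is still wanted in the stream, one option that avoids shell redirection is Open3.popen2e from the standard library, which merges stdout and stderr into a single pipe. This is only a sketch, not a confirmed fix for the Passenger setup in the question, and the helper name stream_command is hypothetical:

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: yields each line of combined stdout/stderr as it arrives
# and returns true when the command exits successfully.
def stream_command(*cmd)
  Open3.popen2e(*cmd) do |stdin, out_and_err, wait_thread|
    stdin.close
    out_and_err.each_line { |line| yield line }
    wait_thread.value.success?
  end
end

lines = []
ok = stream_command(RbConfig.ruby, "-e", 'puts "to stdout"; warn "to stderr"') do |line|
  lines << line.chomp
end
p ok
p lines.sort
```

Passing the command as separate arguments also bypasses the shell, so operators such as 2>&1 are neither needed nor interpreted.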