I'm writing an application in Rails and I want to get a huge amount of information from an API, which I'm streaming through an Enumerator object as a CSV export. I want to rescue an error that is raised within the Enumerator.
CONTROLLER: Enumerator
def csv_lines( url )
Enumerator.new do |y|
per_page = 200
# Parse parameters and get shelf information
_params = BrowseScraper.get_params(url)
shelf = BrowseScraper.get_preso( _params, 0 )
total_items = shelf['response']['total_results']['all'].to_i
total_pages = ( total_items / per_page.to_f ).ceil
shelf_info = BrowseScraper.crawl_ids( shelf['response']['query']['category'] )
y << BrowseScraper.csv_header(url, shelf_info, total_items, ["Tool ID", "Name", "Price", "URL"])
total_pages.times { |i| y << BrowseScraper.csv_body( _params, per_page, i+1) }
end
end
The following functions are raising errors, but I can't catch them outside of the Enumerator:
MODEL: methods
def self.get_params(url)
response = open(url)
raise if response.code != 200
end
CONTROLLER: Display
def export
url = params[:url]
raise StandardError, "Please enter a Browse URL below" if !url || url.empty?
respond_to do |format|
format.csv do
render_csv(url)
end
format.html { render_csv(url) }
end
rescue => e
flash[:error] = e.message
redirect_to scraper_path
end
private
def render_csv( url )
set_file_headers
set_streaming_headers
response.status = 200
# Rails should iterate this enumerator
self.response_body = csv_lines(url)
end
def set_file_headers( name = "browse_export" )
headers["Content-Type"] ||= 'text/csv'
headers["Content-Disposition"] = "attachment; filename=\"#{name}.csv\""
headers["Content-Transfer-Encoding"] = "binary"
headers["Last-Modified"] = Time.now.ctime.to_s
end
def set_streaming_headers
#nginx doc: Setting this to "no" will allow unbuffered responses suitable for Comet and HTTP streaming applications
headers['X-Accel-Buffering'] = 'no'
headers["Cache-Control"] ||= "no-cache"
headers.delete("Content-Length")
end
Rescuing the error raised in export works. Rescuing an error within the Enumerator also works, for example:
Enumerator.new do |y|
begin
y << BrowseScraper.get_params(_params)
rescue => e
Rails.logger.error "Failed to get parameters: #{e.message}"
end
end
How can I rescue an exception outside of the Enumerator so I can properly redirect the user with a flash message? How do I pass the exception from within the Enumerator object? What is it about the Enumerator that isn't letting me rescue it with:
def method
Enumerator.new do |y|
y << BrowseScraper.get_params(_params)
end
rescue => e
Rails.logger.error "Error in Enumerator is #{e.message}"
end
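I think I've figured out what's going on here. The block you pass to Enumerator.new isn't executed when the Enumerator is created; it only runs later, when something iterates the Enumerator. Here that happens while the response body is streamed, after the controller method has already returned, so a rescue wrapped around the Enumerator never sees the error (a rescue inside the block still does, as in the logging example above).
This is because the |y| in the block is a yielder object that hands each value to whoever is iterating (more on that in the Enumerator documentation or the Enumerator::Yielder documentation).
You have to do anything that might raise, and rescue it, before building the Enumerator.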
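To make the timing concrete, here is a minimal sketch in plain Ruby, outside Rails. The names fetch_header, lazy_lines and checked_lines are hypothetical stand-ins, not part of the original code; fetch_header plays the role of a call such as BrowseScraper.get_params that may raise.

```ruby
# Hypothetical stand-in for a call that can fail, such as BrowseScraper.get_params.
def fetch_header(ok)
  raise ArgumentError, "upstream API failed" unless ok
  "Tool ID,Name,Price,URL"
end

# Lazy: the risky call sits inside the block, so it only runs during iteration.
def lazy_lines(ok)
  Enumerator.new { |y| y << fetch_header(ok) }
rescue ArgumentError => e
  # Not reached here: the block has not run by the time this method returns.
  "rescued too early: #{e.message}"
end

# Eager: the risky call runs before the Enumerator is built, so a rescue
# around this method (or in its caller) can see the error.
def checked_lines(ok)
  header = fetch_header(ok)
  Enumerator.new { |y| y << header }
end

lazy = lazy_lines(false) # returns an Enumerator; nothing has failed yet

lazy_error = begin
  lazy.to_a
  nil
rescue ArgumentError => e
  e.message # raised only now, while iterating
end

eager_error = begin
  checked_lines(false)
  nil
rescue ArgumentError => e
  e.message # raised immediately, before any streaming starts
end

puts lazy.class
puts lazy_error
puts eager_error
```

In a controller, the eager variant would let the existing rescue in export redirect with a flash message, since the failure happens before the response body is handed over for streaming.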
I want to redirect to another page or show an alert message, depending on the user table info, in this controller.
But I get the error message below.
The code looks like this:
class Ir::FactsetUrlsController < Ir::ApplicationController
def show
user = User.find(params[:user_id])
factset_url, option = get_factset_url(user)
redirect_to factset_url, option
end
private
def get_factset_url(investor)
url, option = if !current_user.factset_enabled?
url, {alert: "こちらの閲覧には有料契約が必要です。"}
elsif investor.company.is_fresh?
url, {alert: "現在収録作業中です。申し訳ございませんが、少々お待ち下さい。"}
elsif investor.company.is_fresh?
url, {alert: "申し訳ございませんが、投資家DBに収録がありません 収録されていない理由 ①投資助言会社や投資アドバイザーなど直接保有していない ②HPや開示されている情報がない"}
else
investor.company.factset_url
end
end
end
Wrap each branch's values in square brackets [ and ]. Multiple values are being assigned, so every branch of the conditional has to return a single array.
def get_factset_url(investor)
url, option = if !current_user.factset_enabled?
[url, {alert: "こちらの閲覧には有料契約が必要です。"}]
elsif investor.company.is_fresh?
[url, {alert: "現在収録作業中です。申し訳ございませんが、少々お待ち下さい。"}]
elsif investor.company.is_fresh?
[url, {alert: "申し訳ございませんが、投資家DBに収録がありません 収録されていない理由 ①投資助言会社や投資アドバイザーなど直接保有していない ②HPや開示されている情報がない"}]
else
[investor.company.factset_url, {}]
end
end
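A stripped-down sketch of the same idea, runnable without Rails. The method pick_target, its arguments and the paths are hypothetical; the point is only that each branch returns one array, which the multiple assignment then destructures.

```ruby
# Hypothetical helper: every branch returns a two-element [url, options] array.
def pick_target(enabled, fresh)
  if !enabled
    ["/upgrade", { alert: "A paid plan is required." }]
  elsif fresh
    ["/pending", { alert: "Data is still being prepared." }]
  else
    ["/report", {}] # an empty hash keeps the second value usable as options
  end
end

# Multiple assignment splits the returned array into two variables.
url, option = pick_target(false, false)
puts url
puts option[:alert]

url2, option2 = pick_target(true, false)
puts url2
puts option2.empty?
```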
I have a service method that makes api requests and if the response was not ok, it would notify Bugsnag. It looks like this:
def send_request
@response = HTTParty.get(api_endpoint, options)
return JSON.parse(@response.body, symbolize_names: true) if @response.ok?
raise StandardError.new(JSON.parse(@response.body))
rescue StandardError => exception
BugsnagService.notify(exception, @response)
end
My BugsnagService#notify looks something like this:
class BugsnagService
def self.notify(exception, response = nil, **options)
if response
response_body = if valid_json?(response.body) # Error right here
JSON.parse(response.body)
else
response.body
end
options[:response_body] = response_body
options[:response_code] = response.code
end
# Raising exception in test and development environment, or else the exception will be
# silently ignored.
raise exception if Rails.env.test? || Rails.env.development?
Bugsnag.notify(exception) do |report|
report.add_tab(:debug_info, options) if options.present?
end
end
def self.valid_json?(json_string)
JSON.parse(json_string)
true
rescue JSON::ParserError => e
false
end
end
I set response = nil in my notify method because not every error is an API error, so sometimes I would just call BugsnagService.notify(exception).
I found out that if I just call it like I am in the snippet above, it would raise an error saying it can't call .body on a Hash. Somehow, when I pass @response into BugsnagService#notify, the object turns from HTTParty::Response into Hash.
But if I pass something in for the **options parameter, it will work. So I can call it like this:
BugsnagService.notify(exception, @response, { })
I've been trying to figure this one out but I couldn't find anything that would explain this. I'm not sure if there's something wrong with the way I define my parameters or if this is some bug with the HTTParty gem. Can anyone see why this is happening? Thanks!
The problem is that your @response is being passed in as the options. Because response is optional (it defaults to nil), the last argument can be treated as the keyword options, and the double splat converts it to a hash.
Try:
def testing(x, y = nil, **z)
puts "x = #{x}"
puts "y = #{y}"
puts "z = #{z}"
end
testing 1, 2, z: 3
#=> x = 1
#=> y = 2
#=> z = {:z=>3}
testing 1, y: 2
#=> x = 1
#=> y =
#=> z = {:y=>2}
testing 1, { y: 2 }, {}
#=> x = 1
#=> y = {:y=>2}
#=> z = {}
I'd suggest the best approach would be to have response be a keyword arg, as in:
def self.notify(exception, response: nil, **options)
...
end
That way, you can still omit or include the response as desired, and pass in subsequent options.
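As a rough sketch of that signature in use: FakeResponse is a hypothetical stand-in for an HTTP response, and this simplified notify returns the collected debug info instead of calling Bugsnag, so it runs on its own.

```ruby
# Hypothetical stand-in for an HTTP response object with code and body.
FakeResponse = Struct.new(:code, :body)

# Simplified notifier: response is an explicit keyword, so it can never be
# mistaken for the **options hash. Returns the data instead of reporting it.
def notify(exception, response: nil, **options)
  if response
    options[:response_code] = response.code
    options[:response_body] = response.body
  end
  { error: exception.message, debug_info: options }
end

error = StandardError.new("request failed")

# With a response and an extra option.
with_response = notify(error, response: FakeResponse.new(500, "oops"), attempt: 2)

# Without a response, for errors that do not come from an API call.
without_response = notify(error)

puts with_response[:debug_info][:response_code]
puts without_response[:debug_info].empty?
```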
I have an XPath query which accepts array elements for output using Axlsx. I need to tidy up my output for certain conditions, one of which is the 'Software included' field.
My xpath scrapes the following URL http://h10010.www1.hp.com/wwpc/ie/en/ho/WF06b/321957-321957-3329742-89318-89318-5186820-5231694.html?dnr=1
A sample of my code is below:
clues = Array.new
clues << 'Optical drive'
clues << 'Pointing device'
clues << 'Software included'
selector = "//td[text()='%s']/following-sibling::td"
data = clues.map do |clue|
xpath = selector % clue
[clue, doc.at(xpath).text.strip]
end
Axlsx::Package.new do |p|
p.workbook.add_worksheet do |sheet|
data.each { |datum| sheet.add_row datum }
end
p.serialize 'output.xlsx'
end
My Current output formatting
My Desired output formatting
If you can rely on the data always using ';' for separators, have a go at this:
data = []
clues.each do |clue|
xpath = selector % clue
details = doc.at(xpath).text.strip.split(';')
data << [clue, details.pop]
details.each { |detail| data << ['', detail] }
end
to generate the data before the Axlsx::Package.new block
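Here is a small sketch of that row-building step on its own, with no Nokogiri or Axlsx involved. The scraped hash and its values are made up for illustration, and this version uses shift rather than pop so that the details keep their source order; that is a choice in this sketch, not something the snippet above does.

```ruby
# Hypothetical scraped text per clue; in the real script this comes from
# doc.at(xpath).text.strip.
scraped = {
  'Optical drive'     => 'DVD writer',
  'Software included' => 'App A; App B; App C'
}

rows = []
scraped.each do |clue, text|
  details = text.split(';').map(&:strip)
  rows << [clue, details.shift]                  # first detail sits beside the clue
  details.each { |detail| rows << ['', detail] } # remaining details get their own rows
end

rows.each { |row| puts row.inspect }
```

Each element of rows can then be passed to sheet.add_row as before.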
In answer to your comment/question: You do it with something like this ;)
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'axlsx'
class Scraper
def initialize(url, selector)
@url = url
@selector = selector
end
def hooks
@hooks ||= {}
end
def add_hook(clue, p_roc)
hooks[clue] = p_roc
end
def export(file_name)
Scraper.clues.each do |clue|
if detail = parse_clue(clue)
output << [clue, detail.pop]
detail.each { |datum| output << ['', datum] }
end
end
serialize(file_name)
end
private
def self.clues
@clues ||= ['Operating system', 'Processors', 'Chipset', 'Memory type', 'Hard drive', 'Graphics',
'Ports', 'Webcam', 'Pointing device', 'Keyboard', 'Network interface', 'Chipset', 'Wireless',
'Power supply type', 'Energy efficiency', 'Weight', 'Minimum dimensions (W x D x H)',
'Warranty', 'Software included', 'Product color']
end
def doc
@doc ||= begin
Nokogiri::HTML(open(@url))
rescue
raise ArgumentError, 'Invalid URL - Nothing to parse'
end
end
def output
@output ||= []
end
def selector_for_clue(clue)
@selector % clue
end
def parse_clue(clue)
if element = doc.at(selector_for_clue(clue))
call_hook(clue, element) || element.inner_html.split('<br>').map(&:strip)
end
end
def call_hook(clue, element)
if hooks[clue].is_a? Proc
value = hooks[clue].call(element)
value.is_a?(Array) ? value : [value]
end
end
def package
@package ||= Axlsx::Package.new
end
def serialize(file_name)
package.workbook.add_worksheet do |sheet|
output.each { |datum| sheet.add_row datum }
end
package.serialize(file_name)
end
end
scraper = Scraper.new("http://h10010.www1.hp.com/wwpc/ie/en/ho/WF06b/321957-321957-3329742-89318-89318-5186820-5231694.html?dnr=1", "//td[text()='%s']/following-sibling::td")
# define a custom action to take against any elements found.
os_parse = Proc.new do |element|
element.inner_html.split('<br>').each(&:strip!).each(&:upcase!)
end
scraper.add_hook('Operating system', os_parse)
scraper.export('foo.xlsx')
And the FINAL answer is... a gem.
http://rubydoc.info/gems/ninja2k/0.0.2/frames
First of all, thank you all for helping programmers like me with your valuable input in solving day-to-day issues.
This is my first question on Stack Overflow; I have been struggling with this problem for almost a week.
We are building a crawler which crawls specific websites and extracts their contents, using mechanize to achieve this. As it was taking a lot of time, we decided to run the crawling process as a background task using the resque gem with redis, but when sending the process to the background I get the error mentioned in the title.
My code in lib/parsers/home.rb:
require 'resque'
require File.dirname(__FILE__)+"/../index"
class Home < Index
Resque.enqueue(Index , :page )
def self.perform(page)
super (page)
search_form = page.form_with :name=>"frmAgent"
resuts_page = search_form.submit
total_entries = resuts_page.parser.xpath('//*[@id="PagingTable"]/tr[2]/td[2]').text
if total_entries =~ /(\d+)\s*$/
total_entries = $1
else
total_entries = "unknown"
end
start_res_idx = 1
while true
puts "Found #{total_entries} entries"
detail_links = resuts_page.parser.xpath('//*[@id="MainTable"]/tr/td/a')
detail_links.each do |d_link|
if d_link.attribute("class")
next
else
data_page = @agent.get d_link.attribute("href")
fields = get_fields_from_page data_page
save_result_page page.uri.to_s, fields
#break
end
end
site_done
rescue Exception => e
puts "error: #{e}"
end
end
and the superclass in lib/index.rb is
require 'resque'
require 'mechanize'
require 'mechanize/form'
class Index
@queue = :Index_queue
def initialize(site)
@site = site
@agent = Mechanize.new
@agent.user_agent = Mechanize::AGENT_ALIASES['Windows Mozilla']
@agent.follow_meta_refresh = true
@rows_parsed = 0
@rows_total = 0
rescue Exception => e
log "Unable to login: #{e.message}"
end
def run
log "Parsing..."
url = "unknown"
if @site.url
url = @site.url
log "Opening #{url} as a data page"
@page = @agent.get(url)
# perform method should be overridden in subclasses
@data = self.perform(@page)
else
#some sites do not have "datapage" URL
#for example after login you're already on your very own datapage
#this is to be addressed in 'perform' method of subclass
@data = self.perform(nil)
end
rescue Exception=>e
puts "Failed to parse URL '#{url}', exception=>"+e.message
set_site_status("error "+e.message)
end
#overriding method
def self.perform(page)
end
def save_result_page(url, result_params)
result = Result.find_by_sql(["select * from results where site_id = ? AND ref_code = ?", @site.id, utf8(result_params[:ref_code])]).first
if result.nil?
result_params[:site_id] = @site.id
result_params[:time_crawled] = DateTime.now().strftime "%Y-%m-%d %H:%M:%S"
result_params[:link] = url
result = Result.create result_params
else
result.result_fields.each do |f|
f.delete
end
result.link = url
result.time_crawled = DateTime.now().strftime "%Y-%m-%d %H:%M:%S"
result.html = result_params[:html]
fields = []
result_params[:result_fields_attributes].each do |f|
fields.push ResultField.new(f)
end
result.result_fields = fields
result.save
end
@rows_parsed +=1
msg = "Saved #{@rows_parsed}"
msg +=" of #{@rows_total}" if @rows_total.to_i > 0
log msg
return result
end
end
What's Wrong with this code?
Thanks
Here's how some of my existing logging code with Log4r is working. As you can see in the WorkerX::a_method, any time that I log a message I want the class name and the calling method to be included (I don't want all the caller history or any other noise, which was my purpose behind LgrHelper).
class WorkerX
include LgrHelper
def initialize(args = {})
@logger = Lgr.new({:debug => args[:debug], :logger_type => 'WorkerX'})
end
def a_method
error_msg("some error went down here")
# This prints out: "WorkerX::a_method - some error went down here"
end
end
class Lgr
require 'log4r'
include Log4r
def initialize(args = {}) # args: debug boolean, logger type
@debug = args[:debug]
@logger_type = args[:logger_type]
@logger = Log4r::Logger.new(@logger_type)
format = Log4r::PatternFormatter.new(:pattern => "%l:\t%d - %m")
outputter = Log4r::StdoutOutputter.new('console', :formatter => format)
@logger.outputters = outputter
if @debug then
@logger.level = DEBUG
else
@logger.level = INFO
end
end
def debug(msg)
@logger.debug(msg)
end
def info(msg)
@logger.info(msg)
end
def warn(msg)
@logger.warn(msg)
end
def error(msg)
@logger.error(msg)
end
def level
@logger.level
end
end
module LgrHelper
# This module should only be included in a class that has a @logger instance variable, obviously.
protected
def info_msg(msg)
@logger.info(log_intro_msg(self.method_caller_name) + msg)
end
def debug_msg(msg)
@logger.debug(log_intro_msg(self.method_caller_name) + msg)
end
def warn_msg(msg)
@logger.warn(log_intro_msg(self.method_caller_name) + msg)
end
def error_msg(msg)
@logger.error(log_intro_msg(self.method_caller_name) + msg)
end
def log_intro_msg(method)
msg = class_name
msg += '::'
msg += method
msg += ' - '
msg
end
def class_name
self.class.name
end
def method_caller_name
if /`(.*)'/.match(caller[1]) then # caller.first
$1
else
nil
end
end
end
I really don't like this approach. I'd rather just use the existing @logger instance variable to print the message and be smart enough to know the context. How can this, or a similar simpler approach, be done?
My environment is Rails 2.3.11 (for now!).
After posting my answer using extend (see "EDIT", below), I thought I'd try using set_trace_func to keep a sort of stack trace like in the discussion I posted to. Here is my final solution; the set_trace_func call would be put in an initializer or similar.
#!/usr/bin/env ruby
# Keep track of the classes that invoke each "call" event
# and the method they called as an array of arrays.
# The array is in the format: [calling_class, called_method]
set_trace_func proc { |event, file, line, id, bind, klass|
if event == "call"
Thread.current[:callstack] ||= []
Thread.current[:callstack].push [klass, id]
elsif event == "return"
Thread.current[:callstack].pop
end
}
class Lgr
require 'log4r'
include Log4r
def initialize(args = {}) # args: debug boolean, logger type
@debug = args[:debug]
@logger_type = args[:logger_type]
@logger = Log4r::Logger.new(@logger_type)
format = Log4r::PatternFormatter.new(:pattern => "%l:\t%d - %m")
outputter = Log4r::StdoutOutputter.new('console', :formatter => format)
@logger.outputters = outputter
if @debug then
@logger.level = DEBUG
else
@logger.level = INFO
end
end
def debug(msg)
@logger.debug(msg)
end
def info(msg)
@logger.info(msg)
end
def warn(msg)
@logger.warn(msg)
end
def error(msg)
@logger.error(msg)
end
def level
@logger.level
end
def invoker
Thread.current[:callstack] ||= []
( Thread.current[:callstack][-2] || ['Kernel', 'main'] )
end
end
class CallingMethodLogger < Lgr
[:info, :debug, :warn, :error].each do |meth|
define_method(meth) { |msg| super("#{invoker[0]}::#{invoker[1]} - #{msg}") }
end
end
class WorkerX
def initialize(args = {})
@logger = CallingMethodLogger.new({:debug => args[:debug], :logger_type => 'WorkerX'})
end
def a_method
@logger.error("some error went down here")
# This prints out: "WorkerX::a_method - some error went down here"
end
end
w = WorkerX.new
w.a_method
I don't know how much, if any, the calls to the proc will affect the performance of an application; if it ends up being a concern, perhaps something not as intelligent about the calling class (like my old answer, below) will work better.
[EDIT: What follows is my old answer, referenced above.]
How about using extend? Here's a quick-and-dirty script I put together from your code to test it out; I had to reorder things to avoid errors, but the code is the same with the exception of LgrHelper (which I renamed CallingMethodLogger) and the second line of WorkerX's initializer:
#!/usr/bin/env ruby
module CallingMethodLogger
def info(msg)
super("#{@logger_type}::#{method_caller_name} - " + msg)
end
def debug(msg)
super("#{@logger_type}::#{method_caller_name} - " + msg)
end
def warn(msg)
super("#{@logger_type}::#{method_caller_name} - " + msg)
end
def error(msg)
super("#{@logger_type}::#{method_caller_name} - " + msg)
end
def method_caller_name
if /`(.*)'/.match(caller[1]) then # caller.first
$1
else
nil
end
end
end
class Lgr
require 'log4r'
include Log4r
def initialize(args = {}) # args: debug boolean, logger type
@debug = args[:debug]
@logger_type = args[:logger_type]
@logger = Log4r::Logger.new(@logger_type)
format = Log4r::PatternFormatter.new(:pattern => "%l:\t%d - %m")
outputter = Log4r::StdoutOutputter.new('console', :formatter => format)
@logger.outputters = outputter
if @debug then
@logger.level = DEBUG
else
@logger.level = INFO
end
end
def debug(msg)
@logger.debug(msg)
end
def info(msg)
@logger.info(msg)
end
def warn(msg)
@logger.warn(msg)
end
def error(msg)
@logger.error(msg)
end
def level
@logger.level
end
end
class WorkerX
def initialize(args = {})
@logger = Lgr.new({:debug => args[:debug], :logger_type => 'WorkerX'})
@logger.extend CallingMethodLogger
end
def a_method
@logger.error("some error went down here")
# This prints out: "WorkerX::a_method - some error went down here"
end
end
w = WorkerX.new
w.a_method
The output is:
ERROR: 2011-07-24 20:01:40 - WorkerX::a_method - some error went down here
The downside is, via this method, the caller's class name isn't automatically figured out; it's explicit based on the @logger_type passed into the Lgr instance. However, you may be able to use another method to get the actual name of the class--perhaps something like the call_stack gem or using Kernel#set_trace_func--see this thread.
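For anyone who wants to try the extend idea without Log4r installed, here is a minimal self-contained sketch. TinyLogger, CallerPrefix and Worker are hypothetical names, and the logger just collects lines in an array. It also swaps the caller regex for caller_locations and base_label, which need Ruby 2.0 or later, so it may not be usable on the Ruby version behind a Rails 2.3 app.

```ruby
# Hypothetical minimal logger that collects messages instead of printing them.
class TinyLogger
  attr_reader :lines

  def initialize(type)
    @type = type
    @lines = []
  end

  def error(msg)
    @lines << "ERROR: #{msg}"
  end
end

# Mixed into a single logger instance with extend; super reaches TinyLogger#error.
module CallerPrefix
  def error(msg)
    # caller_locations(1, 1) holds the frame that called this method.
    method_name = caller_locations(1, 1).first.base_label
    super("#{@type}::#{method_name} - #{msg}")
  end
end

class Worker
  attr_reader :logger

  def initialize
    @logger = TinyLogger.new('Worker')
    @logger.extend(CallerPrefix)
  end

  def a_method
    @logger.error('some error went down here')
  end
end

worker = Worker.new
worker.a_method
puts worker.logger.lines.first
```

As in the answer above, the class part of the prefix still comes from the type string passed to the logger, not from the actual caller.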