Getting errors when scraping data with Kimurai (Ruby on Rails)

FATAL -- web_scrapper_spider: Spider: stopped: {:spider_name=>"web_scrapper_spider", :status=>:failed, :error=>"#<NameError: uninitialized constant URI::HTTP\n\n raise InvalidUrlError, "Requested url is invalid: #{url}" unless URI.parse(url).kind_of?(URI::HTTP)\n >", :environment=>"development", :start_time=>2022-11-17 19:03:32.418101772 +0530, :stop_time=>2022-11-17 19:03:32.420534612 +0530, :running_time=>"0s", :visits=>{:requests=>0, :responses=>0}, :items=>{:sent=>0, :processed=>0}, :events=>{:requests_errors=>{}, :drop_items_errors=>{}, :custom=>{}}}
~/.rvm/gems/ruby-3.1.2/gems/kimurai-1.4.0/lib/kimurai/base.rb:194:in `request_to': uninitialized constant URI::HTTP (NameError)
raise InvalidUrlError, "Requested url is invalid: #{url}" unless URI.parse(url).kind_of?(URI::HTTP)
^^^^^^
from ~/.rvm/gems/ruby-3.1.2/gems/kimurai-1.4.0/lib/kimurai/base.rb:128:in `block in crawl!'
require 'open-uri'
require 'nokogiri'
require 'kimurai'
class WebScrapper < Kimurai::Base
  @name = "web_scrapper_spider"
  @engine = :mechanize
  @start_urls = ["https://metaruby.com/"]
  @config = {
    user_agent: "Chrome/68.0.3440.84"
  }
  def parse(response, url:, data: {})
    blogs = []
    response.xpath("//table[@class='topic-list']//tbody//tr").each do |tr|
      scrapped_data = {
        title: tr.at('td[1]//span').text,
        category: tr.at('td[1]//div//span').text,
        date: tr.at('td[3]').text.strip
      }
      blogs << scrapped_data
      save_to "results.json", scrapped_data.as_json, format: :json
    end
  end
end

From the logs posted in the question, it is clear that you are using Ruby 3.1.2.
The Kimurai gem you are using has not been maintained since 2020, so it is not compatible with Ruby 3.x, which separated positional and keyword arguments starting with 3.0.
Refer to Kimurai issue #66 for more details and updates on the reported issue.
For now, you can use Kimurai with earlier Ruby versions.
I tried your code with Ruby 2.x and found no issues with it.
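As a rough illustration of the keyword-argument change mentioned above (this is not Kimurai's own code; the method only borrows the shape of the spider's parse signature, and the values are made up), the following sketch shows a call style that worked on Ruby 2.x and raises on Ruby 3.x:

```ruby
# Hypothetical method with the same signature shape as the spider's `parse`.
def parse(response, url:, data: {})
  [response, url, data]
end

opts = { url: "https://metaruby.com/", data: {} }

# An explicit double-splat is accepted on both Ruby 2.x and Ruby 3.x.
explicit = parse("body", **opts)

# Passing the hash positionally relied on implicit hash-to-keyword
# conversion, which Ruby 3.0 removed, so this raises ArgumentError there.
implicit =
  begin
    parse("body", opts)
  rescue ArgumentError => e
    "ArgumentError: #{e.message}"
  end

puts explicit.inspect
puts implicit.inspect
```

A gem written before Ruby 3.0 that passes option hashes positionally would need each such call site updated, which is why running it on an older Ruby is the simpler workaround.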

Related

private method `open' called for URI:Module ERROR

I'm working with a gem, but it's throwing an error on an older build:
private method `open' called for URI:Module
Extracted source (around line #135):
  url = construct_url(path)
  #URI::open(url, read_timeout: 600).read
  URI.open(url, read_timeout: 600).read
rescue OpenURI::HTTPError => e
  if error = JSON.load(e.io.read)["error"]
    puts "server returns error for url: #{url}"
I'm running Rails 4.2.6 on Ruby 2.3.8p459.
I'm a LOT out of my depth here :(
This is the code that triggers the error in the gem.
I unpacked and rebuilt the gem with the commented-out line above, as I can't find documentation. Both URI::open and URI.open throw the same error: private method called.
require 'google_search_results'
require 'open-uri'
params = {
  q: @q,
  location: "United Kingdom",
  hl: "en",
  gl: "uk",
  google_domain: "google.co.uk",
  api_key: ENV["google_search_api_key"],
  num: 20
}
search = GoogleSearch.new(params)
@hash_results = search.get_hash
I know it's to do with the version of Rails / Ruby I'm running, but I don't know where to look or what terminology to use in my question.
You can call the method like that from Ruby 2.5 onwards, where URI.open is public, but on 2.3 you should just use:
open(url, read_timeout: 600).read
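If the same code has to run on both old and new Rubies, one option is to check at runtime which entry point is public. This is only a sketch: read_url is a hypothetical helper name, not part of the gem or of open-uri.

```ruby
require 'open-uri'

# Hypothetical helper: use URI.open where it is public, otherwise fall back
# to the Kernel#open that open-uri patches on older Rubies.
def read_url(url, **options)
  if URI.respond_to?(:open)
    # respond_to? ignores private methods, so this branch is skipped on
    # Rubies where URI.open is only the private Kernel#open.
    URI.open(url, **options).read
  else
    open(url, **options).read
  end
end
```

With this in place the gem's call would become read_url(url, read_timeout: 600).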

How to use SOAP service with xml in Rails (EU VAT number check)

I'd like to add a method in my Rails application that checks the validity of a VAT number using the EU's VIES system: http://ec.europa.eu/taxation_customs/vies/technicalInformation.html
I'm pretty new to programming in Rails, and the instructions there use XML, so I have trouble figuring this out. How should I include the code from that website in my Rails application?
In other words, what should the validate_vat(country, vatnumber) method below look like and how to process the response received from the SOAP service?
def vatsubmission
  @organization = Organization.find(params[:id])
  @organization.update_attributes(vat_params)
  @organization.validate_vat(@organization.country, @organization.vatnumber) if (@organization.vatnumber? && @organization.vatnumber?)
  # Process response
  if valid == false
    @organization.update_attributes(valid_vat: false)
    flash.now[:danger] = "False VAT number"
    render ...
  elsif valid == true
    @organization.update_attributes(valid_vat: true)
    flash.now[:success] = "VAT number validated"
    render ...
  else
    flash.now[:danger] = "VAT number could not be validated"
    render ...
  end
end
def validate_vat(country, vatnumber)
  ??
end
Update: I've added gem 'savon', '2.11.1' to my gemfile. In my controller I have:
def update
  @organization = Organization.find(params[:id])
  if @organization.check_valid == true
    @organization.update_attributes(validnr: true)
  else
    @organization.update_attributes(validnr: false)
  end
end
And I have added the following model method:
require 'savon'
def check_valid
  debugger
  if ["DK", "CY", "etc"].include? self.country
    client = Savon.client(wsdl: 'http://ec.europa.eu/taxation_customs/vies/checkVatService.wsdl')
    resp = client.call :check_vat do
      message country_code: self.country, vat_number: self.vatnr
    end
    data = resp.to_hash[:check_vat_response]
    data[:valid]
  end
end
Error: The line message country_code: self.country, vat_number: self.vatnr fails with the error message: wrong number of arguments (1 for 2). I checked with the debugger, and both self.country and self.vatnr have values. What am I doing wrong?
For working with SOAP from Ruby I used the excellent Savon gem.
With Savon v2, working code looks like this:
require 'savon'
client = Savon.client(wsdl: 'http://ec.europa.eu/taxation_customs/vies/checkVatService.wsdl')
resp = client.call :check_vat do
  message country_code: 'AT', vat_number: '123'
end
data = resp.to_hash[:check_vat_response]
data[:valid] #=> false :)
Note Savon v3 is still in preparation.
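Savon v2 also accepts the message as an option to call instead of a block. Inside the block form Savon may evaluate the block in its own context, so names such as self and message might not resolve to the model's own methods. Whether that is what causes the "wrong number of arguments" error in the question is an assumption. A minimal sketch of the option form follows, with vat_valid? as a hypothetical helper name and the client passed in, so the snippet itself does not need a live SOAP endpoint:

```ruby
# `client` is assumed to be a Savon v2 client built as in the answer above:
#   client = Savon.client(wsdl: 'http://ec.europa.eu/taxation_customs/vies/checkVatService.wsdl')
# Hypothetical helper: returns the :valid flag from the checkVat response.
def vat_valid?(client, country_code, vat_number)
  response = client.call(
    :check_vat,
    message: { country_code: country_code, vat_number: vat_number }
  )
  response.to_hash[:check_vat_response][:valid]
end
```

In the model from the question this could be called as vat_valid?(client, country, vatnr), with both values read before the request is built.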
I've just started using the Valvat gem for this and it has worked beautifully so far!

'(GMT-05:00) EST' option unexpectedly included in time_zone_select dropdown

(With Rails 3.2.20, ruby 1.9.3)
On my production website, the '(GMT-05:00) EST' option is unexpectedly included in the time_zone_select dropdown.
This time zone does not exist in ActiveSupport::TimeZone::MAPPING. It is injected in ActiveSupport::TimeZone.zones_map:
def zones_map
  @zones_map ||= begin
    new_zones_names = MAPPING.keys - lazy_zones_map.keys
    new_zones = Hash[new_zones_names.map { |place| [place, create(place)] }]
    lazy_zones_map.merge(new_zones)
  end
end
on the line lazy_zones_map.merge(new_zones) because lazy_zones_map contains the "EST" zone.
It does not happen on my development box and does not happen in my Rails production console: ActiveSupport::TimeZone.send :lazy_zones_map returns {"UTC"=>(GMT+00:00) UTC}.
But when I put in my view in production
<%= ActiveSupport::TimeZone.send :lazy_zones_map %>
I get {"UTC"=>(GMT+00:00) UTC, "EST"=>(GMT-05:00) EST}. I don't understand why it happens, any idea?
The #lazy_zones_map definition is
def lazy_zones_map
  require_tzinfo
  @lazy_zones_map ||= Hash.new do |hash, place|
    hash[place] = create(place) if MAPPING.has_key?(place)
  end
end
EDIT: the same version of tzinfo is used in production and development, 0.3.44 (I've checked with the command Gem.loaded_specs["tzinfo"]).
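The question leaves the cause open. As a minimal sketch of the mechanism only (this is not the actual ActiveSupport code; ZoneRegistry and the string values are hypothetical stand-ins), the following shows how a memoized hash with a guarded default block behaves. A lookup of a key outside MAPPING stores nothing, but any explicit write to the shared hash persists for the life of the process and would later be picked up by a merge like the one in zones_map. Whether something in the production app writes to the hash this way is an assumption.

```ruby
# Stand-in for ActiveSupport::TimeZone::MAPPING (hypothetical contents).
MAPPING = { "UTC" => "Etc/UTC" }.freeze

class ZoneRegistry
  # Memoized at class level, so the hash lives as long as the process does.
  def self.lazy_zones_map
    @lazy_zones_map ||= Hash.new do |hash, place|
      hash[place] = "zone(#{place})" if MAPPING.key?(place)
    end
  end
end

ZoneRegistry.lazy_zones_map["UTC"]                  # in MAPPING: stored on first lookup
ZoneRegistry.lazy_zones_map["EST"]                  # not in MAPPING: nothing is stored
keys_after_lookups = ZoneRegistry.lazy_zones_map.keys

# An explicit write from anywhere in the process bypasses the guard.
ZoneRegistry.lazy_zones_map["EST"] ||= "zone(EST)"
keys_after_write = ZoneRegistry.lazy_zones_map.keys

puts keys_after_lookups.inspect
puts keys_after_write.inspect
```

A fresh console starts with an empty hash, while a long-running server process keeps whatever earlier requests put there, which would be consistent with the console and the view reporting different contents.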

ruby net_dav sample puts

I'm trying to put a file on a site with WebDAV, using the net_dav Ruby gem.
When I follow the example, I get a nil exception.
#### GEMS
require 'rubygems'
begin
  gem "net_dav"
rescue LoadError
  system("gem install net_dav")
  Gem.clear_paths
end
require 'net/dav'
uri = URI('https://staging.web.mysite')
user = "dave"
pasw = "correcthorsebatterystaple"
dav = Net::DAV.new(uri, :curl => false)
dav.verify_server = false
dav.credentials(user, pasw)
cargo = ("testing.txt")
File.open(cargo, "rb") { |stream|
  dav.put(uri.path +'/'+ cargo, stream, File.size(cargo))
}
When I run this I get
`digest_auth': can't convert nil into String (TypeError)
This relates to line 197 in my nav.rb file.
request_digest << ':' << params['nonce']
So what I'm wondering is: what step did I not add?
Is there a reasonable example of the correct use of this gem? Something that does something that works would be sweet :)
SIDE QUESTION: Is this the correct gem to use for WebDAV? It seems to be an old, unmaintained gem; perhaps there's something more widely used to accomplish the task?
Try referencing the hash with a symbol rather than a string, i.e.
request_digest << ':' << params[:nonce]
In a simple test
baz = "baz"
params = {:foo => "bar"}
baz << ':' << params['foo']
results in the same error as you're getting.
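The test above can be made runnable as follows. The hash contents are made up for illustration, and on current Rubies the message reads "no implicit conversion of nil into String" rather than "can't convert nil into String":

```ruby
# Hypothetical params hash keyed by symbols, as in the simple test above.
params = { :nonce => "abc123" }

# Symbol lookup finds the value, so the append succeeds.
with_symbol = "user".dup << ':' << params[:nonce]

# String lookup returns nil, and String#<< cannot append nil.
with_string =
  begin
    "user".dup << ':' << params['nonce']
  rescue TypeError => e
    e.class
  end

puts with_symbol
puts with_string
```

The same TypeError would also appear if the key type matches but the server's response simply contains no nonce, so it is worth inspecting params before changing the lookup.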

ActionMailer 3 error - undefined method `encode!' for "Welcome":String

I'm getting this error while sending mail to the registered user in Rails 3:
undefined method 'encode!' for "Welcome":String
I have the following code
@content = content
mail(:to => content[:email], :subject => "test")
If there is a subject, the above error message is displayed. If I remove the subject content:
@content = content
mail(:to => content[:email], :subject => "")
no error is raised and the mail is sent without a subject.
I'm using:
Rails version 3.0.1
action mailer 3.0.1
The mail gem checks whether the Encoding global constant is defined. If it is defined, by any gem or by your code, it calls encode! on the field value. Here is this call from the mail gem's UnstructuredField class:
def encode(value)
  value.encode!(charset) if defined?(Encoding) && charset
  (value.not_ascii_only? ? [value].pack("M").gsub("=\n", '') : value).gsub("\r", "=0D").gsub("\n", "=0A")
end
For me it was mail subject, a String, so I monkey patched String:
class String
  def encode!(value)
    #Do any encoding or simply return it
    value
  end
end
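Note that the patch above returns its argument (the charset) rather than the string, and if it is loaded on Ruby 1.9 or later it replaces the real String#encode!. A slightly more defensive variant, offered only as a sketch, defines the fallback just when the method is missing:

```ruby
class String
  # Only add the no-op fallback when the real String#encode! is missing
  # (Ruby 1.8); on Ruby 1.9+ the built-in method is left untouched.
  unless method_defined?(:encode!)
    def encode!(*_args)
      self
    end
  end
end

subject = "Welcome".dup
subject.encode!("UTF-8")
puts subject
```

On Ruby 1.8 this makes encode! a no-op that returns the string itself; on newer Rubies the guard skips the definition entirely.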
Try using Ruby 1.9.
I got this error while using Devise with Rails 3.0.3 and Ruby 1.8.7.
I migrated to Ruby 1.9 and it worked like a charm.
