Rails/unicode issue - ruby-on-rails

I have a bit of my Ruby/Rails (Ruby 2.0.0p195, Rails 3.2.13) project that works as a proxy; that is, you pass it a URL, it goes out and fetches the page, and presents it to you. This generally works as expected, but it seems to munge certain characters (such as è).
A simplified version of the controller is this:
class HomeController < ApplicationController
  def geoproxy
    require 'net/http'
    require 'timeout'
    rawurl = CGI::unescape(params[:url])
    fixedurl = rawurl.gsub('\\', '%5C') # Escape backslashes... why oh why???!?
    r = nil;
    status = 200
    content_type = ''
    begin
      Timeout::timeout(15) { # Time, in seconds
        if request.get? then
          res = Net::HTTP.get_response(URI.parse(fixedurl))
          status = res.code # If there was an error, pass that code back to our caller
          @page = res.body.encode('UTF-8')
          content_type = res['content-type']
        end
      }
    rescue Timeout::Error
      @page = "TIMEOUT"
      status = 504 # 504 Gateway Timeout We're the gateway, we timed out. Seems logical.
    end
    render :layout => false, :status => status, :content_type => content_type
  end
end
The corresponding view is quite simple:
<%= raw @page %>
When I use this proxy to fetch XML containing an è (for example), I get the following error:
Encoding::UndefinedConversionError in HomeController#geoproxy
"\xE8" from ASCII-8BIT to UTF-8
This error occurs at the following line:
@page = res.body.encode('UTF-8')
If I remove the .encode(), the error is resolved, but my XML contains a placeholder instead of the è.
How can I get my project to display the XML properly?

Could you check whether the following code works for you? I was able to fix a similar problem of mine with it.
@page = res.body.force_encoding('Windows-1254').encode('UTF-8')
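A minimal sketch of why the answer's approach can work, assuming the remote server really sent a single-byte Western encoding. Net::HTTP hands back the body as raw bytes tagged ASCII-8BIT, so a direct encode('UTF-8') has no mapping for the byte 0xE8. Re-tagging the bytes with force_encoding first tells Ruby how to interpret them. The one-byte string below is a made-up stand-in for res.body; Windows-1254 is the answer's choice, and ISO-8859-1 or Windows-1252 map 0xE8 to è as well.

```ruby
# Stand-in for res.body: a single raw byte tagged ASCII-8BIT (binary).
raw = "\xE8".b

# Encoding straight from binary fails, as in the question.
error_class = nil
begin
  raw.encode('UTF-8')
rescue Encoding::UndefinedConversionError => e
  error_class = e.class
end

# Re-tag the bytes with the encoding they were actually written in,
# then transcode. dup leaves the original string untouched.
converted = raw.dup.force_encoding('Windows-1254').encode('UTF-8')

puts error_class
puts converted
puts converted.encoding
```

In a real proxy the charset would ideally come from the response's Content-Type header rather than being hard-coded.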

Related

How to catch a Rack RangeError in Rails 6

I have a Rails 6 app to which users can upload CSV files. Rails/Rack imposes a limit in the number of params that can be included in a request, and I've set this to a size larger than likely submissions to my app. However, I would like to return a friendly response if a too-large file is uploaded.
It looks like I need to add some custom middleware, to catch and rescue the error, but I can't get the code to work - the basic error is still raised without my rescue block being called.
The error from the server is:
Rack app error handling request { POST /[PATH_TO]/datasets }
#<RangeError: exceeded available parameter key space>
The code in my app/middleware/catch_errors.rb file is basically taken from a previous SO answer, where someone was catching ActionDispatch::ParamsParser::ParseError in JSON, but with my own code in the rescue block (which I realise may not work properly in this context, but that's not the issue right now):
class CatchErrors
  def initialize(_app)
    @app = _app
  end
  def call(_env)
    begin
      @app.call(_env)
    rescue RangeError => _error
      _error_output = "There were too many fields in the data you submitted: #{_error}"
      if env['HTTP_ACCEPT'] =~ /application\/html/
        Rails.logger.error("Caught RangeError: #{_error}")
        flash[:error_title] = 'Too many fields in your data'
        flash[:error_detail1] = _error_output
        render 'static_pages/error', status: :bad_request
      elsif env['HTTP_ACCEPT'] =~ /application\/json/
        return [
          :bad_request, { "Content-Type" => "application/json" },
          [ { status: :bad_request, error: _error_output }.to_json ]
        ]
      else
        raise _error
      end
    end
  end
end
I'm loading it in config/application.rb like this:
require_relative '../app/middleware/catch_errors'
...
config.middleware.use CatchErrors
I'm resetting the size limit for testing in app/initializers/rack.rb like this:
if Rack::Utils.respond_to?("key_space_limit=")
Rack::Utils.key_space_limit = 1
end
Any help gratefully received!
First, run this command to see all the middleware in the stack:
bin/rails middleware
config.middleware.use places your middleware at the bottom of the middleware stack, so it never wraps the middleware that raises the error and cannot rescue it. Try placing it at the top instead:
config.middleware.insert_before 0, CatchErrors
One more point: you may also need config.middleware.move_after, or even config.middleware.delete, for some other middleware. For instance, while tinkering I needed to add:
config.middleware.move_after CatchErrors, Rack::MiniProfiler
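Once the middleware sits high enough in the stack, the rescue block still has to return a plain Rack response, because flash and render only exist inside controllers. The following is a minimal sketch under that assumption. The class name CatchRangeErrors and the failing_app lambda are hypothetical and only stand in for the real application so the example runs without Rails.

```ruby
require 'json'

# Hypothetical middleware: rescues RangeError raised further down the
# stack and answers with a Rack triple [status, headers, body].
class CatchRangeErrors
  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  rescue RangeError => e
    message = "There were too many fields in the data you submitted: #{e.message}"
    body = { status: 400, error: message }.to_json
    # Rack expects an integer status, not a symbol such as :bad_request.
    [400, { 'Content-Type' => 'application/json' }, [body]]
  end
end

# Stand-in for the rest of the stack, raising the error from the question.
failing_app = ->(_env) { raise RangeError, 'exceeded available parameter key space' }

status, headers, body = CatchRangeErrors.new(failing_app).call({})
puts status
puts body.first
```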

Seeding Data from JSON API to Ruby on Rails APP

I'm getting a no implicit conversion of String into Integer error that has me stumped, and unable to import user records and seed my database with them.
So far I have no problem accessing the data, but receive an error referencing the '[]' on the line with User.find... on it
The code I'm using is as follows:
require 'net/http'
require 'uri'
require 'json'
require 'faker'
# This script imports APR user data from the Zendesk API and populates the database with it.
uri = URI.parse("https://blahsupport.zendesk.com/api/v2/users.json")
request = Net::HTTP::Get.new(uri)
request.content_type = "application/json"
request.basic_auth("blah@blah.com", "blahpass")
req_options = {
  use_ssl: uri.scheme == "https",
}
@response = Net::HTTP.start(uri.hostname, uri.port, req_options) do |http|
  http.request(request)
end
puts @response.body
puts @response.message
puts @response.code
info = @response.body
info.force_encoding("utf-8")
File.write('blahusers1.json', info)
puts "File Created Successfully!"
file = File.read('blahusers1.json')
users = JSON.load(file)
users.each do |a|
  User.find_or_create_by_zendesk_id(:zendesk_id => a['id'], :url => a['url'], :name => a['name'], :email => a['email'])
end
Any ideas on how I've gotten this error? Thank you for any help!
Edit:
Below is an example of the data being returned.
{"users":[{"id":333653859,"url":"https://blahblah.zendesk.com/api/v2/users/333653859.json","name":"Randy Blah","email":"randy@blah.com","created_at":"2014-08-06T14:31:24Z","updated_at":"2018-04-04T14:22:06Z","time_zone":"Pacific Time (US & Canada)","phone":null,"shared_phone_number":null,"photo":{"url":"https://aprtechsupport.zendesk.com/api/v2/attachments/68955389.json","id":68955389,"file_name":"Work.jpg","content_url":"https://aprtechsupport.zendesk.com/system/photos/6895/5389/Work.jpg","mapped_content_url":"https://blahblah.zendesk.com/system/photos/6895/5389/Work.jpg","content_type":"image/jpeg","size":2528,"width":80,"height":80,"inline":false,"thumbnails":[{"url":"https://blahblah.zendesk.com/api/v2/attachments/68955399.json","id":68955399,"file_name":"Work_thumb.jpg","content_url":"https://blahblah.zendesk.com/system/photos/6895/5389/Work_thumb.jpg","mapped_content_url":"https://blahblah.zendesk.com/system/photos/6895/5389/Work_thumb.jpg","content_type":"image/jpeg","size":2522,"width":32,"height":32,"inline":false}]},"locale_id":1,"locale":"en-US","organization_id":null,"role":"admin","verified":true,"external_id":null,"tags":[],"alias":"","active":true,"shared":false,"shared_agent":false,"last_login_at":"2018-04-04T14:21:44Z","two_factor_auth_enabled":null,"signature":"Thanks for contacting the helpdesk!\n-Randy","details":"","notes":"","role_type":null,"custom_role_id":null,"moderator":true,"ticket_restriction":null,"only_private_comments":false,"restricted_agent":false,"suspended":false,"chat_only":false,"default_group_id":21692179,"user_fields":{}}
The example data you posted has a root object whose users key contains the array of user objects. JSON.load therefore returns a Hash, and when you loop over it with users.each and a single block argument, a is a two-element [key, value] Array and not the user Hash you expected.
When you try to access an element of an Array using a String index, Ruby raises the exception: no implicit conversion of String into Integer
So, try changing
users = JSON.load(file)
to
users = JSON.load(file)['users']
to get it working like how you'd expect.
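A small sketch of the difference, using a made-up two-user payload in place of the Zendesk response. It reproduces the error by iterating the top-level Hash, then reads the names after drilling into the users key.

```ruby
require 'json'

# Made-up payload with the same shape as the API response: a root
# object whose "users" key holds the array of user hashes.
payload = '{"users":[{"id":1,"name":"Randy"},{"id":2,"name":"Sam"}]}'
data = JSON.parse(payload)

# Iterating the Hash itself yields [key, value] arrays, so a['id']
# calls Array#[] with a String and raises TypeError.
error_message = nil
begin
  data.each { |a| a['id'] }
rescue TypeError => e
  error_message = e.message
end

# Iterating the array under "users" yields the user hashes.
names = data['users'].map { |user| user['name'] }

puts error_message
puts names.inspect
```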

invalid URI - How to prevent, URI::InvalidURIError errors?

I got the following back from delayed_job:
[Worker(XXXXXX pid:3720)] Class#XXXXXXX failed with URI::InvalidURIError: bad URI(is not URI?): https://s3.amazonaws.com/cline-local-dev/2/attachments/542/original/mac-os-x[1].jpeg?AWSAccessKeyId=xxxxxxxx&Expires=1295403309&Signature=xxxxxxx%3D - 3 failed attempts
This is where the URI comes from in my app.
In my user_mailer I do:
@comment.attachments.each do |a|
  attachments[a.attachment_file_name] = open(a.authenticated_url()) {|f| f.read }
end
Then in my attachments model:
def authenticated_url(style = nil, expires_in = 90.minutes)
AWS::S3::S3Object.url_for(attachment.path(style || attachment.default_style), attachment.bucket_name, :expires_in => expires_in, :use_ssl => attachment.s3_protocol == 'https')
end
That being said, is there some type of URI.encode or parsing I can do to prevent a valid URI (I checked that the URL works in my browser) from raising an error and killing delayed_job in Rails 3?
Thank you!
Ruby has (at least) two modules for dealing with URIs.
URI is part of the standard library.
Addressable::URI is a separate gem; it is more comprehensive and claims to conform to the spec.
Parse a URL with either one, modify any parameters using the gem's methods, then convert it using to_s before passing it on, and you should be good to go.
I tried ' open( URI.parse(URI.encode( a.authenticated_url() ))) ' but that errored with OpenURI::HTTPError: 403 Forbidden
If you navigated to that page via a browser and it succeeded, then later failed going to it directly via code, it's likely there is a cookie or session state that is missing. You might need to use something like Mechanize, which will maintain that state while allowing you to navigate through a site.
EDIT:
require 'addressable/uri'
url = 'http://www.example.com'
uri = Addressable::URI.parse(url)
uri.query_values = {
:foo => :bar,
:q => '"one two"'
}
uri.to_s # => "http://www.example.com?foo=bar&q=%22one%20two%22"
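For the URL in the question, the characters that the standard library's URI.parse is likely objecting to are the square brackets in the file name. Below is a minimal sketch using only the standard library, assuming the brackets are the only problem. The helper escape_brackets and the shortened sample URL are made up for illustration. It percent-encodes the brackets in the path and leaves the query string alone, because re-escaping an already-encoded signature (the %3D) would change it, which may be what produced the 403. Whether S3 accepts the escaped path for a signed URL is an assumption that needs checking against the real bucket.

```ruby
require 'uri'

# Hypothetical helper: percent-encode square brackets in the path only,
# leaving the already-encoded query string untouched.
def escape_brackets(url)
  path, query = url.split('?', 2)
  path = path.gsub('[', '%5B').gsub(']', '%5D')
  query ? "#{path}?#{query}" : path
end

# Shortened stand-in for the signed S3 URL from the question.
url = 'https://s3.amazonaws.com/bucket/mac-os-x[1].jpeg?Expires=1295403309&Signature=abc%3D'

uri = URI.parse(escape_brackets(url))
puts uri.path
puts uri.query
```

Note that URI.encode was removed in Ruby 3.0, so on current Ruby versions an approach like this, or Addressable::URI, is needed anyway.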

code to ping websites works sometimes

I'm testing out a piece of code to ping a bunch of websites I own on a regular basis, to make sure they're up.
I'm using rails and so far I have this hideous test action that I'm using to try it out (see below).
The problem though, is that sometimes it works, and other times it won't ... sometimes it runs through the code just fine, other times, it seems to completely ignore the begin/rescue block ...
a. I need help figuring out what the problem is
b. And refactoring this to make it look respectable.
Your help is much appreciated.
edit 1: Here is the updated code, sorry it took so long, pastie.org was down since yesterday http://pastie.org/927201
It's still doing the same thing ... skipping the begin block (because it only updates up_check_time) ... however, if one of the sites times out, it actually updates everything (check_msg, code, etc.) correctly ... confusing, yeah?
require 'net/http'
require 'uri'
def ping
  @sites = NewsSource.all
  @sites.each do |site|
    if site.uri and !site.uri.empty?
      uri = URI.parse(site.uri)
      response = nil
      path = uri.path.blank? ? '/' : uri.path
      path = uri.query.blank? ? path : "#{path}?#{uri.query}"
      begin
        Net::HTTP.start(uri.host, uri.port) {|http|
          http.open_timeout = 30
          http.read_timeout = 30
          response = http.head(path)
        }
        if response.code.eql?('200') or response.code.eql?('301') or response.code.eql?('302')
          site.up = true
        else
          site.up = false
        end
        site.up_check_msg = response.message
        site.up_check_code = response.code
      rescue Errno::EBADF
      rescue Timeout::Error
        site.up = false
        site.up_check_msg = 'timeout'
        site.up_check_code = '408'
      end
      site.up_check_time = 0.seconds.ago
      site.save
    end
  end
end
You currently have an empty rescue block for Errno::EBADF so if that exception is raised then you will not be setting site.up to false.
Also, a couple of other minor improvements:
Instead of if site.uri and !site.uri.empty? you can use:
next if site.uri.nil? or site.uri.empty?
to skip that iteration of the each loop and avoid indenting the code by an additional level.
And:
if response.code.eql?('200') or response.code.eql?('301') or response.code.eql?('302')
site.up = true
else
site.up = false
end
can be written more concisely:
site.up = ['200', '301', '302'].include? response.code
If you tidy up the code with some of these tips then it might help narrow down the problem.
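Put together, the two tips might look like the sketch below. It runs without Rails or a network: the Site struct is a hypothetical stand-in for the NewsSource model, and FAKE_CODES replaces the real HEAD request with canned status codes.

```ruby
# Hypothetical stand-in for the NewsSource model.
Site = Struct.new(:uri, :up, :up_check_code)

# Status codes that count as "up", as in the question.
UP_CODES = %w[200 301 302].freeze

# Canned status codes instead of real HEAD requests (demo assumption).
FAKE_CODES = {
  'http://ok.example' => '200',
  'http://moved.example' => '301',
  'http://down.example' => '500'
}.freeze

sites = [
  Site.new('http://ok.example'),
  Site.new(nil),
  Site.new(''),
  Site.new('http://moved.example'),
  Site.new('http://down.example')
]

sites.each do |site|
  # Guard clause: skip sites without a URI instead of nesting the body.
  next if site.uri.nil? || site.uri.empty?

  code = FAKE_CODES.fetch(site.uri)
  site.up = UP_CODES.include?(code)
  site.up_check_code = code
end

sites.each { |site| puts "#{site.uri.inspect}: #{site.up.inspect}" }
```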
Here's a snippet from one of my programs, maybe it helps:
urls.each_with_index do |url, idx|
print "Processing URL #%04d: " % (idx+1)
uri = URI.parse(url)
response = nil
begin
Net::HTTP.start(uri.host, uri.port) do |http|
response = http.head(uri.path.size > 0 ? uri.path : "/")
end
rescue => e
puts "#{e.message} - #{url}"
next
end
# handle redirects
if response.is_a?(Net::HTTPRedirection)
new_uri = URI.parse(response['location'])
puts "URI redirects to #{new_uri}"
next
end
puts case response.code
when '200' then ...
when '404' then ...
else ...
end
end
The only thing that I can think of is that you are getting some other exception in your begin block. Since you are only explicitly rescuing Errno::EBADF, Timeout::Error it would appear that your begin and rescue got skipped. You might be able to verify this by getting rid of Errno::EBADF, Timeout::Error and just having a plain rescue, then put the following in your rescue block
logger.info(">>Exception was: #{$!}")
Then look in your logs to see what exceptions you are getting.
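A self-contained sketch of that debugging step, assuming a plain Logger writing to a StringIO in place of the Rails logger. The raised Errno::ECONNREFUSED is only an example of an exception that the original rescue clauses would not have caught. Logging the exception class alongside the message makes it easier to decide which exceptions to rescue explicitly afterwards.

```ruby
require 'logger'
require 'stringio'

# Stand-in for the Rails logger so the example runs on its own.
log_output = StringIO.new
logger = Logger.new(log_output)

begin
  # Simulate a failure other than Errno::EBADF or Timeout::Error.
  raise Errno::ECONNREFUSED
rescue StandardError => e
  # Interpolation converts the exception to text; concatenating an
  # exception onto a String with + would raise a TypeError instead.
  logger.info(">>Exception was: #{e.class}: #{e.message}")
end

puts log_output.string
```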
If you are monitoring your servers, why not use Nagios? It's free and also has some Ruby support.
EDIT:
Ruby GEM: http://hobodave.com/2010/01/10/simple-nagios-probes-in-ruby/

Mechanize with FakeWeb

I'm using Mechanize to extract the links from the page.
To speed up development, I'm using FakeWeb to return canned responses instantly, so I don't have to wait on the network every time I run the code.
tags_url = "http://website.com/tags/"
FakeWeb.register_uri(:get, tags_url, :body => "tags.txt")
agent = WWW::Mechanize.new
page = agent.get(tags_url)
page.links.each do |link|
puts link.text.strip
end
When I run the above code, it says:
nokogiri_test.rb:33: undefined method `links' for #<WWW::Mechanize::File:0x9a886e0> (NoMethodError)
After inspecting the class of the page object
puts page.class # => File
If I don't fake out the tags_url, it works since the page class is now Page
puts page.class # => Page
So, how can I use the fakeweb with mechanize to return Page instead of File object?
Use FakeWeb to replay a prefetched HTTP request:
tags_url = "http://website.com/tags/"
request = `curl -is #{tags_url}`
FakeWeb.register_uri(:get, tags_url, :response => request)
agent = WWW::Mechanize.new
page = agent.get(tags_url)
page.links.each do |link|
puts link.text.strip
end
Calling curl with the -i flag will include headers in the response.
You can easily fix that by adding the option :content_type => "text/html" to your FakeWeb.register_uri call.

Resources