Not sure how to fix Rubocop's Lint/UriEscapeUnescape warning
Tried replacing URI with CGI thinking that was the "drop in" replacement but that blew up the test suite.
Here's the error followed by the line of code where URI is being used:
app/models/media_file.rb:76:5: W: Lint/UriEscapeUnescape: URI.escape method is obsolete and should not be used. Instead, use CGI.escape, URI.encode_www_form or URI.encode_www_form_component depending on your specific use case.
URI ...
^^^
# app/models/media_file.rb
...
def cdn_url(format: nil)
if format.nil?
"#{s3_config.cloudfront_endpoint}/#{escape_url(key)}"
elsif converted_urls.with_indifferent_access[format.to_s]
filename = converted_urls.with_indifferent_access[format.to_s]
if URI.parse(escape_url(filename)).host
filename
else
"#{s3_config.cloudfront_endpoint}/#{escape_url(filename)}"
end
else
converted(url)
end
end
...
private
def escape_url(url)
URI
.escape(url)
.gsub(/\(/, '%28')
.gsub(/\)/, '%29')
.gsub(/\[/, '%5B')
.gsub(/\]/, '%5D')
end
EDIT: Adding sample output of strings escaped with URI and CGI:
url: images/medium/test-image.jpg
URI.escape(url): images/medium/test-image.jpg
CGI.escape(url): images%2Fmedium%2Ftest-image.jpg
url: images/medium/test-image.jpg
URI.escape(url): images/medium/test-image.jpg
CGI.escape(url): images%2Fmedium%2Ftest-image.jpg
It would appear CGI is not a drop in replacement for URI as the listing error might have you believe. Thoughts?
Encounter with the same problem, got it fixed by using addressable lib.
escaped_query = URI.escape(search,
Regexp.new("[^#{URI::PATTERN::UNRESERVED}]"))
#W: Lint/UriEscapeUnescape: URI.escape method is obsolete and should not be used. Instead, use CGI.escape, URI.encode_www_form or URI.encode_www_form_component depending on your specific use case.
Solved by:
gem addressable gem in gemspec or Gemfile.
gem 'addressable', '~> 2.7'
require addressable/uri
Add appropriate.
escaped_query = Addressable::URI.encode_component(search, Addressable::URI::CharacterClasses::QUERY)
Related
I have the following header:
From: =?iso-8859-1?Q?Marta_Falc=E3o?= <marta.falcao#example.com.br>
I can easily split out the stuff before the <, which leaves me with
"=?iso-8859-1?Q?Marta_Falc=E3o?="
What can I use to turn this into "Marta Falcão"?
Using the newer Mail gem:
Mail::Encodings.value_decode(str) or
Mail::Encodings.unquote_and_convert_to(str, to_encoding)
Thanks to Roland Illig for his comment, which led me to two options:
install rfc2047-ruby and call Rfc2047.decode(header)
install TMail and call TMail::Unquoter.unquote_and_convert_to(header, 'utf-8') or better yet TMail::Address.parse(header).friendly, the latter of which strips out the <email address> part
Use Ruby to implement RFC 2047 isn't hard:
module Rfc2047
TOKEN = /[\041\043-\047\052\053\055\060-\071\101-\132\134\136\137\141-\176]+/.freeze
ENCODED_TEXT = /[\041-\076\100-\176]+/.freeze
ENCODED_WORD = /=\?(?<charset>#{TOKEN})\?(?<encoding>[QB])\?(?<encoded_text>#{ENCODED_TEXT})\?=/i.freeze
class << self
def encode(input)
"=?#{input.encoding}?B?#{[input].pack('m0')}?="
end
def decode(input)
match_data = ENCODED_WORD.match(input)
raise ArgumentError if match_data.nil?
charset, encoding, encoded_text = match_data.captures
decoded =
case encoding
when 'Q', 'q' then encoded_text.unpack1('M')
when 'B', 'b' then encoded_text.unpack1('m')
end
decoded.force_encoding(charset)
end
end
end
Rfc2047.decode '=?iso-8859-1?Q?Marta_Falc=E3o?=' # => Marta_Falcão
Update
mikel/mail is currently having an encoding issue which might not decode the string correctly.
If that really bothers you, you can try new_rfc_2047:
$ gem install new_rfc_2047
$ ruby -rrfc_2047 -e 'puts Rfc2047.decode "From: =?iso-8859-1?Q?Marta_Falc=E3o?= <marta.falcao#example.com.br>"'
From: Marta Falcão <marta.falcao#example.com.br>
Since the source code of mikel/mail is a little too complicated for me to do the modification, I just made my own gem for this.
Gem source is here: https://github.com/tonytonyjan/rfc_2047/
I found a gem and I want to try it. The procedure of gem connecting is described here (in Russian language).
In which file should I write this code, and how can I use this variable in the view?
require 'builder'
xml = Builder::XmlMarkup.new(target: STDOUT, indent: 2)
xml.person(type: "programmer") do
xml.name do
xml.first "Dave"
xml.last "Thomas"
end
xml.location "Texas"
xml.preference("ruby")
end
I have the next method call:
Formatting.git_log_to_html(`git log --no-merges master --pretty=full #{interval}`)
The value of interval is something like release-20130325-01..release-20130327-04.
The git_log_to_html ruby method is the next (I am only pasting the line what raises the error):
module Formatting
def self.git_log_to_html(git_log)
...
git_log.gsub(/^commit /, "COMMIT_STARTcommit").split("COMMIT_STARTcommit").each do |commit|
...
end
end
This used to work, but actually I checked that gsub is raising an "invalid byte sequence in UTF-8" error.
Could you help to understand why and how can I fix it? :/
Here is the output of git_log:
https://dl.dropbox.com/u/42306424/output.txt
For some reason, this command:
git log --no-merges master --pretty=full #{interval}
is giving you a result that is not encoded in UTF-8, it may be that your computer is working with a different charset, try the following:
module Formatting
def self.git_log_to_html(git_log)
...
git_log.force_encoding("utf8").gsub(/^commit /, "COMMIT_STARTcommit").split("COMMIT_STARTcommit").each do |commit|
...
end
end
I'm not sure if that will work, but you can try.
If that doesn't work, you can check ruby iconv to detect the charset and encode it on utf-8: http://www.ruby-doc.org/stdlib-2.0/libdoc/iconv/rdoc/
Based on the file you added on the comment, I did:
require 'open-uri'
content = open('https://dl.dropbox.com/u/42306424/output.txt').read
content.gsub(/^commit /, "COMMIT_STARTcommit").split("COMMIT_STARTcommit")
and worked nice without any kind of troubles
btw, you can try:
require 'iconv'
module Formatting
def self.git_log_to_html(git_log)
...
git_log = Iconv.conv 'UTF-8', 'iso8859-1', git_log
git_log.gsub(/^commit /, "COMMIT_STARTcommit").split("COMMIT_STARTcommit").each do |commit|
...
end
end
but you should really detect the charset of the string before attempting a conversion to utf-8.
I am trying to parse og meta tags using the HTTParty gem using this code:
link = http://www.usatoday.com/story/gameon/2013/01/08/nfl-jets-tony-sparano-fired/1817037/
# link = http://news.yahoo.com/chicago-lottery-winners-death-ruled-homicide-181627271.html
resp = HTTParty.get(link)
ret_body = resp.body
# title
og_title = ret_body.match(/\<[Mm][Ee][Tt][Aa] property\=\"og:title\"\ content\=\"(.*?)\"\/\>/)
og_title = og_title[1].to_s
The problem is that it worked on some sites (yahoo!) but not others (usa today)
Don't parse HTML with regular expressions, because they're too fragile for anything but the simplest problems. A tiny change to the HTML can break the pattern, causing you to begin a slow battle of maintaining an ever expanding pattern. It's a war you won't win.
Instead, use a HTML parser. Ruby has Nokogiri, which is excellent. Here's how I'd do what you want:
require 'nokogiri'
require 'httparty'
%w[
http://www.usatoday.com/story/gameon/2013/01/08/nfl-jets-tony-sparano-fired/1817037/
http://news.yahoo.com/chicago-lottery-winners-death-ruled-homicide-181627271.html
].each do |link|
resp = HTTParty.get(link)
doc = Nokogiri::HTML(resp.body)
puts doc.at('meta[property="og:title"]')['content']
end
Which outputs:
Jets fire offensive coordinator Tony Sparano
Chicago lottery winner's death ruled a homicide
Perhaps I can offer an easier solution? Check out the OpenGraph gem.
It's a simple library for parsing Open Graph protocol information from web sites and should solve your problem.
Solution:
og_title = ret_body.match(/\<[Mm][Ee][Tt][Aa] property\=\"og:title\"\ content\=\"(.*?)\"[\s\/\>|\/\>]/)
og_title = og_title[1].to_s
Trailing whitespace messed up the parsing so make sure to check for that. I added an OR clause to the regex to allow for both trailing and non trailing whitespace.
How do I URI::encode a string like:
\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a
to get it in a format like:
%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A
as per RFC 1738?
Here's what I tried:
irb(main):123:0> URI::encode "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a"
ArgumentError: invalid byte sequence in UTF-8
from /usr/local/lib/ruby/1.9.1/uri/common.rb:219:in `gsub'
from /usr/local/lib/ruby/1.9.1/uri/common.rb:219:in `escape'
from /usr/local/lib/ruby/1.9.1/uri/common.rb:505:in `escape'
from (irb):123
from /usr/local/bin/irb:12:in `<main>'
Also:
irb(main):126:0> CGI::escape "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a"
ArgumentError: invalid byte sequence in UTF-8
from /usr/local/lib/ruby/1.9.1/cgi/util.rb:7:in `gsub'
from /usr/local/lib/ruby/1.9.1/cgi/util.rb:7:in `escape'
from (irb):126
from /usr/local/bin/irb:12:in `<main>'
I looked all about the internet and haven't found a way to do this, although I am almost positive that the other day I did this without any trouble at all.
str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a".force_encoding('ASCII-8BIT')
puts CGI.escape str
=> "%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A"
Nowadays, you should use ERB::Util.url_encode or CGI.escape. The primary difference between them is their handling of spaces:
>> ERB::Util.url_encode("foo/bar? baz&")
=> "foo%2Fbar%3F%20baz%26"
>> CGI.escape("foo/bar? baz&")
=> "foo%2Fbar%3F+baz%26"
CGI.escape follows the CGI/HTML forms spec and gives you an application/x-www-form-urlencoded string, which requires spaces be escaped to +, whereas ERB::Util.url_encode follows RFC 3986, which requires them to be encoded as %20.
See "What's the difference between URI.escape and CGI.escape?" for more discussion.
str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a"
require 'cgi'
CGI.escape(str)
# => "%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A"
Taken from #J-Rou's comment
I was originally trying to escape special characters in a file name only, not on the path, from a full URL string.
ERB::Util.url_encode didn't work for my use:
helper.send(:url_encode, "http://example.com/?a=\11\15")
# => "http%3A%2F%2Fexample.com%2F%3Fa%3D%09%0D"
Based on two answers in "Why is URI.escape() marked as obsolete and where is this REGEXP::UNSAFE constant?", it looks like URI::RFC2396_Parser#escape is better than using URI::Escape#escape. However, they both are behaving the same to me:
URI.escape("http://example.com/?a=\11\15")
# => "http://example.com/?a=%09%0D"
URI::Parser.new.escape("http://example.com/?a=\11\15")
# => "http://example.com/?a=%09%0D"
You can use Addressable::URI gem for that:
require 'addressable/uri'
string = '\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a'
Addressable::URI.encode_component(string, Addressable::URI::CharacterClasses::QUERY)
# "%5Cx12%5Cx34%5Cx56%5Cx78%5Cx9a%5Cxbc%5Cxde%5Cxf1%5Cx23%5Cx45%5Cx67%5Cx89%5Cxab%5Cxcd%5Cxef%5Cx12%5Cx34%5Cx56%5Cx78%5Cx9a"
It uses more modern format, than CGI.escape, for example, it properly encodes space as %20 and not as + sign, you can read more in "The application/x-www-form-urlencoded type" on Wikipedia.
2.1.2 :008 > CGI.escape('Hello, this is me')
=> "Hello%2C+this+is+me"
2.1.2 :009 > Addressable::URI.encode_component('Hello, this is me', Addressable::URI::CharacterClasses::QUERY)
=> "Hello,%20this%20is%20me"
Code:
str = "http://localhost/with spaces and spaces"
encoded = URI::encode(str)
puts encoded
Result:
http://localhost/with%20spaces%20and%20spaces
I created a gem to make URI encoding stuff cleaner to use in your code. It takes care of binary encoding for you.
Run gem install uri-handler, then use:
require 'uri-handler'
str = "\x12\x34\x56\x78\x9a\xbc\xde\xf1\x23\x45\x67\x89\xab\xcd\xef\x12\x34\x56\x78\x9a".to_uri
# => "%124Vx%9A%BC%DE%F1%23Eg%89%AB%CD%EF%124Vx%9A"
It adds the URI conversion functionality into the String class. You can also pass it an argument with the optional encoding string you would like to use. By default it sets to encoding 'binary' if the straight UTF-8 encoding fails.
If you want to "encode" a full URL without having to think about manually splitting it into its different parts, I found the following worked in the same way that I used to use URI.encode:
URI.parse(my_url).to_s