URI encoding not working - ruby-on-rails

In a Rails app, I need to parse URIs:
a = 'some file name.txt'
URI(URI.encode(a)) # works
b = 'some filename with :colon in it.txt'
URI(URI.encode(b)) # fails URI::InvalidURIError: bad URI(is not URI?):
How can I safely pass a file name to URI that contains special characters? Why doesn't encode work on colon?

URI.escape (or encode) takes an optional second parameter. It's a Regexp matching all symbols that should be escaped. To escape all non-word characters you could use:
URI.encode('some filename with :colon in it.txt', /\W/)
#=> "some%20filename%20with%20%3Acolon%20in%20it%2Etxt"
There are two predefined patterns (strings holding character-class fragments, not Regexp objects) that can be used to build such a regexp:
URI::PATTERN::UNRESERVED #=> "\\-_.!~*'()a-zA-Z\\d"
URI::PATTERN::RESERVED #=> ";/?:#&=+$,\\[\\]"
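Note that URI.encode and URI.escape were removed in Ruby 3.0, so the calls above only run on older Rubies. A minimal sketch of the same idea without them, assuming a hypothetical helper named escape_component (not part of the uri library) that percent-encodes every byte outside the RFC 3986 unreserved set:

```ruby
require 'uri'

# Hypothetical helper (not part of the uri library): percent-encode every
# byte that is not an RFC 3986 unreserved character (A-Z a-z 0-9 - . _ ~).
def escape_component(str)
  str.gsub(/[^A-Za-z0-9\-._~]/) do |char|
    char.bytes.map { |byte| format('%%%02X', byte) }.join
  end
end

name = 'some filename with :colon in it.txt'
encoded = escape_component(name)
puts encoded
# prints: some%20filename%20with%20%3Acolon%20in%20it.txt

uri = URI(encoded)   # parses without raising URI::InvalidURIError
puts uri.to_s
```

Because the colon is encoded as %3A, URI no longer tries to read the text before it as a scheme.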

require 'uri'
url = "file1:abc.txt"
p URI.encode_www_form_component url
--output:--
"file1%3Aabc.txt"
p URI(URI.encode_www_form_component url)
--output:--
#<URI::Generic:0x000001008abf28 URL:file1%3Aabc.txt>
p URI(URI.encode url, ":")
--output:--
#<URI::Generic:0x000001008abcd0 URL:file1%3Aabc.txt>
Why doesn't encode work on colon?
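Because encode/escape is broken for this purpose: by default it only escapes characters that are invalid anywhere in a URI and leaves reserved characters such as : untouched.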

Use Addressable::URI::encode
require "addressable/uri"
a = 'some file name.txt'
Addressable::URI.encode(Addressable::URI.encode(a))
# => "some%2520file%2520name.txt"
b = 'some filename with :colon in it.txt'
Addressable::URI.encode(Addressable::URI.encode(b))
# => "some%2520filename%2520with%2520:colon%2520in%2520it.txt"

The problem seems to be the space preceding the colon: 'lol :lol.txt' doesn't work, but 'lol:lol.txt' does.
Maybe you could replace the spaces with something else.
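A likely explanation for this observation, sketched below with illustrative strings: in 'lol:lol.txt' the text before the colon is a syntactically valid scheme, so the string parses. An encoded space (%20) cannot be part of a scheme, and a bare colon is not allowed in the first segment of a relative path, so the parser rejects it.

```ruby
require 'uri'

# "lol" is a syntactically valid scheme, so this parses.
with_scheme = URI('lol:lol.txt')
puts with_scheme.scheme   # prints: lol

# "%20" cannot appear in a scheme, and a colon may not appear in the first
# path segment of a relative reference, so parsing is rejected.
begin
  URI('lol%20:lol.txt')
  rejected = false
rescue URI::InvalidURIError
  rejected = true
end
puts rejected             # prints: true

# Percent-encoding the colon as well removes the ambiguity.
encoded_colon = URI('lol%20%3Alol.txt')
puts encoded_colon.to_s
```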

If you want to escape special characters in a given string, it is best to use
esc_uri = URI.escape("String with special character")
The result is a URI-escaped string that is safe to pass to URI.
Refer to URI::Escape for how to use URI escape. Hope this helps.

Related

Rails remove '\' from json response

json response
{"skill"=>"{\"dept_id\"=>\"01\", \"user_id\"=>\"001\", \"level_cd\"=>\"04_swim\", \"first_name\"=>\"rohit\", \"last_name\"=>\"patel\", \"dept_full_name\"=>\"swiming\", \"rank\"=>\"04_swim\"}, {\"dept_id\"=>\"02\", \"user_id\"=>\"002\", \"level_cd\"=>\"04_swim\", \"first_name\"=>\"ranjit\", \"last_name\"=>\"shinde\", \"dept_full_name\"=>\"running\", \"rank\"=>\"03_run\"}, {\"dept_id\"=>\"04\", \"user_id\"=>\"004\", \"level_cd\"=>\"02_jump\", \"first_name\"=>\"kedar\", \"last_name\"=>\"patil\", \"dept_full_name\"=>\"jumping\", \"rank\"=>\"02_jump\"}, {\"dept_id\"=>\"05\", \"user_id\"=>\"005\", \"level_cd\"=>\"03_run\", \"first_name\"=>\"kapil\", \"last_name\"=>\"bote\", \"dept_full_name\"=>\"Hammer\", \"rank\"=>\"03_run\"}"
How do I remove only the \ from this response?
expected output is
"skill"=>{"dept_id"=>"01", "user_id"=>"001", "level_cd"=>"04_swim", "first_name"=>"rohit", "last_name"=>"patel", "dept_full_name"=>"swiming", "rank"=>"04_swim"}, {"dept_id"=>"02", "user_id"=>"002", "level_cd"=>"04_swim", "first_name"=>"ranjit", "last_name"=>"shinde", "dept_full_name"=>"running", "rank"=>"03_run"}, {"dept_id"=>"04", "user_id"=>"004", "level_cd"=>"02_jump", "first_name"=>"kedar", "last_name"=>"patil", "dept_full_name"=>"jumping", "rank"=>"02_jump"}, {"dept_id"=>"05", "user_id"=>"005", "level_cd"=>"03_run", "first_name"=>"kapil", "last_name"=>"bote", "dept_full_name"=>"Hammer", "rank"=>"03_run"}
There are currently no backslashes in the string. The backslash is only there because the context is within a double quoted string context.
If you want to use a double quote in double quoted string context you need to escape it with a backslash, otherwise the parser thinks you want to end the string.
"John Doe said: "Hello World!""
The above is not valid. The " before Hello World! will end the string, meaning that Hello World! will not be in string context and Ruby tries to parse Hello and World as constants.
To prevent this from happening you escape the " with a backslash \.
"John Doe said: \"Hello World!\""
\" will be interpreted as one " character. There is no backslash present within the resulting string. See the Ruby literals documentation.
When using single quotes for string delimiters there is no need to escape the double quotes (but you do need to escape single quotes). The above could also be written as:
'John Doe said: "Hello World!"'
Similarly your data can be written as:
{"skill"=>'{"dept_id"=>"01", "user_id"=>"001", "level_cd"=>"04_swim", "first_name"=>"rohit", "last_name"=>"patel", "dept_full_name"=>"swiming", "rank"=>"04_swim"}, {"dept_id"=>"02", "user_id"=>"002", "level_cd"=>"04_swim", "first_name"=>"ranjit", "last_name"=>"shinde", "dept_full_name"=>"running", "rank"=>"03_run"}, {"dept_id"=>"04", "user_id"=>"004", "level_cd"=>"02_jump", "first_name"=>"kedar", "last_name"=>"patil", "dept_full_name"=>"jumping", "rank"=>"02_jump"}, {"dept_id"=>"05", "user_id"=>"005", "level_cd"=>"03_run", "first_name"=>"kapil", "last_name"=>"bote", "dept_full_name"=>"Hammer", "rank"=>"03_run"}'
The above clearly demonstrates that there are no backslash characters present in the string.
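This can be checked directly. A small sketch (the variable name quoted is only for illustration) comparing what puts prints with what p, which calls inspect, prints:

```ruby
quoted = "John Doe said: \"Hello World!\""

puts quoted                   # prints: John Doe said: "Hello World!"
puts quoted.include?("\\")    # prints: false (no backslash in the string)
puts quoted.count('"')        # prints: 2
p quoted                      # prints the literal form, with \" escapes

# The single-quoted literal produces exactly the same string.
puts quoted == 'John Doe said: "Hello World!"'   # prints: true
```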
However, the string is not JSON. I suggest changing the server response if possible. You can eval the current response, but I would advise never using eval (eval is evil). If the server sends malicious Ruby code, eval will execute it without any checks and might compromise your machine.
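If the server can be changed to send real JSON, the standard json library parses it without eval. A minimal sketch; the payload below is an assumed, shortened version of the data, rewritten as valid JSON with the records wrapped in an array:

```ruby
require 'json'

# Assumed payload: what the response could look like as valid JSON.
raw = '{"skill":[{"dept_id":"01","first_name":"rohit"},' \
      '{"dept_id":"02","first_name":"ranjit"}]}'

data = JSON.parse(raw)
puts data["skill"].length             # prints: 2
puts data["skill"][1]["first_name"]   # prints: ranjit
```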
Looks like the hash example needs to end with a } to be valid, so I added it in my example. Furthermore, it looks like a collection of records that is missing an enclosing list. Inside a list it would be valid, but as the example stands now, it is not a valid hash.
But let's say just for fun, I did want to take the string and put it inside an array. Maybe something like this:
data = {"skill"=>"{\"dept_id\"=>\"01\", \"user_id\"=>\"001\", \"level_cd\"=>\"04_swim\", \"first_name\"=>\"rohit\", \"last_name\"=>\"patel\", \"dept_full_name\"=>\"swiming\", \"rank\"=>\"04_swim\"}, {\"dept_id\"=>\"02\", \"user_id\"=>\"002\", \"level_cd\"=>\"04_swim\", \"first_name\"=>\"ranjit\", \"last_name\"=>\"shinde\", \"dept_full_name\"=>\"running\", \"rank\"=>\"03_run\"}, {\"dept_id\"=>\"04\", \"user_id\"=>\"004\", \"level_cd\"=>\"02_jump\", \"first_name\"=>\"kedar\", \"last_name\"=>\"patil\", \"dept_full_name\"=>\"jumping\", \"rank\"=>\"02_jump\"}, {\"dept_id\"=>\"05\", \"user_id\"=>\"005\", \"level_cd\"=>\"03_run\", \"first_name\"=>\"kapil\", \"last_name\"=>\"bote\", \"dept_full_name\"=>\"Hammer\", \"rank\"=>\"03_run\"}"}
parsed_data = data["skill"].split("}, ").map{|x| x.end_with?("\"") ? x + '}' : x}.map{|x| eval(x)}
puts parsed_data
{"dept_id"=>"01", "user_id"=>"001", "level_cd"=>"04_swim", "first_name"=>"rohit", "last_name"=>"patel", "dept_full_name"=>"swiming", "rank"=>"04_swim"}
{"dept_id"=>"02", "user_id"=>"002", "level_cd"=>"04_swim", "first_name"=>"ranjit", "last_name"=>"shinde", "dept_full_name"=>"running", "rank"=>"03_run"}
{"dept_id"=>"04", "user_id"=>"004", "level_cd"=>"02_jump", "first_name"=>"kedar", "last_name"=>"patil", "dept_full_name"=>"jumping", "rank"=>"02_jump"}
{"dept_id"=>"05", "user_id"=>"005", "level_cd"=>"03_run", "first_name"=>"kapil", "last_name"=>"bote", "dept_full_name"=>"Hammer", "rank"=>"03_run"}
Now with the data in an array, you can convert it to JSON if you'd like:
require 'json'
2.6.5 :007 > parsed_data.to_json
=> "[{\"dept_id\":\"01\",\"user_id\":\"001\",\"level_cd\":\"04_swim\",\"first_name\":\"rohit\",\"last_name\":\"patel\",\"dept_full_name\":\"swiming\",\"rank\":\"04_swim\"},{\"dept_id\":\"02\",\"user_id\":\"002\",\"level_cd\":\"04_swim\",\"first_name\":\"ranjit\",\"last_name\":\"shinde\",\"dept_full_name\":\"running\",\"rank\":\"03_run\"},{\"dept_id\":\"04\",\"user_id\":\"004\",\"level_cd\":\"02_jump\",\"first_name\":\"kedar\",\"last_name\":\"patil\",\"dept_full_name\":\"jumping\",\"rank\":\"02_jump\"},{\"dept_id\":\"05\",\"user_id\":\"005\",\"level_cd\":\"03_run\",\"first_name\":\"kapil\",\"last_name\":\"bote\",\"dept_full_name\":\"Hammer\",\"rank\":\"03_run\
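An alternative sketch that avoids eval: because this particular string contains only string keys and string values, replacing => with : and wrapping the result in [ ] yields valid JSON. This assumes that => never occurs inside a key or value, which holds for the sample data but not in general. The sample string is shortened.

```ruby
require 'json'

skill = '{"dept_id"=>"01", "rank"=>"04_swim"}, {"dept_id"=>"02", "rank"=>"03_run"}'

# Turn the Hash#inspect-style text into a JSON array, then parse it.
records = JSON.parse("[#{skill.gsub('=>', ':')}]")

puts records.length       # prints: 2
puts records[1]["rank"]   # prints: 03_run
```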

Tidy long string in Ruby

I have a method in Ruby, which needs an API URL:
request_url = "http://api.abc.com/v3/avail?rev=#{ENV['REV']}&key=#{ENV['KEY']}&locale=en_US&currencyCode=#{currency}&arrivalDate=#{check_in}&departureDate=#{check_out}&includeDetails=true&includeRoomImages=true&room1=#{total_guests}"
I want to format it to be more readable. It should take arguments.
request_url = "http://api.abc.com/v3/avail?
&rev=#{ENV['REV']}
&key=#{ENV['KEY']}
&locale=en_US
&currencyCode=#{currency}
&arrivalDate=#{check_in}
&departureDate=#{check_out}
&includeDetails=true
&includeRoomImages=true
&room1=#{total_guests}"
But of course this introduces line breaks. I tried a heredoc, but I want the result to be on one line.
I would prefer to not build URI queries by joining strings, because that might lead to URLs that are not correctly encoded (see a list of characters that need to be encoded in URIs).
There is the Hash#to_query method in Ruby on Rails that does exactly what you need, and it ensures that the parameters are correctly URI encoded:
base_url = 'http://api.abc.com/v3/avail'
arguments = {
rev: ENV['REV'],
key: ENV['KEY'],
locale: 'en_US',
currencyCode: currency,
arrivalDate: check_in,
departureDate: check_out,
includeDetails: true,
includeRoomImages: true,
room1: total_guests
}
request_url = "#{base_url}?#{arguments.to_query}"
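Outside Rails, where Hash#to_query is not available, URI.encode_www_form from the standard library does a similar job. A minimal sketch with made-up parameter values:

```ruby
require 'uri'

base_url = 'http://api.abc.com/v3/avail'
# Illustrative values; in the question these come from ENV and local variables.
arguments = {
  locale: 'en_US',
  currencyCode: 'USD',
  arrivalDate: '09/01/2024',
  includeDetails: true
}

request_url = "#{base_url}?#{URI.encode_www_form(arguments)}"
puts request_url
# prints: http://api.abc.com/v3/avail?locale=en_US&currencyCode=USD&arrivalDate=09%2F01%2F2024&includeDetails=true
```

The slashes in the date are encoded as %2F, which is the kind of detail that is easy to miss when joining strings by hand.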
You could use an array and join the strings:
request_url = [
"http://api.abc.com/v3/avail?",
"&rev=#{ENV['REV']}",
"&key=#{ENV['KEY']}",
"&locale=en_US",
"&currencyCode=#{currency}",
"&arrivalDate=#{check_in}",
"&departureDate=#{check_out}",
"&includeDetails=true",
"&includeRoomImages=true",
"&room1=#{total_guests}",
].join('')
Even easier, you can use the %W array shorthand notation so you don't have to write out all the quotes and commas:
request_url = %W(
http://api.abc.com/v3/avail?
&rev=#{ENV['REV']}
&key=#{ENV['KEY']}
&locale=en_US
&currencyCode=#{currency}
&arrivalDate=#{check_in}
&departureDate=#{check_out}
&includeDetails=true
&includeRoomImages=true
&room1=#{total_guests}
).join('')
Edit: Of course, spickermann makes a very good point above on better ways to accomplish this specifically for URLs. However, if you're not constructing a URL and just working with strings, the above methods should work fine.
You can extend strings in Ruby using the line continuation operator. Example:
request_url = "http://api.abc.com/v3/avail?" \
"&rev=#{ENV['REV']}" \
"&key=#{ENV['KEY']}"

ruby regex with quotes

I'm trying to pass more than one regex parameter for parts of a string that needs to be replaced. Here's the string:
str = "stands in hall &quot;Let&#39;s go get to first period everyone&quot; Students continue moving to seats."
Here is the expected string:
str = "stands in hall "Let's go get to first period everyone" Students continue moving to seats."
This is what I tried:
str.gsub(/&#39;|&quot;/, "&#39;" => "\'", "&quot;" => "\"")
This is what I got:
"stands in hall \"Let's go get to first period everyone\" Students continue moving to seats."
How do I get the quotes in while sending in two regex parameters using gsub?
This is an HTML unescaping problem.
require 'cgi'
CGI.unescape_html(str)
This gives you the correct answer.
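A short sketch of that call, assuming the input stores its quotes as the HTML entities &quot; and &#39; (the sample string is shortened):

```ruby
require 'cgi'

# Assumed input: the string with its quotes stored as HTML entities.
escaped = 'stands in hall &quot;Let&#39;s go&quot; Students continue.'
plain = CGI.unescape_html(escaped)

puts plain   # prints: stands in hall "Let's go" Students continue.
```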
From my comments on this question:
Your updated version is correct. The only reason the backslashes appear in your final line is that irb shows the string as a literal: each inner double quote is escaped so that it is not mistaken for the quote that terminates the string. Try assigning your output and printing it:
str1 = str.gsub(/&#39;|&quot;/, "&#39;" => "\'", "&quot;" => "\"")
puts str1
and you'll see that the backslashes are gone when str1 is printed using puts.
The difference is that evaluating an expression in irb (which is what I assume you're doing to execute this sample code) automatically calls the inspect method on the result, which for strings shows the quoted, escaped literal form rather than the raw characters.
Because I did not understand unescaping characters I found an alternative solution that might be the "rails-way"
Can you use <%= raw 'some_html' %>
My final solution ended up being this instead of messy regex and requiring CGI
<%= raw evidence_score.description %>
Unescaping HTML string in Rails

Regex in Ruby: expression not found

I'm having trouble with a regex in Ruby (on Rails). I'm relatively new to this.
The test string is:
http://www.xyz.com/017010830343?$ProdLarge$
I am trying to remove "$ProdLarge$". In other words, the $ signs and anything between.
My regular expression is:
\$\w+\$
Rubular says my expression is ok. http://rubular.com/r/NDDQxKVraK
But when I run my code, the app says it isn't finding a match. Code below:
some_array.each do |x|
logger.debug "scan #{x.scan('\$\w+\$')}"
logger.debug "String? #{x.instance_of?(String)}"
x.gsub!('\$\w+\$','scl=1')
...
My logger debug line shows a result of "[]". String is confirmed as being true. And the gsub line has no effect.
What do I need to correct?
Use /regex/ instead of 'regex'. A String argument to scan or gsub is matched literally, not as a regular expression:
> "http://www.xyz.com/017010830343?$ProdLarge$".gsub(/\$\w+\$/, 'scl=1')
=> "http://www.xyz.com/017010830343?scl=1"
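The difference between the two pattern types can be seen side by side in this small sketch. If the pattern has to live in a string, Regexp.new compiles it:

```ruby
url = 'http://www.xyz.com/017010830343?$ProdLarge$'

# A String pattern is matched literally, backslashes and all, so nothing is found.
literal_hits = url.scan('\$\w+\$')
p literal_hits                      # prints: []

# A Regexp pattern is interpreted as a regular expression.
regex_hits = url.scan(/\$\w+\$/)
p regex_hits                        # prints: ["$ProdLarge$"]

# A pattern held in a string can be compiled with Regexp.new.
pattern = Regexp.new('\$\w+\$')
replaced = url.gsub(pattern, 'scl=1')
puts replaced                       # prints: http://www.xyz.com/017010830343?scl=1
```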
Don't use a regex for this task, use a tool designed for it, URI. To remove the query:
require 'uri'
url = URI.parse('http://www.xyz.com/017010830343?$ProdLarge$')
url.query = nil
puts url.to_s
=> http://www.xyz.com/017010830343
To change to a different query use this instead of url.query = nil:
url.query = 'scl=1'
puts url.to_s
=> http://www.xyz.com/017010830343?scl=1
URI will automatically encode values if necessary, saving you the trouble. If you need even more URL management power, look at Addressable::URI.

Why does this regex check return true for this string?

I need a regex that will determine if a string is a tweet URL. I've got this
Regexp.new(/http:|https:\/\/(twitter\.com\/.*\/status\/.*|twitter\.com\/.*\/statuses\/.*|www\.twitter\.com\/.*\/status\/.*|www\.twitter\.com\/.*\/statuses\/.*|mobile\.twitter\.com\/.*\/status\/.*|mobile\.twitter\.com\/.*\/statuses\/.*)/i)
Why does it return true for the following?
"http://i.stack.imgur.com/QdOS0.jpg".match(Regexp.new(/http:|https:\/\/(twitter\.com\/.*\/status\/.*|twitter\.com\/.*\/statuses\/.*|www\.twitter\.com\/.*\/status\/.*|www\.twitter\.com\/.*\/statuses\/.*|mobile\.twitter\.com\/.*\/status\/.*|mobile\.twitter\.com\/.*\/statuses\/.*)/i))? true : false
=> true
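The http: alternative on its own will always match a URL starting with http:, because the | separates everything to its left from everything to its right.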
Try the following:
/https?:\/\/(twitter\.com\/.*\/status\/.*|twitter\.com\/.*\/statuses\/.*|www\.twitter\.com\/.*\/status\/.*|www\.twitter\.com\/.*\/statuses\/.*|mobile\.twitter\.com\/.*\/status\/.*|mobile\.twitter\.com\/.*\/statuses\/.*)/i
The question mark will make the s optional, thus matching http or https.
Your regex could be abbreviated like this:
#^https?://(?:www\.|mobile\.)?twitter\.com/.*?/status(?:es)?/.*$#i
Explanation:
#               regex delimiter
^               start of line
https?          http or https
://             ://
(?:             start of non-capturing group
www\.|mobile\.  www. or mobile.
)?              end of group, which is optional
twitter\.com/   twitter.com/
.*?             any number of any char, not greedy
/status         /status
(?:es)?         optional non-capturing group containing `es`
/.*             / followed by any number of any char
$               end of line
#i              delimiter and case-insensitive flag
No need for regular expressions here (as usual).
require 'uri'
uri = URI.parse("http://www.twitter.com/status/12345")
p uri.host.split('.')[-2] == 'twitter' # returns true
More docs at: http://ruby-doc.org/stdlib/
You should group your OR-Clauses, like this:
(http:|https:)
Additionally, it wouldn't hurt to specify beginning and end of it:
^(http:|https:).*$
The start of your regex specifies an option of just 'http:', which naturally matches the URL you are testing. Depending on how strict you need your check to be, you could just remove the http/https parts from the start of the regex.
While many other answers show you a better regex, the answer is because /foo|bar/ will match either foo or bar, and what you wrote was /http:|.../, hence all URLs will be matched.
See @giraff's answer for how you could have written the alternation to do what you expect, or @M42's or @Koraktor's answers for a better regexp.
And as posted in the comments, note that you can write a regex literal as %r{...} instead of /.../, which is nice when you want to use / characters in your regex without escaping them.
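Both points can be seen in a short sketch using %r{...} literals. The tweet URL and the variable names are made up for illustration, and the strict pattern is only one possible way to write the check:

```ruby
image_url = 'http://i.stack.imgur.com/QdOS0.jpg'
tweet_url = 'https://twitter.com/someuser/status/12345'

# Alternation binds loosely: this means "http:" OR "https://twitter.com/...".
loose = %r{http:|https://twitter\.com/.*/status/.*}i
# Here only the "s" is optional, so the host must follow the scheme.
strict = %r{\Ahttps?://(?:www\.|mobile\.)?twitter\.com/.*/status(?:es)?/.*}i

puts loose.match?(image_url)    # prints: true
puts strict.match?(image_url)   # prints: false
puts strict.match?(tweet_url)   # prints: true
```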

Resources