%dd, %ff .... invalid byte sequence in UTF-8 - ruby-on-rails

In my rails app, when I add %dd or %ff in url parameter, why it returns invalid byte sequence in UTF-8?
I have a regex ^[a-zA-Z0-9_]+$ to catch if string includes letters + numbers + underscores only. Then when I add %dd, or %ff in my url parameter, it returns invalid byte sequence in UTF-8 error.
What does %dd and %ff means?
UPDATE:
My controller:
def search
regex = '^[a-zA-Z0-9_]+$'
#search = params[:search]
unless #search.match(alpha_num_under_regex).nil?
#users = User.find_by_name(#search)
render 'api/v1/users/show', status: 200, formats: :json
else
#users = []
render 'api/v1/users/show', status: 422, formats: :json
end
My URL:
localhost:3000/api/v1/users/show?search=%dd
When params search=%d it return Bad Request which is ok. But when I added another d, search=%dd or search=a%dd, it returns Action Controller: Exception caught - invalid byte sequence in UTF-8.
The question is, how can I pass invalid byte sequence in UTF-8 error?

From Wiki:
Percent-encoding, also known as URL encoding, is a mechanism for encoding information in a Uniform Resource Identifier (URI) under certain circumstances. Although it is known as URL encoding it is, in fact, used more generally within the main Uniform Resource Identifier (URI) set, which includes both Uniform Resource Locator (URL) and Uniform Resource Name (URN). As such, it is also used in the preparation of data of the application/x-www-form-urlencoded media type, as is often used in the submission of HTML form data in HTTP requests.
The query search=%dd is according to above treated/interpreted as search=<BYTE_WITH_ORD_VALUE_0xDD>. Ruby expects this string to be UTF-8, but 0xDD is not a valid UTF-8 symbol.
To avoid this problem and pass what was intended, one should URL-escape the search query explicitly by substituting % ⇒ %25 (the latter is apparently the percent-encoded percent sign itself.)
localhost:3000/api/v1/users/show?search=%25dd
the above will send %dd query to rails.
NB to be safe, one should build url queries according to the common rule, specified in the article linked above:
[List of reserved characters]
Other characters in a URI must be percent encoded.

Related

Rails how to response JSON in ISO-8859-1

I want my app to response with body utf-8 and iso-8859-1 encoded
per requests with Accept-Charset="utf-8" or Accept-Charset="iso-8859-1".
The response body is always JSON.
In my controller, when I doing this
render(json: data, status: :created)
It response with Content-Type="application/json; charset=utf-8" as well.
But how to make a response with body iso-8859-1 encoded when request Accept-Charset="iso-8859-1"?
In order to do this, you can use the method force_encoding and encoding for example
data = {'name'=>'raghav'}.to_json
data.encoding #This would return what encoding the value as #<Encoding:UTF-8>
new_data = data.force_encoding('ISO-8859-1') #This would force the encoding
new_data.encoding #<Encoding:ISO-8859-1>
Also to do this on the specific case you can always read the request.headers hash to determine the encoding.
There is also another method called encode the main difference between these are force_encoding changes the way string is being read from bytes, and encode changes the way string is written without changing the output (if possible)

Generating Oauth authorization token using base64 encoding

I am trying to follow the guide to generate Oauth authentication tokens for YAHOO DSP API.
Base64 encoding is a way of encoding binary data into text so that it can be easily transmitted across a network without error.
In this step, you will take the client ID and client secret that the YDN console generated for you and encode them using the base64 protocol. You can use an online encoding service like base64encode.org.
No matter which service you use, ensure that no spaces are appended to the CLIENT_ID and CLIENT_SECRET keys and separate the CLIENT_ID and CLIENT_SECRET with a colon, i.e. CLIENT_ID:CLIENT_SECRET.
The generated value will now be referenced as ENCODED(CLIENT_ID:CLIENT_SECRET) in this guide.
An example is given:
CLIENT_ID = dj0yJmk9N2pIazlsZk1iTzIxJmQ9WVdrOWVEUmpVMFpWTXpRbWNHbzlNQS0tJnM9Y29uc3VtZXJzZWNyZXQmeD00NA–
CLIENT_SECRET= a7e13ea3740b933496d88755ff341bfb824805a6
AUTHORIZATION = ZGoweUptazlOMnBJYXpsc1prMWlUekl4Sm1ROVdWZHJPV1ZFVW1wVk1GcFdUWHBSYldOSGJ6bE5RUzB0Sm5NOVkyOXVjM1Z0WlhKelpXTnlaWFFtZUQwME5BLS06YTdlMTNlYTM3NDBiOTMzNDk2ZDg4NzU1ZmYzNDFiZmI4MjQ4MDVhNg==
Using the recommended website I get the wrong AUTHORIZATION.
I have tried both encoding the whole thing at once ie. encode(CLIENT_ID:CLIENT_SECRET), and each element individually encode(CLIENT_ID):encode(CLIENT_SECRET).
Attempt encoding whole thing:
ZGoweUptazlOMnBJYXpsc1prMWlUekl4Sm1ROVdWZHJPV1ZFVW1wVk1GcFdUWHBSYldOSGJ6bE5RUzB0Sm5NOVkyOXVjM1Z0WlhKelpXTnlaWFFtZUQwME5B4oCTOiBhN2UxM2VhMzc0MGI5MzM0OTZkODg3NTVmZjM0MWJmYjgyNDgwNWE2
Attempt encoding each element:
ZGoweUptazlOMnBJYXpsc1prMWlUekl4Sm1ROVdWZHJPV1ZFVW1wVk1GcFdUWHBSYldOSGJ6bE5RUzB0Sm5NOVkyOXVjM1Z0WlhKelpXTnlaWFFtZUQwME5B4oCT:YTdlMTNlYTM3NDBiOTMzNDk2ZDg4NzU1ZmYzNDFiZmI4MjQ4MDVhNg==
Expected result:
ZGoweUptazlOMnBJYXpsc1prMWlUekl4Sm1ROVdWZHJPV1ZFVW1wVk1GcFdUWHBSYldOSGJ6bE5RUzB0Sm5NOVkyOXVjM1Z0WlhKelpXTnlaWFFtZUQwME5BLS06YTdlMTNlYTM3NDBiOTMzNDk2ZDg4NzU1ZmYzNDFiZmI4MjQ4MDVhNg==
The difference between 'each element' and the expected result is only a few characters corresponding to the end of client_ID and the colon.
B4oCT: should be BLS06.
Links to full documentation:
https://developer.yahoo.com/dsp/api/docs/authentication/tokens.html
https://developer.yahoo.com/dsp/api/docs/traffic/info/sandbox.html
Update:
The final character of Client_ID is '–' . This is some sort of non-standard character that is interpreted as two dashes i.e.'--' in utf-8 and windows 1258.
One different, TO NOTE is, that when you decrypt the expected output you will get your client id as
dj0yJmk9N2pIazlsZk1iTzIxJmQ9WVdrOWVEUmpVMFpWTXpRbWNHbzlNQS0tJnM9Y29uc3VtZXJzZWNyZXQmeD00NA--
instead of
dj0yJmk9N2pIazlsZk1iTzIxJmQ9WVdrOWVEUmpVMFpWTXpRbWNHbzlNQS0tJnM9Y29uc3VtZXJzZWNyZXQmeD00NA–
NOTE, there are two "-" at the end.
OAuth client auth token is always generated using Base64 encoding with following format
Base64_Encoding(CLIENT_ID:CLIENT_SECRET)
Most of the usage perform this Base64 encoding with encoding type as "UTF-8".
It looks like, Yahoo requires this token with different encoding. On "https://www.base64encode.org/" if you try to encode your "CLIENT_ID:CLIENT_SECRET" with "Windows-1254" as destination charset, you will receive the expected result. So, it looks like both encoding and decoding here is done keeping "Windows-1254" charset in place.

JSON.parse appends , to string and then complains invalid token at ',

I'm parsing a webpage with nokogiri and then iterating through css selectors until I find the I'm looking for Then I run a regex to match the javascript portion only, and then try to parse it with JSON.parse but that returns error invalid token at ',{ ... If I run puts on the matched data it shows it without the prepended comma but the error occurs when I run JSON.parse JSON::ParserError: 822: unexpected token at ',{"skuAttr":"200007763:201336106;491:200004763#145cm","skuPropIds":"
file=File.open('product.html')
doc=Nokogiri::HTML.parse(file)
doc.css("script").each do |page|
if page.text=~/skuProducts/
skudata = page.text[/var skuProducts=\[(.+?)\];/, 1]
puts skudata
parsed = JSON.load(skudata)
end
end
If you're consistently seeing that comma prefixed and the rest of the JSON string looks valid... then why not just remove that leading comma and then try the JSON parse?

Swapping Rails 4 ParamsParser removes params body

I'm trying to follow this solution to add a params parser to my rails app, but all that happens is that I now get the headers but no parameters from the body of the JSON request at all. In other words, calling params from within the controller returns this:
{"controller"=>"residences", "action"=>"create",
"user_email"=>"wjdhamilton#wibble.com",
"user_token"=>"ayAJ8kDUKjCiy1r1Mxzp"}
but I expect this as well:
{"data"=>{"type"=>"residences",
"attributes"=>{"name-number"=>"The Byre",
"street"=>"Next Door",
"town"=>"Just Dulnain Bridge",
"postcode"=>"PH1 3SY",
"country-code"=>""},
"relationships"=>{"residence-histories"=>{"data"=>nil},
"occupants"=>{"data"=>nil}}}}
Here is my initializer, which as you can see is almost identical to the one in the other post:
Rails.application.config.middleware.swap(
::ActionDispatch::ParamsParser, ::ActionDispatch::ParamsParser,
::Mime::Type.lookup("application/vnd.api+json") => Proc.new { |raw_post|
# Borrowed from action_dispatch/middleware/params_parser.rb except for
# data.deep_transform_keys!(&:underscore) :
data = ::ActiveSupport::JSON.decode(raw_post)
data = {:_json => data} unless data.is_a?(::Hash)
data = ::ActionDispatch::Request::Utils.deep_munge(data)
# Transform dash-case param keys to snake_case:
data = data.deep_transform_keys(&:underscore)
data.with_indifferent_access
}
)
Can anyone tell me where I'm going wrong? I'm running Rails 4.2.7.1
Update 1: I decided to try and use the Rails 5 solution instead, the upgrade was overdue anyway, and now things have changed slightly. Given the following request:
"user_email=mogwai%40balnaan.com
&user_token=_1o3Kpzo4gTdPC2bivy
&format=json
&data[type]=messages&data[attributes][sent-on]=2014-01-15
&data[attributes][details]=Beautiful+Shetland+Pony
&data[attributes][message-type]=card
&data[relationships][occasion][data][type]=occasions
&data[relationships][occasion][data][id]=5743
&data[relationships][person][data][type]=people
&data[relationships][person][data][id]=66475"
the ParamsParser middleware only receives the following hash:
"{user":{"email":"mogwai#balnaan.com","password":"0h!Mr5M0g5"}}
Whereas I would expect it to receive the following:
{"user_email"=>"mogwai#balnaan.com", "user_token"=>"_1o3Kpzo4gTdPC2b-ivy", "format"=>"5743", "data"=>{"type"=>"messages", "attributes"=>{"sent-on"=>"2014-01-15", "details"=>"Beautiful Shetland Pony", "message-type"=>"card"}, "relationships"=>{"occasion"=>{"data"=> "type"=>"occasions", "id"=>"5743"}}, "person"=>{"data"=>{"type"=>"people", "id"=>"66475"}}}}, "controller"=>"messages", "action"=>"create"}
The problem was caused by the tests that I had written. I had not added the Content-Type to the requests in the tests, and had not explicitly converted the payload to JSON like so (in Rails 5):
post thing_path, params: my_data.to_json, headers: { "Content-Type" => "application/vnd.api+json }
The effects of this were twofold: Firstly, since params parsers are mapped to specific media types then withholding the media type meant that rails assumed its default media type (in this case application/json) so the parser was not used to process the body of the request. What confused me was that it still passed the headers to the parser. Once I fixed that problem, I was then faced with the body in the format of the request above. That is where the explicit conversion to JSON is required. I could have avoided all of this if I had just written accurate tests!

encrypted query string in emails rails

My app sends out an email with a URL in it. The url contains a query string attribute that is encrypted. I CGI escaped the encrypted value so that symbols like + * . etc are escaped. The escaped URL appears in the email as expected, but when we click on the link, the encrypted values are decrypted.
For Example, the url in the email is as follows
http://development.com/activate/snJAmJxkMo3WZ1sG27Aq?album_id=2&email=5M%2BjE1G6UB26tw/Ah%2Bzr1%2BJSSxeAoP6j&owner_id=4
email=5M%2BjE1G6UB26tw/Ah%2Bzr1%2BJSSxeAoP6j
when we click on this link the url in the browser appears as
http://development.com/activate/snJAmJxkMo3WZ1sG27Aq?album_id=2&email=5M+jE1G6UB26tw/Ah+zr1+JSSxeAoP6j&owner_id=4
email=5M+jE1G6UB26tw/Ah+zr1+JSSxeAoP6j
The + is substituted with space. As a result
params[:email] = 5M jE1G6UB26tw/Ah zr1 JSSxeAoP6j
which gives me a 404.
Is there any way I can avoid this situation. How can I make the url in the browser also appear as
http://development.com/activate/snJAmJxkMo3WZ1sG27Aq?album_id=2&email=5M%2BjE1G6UB26tw/Ah%2Bzr1%2BJSSxeAoP6j&owner_id=4
in the browser?
In order to avoid this situation I Hex encoded the email attribute so that the it contains only alphabets and numbers. Used these are the methods to Hex encode and decode.
convert string2hex:
def hexdigest_to_string(string)
string.unpack('U'*string.length).collect {|x| x.to_s 16}.join
end
convert hex2string
def hexdigest_to_digest(hex)
hex.unpack('a2'*(hex.size/2)).collect {|i| i.hex.chr }.join
end

Resources