I'm facing some problems with UTF-8 and ActionMailer. My application has a contact form that sends me an email when it is submitted. The problem is that when somebody enters characters like öäüß, I receive the message encoded, for example:
=?UTF-8?Q?funktioniert_oder_nicht.=0D=0A=0D=0Ameine_Stra=C3=9Fe_ist_die?=
=?UTF-8?Q?_Bratwurststra=C3=9Fe=0D=0A=0D=0A=C3=B6=C3=A4?=
As I understand it, ActionMailer is UTF-8 ready by default. Looking at my server log, the params are already correctly encoded when the form is submitted (I can read the äüö in my log).
Any idea what I should change? Should I change my application to support ISO-8859-1?
Environment: Ruby 1.9 and Rails 3.1
You are getting the UTF-8 bytes escaped with quoted-printable encoding, which is standard for MIME mail; a mail client decodes it transparently.
ß is the byte pair "\xC3\x9F", which becomes "=C3=9F".
String#unpack('M') will decode it:
$ ruby -e 'puts "Bratwurststra=C3=9Fe=0D=0A=0D=0A=C3=B6=C3=A4".unpack "M"'
Bratwurststraße
öä
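The same unpack("M") trick extends to the whole encoded-word header. A sketch that also honours the charset tag (the header value is shortened from the one above):

```ruby
# Decode an RFC 2047 "encoded-word". Q-encoding is quoted-printable
# with "_" standing in for spaces.
encoded = "=?UTF-8?Q?meine_Stra=C3=9Fe?="

charset, text = encoded.match(/\A=\?([^?]+)\?Q\?(.*)\?=\z/i).captures
decoded = text.tr("_", " ").unpack("M").first.force_encoding(charset)
puts decoded  # => "meine Straße"
```

In a real application you would let the mail gem do this for you rather than parsing headers by hand.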
I see this a lot and haven't figured out a graceful solution. If user input contains invalid byte sequences, I need it not to raise an exception. For example:
# @raw_response comes from the user and contains invalid UTF-8,
# for example: @raw_response = "\xBF"
regex.match(@raw_response)
ArgumentError: invalid byte sequence in UTF-8
Numerous similar questions have been asked, and the answer always appears to be encoding or force-encoding the string. Neither of these works for me, however:
regex.match(@raw_response.force_encoding("UTF-8"))
ArgumentError: invalid byte sequence in UTF-8
or
regex.match(@raw_response.encode("UTF-8", :invalid=>:replace, :replace=>"?"))
ArgumentError: invalid byte sequence in UTF-8
Is this a bug with Ruby 2.0.0 or am I missing something?
What is strange is that it appears to encode correctly, but match continues to raise an exception:
@raw_response.encode("UTF-8", :invalid=>:replace, :replace=>"?").encoding
=> #<Encoding:UTF-8>
In Ruby 2.0 the encode method is a no-op when encoding a string to its current encoding:
Please note that conversion from an encoding enc to the same encoding enc is a no-op, i.e. the receiver is returned without any changes, and no exceptions are raised, even if there are invalid bytes.
This changed in 2.1, which also added the scrub method as an easier way to do this.
If you are unable to upgrade to 2.1, you’ll have to encode into a different encoding and back in order to remove invalid bytes, something like:
if !s.valid_encoding?
  s = s.encode("UTF-16BE", :invalid => :replace, :replace => "?").encode("UTF-8")
end
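For comparison, on Ruby 2.1+ the scrub method does this in one step:

```ruby
# Ruby >= 2.1: String#scrub replaces invalid byte sequences directly.
s = "Test \xBF boom".dup.force_encoding("UTF-8")
raise "expected invalid input" if s.valid_encoding?

cleaned = s.scrub("?")
puts cleaned  # => "Test ? boom"
```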
Since you're using Rails and not just plain Ruby, you can also use tidy_bytes. This works on Ruby 2.0 and will probably give you back sensible data instead of just replacement characters.
I'm currently trying to ensure that my Rails application handles POST requests containing invalid UTF-8 data (e.g. \xFF) gracefully, using Rack middleware.
Unfortunately writing tests for this is proving extremely difficult.
One thing I've tried is using Capybara to fill in one of my form fields with invalid UTF-8 and submit. However, this causes the following output in the terminal amongst my test output, and it's not being printed by Rails:
error : string is not in utf-8
Is there another way that a POST containing invalid UTF-8 data can be emulated in order to validate that a 400 error (or similar) is displayed?
NB: I'm trying to avoid having to run against a separate running instance of the application (e.g. using 'curl' against it), but just run directly with Capybara (or similar)
You could try posting a file attachment which contains invalid UTF-8 data, although this would also depend on your form itself. However, as you're testing a rather obscure edge case, you could always create a form that is only accessible in the dev/test environments, with the route also only available for testing.
This would at least allow you to target the code that handles the processing of the invalid utf-8 data in a safe, test-only way.
What do you mean by filling the form with invalid UTF-8? The characters you fill into the form do not have any encoding; they are encoded when the form is sent. That sentence would make sense for encodings that cannot represent all characters out there, but UTF-8 can.
If you want to send the byte \xFF to the server from a browser, it's as easy as opening the browser's developer tools, editing the form's attributes to accept-charset="ISO-8859-1", writing ÿ somewhere in the form, and pressing send. The ÿ will get encoded as %FF, which cannot be decoded as UTF-8.
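You can check what that %FF turns into on the Ruby side with the stdlib URI module:

```ruby
require 'uri'

# "%FF" decodes to the single byte 0xFF, which is not valid UTF-8:
raw = URI.decode_www_form_component("%FF", Encoding::ASCII_8BIT)
p raw.bytes                                        # => [255]
p raw.dup.force_encoding("UTF-8").valid_encoding?  # => false
```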
As detailed in this ThoughtBot blog post, this worked for me in a unit test:
"hello joel\255".force_encoding('UTF-8')
Not sure how you'd convince Capybara to do this, though.
You should be able to do this by constructing a POST with rack-test instead of using Capybara.
For example the following request spec (spec/requests/utf8_spec.rb):
require 'spec_helper'

describe "Invalid UTF-8" do
  it "handles invalid UTF-8 in post data gracefully" do
    post "/users", :user => {:name => "Test \xFF boom"}
  end
end
Produces the following when run:
1) Invalid UTF-8 handles invalid UTF-8 in post data gracefully
   Failure/Error: post "/users", :user => {:name => "Test \xFF boom"}
   ArgumentError:
     invalid byte sequence in UTF-8
   # ./app/controllers/users_controller.rb:17:in `create'
   # ./spec/requests/utf8_spec.rb:5:in `block (2 levels) in <top (required)>'
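The middleware itself can also be unit-tested without Capybara by calling it with a hand-built Rack env. A minimal sketch (ValidateUtf8 is a hypothetical name; your middleware's logic may differ):

```ruby
require 'stringio'

# Hypothetical Rack middleware that rejects request bodies containing
# invalid UTF-8 with a 400, before the params ever reach the Rails app.
class ValidateUtf8
  def initialize(app)
    @app = app
  end

  def call(env)
    body = env["rack.input"].read
    env["rack.input"].rewind
    if body.dup.force_encoding("UTF-8").valid_encoding?
      @app.call(env)
    else
      [400, { "Content-Type" => "text/plain" }, ["Bad Request"]]
    end
  end
end

# Exercised directly with a stubbed app and a minimal env:
app = ->(env) { [200, {}, ["ok"]] }
middleware = ValidateUtf8.new(app)

status, = middleware.call("rack.input" => StringIO.new("name=Test+\xFF"))
puts status  # => 400
```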
Hi, I have an Oracle database that stores some data, including non-English text, e.g. “TEST”. The quotes here are not the plain ASCII quote ". The problem is that when I retrieve the value from Rails 2.2.2 (Ruby 1.8.7), this model field shows question marks in the erb views, so “TEST” becomes ?TEST?. Under Rails 3, however, it displays correctly.
The code in the erb that displays the value is
User.first.description
I do set the encoding in database.yml as follows, but it does not help:
encoding: UTF8
collation: utf8_unicode_ci
Could it be because Ruby 1.9 handles encoding better than Ruby 1.8? Is there a way to fix this problem?
Yes, Ruby 1.9 handles encoding differently than 1.8 does. Also, Rails 3 makes encoding easier by trying to make sure everything is UTF-8.
Most likely your problem is that the string was encoded as Latin-1 and Rails 2 tries to read it as UTF-8. There are several monkey patches online that you can try for your database, or you can run a one-time script to re-encode all of the fields in your database.
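Such a re-encode is just a force_encoding plus encode per field. An illustrative snippet (the byte string is made up for the example):

```ruby
# Bytes that are really Latin-1 (0xDF is "ß") but were stored unlabelled:
raw = "Stra\xDFe".dup
utf8 = raw.force_encoding("ISO-8859-1").encode("UTF-8")
puts utf8  # => "Straße"
```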
I recommend reading this, for further understanding on how encoding works with Ruby: Encodings, Unabridged (by Yehuda Katz)
I just randomly got this strange error from Rails 3, on Heroku (Postgres):
PGError: ERROR: invalid byte sequence for encoding "UTF8": 0x85 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding". : INSERT INTO "comments" ("content") VALUES ('BTW∑I re-listened to the video' ......
The hint, while nice, isn't making anything click for me. Can I set the encoding somewhere? Should I even mess with that? Has anyone seen this, and does anyone have ideas on how to deal with this type of issue?
Thank you
From what I can gather, this is a problem where the string you're trying to insert into your PostgreSQL server isn't encoded as UTF-8. This is somewhat odd, because your Rails app should be configured to use UTF-8 by default.
There are a couple of ways you can try fix this (in order of what I recommend):
Firstly, make sure that config.encoding is set to "utf-8" in config/application.rb.
If you're using Ruby 1.9, you can try to force the character encoding prior to insertion with toutf8 (from the kconv standard library).
You can figure out what your string is encoded with, and manually run SET CLIENT_ENCODING TO 'ISO-8859-1'; (or whatever the encoding is) on your PostgreSQL connection before inserting the string. Don't forget to run RESET CLIENT_ENCODING; after the statement to reset the encoding.
If you're using Ruby 1.8 (which is more likely), you can use the iconv library to convert the string to UTF-8. See documentation here.
A more hackish solution is to override the getters and setters in the model (i.e. content and content=) to encode and decode your string with Base64. It'd look something like this:
require 'base64'

class Comment
  def content
    Base64.decode64(self[:content])
  end

  def content=(value)
    self[:content] = Base64.encode64(value)
  end
end
Once you know which charset the text actually arrived in, re-label and transcode it:
text.force_encoding(charset).encode("UTF-8")
http://blog.zenlike.me/2013/04/06/sendgrid-parse-incoming-email-encoding-errors-for-rails-apps-using-postgresql/
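For what it's worth, the 0x85 byte from the error above happens to be "…" (horizontal ellipsis) in Windows-1252, so that charset is a plausible guess for this data (an assumption, not something the error confirms):

```ruby
# "\x85" is "…" in Windows-1252; re-label, then transcode to UTF-8:
text = "BTW\x85I re-listened to the video".dup
utf8 = text.force_encoding("Windows-1252").encode("UTF-8")
puts utf8  # => "BTW…I re-listened to the video"
```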
I'm trying to migrate my app to Ruby 1.9; however, ActiveRecord keeps retrieving records out of my MySQL database with an ASCII encoding, causing "incompatibility between UTF-8 and ASCII"-style errors. I've tried setting "encoding: utf-8" in the database.yml file, and I've also tried putting " #coding: utf-8 " at the top of the errant file, with no luck. I thought the fields in my database might be the issue, but even after converting everything over to UTF-8, the incompatibility errors still exist.
Is there perhaps something else in MySQL that defines the encoding to ActiveRecord that I am missing here?
Apparently this is an issue with Ruby 1.9 and the mysql gem. See this question.
It should be solved by using the mysql2 gem.
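A minimal sketch of the switch, assuming a standard Rails 3 setup:

```ruby
# Gemfile
gem 'mysql2'        # replaces: gem 'mysql'

# config/database.yml then uses:
#   adapter: mysql2
#   encoding: utf8
```

The mysql2 gem returns properly encoded Ruby strings instead of the raw ASCII-8BIT strings the old C bindings produced.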