Rails, sending mail to an address with accented characters

I am sending emails via ActionMailer on Ruby 1.9 and Rails 3.0.
All is good: I am sending emails with accented characters in subject lines and bodies without issue. My default charset is UTF-8.
However, when I try to send an email to an address containing accented characters, it fails miserably. I first had errors about the email address being invalid and needing to be fully qualified.
To get around that, I needed to specify the email address in the fully qualified format '"Name" <address>'.
However, it is sending now, but in the mail client the characters in the address appear as =?UTF-8?Q?...., which is correct: Rails is rightly encoding my UTF-8 address into the header for me.
BUT
My mail client is not recognising this in its display, so it renders all garbled on screen: the literal text =?UTF-8?Q?.... appears in the "To" field in the client.
The encoding is UTF-8, the charset is UTF-8, and the transfer encoding is quoted-printable.
What am I missing? It is doing my head in!
Also, as a test, I sent an email from my Mac mail client to an address with accented characters. This renders fine in my client, but the headers are totally different: the charset is an ISO one and the transfer encoding is base64. So I am thinking I need to somehow change ActionMailer to encode my mails differently, i.e. using an ISO charset and base64 encoding, to get it to play nice?
I tried this but to no avail; I am either doing it wrong or completely missing the point. From reading the various forums and sites on this, I need to encode the header fields in a certain way, but I am failing to find answers that tell me exactly what that encoding is and, more specifically, how I can do it in Rails.
Please help! :-)

Finally solved this: if you wrap the local part of the email address in quotes and leave the domain part unquoted, it works a treat. It seems the mailer encodes the full email address if you don't wrap the local part in quotes, and hence breaks the encoding on its way to the server.
e.g.
somébody@here.com won't work,
whereas
"somébody"@here.com will work:
it routes through fine and displays fine in all clients.
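For anyone wanting to see that in context, here is a minimal sketch (Rails 3-era ActionMailer; the mailer and parameter names are my own invention):

class UserMailer < ActionMailer::Base
  default from: "noreply@example.com"

  # Quote only the local part; the mailer then encodes it as a single
  # word and leaves the domain untouched.
  def welcome(local_part, domain)
    address = %("#{local_part}"@#{domain}) # => "somébody"@here.com
    mail(to: address, subject: "Welcome")
  end
end

# UserMailer.welcome("somébody", "here.com").deliver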

Currently, not all mail servers support UTF-8 email addresses (aka SMTPUTF8), and a lot of them will do crazy things (even malform header content). Can you check that your encoding header made it all the way through the mail server and wasn't ripped out?
The MTA would have to support RFC 6530 to handle UTF-8 addresses, so it may not be your application's fault.
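If you want to test that yourself, here is a rough sketch that asks an MTA for its EHLO capabilities over a raw socket and looks for SMTPUTF8 (host and port are placeholders, and multiline greetings are glossed over):

require "socket"

def smtputf8_supported?(host, port = 25)
  socket = TCPSocket.new(host, port)
  socket.gets                      # consume the 220 greeting
  socket.puts "EHLO client.example"
  lines = []
  while (line = socket.gets)
    lines << line
    break unless line[3] == "-"    # the last EHLO line is "250 ", not "250-"
  end
  lines.any? { |l| l.include?("SMTPUTF8") }
ensure
  socket.close if socket
end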

Related

AFNetworking iOS JSON Parsing incorrect in just Lebanon

My application has a weird problem. I have a login web service which is used to authenticate users. It works well for everyone except one tester who is in Lebanon: for her, the request always fails. It turns out that the JSON response is not getting parsed for her.
My first guess was that her network is behind a proxy server that converts JSON to HTML, so I asked her to switch to a cellular network, but that didn't solve the problem either.
Please refer to the debug message in the screenshot below.
Any suggestions on what must be wrong will be greatly helpful.
You'd really need the exact data that was received. JSON parsing is totally independent of any localisation; on the other hand, whatever service produced the JSON data may not be. There is a good chance that, being in Lebanon, that customer receives non-ASCII data (which should be just fine) while other customers don't. It is possible that the server sends that data not in UTF-8 but, say, in some Windows encoding; that would be fine for ASCII but not for non-ASCII data. Or it could be that the server figures out that full UTF-8 is needed rather than ASCII and transmits a byte order mark, which is not legal JSON and would likely produce the error message you received.
To reproduce, I'd try to set up things so that non-ASCII data would be used. For example a username or password with non-ASCII data.
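To illustrate the byte order mark point (a sketch in Ruby of the failure mode, not of the asker's iOS stack):

require "json"

payload = '{"user":"ré"}'
JSON.parse(payload)            # => {"user"=>"ré"}
JSON.parse("\uFEFF" + payload) # raises JSON::ParserError: the BOM is not legal JSON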

Rails: need advice about slug, URL, and CJK characters

I'm currently building a multi-language application which will support at least English and Japanese.
The application must be able to have URIs such as domain.com/username-slug. While this works fine with Latin characters, it does not with Japanese characters (or rather, it looks ugly): domain.com/三浦パン屋
I was thinking of using a random token when the username is Japanese, such as:
def generate_token
  self.slug = loop do
    # Derive a short numeric token from a UUID, retrying on collision.
    random_token = SecureRandom.uuid.gsub("-", "").hex.to_s[0..8]
    break random_token unless self.class.exists?(slug: random_token)
  end
end
But I don't know if this is such a good idea. I am looking for advice from people who have already faced this issue/case. Thoughts?
Thanks
TL;DR summary:
Use UTF-8 everywhere
For URIs, percent-escape all characters except the few permitted in URLs
Encourage your customers to use browsers which behave well with UTF-8 URLs
Here's the more detailed explanation. What you are after is a system of URLs for your website which have five properties:
When displayed in the location bar of your user's browser, the URLs are legible to the user and in the user's preferred language.
When the user types or pastes the legible text in their preferred language into the location bar of their browser, the browser forms a URL which your site's HTTP server can interpret correctly.
When displayed in a web page, the URLs are legible to the user and in the user's preferred language.
When supplied as a link target in an HTML link, the link forms a URL which the user's web browser can correctly send to your site, and which your site's HTTP server can interpret correctly.
When your site's HTTP server receives these URLs, it passes the URL to your application in a way the application can interpret correctly.
RFC 3986 URI Generic Syntax, section 2 Characters says,
This specification does not mandate any particular character encoding
for mapping between URI characters and the octets used to store or
transmit those characters.... A percent-encoding mechanism is used to
represent a data octet in a component when that octet's corresponding
character is outside the allowed set or is being used as a
delimiter...
The URIs in question are http:// URIs, however, so the HTTP spec also governs. RFC 2616 HTTP/1.1, Section 3.4 Character Sets, says that the encoding (there named 'character set', for consistency with the MIME spec) is specified using MIME's charset tags.
What that boils down to is that the URIs can be in a wide variety of encodings, but you are responsible for being sure your web site code and your HTTP server agree on what encoding you will use. The HTTP protocol treats the URIs largely as opaque octet streams. In practice, UTF-8 is a good choice. It covers the entire Unicode character repertoire, it's an octet-based encoding, and it's widely supported. Percent-encoding is straightforward to add and remove, for instance with Ruby's URI escape and unescape methods (the URI::Escape module).
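For instance, a quick sketch of the round trip in Ruby (CGI.escape shown; note it escapes spaces as '+', which is fine for slugs that don't contain them):

require "cgi"

slug = "三浦パン屋"
escaped = CGI.escape(slug) # => "%E4%B8%89%E6%B5%A6%E3%83%91%E3%83%B3%E5%B1%8B"
CGI.unescape(escaped)      # => "三浦パン屋"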
Let's turn next to the browser. You should find out which browsers your users are visiting your site with. Test the URL handling of these browsers by pasting in URLs with Japanese-language path elements, and seeing what URL your web server presents to your Ruby code. My main browser, Firefox 16.0.2 on Mac OS X, interprets characters pasted into its location bar as UTF-8, and uses that encoding plus percent-escaping when passing the URL to an HTTP request. Similarly, when it encounters a URL for an HTTP page which has non-Latin characters, it removes the percent encoding of the URL and treats the resulting octets as if they were UTF-8 encoded. If the browsers your users favour behave the same way, then UTF-8 URLs will appear in Japanese to your users.
Do your customers insist on using browsers that don't behave well with percent-encoded URLs and UTF-8 encoded URL parts? Then you have a problem. You might be able to figure out some other encoding which the browsers do work well with, say Shift-JIS, and make your pages and web server respect that encoding instead. Or, you might try encouraging your users to switch to browsers which support UTF-8 well.
Next, let's look at your site's web pages. Your code has control over the encoding of the web pages. Links in your pages will have link text, which of course can be in Japanese, and a link target, which must be in some encoding comprehensible to your web server. UTF-8 is a good choice for the web page's encoding.
So, you don't absolutely have to use UTF-8 everywhere. The important thing is that you pick one encoding that works well in all three parts of your ecosystem: the customers' web browsers, your HTTP server, and your web site code. Your customers control one part of this ecosystem. You control the other two.
Encode your URL-paths ("username-slugs") in this encoding, then percent-escape those URLs. Author and code your pages to use this encoding. The user experience should then satisfy the five requirements above. And I predict that UTF-8 is likely to be a good encoding choice.
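In Rails terms, that can be as simple as the sketch below: keep the slug in UTF-8 and let the route helpers do the percent-escaping (the model and attribute names are hypothetical):

class User < ActiveRecord::Base
  # Route helpers percent-escape the value returned here.
  def to_param
    slug # e.g. "三浦パン屋"
  end
end

# user_path(user) # => "/users/%E4%B8%89%E6%B5%A6%E3%83%91%E3%83%B3%E5%B1%8B"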

Using MFMailComposeViewController with UTF-8 domains and/or usernames

I'm trying to prepopulate the MFMailComposeViewController with an email address that has a UTF-8 domain (e.g. hello@闪闪发光.com). However, when I call setToRecipients: on my object, I get a message in the console that 'hello@闪闪发光.com' is not a valid email address, and the email controller comes up with an empty To field. If I use the same email address and just type it in directly, I get a warning that it is not a valid email address, but I am given the option to send anyway.
Is this just something that isn't supported? UTF-8 domains may not be too common, but they're definitely out there. I tried to encode the value with stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding, but that didn't do it.
Any thoughts? Anybody run into this before? Workarounds?
I also just tried this in the latest 6.0 SDK using Apple's sample code with only the To address changed to UTF-8, same result.
I posted this on the Apple dev forums but I usually get a better response here so I'm trying here too.
You can try using NSUTF16StringEncoding instead of NSUTF8StringEncoding. Further, you can look into the other string encodings available.
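Another avenue worth testing (not covered by the answer above): internationalized domains travel on the wire in their ASCII punycode form, so converting the domain before handing the address to the API may sidestep the validation. A sketch in Ruby using the simpleidn gem, since that's easy to show here:

require "simpleidn" # gem install simpleidn

local, domain = "hello@闪闪发光.com".split("@", 2)
ascii = "#{local}@#{SimpleIDN.to_ascii(domain)}"
# ascii now has the form "hello@xn--<punycode>.com"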

ExactTarget e-mails inconsistently rendering Korean or Chinese characters

I have a small ASP.NET MVC site that is scraping content from a client site for inclusion in an ExactTarget-generated e-mail. If a content area uses hard-coded Chinese or Korean characters, the e-mails render properly in all clients. When an area calls out to the MVC site, using
%%before; httpget; 1 "http://mysite/contentarea/?parm1=One&parm2=Two"%%
the resulting HTML being sent out doesn't render consistently in all clients. Gmail handles it OK, but Yahoo and Hotmail do not. The resulting characters make it look like an encoding issue. I have the MVC site spitting out UTF-8 à la
Response.ContentEncoding = System.Text.Encoding.UTF8;
This is the first time I've really had to play with the encoding, so that may be part of my problem. :-)
I've looked at the wiki at http://wiki.memberlandingpages.com/ but it has not been much help. What I'd like to do is declare in the AMPscript that the incoming stream from the MVC site is encoded as UTF-8 (or whatever). I'm assuming that having things explicitly laid out should address this, but I don't know if there's something about Hotmail or Yahoo that needs to be managed somehow as well. Thanks for any help!
The only way (that I've found) to set the character encoding in Exacttarget is to request that they turn on internationalization settings for your account. Email them to let them know that you need that turned on and they should be able to sort you out soon.
Here's some documentation on it:
http://wiki.memberlandingpages.com/010_ExactTarget/020_Content/International_Sends?highlight=internationalization
You'll then have a drop-down when creating an email to specify the character encoding. I banged my head against the wall on this one for a long time before finding that documentation page. Hope that helps!
I know this is old, but just in case people are still hunting: I think that, beyond the internationalization settings in ET, the HTTPGet function will default to Windows code page 1252 encoding unless the source page's headers specify UTF-8.
http://wiki.memberlandingpages.com/010_ExactTarget/020_Content/AMPscript/AMPscript_Syntax_Guide/HTTP_AMPscript_Functions
"NOTE: ExactTarget honors any character set returned in the HTTP headers via Content-Type. For example, you can use a UTF-8 encoded HTML file with Content-Type: text/html; charset=utf-8 included in the header. If the encoding is not specified in the header, the application assumes all returned data will be in the character set WindowsCodePage 1252. You can change this default by contacting Global Support."

UTF-8 characters mangled in HTTP Basic Auth username

I'm trying to build a web service using Ruby on Rails. Users authenticate themselves via HTTP Basic Auth. I want to allow any valid UTF-8 characters in usernames and passwords.
The problem is that the browser is mangling characters in the Basic Auth credentials before it sends them to my service. For testing, I'm using 'カタカナカタカナカタカナカタカナカタカナカタカナカタカナカタカナ' as my username (no idea what it means - AFAIK it's some random characters our QA guy came up with - please forgive me if it is somehow offensive).
If I take that as a string and do username.unpack("h*") to convert it to hex, I get: '3e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a83e28ba3e28fb3e28ba3e38a8'. That seems about right for 32 katakana characters (3 bytes / 6 hex digits each).
If I do the same with the username that's coming in via HTTP Basic auth, I get:
'bafbbaacbafbbaacbafbbaacbafbbaacbafbbaacbafbbaacbafbbaacbafbbaac'. It's obviously much shorter. Using the Firefox Live HTTP Headers plugin, here's the actual header that's being sent:
Authorization: Basic q7+ryqu/q8qrv6vKq7+ryqu/q8qrv6vKq7+ryqu/q8o6q7+ryqu/q8qrv6vKq7+ryqu/q8qrv6vKq7+ryqu/q8o=
That looks like that 'bafbba...' string, with the high and low nibbles swapped (at least when I paste it into Emacs, base 64 decode, then switch to hexl mode). That might be a UTF16 representation of the username, but I haven't gotten anything to display it as anything but gibberish.
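(A side note on the hex dumps above: Ruby's unpack("h*") emits each byte low nibble first, which is where the swapped look comes from; unpack("H*") gives the conventional order:)

"カ".unpack("H*") # => ["e382ab"]  high nibble first (conventional)
"カ".unpack("h*") # => ["3e28ba"]  low nibble first, hence the "swap"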
Rails is setting the content-type header to UTF-8, so the browser should be sending in that encoding. I get the correct data for form submissions.
The problem happens in both Firefox 3.0.8 and IE 7.
So... is there some magic sauce for getting web browsers to send UTF-8 characters via HTTP Basic Auth? Am I handling things wrong on the receiving end? Does HTTP Basic Auth just not work with non-ASCII characters?
I want to allow any valid UTF-8 characters in usernames and passwords.
Abandon all hope. Basic Authentication and Unicode don't mix.
There is no standard(*) for how to encode non-ASCII characters into a Basic Authentication username:password token before base64ing it. Consequently every browser does something different:
Opera uses UTF-8;
IE uses the system's default codepage (which you have no way of knowing, other than that it's never UTF-8), and silently mangles characters that don't fit into it using the Windows ‘guess a random character that looks a bit like the one you wanted, or maybe just not' secret recipe;
Mozilla uses only the lower byte of character codepoints, which has the effect of encoding to ISO-8859-1 and mangling the non-8859-1 characters irretrievably... except when doing XMLHttpRequests, in which case it uses UTF-8;
Safari and Chrome encode to ISO-8859-1, and fail to send the authorization header at all when a non-8859-1 character is used.
*: some people interpret the standard to say that either:
it should be always ISO-8859-1, due to that being the default encoding for including raw 8-bit characters directly included in headers;
it should be encoded using RFC2047 rules, somehow.
But neither of these proposals works for a base64-encoded auth token, and the RFC 2047 reference in the HTTP spec really doesn't work at all, since all the places it might potentially be used are explicitly disallowed by the ‘atom context' rules of RFC 2047 itself, even if HTTP headers honoured the rules and extensions of the RFC 822 family, which they don't.
In summary: ugh. There is little-to-no hope of this ever being fixed in the standard or in the browsers other than Opera. It's just one more factor driving people away from HTTP Basic Authentication in favour of non-standard and less-accessible cookie-based authentication schemes. Shame really.
It's a known shortcoming that Basic authentication does not provide support for non-ISO-8859-1 characters.
Some UAs are known to use UTF-8 instead (Opera comes to mind), but there's no interoperability for that either.
As far as I can tell, there's no way to fix this, except by defining a new authentication scheme that handles all of Unicode. And getting it deployed.
HTTP Digest authentication is no solution for this problem, either. It suffers from the same problem of the client being unable to tell the server what character set it's using and the server being unable to correctly assume what the client used.
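Given that the header carries no charset, about the best a server can do is guess. A hedged Ruby sketch of the receiving end: try UTF-8 first and fall back to ISO-8859-1:

require "base64"

def decode_basic_auth(header)
  raw = Base64.decode64(header.sub(/\ABasic /, ""))
  utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  decoded = if utf8.valid_encoding?
    utf8
  else
    raw.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
  end
  decoded.split(":", 2) # => [username, password]
end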
Have you tested using something like curl to make sure it's not a Firefox issue? The HTTP Auth RFC is silent on ASCII vs. non-ASCII, but it does say the value passed in the header is the username and the password separated by a colon, and I can't find a colon in the string that Firefox is reporting sending.
If you are coding for Windows 8.1, note that the sample in the documentation for HttpCredentialsHeaderValue is (wrongly) using UTF-16 encoding. Reasonably good fix is to switch to UTF-8 (as ISO-8859-1 is not supported by CryptographicBuffer.ConvertStringToBinary).
See http://msdn.microsoft.com/en-us/library/windows/apps/windows.web.http.headers.httpcredentialsheadervalue.aspx.
Here's a workaround we used today to circumvent the issue of non-ASCII characters in a colleague's password:
curl -u "USERNAME:`echo -n 'PASSWORT' | iconv -f ISO-8859-1 -t UTF-8`" 'URL'
Replace USERNAME, PASSWORD and URL with your values. This example uses shell command substitution to transform the password character encoding to UTF-8 before executing the curl command.
Note: I used backtick evaluation ` ... ` here instead of $( ... ) because it doesn't fail if the password contains a ! character... [shells love ! characters ;-)]
Illustration of what happens with non-ASCII characters:
echo -n 'zz<zz§zz$zz-zzäzzözzüzzßzz' | iconv -f ISO-8859-1 -t UTF-8
I might be totally ignorant, but I came to this post while looking into a problem sending a UTF-8 string as a header inside an Ajax call.
I solved my problem by encoding the string in Base64 right before sending it. That means you can, with some simple JS, convert the form to Base64 right before submitting, and it can then be converted back on the server side.
This simple tool allowed me to send UTF-8 strings as plain ASCII. I found it thanks to this simple sentence:
'base64 (this encoding is designed to make binary data survive transport through transport layers that are not 8-bit clean).' http://www.webtoolkit.info/javascript-base64.html
I hope this helps somehow. Just trying to give back a little bit to the community!
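For completeness, the receiving end of that workaround in Ruby is essentially a one-liner (the transport and header name are whatever you picked on the client side):

require "base64"

encoded = Base64.strict_encode64("カタカナ") # what the client-side JS would send
Base64.strict_decode64(encoded).force_encoding(Encoding::UTF_8) # => "カタカナ"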
