Base64 vs 8-bit Email Encoding - character-encoding

I am using PHPMailer to send out email. I tried adding DKIM to my message, but when the body of the email contained this character, DKIM would fail. (I could not copy/paste this character into this question, so I'm attaching a picture of it instead.) I am viewing the character in PhpStorm, which displays it as ZWNJ, the zero-width non-joiner character (U+200C).
Now, when I did this to my PHPMailer object:
$mail->Encoding = 'base64';
DKIM would work, even though that ZWNJ character was present.
My question is: what is the difference between base64 and 8bit, the default encoding for PHPMailer? Is there any downside to setting the encoding to base64? As far as I could tell, the resulting emails were exactly the same, except that 8bit would cause DKIM to fail.
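To make the difference concrete, here is a minimal sketch of what the two settings do to the bytes of a body. It is written in Ruby purely for illustration and does not call PHPMailer; the `body` string is a made-up example. With 8bit the three UTF-8 bytes of ZWNJ go out unchanged, whereas base64 re-expresses the whole body in 7-bit ASCII, at the cost of about a third more bytes.

```ruby
# Hypothetical message body containing ZWNJ (U+200C); not taken from the question.
body = "foo\u200Cbar"

# "8bit": the UTF-8 bytes are sent unchanged, including three bytes above 0x7F.
eight_bit = body.b
zwnj_bytes = "\u200C".bytes.map { |b| format('%02X', b) }.join(' ')

# "base64": every 3 input bytes become 4 ASCII characters.
base64 = [body].pack('m0')

puts zwnj_bytes             # E2 80 8C
puts eight_bit.ascii_only?  # false
puts base64                 # Zm9v4oCMYmFy
puts base64.ascii_only?     # true
```

Whether this explains the DKIM failure is an assumption: an 8-bit body only verifies if nothing between signing and verification rewrites it, while a base64 body contains no bytes above 0x7F that a relay would need to touch.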

Related

UTF-8 Chars in FTP Greeting

I tried to use Unicode characters in my FTP server's greeting, but the client seems to read them as two different characters each. Because of this, I need a way to encode them into UTF-8. For now, I have the greeting HTML encoded because I am displaying it on a webpage, but on any other client it will display the encoding. How can I set the greeting to be parsed as UTF-8? And if I can't, then is there a way I can parse the greeting correctly?
EDIT: Answered my own question, see below.
I found the answer to the question. It was actually UTF-8 encoded already, and I had to decode it from UTF-8. Here is what I did:
decodeURIComponent(escape(greeting))
Don't forget to replace the line breaks with <br> if you are displaying it on a webpage like I am!
decodeURIComponent(escape(greeting)).replace(/\n/g,'<br>')
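For comparison, the same repair can be sketched in Ruby. This is only an illustration under the assumption that the greeting's UTF-8 bytes were decoded one byte per character; the `misread` string is a made-up example, not the actual greeting.

```ruby
# Hypothetical greeting: the UTF-8 bytes of "café" were decoded one byte per
# character, so "é" (bytes C3 A9) shows up as the two characters "Ã©".
misread = "Welcome to the caf\u00C3\u00A9"

# Step 1: map each character back to the single byte it came from.
# Step 2: relabel those bytes as UTF-8 (no bytes change in this step).
repaired = misread.encode('ISO-8859-1').force_encoding('UTF-8')

puts repaired                  # Welcome to the café
puts repaired.valid_encoding?  # true
```

The first step plays the role of escape() and the second that of decodeURIComponent() in the JavaScript one-liner.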

Ignore � (non-UTF-8 characters) in email attachment or strip them from the attachment?

Users of our application are able to upload plain text files. These files might then be added as attachments to outgoing ActionMailer emails. Recently an attempt to send said email resulted in an invalid byte sequence in UTF-8 error. The email was not sent. This symbol, �, appears throughout the offending attachment.
We're using ActionMailer so although it ought to go without saying, here's representative code for the attachment action within the mailer class's method:
attachments['file-name.jpg'] = File.read('file-name.jpg')
From a business standpoint we don't care about the content of these text files. Ideally I'd like for our application to ignore the content and simply attach them to emails.
Is it possible to somehow tell Rails / ActionMailer to ignore the formatting? Or should I parse the incoming text file, stripping out non-UTF-8 characters?
I did search through like questions here on Stack Overflow but nothing addressed the problem I'm currently facing.
Edit: I did call #readlines on the file in a Rails console and found that the black diamond is a representation of \xA0. This is likely a non-breaking space in Latin1 (ISO 8859-1).
If Ruby is having problems reading the file and is corrupting the characters during the read, then try using File.binread. File.binread is inherited from IO.
...
attachments['attachment.txt'] = File.binread('/path/to/file')
...
If your file already has corrupted characters then you can either find some process to 'uncorrupt' them, which is not fun, or re-encode from ASCII-8BIT to UTF-8, replacing the bytes that have no UTF-8 equivalent (add replace: '' to the options if you want them stripped instead).
...
attachments['attachment.txt'] = File.binread('/path/to/file')
.encode('utf-8', 'binary', invalid: :replace, undef: :replace)
...
(String#scrub does this too, but since you can't read the file in as UTF-8, you can't use it here.)
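A small sketch of what that encode call does to a stray byte, using made-up input rather than the real attachment. By default the offending byte is replaced with U+FFFD; passing replace: '' drops it.

```ruby
# Hypothetical attachment bytes: "a", byte A0 (not valid UTF-8 on its own), "b".
raw = "a\xA0b".b

# Same options as the call above: bytes without a UTF-8 mapping are replaced.
replaced = raw.encode('utf-8', 'binary', invalid: :replace, undef: :replace)

# With replace: '' the byte is removed instead of substituted.
stripped = raw.encode('utf-8', 'binary', invalid: :replace, undef: :replace, replace: '')

puts replaced.codepoints.map { |c| format('U+%04X', c) }.join(' ')  # U+0061 U+FFFD U+0062
puts stripped                                                       # ab
```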
With your edit, this seems pretty clear to me:
The file on your filesystem is encoded in latin1.
File.read uses Ruby's default external encoding. If LANG contains something like "en_GB.utf8", File.read will tag the string as UTF-8. You can verify this by logging the value of str.encoding (where str is the value of File.read).
File.read does not actually verify the encoding, it only slurps in the bytes and slaps on the encoding (like force_encoding).
Later, in ActionMailer, something wants to transcode the string, for whatever reason, and that fails as expected (and with the result you are noticing).
If your text files are encoded in latin1, then use File.read(path, encoding: Encoding::ISO_8859_1). This way, it may work. Let us know if it doesn't...
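The points above can be checked with a throwaway file. This is a minimal sketch: the file contents are invented, and the UTF-8 encoding is passed explicitly so the result does not depend on LANG.

```ruby
require 'tempfile'

# Hypothetical Latin-1 file: "a", a non-breaking space (byte A0), "b".
file = Tempfile.new('latin1')
file.binmode
file.write("a\xA0b".b)
file.close

# Tagging as UTF-8 does not validate: the lone A0 byte is kept and is invalid.
as_utf8 = File.read(file.path, encoding: 'UTF-8')

# Declaring the real encoding makes the string valid and transcodable.
as_latin1 = File.read(file.path, encoding: Encoding::ISO_8859_1)

# binread keeps the raw bytes without claiming any text encoding.
raw = File.binread(file.path)

puts as_utf8.valid_encoding?                  # false
puts as_latin1.encode('UTF-8').bytes.inspect  # [97, 194, 160, 98]
puts raw.bytes.inspect                        # [97, 160, 98]
file.unlink
```

Reading with the correct encoding suits text that will be transcoded later; File.binread suits attachments whose content should pass through untouched.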
When reading the file at time of attachment, I can use the following syntax.
mail.attachments[file.file_name.to_s] = File.read(path_to_file).force_encoding("BINARY").gsub(0xA0.chr,"")
The important addition is the following, which goes after the call to File.read(...):
.force_encoding("BINARY").gsub(0xA0.chr,"")
The stripping and encoding ought to be done at time of file upload to our system, so this answer isn't the resolution. It's a short-term band-aid.

Changing charset when retrieving messages from mail server!

I'm currently creating a little mail client and am facing a problem with charsets.
I use Indy's TIdIMAP4 component to retrieve data from the mail server. When I try to retrieve mail bodies, accented letters like ä, ü etc. are converted to =E4, =FC respectively, as the message is using charset ISO-8859-1.
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable
How can I make the server send me data in another charset, like UTF-8? What would be the best solution for this problem?
Thanks in advance!
It is not the charset that is producing strings like =E4 and =FC; it is the Content-Transfer-Encoding. $E4 and $FC are the byte values of ä and ü in ISO-8859-1, but they are 8-bit values, and email is still largely a 7-bit environment. Unless clients and servers negotiate 8-bit transfers during their communications, octets above $7F have to be encoded in a 7-bit compatible manner to pass through email gateways safely, especially the legacy ones that still exist. quoted-printable is a commonly used 7-bit encoding in email for textual content. base64 is another one, but it is not human-readable, so it tends to be used for binary data instead of textual data (though it can be used for text).
In any case, you cannot make the server deliver the email data to you in another encoding. The server is merely delivering the original email data as-is that was originally delivered to it by the sender. If you want the data in UTF-8, then you have to re-encode it yourself after downloading it. Indy will handle the decoding for you.
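The two layers described above (transfer encoding first, then charset) can be sketched as follows. This is a Ruby illustration rather than Indy code, and the `wire` string is a made-up body line.

```ruby
# Hypothetical body line as it arrives: ISO-8859-1 text in quoted-printable.
wire = "K=E4se und M=FCsli"

# Undo the transfer encoding: =E4 and =FC become the single bytes E4 and FC.
latin1_bytes = wire.unpack1('M')

# Undo the charset: label the bytes as ISO-8859-1, then transcode to UTF-8.
utf8 = latin1_bytes.force_encoding('ISO-8859-1').encode('UTF-8')

puts utf8           # Käse und Müsli
puts utf8.encoding  # UTF-8
```

The server is not involved in either step; both happen on the client after download.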

Weird characters on HTML page

I am using the Last.fm API to fetch some info about artists. I save the info in a DB and then display it on my webpage.
But characters like “ (double quote) are shown as â€œ.
Example artist info: http://www.last.fm/music/David+Penn
and I got the first line as "Producer, arranger, dj and musician from Madrid-Spain. He has his own record company â€œZen Recordsâ€, and".
My DB is UTF-8, but I don't know why this error is still occurring.
This seems to be a character encoding error. Confirm that you are reading the webpage as the correct encoding and are showing the results in the correct encoding.
You should be using UTF-8 all the way through. Check that:
your connection to the database is UTF-8 (using mysql_set_charset);
the pages you're outputting are marked as UTF-8 (<meta http-equiv="Content-Type" content="text/html;charset=utf-8">);
when you output strings from the database, you HTML-encode them using htmlspecialchars() and not htmlentities().
htmlentities HTML-encodes all non-ASCII characters, and by default assumes you are passing it bytes in ISO-8859-1. So if you pass it “ encoded as UTF-8 (bytes 0xE2, 0x80, 0x9C), each byte is treated as a separate character and you'd get â€œ instead of the expected “ or &ldquo;. This can be fixed by passing in utf-8 as the optional $charset argument.
However it's usually easier to just use htmlspecialchars() instead, as this leaves non-ASCII characters alone, as raw bytes instead of HTML entity references. This results in a smaller page output, so is preferable as long as you're sure the HTML you're producing will keep its charset information (which you can usually rely on, except in context like sending snippets of HTML in a mail or something).
htmlspecialchars() does have an optional $charset argument too, but setting it to utf-8 is not critical since that results in no change of behaviour over the default ISO-8859-1 charset. If you are producing output in old-school multibyte encodings like Shift-JIS you do have to worry about setting this argument correctly, but today that's quite rare as most sane people use UTF-8 in preference.
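The byte-per-character misreading described above can be reproduced outside PHP. The sketch below is in Ruby and uses Windows-1252 as the single-byte encoding, which is an assumption: it is how browsers typically display bytes labelled ISO-8859-1.

```ruby
opening = "\u201C"  # left double quotation mark
closing = "\u201D"  # right double quotation mark

puts opening.bytes.map { |b| format('0x%02X', b) }.join(', ')  # 0xE2, 0x80, 0x9C

# Reinterpret the same bytes one per character, as a single-byte reader would.
misread_open = opening.b.force_encoding('Windows-1252').encode('UTF-8')

# The last byte of the closing quote (0x9D) has no Windows-1252 character,
# so it is dropped here instead of raising an error.
misread_close = closing.b.force_encoding('Windows-1252')
                       .encode('UTF-8', invalid: :replace, undef: :replace, replace: '')

puts misread_open   # â€œ
puts misread_close  # â€
```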

How to send an email with accents using ActionMailer

My environment.rb is like this:
ActionMailer::Base.default_charset = "iso-8859-1"
which should be enough for accents, but here is how the message's subject is being sent:
Convite para participaÃ§Ã£o de projeto
Does anyone know what I have to do to fix it?
Is your data in iso-8859-1? From the look of the error example, it seems to be two bytes per character (note the repetition of Ã). Since 8859-1 uses 1 byte per character, my guess is that your data is in utf-8 format.
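That guess can be checked with a quick sketch. It assumes the intended subject is "Convite para participação de projeto" and that the string is UTF-8 in memory while the mail declares iso-8859-1.

```ruby
subject = "Convite para participa\u00E7\u00E3o de projeto"

# In UTF-8, "ç" and "ã" take two bytes each, so bytes outnumber characters.
puts subject.length    # 36
puts subject.bytesize  # 38

# What a reader sees when those UTF-8 bytes are labelled ISO-8859-1:
# each two-byte character turns into "Ã" plus one more symbol.
as_latin1 = subject.b.force_encoding('ISO-8859-1').encode('UTF-8')
puts as_latin1         # Convite para participaÃ§Ã£o de projeto
```

If the output matches what arrives in the inbox, the data is UTF-8 and the declared charset is the part that is wrong.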
Also check that your db is not doing any conversions on data going in or out.
I urge you to use unicode / utf-8 everywhere--database, html, emails, etc. It's what all of the kool-kids are using these days. 8859-1 is so last century!
Regarding emails
config.action_mailer.default_charset = "utf-8"
is what I use.
Just remove your setting and let Rails do the work using the default charset, UTF-8.
I work with emails and special characters all the time, and there's no need to perform any kind of conversion or setting, at least not on Rails 3. As long as your strings contain the right characters, you'll be fine.
Just make sure the encoding error is not occurring when reading the data from your database.
