Detecting a "EBCDIC X'DF'" character in a file - character-encoding

Reading this document:
PK19612: SENDMAIL TIMESOUT - SENDING OUT AN EMAIL THAT CONTAINS A EBCDIC X'DF'
https://www.ibm.com/support/pages/apar/PK26459
The recommended fix is:
Local fix: remove the x'DF' from the body of the email
How can a "X'DF'" character be detected under linux?
Thanks!
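One way to check (a minimal Python sketch; it assumes the file on Linux still contains the raw 0xDF byte, i.e. it was transferred in binary rather than translated out of EBCDIC, and the file name is a placeholder):

with open("mail_body.txt", "rb") as f:    # read as raw bytes, not text
    data = f.read()

offsets = [i for i, b in enumerate(data) if b == 0xDF]
print(f"found {len(offsets)} X'DF' byte(s) at offsets {offsets}")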

Related

Robot framework ImapLibrary: 'quoted-printable' is not a text encoding

I'm getting this LookupError when I try to use the Get Links From Email keyword:
Open Mailbox    server=imap.googlemail.com    user=user#mail.com    password=pass
${proWelcomeMail} =    Wait for Email    recipient=${USER_EMAIL}    subject=Welcome
Open Link From Email    ${proWelcomeMail}
...
Close Mailbox
Output:
ImapLibrary . Get Links From Email ${proWelcomeMail}
LookupError: 'quoted-printable' is not a text encoding; use codecs.decode() to handle arbitrary codecs
Is there a workaround for this please?
Since this hasn't been fixed by the maintainer, you can patch the library's code yourself: change line 135 of __init__.py from decode('quoted-printable') to decode('utf-8').
Alternatively, use the fork ImapLibrary2, which has already fixed this:
https://pypi.org/project/robotframework-imaplibrary2/
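For context: the LookupError happens because bytes.decode() only accepts text encodings, while quoted-printable is a bytes-to-bytes codec; the error message itself points at codecs.decode(). A small Python sketch of the difference (the sample bytes are made up for illustration):

import codecs

raw = b"caf=C3=A9 =3D coffee"                      # illustrative quoted-printable payload
# raw.decode('quoted-printable')                   # LookupError: not a text encoding
decoded = codecs.decode(raw, 'quoted-printable')   # bytes-to-bytes decode -> b'caf\xc3\xa9 = coffee'
print(decoded.decode('utf-8'))                     # then decode the resulting bytes as UTF-8 text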

Ruby: how to convert a binary string from an SMSC back to text

My app works with an SMSC, and I need to inspect the SMS before it is sent.
I tried to send this string from the mobile:
"hello this is test"
And when I check the SMSC, I get this as the binary string of my text:
userData = "c8329bfd06d1d1e939283d07d1cb733a"
The encoding of this string is:
<Encoding:ASCII-8BIT>
I know this userData is probably a GSM-encoded binary string,
so how can I get the clear text string back from userData?
This question is about English text, because in Hebrew I can get the
string back with this code:
[userData].pack('H*').force_encoding('utf-16be').encode('utf-8')
but in English I get this error:
Encoding::InvalidByteSequenceError: "\xDA\xF3" followed by "u" on UTF-16BE
What I tried was to detect the encoding of the binary string with ICU, and I got:
"ISO-8859-1", with the detected language 'PT', which is very strange because my languages are English and Hebrew.
Anyway, I got lost with all the encoding stuff, so I tried encoding to each name in the list from Encoding.list,
but without luck so far.
Thanks in advance,
Shmulik
OK,
For anyone else who has this issue, I got the solution, thanks to someone from the #ruby IRC community (I missed his nickname).
The solution is:
For ASCII characters packed into the binary string, you need this:
"c8329bfd06d1d1e939283d07d1cb733a".scan(/../).reverse_each.map { |h| h.to_i(16) }.pack('C*').unpack('B*')[0][2..-1].scan(/.{7}/).map.with_object("") { |x, s| s << x.to_i(2) }.reverse
Remember, I sent these words in the SMS:
"hello this is test"
And this is what it became in binary:
"c8329bfd06d1d1e939283d07d1cb733a"
The reason I got garbage in every encoding is that ASCII characters are sent as 7-bit GSM, so only 7 bits represent each character, while every other encoding uses at least 8 bits per character; that is what the code above undoes.
But this is just for the ASCII character set.
For another language, like the Hebrew I use, the SMS is sent as UCS-2,
so this code works for me:
[your_binary_string].pack('H*').force_encoding('utf-16be').encode('utf-8')
It is very important to put the binary string in an array.
So that's all for now.
If anybody wants to translate and explain what exactly happens in the code for the ASCII character set, be my guest and welcome.
Shmulik
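Taking up that invitation: the one-liner undoes GSM 03.38 7-bit packing, where each character is a 7-bit septet and eight septets are squeezed into seven octets. Here is a commented re-implementation in Python (a sketch: it maps septets straight to ASCII, ignores the GSM extension table, and does not strip trailing fill bits):

def gsm7_unpack(hex_string):
    data = bytes.fromhex(hex_string)
    septets = []
    carry, carry_bits = 0, 0                  # low-order bits left over from the previous octet
    for octet in data:
        # the next septet is the carried bits plus enough low bits of this octet
        septets.append(((octet << carry_bits) | carry) & 0x7F)
        carry = octet >> (7 - carry_bits)
        carry_bits += 1
        if carry_bits == 7:                   # every eighth septet comes entirely from the carry
            septets.append(carry)
            carry, carry_bits = 0, 0
    return ''.join(chr(s) for s in septets)

print(gsm7_unpack("c8329bfd06d1d1e939283d07d1cb733a"))   # -> "Hello this is test"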

Firefox add-on localization

I have a problem with localizing my add-on. I followed this tutorial on Using Localized Strings in Preferences, but I can't compile my add-on because I use Polish characters such as ć.
I made a locale folder and put a pl-PL.properties file there with this content:
my_tag_title = Co robić?
and I got this error:
Following locale file is not a valid UTF-8 file: C:\path\pl-PL.properties
'utf8' codec can't decode byte 0xe6 in position 22: invalid continuation byte
Is there a way to put special characters directly inside package.json?
How can I solve this problem?
Make sure that the locale file is saved in UTF-8 format.
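If your editor can't do that directly, a small script can: byte 0xE6 is 'ć' in the Polish legacy code pages, so the file was most likely saved as windows-1250 or ISO-8859-2. A minimal Python sketch that re-saves it as UTF-8 (the source encoding is an assumption; adjust it if the file came from a different editor):

path = r"C:\path\pl-PL.properties"

with open(path, encoding="cp1250") as f:        # assumed legacy source encoding
    text = f.read()

with open(path, "w", encoding="utf-8") as f:    # rewrite the same file as UTF-8
    f.write(text)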

iconv C API: charset conversion from/to local encoding

I am using the iconv C API and I want iconv to detect the local encoding of the computer. Is that possible? Apparently it is, because when I look at the source code, I find in the file iconv_open1.h that if the fromcode or tocode variable is an empty string (""), then the local encoding is used, via the locale_charset() function call.
Someone also told me that in order to convert the locale encoding to Unicode, all I needed was to use iconv_open("UTF-8", "").
Unfortunately, I find no mention of this in the documentation.
And when I convert some ISO-8859-1 text to the locale encoding (which is UTF-8 on my machine), I get errno=EILSEQ (illegal sequence) during the conversion. I checked, and iconv_open returned no error.
If, instead of the empty string, I specify "utf-8" in iconv_open, I get no error. Obviously iconv failed to detect my current charset.
Edit: I checked with a simple C program that does puts(nl_langinfo(CODESET)), and I get ANSI_X3.4-1968 (which is ASCII). Apparently, I have a problem with charset detection.
Edit: this should be related to Why is nl_langinfo(CODESET) different from locale charmap?
Additional information: my program is written in Ada, and I bind to the C functions at link time. Apparently, the locale setting is not initialized the same way in the Ada runtime as in the C runtime.
I'll give the same answer as in Why is nl_langinfo(CODESET) different from locale charmap?
You need to first call
setlocale(LC_ALL, "");

Character Encoding issue in Rails v3/Ruby 1.9.2

I sometimes get the error "invalid byte sequence in UTF-8" when I read contents from a file. Note: this only happens when there are special characters in the string. I have tried opening the file without "r:UTF-8", but I still get the same error.
open(file, "r:UTF-8").each_line { |line| puts line.strip(",") } # line.strip generates the error
Contents of the file:
# encoding: UTF-8
290919,"SE","26","Sk‰l","",59.4500,17.9500,, # this errors out
290956,"CZ","45","HornÌ Bradlo","",49.8000,15.7500,, # this errors out
290958,"NO","02","Svaland","",58.4000,8.0500,, # this works
This is a CSV file I got from an outside source and am trying to import into my DB. It did not come with "# encoding: UTF-8" at the top, but I added it since I read somewhere that it would fix this problem; it did not. :(
Environment:
Rails v3.0.3
ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-darwin10.5.0]
Ruby has a notion of an external encoding and an internal encoding for each file. This allows you to work with a file in UTF-8 in your source, even when the file is stored in a more esoteric format. If your default external encoding is UTF-8 (which it is if you're on Mac OS X), all of your file I/O is going to be in UTF-8 as well. You can check this using File.open('file').external_encoding. What you're doing when you open your file and pass "r:UTF-8" is forcing the same external encoding that Ruby is using by default.
Chances are, your source document isn't in UTF-8 and those non-ASCII characters aren't mapping cleanly to UTF-8 (if they mapped cleanly, you would get the correct characters and no error; if they mapped incorrectly, you would get incorrect characters and no error). What you should do is try to determine the encoding of the source document, then have Ruby transcode the document on read, like so:
File.open(file, "r:windows-1251:utf-8").each_line { |line| puts line.strip(",") }
If you need help determining the encoding of the source, give this Python library a whirl. It's based on the automatic charset detection fallback that was in Seamonkey/Mozilla (and is possibly still in Firefox).
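The library referred to is presumably chardet; a minimal usage sketch (the file name is a placeholder), whose result can then be plugged into the r:<source-encoding>:utf-8 open mode shown above:

import chardet   # pip install chardet

with open("cities.csv", "rb") as f:    # placeholder name; read the raw bytes
    raw = f.read()

guess = chardet.detect(raw)
print(guess)   # e.g. {'encoding': 'windows-1252', 'confidence': 0.73, 'language': ''}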
If you want to change your file's encoding, you can use the charlock_holmes gem:
https://github.com/brianmario/charlock_holmes
require 'charlock_holmes/string'
content = File.read('test2.txt')
if !content.is_utf8?
  detection = CharlockHolmes::EncodingDetector.detect(content)
  utf8_encoded_content = CharlockHolmes::Converter.convert content, detection[:encoding], 'UTF-8'
end
Then you can save your new content in a temp file and overwrite your original file.
Hope this helps.
