Decode Hebrew text with unidentified encoding - character-encoding

I have a very old Hebrew text file. I don't know what software created it or what its encoding is, and the text looks like gibberish to me.
The text looks like this:
$€∆«¬ Õ–€–‘«¸ Õ–« ¬Œ∆ ‹«Œ≈«‹ √“À “…¨ ©«¬Œ∆ ‹«Œ≈«‹ ∆€–Àœ®¨ «–∆ €∆…»À⁄ «À«Œ≈ \
I would be very grateful if anyone could figure out the encoding for me and tell me how to convert the text to Hebrew.

Related

Delphi 7 cyrillic characters not showing correctly

I recently asked (and paid) for a translation of my Delphi app into Macedonian (Cyrillic script).
I sent the text to my contracted translator, and she sent back the translated strings. The text was extracted from all my .dfm and .pas files.
When I replaced the original text with the Cyrillic translation, I could open the .dfm and .pas files in my favourite editor, Notepad++ (or Notepad), and the translated characters displayed correctly.
When I open these files in Delphi (via the .dpr file), I see something like this:
Could someone please tell me how to convert or display these strings correctly in Delphi?
I am using Macedonian regional settings, but that did not help with this problem.
PS: Yes, I am still using Delphi 7, because I love it and purchased this version.
UPDATE
Original text in Delphi:
original: ПОДГОТВИ КУТИИ ЗРДРУГИТЕ ЦЕÐТРÐЛИ
Correct text:
ПОДГОТВИ КУТИИ ЗА ДРУГИТЕ ЦЕНТРАЛИ
I noticed that when I set the ParentFont property to False, set the font to Verdana with the Cyrillic script (RUSSIAN_CHARSET), and then copy/paste the Cyrillic text, it shows correctly in Delphi.
OK, so I SOLVED it!
The solution has multiple steps, and Notepad++ is needed:
1st step: Replace all fonts in the .dfm files with a font that supports Cyrillic (for example, Verdana).
2nd step: Replace all ParentFont = False with ParentFont = True.
3rd step: In Notepad++, choose Encoding -> Convert to ANSI.
That's all. Do this for all .dfm and .pas files (for .pas files, only the 3rd step applies).
I am glad I did not listen to David Heffernan and did not give up!
Your text file was UTF-8 encoded, whereas Delphi 7 requires WinAnsi encoding, with code page 1251 for Cyrillic characters.
If you prefer, you can use the UTF8Decode() function in System.pas to do the conversion programmatically.
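The same UTF-8 to code page 1251 conversion can also be scripted outside Delphi. A minimal Python sketch, assuming the input really is UTF-8 (possibly with a byte order mark); the name utf8_to_cp1251 is made up for this illustration:

```python
def utf8_to_cp1251(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as Windows-1251 (the assumed target code page)."""
    # "utf-8-sig" also strips a leading byte order mark if one is present.
    text = data.decode("utf-8-sig")
    # Raises UnicodeEncodeError if a character has no cp1251 equivalent.
    return text.encode("cp1251")


sample = "ПОДГОТВИ КУТИИ ЗА ДРУГИТЕ ЦЕНТРАЛИ"
converted = utf8_to_cp1251(sample.encode("utf-8"))
print(len(sample.encode("utf-8")), "UTF-8 bytes ->", len(converted), "cp1251 bytes")
```

Each Cyrillic letter takes two bytes in UTF-8 and one byte in code page 1251, which is why an editor that assumes a single-byte code page shows two garbage characters per letter.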

What encoding is this and how do I turn it into something I can see properly?

I'm writing a script that will operate on the subtitle files of a popular streaming service (Netfl*x).
The subtitle files contain strange characters, and I can't get my text editors or web browser to display them readably. The XML declaration says UTF-8, but some characters are not readable.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<tt xmlns:tt="http://www.w3.org/ns/ttml" xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xmlns:ttp="http://www.w3.org/ns/ttml#parameter" xmlns:tts="http://www.w3.org/ns/ttml#styling" ttp:tickRate="10000000" ttp:timeBase="media" xmlns="http://www.w3.org/ns/ttml">
<p>de 15 % la nuit dernière.</span></p>
<p>if youâve got things to doâ¦</span></p>
And in Vim:
This is what it looks like in the browser:
How can I convert this into something I can use?
I'll go out on a limb and say that file is UTF-8 encoded just fine, and you're merely looking at it using the wrong encoding. The character À encoded in UTF-8 is C3 80. C3 in ISO-8859-1 is Ã, which in your screenshot is followed by an 80. So it looks like you're viewing a UTF-8 file using the (wrong) ISO-8859-1 encoding.
Use the correct encoding when opening the file.
My terminal is set to en_US.UTF-8, but it was also rendering this supposedly UTF-8 encoded file incorrectly (sonné appeared as sonnÃ©). I was able to solve this by using iconv to convert the file to ISO8859-1.
iconv original.xml -t ISO8859-1 -o converted.xml
In the new file, the characters were properly rendered, although I don't quite understand why.
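A likely explanation, assuming the file was double-encoded (its UTF-8 bytes were at some point read as ISO-8859-1 and written out again as UTF-8): converting to ISO-8859-1 undoes one layer and leaves the original UTF-8 bytes. A minimal Python sketch of the same repair; undo_double_utf8 is a hypothetical helper name:

```python
def undo_double_utf8(text: str) -> str:
    """Reverse one round of 'UTF-8 bytes decoded as ISO-8859-1'."""
    # Every mojibake character maps back to exactly one original byte.
    raw = text.encode("latin-1")
    # Those bytes are the original UTF-8 sequence.
    return raw.decode("utf-8")


fixed = undo_double_utf8("sonnÃ©")
print(fixed == "sonné")  # prints: True
```

If the text was never double-encoded, the encode or decode step raises an error instead of silently producing new garbage, which makes this safe to try on a copy.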

Ruby How to convert back binary string from smsc

My app works with an SMSC, and I need to process each SMS before it is sent.
I sent this string from the mobile phone:
"hello this is test"
When I check the SMSC, I get this binary string for my text:
userData = "c8329bfd06d1d1e939283d07d1cb733a"
the encoding of this string is:
<Encoding:ASCII-8BIT>
I know that this userData is probably GSM-encoded text in a binary string,
so how can I get the clear-text string back from userData?
This question is about English text, because for Hebrew I can get the
string back with this code:
[userData].pack('H*').force_encoding('utf-16be').encode('utf-8')
But for English I get an error:
Encoding::InvalidByteSequenceError: "\xDA\xF3" followed by "u" on UTF-16BE
I tried detecting the encoding of the binary string with ICU, and got
"ISO-8859-1" with the detected language 'PT', which is very strange because my languages are English and Hebrew.
Anyway, I got lost in the encoding details, so I tried encoding to every name in Encoding.list,
but without luck so far.
Thanks in advance,
Shmulik
OK,
For anyone who also has this issue: I got the solution, thanks to someone from the #ruby IRC community (I missed his nickname).
The solution is:
For ASCII characters packed into the binary string,
you need this:
"c8329bfd06d1d1e939283d07d1cb733a".scan(/../).reverse_each.map { |h| h.to_i(16) }.pack('C*').unpack('B*')[0][2..-1].scan(/.{7}/).map.with_object("") { |x, s| s << x.to_i(2) }.reverse
Remember that I sent these words in the SMS:
"hello this is test"
and that they became this binary string:
"c8329bfd06d1d1e939283d07d1cb733a"
The reason I got garbage in every encoding is that the ASCII characters are sent in the 7-bit GSM alphabet: each character occupies only 7 bits and the characters are packed back to back, whereas every other encoding uses at least 8 bits per character. Unpacking those 7-bit groups is what the code actually does.
But this is only for the ASCII character set.
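The unpacking that the Ruby one-liner performs can be written out step by step. A Python sketch under two assumptions: the number of characters is derived from the byte length (a real PDU carries it in a separate length field), and only characters whose GSM 7-bit code equals their ASCII code are handled. The name unpack_gsm7 is made up for this illustration:

```python
def unpack_gsm7(hex_user_data: str) -> str:
    """Unpack GSM 7-bit packed user data given as a hex string."""
    data = bytes.fromhex(hex_user_data)
    # Read the bytes as one little-endian integer: septet 0 sits in the
    # lowest 7 bits, septet 1 in the next 7 bits, and so on.
    bits = int.from_bytes(data, "little")
    # Assumption: any leftover high bits are padding, not a character.
    count = len(data) * 8 // 7
    septets = [(bits >> (7 * i)) & 0x7F for i in range(count)]
    # Assumption: plain letters, digits and spaces only, where the GSM
    # default alphabet coincides with ASCII.
    return "".join(chr(s) for s in septets)


print(unpack_gsm7("c8329bfd06d1d1e939283d07d1cb733a"))  # prints: Hello this is test
```

The 16 bytes hold 18 septets plus 2 padding bits, which is why the Ruby version drops the first two characters of the reversed bit string with [2..-1]; that offset only fits messages of this length. The first letter decodes as a capital H, which suggests the phone capitalised it before sending.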
For other languages, such as the Hebrew I use, the SMS is sent as UCS-2,
so this code works for me:
[your_binary_string].pack('H*').force_encoding('utf-16be').encode('utf-8')
It is very important to put the binary string in an array.
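For comparison, a Python sketch of the same UCS-2 decoding. The sample hex string below is made up for illustration (it is not taken from a real SMS), and decode_ucs2 is a hypothetical name:

```python
def decode_ucs2(hex_user_data: str) -> str:
    """Decode hex-encoded UCS-2 (big-endian) user data into text."""
    data = bytes.fromhex(hex_user_data)
    # UCS-2 code units are read here as big-endian UTF-16.
    return data.decode("utf-16-be")


# "05e905dc05d505dd" spells the Hebrew word "shalom" (4 letters, 8 bytes).
print(decode_ucs2("05e905dc05d505dd") == "\u05e9\u05dc\u05d5\u05dd")  # prints: True
```

In the Ruby version the array is required because pack is a method of Array, not of String.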
That's all for now.
If anybody wants to translate and explain exactly what happens in the code for the ASCII character set, be my guest.
Shmulik

how to convert unicode text to utf8 text readable?

I have a serious problem with Unicode and UTF-8.
I pasted a paragraph of Arabic/Persian text into Notepad and saved it, and now my text looks like this:
Êæ Çíä ÓæÑÓ ÈÑäÇãå ÚÏÏ ÏáÎæÇåí Ñæ ÇÒ æÑæÏí ãííÑå æ Èå Øæá åãæä ÚÏÏ ãËáËí Ñæ ÑÓã ãí ˜äå
My question is how to get my data back. It is important for me to recover this data. Thanks in advance.
The paragraph was scrambled by saving as code page 1256 (Arabic/Persian), then interpreted as code page 1252 (Western Europe), and finally saved as Unicode text. You can use C# to reverse this procedure:
string scrambled = "Êæ Çíä ÓæÑÓ ÈÑäÇãå ÚÏÏ ÏáÎæÇåí Ñæ ÇÒ æÑæÏí ãííÑå æ " +
"Èå Øæá åãæä ÚÏÏ ãËáËí Ñæ ÑÓã ãí ˜äå";
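// Recover the original bytes by encoding the scrambled text back to Windows-1252.
byte[] bytes = Encoding.GetEncoding("windows-1252").GetBytes(scrambled);
// Decode those bytes with the code page the text was actually written in.
string plainText = Encoding.GetEncoding("windows-1256").GetString(bytes);
Console.WriteLine(plainText);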
The plain text output is:
"تو اين سورس برنامه عدد دلخواهي رو از ورودي ميگيره و به طول همون عدد مثلثي رو رسم مي کنه"
On Linux you can use Gedit to open it as a 1256 encoded file:
gedit shahnameh.txt --encoding WINDOWS-1256
You can do the same via the GUI: select the correct encoding in the Open dialog when opening the file. The option should be at the bottom of the dialog.

BlackBerry webworks native dialog unicode

I have been searching for an answer all around the web, but couldn't find anything.
I am developing a BlackBerry WebWorks application, and the problem is with dialogs and Unicode. For example:
When I use a plain JavaScript alert(unicodeMsg);, Unicode works fine, and I can use any character, including Russian or Lithuanian. The problem is that the alert box has the title "JavaScript Alert", which is a bit annoying.
When I use a native alert, from either PhoneGap or WebWorks, like:
blackberry.ui.dialog.standardAskAsync(unicodeMsg,
blackberry.ui.dialog.D_OK, {
title : unicodeTitle,
size: blackberry.ui.dialog.SIZE_MEDIUM,
position : blackberry.ui.dialog.CENTER
});
it doesn't show any Unicode characters. I have tried pretty much everything (setting my document to UTF-8, using \uxxxx escapes, changing the meta tags from utf-8 to windows-1257), but nothing works.
I suppose the problem is in neither the HTML or JS documents nor the script. Can someone help me?
You need to encode the Unicode strings before passing them to the dialog, like so: text: unescape(encodeURIComponent(unicodeStr)).
There is an example here - http://blackberry-webworks.github.com/WebWorks-API-Docs/WebWorks-API-Docs-next-BB10/view/blackberry.invoke.html
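What unescape(encodeURIComponent(s)) does is turn each UTF-8 byte of s into one character in the range U+0000 to U+00FF. A Python sketch of the same transformation, for illustration only; to_utf8_byte_string is a hypothetical name, and the claim that the dialog API expects such a string comes from the answer above and is not verified here:

```python
def to_utf8_byte_string(text: str) -> str:
    """Return a string whose characters are the UTF-8 bytes of `text`."""
    # Encode to UTF-8, then map every byte to the code point of the same
    # value (that one-to-one mapping is exactly what Latin-1 decoding does).
    return text.encode("utf-8").decode("latin-1")


# U+017D (Z with caron, used in Lithuanian) is C5 BD in UTF-8.
print(to_utf8_byte_string("\u017d") == "\u00c5\u00bd")  # prints: True
```

ASCII-only strings pass through unchanged, so applying the idiom to every dialog string should be harmless for plain English text.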

Resources