I have a legacy app that stores values in DB as:
0x010000004C41668D6C5C30B39BB81AB3757E7FD75121F6446802D6120A11103C3C563633C6758A110500AC9635DB46B4363CF5FDEC6BC82FB0237412
This looks like a hex-encoded string, but when I tried to decode it as UTF-8 (and a few other encodings), it didn't work. Does anybody know what encoding starts with 0x01000000, and how to decode it?
Please note that the stored value is an Arabic word.
Any help is appreciated.
Thank you.
It could possibly be Unicode text compressed with SCSU (the Standard Compression Scheme for Unicode) or BOCU-1 (Binary Ordered Compression for Unicode).
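For anyone who wants to poke at the raw bytes first, here is a minimal Python sketch (variable names are my own; note that Python's standard library ships no SCSU or BOCU-1 codec, so this only gets you as far as the decoded bytes):

    import binascii

    # The stored value from the question, with the leading "0x" removed.
    hex_value = (
        "010000004C41668D6C5C30B39BB81AB3757E7FD7"
        "5121F6446802D6120A11103C3C563633C6758A11"
        "0500AC9635DB46B4363CF5FDEC6BC82FB0237412"
    )

    raw = binascii.unhexlify(hex_value)
    print(len(raw))   # 60 bytes
    print(raw[:4])    # b'\x01\x00\x00\x00' -- possibly a version/format prefix

    # Obvious text encodings produce errors or mojibake, which suggests the
    # payload is compressed (or encrypted) rather than plainly encoded text.
    for codec in ("utf-8", "utf-16-le", "cp1256"):  # cp1256 = Windows Arabic
        try:
            print(codec, "->", raw[4:].decode(codec))
        except UnicodeDecodeError:
            print(codec, "-> decode failed")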
I sent a text file as an attachment from WhatsApp, and when I opened the sent file in the iPhone app I saw =EF=BB=BF at the start, which means it is a UTF-8 file with a BOM. My question is: why is each byte prefixed with a '=' character instead of 0x?
Also, all emoji come through in this style: =F0=9F=98=9D. How can I convert this into plain text in Objective-C? Any help is much appreciated.
Thanks
What you're seeing is quoted-printable encoding. This encoding scheme escapes non-printable ASCII or 8-bit characters as =XX, where XX is the hex value of the byte. Quoted-printable is mainly used in email transmission. See the question Objective-C decode quoted printable text for tips on decoding.
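The tip above is for Objective-C; for comparison, here is a minimal Python sketch of the same decoding step using the standard library's quopri module (the sample input reuses the sequences from the question):

    import quopri

    # Quoted-printable escapes each non-printable or 8-bit byte as =XX,
    # where XX is the byte's hex value ('=' is the escape character, not '0x').
    encoded = b"=EF=BB=BFHello =F0=9F=98=9D"

    decoded_bytes = quopri.decodestring(encoded)  # undo the =XX escapes
    text = decoded_bytes.decode("utf-8")          # the payload itself is UTF-8

    print(repr(text))  # '\ufeffHello 😝' -- the BOM plus the emoji, as real characters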
I'm using Rails 3.2.1 and I have been stuck on a problem for quite a while.
I'm using the Oracle enhanced adapter, and I have a RAW(16) (UUID) column. When I try to display the data, there are two situations:
1) I see weird symbols
2) I get "incompatible character encoding: ASCII-8BIT and UTF-8"
In my application.rb file I added
config.encoding = 'utf-8'
and in my view file I added
# encoding: utf-8
But so far nothing has worked.
I also tried adding html_safe, but that failed too.
How can I safely display my UUID data?
Thank you very much
Answer:
I used Ruby's unpack method to convert the binary with the template "H8H4H4H4H12" and in the end joined the array :-)
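For what it's worth, a rough Python equivalent of that Ruby unpack trick, assuming the column really holds a 16-byte UUID (the sample bytes here are made up):

    import uuid

    raw = bytes.fromhex("4C41668D6C5C30B39BB81AB3757E7FD7")  # stand-in RAW(16) value

    # Manual route, mirroring unpack("H8H4H4H4H12") plus a join in Ruby:
    h = raw.hex()
    manual = "-".join([h[:8], h[8:12], h[12:16], h[16:20], h[20:]])

    print(manual)                # 4c41668d-6c5c-30b3-9bb8-1ab3757e7fd7
    print(uuid.UUID(bytes=raw))  # the standard library gives the same result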
The RAW datatype is a string of bytes that can take any value. This includes binary data that doesn't translate to anything meaningful in ASCII or UTF-8 or in any character set.
You should really read Joel Spolsky's note about character sets and Unicode before continuing.
Now, since the data can't be translated reliably to a string, how can we display it? Usually we convert or encode it (see the sketch below). For instance:
we could use the hexadecimal representation, where each byte is converted to two [0-9A-F] characters (in Oracle, using the RAWTOHEX function). This is fine for displaying a small binary field such as RAW(16).
we could also use other encodings such as Base64, in Oracle via the UTL_ENCODE package.
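As a quick illustration of both options, here is a short Python sketch (plain Python standing in for Oracle's RAWTOHEX and UTL_ENCODE.BASE64_ENCODE):

    import base64
    import os

    raw = os.urandom(16)  # stand-in for a RAW(16) column value

    # Hexadecimal: each byte becomes two [0-9A-F] characters, like RAWTOHEX.
    print(raw.hex().upper())

    # Base64: four output characters per three input bytes, like UTL_ENCODE.
    print(base64.b64encode(raw).decode("ascii"))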
I am trying to convert a UTF-8 string into a UCS-2 string.
I need to get a string like "\uFF0D\uFF0D\u6211\u7684\u4E0A\u7F51\u4E3B\u9875".
I have been googling for about a month now, but I still haven't found a reference on converting UTF-8 to UCS-2.
Please, someone help me.
Thanks in advance.
EDIT: Okay, maybe my explanation was not good enough. Here is what I am trying to do.
I live in Korea, and I am trying to send an SMS message using CTMessageCenter. I tried to send Simplified Chinese characters through my app, but I got ???? instead of the proper characters. So I tried UTF-8 and UTF-16, both BE and LE, but they all returned ??. Finally, I found out that SMS in Korea uses the UCS-2 and EUC-KR encodings. Weird, isn't it?
Anyway, I tried to send a string like \u4E3B\u9875 and it worked.
So I need to convert the string into UCS-2 encoding first and then get the string literal from the encoded result.
Wikipedia:
The older UCS-2 (2-byte Universal Character Set) is a similar
character encoding that was superseded by UTF-16 in version 2.0 of the
Unicode standard in July 1996. It produces a fixed-length format
by simply using the code point as the 16-bit code unit and produces
exactly the same result as UTF-16 for 96.9% of all the code points in
the range 0-0xFFFF, including all characters that had been assigned a
value at that time.
IBM:
Since the UCS-2 standard is limited to 65,535 characters, and the data
processing industry needs over 94,000 characters, the UCS-2 standard
is in the process of being superseded by the Unicode UTF-16 standard.
However, because UTF-16 is a superset of the existing UCS-2 standard,
you can develop your applications using the system's existing UCS-2
support as long as your applications treat the UCS-2 as if it were
UTF-16.
unicode.org:
UCS-2 is obsolete terminology which refers to a Unicode
implementation up to Unicode 1.1, before surrogate code points and
UTF-16 were added to Version 2.0 of the standard. This term should now
be avoided.
UCS-2 does not define a distinct data format, because UTF-16 and UCS-2
are identical for purposes of data exchange. Both are 16-bit, and have
exactly the same code unit representation.
So, using the "UTF8toUnicode" transformation in most language libraries will produce UTF-16, which is essentially UCS-2 for characters in the BMP. And simply extracting the 16-bit code units from an Objective-C string will accomplish the same thing.
In other words, the solution has been staring you in the face all along.
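For instance, here is a minimal Python sketch of both steps: encoding to UTF-16 (which is byte-identical to UCS-2 for BMP characters) and rebuilding the \uXXXX string literal the question asks for:

    # The sample string from the question: －－我的上网主页
    text = "\uFF0D\uFF0D\u6211\u7684\u4E0A\u7F51\u4E3B\u9875"

    # UTF-16BE with no BOM: for BMP characters this is byte-identical to UCS-2.
    ucs2_bytes = text.encode("utf-16-be")
    print(ucs2_bytes.hex().upper())  # FF0DFF0D621176844E0A7F514E3B9875

    # Rebuilding the \uXXXX escape form from the code points:
    print("".join(f"\\u{ord(c):04X}" for c in text))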
UCS-2 is not a complete Unicode encoding: it covers only the Basic Multilingual Plane, while UTF-8 covers every code point.
It is therefore impossible to losslessly convert arbitrary UTF-8 into UCS-2 (the reverse direction, at least, always works).
UCS-2 is dead, ancient history. Let it rot in peace.
I need to know what characters SHA-1 will generate for me.
Is it possible to know the character set of SHA-1? Or, if it's configurable, what is its default character set?
Thank you.
SHA-1 doesn't generate text; it generates a binary hash (like most digests), so it doesn't have a charset (nor does it care about the input's charset, for that matter: it only sees bytes).
You can represent it as text (a string representation of the hex value and Base64 are popular) if you want, especially if you need to transfer it over the network or display it to users. That encoding is up to you.
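A short Python sketch of that distinction: the digest itself is raw bytes, and hex or Base64 are just presentation choices layered on top:

    import base64
    import hashlib

    digest = hashlib.sha1(b"hello").digest()  # 20 raw bytes, no charset involved

    print(len(digest))                        # 20 bytes = 160 bits
    print(digest.hex())                       # aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
    print(base64.b64encode(digest).decode())  # the same bytes as Base64 text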
I'm fairly sure it's just binary data rather than text in any character encoding. You could then encode that in Base64 if you like.
The SHA-1 hash algorithm takes a stream of bytes as input and calculates a 160-bit digest. Command-line versions output the digest as a hexadecimal string. No charsets involved.