Which encoding replaces 'é' by '\351'? - character-encoding

I'm trying to create a PDF from an FDF file, but I have a problem with some UTF-8 characters. I would like to replace them, but I don't know which encoding to look for. Which encoding replaces these UTF-8 characters as follows?
é by \351
č by \226
Thank you for all answers.
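
A quick way to reason about such escapes, as a rough sketch: the \351 form looks like a three-digit octal escape (the notation PDF string literals use), so it can be turned into a byte value and tried against a candidate single-byte encoding. The helper name `octal_escape_to_byte` is hypothetical. Octal 351 is 0xE9, which is é in ISO-8859-1 (and Windows-1252); the sketch does not confirm any encoding for the \226 example, it only prints the byte value.

```python
def octal_escape_to_byte(escape: str) -> int:
    """Convert an octal escape such as '\\351' to its byte value (hypothetical helper)."""
    return int(escape.lstrip("\\"), 8)

e_byte = octal_escape_to_byte(r"\351")
print(hex(e_byte))                          # byte value of \351
print(bytes([e_byte]).decode("latin-1"))    # character at that position in ISO-8859-1

# The second example from the question: only the byte value is computed here;
# which encoding (if any) maps it to 'č' is not verified by this sketch.
print(hex(octal_escape_to_byte(r"\226")))
```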

Related

Which codepage is 0x81 = ü, 0x94 = ö, 0x9A = Ü?

I've got a CSV file whose character encoding I can't identify. From its content (German-language entries) I could match the following bytes to characters, which suggests a 1-byte character encoding:
0x81 = ü
0x94 = ö
0x9A = Ü
Which code page is this? Is there a website where you can look up code pages by known entries?
I was assuming this could be WINDOWS-1252 or ISO-8859-1, but it's neither of them.
As I found out after some more trial and error, the encoding is "CP 437", also called "DOS". Weird to see such an encoding used nowadays.
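
The trial and error can be automated. The following is a minimal sketch, assuming Python's bundled codecs are a good enough candidate list: it tries every codec Python knows and keeps those that decode each known byte to the expected character. The function name `matching_codecs` is hypothetical. Several related DOS code pages share these positions, so the result can contain more than one name.

```python
import encodings.aliases

def matching_codecs(known):
    """Return names of Python codecs that decode every byte in `known`
    (a dict of byte value -> expected character) as expected."""
    names = sorted(set(encodings.aliases.aliases.values()))
    hits = []
    for name in names:
        try:
            if all(bytes([b]).decode(name) == ch for b, ch in known.items()):
                hits.append(name)
        except Exception:
            # Not a text codec, not available on this platform,
            # or the byte is invalid in this encoding: skip it.
            continue
    return hits

candidates = matching_codecs({0x81: "ü", 0x94: "ö", 0x9A: "Ü"})
print(candidates)
```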

What scheme is used to encode unicode characters in a .url shortcut?

What scheme is used to encode Unicode characters in a Windows URL shortcut?
For example, a new shortcut for url "http://Ψαℕ℧▶" produces a .url file with the text:
[{000214A0-0000-0000-C000-000000000046}]
Prop3=19,2
[InternetShortcut]
IDList=
URL=http://?aN??/
[InternetShortcut.A]
URL=http://?aN??/
[InternetShortcut.W]
URL=http://+A6gDsSEVIScltg-/
What is the algorithm to decode "+A6gDsSEVIScltg-" to "Ψαℕ℧▶"?
I am not asking for API code, but I would like to know the encoding scheme details.
Note: The encoding scheme is not UTF-8, UTF-16, or UCS-2, and it is not percent-encoding.
+A6gDsSEVIScltg- is the UTF-7 encoded form of Ψαℕ℧▶.
The correct way to process a .url file is to use the IUniformResourceLocator and IPropertyStorage interfaces from the CLSID_InternetShortcut COM object. See Internet Shortcuts on MSDN for details.
The answer (UTF-7) allowed me to successfully develop the URL conversion routine.
Let me summarize the steps:
To obtain the Unicode URL from the InternetShortcut.W entry found in a .url file:
. Pass ASCII characters through until CRLF, after making them internet-safe.
. An unescaped + character starts a UTF-7 encoded Unicode sequence:
. Collect 6-bit groups from the base64-coded ASCII characters.
. For each 16 bits collected, convert the 16-bit value to UTF-8 (1, 2, or 3 bytes).
. Emit the generated UTF-8 bytes as %hh.
. Continue until a "-" character occurs.
. The bit collector should then be zero.
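
The steps above can be sketched in Python roughly as follows. This is an illustration, not the poster's actual routine: the names `utf7_url_to_percent` and `B64` are hypothetical, the sketch treats "+-" as a literal plus sign (as UTF-7 defines it), and it decodes the collected 16-bit units as UTF-16BE so that surrogate pairs would also work, which goes slightly beyond the steps listed.

```python
from urllib.parse import quote

# Standard base64 alphabet; the index of a character is its 6-bit value.
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def utf7_url_to_percent(url: str) -> str:
    out = []
    i = 0
    while i < len(url):
        ch = url[i]
        if ch != "+":
            out.append(ch)              # plain ASCII passes through
            i += 1
            continue
        i += 1                          # skip the '+' that opens the sequence
        if i < len(url) and url[i] == "-":
            out.append("+")             # "+-" encodes a literal plus
            i += 1
            continue
        bits = 0                        # bit collector
        nbits = 0                       # number of valid bits in the collector
        units = bytearray()             # collected UTF-16BE code units
        while i < len(url) and url[i] in B64:
            bits = (bits << 6) | B64.index(url[i])
            nbits += 6
            if nbits >= 16:
                nbits -= 16
                units += ((bits >> nbits) & 0xFFFF).to_bytes(2, "big")
                bits &= (1 << nbits) - 1
            i += 1
        if i < len(url) and url[i] == "-":
            i += 1                      # '-' closes the sequence
        # Convert the 16-bit units to text, then to %hh-escaped UTF-8.
        out.append(quote(units.decode("utf-16-be"), safe=""))
    return "".join(out)

print(utf7_url_to_percent("http://+A6gDsSEVIScltg-/"))
```

Python also ships a "utf-7" codec, so `b"+A6gDsSEVIScltg-".decode("utf-7")` can serve as a cross-check for the hand-written decoder.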

Lua hex string to ASCII?

I want to convert a hex string to an ASCII character (for the game ROBLOX).
Here's the page for the ASCII icon:
http://www.fileformat.info/info/unicode/char/25ba/index.htm
Although I'm not even sure that Lua supports that icon.
EDIT:
Turns out ROBLOX doesn't support UTF-8 symbols at all due to their 'chat filtering'.
Strings in Lua are encoding-agnostic and you can just use the character in the string:
print"►"
Alternatively:
Output the Unicode code point directly with print"\u{25BA}" (Lua 5.3 and later).
Output the UTF-8 encoding with hexadecimal escapes: print"\xE2\x96\xBA".
Output the UTF-8 encoding with decimal escapes: print"\226\150\186".
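
To see where the escape values come from, here is a small Python sketch (Python is used only for illustration; the variable names are arbitrary). It encodes U+25BA as UTF-8 and prints the bytes in both hexadecimal and decimal, which correspond to the two escaped Lua forms above.

```python
code_point = 0x25BA                       # taken from the fileformat.info URL
utf8_bytes = chr(code_point).encode("utf-8")

print([hex(b) for b in utf8_bytes])       # values for the \xHH form
print(list(utf8_bytes))                   # values for the decimal \ddd form
```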

Convert Utf8 to Unicode

I have a text file that I need to convert to UTF-8. After converting, all the numbers in the file turn into question marks. For example, 1380 becomes four question marks: '????'.
I'm using Delphi 2009.
This is my code for converting:
RichEdit1.Lines.LoadFromFile(OpenDialog1.FileName,TEncoding.UTF8);
How can I correct this conversion?
You should use TEncoding.Unicode if your file is in UTF-16LE ("Unicode") format.
Or you should convert your file to UTF-8 before loading it into RichEdit.
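
If it is unclear which of the two cases applies, the file's byte order mark (when present) can settle it. The following Python sketch is only an illustration of that idea, not Delphi code; `decode_with_bom` is a hypothetical helper, and it assumes the file either starts with a BOM or is in the given default encoding.

```python
import codecs

# BOMs checked in order. Note: a UTF-32LE file also starts with the
# UTF-16LE BOM bytes, so this simple sketch would misdetect it.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def decode_with_bom(data: bytes, default: str = "utf-8") -> str:
    """Decode bytes using the encoding indicated by a leading BOM, if any."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(name)
    return data.decode(default)

sample = codecs.BOM_UTF16_LE + "1380".encode("utf-16-le")
print(decode_with_bom(sample))
```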

What encoding type is this text?

When I search Google in Thai, Google converts the query into something like this:
%E0%B8%A0%E0%B8%B2%E0%B8%A9%E0%B8%B2%E0%B9%84%E0%B8%97%E0%B8%A2
URL Encoding: See http://www.w3schools.com/tags/ref_urlencode.asp
It's a URL encoding in which all non-alphanumeric characters except -_. are replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type.
(Information copied from http://php.net/manual/en/function.urlencode.php)
UTF-8 + URL Encoding.
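
As a short illustration of "UTF-8 + URL encoding", this Python sketch reverses both layers with the standard library: `urllib.parse.unquote` turns each %hh back into a byte and decodes the bytes as UTF-8 (its default). The variable names are arbitrary.

```python
from urllib.parse import quote, unquote

encoded = "%E0%B8%A0%E0%B8%B2%E0%B8%A9%E0%B8%B2%E0%B9%84%E0%B8%97%E0%B8%A2"

decoded = unquote(encoded)        # percent-decoding, then UTF-8 decoding
print(decoded)

# Round trip: UTF-8 encode, then percent-encode each byte again.
print(quote(decoded) == encoded)
```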
