Strange character before pound symbol in Titanium Studio - character-encoding

In Titanium Studio, I am storing a one-character value in an SQLite database (which uses UTF-8 encoding). When I store a pound symbol (£), it stores fine, but when I read it back, I get ¬£ instead. Strangely, the string length is still reported as 1, even though two characters are visible. The main problem is that this character forms part of a filename that gets sent to a Windows server. Within Titanium everything works despite the extra character, but when the filename reaches Windows, yet another strange character appears. I tried converting the character using Ti.Buffer, but when I decode, I still get the same characters back.
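var tipo_v = '';
var buf = Ti.createBuffer({length: 1024});
// Encode the string into the buffer
var l = Ti.Codec.encodeString({
    source: Vtipo_visita,
    dest: buf
});
// Trim the buffer to the encoded length
buf.length = l;
// Decode the bytes again, this time as ASCII
tipo_v = Ti.Codec.decodeString({
    source: buf,
    charset: Ti.Codec.CHARSET_ASCII
});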
The variable Vtipo_visita holds the ¬£ value. After the call to decodeString(), tipo_v has the value √Ǭ.
I also tried CHARSET_ISO_LATIN_1, but it made no difference. How can I get this character to display correctly, without the extra character in front?
As a final note, I found that simply doing
String.fromCharCode(163)
outputs the two characters in the Debugger, instead of just one. Thanks for any suggestions.
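One possible explanation for the symptoms above, offered as an assumption rather than a confirmed diagnosis: the string really is one character long, and the debugger is showing its UTF-8 bytes interpreted as Mac OS Roman. The following Python sketch (Python is used only for illustration; it is not Titanium code) reproduces both garbled values from the question under that assumption.

```python
# Assumption: the garbling comes from UTF-8 bytes being displayed as Mac OS Roman.
pound = "\u00a3"  # the pound sign, same code point as String.fromCharCode(163)

utf8_bytes = pound.encode("utf-8")             # two bytes: 0xC2 0xA3
as_mac_roman = utf8_bytes.decode("mac_roman")  # how those bytes look in Mac OS Roman

print(len(pound))     # length of the string itself
print(as_mac_roman)   # the two-character form seen in the debugger

# Decoding the UTF-8 bytes with a single-byte charset such as Latin-1 yields two
# real characters; displaying those the same wrong way garbles them further.
latin1_view = utf8_bytes.decode("latin-1")
double_garbled = latin1_view.encode("utf-8").decode("mac_roman")
print(double_garbled)
```

If this assumption holds, the stored value is already correct and only the display is misleading; the Windows filename problem would then need to be checked at the point where the bytes are sent.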

Related

How to replace these extended ascii codes?

I am opening .txt files, but when they are loaded in Xojo, weird characters like these (’ , â€ک) show up.
I've tried DefineEncoding and ConvertEncoding but it still doesn't seem to work.
output.text = output.text.DefineEncoding(Encodings.WindowsANSI)
output.text = output.text.ConvertEncoding(Encodings.UTF8)
You may have to define the encoding at load time rather than afterwards; otherwise the loaded text is already treated as UTF-8, and your posted code then garbles it. So pass the encoding to the Read function, or load the data as a binary file rather than as a text file.
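To illustrate the point about declaring the encoding at load time, here is a small Python sketch (not Xojo code; the sample text is made up). It assumes the file's bytes are UTF-8, which is what garbling of this kind usually indicates: decoding the raw bytes with the right encoding gives clean text, while decoding them as Windows-1252 produces the garbling.

```python
# Hypothetical file content: a curly apostrophe stored as UTF-8 bytes.
raw = "it\u2019s".encode("utf-8")

wrong = raw.decode("cp1252")   # bytes misread as Windows-1252
right = raw.decode("utf-8")    # encoding declared correctly when the bytes are read

print(wrong)
print(right)
```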

Ruby: how to convert a binary string from an SMSC back to text

My app works with an SMSC, and I need to process each SMS before it is sent.
I sent this string from a mobile phone:
"hello this is test"
When I check the SMSC, I get this as the binary string of my text:
userData = "c8329bfd06d1d1e939283d07d1cb733a"
The encoding of this string is:
<Encoding:ASCII-8BIT>
I know that this userData is probably a GSM-encoded binary string,
so how can I get the clear-text string back from userData?
This question is about English text, because for Hebrew I can get the
string back with this code:
[userData].pack('H*').force_encoding('utf-16be').encode('utf-8')
but for English I get an error:
Encoding::InvalidByteSequenceError: "\xDA\xF3" followed by "u" on UTF-16BE
I also tried detecting the encoding of the binary string with ICU, and I got
"ISO-8859-1" with the detected language 'PT', which is very strange because my languages are English and Hebrew.
Anyway, I got lost in the encoding details, so I tried encoding to every name in Encoding.list,
but without luck so far.
Thanks in advance,
Shmulik
OK,
For anyone who also has this issue: I got the solution, thanks to someone from the #ruby IRC community (I missed his nickname).
The solution is:
For ASCII characters packed into the binary string,
you need this:
"c8329bfd06d1d1e939283d07d1cb733a".scan(/../).reverse_each.map { |h| h.to_i(16) }.pack('C*').unpack('B*')[0][2..-1].scan(/.{7}/).map.with_object("") { |x, s| s << x.to_i(2) }.reverse
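Remember, I sent these words in the SMS:
"hello this is test"
and in binary they became:
"c8329bfd06d1d1e939283d07d1cb733a"
The reason I got garbage in every encoding is that the ASCII characters are sent in the GSM 7-bit alphabet: each character occupies only 7 bits and the characters are packed back to back, whereas every other encoding I tried uses at least 8 bits per character. The code above undoes that packing: it reverses the bytes, turns them into one bit string, drops the 2 leading padding bits, splits the rest into 7-bit groups, converts each group to a character, and reverses the result.
But this applies only to the ASCII character set.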
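As a cross-check of the 7-bit explanation, the same unpacking can be sketched in Python. This is an illustrative sketch, not the code from the IRC answer: the function name unpack_gsm7 is made up, the number of characters is assumed from the byte count (a real PDU carries the length separately), and mapping septets straight to ASCII is only valid where the GSM default alphabet coincides with ASCII, such as plain letters and spaces.

```python
def unpack_gsm7(hex_str: str) -> str:
    """Unpack GSM 7-bit packed user data (hypothetical helper, ASCII-range only)."""
    data = bytes.fromhex(hex_str)
    # Read all octets as one little-endian integer: septet i then occupies
    # bits 7*i .. 7*i+6, which is how the 7-bit packing lays characters out.
    value = int.from_bytes(data, "little")
    # Assumption: every complete group of 7 bits is a character.
    count = len(data) * 8 // 7
    septets = [(value >> (7 * i)) & 0x7F for i in range(count)]
    # Assumption: each septet maps directly to the ASCII character of the same code.
    return "".join(chr(s) for s in septets)


decoded_sms = unpack_gsm7("c8329bfd06d1d1e939283d07d1cb733a")
print(decoded_sms)
```

For the userData in this question the sketch yields "Hello this is test", with a capital H, because the first septet is 0x48.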
For other languages, such as the Hebrew I use, the SMS is sent as UCS-2,
so this code works for me:
[your_binary_string].pack('H*').force_encoding('utf-16be').encode('utf-8')
It is very important to put the binary string in an array.
That's all for now.
If anybody wants to translate and explain exactly what happens in the code for the ASCII character set, be my guest.
Shmulik

SQLITE UTF-16 Encoding Issues

OK, I've been pulling my hair out for a couple of days over this issue. There are a couple of technologies in use here: I'm using Unreal Engine 4 to develop an iOS game, and I'm linking to a static lib of sqlite3; I create the database for it on Windows.
On Windows everything works fine: I create the database, and PRAGMA encoding; shows UTF-16LE.
However, on iOS everything falls apart. First of all, if I even try to create an empty database on iOS using the sqlite3_open16 function, it creates a database with a bunch of junk at the end of the name, and if I open it and run PRAGMA encoding, it says UTF-8 (empty database with no tables).
If I try to connect to my existing one, I succeed 'randomly' sometimes. I think this again has to do with the weird characters that appear at the end of my string, which I suspect is an encoding issue.
The function being used to open the database is this:
bool Open(const TCHAR* ConnectionString)
{
int32 Result = sqlite3_open16(ConnectionString, &DbHandle);
return Result == SQLITE_OK;
}
This works fine on Windows but has the issues above on iOS.
According to their documentation they use UCS-2. From what I can tell in the SQLite source, it will use UTF-16LE. Do I need to do something to convert between these two? Or is there something else I might be missing here? Does anyone have any ideas? I'm hoping someone who might not be familiar with UE4 might still have some guesses.
Edit: a list of things I've tried:
Using SQLite's UTF-8 functions: these appear to work fine. UE4 has a function TCHAR_TO_UTF8, and that worked.
Trying to use Objective-C to ensure UTF-16LE encoding: this gave me the 'random' success I describe above. Besides only working at random, with the weird random text sometimes appearing at the end of the string, any data I pull out of the database now comes back as mostly question marks '????' with the occasional Chinese character. The function I used for this is:
const TCHAR* UChimeraSqlDatabase::UTF16_To_PlatformEncoding(FString UTF16EncodedString)
{
#if PLATFORM_IOS
const TCHAR* EncodedString = (const TCHAR *)([[[NSString stringWithFString : UTF16EncodedString] dataUsingEncoding:NSUTF16LittleEndianStringEncoding] bytes]);
#else
const TCHAR* EncodedString = *UTF16EncodedString;
#endif
return EncodedString;
}
I tried using Unreal's .AppendChar to add L'\0' to the end of the string, without the Objective-C conversion above: no success.
If you're seeing weird characters at the end of the file name when calling sqlite3_open16, it sounds like your UTF-16 file name was not NUL-terminated.
To specify the encoding of the database, you can actually create it with any of the sqlite3_open functions, but the key is that as soon as the database is created, you must immediately set the encoding:
PRAGMA encoding = "UTF-16le";
Once the encoding has been set, you can't change it, so make sure to do this first thing after creating the database.
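The ordering described above (set the encoding before anything else is created) can be sketched with Python's built-in sqlite3 module. This is only an illustration of the PRAGMA sequence, not UE4 or iOS code; the file name and table are hypothetical.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.db")  # hypothetical file name
    conn = sqlite3.connect(path)
    # Set the encoding first, before any table exists in the new database.
    conn.execute('PRAGMA encoding = "UTF-16le"')
    conn.execute("CREATE TABLE t (x TEXT)")  # hypothetical table
    conn.commit()
    # Read the encoding back to confirm what the database actually uses.
    encoding = conn.execute("PRAGMA encoding").fetchone()[0]
    conn.close()

print(encoding)
```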

Using MSXML2.ServerXMLHTTP to access data from a web page returns truncated data in Lua

I am trying to download a source code file from a web site which works fine for small files, but a couple of larger ones get truncated.
The example below should be returning a file 146,135 bytes in size, but returns one of 141,194 bytes with a status of 200.
I have tried winhttp.winhttprequest.5.1 as well, but both seem to truncate at the same point.
I have also found quite a few people with similar problems, but have not been able to find a solution.
require('luacom')
http = luacom.CreateObject('MSXML2.ServerXMLHTTP')
http:Open("GET","http://www.family-historian.co.uk/wp-content/plugins/forced-download2/download.php?path=/wp-content/uploads/formidable/tatewise/&file=Map-Life-Facts3.fh_lua&id=190",true)
http:Send()
http:WaitForResponse(30)
print('Status: '..http.Status)
print('----------------------------------------------------------------')
headers = http:GetAllResponseHeaders()
data = http.Responsetext
print('Data Size = '..#data)
print('----------------------------------------------------------------')
print(headers)
I finally worked out what was going on, so I will post it here for others.
To avoid the truncation I needed to use ResponseBody instead of ResponseText. What appears to be happening is that the file is sent in binary format. The ResponseText data is the same number of bytes as the ResponseBody data but is in UTF-8 format, which means that as many characters as there are special characters in the file (which are double-byte in UTF-8) are dropped from the end of ResponseText. I am not sure at what level the "mistake" in the length is made, but the way to avoid it is to use ResponseBody.
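The size mismatch described above is easier to picture with a small Python sketch. It does not reproduce MSXML's behaviour; it only shows, on a made-up payload, that the same data has a larger length when counted in bytes than when counted in decoded characters, which is the kind of gap a length computed on the text form can introduce.

```python
# Hypothetical payload containing two characters that need two bytes each in UTF-8.
body = "caf\u00e9 \u00a35".encode("utf-8")  # raw bytes, comparable to ResponseBody
text = body.decode("utf-8")                 # decoded text, comparable to ResponseText

print(len(body))  # length in bytes
print(len(text))  # length in characters
```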

Patched Delphi library for unicode support in TPageProducer callbacks?

I've been using Delphi 2009 with the Indy library (10) that ships and have been upgrading a legacy application that makes heavy use of the TPageProducer. The legacy app was originally written for Delphi 5 / Indy 8.
I'm using the OnHTMLTag property of TPageProducer to specify a function that will handle the HTML transparent tags in my source. My problem was that if I put unicode (Simplified Chinese) characters in the TPageProducer.HTMLDoc property, when the OnHTMLTag callback was called, the TagParams argument contains ?? instead of the expected Chinese characters.
I traced this down to around line 2053 of HTTPApp.pas where we separate out the key / value pairs of the transparent tag:
procedure ExtractHeaderFields(Separators, WhiteSpace: TSysCharSet; Content: PChar;
Strings: TStrings; Decode: Boolean; StripQuotes: Boolean = False);
...
if Decode then
Strings.Add(string(HTTPDecode(AnsiString(DoStripQuotes(ExtractedField)))))
else
Strings.Add(DoStripQuotes(ExtractedField));
...
Everything is fine until we cast the string to an AnsiString and pass it to HTTPDecode, at which point my Strings list contains ?? as does my final TagParams and webpage.
Should there be a version of HTTPDecode that works with Strings instead of AnsiStrings? If so, where might I find this?
For now, I've just disabled the decode routine when I parse my tokens for the TPageProducer, but it isn't a nice fix and would prefer to have a version of this that works with wide characters (if that is even possible).
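The loss described above can be illustrated outside Delphi. The Python sketch below is only an analogy, not the HTTPApp code: it assumes a single-byte code page (Windows-1252 here) stands in for the AnsiString cast, and it uses Python's urllib.parse.unquote as an example of percent-decoding that works on Unicode text.

```python
from urllib.parse import unquote

chinese = "\u4e2d\u6587"  # two Simplified Chinese characters

# Forcing the text into a single-byte code page cannot represent the characters,
# which is analogous to casting to AnsiString before decoding.
lossy = chinese.encode("cp1252", errors="replace").decode("cp1252")

# Percent-decoding that operates on Unicode keeps the characters intact.
decoded = unquote("%E4%B8%AD%E6%96%87")

print(lossy)
print(decoded)
```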

Resources