OK, I've been pulling my hair out for a couple of days on this issue. There are a couple of technologies in use here: I'm using Unreal Engine 4 to develop an iOS game, and I'm linking against a static lib of sqlite3 for a database that I create on Windows.
On Windows everything works fine: I create the database, and if you run PRAGMA encoding; it shows UTF-16LE.
However, on iOS everything falls apart. First of all, if I even try to create an empty database on iOS using the sqlite3_open16 function, it creates a database with a bunch of junk at the end of the name, and if I open it and run PRAGMA encoding, it says UTF-8 (an empty database with no tables).
If I try to connect to my existing database, I only succeed 'randomly' sometimes. I think this again has to do with the weird characters appearing at the end of my string, which I suspect is an encoding issue.
The function being used to open the database is this:
bool Open(const TCHAR* ConnectionString)
{
int32 Result = sqlite3_open16(ConnectionString, &DbHandle);
return Result == SQLITE_OK;
}
This works fine on Windows but has the issues above on iOS.
According to their documentation they use UCS-2. From what I can tell from the SQLite source, it will use UTF-16LE. Do I need to do something to convert between these two? Or is there something else I might be missing here? Does anyone have any ideas? I'm hoping that even someone who isn't familiar with UE4 might still have some guesses.
Edit: a list of things I've tried:
1. Use the UTF-8 SQLite functions; these appear to work fine. UE4 has a TCHAR_TO_UTF8 macro, and that worked (see the sketch after this list).
2. Try to use Objective-C to ensure the encoding is UTF-16LE. This gave me the 'random' success I describe above: besides only working randomly, with the weird random text at the end of the string sometimes, any data I pull out of the database now comes back as mostly question marks '????' with the occasional Chinese character. The function I used to do this is:
const TCHAR* UChimeraSqlDatabase::UTF16_To_PlatformEncoding(FString UTF16EncodedString)
{
#if PLATFORM_IOS
const TCHAR* EncodedString = (const TCHAR *)([[[NSString stringWithFString : UTF16EncodedString] dataUsingEncoding:NSUTF16LittleEndianStringEncoding] bytes]);
#else
const TCHAR* EncodedString = *UTF16EncodedString;
#endif
return EncodedString;
}
3. Tried using Unreal's .AppendChar to add L'\0' to the end of the string, without including method 2; no success.
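For reference, the UTF-8 path from item 1 looked roughly like this (a sketch only; OpenUtf8 is just a name for this variant, and DbHandle is the same member used in Open above):
bool OpenUtf8(const TCHAR* ConnectionString)
{
    // TCHAR_TO_UTF8 converts the engine string to a temporary UTF-8 buffer;
    // sqlite3_open copies the file name, so the temporary's lifetime is enough here.
    int32 Result = sqlite3_open(TCHAR_TO_UTF8(ConnectionString), &DbHandle);
    return Result == SQLITE_OK;
}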
If you're seeing weird characters at the end of the file name when calling sqlite3_open16, it sounds like your UTF-16 file name was not NULL-terminated.
To specify the encoding of the database, you can actually create it with any of the sqlite3_open functions, but the key is that as soon as the database is created, you must immediately set the encoding:
PRAGMA encoding = "UTF-16le";
Once the encoding has been set, you can't change it, so make sure to do this first thing after creating the database.
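A minimal sketch of that order of operations, using the plain sqlite3 C API (CreateUtf16Database is just a name picked for the example; error handling trimmed):
#include <sqlite3.h>

// Open (or create) the database with a NUL-terminated UTF-16 file name,
// then set the encoding immediately, while the database is still empty.
bool CreateUtf16Database(const void* Utf16FileName, sqlite3** DbHandle)
{
    // sqlite3_open16 expects the file name as NUL-terminated UTF-16
    // in the machine's native byte order.
    if (sqlite3_open16(Utf16FileName, DbHandle) != SQLITE_OK)
        return false;

    // The encoding can only be chosen before any content has been written.
    return sqlite3_exec(*DbHandle, "PRAGMA encoding = \"UTF-16le\";",
                        nullptr, nullptr, nullptr) == SQLITE_OK;
}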
I am updating archaic code that creates memos. The code was written to use bookmarks inside manually created template.doc files that Aspose can write to. The problem comes from this chunk of code.
foreach (Addressee infoAddressee in ConfigManager.GetConfig().Addressees)
{
if (infoAddressee.Abbreviation == Memo.AddresseeAbbr.ToUpper() &&
infoAddressee.NeedsThisLine)
{
WriteMeString = "FOO BARR ________";
break;
}
}
if (WriteMeString != "")
{
builder.MoveToBookmark("BOOKMARK");
builder.Write(WriteMeString);
}
}
This works for me, but for the two people who have tested this chunk of code, the "FOO BARR _______" line appears as "FOO BARR ".
The seven underscores are replaced with spaces (the spacing exists in the Word doc, but Stack Overflow collapses consecutive spaces). I am not sure what could cause this.
To test, we need to copy the file from the remote dev environment into our local environment; I believe this to be the source of the issue, but I do not know for sure.
What I have already tried:
The testers and I are supplying the exact same input for the document.
The testers and I had slightly different ways of saving the document and copying it over to the local environment, but doing it my way did not change anything.
I am unsure what could do this for some users but not for others. Any suggestions for things I could check out, be it literature on the subject or proposed solutions, would be greatly appreciated.
I checked the scenario on my side and cannot reproduce the problem. Underscores are properly displayed in the output document. Here are a few things to try.
Try setting bookmark text instead of moving to it and writing text.
doc.Range.Bookmarks["BOOKMARK"].Text = WriteMeString;
Try checking whether the string is written correctly into the document.
builder.MoveToBookmark("BOOKMARK");
builder.Write("FOO BARR ________");
Assert.AreEqual("FOO BARR ________", builder.Document.Range.Bookmarks["BOOKMARK"].Text);
using (MemoryStream ms = new MemoryStream())
{
builder.Document.Save(ms, SaveFormat.Doc);
ms.Position = 0;
Document tempDoc = new Document(ms);
Assert.AreEqual("FOO BARR ________", tempDoc.Range.Bookmarks["BOOKMARK"].Text);
}
Compare the documents produced on your side and on the testers' side yourself (I suppose you have already done this, but just in case). Probably the documents are correct, but there is a difference in the viewers used on your side and the testers' side.
Disclosure: I work on the Aspose.Words team.
I am opening .txt files, but when they are loaded in Xojo, weird characters like these (’, â€ک) show up.
I've tried DefineEncoding and ConvertEncoding but it still doesn't seem to work.
output.text = output.text.DefineEncoding(Encodings.WindowsANSI)
output.text = output.text.ConvertEncoding(Encodings.UTF8)
You may have to define the encoding at load time, not afterwards, or you'll get UTF-8 characters from loading that you then mess up with your posted code. So pass the encoding to the Read function, or load the data as a binary file rather than as a text file.
My app works with an SMSC, and I need to get at the SMS before it is sent.
I tried to send this string from the mobile:
"hello this is test"
And when I check the SMSC, I get this as the binary string of my text:
userData = "c8329bfd06d1d1e939283d07d1cb733a"
The encoding of this string is:
<Encoding:ASCII-8BIT>
I know that this userData is probably GSM-encoded in a binary string.
So how can I get the clear text string back from userData?
This question is about English, because in Hebrew I can get the string back with this code:
[userData].pack('H*').force_encoding('utf-16be').encode('utf-8')
But in English I get this error:
Encoding::InvalidByteSequenceError: "\xDA\xF3" followed by "u" on UTF-16BE
What I tried is to detect the encoding of the binary string with ICU, and I got
"ISO-8859-1", with the detected language being 'PT', which is very strange because my languages are English and Hebrew.
Anyway, I got lost with all the encoding stuff, so I tried encoding to each name in the list from Encoding.list,
but without luck so far.
Thanks in advance,
Shmulik
OK,
For anyone else who has this issue, I got the solution, thanks to someone from the #ruby IRC community (I missed his nickname).
The solution is:
For ASCII characters packed into the binary string, you need this:
"c8329bfd06d1d1e939283d07d1cb733a".scan(/../).reverse_each.map { |h| h.to_i(16) }.pack('C*').unpack('B*')[0][2..-1].scan(/.{7}/).map.with_object("") { |x, s| s << x.to_i(2) }.reverse
Remember, I sent these words in the SMS:
"hello this is test"
And in binary it became:
"c8329bfd06d1d1e939283d07d1cb733a"
The reason I got garbage with every encoding is that the ASCII characters are 7-bit GSM, so only the first 7 bits represent the data, while every other encoding uses at least 8 bits; that is what the code undoes.
But this is just for the ASCII character set.
In another language, like the Hebrew I use, the SMS is sent as UCS-2.
So this code works for me:
[your_binary_string].pack('H*').force_encoding('utf-16be').encode('utf-8')
It is very important to put the binary string in an array.
So that's all for now.
If anybody wants to translate and explain what exactly happens in the code for the ASCII character set, be my guest (a rough sketch follows below).
Shmulik
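For anyone who wants the ASCII branch spelled out, here is a rough sketch of the same GSM 03.38 7-bit unpacking, written in C++ rather than Ruby purely to make the bit shifting explicit (UnpackGsm7 is an invented name; septet padding and the GSM extension table are ignored):
#include <cstdint>
#include <string>
#include <vector>

// Unpack GSM 7-bit user data given as a hex string, e.g. the
// "c8329bfd06d1d1e939283d07d1cb733a" above decodes back to the sent text.
std::string UnpackGsm7(const std::string& Hex)
{
    // Convert the hex string into raw octets.
    std::vector<uint8_t> Octets;
    for (size_t i = 0; i + 1 < Hex.size(); i += 2)
        Octets.push_back(static_cast<uint8_t>(std::stoi(Hex.substr(i, 2), nullptr, 16)));

    std::string Out;
    int Carry = 0;      // bits left over from the previous octet
    int CarryBits = 0;  // how many bits Carry currently holds
    for (uint8_t Octet : Octets)
    {
        // The low (7 - CarryBits) bits of this octet, together with the
        // carried bits, form the next 7-bit character.
        Out.push_back(static_cast<char>(((Octet << CarryBits) | Carry) & 0x7F));
        Carry = Octet >> (7 - CarryBits);
        if (++CarryBits == 7)
        {
            // Every seventh octet, a full character accumulates in the carry alone.
            Out.push_back(static_cast<char>(Carry & 0x7F));
            Carry = 0;
            CarryBits = 0;
        }
    }
    return Out;
}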
In Titanium Studio, I am storing a one-character value in an SQLite database (which uses UTF-8 encoding). When I store a pound symbol (£), it stores fine, but when I read it back, I get ¬£ instead. Strangely enough, the string length still reports as 1, in spite of two characters being visible. The main problem is that this character forms part of a filename that gets sent to a Windows server. So, while everything works in Titanium despite the extra character, when the filename gets sent to Windows we get another strange character. I tried converting the character using Ti.Buffer, but when I decode, I still get the same characters back.
var tipo_v='';
var buf = Ti.createBuffer({length:1024});
var l = Ti.Codec.encodeString({
source: Vtipo_visita,
dest: buf,
});
buf.length= l;
tipo_v = Ti.Codec.decodeString({
source: buf,
charset: Ti.Codec.CHARSET_ASCII
});
The variable Vtipo_visita has the ¬£ value. After the call to decodeString(), tipo_v has the value √Ǭ.
I also tried using CHARSET_ISO_LATIN_1, but it didn't make any difference. How can I get this character to display correctly, without the extra character in front?
As a final note, I found that simply doing
String.fromCharCode(163)
outputs the two characters in the Debugger, instead of just one. Thanks for any suggestions.
I have downloaded and installed KaZip 2.0 on C++Builder 2009 (with little minor changes => only set type String to AnsiString). I have written:
KAZip1->FileName = "test.zip";
KAZip1->CreateZip("test.zip");
KAZip1->Active = true;
KAZip1->Entries->AddFile("pack\\text.txt","xxx.txt");
KAZip1->Active = false;
KAZip1->Close();
Now it creates a test.zip with xxx.txt included (59 bytes original, 21 bytes packed). I open the archive in WinRAR successfully and want to open the xxx.txt, but WinRAR says the file is corrupt. :(
What is wrong? Can somebody help me?
Extracting doesn't work either; is that because the file is corrupt?
KAZip1->FileName = "test.zip";
KAZip1->Active = true;
KAZip1->Entries->ExtractToFile("xxx.txt","zzz.txt");
KAZip1->Active = false;
KAZip1->Close();
with little minor changes => only set
type String to AnsiString
Use RawByteString instead of AnsiString.
I have no idea how KaZip 2.0 is implemented, but in general, to make a Delphi/C++ library that was designed without Unicode support in mind work properly, you need to do two things:
1. Replace all Char with AnsiChar and all String with AnsiString.
2. Replace all Win API calls with their Ansi variants, e.g. replace AWin32Function with AWin32FunctionA (a sketch follows below).
In Delphi < 2009, Char = AnsiChar, String = AnsiString, AWin32Function = AWin32FunctionA, but in Delphi >= 2009, by default, Char = WideChar, String = UnicodeString, AWin32Function = AWin32FunctionW.
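As a sketch of point 2 with the Win32 API used directly (GetWindowTitle is just an illustrative name, not part of KaZip):
#include <vcl.h>
#include <windows.h>

// In a build where UNICODE is defined, the "neutral" name GetWindowText
// resolves to GetWindowTextW, which does not match a char buffer, so the
// pre-Unicode code has to call the Ansi variant by name.
AnsiString GetWindowTitle(HWND Wnd)
{
    char Buffer[256] = {0};
    GetWindowTextA(Wnd, Buffer, sizeof(Buffer));
    return AnsiString(Buffer);
}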
WinRAR could simply be failing to recognize the header. Try opening it in Windows itself or in some other zip program.
with little minor changes => only set
type String to AnsiString
That doesn't always work right; it may compile, but that doesn't mean it will work correctly in D2009 or CB2009. You need to show the places where you convert String to AnsiString, especially the code dealing with buffers, streams, and I/O.
It's not surprising that your code is wrong; KaZip has no documentation.
Proper code is:
//Create a new empty zip file
KAZip1->CreateZip("test.zip");
//Open our newly created zip file so we can add files to it
KAZip1->Open("test.zip");
//Compress text.txt into xxx.txt
KAZip1->Entries->AddFile("pack\\text.txt","xxx.txt");
//Close the file stream
KAZip1->Close();