I am assigning a string to a custom type I have declared, which I read and write through the TTreeView's Node.Data property. I read from and write to the node like this:
Read: RichEdit1.Lines.Text := TMyData(TreeView1.Selected.Data).MyString;
Write: TMyData(TreeView1.Selected.Data).MyString := RichEdit1.Lines.Text;
This works perfectly for plain strings, but I want to store rich formatted text in the string without losing the formatting. I managed to do this by using streams on the RichEdit, but because I save my database with the freeware Zeos Lib (SQL), I get Unknown Token errors (likely from the RTF tags). How can I save without the errors?
UPDATE
I now have it saving correctly without errors, using Base64 encoding/decoding as suggested by Sylverdrag. Encoding the strings removes the characters that caused the errors.
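To illustrate why Base64 helps here, the following is a minimal sketch in Python rather than Delphi. The helper names encode_rtf and decode_rtf and the sample RTF string are made up for the illustration. It only shows that the encoded form contains no braces, backslashes or quotes and that the round trip is lossless.

```python
import base64

def encode_rtf(rtf_text: str) -> str:
    """Return an ASCII-only Base64 string (letters, digits, '+', '/', '=')."""
    return base64.b64encode(rtf_text.encode("utf-8")).decode("ascii")

def decode_rtf(stored: str) -> str:
    """Reverse encode_rtf and return the original RTF markup."""
    return base64.b64decode(stored).decode("utf-8")

# Hypothetical RTF fragment with the kind of characters that upset a SQL parser.
sample_rtf = r"{\rtf1\ansi{\b Bold} 'quoted' text\par}"

stored = encode_rtf(sample_rtf)    # value to write to the database
restored = decode_rtf(stored)      # value to load back into the editor

print(stored)
print(restored == sample_rtf)
```

The trade-off is that the stored value is roughly a third larger and can no longer be searched as text inside the database.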
Check out http://delphi.about.com/od/adptips2003/a/bltip1203_5.htm
(My original answer was for C# - misread your question)
I'm currently writing a PHP extension in C++ with the Zend API. Basically, I write PHP_METHOD{..} wrappers around my native C++ interface methods and use "zend_parse_parameters(..)" to fetch the corresponding input arguments.
This extension contains methods which can take strings as arguments, such as a filename.
I know from http://php.net/manual/en/language.types.string.php#language.types.string.details that strings have no encoding in PHP. Still, can I expect the PHP programmer to use a function like "utf8_decode(..)" so that the extension can read the input strings correctly?
Or does the PHP programmer expect the extension to detect the encoding from the PHP script and handle strings accordingly?
Any help is highly appreciated! Thanks!
You are correct. Strings are just binary blobs in PHP. As the author of an extension, your options are:
Have the user hand your extension UTF-8: By far the best option. The user has to make the decision. Assert that the string is valid UTF-8 and fail early.
Encode yourself: You cannot know the meaning of the string. Because PHP strings are just binary blobs with no encoding information, you do not know what the intended content is. It might come from a Windows file with an unusual encoding and have been concatenated with a string in a completely different encoding. Worse, it might be decodable as UTF-8 without actually being UTF-8, in which case you interpret it wrongly without the user knowing. Hence solution 1: have the user pass UTF-8.
Alternative: Force the user to pass an input encoding.
Here is an example of alternative 3:
$obj = new MyExtensionClass('UTF-8'); // force encoding
$obj->someMethod($inputStr); // try to convert now
The standard library uses approach 1. See json_encode as an example: it expects UTF-8 strings and fails on input that is not valid UTF-8.
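The difference between option 1 and option 3 can be sketched outside PHP as well. The following Python sketch uses made-up function names (require_utf8, decode_with) and a made-up file name. It is not the Zend API, only an illustration of "validate and fail early" versus "caller declares the encoding".

```python
def require_utf8(raw: bytes) -> str:
    """Option 1: accept only valid UTF-8 and fail early otherwise."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("input must be valid UTF-8") from exc

def decode_with(raw: bytes, input_encoding: str) -> str:
    """Option 3: the caller states the encoding of the bytes explicitly."""
    return raw.decode(input_encoding)

# The same hypothetical file name in two encodings.
utf8_name = "café.txt".encode("utf-8")
latin1_name = "café.txt".encode("latin-1")

print(require_utf8(utf8_name))               # accepted
print(decode_with(latin1_name, "latin-1"))   # caller told us the encoding
try:
    require_utf8(latin1_name)                # rejected instead of guessed
except ValueError as err:
    print("rejected:", err)
```

Rejecting the Latin-1 bytes is the point of option 1: the extension never has to guess, and the mistake surfaces at the call site.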
With IdFTP, the server I'm connecting to is not using UTF-8, but ANSI. There's nothing special about my code: I simply set Host, Username and Password and connect to the server. Then I call the List method with no parameters. Iterating through DirectoryListing gives me incorrect file names. My sample directory name, encoded in the local code page (CP-1250), is:
aąęsśćńółżźz
I thought I would be able to "fix" the file name field by converting it to AnsiString and setting the code page, but it seems to be broken already. Memory dump of DirectoryListing[I].FileName:
a ? ? s ? ? ? ?? ?? z
6100 FDFF FDFF 7300 FDFF FDFF FDFF 8FDB DFDF 7A00
Changing GIdDefaultAnsiEncoding or IOHandler.DefStringEncoding (after Connect, before List) makes no difference. I don't want to modify the IdFTP or IdGlobal code because I'm using it in other projects that involve Unicode, and those work perfectly. Delphi XE2 or XE7.
As you can see, FData contains the raw file name in a 2-bytes-per-char string:
That is the case even if I set IOHandler.DefStringEncoding to any TIdTextEncoding that has FIsSingleByte = True and FMaxCharSize = 1. It does look promising, because #$009F is "ź" in CP-1250, but I'm not looking for a per-server, temporary solution. I expected Indy to handle this correctly after setting IOHandler.DefStringEncoding and GIdDefaultAnsiEncoding based on server capabilities (UTF-8 or ANSI with a specified encoding).
Total Commander connection log:
Your server supports the MLSD command. Total Commander is sending the MLSD command and not the older LIST command. This is good, because MLSD has a standardized format (see RFC 3659), which includes support for embedded charset information. If no charset is explicitly stated, UTF-8 must be used.
You did not show the command/response log for TIdFTP, but the fact that the TIdFTPListItem.Data property is showing MLSD formatted output data means TIdFTP.List() is also using the MLSD command (by calling TIdFTP.ExtListDir() internally). The output shown does not include an explicit charset attribute, so TIdFTP will decode the filename as UTF-8.
However, the raw filename data that is shown in the TIdFTPListItem.Data property is NOT the correct UTF-8 encoded form of the directory name you have shown (even when stored as a raw 8-bit encoded UnicodeString - which is what TIdFTP.ExtListDir() does internally before parsing it). So the problem is either:
your FTP server is not converting the directory name from CP-1250 to UTF-8 correctly in the first place. Considering that Total Commander appears to be able to handle the listing correctly, this is not likely.
TIdFTP is not storing the raw UTF-8 octet data correctly before parsing it. This is more likely.
Hard to say which is actually the case since you did not show the raw listing data that is actually being transmitted. And you did not specify which exact version of Delphi and Indy you are using, either. Assuming the server is transmitting UTF-8 correctly, you might simply be using an older Indy version that does not handle the UTF-8 transmission correctly. AFAIK, the current version available (10.6.2.5270 at the time of this writing) should be able to handle it, as long as you are using Delphi 2009 or later. If you can provide a Wireshark capture of the raw listing data, I can check if there are any logic issues in TIdFTP that need to be fixed or not.
My team was looking for a quick solution that I had to provide. My solution is based on this post: http://forums2.atozed.com/viewtopic.php?p=32301#p32301 and this question: Converting UnicodeString to AnsiString
Once the FTP listing is finished, I overwrite the FileName property using a function that extracts the file name from Data and then converts the String to a RawByteString with the correct code page. The fix is applied only if the server doesn't support UTF-8. This way I'm able to move around the FTP server (ChangeDir, Get, Put, etc.) without problems.
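The idea behind this workaround can be sketched in Python instead of Delphi. The sketch rests on two assumptions that the thread does not confirm: that the server sends the name as raw CP-1250 bytes, and that the broken FileName comes from decoding those bytes as UTF-8. Under those assumptions it reproduces the memory dump from the question, and it shows that a one-byte-per-character copy (like the Data field) can still be re-decoded with the right code page.

```python
name = "aąęsśćńółżźz"
wire_bytes = name.encode("cp1250")   # assumed bytes on the wire

# Decoding CP-1250 bytes as UTF-8 replaces most characters with U+FFFD.
broken = wire_bytes.decode("utf-8", errors="replace")
broken_dump = broken.encode("utf-16-le").hex(" ", 2).upper()

# A string that keeps one byte per character loses nothing, so it can be
# turned back into bytes and decoded with the correct code page afterwards.
raw_as_chars = wire_bytes.decode("latin-1")
repaired = raw_as_chars.encode("latin-1").decode("cp1250")

print(broken_dump)
print(repaired)
```

The repair works only from the byte-preserving copy. Once the replacement characters are in FileName, the original bytes are gone.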
I have the following XML feeds that I would like to read:
chinese xml - https://news.google.com/news/popular?ned=cn&topic=po&output=rss
korean xml - http://www.voanews.com/templates/Articles.rss?sectionPath=/korean/news
Currently, I use LuaXML to parse the XML, which contains Chinese characters. However, when I print the result to the console, the Chinese characters are not printed correctly and show up as garbage characters.
Is there any way to parse Chinese or Korean characters into a Lua table?
I don't think Lua is the issue here. The raw data the remote site sends is encoded using UTF-8, and Lua does no special interpretation of that—which means it should be preserved perfectly if you just (1) read from the remote site, and (2) save the read data to a file. The data in the file will contain CJK characters encoded in UTF-8, just like the remote site sent back.
If you're getting funny results like you mention, the fault probably lies either with the library you're using to read from the remote site, or perhaps simply with the way your console displays the results when you output to it.
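Both points can be checked with a small experiment. The sketch below is in Python rather than Lua and uses an illustrative two-character sample, not data from the feeds above. It writes UTF-8 bytes to a file and reads them back unchanged, then shows that garbage appears only when the same bytes are interpreted with a single-byte codec, which is what a misconfigured console or library effectively does.

```python
import os
import tempfile

text = "中美"                        # illustrative sample
utf8_bytes = text.encode("utf-8")    # what a UTF-8 server would send

# Writing and reading the bytes untouched preserves the characters.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "feed.xml")
    with open(path, "wb") as f:
        f.write(utf8_bytes)
    with open(path, "rb") as f:
        round_tripped = f.read().decode("utf-8")

# Interpreting the same bytes as Latin-1 yields one junk character per byte.
misread = utf8_bytes.decode("latin-1")

print(round_tripped == text)
print(repr(misread))
```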
I managed to convert the "ä¸ç¾" into Chinese characters.
It takes one additional step: converting the whole series of strings with the method from this link, http://forum.luahub.com/index.php?topic=3617.msg8595#msg8595, before saving into XML format.
string.gsub(l,"&#([0-9]+);", function(c) return string.char(tonumber(c)) end)
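For comparison, here is a sketch of the same conversion in Python. It assumes, as the gsub call above implies, that each numeric entity holds a single byte value (0 to 255) of a UTF-8 sequence rather than a Unicode code point. The function name decode_byte_entities and the sample line are made up for the illustration.

```python
import re

ENTITY = re.compile(r"&#([0-9]+);")

def decode_byte_entities(line: str) -> str:
    """Turn entities that carry UTF-8 byte values into real characters."""
    # Step 1: replace each entity with the character of the same number,
    # which mirrors string.char(tonumber(c)) for values below 256.
    one_char_per_byte = ENTITY.sub(lambda m: chr(int(m.group(1))), line)
    # Step 2: map those characters back to bytes and decode them as UTF-8.
    # Values above 255 raise UnicodeEncodeError here.
    return one_char_per_byte.encode("latin-1").decode("utf-8")

sample = "title: &#228;&#184;&#173;&#231;&#190;&#142;"
decoded = decode_byte_entities(sample)
print(decoded)
```

In Lua the second step is implicit, because Lua strings are byte strings and string.char already emits the raw byte.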
I would also like to ask about LuaXML: I have come across the method xml.registerCode(decoded,encoded).
The description of that method says:
registers a custom code for the conversion between non-standard characters and XML character entities
What do they mean by non-standard characters and how do I use it?
If I have the following data, what is the best option in terms of database storage?
Here is text&lt;br&gt;&lt;br&gt;Here is some more text
I see that I have 3 options:
Store in DB as it is then decode at runtime: &lt;p&gt;hello&lt;/p&gt;
Decode and then store in DB: <p>hello</p>
Strip tags completely: Hello
Are there any big "no-nos" with any of the above? I'm just looking for some advice on best practice. It is also worth noting that I will have absolutely no control over the data that I receive.
Depending on your requirement, I suggest either stripping the tags or storing the unencoded version.
If you don't need the tags, then you can strip them and store the plain text.
If you need to preserve the tags and the formatting, then it's easier to save the unencoded version. Dealing with real tags is much simpler.
Also, encoding the output is the view's responsibility. In fact, it depends strictly on where you are going to print the string.
In the console, for example, tags don't create any issue. It only matters when you need to print the string into an HTML view. Fortunately, Rails takes care of output sanitization for you, so you don't need to store the sanitized version in the database.
Convert the data to canonical form, and store that. That is, you should store <p>Hello</p> or Here is text<br><br>Here is some more text (though I doubt that's the decoding you intended for your example).
Then, you can search without having to worry about how it was encoded (&Ouml;, &#214; or Ö, for example?), and just encode it to whatever format is appropriate for display on rendering.
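A minimal sketch of "store the canonical form, encode on output", written in Python rather than Ruby and using only the standard html module. The function names to_canonical and render_for_html are made up for the illustration. It shows that several encodings of the same character collapse to one stored value and that escaping happens only at render time.

```python
import html

def to_canonical(received: str) -> str:
    """Decode HTML character references once, before storing."""
    return html.unescape(received)

def render_for_html(stored: str) -> str:
    """Escape the stored text at the moment it is written into an HTML page."""
    return html.escape(stored)

# Three spellings of the same character end up as one canonical value.
variants = ["&Ouml;", "&#214;", "Ö"]
canonical = [to_canonical(v) for v in variants]

stored = to_canonical("&lt;p&gt;Hello&lt;/p&gt;")   # what goes into the database
shown = render_for_html(stored)                      # what the view emits as text

print(canonical)
print(stored)
print(shown)
```

If the stored markup is meant to render as real HTML rather than as visible text, the view would run it through a sanitizer instead of escaping it.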
In my application, I am sending data from my application to a database.
I am getting some odd characters in my database, like this:
I am sending my data like this:
var
  w: widestring;
  u: utf8string;
begin
  w := data;            // data is a function that returns some info (string)
  u := utf8encode(w);   // convert the wide string to UTF-8 bytes
  sendfn(u);            // send the UTF-8 string to the server
end;
I am using utf8_decode(my get data) in my PHP code before adding it to my database.
My database and table collation is utf8_general_ci.
Can anyone help me with this issue?
It's an educated guess, but does the Data function return a UTF-8 string instead of a WideString? I think the error could be in this Data function that you're calling, which returns the data in the wrong string format.
The PHP function utf8_decode converts characters from UTF-8 to ISO-8859-1. If every step on the path your data takes from your Delphi app to the database behind your web page (the browser or whatever component you use to send the data to the web server, the HTTP request, your PHP installation, your web page and your database connection) supports UTF-8 and is configured to use it, you don't need the utf8_decode function; you can just insert your data the way it comes.
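The effect of that conversion can be sketched in Python. The function php_like_utf8_decode is a rough stand-in written for this illustration, not PHP's implementation. It relies on the fact that utf8_decode targets ISO-8859-1 and substitutes "?" for characters that ISO-8859-1 cannot represent. The sample text is made up.

```python
def php_like_utf8_decode(utf8_bytes: bytes) -> bytes:
    """Rough stand-in for PHP's utf8_decode: UTF-8 in, ISO-8859-1 out."""
    return utf8_bytes.decode("utf-8").encode("latin-1", errors="replace")

sent = "zażółć café".encode("utf-8")      # what the Delphi side would send
after_decode = php_like_utf8_decode(sent)

# Characters outside ISO-8859-1 are lost for good.
print(after_decode)

# The result is no longer valid UTF-8, so a UTF-8 column receives bad bytes.
try:
    after_decode.decode("utf-8")
    still_valid_utf8 = True
except UnicodeDecodeError:
    still_valid_utf8 = False
print(still_valid_utf8)

# Skipping the conversion keeps the text intact.
print(sent.decode("utf-8"))
```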
If you haven't already configured php to work with UTF-8, be aware that it is difficult and never works 100% (for me at least it never did), so maybe it would be better for you to use data in your locale encoding.