I'm working on a program in Delphi that has to conform to the ADC standard protocol. This protocol specifies that each line is terminated with a newline character (#10#13 OR sLineBreak). The problem is that the newline character doesn't seem to be surviving the trip from the server to the program. Reading the data from the socket seems to simply give it all as one massive line. I was thinking it was something to do with the way the program displayed debug messages (to a TMemo object), but a Pos(sLineBreak, Buf) always returns 0 (meaning it couldn't find the string).
My code:
procedure OnRead(Sender: TObject; Socket: TCustomWinSocket);
begin
//read all the data from the socket
while Socket.ReceiveLength > 0 do
Buf := Buf + Socket.ReceiveText;
//use only complete lines
while Pos(sLineBreak, Buf) > 0 do begin
//parsing stuff
end;
end;
Also, the server doesn't have to send the commands in different steps, it can send them all at once with newlines in them, hence the need to read the entire socket in, then go through it as opposed to assuming one command per socket read.
The protocol specification says "a newline (codepoint 0x0a) ends each message." That's a single character. In Delphi syntax, it's #10 or #$a.
The usual value for sLineBreak on Windows is #13#10 — the carriage return comes first, then the newline. The sequence #10#13 is not a line break on any platform I'm aware of.
Everything displays as one line in your memo control because you're receiving only newline characters, without the carriage returns, and TMemo expects both characters to end a line. It's the same as if you'd loaded a Unix-style text file in Notepad.
Hmm, the page you link seems to firmly state in the message syntax that new line is a single char:
eol ::= #x0a
while sLineBreak is two chars on Windows as you state.
Related
I'm working on some legacy software written in Delphi 7 which runs on Windows. I've minified the problem to the following program:
var f: text;
begin
assign(f, 'a.txt');
rewrite(f);
writeln(f, 'before' + chr(14) + 'after');
close(f);
assign(f, 'a.txt');
append(f);
close(f);
end.
I expect it to create a.txt file containing "before#14after#13#10" and then append nothing to it. However, after I run this program on Windows, I see before in a.txt instead, like if Delphi's append truncates the file. If I do not re-open the file, it shows before#14after#13#10 as expected.
If I write something (FooBar) in the re-opened file, it's appended, but as if the file was already truncated: beforeFooBar.
This effect does not occur with any other character between 0 and 32, even with 26 (which stands for EOF).
Is this a bug in Delphi or a well-defined behavior? What is so special about chr(14)?
Thanks to some friends from a chat and Sertac Akyuz from comments: it looks like a bug in Delphi 7.
It's supposed to have special handling of the EOF symbol (ASCII 26), quoting from here:
Note: If a Ctrl+Z (ASCII 26) is present in the last 128-byte block of the file, the current file position is set so that the next character added to the file overwrites the first Ctrl+Z in the block. In this way, text can be appended to a file that terminates with a Ctrl+Z.
Kind of CP/M backward compatibility, I guess.
However, there is a bug in the implementation of TextOpen for Windows (see Source/Rtl/Sys/System.pas from your Delphi 7 installation around line 4282):
##loop:
CMP EAX,EDX
JAE ##success
// if (f.Buffer[i] == eof)
CMP byte ptr [ESI].TTextRec.Buffer[EAX],eof
JE ##truncate
INC EAX
JMP ##loop
Here it says eof instead of cEof. Unfortunatley, that compiles for some reason and it even appeared on StackOverflow already. There is a label called ##eof and that's pretty much it.
Consequence: instead of having special case for 26, we have special case for 14. The exact reason is yet to be found.
Haven't needed to post here for a while, but I have a problem implementing filestreams.
When writing a string to filestream, the resultnig text file has extra spaces inserted between each character
So when running this method:
Function TDBImportStructures.SaveIVDataToFile(const AMeasurementType: integer;
IVDataRecordList: TIV; ExportFileName, LogFileName: String;
var ProgressInfo: TProgressInfo): Boolean; // AM
var
TempString: unicodestring;
ExportLogfile, OutputFile: TFileStream;
begin
ExportLogfile := TFileStream.Create(LogFileName, fmCreate);
TempString :=
'FileUploadTimestamp, Filename, MeasurementTimestamp, SerialNumber, DeviceID, PVInstallID,'
+ #13#10;
ExportLogfile.WriteBuffer(TempString[1], Length(TempString) * SizeOf(Char));
ExportLogfile.Free;
OutputFile := TFileStream.Create(ExportFileName, fmCreate);
TempString :=
'measurementdatetime,closestfiveseconddatetime,closesttenminutedatetime,deviceid,'
+ 'measuredmoduletemperature,moduletemperature,isc,voc,ff,impp,vmpp,iscslope,vocslope,'
+ 'pvinstallid,numivpoints,errorcode' + #13#10;
OutputFile.WriteBuffer(TempString[1], Length(TempString) * SizeOf(Char));
OutputFile.Free;
end;
(which is a stripped down test method, writing headers only). The resulting csv file for the 'OutPutFile' reads
'm e a s u r e d m o d u l e t e m p e r a t u r e, etcetera when viewed in wordpad, but not in excel, notepad, etc.
I'm guessing its the SizeOf(Char) statement which is wrong in a unicode context, but I'm not sure what would be the correct thing to insert here.
The 'ExportLogfile' seems to work ok but not the 'OutPutFile'
From what I've read elsewhere it is the writing in unicode which is the problem & not WordPad, see http://social.msdn.microsoft.com/Forums/en-US/7e040fd1-f399-4fb1-b700-9e7cc6117cc4/unicode-to-files-and-console-vs-notepad-wordpad-word-etc?forum=vcgeneral
Any suggestions folks?
many thanks, Brian
You are writing 16 bit UTF-16 encoded characters. And then viewing the text as if it were ANSI encoded text. This mismatch explains the behaviour. In fact you don't have extra spaces, those are zero bytes, interpreted as null characters.
You need to decide which encoding you wish to use. Which programs will read the file? Which text encoding are they expecting? Few programs that read csv files understand UTF-16.
A quick fix would be to switch to using AnsiString which would result in 8 bit text. But would not support international text. Do you need to support international text? Then perhaps you need UTF-8. Again you could perform a quick fix using Utf8String, but I think you should look deeper.
It's odd that you handle the text to binary conversion. It would be much simpler to use TStringList, calling Add to add lines, and then specify an encoding when saving the file.
List.Add(...);
List.Add(...);
// etc.
List.SaveToFile(FileName, TEncoding.UTF8);
A perhaps more elegant approach would be to use the TStreamWriter class. Supply an output stream (or filename) and an encoding when creating the object. And then call Write or WriteLine to add text.
Writer := TStreamWriter.Create(FileName, TEncoding.UTF8);
try
Writer.WriteLine(...);
// etc.
finally
Writer.Free;
end;
I've assumed UTF-8 here but you can easily specify a different encoding.
I'm writing a delimiter for some Excel spreadsheet data and I need to read the rightward arrow symbol and pilcrow symbol in a large string.
The pilcrow symbol, for row ends, was fairly simply, using the Chr function and the AnsiChar code 182.
The rightward arrow has been more tricky to figure out. There isn't an AnsiChar code for it. The Unicode value for it is '2192'. I can't, however, figure out how to make this into a string or char type for me to use in my function.
Any easy ways to do this?
You can't use the 2192 character directly. But since a STRING variable can't contain this value either (as thus your TStringList can't either), that doesn't matter.
What character(s) are the 2192 character represented as in your StringList AFTER you have read it in? Probably by these three separate characters: 0xE2 0x86 0x92 (in UTF-8 format). The simple solution, therefore, is to start by replacing these three characters with a single, unique character that you can then assign to the Delimiter field of the TStringList.
Like this:
.
.
.
<Read file into a STRING variable, say S>
S := ReplaceStr(S,#$E2#$86#$92,'|');
SL := TStringList.Create;
SL.Text := S;
SL.Delimiter := '|';
.
.
.
You'll have to select a single-character representation of your 3-byte UTF-8 Unicode character that doesn't occur in your data elsewhere.
You need to represent that character as a UTF-16 character. In Unicode Delphi you would do it like this:
Chr(2192)
which is of type WideChar.
However, you are using Delphi 7 which is a pre-Unicode Delphi. So you have to do it like this:
var
wc: WideChar;
....
wc := WideChar(2192);
Now, this might all be to no avail for you since it sounds a little like your code is working with 8 bit ANSI text. In which case that character cannot be encoded in any 8 bit ANSI character set. If you really must use that character, you'll need to use Unicode text.
Recently I've been informed by a reputable SO user, that TStringList has splitting bugs which would cause it to fail parsing CSV data. I haven't been informed about the nature of these bugs, and a search on the internet including Quality Central did not produce any results, so I'm asking. What are TStringList splitting bugs?
Please note, I'm not interested in unfounded opinion based answers.
What I know:
Not much... One is that, these bugs show up rarely with test data, but not so rarely in real world.
The other is, as stated, they prevent proper parsing of CSV. Thinking that it is difficult to reproduce the bugs with test data, I am (probably) seeking help from whom have tried using a string list as a CSV parser in production code.
Irrelevant problems:
I obtained the information on a 'Delphi-XE' tagged question, so failing parsing due to the "space character being considered as a delimiter" feature do not apply. Because the introduction of the StrictDelimiter property with Delphi 2006 resolved that. I, myself, am using Delphi 2007.
Also since the string list can only hold strings, it is only responsible for splitting fields. Any conversion difficulty involving field values (f.i. date, floating point numbers..) arising from locale differences etc. are not in scope.
Basic rules:
There's no standard specification for CSV. But there are basic rules inferred from various specifications.
Below is demonstration of how TStringList handles these. Rules and example strings are from Wikipedia. Brackets ([ ]) are superimposed around strings to be able to see leading or trailing spaces (where relevant) by the test code.
Spaces are considered part of a field and should not be ignored.
Test string: [1997, Ford , E350]
Items: [1997] [ Ford ] [ E350]
Fields with embedded commas must be enclosed within double-quote characters.
Test string: [1997,Ford,E350,"Super, luxurious truck"]
Items: [1997] [Ford] [E350] [Super, luxurious truck]
Fields with embedded double-quote characters must be enclosed within double-quote characters, and each of the embedded double-quote characters must be represented by a pair of double-quote characters.
Test string: [1997,Ford,E350,"Super, ""luxurious"" truck"]
Items: [1997] [Ford] [E350] [Super, "luxurious" truck]
Fields with embedded line breaks must be enclosed within double-quote characters.
Test string: [1997,Ford,E350,"Go get one now
they are going fast"]
Items: [1997] [Ford] [E350] [Go get one now
they are going fast]
In CSV implementations that trim leading or trailing spaces, fields with such spaces must be enclosed within double-quote characters.
Test string: [1997,Ford,E350," Super luxurious truck "]
Items: [1997] [Ford] [E350] [ Super luxurious truck ]
Fields may always be enclosed within double-quote characters, whether necessary or not.
Test string: ["1997","Ford","E350"]
Items: [1997] [Ford] [E350]
Testing code:
var
SL: TStringList;
rule: string;
function GetItemsText: string;
var
i: Integer;
begin
for i := 0 to SL.Count - 1 do
Result := Result + '[' + SL[i] + '] ';
end;
procedure Test(TestStr: string);
begin
SL.DelimitedText := TestStr;
Writeln(rule + sLineBreak, 'Test string: [', TestStr + ']' + sLineBreak,
'Items: ' + GetItemsText + sLineBreak);
end;
begin
SL := TStringList.Create;
SL.Delimiter := ','; // default, but ";" is used with some locales
SL.QuoteChar := '"'; // default
SL.StrictDelimiter := True; // required: strings are separated *only* by Delimiter
rule := 'Spaces are considered part of a field and should not be ignored.';
Test('1997, Ford , E350');
rule := 'Fields with embedded commas must be enclosed within double-quote characters.';
Test('1997,Ford,E350,"Super, luxurious truck"');
rule := 'Fields with embedded double-quote characters must be enclosed within double-quote characters, and each of the embedded double-quote characters must be represented by a pair of double-quote characters.';
Test('1997,Ford,E350,"Super, ""luxurious"" truck"');
rule := 'Fields with embedded line breaks must be enclosed within double-quote characters.';
Test('1997,Ford,E350,"Go get one now'#10#13'they are going fast"');
rule := 'In CSV implementations that trim leading or trailing spaces, fields with such spaces must be enclosed within double-quote characters.';
Test('1997,Ford,E350," Super luxurious truck "');
rule := 'Fields may always be enclosed within double-quote characters, whether necessary or not.';
Test('"1997","Ford","E350"');
SL.Free;
end;
If you've read it all, the question was :), what are "TStringList splitting bugs?"
Not much... One is that, these bugs show up rarely with test data, but not so rarely in real world.
All it takes is one case. Test data is not random data, one user with one failure case should submit the data and voilà, we've got a test case. If no one can provide test data, maybe there's no bug/failure?
There's no standard specification for CSV.
That one sure helps with the confusion. Without a standard specification, how do you prove something is wrong? If this is left to one's own intuition, you might get into all kinds of troubles. Here's some from my own happy interaction with government issued software; My application was supposed to export data in CSV format, and the government application was supposed to import it. Here's what got us into a lot of trouble several years in a row:
How do you represent empty data? Since there's no CSV standard, one year my friendly gov decided anything goes, including nothing (two consecutive commas). Next they decided only consecutive commas are OK, that is, Field,"",Field is not valid, should be Field,,Field. Had a lot of fun explaining to my customers that the gov app changed validation rules from one week to the next...
Do you export ZERO integer data? This was probably an bigger abuse, but my "gov app" decided to validate that also. At one time it was mandatory to include the 0, then it was mandatory NOT to include the 0. That is, at one time Field,0,Field was valid, next Field,,Field was the only valid way...
And here's an other test-case where (my) intuition failed:
1997, Ford, E350, "Super, luxurious truck"
Please note the space between , and "Super, and the very lucky comma that follows "Super. The parser employed by TStrings only sees the quote char if it immediately follows the delimiter. That string is parsed as:
[1997]
[ Ford]
[ E350]
[ "Super]
[ luxurious truck"]
Intuitively I'd expect:
[1997]
[ Ford]
[ E350]
[Super luxurious truck]
But guess what, Excel does it the same way Delphi does it...
Conclusion
TStrings.CommaText is fairly good and nicely implemented, at least the Delphi 2010 version I looked at is quite effective (avoids multiple string allocations, uses a PChar to "walk" the parsed string) and works about the same as Excel's parser does.
In the real world you'll need to exchange data with other software, written using other libraries (or no libraries at all), where people might have miss-interpreted some of the (missing?) rules of CSV. You'll have to adapt, and it'll probably not be a case of right-or-wrong but a case of "my clients need to import this crap". If that happens, you'll have to write your own parser, one that adapts to the requirements of the 3rd party app you'd be dealing with. Until that happens, you can safely use TStrings. And when it does happen, it might not be TString's fault!
I'm going to go out on a limb and say that the most common failure case is the embedded linebreak. I know most of the CSV parsing I do ignores that. I'll use 2 TStringLists, 1 for the file I'm parsing, the other for the current line. So I'll end up with code similar to the following:
procedure Foo;
var
CSVFile, ALine: TStringList;
s: string;
begin
CSVFile := TStringList.Create;
ALine := TStringList.Create;
ALine.StrictDelimiter := True;
CSVFile.LoadFromFile('C:\Path\To\File.csv');
for s in CSVFile do begin
ALine.CommaText := s;
DoSomethingInteresting(ALine);
end;
end;
Of course, since I'm not taking care to make sure that each line is "complete", I can potentially run into cases where the input contains a quoted linebreak in a field and I miss it.
Until I run into real world data where it's an issue, I'm not going to bother fixing it. :-P
Another example... this TStringList.CommaText bug exists in Delphi 2009.
procedure TForm1.Button1Click(Sender: TObject);
var
list : TStringList;
begin
list := TStringList.Create();
try
list.CommaText := '"a""';
Assert(list.Count = 1);
Assert(list[0] = 'a');
Assert(list.CommaText = 'a'); // FAILS -- actual value is "a""
finally
FreeAndNil(list);
end;
end;
The TStringList.CommaText setter and related methods corrupt the memory of the string that holds the a item (its null terminator character is overwritten by a ").
Already tried use TArray<String> split?
var
text: String;
arr: TArray<String>;
begin
text := '1997,Ford,E350';
arr := text.split([',']);
So arr would be:
arr[0] = 1997;
arr[1] = Ford;
arr[2] = E350;
I have incorrect result when converting file to string in Delphi XE. There are several ' characters that makes the result incorrect. I've used UnicodeFileToWideString and FileToString from http://www.delphidabbler.com/codesnip and my code :
function LoadFile(const FileName: TFileName): ansistring;
begin
with TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite) do
begin
try
SetLength(Result, Size);
Read(Pointer(Result)^, Size);
// ReadBuffer(Result[1], Size);
except
Result := '';
Free;
end;
Free;
end;
end;
The result between Delphi XE and Delphi 6 is different. The result from D6 is correct. I've compared with result of a hex editor program.
Your output is being produced in the style of the Delphi debugger, which displays string variables using Delphi's own string-literal format. Whatever function you're using to produce that output from your own program has actually been fixed for Delphi XE. It's really your Delphi 6 output that's incorrect.
Delphi string literals consist of a series of printable characters between apostrophes and a series of non-printable characters designated by number signs and the numeric values of each character. To represent an apostrophe, write two of them next to each other. The printable and non-printable series of characters can be written right not to each other; there's no need to concatenate them with the + operator.
Here's an excerpt from the output you say is correct:
#$12'O)=ù'dlû'#6't
There are four lone apostrophes in that string, so each one either opens or closes a series of printable characters. We don't necessarily know which is which when we start reading the string at the left because the #, $, 1, and 2 characters are all printable on their own. But if they represent printable characters, then the 0, ), =, and ù characters are in the non-printable region, and that can't be. Therefore, the first apostrophe above opens a printable series, and the #$12 part represents the character at code 18 (12 in hexadecimal). After the ù is another apostrophe. Since the previous one opened a printable string, this one must close it. But the next character after that is d, which is not #, and therefore cannot be the start of a non-printable character code. Therefore, this string from your Delphi 6 code is mal-formed.
The correct version of that excerpt is this:
#$12'O)=ù''dlû'#6't
Now there are three lone apostrophes and one set of doubled apostrophes. The problematic apostrophe from the previous string has been doubled, indicating that it is a literal apostrophe instead of a printable-string-closing one. The printable series continues with dlû. Then it's closed to insert character No. 6, and then opened again for t. The apostrophe that opens the entire string, at the beginning of the file, is implicit.
You haven't indicated what code you're using to produce the output you've shown, but that's where the problem was. It's not there anymore, and the code that loads the file is correct, so the only place that needs your debugging attention is any code that depended on the old, incorrect format. You'd still do well to replace your code with that of Robmil since it does better at handling (or not handling) exceptions and empty files.
Actually, looking at the real data, your problem is that the file stores binary data, not string data, so interpreting this as a string is not valid at all. The only reason it works at all in Delphi 6 is that non-Unicode Delphi allows you to treat binary data and strings the same way. You cannot do this in Unicode Delphi, nor should you.
The solution to get the actual text from within the file is to read the file as binary data, and then copy any values from this binary data, one byte at a time, to a string if it is a "valid" Ansi character (printable).
I will suggest the code:
function LoadFile(const FileName: TFileName): AnsiString;
begin
with TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite) do
try
SetLength(Result, Size);
if Size > 0 then
Read(Result[1], Size);
finally
Free;
end;
end;