I need to pass a lot of string values to a test procedure. The string parameters are transferred as a CommaText string list; the code goes like this:
[test]
[testcase(test1,'xxxx,yyyy,zzz, ........')]
procedure Test_TransmitManyStrings(S1, S2, S3, .... Sx: string);
If my string list gets longer than 255 characters, I get the error below:
[dcc64 Error] Unit_TClass.test.pas(197): E2056 String literals may have at most 255 elements
What is an elegant method to pass many strings to a test case?
I'm also not happy with writing the large string list in the test case definition; it looks pretty ugly.
Break the string up into several literals of no more than 255 characters each and join them with +, one per line. Then the compiler won't complain.
[testcase(test1,'xxxx,yyyy,zzz,'
+ ' ........')]
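A minimal sketch of the same idea as a named constant, assuming a Delphi version where concatenated literals fold into a single constant longer than 255 characters (which the answer above relies on). The NATO-alphabet values and the names `Part`, `ManyValues`, `ValueCount` and `ConstantConcatDemo` are placeholders, not taken from the question. Such a constant could presumably be passed to the `TestCase` attribute the same way, but that part is not shown here.

```pascal
program ConstantConcatDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // Each literal stays well under the 255-element limit; the compiler folds
  // the + expressions into single constants at compile time.
  Part =
    'alpha,bravo,charlie,delta,echo,foxtrot,golf,hotel,india,' +
    'juliett,kilo,lima,mike,november,oscar,papa,quebec,romeo,' +
    'sierra,tango,uniform,victor,whiskey,xray,yankee,zulu';
  // Constants can be built from other constants, so the total can exceed 255.
  ManyValues = Part + ',' + Part + ',' + Part;

// Splits a comma-separated constant and returns the number of values.
function ValueCount(const CommaText: string): Integer;
var
  Values: TStringList;
begin
  Values := TStringList.Create;
  try
    Values.StrictDelimiter := True; // split on commas only, not on spaces
    Values.CommaText := CommaText;
    Result := Values.Count;
  finally
    Values.Free;
  end;
end;

begin
  Writeln(Length(ManyValues), ' characters, ', ValueCount(ManyValues), ' values');
end.
```

Keeping the pieces in a `const` section also moves the long list out of the attribute line, which addresses the readability complaint.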
I want to update a text like "Updating report (1 of 5)". I thought the Format function would help me do that. I want something like this:
string := Format('Updating report ( %d of %d, [1], [2])', loop, count );
but it is not possible. I could store loop and count in strings and concatenate everything, but is there any other way to achieve what I want?
Your syntax is wrong. The second parameter to the Format is an open array containing the arguments. So you need to wrap your list of arguments in what is known as an open array constructor.
An open array constructor is a sequence of expressions separated by commas and enclosed in brackets.
So, write the code like this:
str := Format('Updating report (%d of %d)', [loop, count]);
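A small console sketch of the corrected call inside a loop. The total of 5 is fixed purely for illustration, and `ProgressText` is a hypothetical helper name.

```pascal
program FormatDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// The [ ] open array constructor packs loop and count into the single
// array-of-const argument that Format expects.
function ProgressText(loop, count: Integer): string;
begin
  Result := Format('Updating report (%d of %d)', [loop, count]);
end;

var
  loop: Integer;
begin
  for loop := 1 to 5 do
    Writeln(ProgressText(loop, 5));
end.
```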
Recently I've been informed by a reputable SO user, that TStringList has splitting bugs which would cause it to fail parsing CSV data. I haven't been informed about the nature of these bugs, and a search on the internet including Quality Central did not produce any results, so I'm asking. What are TStringList splitting bugs?
Please note, I'm not interested in unfounded, opinion-based answers.
What I know:
Not much... One is that these bugs show up rarely with test data, but not so rarely in the real world.
The other is that, as stated, they prevent proper parsing of CSV. Since the bugs are hard to reproduce with test data, I am (probably) seeking help from those who have used a string list as a CSV parser in production code.
Irrelevant problems:
I obtained the information on a question tagged 'Delphi-XE', so parsing failures due to the "space character is treated as a delimiter" behavior do not apply: the StrictDelimiter property, introduced in Delphi 2006, resolved that. I myself am using Delphi 2007.
Also, since the string list can only hold strings, it is only responsible for splitting fields. Any conversion difficulties involving field values (e.g. dates, floating-point numbers) arising from locale differences etc. are out of scope.
Basic rules:
There's no standard specification for CSV. But there are basic rules inferred from various specifications.
Below is a demonstration of how TStringList handles these. Rules and example strings are from Wikipedia. The test code puts brackets ([ ]) around each string so that leading or trailing spaces (where relevant) are visible.
Spaces are considered part of a field and should not be ignored.
Test string: [1997, Ford , E350]
Items: [1997] [ Ford ] [ E350]
Fields with embedded commas must be enclosed within double-quote characters.
Test string: [1997,Ford,E350,"Super, luxurious truck"]
Items: [1997] [Ford] [E350] [Super, luxurious truck]
Fields with embedded double-quote characters must be enclosed within double-quote characters, and each of the embedded double-quote characters must be represented by a pair of double-quote characters.
Test string: [1997,Ford,E350,"Super, ""luxurious"" truck"]
Items: [1997] [Ford] [E350] [Super, "luxurious" truck]
Fields with embedded line breaks must be enclosed within double-quote characters.
Test string: [1997,Ford,E350,"Go get one now
they are going fast"]
Items: [1997] [Ford] [E350] [Go get one now
they are going fast]
In CSV implementations that trim leading or trailing spaces, fields with such spaces must be enclosed within double-quote characters.
Test string: [1997,Ford,E350," Super luxurious truck "]
Items: [1997] [Ford] [E350] [ Super luxurious truck ]
Fields may always be enclosed within double-quote characters, whether necessary or not.
Test string: ["1997","Ford","E350"]
Items: [1997] [Ford] [E350]
Testing code:
program TStringListCSVTest;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  SL: TStringList;
  rule: string;

function GetItemsText: string;
var
  i: Integer;
begin
  Result := ''; // a string Result is not guaranteed to start out empty
  for i := 0 to SL.Count - 1 do
    Result := Result + '[' + SL[i] + '] ';
end;

procedure Test(const TestStr: string);
begin
  SL.DelimitedText := TestStr;
  Writeln(rule + sLineBreak, 'Test string: [', TestStr + ']' + sLineBreak,
    'Items: ' + GetItemsText + sLineBreak);
end;

begin
  SL := TStringList.Create;
  try
    SL.Delimiter := ',';        // default, but ";" is used with some locales
    SL.QuoteChar := '"';        // default
    SL.StrictDelimiter := True; // required: strings are separated *only* by Delimiter
    rule := 'Spaces are considered part of a field and should not be ignored.';
    Test('1997, Ford , E350');
    rule := 'Fields with embedded commas must be enclosed within double-quote characters.';
    Test('1997,Ford,E350,"Super, luxurious truck"');
    rule := 'Fields with embedded double-quote characters must be enclosed within double-quote characters, and each of the embedded double-quote characters must be represented by a pair of double-quote characters.';
    Test('1997,Ford,E350,"Super, ""luxurious"" truck"');
    rule := 'Fields with embedded line breaks must be enclosed within double-quote characters.';
    Test('1997,Ford,E350,"Go get one now'#13#10'they are going fast"'); // CR LF line break
    rule := 'In CSV implementations that trim leading or trailing spaces, fields with such spaces must be enclosed within double-quote characters.';
    Test('1997,Ford,E350," Super luxurious truck "');
    rule := 'Fields may always be enclosed within double-quote characters, whether necessary or not.';
    Test('"1997","Ford","E350"');
  finally
    SL.Free;
  end;
end.
If you've read it all, the question was :), what are "TStringList splitting bugs?"
Not much... One is that these bugs show up rarely with test data, but not so rarely in the real world.
All it takes is one case. Test data is not random data, one user with one failure case should submit the data and voilà, we've got a test case. If no one can provide test data, maybe there's no bug/failure?
There's no standard specification for CSV.
That one sure helps with the confusion. Without a standard specification, how do you prove something is wrong? If this is left to one's own intuition, you might get into all kinds of trouble. Here are some examples from my own happy interaction with government-issued software: my application was supposed to export data in CSV format, and the government application was supposed to import it. Here's what got us into a lot of trouble several years in a row:
How do you represent empty data? Since there's no CSV standard, one year my friendly gov decided anything goes, including nothing (two consecutive commas). Next they decided only consecutive commas are OK, that is, Field,"",Field is not valid, should be Field,,Field. Had a lot of fun explaining to my customers that the gov app changed validation rules from one week to the next...
Do you export ZERO integer data? This was probably a bigger abuse, but my "gov app" decided to validate that as well. At one time it was mandatory to include the 0, then it was mandatory NOT to include the 0. That is, at one time Field,0,Field was valid, next Field,,Field was the only valid way...
And here's another test case where (my) intuition failed:
1997, Ford, E350, "Super, luxurious truck"
Please note the space between , and "Super, and the very lucky comma that follows "Super. The parser employed by TStrings only sees the quote char if it immediately follows the delimiter. That string is parsed as:
[1997]
[ Ford]
[ E350]
[ "Super]
[ luxurious truck"]
Intuitively I'd expect:
[1997]
[ Ford]
[ E350]
[Super, luxurious truck]
But guess what, Excel does it the same way Delphi does it...
Conclusion
TStrings.CommaText is fairly good and nicely implemented, at least the Delphi 2010 version I looked at is quite effective (avoids multiple string allocations, uses a PChar to "walk" the parsed string) and works about the same as Excel's parser does.
In the real world you'll need to exchange data with other software, written using other libraries (or no libraries at all), where people might have misinterpreted some of the (missing?) rules of CSV. You'll have to adapt, and it'll probably not be a case of right-or-wrong but a case of "my clients need to import this crap". If that happens, you'll have to write your own parser, one that adapts to the requirements of the 3rd-party app you'd be dealing with. Until that happens, you can safely use TStrings. And when it does happen, it might not be TStrings' fault!
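If you do end up writing your own parser, here is a minimal sketch of one that can be tuned to a partner's quirks. It is an illustration under stated assumptions, not a reference CSV implementation: it handles a single record (no embedded line breaks), recognizes a quote only at the start of a field, treats `""` inside quotes as one quote, and has a hypothetical `SkipSpaceBeforeQuote` switch that produces the "intuitive" result for the `1997, Ford, E350, "Super, luxurious truck"` case. `SplitCSVRecord` and `Describe` are made-up names.

```pascal
program CSVRecordDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Splits one CSV record (no embedded line breaks) into Fields.
procedure SplitCSVRecord(const Line: string; Fields: TStrings;
  Delimiter: Char = ','; SkipSpaceBeforeQuote: Boolean = False);
var
  i, n, Start: Integer;
  Field: string;
begin
  Fields.Clear;
  n := Length(Line);
  i := 1;
  repeat
    Start := i;
    // Optionally tolerate spaces between a delimiter and an opening quote.
    if SkipSpaceBeforeQuote then
      while (i <= n) and (Line[i] = ' ') do
        Inc(i);
    if (i <= n) and (Line[i] = '"') then
    begin
      // Quoted field: read up to the closing quote; "" stands for one quote.
      Inc(i);
      Field := '';
      while i <= n do
        if Line[i] <> '"' then
        begin
          Field := Field + Line[i];
          Inc(i);
        end
        else if (i < n) and (Line[i + 1] = '"') then
        begin
          Field := Field + '"';
          Inc(i, 2);
        end
        else
        begin
          Inc(i); // step past the closing quote
          Break;
        end;
      // Drop anything between the closing quote and the next delimiter.
      while (i <= n) and (Line[i] <> Delimiter) do
        Inc(i);
    end
    else
    begin
      // Unquoted field: everything up to the next delimiter, spaces included.
      i := Start;
      while (i <= n) and (Line[i] <> Delimiter) do
        Inc(i);
      Field := Copy(Line, Start, i - Start);
    end;
    Fields.Add(Field);
    Inc(i); // step over the delimiter (or past the end of the line)
  until i > n + 1;
end;

// Formats the fields the way the question does: [a] [b] [c]
function Describe(const Line: string; SkipSpaceBeforeQuote: Boolean): string;
var
  Fields: TStringList;
  i: Integer;
begin
  Fields := TStringList.Create;
  try
    SplitCSVRecord(Line, Fields, ',', SkipSpaceBeforeQuote);
    Result := '';
    for i := 0 to Fields.Count - 1 do
    begin
      if i > 0 then
        Result := Result + ' ';
      Result := Result + '[' + Fields[i] + ']';
    end;
  finally
    Fields.Free;
  end;
end;

begin
  Writeln(Describe('1997, Ford, E350, "Super, luxurious truck"', False));
  Writeln(Describe('1997, Ford, E350, "Super, luxurious truck"', True));
  Writeln(Describe('1997,Ford,E350,"Super, ""luxurious"" truck"', False));
end.
```

With the switch off, the sketch splits the space-before-quote line the same way the answer describes for TStrings; with it on, the quoted field survives intact. Building `Field` one character at a time is simple but slow, so a production version would copy runs instead.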
I'm going to go out on a limb and say that the most common failure case is the embedded linebreak. I know most of the CSV parsing I do ignores that. I'll use 2 TStringLists, 1 for the file I'm parsing, the other for the current line. So I'll end up with code similar to the following:
procedure Foo;
var
  CSVFile, ALine: TStringList;
  s: string;
begin
  CSVFile := TStringList.Create;
  try
    ALine := TStringList.Create;
    try
      ALine.StrictDelimiter := True;
      CSVFile.LoadFromFile('C:\Path\To\File.csv');
      for s in CSVFile do
      begin
        ALine.CommaText := s;
        DoSomethingInteresting(ALine);
      end;
    finally
      ALine.Free;
    end;
  finally
    CSVFile.Free;
  end;
end;
Of course, since I'm not taking care to make sure that each line is "complete", I can potentially run into cases where the input contains a quoted linebreak in a field and I miss it.
Until I run into real world data where it's an issue, I'm not going to bother fixing it. :-P
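One way to close that gap, sketched under the assumption that `"` only ever appears as CSV quoting: keep appending physical lines until the pending record contains an even number of quote characters (a doubled `""` adds two, so it never flips the parity). `JoinCSVRecords`, `QuoteCount` and `RecordCount` are hypothetical names; each joined record could then be fed to `ALine.CommaText` as in the loop above.

```pascal
program JoinRecordsDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Number of double-quote characters in S.
function QuoteCount(const S: string): Integer;
var
  C: Char;
begin
  Result := 0;
  for C in S do
    if C = '"' then
      Inc(Result);
end;

// Copies Lines into Records, gluing together physical lines that belong to
// one logical record because a quoted field spans a line break.
procedure JoinCSVRecords(Lines, Records: TStrings);
var
  s, Pending: string;
  InQuotes: Boolean;
begin
  Records.Clear;
  Pending := '';
  InQuotes := False;
  for s in Lines do
  begin
    if InQuotes then
      Pending := Pending + sLineBreak + s // re-insert the line break
    else
      Pending := s;
    InQuotes := Odd(QuoteCount(Pending));
    if not InQuotes then
      Records.Add(Pending);
  end;
  if InQuotes then
    Records.Add(Pending); // unterminated quote at end of input: keep it anyway
end;

// Splits Text into lines, joins them into records and returns the count.
function RecordCount(const Text: string): Integer;
var
  Lines, Records: TStringList;
begin
  Lines := TStringList.Create;
  Records := TStringList.Create;
  try
    Lines.Text := Text;
    JoinCSVRecords(Lines, Records);
    Result := Records.Count;
  finally
    Records.Free;
    Lines.Free;
  end;
end;

begin
  Writeln(RecordCount('Year,Make,Model,Description'#13#10 +
    '1997,Ford,E350,"Go get one now'#13#10'they are going fast"'));
end.
```

Note that the re-inserted break is `sLineBreak`, which may differ from the one in the file; rescanning the whole pending record each time is also quadratic for very long records, which is fine for a sketch.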
Another example... this TStringList.CommaText bug exists in Delphi 2009.
procedure TForm1.Button1Click(Sender: TObject);
var
  list: TStringList;
begin
  list := TStringList.Create();
  try
    list.CommaText := '"a""';
    Assert(list.Count = 1);
    Assert(list[0] = 'a');
    Assert(list.CommaText = 'a'); // FAILS -- actual value is "a""
  finally
    FreeAndNil(list);
  end;
end;
The TStringList.CommaText setter and related methods corrupt the memory of the string that holds the 'a' item (its null terminator character is overwritten by a ").
Have you already tried splitting the text into a TArray<string> with Split?
var
  text: string;
  arr: TArray<string>;
begin
  text := '1997,Ford,E350';
  arr := text.Split([',']);
end;
So arr would be:
arr[0] = '1997'
arr[1] = 'Ford'
arr[2] = 'E350'
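One caveat worth checking before relying on this for CSV: the simple `Split([','])` overload splits on every comma and knows nothing about quoting, so a quoted field with an embedded comma comes back in two pieces. A small sketch, assuming Delphi XE3 or later (where `TStringHelper` provides `Split`); `SplitCount` is a hypothetical helper.

```pascal
program SplitCaveat;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Number of pieces String.Split produces for one line.
function SplitCount(const Line: string): Integer;
begin
  Result := Length(Line.Split([',']));
end;

var
  Line, Piece: string;
  Pieces: TArray<string>;
begin
  Line := '1997,Ford,E350,"Super, luxurious truck"';
  Pieces := Line.Split([',']);
  // The quoted field is cut in two: ["Super] and [ luxurious truck"].
  for Piece in Pieces do
    Writeln('[', Piece, ']');
  Writeln(SplitCount(Line), ' pieces');
end.
```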
I get an incorrect result when converting a file to a string in Delphi XE. There are several ' characters that make the result incorrect. I've used UnicodeFileToWideString and FileToString from http://www.delphidabbler.com/codesnip, as well as my own code:
function LoadFile(const FileName: TFileName): AnsiString;
begin
  with TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite) do
  begin
    try
      SetLength(Result, Size);
      Read(Pointer(Result)^, Size);
      // ReadBuffer(Result[1], Size);
    except
      Result := '';
      Free;
    end;
    Free;
  end;
end;
The result between Delphi XE and Delphi 6 is different. The result from D6 is correct. I've compared with result of a hex editor program.
Your output is being produced in the style of the Delphi debugger, which displays string variables using Delphi's own string-literal format. Whatever function you're using to produce that output from your own program has actually been fixed for Delphi XE. It's really your Delphi 6 output that's incorrect.
Delphi string literals consist of a series of printable characters between apostrophes and a series of non-printable characters designated by number signs and the numeric values of each character. To represent an apostrophe, write two of them next to each other. The printable and non-printable series of characters can be written right next to each other; there's no need to concatenate them with the + operator.
Here's an excerpt from the output you say is correct:
#$12'O)=ù'dlû'#6't
There are four lone apostrophes in that string, so each one either opens or closes a series of printable characters. We don't necessarily know which is which when we start reading the string at the left because the #, $, 1, and 2 characters are all printable on their own. But if they represent printable characters, then the O, ), =, and ù characters are in the non-printable region, and that can't be. Therefore, the first apostrophe above opens a printable series, and the #$12 part represents the character at code 18 (12 in hexadecimal). After the ù is another apostrophe. Since the previous one opened a printable string, this one must close it. But the next character after that is d, which is not #, and therefore cannot be the start of a non-printable character code. Therefore, this string from your Delphi 6 code is malformed.
The correct version of that excerpt is this:
#$12'O)=ù''dlû'#6't
Now there are three lone apostrophes and one set of doubled apostrophes. The problematic apostrophe from the previous string has been doubled, indicating that it is a literal apostrophe instead of a printable-string-closing one. The printable series continues with dlû. Then it's closed to insert character No. 6, and then opened again for t. The apostrophe that opens the entire string, at the beginning of the file, is implicit.
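The rules above are mechanical enough to sketch as a function. This illustrates the format only and is not the debugger's actual code: it writes character codes in decimal (`#18`) where the debugger output above uses hex (`#$12`), and it assumes everything from the space character upward (except #127) counts as printable. `ToDelphiLiteral` is a hypothetical name.

```pascal
program LiteralFormatDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Renders S as a Delphi string literal: quoted runs for printable
// characters (apostrophes doubled) and #<code> for everything else.
function ToDelphiLiteral(const S: string): string;
var
  C: Char;
  InQuotes: Boolean;
begin
  Result := '';
  InQuotes := False;
  for C in S do
    if (C >= ' ') and (C <> #127) then
    begin
      // Printable: open a quoted run if needed; double any apostrophe.
      if not InQuotes then
      begin
        Result := Result + '''';
        InQuotes := True;
      end;
      if C = '''' then
        Result := Result + ''''''
      else
        Result := Result + C;
    end
    else
    begin
      // Non-printable: close the quoted run and emit the character code.
      if InQuotes then
      begin
        Result := Result + '''';
        InQuotes := False;
      end;
      Result := Result + '#' + IntToStr(Ord(C));
    end;
  if InQuotes then
    Result := Result + '''';
  if Result = '' then
    Result := ''''''; // the empty string is written as ''
end;

begin
  Writeln(ToDelphiLiteral(#18'O)=''dl'#6't'));
end.
```

For the ASCII part of the excerpt this yields `#18'O)=''dl'#6't'`, with the lone apostrophe doubled exactly as in the corrected Delphi XE output.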
You haven't indicated what code you're using to produce the output you've shown, but that's where the problem was. It's not there anymore, and the code that loads the file is correct, so the only place that needs your debugging attention is any code that depended on the old, incorrect format. You'd still do well to replace your code with that of Robmil since it does better at handling (or not handling) exceptions and empty files.
Actually, looking at the real data, your problem is that the file stores binary data, not string data, so interpreting this as a string is not valid at all. The only reason it works at all in Delphi 6 is that non-Unicode Delphi allows you to treat binary data and strings the same way. You cannot do this in Unicode Delphi, nor should you.
The solution to get the actual text from within the file is to read the file as binary data, and then copy any values from this binary data, one byte at a time, to a string if it is a "valid" Ansi character (printable).
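A minimal sketch of that approach, assuming Delphi 2009 or later (for `TBytes`) and assuming that "printable" means plain ASCII 32..126; adjust the set if the file's text uses other characters. `LoadBytes` and `PrintableText` are hypothetical names.

```pascal
program PrintableDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Reads the whole file as raw bytes, with no string conversion involved.
function LoadBytes(const FileName: string): TBytes;
var
  Stream: TFileStream;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Result, Stream.Size);
    if Length(Result) > 0 then
      Stream.ReadBuffer(Result[0], Length(Result));
  finally
    Stream.Free;
  end;
end;

// Keeps only the bytes that are printable ASCII characters.
function PrintableText(const Data: TBytes): AnsiString;
var
  B: Byte;
  Count: Integer;
begin
  SetLength(Result, Length(Data)); // upper bound, trimmed below
  Count := 0;
  for B in Data do
    if B in [32..126] then
    begin
      Inc(Count);
      Result[Count] := AnsiChar(B);
    end;
  SetLength(Result, Count);
end;

begin
  if ParamCount > 0 then
    Writeln(string(PrintableText(LoadBytes(ParamStr(1)))));
end.
```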
I will suggest the code:
function LoadFile(const FileName: TFileName): AnsiString;
begin
  with TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite) do
  try
    SetLength(Result, Size);
    if Size > 0 then
      Read(Result[1], Size);
  finally
    Free;
  end;
end;
Delphi 2009 Win32.
The code below tries to add a 257 length string to a memo.
It compiles and runs fine, but nothing is added to the memo.
Memo1.Lines.Add('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
Looks like a compiler bug. Is it? Because if the string were 256 characters long, I'd get a compiler error and couldn't compile the app.
Any way to make the app break when the developer tries to do something like this?
I know I could split the string and make this code work, but my point is to prevent developers from using this invalid code without noticing.
This is a Delphi 2009 bug with string literals, it should raise the same error as D2007.
Try the latest version of Andreas' IDE Fix Pack; it's supposed to fix this bug.
http://andy.jgknet.de/blog/?page_id=246
I agree with Gamecat; however, if you're dealing with a string that large, I would break it into multiple lines to make it easier to read and edit.
If you are literally trying to create 257 "a"s, why not use the DupeString function in the StrUtils unit?
Memo.Lines.Add( DupeString('a',257) );
Much easier to read, and maintain later. If you are doing this in a loop and therefore are worried about performance, then assign the function result to a local variable and use the variable.
var
  sLotsOfAs: string;
  ix: Integer;
begin
  sLotsOfAs := DupeString('a', 257);
  for ix := 0 to 1000000 do
    Memo.Lines.Add(sLotsOfAs);
end;
The string literal can be only 255 characters long. Not sure why they kept this limitation. But you can solve it using multiple literals:
Memo1.Lines.Add('i have 128 chars' + 'i also have 128 chars');
I don't know about D2009, but under Delphi 6 at least, string literals are limited to 255 characters, and the compiler diagnoses the error.
In Delphi 2007, I get:
[DCC Error] Unit1.pas(29): E2056 String literals may have at most 255 elements
In Delphi 5, I get:
[Error] Unit1.pas(29): String literals may have at most 255 elements
If the D2009 behavior is as you describe, then two things come to mind:
1 - They expanded the limit on the # of chars in a string, but the TMemo can still only accept up to 255.
or
2 - It's a plain old bug
As far as preventing it, the only thing I can think of is to make a regex to search for these strings in your .PAS files.
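As an alternative to a regex, here is a plain scanner sketch that reports lines of a .pas file containing a quoted run longer than 255 elements. Its assumptions are deliberately simple: a doubled apostrophe counts as one element, each quoted run is measured separately (so `'abc'#13'def'` is two runs), and comments are not skipped. `LongestLiteral` and `LiteralCheck` are hypothetical names.

```pascal
program LiteralCheck;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, StrUtils;

// Length, in elements, of the longest quoted run on one source line.
// A doubled apostrophe ('') inside a run counts as a single element.
function LongestLiteral(const Line: string): Integer;
var
  i, n, Len: Integer;
begin
  Result := 0;
  n := Length(Line);
  i := 1;
  while i <= n do
  begin
    if Line[i] = '''' then
    begin
      Len := 0;
      Inc(i);
      while i <= n do
      begin
        if Line[i] = '''' then
        begin
          if (i < n) and (Line[i + 1] = '''') then
            Inc(i) // escaped apostrophe: consume both, count one
          else
            Break; // closing apostrophe
        end;
        Inc(Len);
        Inc(i);
      end;
      if Len > Result then
        Result := Len;
    end;
    Inc(i); // step past the closing apostrophe or an ordinary character
  end;
end;

var
  Source: TStringList;
  LineNo: Integer;
begin
  if ParamCount < 1 then
  begin
    // No file given: check one generated line instead.
    Writeln(LongestLiteral('Memo1.Lines.Add(''' + DupeString('a', 257) + ''');'));
    Exit;
  end;
  Source := TStringList.Create;
  try
    Source.LoadFromFile(ParamStr(1));
    for LineNo := 0 to Source.Count - 1 do
      if LongestLiteral(Source[LineNo]) > 255 then
        Writeln(ParamStr(1), '(', LineNo + 1, '): string literal longer than 255 elements');
  finally
    Source.Free;
  end;
end.
```

Run over the project's .pas files as a pre-build step, this would flag the D2009 case the compiler silently accepts.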
Suppose that for some perverse reason you want to display the raw byte contents of a UTF8String.
var
  utf8Str: UTF8String;
begin
  utf8Str := '€ąćęłńóśźż';
end;
(1) This doesn't work; it displays the readable form:
memo1.Lines.Add( RawByteString( utf8Str ));
// output: '€ąćęłńóśźż'
(2) This, however, does "work" - note the concatenation:
memo1.Lines.Add( 'x' + RawByteString( utf8Str ));
// output: 'x€ąćęłńóśźż'
I understand (1), though the compiler's forced coercion to UnicodeString seems to prevent ever displaying a RawByteString variable as-is. However, why does the behavior change in (2)?
(3) Stranger still - let's reverse the concatenation:
memo1.Lines.Add( RawByteString( utf8Str ) + 'x' );
// output: '€ąćęłńóśźżx'
I've been reading up on the newfangled string types in Delphi and thought I understood how they work, but this is a puzzle.
RawByteString only exists to minimize the number of overloads required for functions that work with various flavours of AnsiStrings with different codepage affinities.
In general, don't declare variables of type RawByteString. Don't typecast values to that type. Don't do concatenations on variables of that type. About the only things you can do are:
Declaring a parameter of this type (the original intent)
Indexing on such a parameter
Searching in such a parameter
Intelligent operations that check the actual code page of the string, using the StringCodePage function.
For example, you'll note that the StringCodePage function itself uses RawByteString as its argument type. This way, it will work with any AnsiString, rather than doing a codepage translation before passing it as an argument.
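A minimal sketch of that pattern (the function name `CodePageOf` is hypothetical): because the parameter is declared as RawByteString, the argument arrives without conversion and `StringCodePage` reports the caller's own code page.

```pascal
program CodePageDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// RawByteString as a parameter type: no code-page conversion on the way in.
function CodePageOf(const S: RawByteString): Word;
begin
  Result := StringCodePage(S);
end;

var
  U: UTF8String;
  A: AnsiString;
begin
  U := 'abc';
  A := 'abc';
  Writeln('UTF8String: ', CodePageOf(U)); // 65001, i.e. CP_UTF8
  Writeln('AnsiString: ', CodePageOf(A)); // the default system ANSI code page
end.
```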
For your case, things like concatenations are largely undefined. The behaviour changed between RTM and Update 2, but when the RTL string concatenation functions receive multiple strings with different code pages, there's no easy way for it to figure out what code page should be used for the final string. That's just one reason why you shouldn't concatenate them like you do here.
You cannot add a string to a TMemo "as is". You always need to do some kind of conversion to Unicode, because that's all TMemo knows about in Delphi 2009.
If you want to pretend that your UTF8String uses code page 1252, do this:
var
  utf8Str: UTF8String;
  Raw: RawByteString;
begin
  utf8Str := '€ąćęłńóśźż';
  Raw := utf8Str;
  SetCodePage(Raw, 1252, False);
  Memo.Lines.Add(Raw);
end;
For more details, see my article Using RawByteString Effectively