I'm writing a section of code to read in CSV files and parse information out of them (currently I just have the beginning part of the code, which reads the headers at the beginning of the file). When I try to compile this code I receive an error on the line that takes the length of the line read from the file.
The error I'm receiving is: [Error] MCLRandomizer.pas(*): Missing operator or semicolon
while not EOF(csvFile) do begin
i :=0;
ReadLn(csvFile, line);
if lineOne = true then begin
length := Length(line); //error here
while length > 0 do begin
dx := Pos(',', line);
buffer := Copy(line, 0, dx-1);
headers[i] := buffer;
line := Copy(line, dx+1, length);
length := Length(line); //error here
end;
lineOne := false;
end;
end;
Pascal is case-insensitive, so length and Length are the same identifier.
Rename the variable; as it stands, it hides the built-in function.
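As a rough illustration of the renaming advice, here is a minimal sketch of the header split with the variable called lineLen instead of length. The names SplitHeaders and THeaderArray and the sample header line are made up for this example, and it assumes plain comma-separated fields with no quoting. It also uses 1-based string indexing in Copy.

```pascal
program SplitHeaderDemo;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  THeaderArray = array of string;   // hypothetical helper type

// Splits one line on commas. Assumption: no quoted fields.
// A trailing comma does not produce an extra empty field.
function SplitHeaders(line: string): THeaderArray;
var
  commaPos, lineLen, count: Integer;   // lineLen, not "length"
begin
  Result := nil;
  count := 0;
  lineLen := Length(line);
  while lineLen > 0 do
  begin
    commaPos := Pos(',', line);
    if commaPos = 0 then
      commaPos := lineLen + 1;         // last field: act as if a comma followed it
    SetLength(Result, count + 1);
    Result[count] := Copy(line, 1, commaPos - 1);   // strings are 1-based
    Inc(count);
    Delete(line, 1, commaPos);         // drop the field and its comma
    lineLen := Length(line);
  end;
end;

var
  headers: THeaderArray;
  i: Integer;
begin
  headers := SplitHeaders('name,marks,grade');   // sample data for the sketch
  for i := 0 to High(headers) do
    WriteLn(headers[i]);
end.
```

Because the local variable no longer shares a name with the Length function, the call on the right-hand side resolves to the built-in again.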
FTR: If you really, really want to, you can write
length := System.Length(line);
(assuming length is an Integer). I agree with the other posters that that would be a bad idea.
A solution I developed to read a CSV file into an array of records is:
program read_file_into_array_of_records;
{$APPTYPE CONSOLE}
uses
SysUtils, StrUtils;
type
Tscore = record
name : string [25];
marks : integer;
end;
var
input_file: TextFile;
file_record : string[100];
score : array [0..3] of Tscore;
index : integer;
// function that returns all text up to a comma or the end of the line
function get_value() : string;
var
comma_pos: integer;
value: string[100];
begin
comma_pos := Pos(',', file_record);
// if comma found cut out all text up to it
if comma_pos <> 0 then
begin
value := leftstr(file_record, comma_pos - 1);
delete(file_record, 1, comma_pos);
end
else
begin
// no comma found so just take everything that remains
value := file_record;
end;
get_value := value;
end;
// procedure to fill one record by breaking up the comma separated values
procedure fill_record (index: integer);
begin
// call the function get_value as many times as needed to get
// each comma separated value
score[index].name := get_value();
score[index].marks := strtoint(get_value());
end;
// procedure to fill array with contents of csv file
procedure fill_array ();
begin
index := 0;
while not EoF(input_file) do
begin
readln(input_file, file_record);
fill_record (index);
index := index + 1;
end;
end;
// procedure to display contents of array
procedure display_array ();
begin
for index := 0 to 3 do
begin
writeln(score[index].name, ' got ', score[index].marks, ' marks' );
end;
readln;
end;
// main prog
begin
assignfile(input_file, 'scores.csv');
reset(input_file);
fill_array ();
closefile(input_file);
display_array();
end.
the contents of scores.csv:
james,31
jane,23
toby,34
ruth,40
Moreover, Pascal strings start at index 1, not 0 (see the Copy() call).
I need to pull numbers from a string and put them into a list, there are some rules to this however such as identifying if the extracted number is a Integer or Float.
The task sounds simple enough but I am finding myself more and more confused as time goes by and could really do with some guidance.
Take the following test string as an example:
There are test values: P7 45.826.53.91.7, .5, 66.. 4 and 5.40.3.
The rules to follow when parsing the string are as follows:
Numbers cannot be preceded by a letter.
If it finds a number that is not followed by a decimal point, then the number is an Integer.
If it finds a number that is followed by a decimal point, then the number is a Float, e.g. 5.
~ If more numbers follow the decimal point then the number is still a float, eg 5.40
~ A further found decimal point should then break up the number, eg 5.40.3 becomes (5.40 Float) and (3 Float)
In the event of a letter for example following a decimal point, eg 3.H then still add 3. as a Float to the list (even if technically it is not valid)
Example 1
To make this a little clearer, taking the test string quoted above, the desired output should be as follows:
From the image above, light blue illustrates Float numbers and pale red illustrates single Integers (note also how Floats joined together are split into separate Floats).
45.826 (Float)
53.91 (Float)
7 (Integer)
5 (Integer)
66 . (Float)
4 (Integer)
5.40 (Float)
3 . (Float)
Note there are deliberate spaces between 66 . and 3 . above due to the way the numbers were formatted.
Example 2:
Anoth3r Te5.t string .4 abc 8.1Q 123.45.67.8.9
4 (Integer)
8.1 (Float)
123.45 (Float)
67.8 (Float)
9 (Integer)
To give a better idea, I created a new project whilst testing which looks like this:
Now onto the actual task. I thought maybe I could read each character from the string and identify what are valid numbers as per the rules above, and then pull them into a list.
To my ability, this was the best I could manage:
The code is as follows:
unit Unit1;
{$mode objfpc}{$H+}
interface
uses
Classes, SysUtils, FileUtil, Forms, Controls, Graphics, Dialogs, StdCtrls;
type
TForm1 = class(TForm)
btnParseString: TButton;
edtTestString: TEdit;
Label1: TLabel;
Label2: TLabel;
Label3: TLabel;
lstDesiredOutput: TListBox;
lstActualOutput: TListBox;
procedure btnParseStringClick(Sender: TObject);
private
FDone: Boolean;
FIdx: Integer;
procedure ParseString(const Str: string; var OutValue, OutKind: string);
public
{ public declarations }
end;
var
Form1: TForm1;
implementation
{$R *.lfm}
{ TForm1 }
procedure TForm1.ParseString(const Str: string; var OutValue, OutKind: string);
var
CH1, CH2: Char;
begin
Inc(FIdx);
CH1 := Str[FIdx];
case CH1 of
'0'..'9': // Found a number
begin
CH2 := Str[FIdx - 1];
if not (CH2 in ['A'..'Z']) then
begin
OutKind := 'Integer';
// Try to determine float...
//while (CH1 in ['0'..'9', '.']) do
//begin
// case Str[FIdx] of
// '.':
// begin
// CH2 := Str[FIdx + 1];
// if not (CH2 in ['0'..'9']) then
// begin
// OutKind := 'Float';
// //Inc(FIdx);
// end;
// end;
// end;
//end;
end;
OutValue := Str[FIdx];
end;
end;
FDone := FIdx = Length(Str);
end;
procedure TForm1.btnParseStringClick(Sender: TObject);
var
S, SKind: string;
begin
lstActualOutput.Items.Clear;
FDone := False;
FIdx := 0;
repeat
ParseString(edtTestString.Text, S, SKind);
if (S <> '') and (SKind <> '') then
begin
lstActualOutput.Items.Add(S + ' (' + SKind + ')');
end;
until
FDone = True;
end;
end.
It clearly doesn't give the desired output (failed code has been commented out) and my approach is likely wrong but I feel I only need to make a few changes here and there for a working solution.
At this point I have found myself rather confused and quite lost despite thinking the answer is quite close, the task is becoming increasingly infuriating and I would really appreciate some help.
EDIT 1
Here I got a little closer, as there are no longer duplicate numbers, but the result is still clearly wrong.
unit Unit1;
{$mode objfpc}{$H+}
interface
uses
Classes, SysUtils, FileUtil, Forms, Controls, Graphics, Dialogs, StdCtrls;
type
TForm1 = class(TForm)
btnParseString: TButton;
edtTestString: TEdit;
Label1: TLabel;
Label2: TLabel;
Label3: TLabel;
lstDesiredOutput: TListBox;
lstActualOutput: TListBox;
procedure btnParseStringClick(Sender: TObject);
private
FDone: Boolean;
FIdx: Integer;
procedure ParseString(const Str: string; var OutValue, OutKind: string);
public
{ public declarations }
end;
var
Form1: TForm1;
implementation
{$R *.lfm}
{ TForm1 }
// Prepare to pull hair out!
procedure TForm1.ParseString(const Str: string; var OutValue, OutKind: string);
var
CH1, CH2: Char;
begin
Inc(FIdx);
CH1 := Str[FIdx];
case CH1 of
'0'..'9': // Found the start of a new number
begin
CH1 := Str[FIdx];
// make sure previous character is not a letter
CH2 := Str[FIdx - 1];
if not (CH2 in ['A'..'Z']) then
begin
OutKind := 'Integer';
// Try to determine float...
//while (CH1 in ['0'..'9', '.']) do
//begin
// OutKind := 'Float';
// case Str[FIdx] of
// '.':
// begin
// CH2 := Str[FIdx + 1];
// if not (CH2 in ['0'..'9']) then
// begin
// OutKind := 'Float';
// Break;
// end;
// end;
// end;
// Inc(FIdx);
// CH1 := Str[FIdx];
//end;
end;
OutValue := Str[FIdx];
end;
end;
OutValue := Str[FIdx];
FDone := Str[FIdx] = #0;
end;
procedure TForm1.btnParseStringClick(Sender: TObject);
var
S, SKind: string;
begin
lstActualOutput.Items.Clear;
FDone := False;
FIdx := 0;
repeat
ParseString(edtTestString.Text, S, SKind);
if (S <> '') and (SKind <> '') then
begin
lstActualOutput.Items.Add(S + ' (' + SKind + ')');
end;
until
FDone = True;
end;
end.
My question is how can I extract numbers from a string, add them to a list and determine if the number is integer or float?
The left pale green listbox (desired output) shows what the results should be, the right pale blue listbox (actual output) shows what we actually got.
Please advise. Thanks.
Note I re-added the Delphi tag as I do use XE7 so please don't remove it, although this particular problem is in Lazarus my eventual solution should work for both XE7 and Lazarus.
Your rules are rather complex, so you could try building a finite state machine (FSM; more precisely a DFA, a deterministic finite automaton).
Every char causes transition between states.
For example, when you are in state "integer started" and meet space char, you yield integer value and FSM goes into state " anything wanted".
If you are in state "integer started" and meet '.', FSM goes into state "float or integer list started" and so on.
The answer is quite close, but there are several basic errors. To give you some hints (without writing your code for you): within the while loop you MUST ALWAYS increment the index (the increment should not be where it is now, otherwise you get an infinite loop); you MUST check that you have not reached the end of the string (otherwise you get an exception); and your while loop should not depend on CH1, because that never changes (again resulting in an infinite loop). But my best advice is to trace through your code with the debugger; that is what it is there for. Then your mistakes will become obvious.
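To make those three hints concrete, here is a small sketch of a scanning loop that tests the current character directly, checks the end of the string first, and increments on every pass. It only collects a run of digits and is not a solution to the full Integer/Float task. ReadDigits and LeadingDigits are names invented for this illustration.

```pascal
program ScanDigitsDemo;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// Collects consecutive digits starting at Idx and leaves Idx on the
// first character after them (possibly Length(S) + 1).
function ReadDigits(const S: string; var Idx: Integer): string;
begin
  Result := '';
  // The bounds check comes first; short-circuit evaluation (the default)
  // then guarantees S[Idx] is never read past the end of the string.
  while (Idx <= Length(S)) and (S[Idx] >= '0') and (S[Idx] <= '9') do
  begin
    Result := Result + S[Idx];
    Inc(Idx);                 // always advance, so the loop must terminate
  end;
end;

// Convenience wrapper: the digits at the very start of S, if any.
function LeadingDigits(const S: string): string;
var
  idx: Integer;
begin
  idx := 1;                   // strings are 1-based
  Result := ReadDigits(S, idx);
end;

begin
  WriteLn(LeadingDigits('123abc'));
  WriteLn(LeadingDigits('45'));
end.
```

The loop condition reads S[Idx] each time rather than a cached character, which is what keeps it from spinning forever.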
You have got answers and comments that suggest using a state machine, and I support that fully. From the code you show in Edit1, I see that you still did not implement a state machine. From the comments I guess you don't know how to do that, so to push you in that direction here's one approach:
Define the states you need to work with:
type
TReadState = (ReadingIdle, ReadingText, ReadingInt, ReadingFloat);
// ReadingIdle, initial state or if no other state applies
// ReadingText, needed to deal with strings that includes digits (P7..)
// ReadingInt, state that collects the characters that form an integer
// ReadingFloat, state that collects characters that form a float
Then define the skeleton of your state machine. To keep it as easy as possible I chose a straightforward procedural approach, with one main procedure and four subprocedures, one for each state.
procedure ParseString(const s: string; strings: TStrings);
var
ix: integer;
ch: Char;
len: integer;
str, // to collect characters which form a value
res: string; // holds a final value if not empty
State: TReadState;
// subprocedures, one for each state
procedure DoReadingIdle(ch: char; var str, res: string);
procedure DoReadingText(ch: char; var str, res: string);
procedure DoReadingInt(ch: char; var str, res: string);
procedure DoReadingFloat(ch: char; var str, res: string);
begin
State := ReadingIdle;
len := Length(s);
res := '';
str := '';
ix := 1;
repeat
ch := s[ix];
case State of
ReadingIdle: DoReadingIdle(ch, str, res);
ReadingText: DoReadingText(ch, str, res);
ReadingInt: DoReadingInt(ch, str, res);
ReadingFloat: DoReadingFloat(ch, str, res);
end;
if res <> '' then
begin
strings.Add(res);
res := '';
end;
inc(ix);
until ix > len;
// if State is either ReadingInt or ReadingFloat, the input string
// ended with a digit as final character of an integer, resp. float,
// and we have a pending value to add to the list
case State of
ReadingInt: strings.Add(str + ' (integer)');
ReadingFloat: strings.Add(str + ' (float)');
end;
end;
That is the skeleton. The main logic is in the four state procedures.
procedure DoReadingIdle(ch: char; var str, res: string);
begin
case ch of
'0'..'9': begin
str := ch;
State := ReadingInt;
end;
' ','.': begin
str := '';
// no state change
end
else begin
str := ch;
State := ReadingText;
end;
end;
end;
procedure DoReadingText(ch: char; var str, res: string);
begin
case ch of
' ','.': begin // terminates ReadingText state
str := '';
State := ReadingIdle;
end
else begin
str := str + ch;
// no state change
end;
end;
end;
procedure DoReadingInt(ch: char; var str, res: string);
begin
case ch of
'0'..'9': begin
str := str + ch;
end;
'.': begin // ok, seems we are reading a float
str := str + ch;
State := ReadingFloat; // change state
end;
' ',',': begin // end of int reading, set res
res := str + ' (integer)';
str := '';
State := ReadingIdle;
end;
end;
end;
procedure DoReadingFloat(ch: char; var str, res: string);
begin
case ch of
'0'..'9': begin
str := str + ch;
end;
' ','.',',': begin // end of float reading, set res
res := str + ' (float)';
str := '';
State := ReadingIdle;
end;
end;
end;
The state procedures should be self-explanatory, but just ask if something is unclear.
Both your test strings result in the values listed as you specified. One of your rules was a little ambiguous, and my interpretation might be wrong:
numbers cannot be preceded by a letter
The example you provided is "P7", and in your code you only checked the immediately preceding character. But what if it read "P71"? I interpreted it to mean that the "1" should be omitted just like the "7", even though the character before the "1" is "7". This is the main reason for the ReadingText state, which ends only on a space or period.
There are so many basic errors in your code I decided to correct your homework, as it were. This is still not a good way to do it, but at least the basic errors are removed. Take care to read the comments!
procedure TForm1.ParseString(const Str: string; var OutValue,
OutKind: string);
//var
// CH1, CH2: Char; <<<<<<<<<<<<<<<< Don't need these
begin
(*************************************************
* *
* This only corrects the 'silly' errors. It is *
* NOT being passed off as GOOD code! *
* *
*************************************************)
Inc(FIdx);
// CH1 := Str[FIdx]; <<<<<<<<<<<<<<<<<< Not needed but OK to use. I removed them because they seemed to cause confusion...
OutKind := 'None';
OutValue := '';
try
case Str[FIdx] of
'0'..'9': // Found the start of a new number
begin
// CH1 := Str[FIdx]; <<<<<<<<<<<<<<<<<<<< Not needed
// make sure previous character is not a letter
// >>>>>>>>>>> make sure we are not at beginning of file
if FIdx > 1 then
begin
//CH2 := Str[FIdx - 1];
if (Str[FIdx - 1] in ['A'..'Z', 'a'..'z']) then // <<<<< don't forget lower case!
begin
exit; // <<<<<<<<<<<<<<
end;
end;
// else we have a digit that is not preceded by a letter, so it must be at least an integer
OutKind := 'Integer';
// <<<<<<<<<<<<<<<<<<<<< WHAT WE HAVE SO FAR >>>>>>>>>>>>>>
OutValue := Str[FIdx];
// <<<<<<<<<<<<< Carry on...
inc( FIdx );
// Try to determine float...
while (Fidx <= Length( Str )) and (Str[ FIdx ] in ['0'..'9', '.']) do // <<<<< not not CH1!
begin
OutValue := Outvalue + Str[FIdx]; //<<<<<<<<<<<<<<<<<<<<<< Note you were storing just 1 char. EVER!
//>>>>>>>>>>>>>>>>>>>>>>>>> OutKind := 'Float'; ***** NO! *****
case Str[FIdx] of
'.':
begin
OutKind := 'Float';
// now just copy any remaining integers - that is all rules ask for
inc( FIdx );
while (Fidx <= Length( Str )) and (Str[ FIdx ] in ['0'..'9']) do // <<<<< note '.' excluded here!
begin
OutValue := Outvalue + Str[FIdx];
inc( FIdx );
end;
exit;
end;
// >>>>>>>>>>>>>>>>>>> all the rest in unnecessary
//CH2 := Str[FIdx + 1];
// if not (CH2 in ['0'..'9']) then
// begin
// OutKind := 'Float';
// Break;
// end;
// end;
// end;
// Inc(FIdx);
// CH1 := Str[FIdx];
//end;
end;
inc( fIdx );
end;
end;
end;
// OutValue := Str[FIdx]; <<<<<<<<<<<<<<<<<<<<< NO! Only ever gives 1 char!
// FDone := Str[FIdx] = #0; <<<<<<<<<<<<<<<<<<< NO! #0 does NOT terminate Delphi strings
finally // <<<<<<<<<<<<<<< Try.. finally clause added to make sure FDone is always evaluated.
// <<<<<<<<<< Note there are better ways!
if FIdx > Length( Str ) then
begin
FDone := TRUE;
end;
end;
end;
Here's a solution using regex. I implemented it in Delphi (tested in 10.1, but it should also work with XE8). I'm sure you can adapt it for Lazarus; I'm just not sure which regex libraries work there.
The regex pattern uses alternation to match numbers as integers or floats following your rules:
Integer:
(\b\d+(?![.\d]))
started by a word boundary (so no letter, number or underscore before - if underscores are an issue you could use (?<![[:alnum:]]) instead)
then match one or more digits
that are neither followed by digit nor dot
Float:
(\b\d+(?:\.\d+)?)
started by a word boundary (so no letter, number or underscore before - if underscores are an issue you could use (?<![[:alnum:]]) instead)
then match one or more digits
optionally match dot followed by further digits
A simple console application looks like
program Test;
{$APPTYPE CONSOLE}
uses
System.SysUtils, RegularExpressions;
procedure ParseString(const Input: string);
var
Match: TMatch;
begin
WriteLn('---start---');
Match := TRegex.Match(Input, '(\b\d+(?![.\d]))|(\b\d+(?:\.\d+)?)');
while Match.Success do
begin
if Match.Groups[1].Value <> '' then
writeln(Match.Groups[1].Value + '(Integer)')
else
writeln(Match.Groups[2].Value + '(Float)');
Match := Match.NextMatch;
end;
WriteLn('---end---');
end;
begin
ParseString('There are test values: P7 45.826.53.91.7, .5, 66.. 4 and 5.40.3.');
ParseString('Anoth3r Te5.t string .4 abc 8.1Q 123.45.67.8.9');
ReadLn;
end.
procedure ReverseArray(var A : array of string);
var I,J,L : integer;
begin
for I := Low(A) to High(A) do
begin
L := length(A[I]);
for J := L downto 1 do M := M + A[I];
end;
writeln(M);
end;
begin
for I := 1 to 4 do readln(T[I]);
ReverseArray(T);
sleep(40000);
end.
What I'm trying to do here is reverse every string in the array, but I'm unable to do it. What the code above does is repeat each word as many times as it has characters (if I enter 'bob', the procedure gives me 'bob' three times because its length is 3). I'm not sure why it's not working properly or what I'm missing.
Delphi has a ReverseString() function in the StrUtils unit.
uses
StrUtils;
type
TStrArray = array of string;
procedure ReverseArray(var A : TStrArray);
var
I: integer;
begin
for I := Low(A) to High(A) do
A[I] := ReverseString(A[I]);
end;
var
T: TStrArray;
I: Integer;
begin
SetLength(T, 4);
for I := 0 to High(T) do Readln(T[I]); // dynamic arrays are zero-based
ReverseArray(T);
...
end.
A string is an array of char with some extra bells and whistles added.
So an array of string is a lot like an array of array of char.
If you want to reverse the string, you'll have to access every char and reverse it.
procedure ReverseArray(var A : array of string);
var
i,j,Len : integer;
B: string;
begin
for i := Low(A) to High(A) do begin
Len := length(A[i]);
SetLength(B, Len); //Make B the same length as A[i].
//B[Len] = A[i][1]; B[Len-1]:= A[i][2] etc...
for j := Len downto 1 do B[j]:= A[i][(Len-J)+1];
//Store the reversed string back in the array.
A[i]:= B;
//Because A is a var parameter it will be returned.
//Writeln(B); //Write B for debugging purposes.
end;
end;
var
i: integer;
Strings: array [0..3] of string;
begin
for i := 0 to 3 do readln(Strings[i]);
ReverseArray(Strings);
for i := 0 to 3 do writeln(Strings[i]);
WriteLn('Done, press a key...');
ReadLn;
end.
Some tips:
Do not use global variables like M but declare a local variable instead.
Don't do AStr:= AStr + AChar in a loop if you can avoid it. If you know how long the result is going to be, use the SetLength trick shown in the code. It generates much faster code.
Instead of a Sleep you can use a ReadLn to halt a console app. It will continue as soon as you press a key.
Don't put the writeln in your working routine.
Note the first element in a string is 1, but the first element in an array is 0 (unless otherwise defined); dynamic arrays always start counting from zero.
Note that array of string in a parameter definition is an open array; a different thing from a dynamic array.
Single uppercase identifiers like T, K, etc are usually used for generic types, you shouldn't use them for normal variables; Use a descriptive name instead.
Come on! 'bob' is a palindrome, so it is one of those words you shouldn't use to test a reverse routine. But the problem goes beyond that.
Your problem is in here
for J := L downto 1 do
M := M + A[I];
You are trying to add the whole string to the M variable instead of the character you are trying to access. So, it should be
for J := L downto 1 do
M := M + A[I][J];
Also, you need to set M := '' inside the first loop, so that it is empty when you start accumulating characters into it.
Third, move the output, WriteLn(M), inside the first loop so that each reversed string is printed on its own line.
Putting it all together, it is going to be:
for I := Low(A) to High(A) do
begin
L := length(A[I]);
M := '';
for J := L downto 1 do
M := M + A[I][J];
writeln(M);
end;
My preferred solution for this is
type
TStringModifier = function(const s: string): string;
procedure ModifyEachOf( var aValues: array of string; aModifier: TStringModifier );
var
lIdx: Integer;
begin
for lIdx := Low(aValues) to High(aValues) do
aValues[lIdx] := aModifier( aValues[lIdx] );
end;
and it ends up with
var
MyStrings: array[1..3] of string;
begin
MyStrings[1] := '123';
MyStrings[2] := '456';
MyStrings[3] := '789';
ModifyEachOf( MyStrings, StrUtils.ReverseString ); // ReverseString lives in StrUtils, not SysUtils
end;
uses
System.SysUtils, System.StrUtils;
var
Forwards, backwards : string;
begin
forwards:= 'abcd';
backwards:= ReverseString(forwards);
Writeln(backwards);
Readln;
end;
// dcba
I have an app that needs to do heavy text manipulation in a TStringList. Basically I need to split text by a delimiter; for instance, if I have a single line with 1000 chars and this delimiter occurs 3 times in this line, then I need to split it in 3 lines. The delimiter can contain more than one char; it can be a tag like '[test]', for example.
I've written two functions to do this task with 2 different approaches, but both are slow with large amounts of text (usually more than 2 MB).
How can I achieve this goal in a faster way?
Here are both functions; both receive 2 parameters: 'lines', which is the original TStringList, and 'q', which is the delimiter.
function splitlines(lines : tstringlist; q: string) : integer;
var
s, aux, ant : string;
i,j : integer;
flag : boolean;
m2 : tstringlist;
begin
try
m2 := tstringlist.create;
m2.BeginUpdate;
result := 0;
for i := 0 to lines.count-1 do
begin
s := lines[i];
for j := 1 to length(s) do
begin
flag := lowercase(copy(s,j,length(q))) = lowercase(q);
if flag then
begin
inc(result);
m2.add(aux);
aux := s[j];
end
else
aux := aux + s[j];
end;
m2.add(aux);
aux := '';
end;
m2.EndUpdate;
lines.text := m2.text;
finally
m2.free;
end;
end;
function splitLines2(lines : tstringlist; q: string) : integer;
var
aux, p : string;
i : integer;
flag : boolean;
begin
//maux1 and maux2 are already instanced in the parent class
try
maux2.text := lines.text;
p := '';
i := 0;
flag := false;
maux1.BeginUpdate;
maux2.BeginUpdate;
while (pos(lowercase(q),lowercase(maux2.text)) > 0) and (i < 5000) do
begin
flag := true;
aux := p+copy(maux2.text,1,pos(lowercase(q),lowercase(maux2.text))-1);
maux1.add(aux);
maux2.text := copy(maux2.text,pos(lowercase(q),lowercase(maux2.text)),length(maux2.text));
p := copy(maux2.text,1,1);
maux2.text := copy(maux2.text,2,length(maux2.text));
inc(i);
end;
finally
result := i;
maux1.EndUpdate;
maux2.EndUpdate;
if flag then
begin
maux1.add(p+maux2.text);
lines.text := maux1.text;
end;
end;
end;
I've not tested the speed, but for academic purposes, here's an easy way to split the strings:
myStringList.Text :=
StringReplace(myStringList.Text, myDelimiter, #13#10, [rfReplaceAll]);
// Use [rfReplaceAll, rfIgnoreCase] if you want to ignore case
When you set the Text property of TStringList, it parses on new lines and splits there, so converting to a string, replacing the delimiter with new lines, then assigning it back to the Text property works.
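For anyone who wants to try this quickly, here is a small console sketch of the same idea. ReplaceDelimiter and the sample text are invented for the example. Note that it removes the delimiter entirely, which may differ from what the functions in the question do with it.

```pascal
program SplitByReplaceDemo;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes, SysUtils;

// Swaps every occurrence of Delim (ignoring case) for a line break.
function ReplaceDelimiter(const Source, Delim: string): string;
begin
  Result := StringReplace(Source, Delim, sLineBreak,
    [rfReplaceAll, rfIgnoreCase]);
end;

var
  list: TStringList;
  i: Integer;
begin
  list := TStringList.Create;
  try
    // Assigning Text re-parses the string and splits it at line breaks.
    list.Text := ReplaceDelimiter('one[test]two[TEST]three', '[test]');
    for i := 0 to list.Count - 1 do
      WriteLn(list[i]);
  finally
    list.Free;
  end;
end.
```

Whether this is fast enough for multi-megabyte input is untested here, as the answer itself says.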
The problems with your code (at least second approach) are
You are constantly calling lowercase, which is slow when called so many times.
If I read it correctly, you are copying the whole remaining text back to the original source. This is sure to be extra slow for large strings (e.g. files).
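One way to avoid both problems, sketched here under some assumptions rather than measured: lowercase the text and the delimiter once, then walk forward with PosEx from StrUtils so nothing is re-copied. SplitCaseInsensitive is a made-up name. The sketch assumes the source is a single line and drops the delimiter from the output. LowerCase only folds ASCII letters, so the lowercased copy keeps the same character positions as the original.

```pascal
program FastSplitDemo;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes, SysUtils, StrUtils;

// Case-insensitive split of Source on Delim into Dest.
// Returns the number of delimiters found.
function SplitCaseInsensitive(const Source, Delim: string;
  Dest: TStrings): Integer;
var
  lowSource, lowDelim: string;
  startPos, hit: Integer;
begin
  Result := 0;
  Dest.Clear;
  if Delim = '' then
  begin
    Dest.Add(Source);               // nothing to split on
    Exit;
  end;
  lowSource := LowerCase(Source);   // lowercased once, not once per character
  lowDelim := LowerCase(Delim);
  startPos := 1;
  hit := PosEx(lowDelim, lowSource, startPos);
  while hit > 0 do
  begin
    // Copy from the original text so the pieces keep their case.
    Dest.Add(Copy(Source, startPos, hit - startPos));
    Inc(Result);
    startPos := hit + Length(Delim);
    hit := PosEx(lowDelim, lowSource, startPos);
  end;
  Dest.Add(Copy(Source, startPos, MaxInt));   // remainder after the last hit
end;

var
  pieces: TStringList;
  i: Integer;
begin
  pieces := TStringList.Create;
  try
    SplitCaseInsensitive('one[test]two[TEST]three', '[test]', pieces);
    for i := 0 to pieces.Count - 1 do
      WriteLn(pieces[i]);
  finally
    pieces.Free;
  end;
end.
```

Each character of the source is searched and copied only once, so the work grows roughly linearly with the text size instead of with the text size times the number of delimiters.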
I have a tokenizer in my library. It's not the fastest or best, but it should do (you can get it from Cromis Library, just use the units Cromis.StringUtils and Cromis.Unicode):
type
TTokens = array of ustring;
TTextTokenizer = class
private
FTokens: TTokens;
FDelimiters: array of ustring;
public
constructor Create;
procedure Tokenize(const Text: ustring);
procedure AddDelimiters(const Delimiters: array of ustring);
property Tokens: TTokens read FTokens;
end;
{ TTextTokenizer }
procedure TTextTokenizer.AddDelimiters(const Delimiters: array of ustring);
var
I: Integer;
begin
if Length(Delimiters) > 0 then
begin
SetLength(FDelimiters, Length(Delimiters));
for I := 0 to Length(Delimiters) - 1 do
FDelimiters[I] := Delimiters[I];
end;
end;
constructor TTextTokenizer.Create;
begin
SetLength(FTokens, 0);
SetLength(FDelimiters, 0);
end;
procedure TTextTokenizer.Tokenize(const Text: ustring);
var
I, K: Integer;
Counter: Integer;
NewToken: ustring;
Position: Integer;
CurrToken: ustring;
begin
SetLength(FTokens, 100);
CurrToken := '';
Counter := 0;
for I := 1 to Length(Text) do
begin
CurrToken := CurrToken + Text[I];
for K := 0 to Length(FDelimiters) - 1 do
begin
Position := Pos(FDelimiters[K], CurrToken);
if Position > 0 then
begin
NewToken := Copy(CurrToken, 1, Position - 1);
if NewToken <> '' then
begin
if Counter >= Length(FTokens) then // grow before writing past the last index
SetLength(FTokens, Length(FTokens) * 2);
FTokens[Counter] := Trim(NewToken);
Inc(Counter)
end;
CurrToken := '';
end;
end;
end;
if CurrToken <> '' then
begin
if Counter >= Length(FTokens) then // grow before writing past the last index
SetLength(FTokens, Length(FTokens) * 2);
FTokens[Counter] := Trim(CurrToken);
Inc(Counter)
end;
SetLength(FTokens, Counter);
end;
How about just using StrTokens from the JCL library?
procedure StrTokens(const S: string; const List: TStrings);
It's open source
http://sourceforge.net/projects/jcl/
As an additional option, you can use regular expressions. Recent versions of Delphi (XE4 and XE5) come with built in regular expression support; older versions can find a free regex library download (zip file) at Regular-Expressions.info.
For the built-in regex support (uses the generic TArray<string>):
var
RegexObj: TRegEx;
SplitArray: TArray<string>;
begin
SplitArray := nil;
try
RegexObj := TRegEx.Create('\[test\]'); // Your sample expression. Replace with q
SplitArray := RegexObj.Split(Lines.Text, 0); // Split expects a string, so pass the list's Text
except
on E: ERegularExpressionError do begin
// Syntax error in the regular expression
end;
end;
// Use SplitArray
end;
For using TPerlRegEx in earlier Delphi versions:
var
Regex: TPerlRegEx;
m2: TStringList;
begin
m2 := TStringList.Create;
try
Regex := TPerlRegEx.Create;
try
Regex.RegEx := '\[test\]'; // Using your sample expression - replace with q
Regex.Options := [];
Regex.State := [preNotEmpty];
Regex.Subject := Lines.Text;
Regex.SplitCapture(m2, 0);
finally
Regex.Free;
end;
// Work with m2
finally
m2.Free;
end;
end;
(For those unaware, the \ in the sample expression used are because the [] characters are meaningful in regular expressions and need to be escaped to be used in the regular expression text. Typically, they're not required in the text.)
I have written a Delphi function that loads data from a .dat file into a string list. It then decodes the string list and assigns to a string variable. The contents of the string use the '#' symbol as a separator.
How can I then take the contents of this string and then assign its contents to local variables?
// Function loads data from a dat file and assigns to a String List.
function TfrmMain.LoadFromFile;
var
index, Count : integer;
profileFile, DecodedString : string;
begin
// Open a file and assign to a local variable.
OpenDialog1.Execute;
profileFile := OpenDialog1.FileName;
if profileFile = '' then
exit;
profileList := TStringList.Create;
profileList.LoadFromFile(profileFile);
for index := 0 to profileList.Count - 1 do
begin
Line := '';
Line := profileList[Index];
end;
end;
After it's been decoded, the var "Line" contains something that looks like this:
example:
Line '23#80#10#2#1#...255#'.
Not all of the values between the separators are the same length and the value of "Line" will vary each time the function LoadFromFile is called (e.g. sometimes a value may have only one number the next two or three etc so I cannot rely on the Copy function for strings or arrays).
I'm trying to figure out a way of looping through the contents of "Line", assigning it to a local variable called "buffer" and then if it encounters a '#' it then assigns the value of buffer to a local variable, re-initialises buffer to ''; and then moves onto the next value in "Line" repeating the process for the next parameter ignoring the '#' each time.
I think I have been scratching around with this problem for too long now and I cannot seem to make any progress and need a break from it. If anyone would care to have a look, I would welcome any suggestions on how this might be achieved.
Many Thanks
KD
You need a second TStringList:
lineLst := TStringList.Create;
try
lineLst.Delimiter := '#';
lineLst.DelimitedText := Line;
...
finally
lineLst.Free;
end;
Depending on your Delphi version you can set lineLst.StrictDelimiter := true in case the line contains spaces.
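A self-contained sketch of that approach follows, in case it helps to see it run. ValueAt is a hypothetical helper invented for this example. The sample line is a shortened version of the one in the question, without the trailing '#', because how a trailing delimiter is treated was not checked here.

```pascal
program DelimitedTextDemo;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes;

// Returns the Index-th (0-based) value of a '#'-separated line,
// or '' when Index is out of range.
function ValueAt(const Line: string; Index: Integer): string;
var
  parts: TStringList;
begin
  Result := '';
  parts := TStringList.Create;
  try
    parts.Delimiter := '#';
    parts.StrictDelimiter := True;   // split on '#' only, not on spaces
    parts.DelimitedText := Line;     // set after Delimiter and StrictDelimiter
    if (Index >= 0) and (Index < parts.Count) then
      Result := parts[Index];
  finally
    parts.Free;
  end;
end;

begin
  WriteLn(ValueAt('23#80#10#2#1', 1));   // second value
  WriteLn(ValueAt('23#80#10#2#1', 4));   // fifth value
end.
```

In real code you would probably split once and read all values from the list, rather than re-splitting for every index as this helper does.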
You can do something like this:
program Project1;
{$APPTYPE CONSOLE}
{$R *.res}
uses
System.SysUtils, StrUtils;
var
S : string;
D : string;
begin
S := '23#80#10#2#1#...255#';
for D in SplitString(S,'#') do //SplitString is in the StrUtils unit
writeln(D);
readln;
end.
You did not tag your Delphi version, so I don't know if it applies or not.
That IS version-specific. Please do!
In order of my personal preference:
1: Download Jedi CodeLib - http://jcl.sf.net. Then use TJclStringList. It has a very nice Split method. After that you would only have to iterate through the result.
function Split(const AText, ASeparator: string; AClearBeforeAdd: Boolean = True): IJclStringList;
uses JclStringLists;
...
var s: string; js: IJclStringList;
begin
...
js := TJclStringList.Create().Split(input, '#', True);
for s in js do begin
.....
end;
...
end;
2: Delphi now has a somewhat less featured SplitString routine. http://docwiki.embarcadero.com/Libraries/en/System.StrUtils.SplitString
It has a misfeature that array of string type may be not assignment-compatible to itself. Hello, 1949 Pascal rules...
uses StrUtils;
...
var s: string;
a_s: TStringDynArray;
(* aka array-of-string aka TArray<string>. But you have to remember this term exactly*)
begin
...
a_s := SplitString(input, '#');
for s in a_s do begin
.....
end;
...
end;
3: Use TStringList. The main problem with it is that it was designed so that spaces and new lines are built-in separators. In newer Delphi versions that can be suppressed. Overall the code should be tailored to your exact Delphi version. You can easily Google for something like "Using TStringlist for splitting string" and get a load of examples (like @Uwe's one).
But you may forget to suppress it here or there. And you may be on an old Delphi, where that cannot be done. And you may misapply an example written for a different Delphi version. And... it is just boring :-) Though you can make your own function to generate such pre-tuned stringlists for you and carefully check the Delphi version in it :-) But then you would have to carefully free that object after use.
I use a function I've written called Fetch. I think I stole the idea from the Indy library some time ago:
function Fetch(var VString: string; ASeperator: string = ','): string;
var LPos: integer;
begin
LPos := AnsiPos(ASeperator, VString);
if LPos > 0 then
begin
result := Trim(Copy(VString, 1, LPos - 1));
VString := Copy(VString, LPos + Length(ASeperator), MAXINT); // skip the whole separator, even if it is longer than one char
end
else
begin
result := VString;
VString := '';
end;
end;
Then I'd call it like this:
var
value: string;
line: string;
profileFile: string;
profileList: TStringList;
index: integer;
begin
if OpenDialog1.Execute then
begin
profileFile := OpenDialog1.FileName;
if (profileFile = '') or not FileExists(profileFile) then
exit;
profileList := TStringList.Create;
try
profileList.LoadFromFile(profileFile);
for index := 0 to profileList.Count - 1 do
begin
line := profileList[index];
Fetch(line, ''''); //discard "Line '"
value := Fetch(line, '#');
while (value <> '') and (value[1] <> '''') do //bail when we get to the quote at the end
begin
ProcessTheNumber(value); //do whatever you need to do with the number
value := Fetch(line, '#');
end;
end;
finally
profileList.Free;
end;
end;
end;
Note: this was typed into the browser, so I haven't checked it works.
I have an attribute called HistoryText in an object that is stored as a string.
I want to show all rows in a grid. I should be able to delete and edit rows in the grid.
The format is:
16.5.2003-$-12:09-$-anna-$-Organization created
2.6.2005-$-13:03-$-jimmy-$-Organization edited
19.12.2005-$-13:33-$-madeleine-$-Organization edited
So each row has 4 fields (date, time, user, and message), separated by the delimiter string '-$-'.
As the delimiter is a string and not a char, it cannot be assigned to the stringlist's Delimiter property.
I have a routine to extract the string into a StringList:
procedure ParseDelimited(const aStringList: TStringList; const aOrgList, aDelimiter: string);
var
vDelimiterPos : integer;
vPartialStr : string;
vRemaingTxt : string;
vDelimiterLength : integer;
begin
vDelimiterLength := Length(aDelimiter);
if (AnsiRightStr(aOrgList, Length(aDelimiter)) = aDelimiter) then
vRemaingTxt := aOrgList
else
vRemaingTxt := aOrgList + aDelimiter;
aStringList.BeginUpdate;
aStringList.Clear;
try
while Length(vRemaingTxt) > 0 do
begin
vDelimiterPos := Pos(aDelimiter, vRemaingTxt);
vPartialStr := Copy(vRemaingTxt,1,vDelimiterPos-1); // Delphi strings are 1-based
aStringList.Add(vPartialStr);
vRemaingTxt := Copy(vRemaingTxt,vDelimiterPos+vDelimiterLength,MaxInt);
end;
finally
aStringList.EndUpdate;
end;
end;
and it seems to work fine. My problem is syncing the changes in the StringList back to the original string property. There is so much historical data with this delimiter that I don't think changing it to a single char is a realistic option.
Update:
A clarification. I think I can manage to convert the string to a StringList with the method above. Displaying it in the grid should then not be so hard. The problem comes when I want to convert the TStringList back to the original string property with '-$-' as delimiter. I cannot do HistoryText := myStringList.DelimitedText, for example.
Second update:
I have solved it. You all got a +1 for fast answers and really trying to help. In summary, here is how I did it.
Read from Historytext:
MyStringList.Text := Historytext;
Now each row has 3 delimiters of '-$-' and each line is separated by a line break as usual.
In a loop, parse the StringList and show it in the grid. I don't bother with MyStringList anymore.
Let the user delete and edit rows in the grid.
When finished, loop over the rows and columns in the grid and build a new string in the same format as the original.
Assign that string to HistoryText.
So shifting the focus from the StringList to the grid made it easier :)
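The split-and-rebuild steps described above could be sketched roughly like this. It is only an illustration under assumptions: SplitFields and JoinFields are hypothetical helper names, the grid is left out, and two sample lines stand in for HistoryText.

```pascal
program HistoryRoundTrip;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  Classes;

const
  FieldDelimiter = '-$-';

// Hypothetical helper: splits ALine on the string ADelim into AFields.
procedure SplitFields(const ALine, ADelim: string; AFields: TStrings);
var
  rest: string;
  p: Integer;
begin
  AFields.Clear;
  rest := ALine;
  p := Pos(ADelim, rest);
  while p > 0 do
  begin
    AFields.Add(Copy(rest, 1, p - 1));
    // Remove the field and the whole delimiter that follows it.
    Delete(rest, 1, p + Length(ADelim) - 1);
    p := Pos(ADelim, rest);
  end;
  AFields.Add(rest); // the last field has no trailing delimiter
end;

// Hypothetical helper: joins AFields back into one line,
// putting ADelim between the fields but not after the last one.
function JoinFields(AFields: TStrings; const ADelim: string): string;
var
  i: Integer;
begin
  Result := '';
  for i := 0 to AFields.Count - 1 do
  begin
    if i > 0 then
      Result := Result + ADelim;
    Result := Result + AFields[i];
  end;
end;

var
  rows, fields: TStringList;
  i: Integer;
  rebuilt: string;
begin
  rows := nil;
  fields := nil;
  try
    rows := TStringList.Create;
    fields := TStringList.Create;
    // Sample data standing in for HistoryText.
    rows.Add('16.5.2003-$-12:09-$-anna-$-Organization created');
    rows.Add('2.6.2005-$-13:03-$-jimmy-$-Organization edited');
    for i := 0 to rows.Count - 1 do
    begin
      SplitFields(rows[i], FieldDelimiter, fields);
      // Here the fields would be shown in (and later read back from) the grid.
      rebuilt := JoinFields(fields, FieldDelimiter);
      if rebuilt = rows[i] then
        WriteLn('round trip OK: ', rebuilt)
      else
        WriteLn('mismatch: ', rebuilt);
    end;
  finally
    fields.Free; // Free is safe on nil
    rows.Free;
  end;
end.
```

Joining by hand keeps the rebuilt line free of a trailing delimiter, so an unedited row comes back identical to the stored one.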
Instead of Delimiter (a char) and DelimitedText, you can also use LineBreak (a string) and Text:
lst := TStringList.Create;
try
lst.LineBreak := '-$-';
lst.Text := '16.5.2003-$-12:09-$-anna-$-Organization created';
Memo1.Lines := lst; // or whatever you do with it
finally
lst.Free;
end;
And it even works the other way round.
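A small sketch of the reverse direction, assuming a Delphi (or Free Pascal) version whose TStrings has the LineBreak property: add the fields one by one and read Text to get the joined string. As far as I know, Text also appends the LineBreak after the last item, so the result ends with a trailing '-$-' that you may want to strip.

```pascal
program JoinWithLineBreak;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
uses
  Classes;

var
  lst: TStringList;
begin
  lst := TStringList.Create;
  try
    lst.LineBreak := '-$-';
    lst.Add('16.5.2003');
    lst.Add('12:09');
    lst.Add('anna');
    lst.Add('Organization created');
    // Text joins the items using LineBreak as the separator.
    // Assumption: a LineBreak is also written after the last item.
    WriteLn(lst.Text);
  finally
    lst.Free;
  end;
end.
```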
Wild stab in the dark (it isn't very clear what you are asking):
Work through the grid rows:
For each row:
assign an empty string to a temporary string var
For each column add row/column value plus your delimiter to temporary string var
remove last delimiter from temporary string var (if it is non-empty)
add temporary string var to stringlist
Write stringlist's text property back to your HistoryText
const
Delimiter = '-$-';
var
row: Integer;
col: Integer;
SL: TStringList;
rowString: string;
begin
SL := TStringList.Create;
try
for row := 0 to StringGrid1.RowCount - 1 do begin
rowString := '';
for col := 0 to StringGrid1.ColCount - 1 do begin
rowString := rowString + StringGrid1.Cells[col, row] + Delimiter; // append, do not overwrite
end;
if rowString <> '' then begin
rowString := Copy(rowString, 1, Length(rowString) - Length(Delimiter));
end;
SL.Add(rowString);
end;
HistoryText := SL.Text;
finally
SL.Free;
end;
end;
Using Uwe's solution of TStrings' LineBreak property:
var
row: Integer;
col: Integer;
SLRows: TStringList;
SLCols: TStringlist;
begin
SLRows := TStringList.Create;
try
SLCols := TStringList.Create;
try
SLCols.LineBreak := '-$-';
for row := 0 to StringGrid1.RowCount - 1 do begin
SLCols.Clear;
for col := 0 to StringGrid1.ColCount - 1 do begin
SLCols.Add(StringGrid1.Cells[col, row]);
end;
// Note: Text appends LineBreak after every item, so each row ends with a trailing '-$-'.
SLRows.Add(SLCols.Text);
end;
HistoryText := SLRows.Text;
finally
SLCols.Free;
end;
finally
SLRows.Free;
end;
end;