how do i extract first string from specific text

how do i extract first string from specific text - delphi

i need to extract string from a text as following example
Hi i have no name <z>empty</z>
i wanted to extract only text before <z> into array or string which is hi i have no name
i tried this function
procedure Split (const Delimiter: Char; Input: string; const Strings: TStrings);
begin
Assert(Assigned(Strings)) ;
Strings.Clear;
Strings.StrictDelimiter := true;
Strings.Delimiter := Delimiter;
Strings.DelimitedText := Input;
end;
but its only can split chars like ;,: etc.. i wanted to start split with this specific string <z>

As I read what you have written, you have a string and you want to ignore all text after the first occurrence of <z>. Use Pos and Copy for instance:
P := Pos('<z>', input);
if P = 0 then
output := input
else
output := Copy(input, 1, P-1);
Although something tells me that you really want an XML parser.

Just an example using the string helper routines:
function CutString(const input,pattern: String): String;
var
p: Integer;
begin
p := input.IndexOf(pattern);
if (p >= 0) then
Result := input.Substring(0,p)
else
Result := input;
end;
var
s: string;
begin
s := 'Hi i have no name<z>empty</z>';
s := CutString(s,'<z>');
WriteLn(s); // Outputs: 'Hi i have no name'
end.

Here is a solution for the one-liner-fan, the crack-nuts-with-sledgehammer-lover, the regex-enthusiast, the waste-performance-ignorant and the xml-parser-grump:
TRegEx.Replace('Hi i have no name <z>empty</z>', '((^.*)(?=<z>)|(^.*)(?!<z>)).*', '$1');

If the delimiter will be there for sure, you can do:
Result:= Copy(S, 1, Pos(Delimiter, S) - 1);
Otherwise user Davids answer or this (shorter but bad performance):
Result:= Copy(S, 1, Pos(Delimiter, S + Delimiter) - 1);

Related

Function to increment filename

I'm trying to make a function that can increment a filename.
If last char of the string is a number then increment it.
If last char is a letter then add _1 or _2 or _3(increment this also).
I have to be sure the filename is unique but i cannot use datetime inside filename because all the filenames must be <32 chars without extension.
EX: Apple_99.txt =>Apple_100
Ex: Apple_173 => Apple_174
EX: This_is_my_first_text.txt => This_is_my_first_text_1.txt
Ex: This_is_my_first_text_9.txt => This_is_my_first_text_10.txt
I need to use this in order to rename a file an then upload it to a ftp server.
I've found a function that can do something like this but it only works if the filename contains only uppercase.How can I modify this function in order to access lowercase an uppercase string?
Here is the function:
function IncStr(Str: String; Amount: Integer; Index: Integer = -1): String;
const
MIN_VAL = 65; // 'A'
MAX_VAL = 90; // 'Z'
var
Digit, ToAdd, ToCarry: Integer;
begin
if (Index = 0) and (Amount > 0) then
begin
Result := Char(MIN_VAL + Amount - 1) + Str;
Exit;
end;
if Index = -1 then Index := Length(Str);
ToCarry := 0;
Digit := Ord(Str[Index]);
while not (Digit in [MIN_VAL..MAX_VAL]) do
begin
Dec(Index);
Digit := Ord(Str[Index]);
end;
ToAdd := Digit + Amount;
while (ToAdd > MAX_VAL) do
begin
Dec(ToAdd, 26);
Inc(ToCarry);
end;
Result := Str;
Result[Index] := Char(ToAdd);
if (ToCarry > 0) then
Result := IncStr(Result, ToCarry, Index - 1);
end;
procedure TForm1.Button1Click(Sender: TObject);
var
S: String; // holds string to increment
C: Integer; // amount to increment by
begin
// make sure that Edit1 starts with a valid character
// i.e. 'A' to 'Z'
S := Edit1.Text;
C := StrtoIntDef(Edit2.Text, 0);
// test it, place result in Edit3
Edit3.Text := IncStr(S, C);
{
Example data:
Edit1 := AAZ
Edit2 := 2
= Edit3 := ABB
Edit1 := BZY
Edit2 := 3
= Edit3 := CAB
Edit1 := ZZZ
Edit2 := 1
= Edit3 := AAAA
Edit1 := AA-AC
Edit2 := 3
= Edit3 := AA-AF
Edit1 := AA/Z
Edit2 := 5
= Edit3 := AB/E
... etc
Here's one to try too :-)
Edit1 := ZZZ
Edit2 := 264172
}
end;
Thank you!

Like so many programming problems, the key is to break the problem down into small pieces. First of all, let's write a function to decode the original file name into its constituent parts:
procedure DecodeFileName(const Input: string; out Stem, Ext: string; out Number: Integer);
var
P: Integer;
begin
Ext := TPath.GetExtension(Input);
Stem := TPath.GetFileNameWithoutExtension(Input);
Number := 0;
P := Stem.LastIndexOf('_');
if P = -1 then begin
exit;
end;
if TryStrToInt(Stem.Substring(P+1), Number) then begin
Stem := Stem.Substring(0, P);
end;
end;
The following demonstrates how this works:
DecodeFileName('test.txt', Stem, Ext, Number);
Writeln(Stem, ', ', Number, ', ', Ext);
DecodeFileName('test_dd.txt', Stem, Ext, Number);
Writeln(Stem, ', ', Number, ', ', Ext);
DecodeFileName('test_23.txt', Stem, Ext, Number);
Writeln(Stem, ', ', Number, ', ', Ext);
The output is:
test, 0, .txt
test_dd, 0, .txt
test, 23, .txt
So now you can make a new filename like this:
function IncrementedFileName(const FileName: string): string;
var
Stem, Ext: string;
Number: Integer;
begin
DecodeFileName(FileName, Stem, Ext, Number);
Result := Format('%s_%d%s', [Stem, Number+1, Ext]);
end;
And then we can see how that performs:
Writeln(IncrementedFileName('test.txt'));
Writeln(IncrementedFileName('test_dd.txt'));
Writeln(IncrementedFileName('test_23.txt'));
Writeln(IncrementedFileName('test_28'));
The output is:
test_1.txt
test_dd_1.txt
test_24.txt
test_29
If you don't have access to the string helper methods then you can code it like this:
procedure DecodeFileName(const Input: string; out Stem, Ext: string; out Number: Integer);
var
P: Integer;
begin
Ext := TPath.GetExtension(Input);
Stem := TPath.GetFileNameWithoutExtension(Input);
Number := 0;
P := LastDelimiter('_', Stem);
if P = 0 then begin
exit;
end;
if TryStrToInt(Copy(Stem, P+1, MaxInt), Number) then begin
Stem := Copy(Stem, 1, P-1);
end;
end;
I have not executed this final function, so do not be surprised if it has errors.

Faster way to split text in Delphi TStringList

I have an app that needs to do heavy text manipulation in a TStringList. Basically i need to split text by a delimiter ; for instance, if i have a singe line with 1000 chars and this delimiter occurs 3 times in this line, then i need to split it in 3 lines. The delimiter can contain more than one char, it can be a tag like '[test]' for example.
I've wrote two functions to do this task with 2 different approaches, but both are slow in big amounts of text (more then 2mbytes usually).
How can i achieve this goal in a faster way ?
Here are both functions, both receive 2 paramaters : 'lines' which is the original tstringlist and 'q' which is the delimiter.
function splitlines(lines : tstringlist; q: string) : integer;
var
s, aux, ant : string;
i,j : integer;
flag : boolean;
m2 : tstringlist;
begin
try
m2 := tstringlist.create;
m2.BeginUpdate;
result := 0;
for i := 0 to lines.count-1 do
begin
s := lines[i];
for j := 1 to length(s) do
begin
flag := lowercase(copy(s,j,length(q))) = lowercase(q);
if flag then
begin
inc(result);
m2.add(aux);
aux := s[j];
end
else
aux := aux + s[j];
end;
m2.add(aux);
aux := '';
end;
m2.EndUpdate;
lines.text := m2.text;
finally
m2.free;
end;
end;
function splitLines2(lines : tstringlist; q: string) : integer;
var
aux, p : string;
i : integer;
flag : boolean;
begin
//maux1 and maux2 are already instanced in the parent class
try
maux2.text := lines.text;
p := '';
i := 0;
flag := false;
maux1.BeginUpdate;
maux2.BeginUpdate;
while (pos(lowercase(q),lowercase(maux2.text)) > 0) and (i < 5000) do
begin
flag := true;
aux := p+copy(maux2.text,1,pos(lowercase(q),lowercase(maux2.text))-1);
maux1.add(aux);
maux2.text := copy(maux2.text,pos(lowercase(q),lowercase(maux2.text)),length(maux2.text));
p := copy(maux2.text,1,1);
maux2.text := copy(maux2.text,2,length(maux2.text));
inc(i);
end;
finally
result := i;
maux1.EndUpdate;
maux2.EndUpdate;
if flag then
begin
maux1.add(p+maux2.text);
lines.text := maux1.text;
end;
end;
end;

I've not tested the speed, but for academic purposes, here's an easy way to split the strings:
myStringList.Text :=
StringReplace(myStringList.Text, myDelimiter, #13#10, [rfReplaceAll]);
// Use [rfReplaceAll, rfIgnoreCase] if you want to ignore case
When you set the Text property of TStringList, it parses on new lines and splits there, so converting to a string, replacing the delimiter with new lines, then assigning it back to the Text property works.

The problems with your code (at least second approach) are
You are constantly using lowecase which is slow if called so many times
If I saw correctly you are copying the whole remaining text back to the original source. This is sure to be extra slow for large strings (eg files)
I have a tokenizer in my library. Its not the fastest or best but it should do (you can get it from Cromis Library, just use the units Cromis.StringUtils and Cromis.Unicode):
type
TTokens = array of ustring;
TTextTokenizer = class
private
FTokens: TTokens;
FDelimiters: array of ustring;
public
constructor Create;
procedure Tokenize(const Text: ustring);
procedure AddDelimiters(const Delimiters: array of ustring);
property Tokens: TTokens read FTokens;
end;
{ TTextTokenizer }
procedure TTextTokenizer.AddDelimiters(const Delimiters: array of ustring);
var
I: Integer;
begin
if Length(Delimiters) > 0 then
begin
SetLength(FDelimiters, Length(Delimiters));
for I := 0 to Length(Delimiters) - 1 do
FDelimiters[I] := Delimiters[I];
end;
end;
constructor TTextTokenizer.Create;
begin
SetLength(FTokens, 0);
SetLength(FDelimiters, 0);
end;
procedure TTextTokenizer.Tokenize(const Text: ustring);
var
I, K: Integer;
Counter: Integer;
NewToken: ustring;
Position: Integer;
CurrToken: ustring;
begin
SetLength(FTokens, 100);
CurrToken := '';
Counter := 0;
for I := 1 to Length(Text) do
begin
CurrToken := CurrToken + Text[I];
for K := 0 to Length(FDelimiters) - 1 do
begin
Position := Pos(FDelimiters[K], CurrToken);
if Position > 0 then
begin
NewToken := Copy(CurrToken, 1, Position - 1);
if NewToken <> '' then
begin
if Counter > Length(FTokens) then
SetLength(FTokens, Length(FTokens) * 2);
FTokens[Counter] := Trim(NewToken);
Inc(Counter)
end;
CurrToken := '';
end;
end;
end;
if CurrToken <> '' then
begin
if Counter > Length(FTokens) then
SetLength(FTokens, Length(FTokens) * 2);
FTokens[Counter] := Trim(CurrToken);
Inc(Counter)
end;
SetLength(FTokens, Counter);
end;

How about just using StrTokens from the JCL library
procedure StrTokens(const S: string; const List: TStrings);
It's open source
http://sourceforge.net/projects/jcl/

As an additional option, you can use regular expressions. Recent versions of Delphi (XE4 and XE5) come with built in regular expression support; older versions can find a free regex library download (zip file) at Regular-Expressions.info.
For the built-in regex support (uses the generic TArray<string>):
var
RegexObj: TRegEx;
SplitArray: TArray<string>;
begin
SplitArray := nil;
try
RegexObj := TRegEx.Create('\[test\]'); // Your sample expression. Replace with q
SplitArray := RegexObj.Split(Lines, 0);
except
on E: ERegularExpressionError do begin
// Syntax error in the regular expression
end;
end;
// Use SplitArray
end;
For using TPerlRegEx in earlier Delphi versions:
var
Regex: TPerlRegEx;
m2: TStringList;
begin
m2 := TStringList.Create;
try
Regex := TPerlRegEx.Create;
try
Regex.RegEx := '\[test\]'; // Using your sample expression - replace with q
Regex.Options := [];
Regex.State := [preNotEmpty];
Regex.Subject := Lines.Text;
Regex.SplitCapture(m2, 0);
finally
Regex.Free;
end;
// Work with m2
finally
m2.Free;
end;
end;
(For those unaware, the \ in the sample expression used are because the [] characters are meaningful in regular expressions and need to be escaped to be used in the regular expression text. Typically, they're not required in the text.)

Reverse FloatToStrF Result

Is there a function to reverse FloatToStrF? By that I mean go back from currency money format to string format. For example:
Edit1.Text := FloatToStrF(10000, ffCurrency, 15, 4);
The result
Edit1.Text = '$10,000.0000'
I'm wondering if there is somthing like StrToFloatF so
Edit1.Text = '10000';
thank's

How about:
function RemoveAnythingButNumbers(aString: string): string;
var
C: Char;
begin
Result := '';
for C in aString do
begin
if C in ['0'..'9'] then Result := Result + C;
end;
end;
Note that I corrected the set of numbers.

How to detect if a string is Base64Encoded or not?

Which is the best method to detect if a string is Base64Encoded or not (using Delphi)?

Best you can do is try to decode it. If the decode fails then the input was not base64 encoded. It the string successfully decodes then the input might have been base64 encoded.

You can check if the string only contains Base64 valids chars
function StringIsBase64(const InputString : String ) : Boolean;
const
Base64Chars: Set of AnsiChar = ['A'..'Z','a'..'z','0'..'9','+','/','='];
var
i : integer;
begin
Result:=True;
for i:=1 to Length(InputString) do
{$IFDEF UNICODE}
if not CharInSet(InputString[i],Base64Chars) then
{$ELSE}
if not (InputString[i] in Base64Chars) then
{$ENDIF}
begin
Result:=False;
break;
end;
end;
The = char is used for padding so you can add an aditional valiation to the function for padded base64 strings checking if the length of the string is mod 4

In addition to RRUZ answer you can also check the length of the string (is it a multiple of 4).
function IsValidBase64(const aValue: string): Boolean;
var
i: Integer;
lValidChars: set of Char;
begin
Result := aValue <> '';
lValidChars := ['a'..'z', 'A'..'Z', '0'..'9', '/', '+'];
//length of string should be multiple of 4
if Length(aValue) mod 4 > 0 then
Result := False
else
for i := 1 to Length(aValue) do
begin
if aValue[i] = '=' then
begin
if i < Length(aValue) - 1 then
begin
Result := False;
Exit;
end
else
lValidChars := ['='];
end
else if not (aValue[i] in lValidChars) then
begin
Result := False;
Break;
end;
end;
end;
Please note that this code is Delphi 7 code and not adjusted for Unicode use.

As was already told here, there is no reliable verification if a certain string is Base64 encoded or not, so even when you consider the input as a valid Base64 encoded string, it doesn't mean the string is actually encoded that way. I'm posting here just another version of a validation function, which according to RFC 4648 verifies:
if the input string is not empty and its length is multiple of 4
if the input string contains at most two padding characters and only at the end of the string
if the input string contains only characters from the Base64 alphabet (see the Page 5, Table 1)
function IsValidBase64EncodedString(const AValue: string): Boolean;
const
Base64Alphabet = ['A'..'Z', 'a'..'z', '0'..'9', '+', '/'];
var
I: Integer;
ValLen: Integer;
begin
ValLen := Length(AValue);
Result := (ValLen > 0) and (ValLen mod 4 = 0);
if Result then
begin
while (AValue[ValLen] = '=') and (ValLen > Length(AValue) - 2) do
Dec(ValLen);
for I := ValLen downto 1 do
if not (AValue[I] in Base64Alphabet) then
begin
Result := False;
Break;
end;
end;
end;

DELPHI STRING: Pull a last name from a full name

I am trying to manipulate a string and pull only certain data from it. I need to do this on a record pulled from a database that gives me the full name of a person. I need to pull only the last name from the string and store it as a variable. Is there a way that I can do this?
Example: SQL query pulls the full field "Mary Ellen Jones" I need to extract only the Jones from the string so I can store it in a variable for further processing.
I thought maybe AnsiRightStr would work but the problem is needing to give it a set integer to pull from the right. Maybe a way to count the characters after the final space allowing me to use AnsiRightStr(string,int) for this? Any help at all is appreciated.
Additional thought: Would replacing the spaces with a delimiter say :: and then parsing that data into a Stringlist followed by allowing me to pull the last index of the string list be possible?
Several valid options have been presented so far. None of them address the situation if say the name is Something like "John St. James, Jr." Is this impossible?

you can use the LastDelimiter function to get the last space position and then with the copy function extract the substring.
uses
SysUtils;
var
Name : string;
p : Integer;
ShortName : string;
begin
Name:='Mary Ellen Jones';
//You can call trim to avoid problems with ending spaces in this case is not necesary, just is a test
//Name:=Trim(Name);
//get the last space position
p:=LastDelimiter(' ',Name);
//get the name
ShortName:=Copy(Name,p+1,length(Name)-p);
end;
or using a function
function GetLast(const Name:string) : string;
var
p : Integer;
begin
Result:=Trim(Name);
p:=LastDelimiter(' ',Result);
Result:=Copy(Result,p+1,length(Result)-p);
end;

function GetLastWord(const Str: string): string;
var
p: integer;
i: Integer;
const
SPACE = #$20;
begin
p := 1;
for i := length(Str) downto 1 do
if Str[i] = SPACE then
begin
p := i + 1;
break;
end;
result := Copy(Str, p, MaxInt);
end;
This will fail if the string ends with (an accidental) space, as 'Andreas Rejbrand '. This more robust version will handle this case too:
function GetLastWord(const Str: string): string;
var
p: integer;
i: Integer;
FoundNonSpace: boolean;
const
SPACE = #$20;
begin
p := 1;
FoundNonSpace := false;
for i := length(Str) downto 1 do
if (Str[i] = SPACE) and FoundNonSpace then
begin
p := i + 1;
break
end
else if Str[i] <> SPACE then
FoundNonSpace := true;
result := TrimRight(Copy(Str, p, MaxInt));
end;

What if the last name is say "St. James" any way to account for that?
Here's my approach.
Make a list of lastname-markers
Search that list in order of preference
As soon as a match is found, mark that as the start of last name
Return substring starting from that pos.
var
LastNameMarkers: TStringList = nil;
SuffixFix: TStringList = nil;
procedure InitLists;
begin
LastNameMarkers:= TStringList.Create;
//LastNameMarkers.LoadFromFile('c:\markers.txt');
LastNameMarkers.Add(' St.');
LastnameMarkers.Add(' Mc');
LastNameMarkers.Add(' '); //Marker of last resort.
SuffixFix:= TStringList.Create;
SuffixFix.Add(' Jr.');
SuffixFix.Add(' Sr.');
end;
function GetLastName(FullName: string): string;
var
i: integer;
start: integer;
found: boolean;
ReplaceWith: string;
begin
if LastNameMarkers = nil then InitLists;
//Fix suffixes
i:= 0;
found:= false;
while (i < SuffixFix.Count) and not found do begin
start:= pos(lower(LastNameMarkers[i]),lower(Fullname));
found:= Start > 0;
Inc(i);
end; {while}
if Found then begin
Dec(i);
ReplaceWith:= StringReplace(Suffix[i], ' ', '_',[]);
FullName:= StringReplace(FullName, SuffixFix[i], ReplaceWith,[]);
end; {if}
//Look for lastnames
i:= 0;
found:= false;
while (i < LastNameMarkers.Count) and not found do begin
start:= pos(LastNameMarkers[i],Fullname);
found:= Start > 0;
Inc(i);
end; {while}
if found then Result:= RightStr(FullName, Length(FullName)- Start + 2)
else Result:= '';
StringReplace(Result, '_', ' ',[]);
end;
I haven't dealt with upper and lowercase properly, but I hope you get the idea.

function TfrmCal.GetLastName(FullName: string): string;
var
i: integer;
found: boolean;
suffix: string;
marker: string;
begin
// Build the lists for the compare.
InitLists;
// Look at Suffixes and attach them to the LastName
i := 0;
found := False;
while (i < SuffixFix.Count) do
begin
if AnsiContainsStr(FullName, SuffixFix[i]) then
begin
suffix := '::' + trim(SuffixFix[i]);
FullName := ReplaceStr(FullName, SuffixFix[i], suffix);
found := True;
end;
inc(i);
if found then
break;
end;
// Look for LastName Markers
i := 0;
found := False;
while (i < LastNameMarkers.Count) do
begin
if AnsiContainsStr(FullName, LastNameMarkers[i]) then
begin
marker := trimright(LastNameMarkers[i]) + '::';
FullName := ReplaceStr(FullName, LastNameMarkers[i], marker);
found := True;
end;
inc(i);
if found then
break;
end;
FullName := GetLastWord(FullName);
FullName := ReplaceStr(FullName, '::', ' ');
LastNameMarkers.Clear;
SuffixFix.Clear;
Result := FullName;
end;
function TfrmCal.GetLastWord(const Str: string): string;
var
p: integer;
i: integer;
const
SPACE = #$20;
begin
p := 1;
for i := Length(Str) downto 1 do
if Str[i] = SPACE then
begin
p := i + 1;
break;
end;
Result := Copy(Str, p, MaxInt);
end;
These two functions together pull off what I needed to do. There is also the initlists function which is clunky and ugly and I need to work on so I didn't post it here.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

how do i extract first string from specific text - delphi

Here is a solution for the one-liner-fan, the crack-nuts-with-sledgehammer-lover, the regex-enthusiast, the waste-performance-ignorant and the xml-parser-grump: TRegEx.Replace('Hi i have no name <z>empty</z>', '((^.)(?=<z>)|(^.)(?!<z>)).*', '$1');

If the delimiter will be there for sure, you can do: Result:= Copy(S, 1, Pos(Delimiter, S) - 1); Otherwise user Davids answer or this (shorter but bad performance): Result:= Copy(S, 1, Pos(Delimiter, S + Delimiter) - 1);

Related

Function to increment filename

Faster way to split text in Delphi TStringList

Reverse FloatToStrF Result

How to detect if a string is Base64Encoded or not?

DELPHI STRING: Pull a last name from a full name

Categories

Resources

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

how do i extract first string from specific text - delphi

Here is a solution for the one-liner-fan, the crack-nuts-with-sledgehammer-lover, the regex-enthusiast, the waste-performance-ignorant and the xml-parser-grump: TRegEx.Replace('Hi i have no name <z>empty</z>', '((^.*)(?=<z>)|(^.*)(?!<z>)).*', '$1');

If the delimiter will be there for sure, you can do: Result:= Copy(S, 1, Pos(Delimiter, S) - 1); Otherwise user Davids answer or this (shorter but bad performance): Result:= Copy(S, 1, Pos(Delimiter, S + Delimiter) - 1);

Related

Function to increment filename

Faster way to split text in Delphi TStringList

Reverse FloatToStrF Result

How to detect if a string is Base64Encoded or not?

DELPHI STRING: Pull a last name from a full name

Categories

Resources

Here is a solution for the one-liner-fan, the crack-nuts-with-sledgehammer-lover, the regex-enthusiast, the waste-performance-ignorant and the xml-parser-grump: TRegEx.Replace('Hi i have no name <z>empty</z>', '((^.)(?=<z>)|(^.)(?!<z>)).*', '$1');