Difference between CompareStr and '=' for Strings in Delphi - delphi

I just want to know the difference between CompareStr and = for comparing strings in Delphi. Both yield the same result.
if(str2[i] = str1[i]) then
ShowMessage('Palindrome')
if(CompareStr(str2[i], str1[i]) = 0) then
ShowMessage('Palindrome')
Both show message Palindrome.

Use CompareStr not when you just want to see whether two strings are equal, but when you want to know how one string compares relative to another. It will return a value less than 0 if the first argument appears first, asciibetically, and it will return a value greater than zero if the first argument belongs after the second.
Without CompareStr, you might have code like this:
if str1[i] = str2[i] then begin
// They're equal
end else if str1[i] < str2[i] then begin
// str1 comes first
end else begin
// str2 comes first
end;
That compares str1 and str2 twice. With CompareStr, you can cut out one of the string comparisons and replace it with a cheaper integer comparison:
x := CompareStr(str1[i], str2[i]);
if x = 0 then begin
// They're equal
end else if x < 0 then begin
// str1 comes first
end else begin
// str2 comes first
end;
As Gerry's answer explains, the function is particularly useful in sorting functions, especially since it has the same interface as other comparison functions like CompareText and AnsiCompareStr. The sorting function is a template method, and each of the functions serves as a comparison strategy.
If all you want to do is test for equality, use the = operator — it's easier to read. Use CompareStr when you need the extra functionality it provides.

Assuming Str1 and Str2 are strings, rather than arrays (or lists) or string, the first version will be more efficient, as the second version will first copy str1[i] and str2[i] to two new strings, then call a function, with the associated overhead.
The first version will simply compare the single characters referred to by str1[i] and str2[i]
If you are only interested if strings are the same, use =. If you need to know if strings are the same, OR which string is greater, then use CompareStr.
CompareStr is particularly useful when sorting lists, e.g, with TList.Sort(CompareFunc) or TStringList.Sort(CompareFunc)
If you want case-insensitive comparisons, use CompareText.

The result is not the same when compared strings are not equal. The result of CompareStr or AnsiCompareStr is of Integer type showing some more information literally how those strings compare. Take a look at http://www.delphibasics.co.uk/RTL.asp?Name=AnsiCompareStr

Apart from the return value (integer versus boolean), from the code it says that for CompareStr "the compare operation is based on the 8-bit ordinal value of each character and is not affected by the current user locale". So it looks like CompareStr was originally part of the FastCode routines and is in essence an optimised Ansi version developed for performance reasons. I have always tended to go with "=", "<", ">", etc.

Related

Time Complexity Difference between Two Parsing Implementations Using Global Variables and Return Values

I'm trying to solve the following problem:
A string containing only lower-case letters can be encoded into NUM[encoded string] format. For example, aaa can be encoded into 3[a]. Given an encoded string, find its original string according to the following grammar.
S -> {E}
E -> NUM[S] | STR # NUM[S] means encoded, while STR means not.
NUM -> 1 | 2 | ... | 9
STR -> {LETTER}
LETTER -> a | b | ... | z
Note: in the above grammar {} represents "concatenate 0 or more times".
For example, given the encoded string 3[a2[c]], the result (original string) is accaccacc.
I think this can be parsed by recursive descent parsing, and there are two ways to implement it:
Method I: Let the parsing method to return the result string directly.
Method II: Use a global variable, and each parsing method can just append characters to it.
I'm wondering if the two methods share the same time complexity. Suppose the result string is of length t. Then for method II, I think its time complexity should be O(t) because we read and write every character in the result string exactly once. For method I, however, my intuition was that it could be slower because the same substring can be copied multiple times, depending on the depth of recursions. But I'm not able to figure out the exact time complexity to justify my intuition. Can anyone give a hint?
My first suggestion is that your parser should produce an abstract syntax tree rather than directly interpret the string, no matter whether you choose to write a recursive descent parser, a state-based parser or use a parser generator. This greatly enhances maintainability and allows you perform validation, analyses, and transformations much more easily.
Method I
If I understand you correctly, in Method I you have functions for each grammar construct that return an immutable string, which are then recursively repeated and concatenated. For example, for the top-level concatenation rule
S ::= E*
you would have an interpretation function that looks like this:
string interpretS(NodeS sNode) {
string result = "";
for (int i = 0; i < sNode.Expressions.Length; i++) {
result = result + interpretE(sNode.Expressions[i]);
}
return result;
}
... and similarly for the other rules. It is easy to see that the time complexity of Method I is O(n²) where n is the length of the output. (NB: It makes sense to measure the time complexity in terms of the output rather than the input, since the output length is exponential in the length of the input, and so any interpretation method must have time complexity at least exponential in the input, which is not very interesting.) For example, interpreting the input abcdef requires concatenating a and b, then concatenating the result with c, then concatenating that result with d etc., resulting in 1+2+3+4+5 steps. (See here for a more detailed discussion why repeated string concatenation with immutable strings has quadratic complexity.)
Method II
I interpret your description of Method II like this: instead of returning individual strings which have to be combined, you keep a reference to a mutable structure representing a string that supports appending. This could be a data structure like StringBuilder in Java or .NET, or just a dynamic-length list of characters. The important bit is that appending a string of length b to a string of length a can be done in O(b) (rather than O(a+b)).
Note that for this to work, you don't need a global variable! A cleaner solution would just pass the reference to the resulting structure through (this pattern is called accumulator parameter). So now we would have functions like these:
void interpretS2(NodeS sNode, StringBuilder accumulator) {
for (int i = 0; i < sNode.Expressions.Length; i++) {
interpretE2(sNode.Expressions[i], accumulator);
}
}
void interpretE2(NodeE eNode, StringBuilder accumulator) {
if (eNode is NodeNum numNode) {
for (int i = 0; i < numNode.Repetitions; i++) {
interpretS2(numNode.Expression, accumulator);
}
}
else if (eNode is NodeStr strNode) {
for (int i = 0; i < strNode.Letters.Length; i++) {
interpretLetter2(strNode.Letters[i], accumulator);
}
}
}
void interpretLetter2(NodeLetter letterNode, StringBuilder accumulator) {
accumulator.Append(letterNode.Letter);
}
...
As you stated correctly, here the time complexity is O(n), since at each step exactly one character of the output is appended to the accumulator, and no strings are ever copied (only at the very end, when the mutable structure is converted into the output string).
So, at least for this grammar, Method II is clearly preferable.
Update based on comment
Of course, my interpretation of Method I above is exceedingly naive. A more realistic implementation of the interpretS function would internally use a StringBuilder to concatenate the results from the subexpressions, resulting in linear complexity for the example given above, abcdef.
However, this wouldn't change the worst case complexity O(n²): consider the example
1[a1[b[1[c[1[d[1[e[1[f]]]]]]]]]]
Even the less naive version of Method I would first append f to e (1 step), then append ef to d (+ 2 steps), then append def to c (+ 3 steps) and so on, amounting to 1+2+3+4+5 steps in total.
The fundamental reason for the quadratic time complexity of Method I is that the results from the subexpressions are copied to create the new subresult to be returned.
Time Complexity is estimated by counting the number of elementary operations performed by an algorithm, supposing that each elementary operation takes a fixed amount of time to perform, see here. Of interest is, however, only how fast this number of operations increases, when the size of the input data set increases.
In your case, the size of the input data means the length of the string to be parsed.
I assume by your 1st method you mean that when a NUM is encountered, its argument is processed by the parser completely NUM times. In your example, when „3“ is read from the input string, „a2[c]“ is processed completely 3 times. Processing here means to transverse the syntax tree up to a leave, and append the leave value, here the „c“ to the output string.
I also assume by your 2nd method you mean that when a NUM is encountered, its argument is only evaluated once and all intermediate results are stored and re-used. In your example, when „3“ is read from the input string and stored, „a“ is read from the input string and stored, „2[c]“ is processed, i.e. „2“ is read from the input string and stored, and finally „c“ is processed. „c“ is due to the stored „2“ combined to „cc“, and due to the stored „a“ combined to „acc“. This is due to the stored „3“ combined then to „accaccacc“, and „accaccacc“ is output.
The question now is, what is the elementary operation that is relevant to the time complexity? My feeling is that in the 1st case, the stack operations during transversal of the syntax tree are important, while in the 2nd case, string copying operations are important.
Strictly speaking, one can thus not compare the time complexities of both algorithms.
If you are, however, interested in run times instead of time complexities, my guess is that the stack operations take more time than string copying, and that then method 2 is preferable.

Parsing procedure calls for a toy language

I have a certain toy language that defines, amongst others, procedures and procedure calls, using EBNF syntax:
program = procedure, {procedure} ;
procedure = "procedure", NAME, bracedblock ;
bracedBlock = "{" , statementlist , "}" ;
statementlist = statement, { statement } ;
statement = define | if | while | call | // others omitted for brevity ;
define = NAME, "=", expression, ";"
if = "if", conditionalblock, "then", bracedBlock, "else", bracedBlock
call = "call" , NAME, ";" ;
// other definitions omitted for brevity
A tokeniser for a program in this language has been implemented, and returns a vector of tokens.
Now, parsing said program without the procedure calls, is fairly straightforward: one can define a recursive descent parser using the above grammar directly, and simply parse through the tokens. Some further notes:
Each procedure may call any other procedure except itself, directly or indirectly (i.e. no recursion), and these need not necessarily be in the order of appearance in the source code (i.e. B may be defined after A, and A may call B, or vice versa).
Procedure names need to be unique, and 'reserved keywords' may be used as variable/procedure names.
Whitespace does not matter, at least amongst tokens of different type: similar to C/C++.
There is no scoping rule: all variables are global.
The concept of a 'line number' is important: each statement has one or more line numbers associated with it: define statements have only 1 line number each, for instance, whereas an if statement, which is itself a parent of two statement lists, has multiple line numbers. For instance:
LN CODE
procedure A {
1. a = 5;
2. b = 7;
3. c = 3;
4. 5. if (b < c) then { call C; } else {
6. call B;
}
procedure B {
7. d = 5;
8. while (d > 2) {
9. d = d + 1; }
}
procedure C {
10. e = 10;
11. f = 8;
12. call B;
}
Line numbers are continuous throughout the program; only procedure definitions and the else keyword aren't assigned line numbers. The line numbers are defined by grammar, rather than their position in source code: for instance, consider 'lines' 4 and 5.
There are some relationships that need to be set in a database given each statement and its line number, variables used, variables set, and child containers. This is a key consideration.
My question is therefore this: how can I parse these function calls, maintain the integrity of the line numbers, and set the relationships?
I have considered the 'OS' way of doing things: upon encounter of a procedure call, look ahead for a procedure that matches said called procedure, parse the callee, and unroll the call stack back to the caller. However, this ruins the line number ordering: if the above program were to be parsed this way, C would have line numbers 6 to 8 inclusive, rather than 10 to 12 inclusive.
Another solution is to parse the entire program once in order, maintain a toposort of procedure calls, and then parse a second time by following said toposort. This is problematic because of implementation details.
Is there a possibly better way to do this?
It's always tempting to try to completely process a program text in a single on-line pass. Unfortunately, it is practically never the simplest solution. Trying to do everything at once in a linear progression results in a kind of spaghetti of intertwined computations, and making it all work almost always involves unnecessary restrictions on the language which will later prove to be unfortunate.
So I'd encourage you to reconsider some of your design decisions. If you use the parser just to build up some kind of structural representation of the program -- whether it's an abstract syntax tree or a vector of three-address code, or some other alternative -- and then do further processing in a series of single-purpose passes over that structural representations, you'll likely find that the code is:
much simpler, because computations don't have to be intermingled;
more general, because each pass can be done in the most convenient order rather than restricting inputs to fit a linear ordering;
more readable and more maintainable.
Persisting data structures over multiple passes might increase storage requirements slightly. But the structures are unlikely to occupy enough storage that this will be noticeable. And it probably will not increase the computation time; indeed, it might even reduce the time because the individual passes are simpler and easier to optimise.

Performant, idiomatic way to concatenate two chars that are not in a list into a string

I've done most of my development in C# and am just learning F#. Here's what I want to do in C#:
string AddChars(char char1, char char2) => char1.ToString() + char2.ToString();
EDIT: added ToString() method to the C# example.
I want to write the same method in F# and I don't know how to do it other than this:
let addChars char1 char2 = Char.ToString(char1) + Char.ToString(char2)
Is there a way to add concatenate these chars into a string without converting both into strings first?
Sidenote:
I also have considered making a char array and converting that into a string, but that seems similarly wasteful.
let addChars (char1:char) (char2: char) = string([|char1; char2|])
As I said in my comment, your C# code is not going to do what you want ( i.e. concatenate the characters into a string). In C#, adding a char and a char will result in an int. The reason for this is because the char type doesn't define a + operator, so C# reverts to the nearest compatable type that does, which just happens to be int. (Source)
So to accomplish this behavior, you will need to do something similar to what you are already trying to do in F#:
char a = 'a';
char b = 'b';
// This is the wrong way to concatenate chars, because the
// chars will be treated as ints and the result will be 195.
Console.WriteLine(a + b);
// These are the correct ways to concatenate characters into
// a single string. The result of all of these will be "ab".
// The third way is the recommended way as it is concise and
// involves creating the fewest temporary objects.
Console.WriteLine(a.ToString() + b.ToString());
Console.WriteLine(Char.ToString(a) + Char.ToString(b));
Console.WriteLine(new String(new[] { a, b }));
(See https://dotnetfiddle.net/aEh1FI)
F# is the same way in that concatenating two or more chars doesn't result in a String. Unlike C#, it results instead in another char, but the process is the same - the char values are treated like int and added together, and the result is the char representation of the sum.
So really, the way to concatenate chars into a String in F# is what you already have, and is the direct translation of the C# equivalent:
let a = 'a'
let b = 'b'
// This is still the wrong way (prints 'Ã')
printfn "%O" (a + b)
// These are still the right ways (prints "ab")
printfn "%O" (a.ToString() + b.ToString())
printfn "%O" (Char.ToString(a) + Char.ToString(b))
printfn "%O" (String [| a;b |]) // This is still the best way
(See https://dotnetfiddle.net/ALwI3V)
The reason the "String from char array" approach is the best way is two-fold. First, it is the most concise, since you can see that that approach offers the shortest line of code in both languages (and the difference only increases as you add more and more chars together). And second, only one temporary object is created (the array) before the final String, whereas the other two methods involve making two separate temporary String objects to feed into the final result.
(Also, I'm not sure if it works this way as the String constructors are hidden in external sources, but I imagine that the array passed into the constructor would be used as the String's backing data, so it wouldn't end up getting wasted at all.) Strings are immutable, but using the passed array directly as the created String's backing data could result in a situation where a reference to the array could be held elsewhere in the program and jeopardize the String's immutability, so this speculation wouldn't fly in practice. (Credit: #CaringDev)
Another option you could do in F# that could be more idiomatic is to use the sprintf function to combine the two characters (Credit: #rmunn):
let a = 'a'
let b = 'b'
let s = sprintf "%c%c" a b
printfn "%O" s
// Prints "ab"
(See https://dotnetfiddle.net/Pp9Tee)
A note of warning about this method, however, is that it is almost certainly going to be much slower than any of the other three methods listed above. That's because instead of processing array or String data directly, sprintf is going to be performing more advanced formatting logic on the output. (I'm not in a position where I could benchmark this myself at the moment, but plugged into #TomasPetricek's benckmarking code below, I wouldn't be surprised if you got performance hits of 10x or more.)
This might not be a big deal as for a single conversion it will still be far faster than any end-user could possibly notice, but be careful if this is going to be used in any performance-critical code.
The answer by #Abion47 already lists all the possible sensible methods I can think of. If you are interested in performance, then you can run a quick experiment using the F# Interactive #time feature:
#time
open System
open System.Text
let a = 'a'
let b = 'b'
Comparing the three methods, the one with String [| a; b |] turns out to be about twice as fast as the methods involving ToString. In practice, that's probably not a big deal unless you are doing millions of such operations (as my experiment does), but it's an interesting fact to know:
// 432ms, 468ms, 472ms
for i in 0 .. 10000000 do
let s = a.ToString() + b.ToString()
ignore s
// 396ms 440ms, 458ms
for i in 0 .. 10000000 do
let s = Char.ToString(a) + Char.ToString(b)
ignore s
// 201ms, 171ms, 170ms
for i in 0 .. 10000000 do
let s = String [| a;b |]
ignore s

Is there a difference between slicing and an explicit reborrow when converting Strings to &strs?

Are the following two examples equivalent?
Example 1:
let x = String::new();
let y = &x[..];
Example 2:
let x = String::new();
let y = &*x;
Is one more efficient than the other or are they basically the same?
In the case of String and Vec, they do the same thing. In general, however, they aren't quite equivalent.
First, you have to understand Deref. This trait is implemented in cases where a type is logically "wrapping" some lower-level, simpler value. For example, all of the "smart pointer" types (Box, Rc, Arc) implement Deref to give you access to their contents.
It is also implemented for String and Vec: String "derefs" to the simpler str, Vec<T> derefs to the simpler [T].
Writing *s is just manually invoking Deref::deref to turn s into its "simpler form". It is almost always written &*s, however: although the Deref::deref signature says it returns a borrowed pointer (&Target), the compiler inserts a second automatic deref. This is so that, for example, { let x = Box::new(42i32); *x } results in an i32 rather than a &i32.
So &*s is really just shorthand for Deref::deref(&s).
s[..] is syntactic sugar for s.index(RangeFull), implemented by the Index trait. This means to slice the "whole range" of the thing being indexed; for both String and Vec, this gives you a slice of the entire contents. Again, the result is technically a borrowed pointer, but Rust auto-derefs this one as well, so it's also almost always written &s[..].
So what's the difference? Hold that thought; let's talk about Deref chaining.
To take a specific example, because you can view a String as a str, it would be really helpful to have all the methods available on strs automatically available on Strings as well. Rather than inheritance, Rust does this by Deref chaining.
The way it works is that when you ask for a particular method on a value, Rust first looks at the methods defined for that specific type. Let's say it doesn't find the method you asked for; before giving up, Rust will check for a Deref implementation. If it finds one, it invokes it and then tries again.
This means that when you call s.chars() where s is a String, what's actually happening is that you're calling s.deref().chars(), because String doesn't have a method called chars, but str does (scroll up to see that String only gets this method because it implements Deref<Target=str>).
Getting back to the original question, the difference between &*s and &s[..] is in what happens when s is not just String or Vec<T>. Let's take a few examples:
s: String; &*s: &str, &s[..]: &str.
s: &String: &*s: &String, &s[..]: &str.
s: Box<String>: &*s: &String, &s[..]: &str.
s: Box<Rc<&String>>: &*s: &Rc<&String>, &s[..]: &str.
&*s only ever peels away one layer of indirection. &s[..] peels away all of them. This is because none of Box, Rc, &, etc. implement the Index trait, so Deref chaining causes the call to s.index(RangeFull) to chain through all those intermediate layers.
Which one should you use? Whichever you want. Use &*s (or &**s, or &***s) if you want to control exactly how many layers of indirection you want to strip off. Use &s[..] if you want to strip them all off and just get at the innermost representation of the value.
Or, you can do what I do and use &*s because it reads left-to-right, whereas &s[..] reads left-to-right-to-left-again and that annoys me. :)
Addendum
There's the related concept of Deref coercions.
There's also DerefMut and IndexMut which do all of the above, but for &mut instead of &.
They are completely the same for String and Vec.
The [..] syntax results in a call to Index<RangeFull>::index() and it's not just sugar for [0..collection.len()]. The latter would introduce the cost of bound checking. Gladly this is not the case in Rust so they both are equally fast.
Relevant code:
index of String
deref of String
index of Vec (just returns self which triggers the deref coercion thus executes exactly the same code as just deref)
deref of Vec

Why are there parentheses and dots after an array's name instead of brackets?

When accessing an element in an array the square brackets are used like so:
{'X is an int and Numbers is an int array'}
X := Numbers[8];
However, While reading others' code I sometimes find the following syntax:
{'PBox , SBox1 , SBox2 are arrays of int , And X,Y are ints'}
Result := Result or PBox(.SBox1[X] or SBox2[Y].);
What does it mean to have parentheses after the array's name, as in PBox(someNumber)? Is this another way to access an array element?
What does the "." before SBox1 and after SBox2 mean? Both SBox1 and SBox2 are arrays. The code compiles without error but I don't know what those dots are for.
Yes, now I see what you do.
In fact, (. and .) are merely alternative ways (but very uncommon!) of writing [ and ] in Delphi.
If PBox is an array, then PBox[a] (or, equivalently, PBox(.a.)) would require a to be an integer, right? And if SBox1[x] and SBox2[Y] are integers, so is the bitwise or of them. (Bitwise or is an operation that takes two integers and returns a new integer.) Hence, PBox(.SBox1[X] or SBox2[Y].) is the (SBox1[X] or SBox2[Y])th element in the array PBox, that is, an integer. So it makes sense to compute the bitwise or between Result and this integer, which is what is done:
Result := Result or PBox(.SBox1[X] or SBox2[Y].);

Resources