What is the most efficient way to concatenate strings in Dart? - dart

Languages like Java let you concatenate strings using the '+' operator.
But because strings are immutable, they advise using StringBuilder for efficiency when you are going to concatenate to a string repeatedly.
What is the most efficient way to concatenate strings in Dart?
https://api.dart.dev/stable/2.9.1/dart-core/StringBuffer-class.html
StringBuffer can be used for concatenating strings efficiently.
Allows for the incremental building of a string using write*() methods. The strings are concatenated to a single string only when toString is called.
It appears that if one uses StringBuffer, one is postponing the performance hit till toString is called?

There are a number of ways to concatenate strings:
String.operator +: string1 + string2. This is the most straightforward. However, if you need to concatenate a lot of strings, using + repeatedly will create a lot of temporary objects, which is inefficient. (Also note that unlike other concatenation methods, + will throw an exception if either argument is null.)
String interpolation: '$string1$string2'. If you need to concatenate a fixed number of strings that are known in advance (such that you can use a single interpolating string), I would expect this to be reasonably efficient. If you need to incrementally build a string, however, this would have the same inefficiency as +.
StringBuffer. This is efficient if you need to concatenate a lot of strings.
Iterable.join: [string1, string2].join(). This internally uses a StringBuffer so would be equivalent.
If you need to concatenate a small, fixed number of strings, I would use string interpolation. It's usually more readable than using +, especially if there are string literals involved. Using StringBuffer in such cases would add some unnecessary overhead.
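The trade-off between repeated + and a buffer is not specific to Dart. As a rough illustration only, here is a minimal sketch in Python, where io.StringIO stands in for Dart's StringBuffer and str.join for Iterable.join; the function names are made up for this example, and the sketch says nothing about Dart's actual performance.

```python
import io


def build_with_plus(parts):
    # Conceptually, each + produces a new string holding everything so far.
    result = ""
    for part in parts:
        result = result + part
    return result


def build_with_buffer(parts):
    # The buffer collects the pieces; the final string is produced once.
    buffer = io.StringIO()
    for part in parts:
        buffer.write(part)
    return buffer.getvalue()


def build_with_join(parts):
    # join sees all the pieces up front and builds the result in one go.
    return "".join(parts)


parts = [str(i) for i in range(5)]
print(build_with_plus(parts))
print(build_with_buffer(parts))
print(build_with_join(parts))
```

All three produce the same string; they differ only in how many intermediate strings are created along the way.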

Here is what I understood:
It appears that performance will vary depending on your use case. If you use StringBuffer, and intend to concatenate a very large string, then as the concatenation occurs only when you call toString, it is at that point where you get the "performance hit".
But if you use "+", then each time you call "+" there is a performance hit. As strings are immutable, each time you concatenate two strings you are creating a new object which needs to be garbage collected.
Hence the answer seems to be to test for your situation, and determine which option makes sense.

Related

Can code injection in Lua be performed with just a variable definition?

Assuming I define a variable like this in Lua
local input = "..."
Where the ... comes from a user-provided string. Would that user be able to perform code injection just from a variable definition? Do I need to sanitize the string?
As a general rule, if you ever need to ask yourself if you need to sanitize your inputs, the correct answer is "yes".
As to this particular case, if you just copy/paste the user's string directly into the Lua source file, even in quotes like that, they will be able to execute arbitrary code. It's not even particularly difficult; they can provide something like: some text"; my_code = 20; last = "end of string
The best way to sanitize this is by using a long-form literal string with [[...]] syntax. But even that can be broken out of, so you need to search through the given string for runs of the = character. Each time you find a run, note how many = characters it contains. After searching, put a number of = characters between the brackets of your literal string that isn't one of the lengths found in the user string.
Of course, the internal implementation of Lua may have some limits on the length of the = sequence in a long-form literal string. In such a case, an external user could break your code by forcing you to use a longer sequence than the implementation supports. But it won't be able to cause arbitrary code execution; you'll just get a compile error.
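The selection step described above can be sketched as follows. The sketch is in Python rather than Lua, purely to show the string handling, and the function name lua_long_literal is hypothetical. It starts at one = so that a plain ]] in the input cannot close the literal, and it writes a newline right after the opening bracket because Lua skips a newline in that position and would otherwise drop a leading newline from the user's text.

```python
import re


def lua_long_literal(user_text: str) -> str:
    """Wrap user_text in a Lua long-bracket literal it cannot close early."""
    # Lengths of every run of '=' characters found in the user's text.
    used_lengths = {len(run) for run in re.findall(r"=+", user_text)}

    # Pick the smallest level (at least 1) that does not occur in the text.
    level = 1
    while level in used_lengths:
        level += 1

    equals = "=" * level
    # Lua ignores a newline that immediately follows the opening bracket.
    return "[" + equals + "[\n" + user_text + "]" + equals + "]"


print(lua_long_literal('some text"; my_code = 20; last = "end of string'))
print(lua_long_literal("tricky ]=] input"))
```

This only covers the choice of bracket level; it does not deal with the implementation limit on bracket length mentioned above.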

Characters and Strings in Swift

Reading the documentation and this answer, I see that I can initialize a Unicode character in either of the following ways:
let narrowNonBreakingSpace: Character = "\u{202f}"
let narrowNonBreakingSpace = "\u{202f}"
As I understand, the second one would actually be a String. And unlike Java, both of them use double quotes (and not single quotes for characters). I've seen several examples, though, where the second form (without Character) is used even though the variable is only holding a single character. Is that people just being lazy or forgetting to write Character? Or does Swift take care of all the details and I don't need to bother with it? If I know I have a constant that contains only a single Unicode value, should I always use Character?
When a type isn't specified, Swift will create a String instance out of a string literal when creating a variable or constant, no matter the length. Since Strings are so prevalent in Swift and Cocoa/Foundation methods, you should just use that unless you have a specific need for a Character—otherwise you'll just need to convert to String every time you need to use it.
In the second case the Swift compiler infers String, not Character: a string literal defaults to String unless the context asks for something else, such as an explicit : Character annotation or a parameter declared as Character. So if you want a Character, the annotation is needed. I would add it in that case, because it documents the intent, and the compiler will report an error if another developer tries to treat the constant as a String.
So in my opinion, leaving out Character is not a matter of being lazy or forgetting it; it usually means the author is happy to work with a String, and you can rely on the compiler to report an error whenever the constant is used as the wrong type.
Swift's compiler takes care of the details once the type is fixed, but it does matter whether you add Character: without it, you get a String.

id values of different variables in python 3

I am able to understand immutability in Python (surprisingly simple too). Let's say I assign a number to a variable:
x = 42
print(id(x))
print(id(42))
On both counts, the value I get is
505494448
My question is, does the Python interpreter allot ids to all the numbers, letters, and True/False in memory before the environment loads? If it doesn't, how are the ids kept track of? Or am I looking at this in the wrong way? Can someone explain it please?
What you're seeing is an implementation detail (an internal optimization) called interning. This is a technique (used by implementations of a number of languages including Java and Lua) which makes names or variables refer to a single shared object instance where that's possible or feasible.
You should not depend on this behavior. It's not part of the language's formal specification and there are no guarantees that separate literal references to a string or integer will be interned nor that a given set of operations (string or numeric) yielding a given object will be interned against otherwise identical objects.
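A minimal sketch of the difference between equal values and identical objects, using sys.intern from the standard library. Whether a and b below happen to be the same object is left unasserted on purpose, since that is exactly the behavior you should not depend on; only the explicitly interned references are guaranteed to be identical.

```python
import sys

# Two equal strings built separately at run time.
a = "".join(["inter", "ning"])
b = "".join(["inter", "ning"])

print(a == b)  # equal values
print(a is b)  # identity: implementation-dependent, do not rely on it

# Explicit interning returns one shared object for equal strings.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)
print(id(ia) == id(ib))
```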
The CPython implementation does keep a set of small integers as preallocated immutable objects (currently -5 through 256), which is why x = 42 and the literal 42 report the same id. I suspect that other very high level language run-time libraries include similar optimizations: small integers are used very frequently by most non-trivial fragments of code.
In terms of how such things are implemented: for strings, CPython keeps interned strings in an internal dictionary, so an interned string is looked up there, added if necessary, and returned as a reference to the single stored object. Larger integers and floats are generally not interned, so two equal values may well be separate objects.
You can do your own similar interning of any sorts of custom object you like by wrapping the instantiation in your own calls to your own class static dictionary.
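One way to do that, sketched under the assumption that instances are immutable and fully identified by their constructor arguments. The Point class and its _instances dictionary are invented for this example.

```python
class Point:
    """Toy immutable value whose instances are interned per (x, y) pair."""

    _instances = {}  # class-level dictionary: (x, y) -> the shared instance

    def __new__(cls, x, y):
        key = (x, y)
        existing = cls._instances.get(key)
        if existing is not None:
            return existing  # reuse the object created earlier
        obj = super().__new__(cls)
        obj.x = x
        obj.y = y
        cls._instances[key] = obj
        return obj


p = Point(1, 2)
q = Point(1, 2)
r = Point(2, 1)
print(p is q)  # same arguments, same object
print(p is r)  # different arguments, different object
```

Note that the dictionary keeps every instance alive for the lifetime of the class; a weakref.WeakValueDictionary is a common alternative when that matters.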

What is the proper Lua pattern for quoted text?

I've been playing with this for an hour or two and have found myself at a roadblock with the Lua pattern matching utilities. I am attempting to match all quoted text in a string and replace it if needed.
The pattern I have come up with so far is: (\?[\"\'])(.-)%1
This works in some cases, but not all:
Working: "This \"is a\" string of \"text to\" test with"
Not Working: "T\\\"his \"is\' a\" string\" of\' text\" to \"test\" wit\\\"h"
In the not working example I would like it to match to (I made a function that gets the matches I desire, I'm just looking for a pattern to use with gsub and curious if a lua pattern can do this):
string
a" string" of
is' a" string" of' text
test
his "is' a" string" of' text" to "test" wit
I'm going to continue to use my function for the time being, but am curious whether there is a pattern I could or should be using and I'm just missing something about patterns.
(a few edits because I forgot about Stack Overflow's formatting)
(another edit to make a non-HTML example, since it was leading to assumptions that I was attempting to parse HTML)
Trying to match escaped, quoted text using regular expressions is like trying to remove the daisies (and only the daisies) from a field using a lawnmower.
"I made a function that gets the matches I desire"
This is the correct move.
"I'm curious if a lua pattern can do this"
From a practical point of view, even if a pattern can do this, you don't want to. From a theoretical point of view, you are trying to find a double quote that is preceded by an even number of backslashes. This is definitely a regular language, and the regular expression you want would be something like the following (Lua quoting conventions)
[[[^\](\\)*"(.-[^\](\\)*)"]]
And the quoted string would be result #2. But Lua patterns are not full regular expressions; in particular, you cannot put a * after a parenthesized pattern.
So my guess is that this problem cannot be solved using Lua patterns, but since Lua patterns are not a standard thing in automata theory, I'm not aware of any body of proof technique that you could use to prove it.
The issue with escaped quotes is that, in general, if there's an odd number of backslashes before the quote, then it's escaped, and if there's an even number, it's not. I do not believe that Lua pattern-matching is powerful enough to represent this condition, so if you need to parse text like this, then you should seek another way. Perhaps you can iterate through the string and parse it, or you could find each quote in turn and read backwards, counting the backslashes until you find a non-backslash character (or the beginning of the string).
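The read-backwards idea can be sketched like this. It is written in Python for illustration rather than Lua, and the function name is hypothetical. For each quote character it counts the backslashes immediately before it and keeps the quote only when that count is even.

```python
def unescaped_quote_positions(text, quotes="\"'"):
    """Return indexes of quote characters not escaped by a backslash."""
    positions = []
    for index, char in enumerate(text):
        if char not in quotes:
            continue
        # Walk backwards over the run of backslashes before this quote.
        backslashes = 0
        cursor = index - 1
        while cursor >= 0 and text[cursor] == "\\":
            backslashes += 1
            cursor -= 1
        # An even count means the backslashes escape each other.
        if backslashes % 2 == 0:
            positions.append(index)
    return positions


sample = r'a \" b " c \\" d'
print(unescaped_quote_positions(sample))
```

Pairing the returned positions into opening and closing quotes is then a separate, simpler step.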
If you absolutely must use patterns for some reason, you could try doing this in a multi-step process. First, gsub for all occurrences of two backslashes in a row, and replace them with some sentinel value. This must be a value that does not already occur in the string. You could try something like "\001" if you know this string doesn't contain non-printable characters. Anyway, once you've replaced all sequences of two backslashes in a row, any backslashes left are escaping the following character. Now you can apply your original pattern, and then finally you can replace all instances of your sentinel value with two backslashes again.
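A sketch of that multi-step process, again in Python for illustration. Python's re module stands in for Lua patterns here, so the pattern below is an assumed analogue and not the original Lua pattern; the names SENTINEL, QUOTED and quoted_spans are made up. It assumes "\x01" never appears in the input and raises an error otherwise.

```python
import re

SENTINEL = "\x01"  # must not already occur in the input

# A quote not preceded by a backslash, lazily up to the same kind of
# quote, again not preceded by a backslash.
QUOTED = re.compile(r"""(?<!\\)(["'])(.*?)(?<!\\)\1""", re.DOTALL)


def quoted_spans(text):
    if SENTINEL in text:
        raise ValueError("input already contains the sentinel character")
    # Step 1: hide every pair of backslashes behind the sentinel, so any
    # backslash that remains really does escape the next character.
    masked = text.replace("\\\\", SENTINEL)
    # Step 2: match on the masked text.
    # Step 3: restore the backslash pairs inside each match.
    return [m.group(2).replace(SENTINEL, "\\\\") for m in QUOTED.finditer(masked)]


print(quoted_spans(r'say "a \" b" and \\"c"'))
```

Like the approach it illustrates, this finds non-overlapping matches from left to right; it does not produce the nested matches listed in the question.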
Lua's pattern language is adequate for many simple cases. And it has at least one trick you don't find in a typical regular expression package: a way to match balanced parentheses. But it has its limits as well.
When those limits are exceeded, I reach for LPeg. LPeg is an implementation of a Parsing Expression Grammar for Lua, and was implemented by one of Lua's original authors, so the adaptation to Lua is done quite well. A PEG lets you specify anything from simple patterns through complete language grammars. LPeg compiles the grammar to a bytecode and executes it extremely efficiently.
You should NOT be trying to parse HTML with regular expressions. HTML and XML are NOT regular languages and cannot be successfully manipulated with regular expressions. You should use a dedicated HTML parser. Here are lots of explanations why.

Are Delphi strings immutable?

As far as I know, strings are immutable in Delphi. I kind of understand that means if you do:
string1 := 'Hello';
string1 := string1 + ' World';
the first string is destroyed and you get a reference to a new string 'Hello World'.
But what happens if you have the same string in different places around your code?
I have a string hash assigned for identifying several variables, so for example a "change" is identified by a hash value of the properties of that change. That way it's easy for me to check two "changes" for equality.
Now, each hash is computed separately (not all the properties are taken into account, so that two separate instances can be equal even if they differ on some values).
The question is, how does Delphi handle those strings? If I compute two separate hashes that yield the same 10-byte string, what do I get? Two memory blocks of 10 bytes or two references to the same memory block?
Clarification: A change is composed of some properties read from the database and is generated by an individual thread. The TChange class has a GetHash method that computes a hash based on some of the values (but not all), resulting in a string. Now, other threads receive the Change and have to compare it to previously processed changes so that they don't process the same (logical) change. Hence the hash and, as they have separate instances, two different strings are computed. I'm trying to determine whether it would be a real improvement to change from a string to something like a 128-bit hash, or whether I'd just be wasting my time.
Edit: Version of Delphi is Delphi 7.0
Delphi strings are copy on write. If you modify a string (without using pointer tricks or similar techniques to fool the compiler), no other references to the same string will be affected.
Delphi strings are not interned. If you create the same string from two separate sections of code, they will not share the same backing store - the same data will be stored twice.
Delphi strings are not immutable (try: string1[2] := 'a') but they are reference-counted and copy-on-write.
The consequences for your hashes are not clear; you'll have to detail how they are stored, etc.
But a hash should only depend on the contents of a string, not on how it is stored. That makes the whole question moot, unless you can explain it better.
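To make the point about contents concrete, here is a small sketch in Python, not Delphi. The function change_hash and its parameters table, key and action are hypothetical stand-ins for whichever properties identify a change, and BLAKE2b with a 16-byte digest is just one way to get a 128-bit value. Two strings built separately hash identically because only their contents are hashed.

```python
import hashlib


def change_hash(table, key, action):
    """128-bit digest over the identifying properties only (assumed names)."""
    payload = "|".join([table, str(key), action]).encode("utf-8")
    # digest_size=16 gives a 16-byte (128-bit) digest.
    return hashlib.blake2b(payload, digest_size=16).digest()


# The table name is built in two different ways, as two threads might do.
first = change_hash("orders", 7, "update")
second = change_hash("ord" + "ers", 7, "update")

print(first == second)  # same contents, same hash
print(len(first) * 8)   # size of the digest in bits
```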
As others have said, Delphi strings are not generally immutable. Here are a few references on strings in Delphi.
http://blog.marcocantu.com/blog/delphi_super_duper_strings.html
http://conferences.codegear.com/he/article/32120
http://www.codexterity.com/delphistrings.htm
The Delphi version may be important to know. The good old Delphi BCL handles strings as copy-on-write, which basically means that a new instance is created when something in the string is changed. So yes, they are more or less immutable.
