Why is there no implicit conversion from std::string_view to std::string? - c++17

There is an implicit conversion from std::string to std::string_view, and it's not considered unsafe, even though it can easily produce dangling references if the programmer is not careful.
On the other hand, there is no implicit conversion from std::string_view to std::string, justified by the same argument applied in the opposite direction: because the programmer may not be careful.
It's lovely that C++ has a replacement for a raw const char* pointer, while making it super confusing and stripping it to the bone:
Implicit const char* -> std::string: OK
Implicit std::string_view -> std::string: NOPE
Assignment std::string = const char* : OK
Assignment std::string = std::string_view: OK
Appending std::string += const char* : OK
Appending std::string += std::string_view: OK
Concatenation const char* + std::string: OK
Concatenation std::string_view + std::string: NOPE
Concatenation std::string + const char*: OK
Concatenation std::string + std::string_view: NOPE
Am I missing something, or is this total nonsense?
In the end, how useful is this string view without all the crucial pieces that make it similar to const char*? What's the point of integrating it into the stdlib ecosystem without taking the last step to make it complete? After all, if we need an object that represents a piece of a string, we could write our own. Actually, a lot of libraries did exactly that, years ago. The whole point of making something standard is to make it useful for the widest range of use cases, isn't it?
Are they going to fix this in C++23?

The problem is that std::string_view -> std::string makes a copy of the underlying memory, complete with heap allocation, whereas the implicit std::string -> std::string_view does not. If you've bothered to use a std::string_view in the first place then you obviously care about copies, so you don't want one to happen implicitly.
Consider this example:
#include <string>
#include <string_view>

void foo2(std::string_view x);
void foo3(const std::string& x);

void foo1(const std::string& x)
{
    foo2(x);
}

void foo2(std::string_view x)
{
    foo3(std::string(x));  // no implicit conversion, so a std::string must be built here
}

void foo3(const std::string& x)
{
    // Use x...
}
The function foo2 could have taken a const std::string& parameter, but uses a std::string_view so that it is more efficient if you pass in a string that isn't a std::string; no surprises there. But in some cases it is now less efficient than if it had just taken a const std::string& parameter!
When foo2 is called with a std::string argument (e.g. by foo1): when it then calls foo3, it creates a copy of the string. If it had a const std::string& parameter, it could have passed along the object it already had.
When foo2 is called with a const char* argument: A std::string copy has to be made sooner or later; with a const std::string& parameter it gets made earlier, but overall there's exactly one copy either way.
Now imagine foo2 calls multiple functions like foo3, or calls foo3 in a loop; it's making exactly the same std::string object over and over. You'd want the compiler to notify you about this.
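A sketch of that loop situation, reusing the names from the example above (foo3's body is only a stand-in):
#include <string>
#include <string_view>

void foo3(const std::string& x)
{
    (void)x;   // stand-in for real work
}

void foo2(std::string_view x)
{
    for (int i = 0; i < 3; ++i)
        foo3(std::string(x));   // the very same std::string is built and destroyed on every pass
}

int main()
{
    foo2("hello world");
}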

Because expensive implicit conversions are undesirable...
You only listed one implicit conversion in your entire set of examples: const char* -> std::string; and you're asking for another one. In all of the other 'OK'-marked cases, any memory allocation is obvious/explicit:
Assignments: When you assign anything to an owning object with variable storage size, it is understood that an allocation may be necessary. (Unless it's a move-assignment, but never mind that.)
Appends: When you append to an owning object with variable storage size, it is even more obvious that it will need to allocate to accommodate the additional data. (It might have enough reserved space, but nobody guarantees this for strings.)
Concatenations: In all three cases, memory is allocated for the result only, not for any intermediate object. Allocation for the result is obviously necessary, since the operands of the concatenation remain intact (and can't be assumed to hold the result anyway).
(A short C++17 sketch of these cases follows below.)
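A minimal sketch, assuming only the standard <string> and <string_view> headers:
#include <string>
#include <string_view>

int main()
{
    std::string s = "hello";       // implicit const char* -> std::string: may allocate
    std::string_view sv = s;       // implicit std::string -> std::string_view: no allocation

    std::string t;
    t = sv;                        // assignment from a view: OK, an allocation is expected here
    t += sv;                       // append from a view: OK, same reasoning

    std::string u = s + "!";       // concatenation with const char*: OK, allocates the result only
    // std::string v = s + sv;     // concatenation with std::string_view: does not compile (C++17)
    (void)u;
}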
Implicit conversions in general have benefits and detriments. C++ is actually mostly stingy with these, and inherits most of its implicit conversions from C. But const char* to std::string is an exception. As @ArthurTacca notes, it allocates memory. Now, the
C++ core guidelines say:
C.164: Avoid implicit conversion operators
Reason:
Implicit conversions can be essential (e.g., double to int)
but often cause surprises (e.g., String to C-style string).
and this is doubly the case when the unintended conversion performs expensive operations like allocation, which have side effects, and may make system calls.
PS - std::string_view has quite a few gotchas; see this CppCon 2018 talk by Victor Ciura:
Enough string_view to hang ourselves with
So remember it's not some kind of universal panacea; it's another class that you need to use with care, not carelessness.
... an explicit constructor is sufficient.
There actually does exist an explicit constructor of std::string from std::string_view. Use it - by passing std::string{my_string_view} to a function taking a string.
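For example (takesString is a hypothetical stand-in for any function that wants a std::string):
#include <iostream>
#include <string>
#include <string_view>

void takesString(const std::string& s)
{
    std::cout << s << '\n';
}

int main()
{
    std::string_view my_string_view = "a view over a string literal";
    takesString(std::string{my_string_view});   // the copy (and allocation) is spelled out at the call site
}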

One of the reasons string views are useful in an embedded environment is precisely that they don't do dynamic allocation, and we get safety because the length is carried as part of the view. So for me the lack of an implicit cast to std::string is not a deal breaker, because that cast would require dynamic allocation. A silent conversion would be an error in this environment and would have to be found and removed.
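A minimal sketch of that embedded-style use, with made-up names; the point is that nothing below touches the heap:
#include <string_view>

static const char kBanner[] = "firmware v1.2";   // lives in read-only storage

constexpr std::string_view banner()
{
    return std::string_view{kBanner, sizeof kBanner - 1};   // pointer + length, no copy
}

bool isFirmwareBanner(std::string_view s)
{
    return s.substr(0, 8) == "firmware";   // slicing and comparing, still no allocation
}

int main()
{
    return isFirmwareBanner(banner()) ? 0 : 1;
}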

Related

Anyone have/know of a "universal" string class for C++Builder?

Has anyone built a "universal" string class for C++Builder that manages all of the conversions to/from ASCII and Unicode?
I had a vision of a class that would accept AnsiString, UnicodeString, WideString, char*, wchar_t*, std::string, and variant values, and would provide any of those back out. AND the copy constructor has to do a deep copy, not just provide a pointer to the same buffer space (as AnsiString and UnicodeString do).
I figure someone else besides me must have to pass strings to both old interfaces that use char* and new ones that use (wide) strings. If you have built, or know of, something you're willing to share, please let me know. Most of the time it's not too big a deal, until I have to pass a map<std::string, std::string>, then it starts getting ugly.
We do not, and will not, support any internationalization whatsoever, so I don't need to worry about encoding. I just want a class that will return my little ASCII strings in whatever format makes the compiler happy... sanely.
UPDATE: to address the comments:
So, std::map<std::string, std::string> is ugly, because you can't do:
parammap[AnsiString(widekey).c_str()] = AnsiString(widevalue).c_str();
Oh no no no. You have to do this:
AnsiString akey = widekey;
AnsiString aval = widevalue;
parammap[akey.c_str()] = aval.c_str();
The person who originally wrote this code tried to keep it as port-friendly as possible, so he standardized on char* for all of the function calls he wrote (circa 2000, it wasn't a bad assumption). Sometimes I found myself converting everything to char*s before realizing that the function then immediately turned around and converted it back to wide. There are multiple interface layers, and it took me a while to figure out how it all fits together.
Add in some creative compiler bugs, where it would get confused, especially when pulling string values out of Variants. In some places, I had to do:
String wstr = passedvariant.AsType(varString);
String astr = wstr;
std::string key = astr.c_str();
Then life happened, we ended up starting the port over (for the 3rd time. Don't ask), and I finally got smart and wrapped the low-level library in a layer that does all of the conversions, and retooled the middle layers to deal in Strings, so the application layer can just use String except for that map. For the map<string, string>, I created a function to do the converting, so it was one line in a bunch of places instead of six (the three line conversion above for both key and value).
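For reference, such a helper can be sketched in plain standard C++ when the data really is ASCII-only, as stated above (narrow and narrowMap are made-up names, and the VCL string types are deliberately left out of the sketch):
#include <map>
#include <string>

// Narrow a wide string to a char string; only safe because the data is plain ASCII.
std::string narrow(const std::wstring& w)
{
    return std::string(w.begin(), w.end());
}

// Convert a whole wide-keyed map in one call instead of several lines per entry.
std::map<std::string, std::string> narrowMap(const std::map<std::wstring, std::wstring>& src)
{
    std::map<std::string, std::string> out;
    for (std::map<std::wstring, std::wstring>::const_iterator it = src.begin(); it != src.end(); ++it)
        out[narrow(it->first)] = narrow(it->second);
    return out;
}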
Lastly, I wasn't actually asking for anyone to make suggestions on how to make my code better. I was asking if anyone had or knew of a universal string class. Our code is the way it is for reasons, and I'm not rewriting all of it to make it prettier. I just wanted not to have to touch so many lines... again. It would have been so much nicer to have the compiler keep track of which format is needed and convert it.

The differences between `lua_pushstring` and `lua_pushliteral`

As per the documentation (https://www.lua.org/manual/5.3/manual.html#lua_pushliteral),
which says that:
This macro is equivalent to lua_pushstring, but should be used only when s is a literal string.
But I can't understand the aforementioned explanation at all.
As far as I can see, there is no difference, judging by the macro definition of lua_pushliteral:
#define lua_pushliteral(L, s) lua_pushstring(L, "" s)
The documentation for lua_pushliteral in Lua 5.4 is the same as 5.3, except it adds "(Lua may optimize this case.)". So while it is currently the same as calling lua_pushstring, the Lua devs are giving themselves the option to optimize it in the future.
EDIT: As an example, the doc for lua_pushstring says:
Lua will make or reuse an internal copy of the given string, so the memory at s can be freed or reused immediately after the function returns.
But a C string literal is read-only, so it's impossible for the C code to free or reuse the memory. Also, Lua strings are immutable. It's basically useless to copy one immutable object to another immutable object when you could just refer to the same memory from both places. That means one possible way to optimize lua_pushliteral would be to just not make the copy that lua_pushstring does.
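For illustration, both calls below leave the same string value on the Lua stack; only the macro form requires a literal argument (standard Lua 5.3/5.4 headers assumed):
extern "C" {
#include <lua.h>
#include <lauxlib.h>
}

int main()
{
    lua_State* L = luaL_newstate();
    lua_pushliteral(L, "hello");   // macro: the argument must be a string literal
    lua_pushstring(L, "hello");    // function: any const char* will do
    lua_close(L);
    return 0;
}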

unsafe casting in F# with zero copy semantics

I'm trying to achieve a static-cast-like coercion that doesn't result in copying of any data.
A naive static cast does not work
let pkt = byte_buffer :> PktHeader
FS0193: Type constraint mismatch. The type 'byte[]' is not compatible with the type 'PktHeader'
where the packet is initially held in a byte array because of the way System.Net.Sockets.Socket.Receive() is defined.
The low level packet struct is defined something like this
[<Struct; StructLayout(LayoutKind.Explicit)>]
type PktHeader =
    [<FieldOffset(0)>] val mutable field1: uint16
    [<FieldOffset(2)>] val mutable field2: uint16
    [<FieldOffset(4)>] val mutable field3: uint32
    // .... many more fields follow ....
Efficiency is important in this real world scenario because wasteful copying of data could rule out F# as an implementation language.
How do you achieve zero copy efficiencies in this scenario?
EDIT on Nov 29
my question was predicated on the implicit belief that a C/C++/C#-style unsafe static cast is a useful construct, as if this were self-evident. However, on second thought, this kind of cast is not idiomatic in F#, since it is inherently an imperative language technique fraught with peril. For this reason I've accepted the answer by V.B., where SBE/FlatBuffers data access is promulgated as best practice.
A pure F# approach for conversion
open System.Runtime.InteropServices

let convertByteArrayToStruct<'a when 'a : struct> (byteArr : byte[]) =
    // Pin the array so the GC cannot move it while we read it through a raw pointer.
    let handle = GCHandle.Alloc(byteArr, GCHandleType.Pinned)
    let structure = Marshal.PtrToStructure (handle.AddrOfPinnedObject(), typeof<'a>)
    handle.Free()
    structure :?> 'a
This is a minimal example, but I'd recommend introducing some checks on the length of the byte array because, as written, it will produce undefined results if you give it a byte array which is too short. You could check against Marshal.SizeOf(typeof<'a>).
There is no pure F# solution to do a less safe conversion than this (and this is already an approach prone to runtime failure). Alternative options could include interop with C# to use unsafe and fixed to do the conversion.
Ultimately, though, you are asking for a way to subvert the F# type system, which is not really what the language is designed for. One of the principal advantages of F# is the power of its type system and its ability to help you produce statically verifiable code.
F# and very low-level performance optimizations are not best friends, but then... some smart people do magic even with Java, which doesn't have value types and real generic collections for them.
1) I am a big fan of the flyweight pattern lately. If your architecture allows for it, you could wrap a byte array and access struct members via offsets (see the C++ sketch at the end of this answer). A C# example here. SBE/FlatBuffers even have tools to generate a wrapper automatically from a definition.
2) If you could stay within an unsafe context in C# to do the work, pointer casting is very easy and efficient. However, that requires pinning the byte array and keeping its handle for later release, or staying within a fixed block. If you have many small arrays without a pool, you could have problems with the GC.
3) The third option is abusing the .NET type system and casting a byte array with IL like this (this could be coded in F#, if you insist :) ):
static T UnsafeCast(object value) {
ldarg.1 //load type object
ret //return type T
}
I tried this option and even have a snippet somewhere if you need it, but this approach makes me uncomfortable because I do not understand its consequences for the GC. We would have two objects backed by the same memory; what happens when one of them is GCed? I was going to ask a new question on SO about this detail and will post it soon.
The last approach could be good for arrays of structs, but for a single struct it will box it or copy it anyway. Since structs live on the stack and are passed by value, you will probably get better results just by casting a pointer to byte[] in unsafe C#, or by using Marshal.PtrToStructure as in the other answer here, and then copying by value. Copying is not the worst thing, especially on the stack, but allocation of new objects and GC pressure are the enemy, so you need pooled byte arrays, and that will add much more to overall performance than the struct casting issue.
But if your struct is very big, option 1 could still be better.
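As a rough illustration of option 1 (the C# example link isn't reproduced here), a C++ sketch of such a wrapper might look like this; the field names and offsets mirror the PktHeader in the question and are otherwise assumptions:
#include <cstddef>
#include <cstdint>
#include <cstring>

// A non-owning view over a received packet buffer: fields are read at fixed
// offsets, so nothing is copied or cast.
struct PktView
{
    const std::uint8_t* data;

    std::uint16_t field1() const { return read16(0); }
    std::uint16_t field2() const { return read16(2); }
    std::uint32_t field3() const { return read32(4); }

private:
    std::uint16_t read16(std::size_t off) const
    {
        std::uint16_t v;
        std::memcpy(&v, data + off, sizeof v);   // memcpy sidesteps alignment issues
        return v;
    }
    std::uint32_t read32(std::size_t off) const
    {
        std::uint32_t v;
        std::memcpy(&v, data + off, sizeof v);
        return v;
    }
};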

Are there any optimisations for retrieving the return value in Delphi?

I'm trying to find an elegant way to access the fields of some objects in another part of my program, by using a record that stores a byte and exposes the fields of another record through functions with the same names as that record's fields.
TAilmentP = Record // actually a number but acts like a pointer
private
  Ordinal: Byte;
public
  function Name: String; inline;
  function Description: String; inline;
  class operator Implicit (const Number: Byte): TAilmentP; inline;
End;

TSkill = Class
  Name: String;
  Power: Word;
  Ailment: TAilmentP;
End;

class operator TAilmentP.Implicit (const Number: Byte): TAilmentP;
begin
  Result.Ordinal := Number;
  ShowMessage (IntToStr (Integer (@Result))); // for release builds
end;

function StrToAilment (const S: String): TAilmentP; // inside same unit
var
  i: Byte;
begin
  for i := 0 to Length (Ailments) - 1 do
    if Ailments [i].Name = S then
    begin
      ShowMessage (IntToStr (Integer (@Result))); // for release builds
      Result := i; // uses the Implicit operator
      Exit;
    end;
  raise Exception.Create ('"' + S + '" is not a valid Ailment');
end;
Now, I was trying to make my life easier by overloading the conversion operator so that when I try to assign a byte to a TAilmentP object, it assigns that to the Ordinal field.
However, as far as I've checked, this attempt is actually costly in terms of performance, since any call to the implicit "operator" will create a new TAilmentP object for the return value, do its business, and then return the value and make a byte-wise copy back into the object being assigned to, as the addresses differ.
My code calls this method quite a lot, to be honest, and it seems like this is slower than just assigning my value directly to the Ordinal field of my object.
Is there any way to make my program actually assign the value directly to my field through the use of ANY method/function? Even inlining doesn't seem to work. Is there a way to return a reference to a (record) variable, rather than an object itself?
Finally (and sorry for being off topic a bit), why is operator overloading done through static functions? Wouldn't making them instance methods make it faster since you can access object fields without dereferencing them? This would really come in handy here and other parts of my code.
[EDIT] This is the assembler code for the Implicit operator with all optimizations on and no debugging features (not even "Debug Information" for breakpoints).
add al, [eax] /* function entry */
push ecx
mov [esp], al /* copies Byte parameter to memory */
mov eax, [esp] /* copies stored Byte back to register; function exit */
pop edx
ret
What's even funnier is that the next function has a mov eax, eax instruction at start-up. Now that looks really useful. :P Oh yeah, and my Implicit operator didn't get inlined either.
I'm pretty much convinced [esp] is the Result variable, as it has a different address than what I'm assigning to. With optimizations off, [esp] is replaced with [ebp-$01] (what I'm assigning to) and [ebp-$02] (the Byte parameter), one more instruction is added to move [ebp-$02] into AL (which then puts it in [ebp-$01]), and the redundant mov instruction is still there with [ebp-$02].
Am I doing something wrong, or does Delphi not have return-value optimizations?
Return types — even records — that will fit in a register are returned via a register. It's only larger types that are internally transformed into "out" parameters that get passed to the function by reference.
The size of your record is 1. Making a copy of your record is just as fast as making a copy of an ordinary Byte.
The code you've added for observing the addresses of your Result variables is actually hurting the optimizer. If you don't ask for the address of the variable, then the compiler is not required to allocate any memory for it. The variable could exist only in a register. When you ask for the address, the compiler needs to allocate stack memory so that it has an address to give you.
Get rid of your "release mode" code and instead observe the compiler's work in the CPU window. You should be able to observe how your record exists primarily in registers. The Implicit operator might even compile down to a no-op since the input and output registers should both be EAX.
Whether operators are instance methods or static doesn't make much difference, especially not in terms of performance. Instance methods still receive a reference to the instance they're called on. It's just a matter of whether the reference has a name you choose or whether it's called Self and passed implicitly. Although you wouldn't have to write "Self." in front of your field accesses, the Self variable still needs to get dereferenced just like the parameters of a static operator method.
All I'll say about optimizations in other languages is that you should look up the term named return-value optimization, or its abbreviation NRVO. It's been covered on Stack Overflow before. It has nothing to do with inlining.
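For reference, the C++ shape that NRVO applies to looks roughly like this; the named local can be constructed directly in the caller's storage, so no copy is made on return, and no inlining is involved:
#include <string>

std::string makeGreeting(const std::string& name)
{
    std::string result = "Hello, " + name;   // named local return value
    return result;                           // eligible for named return-value optimization
}

int main()
{
    std::string s = makeGreeting("world");   // may be constructed in place, no copy back
    return s.empty() ? 1 : 0;
}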
Delphi is supposed to optimize return assignment by using pointers. This is also true for C++ and other OOP compiled languages. I stopped writing Pascal before operator overloading was introduced, so my knowledge is a bit dated. What follows is what I would try:
What I'm thinking is this... can you create an object on the heap (use New) and pass a pointer back from your "Implicit" method? This should avoid unnecessary overhead, but will cause you to deal with the return value as a pointer. Overload your methods to deal with pointer types?
I'm not sure if you can do this with the built-in operator overloading. Like I mentioned, overloading is something I wanted in Pascal for nearly a decade and never got to play with. I think it's worth a shot. You might need to accept that you'll have to kill your dreams of elegant type casting.
There are some caveats with inlining. You probably already know that the hint is disabled (by default) for debug builds. You need to be in release mode to profile / benchmark or modify your build settings. If you haven't gone into release mode (or altered build settings) yet, it's likely that your inline hints are being ignored.
Be sure to use const to hint the compiler to optimize further. Even if it doesn't work for your case, it's a great practice to get into. Marking what should not change will prevent all kinds of disasters... and additionally give the compiler the opportunity to aggressively optimize.
Man, I wish I knew by now whether Delphi allows cross-unit inlining, but I simply don't. Many C++ compilers only inline within the same source file, unless you put the code in the header (headers have no equivalent in Pascal). It's worth a search or two. Try to make inlined functions / methods local to their callers, if you can. It'll at least help compile time, if not more.
All out of ideas. Hopefully, this meandering helps.
Now that I think about it, maybe it is absolutely necessary to have the return value in a separate memory location and copied back into the one being assigned to.
I'm thinking of the cases where the return value may need to be deallocated - for example, calling a function that accepts a TAilmentP parameter with a Byte value. I don't think you can assign directly to the function's parameter, since it hasn't been created yet, and fixing that would break the normal, established way of generating function calls in assembler (i.e. trying to access a parameter's fields before it's created is a no-no, so you have to create that parameter first, assign to it what you have to assign OUTSIDE a constructor, and only then call the function).
This is especially true and obvious for the other operators (with which you could evaluate expressions and thus need to create temporary objects), just not so much for this one, since you'd think it's like the assignment operator in other languages (like in C++, where it can be an instance member), but it's actually much more than that: it's a constructor as well.
For example
procedure ShowAilmentName (Ailment: TAilmentP);
begin
  ShowMessage (Ailment.Name);
end;

[...]

begin
  ShowAilmentName (5);
end.
Yes, the implicit operator can do that too, which is quite cool. :D
In this case, I'm thinking that 5, like any other Byte, would be converted into a TAilmentP (as in creating a new TAilmentP object based on that Byte) through the implicit operator, the object then being copied byte-wise into the Ailment parameter; the function body is then entered and does its job, and on return the temporary TAilmentP object obtained from the conversion is destroyed.
This is even more obvious if Ailment were const, since it would then have to be a reference, and a constant one too (no modifying after the function is called).
In C++, the assignment operator would have no business with function calls. Instead, one would use a converting constructor for TAilmentP which accepts a Byte parameter. The same can be done in Delphi, and I suspect it would take precedence over the implicit operator. The difference is in how the other direction, down-conversion to primitive types (Byte, Integer, etc.), is expressed: Delphi does it with the same kind of class operator, while C++ needs a separate member conversion operator. Either way, a procedure like "procedure ShowAilmentName (Number: Byte);" can accept a call like "ShowAilmentName (SomeAilment)" in Delphi thanks to that one Implicit operator.
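For comparison, a rough C++ counterpart of that pairing (all names are made up to mirror the Delphi code above):
#include <cstdint>
#include <iostream>
#include <string>

struct AilmentP
{
    std::uint8_t ordinal;

    AilmentP(std::uint8_t n) : ordinal(n) {}           // converting constructor, like Implicit Byte -> TAilmentP
    operator std::uint8_t() const { return ordinal; }  // down-conversion to a primitive type

    std::string name() const { return "Ailment #" + std::to_string(int(ordinal)); }
};

void showAilmentName(AilmentP ailment) { std::cout << ailment.name() << '\n'; }
void showOrdinal(std::uint8_t n)       { std::cout << int(n) << '\n'; }

int main()
{
    showAilmentName(5);   // Byte -> AilmentP via the converting constructor
    AilmentP burn(3);
    showOrdinal(burn);    // AilmentP -> Byte via the conversion operator
}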
So, I guess this is a side-effect of the Implicit operator also acting like a constructor, and that is necessary since records cannot be forward-declared (so you could not convert both ways between two records using constructors alone).
Anyone else think this might be the cause?

Delphi; performance of passing const strings versus passing var strings

Quick one; am I right in thinking that passing a string to a method 'as a CONST' involves more overhead than passing a string as a 'VAR'? The compiler will get Delphi to make a copy of the string and then pass the copy, if the string parameter is declared as a CONST, right?
The reason for the question is a bit tedious; we have a legacy Delphi 5 utility whose days are truly numbered (the replacement is under development). It does a large amount of string processing, frequently passing 1-2 KB strings between various functions and procedures. Throughout the code, the 'correct' convention of using CONST or VAR to pass parameters (depending on the job in hand) has been adhered to. We're just looking for a few 'quick wins' that might shave a few microseconds off the execution time, to tide us over until the new version is ready. We thought of changing the memory manager from the default Delphi 5 one to FastMM, and we also wondered if it was worth altering the way the strings are passed around - because the code is working fine with the strings passed as const, we don't see a problem with changing those declarations to var; the code within those methods isn't going to change the strings.
But would it really make any difference in real terms? (The program really just does a large amount of processing on these 1kb+ish strings; several hundred strings a minute at peak times). In the re-write these strings are being held in objects/class variables, so they're not really being copied/passed around in the same way at all, but in the legacy code it's very much 'old school' pascal.
Naturally we'll profile an overall run of the program to see what difference we've made but there's no point in actually trying this if we're categorically wrong about how the string-passing works in the first instance!
No, there shouldn't be any performance difference between using const or var in your case. In both cases a pointer to the string is passed as the parameter. If the parameter is const the compiler simply disallows any modifications to it. Note that this does not preclude modifications to the string if you get tricky:
procedure TForm1.Button1Click(Sender: TObject);
var
  s: string;
begin
  s := 'foo bar baz';
  UniqueString(s);
  SetConstCaption(s);
  Caption := s;
end;

procedure TForm1.SetConstCaption(const AValue: string);
var
  P: PChar;
begin
  P := PChar(AValue);
  P[3] := '?';
  Caption := AValue;
end;
This will actually change the local string variable in the calling method, proof that only a pointer to it is passed.
But definitely use FastMM4, it should have a much bigger performance impact.
const for parameters in Delphi essentially means "I'm not going to mutate this, and I also don't care if this is passed by value or by reference - whichever is most efficient is fine by me". The bolded part is important, because it is actually observable. Consider this code:
type
  TFoo = record
    x: integer;
    //dummy: array[1..10] of integer;
  end;

procedure Foo(var x1: TFoo; const x2: TFoo);
begin
  WriteLn(x1.x);
  WriteLn(x2.x);
  Inc(x1.x);
  WriteLn;
  WriteLn(x1.x);
  WriteLn(x2.x);
end;

var
  x: TFoo;
begin
  Foo(x, x);
  ReadLn;
end.
The trick here is that we pass the same variable both as var and as const, so that our function can mutate via one argument, and see if this affects the other. If you try it with code above, you'll see that incrementing x1.x inside Foo doesn't change x2.x, so x2 was passed by value. But try uncommenting the array declaration in TFoo, so that its size becomes larger, and running it again - and you'll see how x2.x now aliases x1.x, so we have pass-by-reference for x2 now!
To sum it up, const is always the most efficient way to pass parameter of any type, but you should not make any assumptions about whether you have a copy of the value that was passed by the caller, or a reference to some (potentially mutated by other code that you may call) location.
This is really a comment, but a long one so bear with me.
About 'so-called' string passing by value
Delphi always passes string and AnsiString (WideStrings and ShortStrings excluded) by reference, as a pointer.
So strings are never passed by value.
This can easily be tested by passing 100 MB strings around.
As long as you don't change them inside the body of the called routine, string passing takes O(1) time (and with a small constant at that).
However, when passing a string without a var or const clause, Delphi does three things:
1. It increases the reference count of the string.
2. It puts an implicit try-finally block around the procedure, so the reference count of the string parameter gets decreased again when the method exits.
3. When the string gets changed (and only then), Delphi makes a copy of the string, decreases the reference count of the passed string and uses the copy in the rest of the routine.
It fakes a pass by value in doing this.
About passing by reference (pointer)
When the string is passed as a const or var, Delphi also passes a reference (pointer), however:
The reference count of the string does not increase. (tiny, tiny speed increase)
No implicit try/finally is put around the routine, because it is not needed. This is part 1 of why const/var string parameters execute faster.
When the string is changed inside the routine, no copy is made; the actual string is changed. For const parameters the compiler prohibits string alterations. This is part 2 of why var/const string parameters work faster.
If, however, you need to create a local variable to assign the string to, Delphi copies the string :-) and places an implicit try/finally block, eliminating 99%+ of the speed gain of a const string parameter.
Hope this sheds some light on the issue.
Disclaimer: Most of this info comes from here, here and here
The compiler won't make a copy of the string when using const, AFAIK. Using const saves you the overhead of incrementing/decrementing the reference counter for the string that you use.
You will get a bigger performance boost by upgrading the memory manager to FastMM, and, because you do a lot with strings, consider using the FastCode library as well.
Const is already the most efficient way of passing parameters to a function. It avoids creating a copy (default, by value) or even passing a pointer (var, by reference).
It is particularly true for strings and was indeed the way to go when computing power was limited and not to be wasted (hence the "old school" label).
IMO, const should have been the default convention, leaving it up to the programmer to change it when passing by value or by var is really needed. That would have been more in line with the overall safety of Pascal (as in limiting the opportunities for shooting oneself in the foot).
My 2¢...
