How does the Rust `String` type/`read_line` function know how much memory is needed without explicitly being told?

In C, before calling the scanf or gets functions from "stdio.h" to read user input, the programmer has to allocate the memory in which the input will be stored. In Rust, std::io::Stdin::read_line can seemingly be used without any prior manual allocation. All it needs is a mutable String to store the data it reads. How does it do this without knowing how much memory will be required?

Well, if you want a detailed explanation, you can dig a bit into the read_line method, which is part of the BufRead trait. Heavily simplified, the function looks like this:
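use std::io::{self, BufRead};

// Simplified sketch written as a free function. The real method also
// validates UTF-8 across chunk boundaries and returns the byte count.
fn read_line<R: BufRead>(reader: &mut R, target: &mut String) -> io::Result<()> {
    loop {
        // fill_buf fills the internal buffer of the reader (here stdin)
        // and returns a slice reference to whatever part of the buffer was filled.
        // That buffer is actually what you need to allocate in advance in C.
        let available = reader.fill_buf()?;
        match available.iter().position(|&b| b == b'\n') {
            Some(i) => {
                // A '\n' was found, we can extend the string and return.
                target.push_str(&String::from_utf8_lossy(&available[..=i]));
                reader.consume(i + 1);
                return Ok(());
            }
            None => {
                // No '\n' found: stop at end of input, otherwise extend the string.
                let n = available.len();
                if n == 0 {
                    return Ok(());
                }
                target.push_str(&String::from_utf8_lossy(available));
                // Mark the bytes as used so the next fill_buf returns new data.
                reader.consume(n);
            }
        }
    }
}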
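So basically, that method keeps extending the string until it finds a \n character in stdin (or reaches the end of the input). The String owns a growable heap buffer, so each push_str reallocates it whenever more room is needed; that is why no size has to be given up front.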
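To make the growth visible, here is a minimal sketch that reads one line from an in-memory reader standing in for stdin and records the String's capacity before and after the call. The helper name read_one_line is hypothetical and not part of std; Cursor is used only because it implements BufRead.

```rust
use std::io::{BufRead, Cursor};

/// Reads one line from `input` into a fresh String and returns
/// (the line, capacity before the read, capacity after the read).
/// `read_one_line` is a hypothetical helper, not part of std.
fn read_one_line(input: &str) -> std::io::Result<(String, usize, usize)> {
    // Cursor<&[u8]> implements BufRead, so it can stand in for stdin.
    let mut reader = Cursor::new(input.as_bytes());
    // String::new() does not allocate; its capacity starts at 0.
    let mut line = String::new();
    let before = line.capacity();
    // read_line appends to `line`, growing its heap buffer as needed.
    reader.read_line(&mut line)?;
    let after = line.capacity();
    Ok((line, before, after))
}
```

Calling it with a line of 1000 characters followed by a newline should yield a capacity of 0 before the read and a capacity of at least 1001 afterwards, with the trailing newline kept in the returned line.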
If you want to allocate some memory in advance for the String that you pass to read_line, you can create it using String::with_capacity. This will not prevent the String from reallocating if the reserved capacity turns out to be too small, though.
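As a rough illustration of that trade-off, the sketch below pre-allocates the String with a size hint and reports whether read_line still had to grow it. The helper read_with_hint and its hint parameter are assumptions made for this example; only String::with_capacity, capacity and read_line come from std.

```rust
use std::io::{BufRead, Cursor};

/// Reads one line into a String pre-allocated with `hint` bytes and
/// returns the line plus a flag telling whether the buffer had to grow.
/// `read_with_hint` is a hypothetical helper, not part of std.
fn read_with_hint(input: &str, hint: usize) -> std::io::Result<(String, bool)> {
    // An in-memory reader stands in for stdin.
    let mut reader = Cursor::new(input.as_bytes());
    let mut line = String::with_capacity(hint);
    // with_capacity guarantees at least `hint` bytes of capacity.
    let initial = line.capacity();
    reader.read_line(&mut line)?;
    // If the line did not fit, the String reallocated to a larger buffer.
    let grew = line.capacity() > initial;
    Ok((line, grew))
}
```

With a generous hint the line fits and no growth is reported; with a hint smaller than the line, the String grows anyway, so the hint is an optimization rather than a limit.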


Why doesn't File provide an interface that reads the contents as Uint8List data in Dart?

I want to read the contents of a file piece by piece through an interface (instead of reading the whole file at once with readAsBytes()). openRead() seems to do the trick, but it returns a List<int> type. And I expect it to be Uint8List, because I want to do block operations on some of the contents.
If you convert the returned List<int> to Uint8List, it seems to make a copy of the contents, which is a big loss in efficiency.
Is this how it was designed?
Historically Dart used List<int> for sequences of bytes before a more specific Uint8List class was added. A Uint8List is a subtype of List<int>, and in most cases where a Dart SDK function returns a List<int> for a list of bytes, it's actually a Uint8List object. You therefore usually can just cast the result:
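import 'dart:io';
import 'dart:typed_data';

Future<void> main() async {
  var file = File('/path/to/some/file');
  var stream = file.openRead();
  await for (var chunk in stream) {
    // The cast relies on the chunk actually being a Uint8List at runtime.
    var bytes = chunk as Uint8List;
  }
}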
If you are uncomfortable relying on the cast, you can create a helper function that falls back to creating a copy if and only if necessary.
There have been efforts to change the Dart SDK function signatures to use Uint8List types explicitly, and that has happened in some cases (e.g. File.readAsBytes). Such changes would be breaking API changes, so they cannot be done lightly. I don't know why File.openRead was not changed, but it's quite likely that the amount of breakage was deemed to be not worth the effort. (At a minimum, the SDK documentation should be updated to indicate whether it is guaranteed to return a Uint8List object. Also see https://github.com/dart-lang/sdk/issues/39947)
Alternatively, instead of using File.openRead, you could use File.open and then use RandomAccessFile.read, which is declared to return a Uint8List.

How to load/save wxString from/to wxStream or wxMemoryBuffer?

I have my own class (nBuffer), similar to wxMemoryBuffer, that I use to load/save custom data. It's more convenient than using streams because I have a lot of overloaded methods for different data types based on these:
class nBuffer
{ // ...
    bool wr(void* buf, long unsigned int length); // write
    bool rd(void* buf, long unsigned int length); // read
};
I'm trying to implement methods to load/save a wxString from/to this buffer.
With wxWidgets 2.8 I used the following code (simplified):
bool nBuffer::wrString(wxString s)
{ // save string:
int32 lng=s.Length()*4;
wr(&lng,4);// length
wr(s.GetData(),lng);// string itself
return true;
}
bool nBuffer::rdString(wxString &s)
{ // load string:
uint32 lng;
rd(&lng,4);// length
s.Alloc(lng);
rd(s.GetWriteBuf(lng),lng);// string itself
s.UngetWriteBuf();
s=s.Left(lng/4);
return true;
}
This code is not good because:
It assumes there are 4 bytes of data for each string character (there might be fewer).
With wxWidgets 3.0, wxString::GetData() returns wxCStrData instead of a raw character pointer, so the compiler fails on wr(s.GetData(),lng); and I have no idea how to convert it to a simple byte buffer.
Strange, but I found nothing googling that for hours... Also I've found nothing useful in wxWidgets docs.
The questions are:
What is the preferred, correct and safe way to convert a wxString to a byte buffer?
And the same for converting the byte buffer back to a wxString.
For arbitrary wxStrings you need to serialize them in either UTF-8 or UTF-16 format. The former is the de facto standard for data exchange, so I advise using it, but you may prefer UTF-16 if you know that your data is biased towards characters that take less space in it than in UTF-8 and if saving space is important to you.
Assuming you use UTF-8, serializing is done using utf8_str() method:
wxScopedCharBuffer const utf8 = s.utf8_str();
wr(utf8.data(), utf8.length());
Deserializing is as simple as using wxString::FromUTF8(data, length).
For UTF-16 you would use general mb_str(wxMBConvUTF16) and wxString(data, wxMBConvUTF16, length) methods, which could also be used with wxMBConvUTF8, but the UTF-8-specific methods above are more convenient and, in some build configurations, more efficient.

Using CORBA string_dup versus using a pointer to const

There is something I don't get, please enlighten me.
Is there a difference between the following (client side code)?
1) blah = (const char *)"dummy";
2) blah = CORBA::string_dup("dummy");
... just googling a bit I see string_dup() returns a char * so the 2 may be equivalent.
I was thinking 2) does 2 deep copies and not 1.
I'm firing the question anyway now, please briefly confirm.
Thanks!
const char* blah = "dummy";
The C++ compiler generates a constant array of characters, null-terminated, somewhere in a data section of your executable. blah gets a pointer to it.
char* blah = CORBA::string_dup("dummy");
The function string_dup() is called with an argument that is a pointer to that constant array of characters. string_dup() then allocates memory from the free store and copies the string data into the free-store-allocated memory. The pointer to the free-store memory is returned to the caller. It is the caller's job to dispose of the memory when finished with CORBA::string_free(). Technically the ORB implementation is allowed to use some special free-store, but most likely it is just using the standard heap / free-store that the rest of your application is using.
It is often much better to do this:
CORBA::String_var s = CORBA::string_dup("dummy");
The String_var's destructor will automatically call string_free() when s goes out of scope.

How can I load an unnamed function in Lua?

I want users of my C++ application to be able to provide anonymous functions to perform small chunks of work.
Small fragments like this would be ideal.
function(arg) return arg*5 end
Now I'd like to be able to write something as simple as this for my C code,
// Push the function onto the lua stack
lua_xxx(L, "function(arg) return arg*5 end" )
// Store it away for later
int reg_index = luaL_ref(L, LUA_REGISTRY_INDEX);
However, I don't think luaL_loadstring will do "the right thing".
Am I left with what feels to me like a horrible hack?
void push_lua_function_from_string( lua_State * L, std::string code )
{
// Wrap our string so that we can get something useful for luaL_loadstring
std::string wrapped_code = "return "+code;
luaL_loadstring(L, wrapped_code.c_str());
lua_pcall( L, 0, 1, 0 );
}
push_lua_function_from_string(L, "function(arg) return arg*5 end" );
int reg_index = luaL_ref(L, LUA_REGISTRY_INDEX);
Is there a better solution?
If you need access to parameters, the way you have written it is correct. luaL_loadstring compiles the chunk and pushes it onto the stack as a function. If you want to actually get a function back from the code, the chunk has to return it. I also do this (in Lua) for little "expression evaluators", and I don't consider it a "horrible hack" :)
If you only need some callbacks without named parameters, you can write the code directly and use the function produced by luaL_loadstring. You can even pass parameters to this chunk; they are accessible through the ... expression. Then you can get the parameters as:
local arg1, arg2 = ...
-- rest of code
You decide what is better for you - "ugly code" inside your library codebase, or "ugly code" in your Lua functions.
Have a look at my ae. It caches functions from expressions so you can simply say ae_eval("a*x^2+b*x+c") and it'll only compile it once.

BlackBerry J2ME efficient coding guidelines: could somebody elaborate on this?

I found the following code sample in BlackBerry Java Development, Best Practices. Could somebody explain what the sample code below means? What is the this in the code sample pointing to?
Avoiding StringBuffer.append (StringBuffer)
To append a String buffer to another, a BlackBerry® Java Application should use net.rim.device.api.util.StringUtilities.append( StringBuffer dst, StringBuffer src[, int offset, int length ] ).
Code sample
public synchronized StringBuffer append(Object obj) {
    if (obj instanceof StringBuffer) {
        StringBuffer sb = (StringBuffer)obj;
        // Copy all of sb's characters straight into this buffer.
        net.rim.device.api.util.StringUtilities.append( this, sb, 0, sb.length() );
        return this;
    }
    return append(String.valueOf(obj));
}
StringBuffer does not offer an overload for the append() method that takes another StringBuffer. This means developers are likely to use StringBuffer.append(String str) and call .toString() on the second StringBuffer. This requires the second buffer to be turned into a string, which is immutable, and then the characters from the string are appended to the first StringBuffer. Thus every character in the second buffer is touched twice, and there is the unnecessary allocation of the String just to transfer the characters to the first StringBuffer.
The efficient way of doing this would copy each character from the second buffer onto the end of the first. However, StringBuffer does not provide any easy way of doing this. Thus the recommendation is to use StringUtilities.append(StringBuffer, StringBuffer) which is able to directly read the characters from the second buffer without copying them into an intermediate collection.
This saves the runtime of the extra copying, the runtime needed to allocate a temporary String, and the memory needed to allocate a temporary string.
It means that the StringBuffer class is not implemented efficiently for this case. Java Strings are immutable, which is why StringBuffer is used to build up text. However, appending one StringBuffer to another through StringBuffer.append() is not efficient, so you need to use net.rim.device.api.util.StringUtilities instead. That's what the code is doing: encapsulating the use of that class in a new append() method. In the sample, this is the StringBuffer on which append() is called, that is, the destination buffer receiving the characters.
