There is a question I have been wondering about for ages and I was hoping someone could give me an answer to rest my mind.
Let's assume that I have an input stream (like a file/socket/pipe) and want to parse the incoming data. Let's assume that each block of incoming data is split by a newline, like most common internet protocols. This application could just as well be parsing html, xml or any other smart data structure. The point is that the data is split into logical blocks by a delimiter rather than a fixed length. How can I buffer the data to wait for the delimiter to appear?
The answer seems simple enough: just have a large enough byte/char array to fit the entire thing.
But what if the delimiter comes after the buffer is full? This is actually a question about how to fit a dynamic block of data in a fixed size block. I can only really think of a few alternatives:
Increase the buffer size when needed. This may require heavy memory reallocation, and may lead to resource exhaustion from specially crafted streams (or perhaps even denial of service in the case of sockets where we want to protect ourselves against exhaustion attacks and drop connections that try to exhaust resources...and an attacker starts sending fake, oversized, packets to trigger the protection).
Start overwriting old data by using a circular buffer. Perhaps not an ideal method since the logical block would become incomplete.
Dump new data when the buffer is full. However, this way the delimiter will never be found, so this choice is obviously not a good option.
Just make the fixed size buffer damn large and assume all incoming logical data blocks is within its bounds...and if it ever fills, just interpret the full buffer as a logical block...
In either case I feel we must assume that the logical blocks will never exceed a certain size...
Any thoughts on this topic? Obviously there must be a way since the higher level languages offer some sort of buffering mechanisms with their readLine() stream methods.
Is there any "best way" to solve this or is there always a tradeoff? I really appreciate all thoughts and ideas on this topic since this question has been haunting me everytime I have needed to write a parser of some sort.
There are normally two techniques for this
1) What I think readline uses - if the buffer fills return the data without the delimiter on the end
2) When the buffer fills, remeber it filled, keep reading until you get the delimiter and report an error (or truncate the record at the buffer size)
Options (2) and (3) are out as you are losing data in both cases. Option (4) of a huge fixed size buffer would not solve the problem as it is just not possible to know what size is large enough? Is it all the physical memory + swap space + the free space available in all disks everywhere in the known universe?
Resizing the buffer looks like the best solution. Say realloc to twice the size and continue writing. There is always a chance of a specially constructed stream like a DoS that tries to bring down the system. My first thought was so set an arbitrarily large size as the max_size for the buffer. However, if we could do that, we could just set that as the size of the large buffer. So, resizing the buffer looks like the best option to me.
If the protocol or you do not define a upper bound for the length of each block then I don't see how you can prevent memory exhaustion edge cases.
Assuming that there is an upper bound using a fixed size block seems like a good approach for reasonably sized limits.
If the limits are high enough that a single fixed buffer will be inefficient then I would suggest using a data structure that is implemented internally as a linked list of fixed size buffers.
Why do you have to wait to start processing?
Generally alternative 4 is sound. It doesn't however, require an "assumption", rather a definition. You simply declare that blocks are smaller than 8K and be done with it. It's not difficult to do.
Further, there's alternative 5: Start processing partial buffers. This works unless you have designed a truly pathological protocol that sends critical data at the very end of the block.
HTML, XML, JSON/YAML, etc., can all be parsed incrementally. You don't require a delimeter to do useful processing.
Related
I am writing an application which sends and receives packages using UDP. However, the documentation of recv_from states:
If a message is too long to fit in the supplied buffer, excess bytes may be discarded.
Is there any way to receive all bytes and write them into a vector? Do I really have to allocate an array with the maximum packet length (which, as far as I know, is 65,507 bytes for IPv4) in order to be sure to receive all data? That seems a bit much for me.
Check out the next method in the docs, UdpSocket::peek_from (emphasis mine):
Receives a single datagram message on the socket, without removing it from the queue.
You can use this method to read a known fixed amount of data, such as a header which contains the length of the entire packet. You can use crates like byteorder to decode the appropriate part of the header, use that to allocate exactly the right amount of space, then call recv_from.
This does require that the protocol you are implementing always provides that total size information at a known location.
Now, is this a good idea?
As ArtemGr states:
Because extra system calls are much more expensive than getting some space from the stack.
And from the linked question:
Obviously at some point you will start wondering if doubling the number of system calls to save memory is worth it. I think it isn't.
With the recent Spectre / Meltdown events, now's a pretty good time to be be reminded to avoid extra syscalls.
You could, as suggested, just allocate a "big enough" array ahead of time. You'll need to track how many bytes you've actually read vs allocated though. I recommend something like arrayvec to make it easier.
You could instead implement a pool of pre-allocated buffers on the heap. When you read from the socket, you use a buffer or create a new one. When you are done with the buffer, you put it back in the pool for reuse. That way, you incur the memory allocation once and are only passing around small Vecs on the stack.
See also:
How can I create a stack-allocated vector-like container?
How large should my recv buffer be when calling recv in the socket library
How to read UDP packet with variable length in C
My system needs to store data in an EEPROM flash. Strings of bytes will be written to the EEPROM one at a time, not continuously at once. The length of strings may vary. I want the strings to be saved in order without wasting any space by continuing from the last write address. For example, if the first string of bytes was written at address 0x00~0x08, then I want the second string of bytes to be written starting at address 0x09.
How can it be achieved? I found that some EEPROM's write command does not require the address to be specified and just continues from lastly written point. But EEPROM I am using does not support that. (I am using Spansion's S25FL1-K). I thought about allocating part of memory to track the address and storing the address every time I write, but that might wear out flash faster. What is widely used method to handle such case?
Thanks.
EDIT:
What I am asking is how to track/save the address in a non-volatile way so that when next write happens, I know what address to start.
I never worked with this particular flash, but I've implemented something similar. Unfortunately, without knowing your constrains / priorities (memory or CPU efficient, how often write happens etc.) it is impossible to give a definite answer. Here are some techniques that you may want to consider. I don't know if they are widely used though.
Option 1: Write X bytes containing string length before the string. Then on initialization you could parse your flash: read the length n, jump n bytes forward; read the next byte. If it's empty (all ones for your flash according to the datasheet) then you got your first empty bit. Otherwise you've just read the length of the next string, so do the same over again.
This method allows you to quickly search for the last used sector, since the first byte of the used sector is guaranteed to have a value. The flip side here is overhead of extra n bytes (depending on the max string length) each time you write a string, and having to parse it to get the value (although this can only be done once on boot).
Option 2: Instead of prepending the size, append the unique "end-of-string" sequence, and then parse on boot for the last sequence before ones that represent empty flash.
Disadvantage here is longer parse, but you possibly could get away with just 1 byte-long overhead for each string.
Option 3 would be just what you already thought of: allocating a separate sector that would contain the value you need. To reduce flash wear you could also write these values back-to-back and search for the last one each time you boot. Also, you might consider the expected lifetime of the device that you program versus 100,000 erases that your flash can sustain (again according to the datasheet) - is wearing even a problem? That of course depends on how often data will be saved.
Hope that helps.
I need to be extremely concerned with speed/latency in my current multi-threaded project.
Cache access is something I'm trying to understand better. And I'm not clear on how lock-free queues (such as the boost::lockfree::spsc_queue) access/use memory on a cache level.
I've seen queues used where the pointer of a large object that needs to be operated on by the consumer core is pushed into the queue.
If the consumer core pops an element from the queue, I presume that means the element (a pointer in this case) is already loaded into the consumer core's L2 and L1 cache. But to access the element, does it not need to access the pointer itself by finding and loading the element either from either the L3 cache or across the interconnect (if the other thread is on a different cpu socket)? If so, would it maybe be better to simply send a copy of the object that could be disposed of by the consumer?
Thank you.
C++ principally a pay-for-what-you-need eco-system.
Any regular queue will let you choose the storage semantics (by value or by reference).
However, this time you ordered something special: you ordered a lock free queue.
In order to be lock free, it must be able to perform all the observable modifying operations as atomic operations. This naturally restricts the types that can be used in these operations directly.
You might doubt whether it's even possible to have a value-type that exceeds the system's native register size (say, int64_t).
Good question.
Enter Ringbuffers
Indeed, any node based container would just require pointer swaps for all modifying operations, which is trivially made atomic on all modern architectures.
But does anything that involves copying multiple distinct memory areas, in non-atomic sequence, really pose an unsolvable problem?
No. Imagine a flat array of POD data items. Now, if you treat the array as a circular buffer, one would just have to maintain the index of the buffer front and end positions atomically. The container could, at leisure update in internal 'dirty front index' while it copies ahead of the external front. (The copy can use relaxed memory ordering). Only as soon as the whole copy is known to have completed, the external front index is updated. This update needs to be in acq_rel/cst memory order[1].
As long as the container is able to guard the invariant that the front never fully wraps around and reaches back, this is a sweet deal. I think this idea was popularized in the Disruptor Library (of LMAX fame). You get mechanical resonance from
linear memory access patterns while reading/writing
even better if you can make the record size aligned with (a multiple) physical cache lines
all the data is local unless the POD contains raw references outside that record
How Does Boost's spsc_queue Actually Do This?
Yes, spqc_queue stores the raw element values in a contiguous aligned block of memory: (e.g. from compile_time_sized_ringbuffer which underlies spsc_queue with statically supplied maximum capacity:)
typedef typename boost::aligned_storage<max_size * sizeof(T),
boost::alignment_of<T>::value
>::type storage_type;
storage_type storage_;
T * data()
{
return static_cast<T*>(storage_.address());
}
(The element type T need not even be POD, but it needs to be both default-constructible and copyable).
Yes, the read and write pointers are atomic integral values. Note that the boost devs have taken care to apply enough padding to avoid False Sharing on the cache line for the reading/writing indices: (from ringbuffer_base):
static const int padding_size = BOOST_LOCKFREE_CACHELINE_BYTES - sizeof(size_t);
atomic<size_t> write_index_;
char padding1[padding_size]; /* force read_index and write_index to different cache lines */
atomic<size_t> read_index_;
In fact, as you can see, there are only the "internal" index on either read or write side. This is possible because there's only one writing thread and also only one reading thread, which means that there could only be more space at the end of write operation than anticipated.
Several other optimizations are present:
branch prediction hints for platforms that support it (unlikely())
it's possible to push/pop a range of elements at once. This should improve throughput in case you need to siphon from one buffer/ringbuffer into another, especially if the raw element size is not equal to (a whole multiple of) a cacheline
use of std::unitialized_copy where possible
The calling of trivial constructors/destructors will be optimized out at instantiation time
the unitialized_copy will be optimized into memcpy on all major standard library implementations (meaning that e.g. SSE instructions will be employed if your architecture supports it)
All in all, we see a best-in-class possible idea for a ringbuffer
What To Use
Boost has given you all the options. You can elect to make your element type a pointer to your message type. However, as you already raised in your question, this level of indirection reduces locality of reference and might not be optimal.
On the other hand, storing the complete message type in the element type could become expensive if copying is expensive. At the very least try to make the element type fit nicely into a cache line (typically 64 bytes on Intel).
So in practice you might consider storing frequently used data right there in the value, and referencing the less-of-used data using a pointer (the cost of the pointer will be low unless it's traversed).
If you need that "attachment" model, consider using a custom allocator for the referred-to data so you can achieve memory access patterns there too.
Let your profiler guide you.
[1] I suppose say for spsc acq_rel should work, but I'm a bit rusty on the details. As a rule, I make it a point not to write lock-free code myself. I recommend anyone else to follow my example :)
I'm writing a package which makes heavy use of buffers internally for temporary storage. I have a single global (but not exported) byte slice which I start with 1024 elements and grow by doubling as needed.
However, it's very possible that a user of my package would use it in such a way that caused a large buffer to be allocated, but then stop using the package, thus wasting a large amount of allocated heap space, and I would have no way of knowing whether to free the buffer (or, since this is Go, let it be GC'd).
I've thought of three possible solutions, none of which is ideal. My question is: are any of these solutions, or maybe ones I haven't thought of, standard practice in situations like this? Is there any standard practice? Any other ideas?
Screw it.
Oh well. It's too hard to deal with this, and leaving allocated memory lying around isn't so bad.
The problem with this approach is obvious: it doesn't solve the problem.
Exported "I'm done" or "Shrink internal memory usage" function.
Export a function which the user can call (and calling it intelligently is obviously up to them) which will free the internal storage used by the package.
The problem with this approach is twofold. First, it makes for a more complex, less clean interface to the user. Second, it may not be possible or practical for the user to know when calling such a function is wise, so it may be useless anyway.
Run a goroutine which frees the buffer after a certain period of the package going unused, or which shrinks the buffer (perhaps halving the length) whenever its size hasn't been increased in a while.
The problem with this approach is primarily that it puts unnecessary strain on the scheduler. Obviously a single goroutine isn't so bad, but if this were accepted practice, it wouldn't scale well if every package you imported were doing this under the hood. Also, if you have a time-sensitive application, you may not want code running when you're not aware of it (that is, you may assume that the package isn't doing any work when its functions are not being called - a reasonable assumption, I'd say).
So... any ideas?
NOTE: You can see the existing project here (the relevant code is only a few tens of lines).
A common approach to this is letting the client pass an existing []byte (or whatever) as an argument to some call/function/method. For example:
// The returned slice may be a sub-slice of dst if dst was large enough
// to hold the entire encoded block. Otherwise, a newly allocated slice
// will be returned. It is valid to pass a nil dst.
func Foo(dst []byte, whatever Bar) (ret []byte, err error)
(Example)
Another approach is to get a new []byte from a, for example cache and/or for example pool (if you prefer the later name for that concept) and rely on clients to return used buffers to such "recycle-bin".
BTW: You're doing it right by thinking about this. Where it's possible to reasonably reuse []byte buffers, there's a potential for lowering the GC load and thus making your program better performing. Sometimes the difference can be critical.
You could reslice your buffer at the end of every operation.
buffer = buffer[:0]
Then your function extendAndSliceBuffer would have the original backing array most likely available if it needs to grow. If not, you would suffer a new allocation, which you might get anyway when you do extendAndSliceBuffer.
Overall, I think a cleaner solution is to do like #jnml said and let the users pass their own buffer if they care about performance. If they don't care about performance, then you should not use a global var and simply allocate the buffer as you need and let it go when it gets out of scope.
I have a single global (but not exported) byte slice which I start
with 1024 elements and grow by doubling as needed.
And there's your problem. You shouldn't have a global like this in your package.
Generally the best approach is to have an exported struct with attached functions. The buffer should reside in this struct unexported. That way the user can instantiate it and let the garbage collector clean it up when they let go of it.
You also want to avoid requiring globals like this as it can hamper unit tests. A unit test should be able to instantiate the exported struct, as the user can, and do it each time for every test.
Also depending on what kind of buffer you need, bytes.Buffer may be useful as it already provides io.Reader and io.Writer functions. bytes.Buffer also automatically grows and shrinks its buffer. In buffer.go you'll see various calls to b.Truncate(0) that does the shrinking with the comment "reset to recover space".
It's generally really really bad form to write Go code that is not thread-safe. If two different goroutines call functions that modify the buffer at the same time, who knows what state the buffer will be in when they finish? Just let the user provide a scratch-space buffer if they decide that the allocation performance is a bottleneck.
I see the word "BUFFER" everywhere, but I am unable to grasp what it exactly is.
Would anybody please explain what is buffer in layman's language?
When is it used?
How is it used?
Imagine that you're eating candy out of a bowl. You take one piece regularly. To prevent the bowl from running out, someone might refill the bowl before it gets empty, so that when you want to take another piece, there's candy in the bowl.
The bowl acts as a buffer between you and the candy bag.
If you're watching a movie online, the web service will continually download the next 5 minutes or so into a buffer, that way your computer doesn't have to download the movie as you're watching it (which would cause hanging).
The term "buffer" is a very generic term, and is not specific to IT or CS. It's a place to store something temporarily, in order to mitigate differences between input speed and output speed. While the producer is being faster than the consumer, the producer can continue to store output in the buffer. When the consumer gets around to it, it can read from the buffer. The buffer is there in the middle to bridge the gap.
If you average out the definitions at http://en.wiktionary.org/wiki/buffer, I think you'll get the idea.
For proof that we really did "have to walk 10 miles thought the snow every day to go to school", see TOPS-10 Monitor Calls Manual Volume 1, section 11.9, "Using Buffered I/O", at bookmark 11-24. Don't read if you're subject to nightmares.
A buffer is simply a chunk of memory used to hold data. In the most general sense, it's usually a single blob of memory that's loaded in one operation, and then emptied in one or more, Perchik's "candy bowl" example. In a C program, for example, you might have:
#define BUFSIZE 1024
char buffer[BUFSIZE];
size_t len = 0;
// ... later
while((len=read(STDIN, &buffer, BUFSIZE)) > 0)
write(STDOUT, buffer, len);
... which is a minimal version of cp(1). Here, the buffer array is used to store the data read by read(2) until it's written; then the buffer is re-used.
There are more complicated buffer schemes used, for example a circular buffer, where some finite number of buffers are used, one after the next; once the buffers are all full, the index "wraps around" so that the first one is re-used.
Buffer means 'temporary storage'. Buffers are important in computing because interconnected devices and systems are seldom 'in sync' with one another, so when information is sent from one system to another, it has somewhere to wait until the recipient system is ready.
Really it would depend on the context in each case as there is no one definition - but speaking very generally a buffer is an place to temporarily hold something. The best real world analogy I can think of would be a waiting area. One simple example in computing is when buffer refers to a part of RAM used for temporary storage of data.
That a buffer is "a place to store something temporarily, in order to mitigate differences between input speed and output speed" is accurate, consider this as an even more "layman's" way of understanding it.
"To Buffer", the verb, has made its way into every day vocabulary. For example, when an Internet connection is slow and a Netflix video is interrupted, we even hear our parents say stuff like, "Give it time to buffer."
What they are saying is, "Hit pause; allow time for more of the video to download into memory; and then we can watch it without it stopping or skipping."
Given the producer / consumer analogy, Netflix is producing the video. The viewer is consuming it (watching it). A space on your computer where extra downloaded video data is temporarily stored is the buffer.
A video progress bar is probably the best visual example of this:
That video is 5:05. Its total play time is represented by the white portion of the bar (which would be solid white if you had not started watching it yet.)
As represented by the purple, I've actually consumed (watched) 10 seconds of the video.
The grey portion of the bar is the buffer. This is the video data that that is currently downloaded into memory, the buffer, and is available to you locally. In other words, even if your Internet connection where to be interrupted, you could still watch the area you have buffered.
Buffer is temporary placeholder (variables in many programming languages) in memory (ram/disk) on which data can be dumped and then processing can be done.
The term "buffer" is a very generic term, and is not specific to IT or CS. It's a place to store something temporarily, in order to mitigate differences between input speed and output speed. While the producer is being faster than the consumer, the producer can continue to store output in the buffer. When the consumer speeds up, it can read from the buffer. The buffer is there in the middle to bridge the gap.
A buffer is a data area shared by hardware devices or program processes that operate at different speeds or with different sets of priorities. The buffer allows each device or process to operate without being held up by the other. In order for a buffer to be effective, the size of the buffer and the algorithms for moving data into and out of the buffer.
buffer is a "midpoint holding place" but exists not so much to accelerate the speed of an activity as to support the coordination of separate activities.
This term is used both in programming and in hardware. In programming, buffering sometimes implies the need to screen data from its final intended place so that it can be edited or otherwise processed before being moved to a regular file or database.
Buffer is temporary placeholder (variables in many programming languages) in memory (ram/disk) on which data can be dumped and then processing can be done.
There are many advantages of Buffering like it allows things to happen in parallel, improve IO performance, etc.
It also has many downside if not used correctly like buffer overflow, buffer underflow, etc.
C Example of Character buffer.
char *buffer1 = calloc(5, sizeof(char));
char *buffer2 = calloc(15, sizeof(char));