Flash Memory Management

Flash Memory Management - memory

I'm collecting data on an ARM Cortex M4 based evaluation kit in a remote location and would like to log the data to persistent memory for access later.
I would be logging roughly 300 bytes once every hour, and would want to come collect all the data with a PC after roughly 1 week of running.
I understand that I should attempt to minimize the number of writes to flash, but I don't have a great understanding of the best way to do this. I'm looking for a resource that would explain memory management techniques for this kind of situation.
I'm using the ADUCM350 which looks like it has 3 separate flash sections (128kB, 256kB, and a 16kB eeprom).

For logging applications the simplest and most effective wear leveling tactic is to treat the entire flash array as a giant ring buffer.
define an entry size to be some integer fraction of the smallest erasable flash unit. Say a sector is 4K(4096 bytes); let the entry size be 256.
This is to make all log entries be sector aligned and will allow you to erase any sector without cuting a log entry in half.
At boot, walk the memory and find the first empty entry. this is the 'write_pointer'
when a log entry is written, simply write it to write_pointer and increment write_pointer.
If write_pointer is on a sector boundary erase the sector at write_pointer to make room for the next write. essentially this guarantees that there is at least one empty log entry for you to find at boot and allows you to restore the write_pointer.
if you dedicate 128KBytes to the log entries and have an endurance of 20000 write/erase cycles. this should give you a total of 10240000 entries written before failure. or 1168 years of continuous logging...

Related

Time required to access the memory locations in the same cache line

Consider the big box in the following figure as a cache and the block as a single cache line inside the cache.
The CPU fetched the data (first 4 elements of the array A) from RAM into the cache block.
Now, my question is, does it takes exactly same time to perform read/write operations on all the 4 memory locations (A[0], A[1], A[2] and A[3]) in the cache block or is it approximately same?
PS: I am expecting an answer for ideal case where runtime to perform any read/write operation on any memory location is not affected by the operating system jitter on user processes or applications.

With the line already hot in cache, time is constant for access to any aligned word in the cache. The hardware that handles the offset-within-line part of an address doesn't have to iterate through to the right position or anything, it just MUXes those bytes to the output.
If the line was not already hot in cache, then it depends on the design of the cache. If the CPU doesn't transfer around whole lines at once over a wide bus, then one / some words of the line will arrive before others. A cache that supports early-restart can let the load complete as soon as the needed word arrives.
A critical-word-first bus and memory allow that word to be the first one transferred for a demand-miss. Otherwise they arrive in some fixed order, and a cache miss on the last word of the line could take an extra few cycles.
Related:
Does cacheline size affect memory access latency?
if cache miss happens, the data will be moved to register directly or first moved to cache then to register?
which is optimal a bigger block cache size or a smaller one?

Storing and loading 2d infinite procedurally generated tile based world

Background:
I am working on a 2d infinite world generation. It is tile based meaning my terrain is fully made out of squares. You can imagine it like 2d Minecraft (looking at the terrain from above).
I implemented standard chunk system where the terrain gets chopped into small 8x8 tile areas that get loaded and deleted as player moves around the world. This, so far mentioned, works perfectly smooth without any hiccups or lag. I am using Lua and Corona SDK.
The problem:
Since the player will be able to modify the terrain, I need a fast and efficient system of saving chunks in memory once the player loads a new chunk and a system of loading those chunks from memory if they have been loaded previously.
This is where the problem takes place. It needs to read from and save to files (memory) quite often which causes noticeable lag. Making chunks bigger is not an option.
Solutions I tried but all caused lag:
a) First and obvious solution I implemented was to just create a text file for each chunk with tile names as strings. It looked something like this: x12y10.txt and inside the file I just dumped all tile names in order they need to be placed on screen: "Grass Grass Water Sand Sand Sand Grass Grass...". That worked but loading strings was slow so I tried another solution: save tiles as indexes.
b) Saving tiles as their indexes. I paired every tile to a number. Since numbers are shorter, they take less memory and are faster to load. I gave each tile it's own index: Grass -> id 1, Water -> id 2, Sand -> id 3 and so on. This way I only needed to save 1 or 2 chars instead of full string per tile. My txt files looked like this now: "1 1 2 3 3 3 1 1...". This worked better but still caused lag.
c) Next improvement I did was with how chunks are organized in memory. Instead of dumping all the chunks in a single folder, for each x coordinate I made a folder and put all chunks that have that x value in there.
So instead of this:
Folder with all chunks: x0y0.txt, x0y1.txt, x0y2.txt, x1y0.txt, x1y1.txt, x1y2.txt
Inside folder with all chunks I had this:
Folder x0: x0y0.txt, x0y1.txt, x0y2.txt
Folder x1: x1y0.txt, x1y1.txt, x1y2.txt
I am not sure how much this helped for small number of chunks, but I am pretty sure for thousands of chunks, improvement is there.
Possible solutions?
I have some ideas for improvements, but I would like to hear your opinion on the solutions.
a) Saving terrain in binary files?
b) I have read about Minecraft region format, really tried to understand how it works, but did not get it since there is little information about it. So if anyone knows it and could explain their system to me, I would be really grateful.
c) Another faster file format?
d) Is making/accessing many folders slow? Is there a better alternative?

I really feel like this is cs-101 question, but cannot google up any answer right away so quick summary.
All files are just sequences of bytes. If we're talking about reading and writing raw bytes, no format will make 64 bytes appear in memory faster than another.
Text file is a sequence of bytes with slight limitations on their values (well, the limitation is if you want standard text programs to display it). A string "11" (sequence of bits: 110001110001) from a text file won't be loaded faster than sequence of unprintable bits 100000100000 from "binary" file.
Structuring directories at the very least reduces the number of nodes system checks when trying the file you've requested to open. But mechanisms underlying the filesystems are very complex and affected by a lot of factors. The overall guess is that frequent reading even of small files will be slow. And all files carry some stockpiling overhead (system info to keep them tracked and ordered), small files will have lower useful/auxiliary info ratio. I know of at least one 2d project with mutable map that was making hdds growl and grunt before they moved onto bigger files years ago.
You don't have to make chunks bigger, that's different thing, but you can write them into the same file.
Instead of million of files by 64 bytes you can have a single megabyte file (assuming you use a byte per chunk). A million chunks is lot for a player to modify or walk around. If you unpack that data to tables, it will take up more space but you don't have to decipher all the string, only the currently needed bytes. Yes, modifying a megabyte string in lua will cause creation of another megabyte string which is slow, but you don't have to do it every time, or you can split string into smaller ones and modify those. And only do writing when needed. I/O bufferization may even happen without your intervention but again it is usually helpful for big files.
Yes there will be more than a byte of info per tile (2^8 possible states per tile is a lot however), the system stays the same.
The same thing is done for textures, because loading data in a single big chunk in a single big scoop is faster than searching around for tiny bit here and there. Indexing a single long area of memory is also faster than chasing pointers around.
On top of that, you may try to read\write less bytes than you want in the memory. For example by compressing data.

In minecraft chunks are not stored unless they have been visited / modified, otherwise they are generated.
That would leave you a system where only blocks which have been modified by the player would need to be stored, with the un-modified areas being re-generated by using the same random seed, each time.
Creating a hierarchy of modifications ... A chunk is an 8x8 block, create a super-chunk which is 8x8 chunks, and only look for a file, if any of the 8x8 super-chunk has been modified.
Possibly store all of the super-chunk in one file, which would limit the number of files (adding more files does decrease the speed of the system, and also uses space on the system inefficiently).
If you have any spare time-space, perhaps have a cache of the chunks near the player, and pre-load the modified areas which are being approached. This would limit the visible lag required

Tracking address when writing to flash

My system needs to store data in an EEPROM flash. Strings of bytes will be written to the EEPROM one at a time, not continuously at once. The length of strings may vary. I want the strings to be saved in order without wasting any space by continuing from the last write address. For example, if the first string of bytes was written at address 0x00~0x08, then I want the second string of bytes to be written starting at address 0x09.
How can it be achieved? I found that some EEPROM's write command does not require the address to be specified and just continues from lastly written point. But EEPROM I am using does not support that. (I am using Spansion's S25FL1-K). I thought about allocating part of memory to track the address and storing the address every time I write, but that might wear out flash faster. What is widely used method to handle such case?
Thanks.
EDIT:
What I am asking is how to track/save the address in a non-volatile way so that when next write happens, I know what address to start.

I never worked with this particular flash, but I've implemented something similar. Unfortunately, without knowing your constrains / priorities (memory or CPU efficient, how often write happens etc.) it is impossible to give a definite answer. Here are some techniques that you may want to consider. I don't know if they are widely used though.
Option 1: Write X bytes containing string length before the string. Then on initialization you could parse your flash: read the length n, jump n bytes forward; read the next byte. If it's empty (all ones for your flash according to the datasheet) then you got your first empty bit. Otherwise you've just read the length of the next string, so do the same over again.
This method allows you to quickly search for the last used sector, since the first byte of the used sector is guaranteed to have a value. The flip side here is overhead of extra n bytes (depending on the max string length) each time you write a string, and having to parse it to get the value (although this can only be done once on boot).
Option 2: Instead of prepending the size, append the unique "end-of-string" sequence, and then parse on boot for the last sequence before ones that represent empty flash.
Disadvantage here is longer parse, but you possibly could get away with just 1 byte-long overhead for each string.
Option 3 would be just what you already thought of: allocating a separate sector that would contain the value you need. To reduce flash wear you could also write these values back-to-back and search for the last one each time you boot. Also, you might consider the expected lifetime of the device that you program versus 100,000 erases that your flash can sustain (again according to the datasheet) - is wearing even a problem? That of course depends on how often data will be saved.
Hope that helps.

How is RAM able to acess any place in memory at O(1) speed

We are taught that the abstraction of the RAM memory is a long array of bytes. And that for the CPU it takes the same amount of time to access any part of it. What is the device that has the ability to access any byte out of the 4 gigabytes (on my computer) in the same time? As this does not seem as a trivial task for me.
I have asked colleagues and my professors, but nobody can pinpoint to the how this task can be achieved with simple logic gates, and if it isn't just a tricky combination of logic gates, then what is it?
My personal guess is that you could achieve the access of any memory in O(log(n)) speed, where n would be the size of memory. Because each gate would split the memory in two and send you memory access instruction to the next split the memory in two gate. But that requires ALOT of gates. I can't come up with any other educated guess, and I don't even know the name of the device that I should look up in Google.
Please help my anguished curiosity, and thanks in advance.
edit<
This is what I learned!
quote from yours "the RAM can send the value from cell addressed X to some output pins", here is where everyone skip (again) the thing that is not trivial for me. The way that I see it, In order to build a gate that from 64 pins decides which byte out of 2^64 to get, each pin needs to split the overall possible range of memory into two. If bit at index 0 is 0 -> then the address is at memory 0-2^64/2, else address is at memory 2^64/2-2^64. And so on, However the amout of gates (lets call them) that the memory fetch will go through will be 64, (a constant). However the amount of gates needed is N, where N is the number of memory bytes there are.
Just because there is 64 pins, it doesn't mean that you can still decode it into a single fetch from a range of 2^64. Does 4 gigabytes memory come with a 4 gigabytes gates in the memory control???
now this can be improved, because as I read furiously more and more about how this memory is architectured, if you place the memory into a matrix with sqrt(N) rows and sqrt(N) columns, the amount of gates that a fetch memory will need to go through is O(log(sqrt(N)*2) and the amount of gates that will be required will be 2*sqrt(N), which is much better, and I think that its probably a trade secret.
/edit<

What the heck, I might as well make this an answer.
Yes, in the physical world, memory access cannot be constant time.
But it cannot even be logarithmic time. The O(log n) circuit you have in mind ultimately involves some sort of binary (or whatever) tree, and you cannot make a binary tree with constant-length wires in a 3D universe.
Whatever the "bits per unit volume" capacity of your technology is, storing n bits requires a sphere with radius O(n^(1/3)). Since information can only travel at the speed of light, accessing a bit at the other end of the sphere requires time O(n^(1/3)).
But even this is wrong. If you want to talk about actual limitations of our universe, our physics friends say the absolute maximum number of bits you can store in any sphere is proportional to the sphere's surface area, not its volume. So the actual radius of a minimal sphere containing n bits of information is O(sqrt(n)).
As I mentioned in my comment, all of this is pretty much moot. The models of computation we generally use to analyze algorithms assume constant-access-time RAM, which is close enough to the truth in practice and allows a fair comparison of competing algorithms. (Although good engineers working on high-performance code are very concerned about locality and the memory hierarchy...)

Let's say your RAM has 2^64 cells (places where it is possible to store a single value, let's say 8-bit). Then it needs 64 pins to address every cell with a different number. When at the input pins of your RAM there 'appears' a binary number X the RAM can send the value from cell addressed X to some output pins, and your CPU can get the value from there. In hardware the addressing can be done quite easily, for example by using multiple NAND gates (such 'addressing device' from some logic gates is called a decoder).
So it is all happening at the hardware-level, this is just direct addressing. If the CPU is able to provide 64 bits to 64 pins of your RAM it can address every single memory cell (as 64 bit is enough to represent any number up to 2^64 -1). The only reason why you do not get the value immediately is a kind of 'propagation time', so time it takes for the signal to go through all the logic gates in the circuit.

The component responsible for dealing with memory accesses is the memory controller. It is used by the CPU to read from and write to memory.
The access time is constant because memory words are truly layed out in a matrix form (thus, the "byte array" abstraction is very realistic), where you have rows and columns. To fetch a given memory position, the desired memory address is passed on to the controller, which then activates the right column.
From http://computer.howstuffworks.com/ram1.htm:
Memory cells are etched onto a silicon wafer in an array of columns
(bitlines) and rows (wordlines). The intersection of a bitline and
wordline constitutes the address of the memory cell.
So, basically, the answer to your question is: the memory controller figures it out. Of course that, given a memory address, the mapping to column and row must be easy to calculate in a constant time.
To fully understand this topic, I recommend you to read this guide on how memory works: http://computer.howstuffworks.com/ram.htm
There are so many concepts to master that it is difficult to explain it all in one answer.

I've been reading your comments and questions until I answered. I think you are on the right track, but there is some confusion here. The random access in which you are implying doesn't exist in the same way you think it does.
Reading, writing, and refreshing are done in a continuous cycle. A particular cell in memory is only read or written in a certain interval if a signal is detected to do so in that cycle. There is going to be support circuitry that includes "sense amplifiers to amplify the signal or charge detected on a memory cell."
Unless I am misunderstanding what you are implying, your confusion is in how easy it is to read/write to a cell. It's different dependent on chip design but there IS a minimum number of cycles it takes to read or write data to a cell.
These are my sources:
http://www.doc.ic.ac.uk/~dfg/hardware/HardwareLecture16.pdf
http://www.electronics.dit.ie/staff/tscarff/memory/dram_cycles.htm
http://www.ece.cmu.edu/~ece548/localcpy/dramop.pdf
To avoid a humungous answer, I left most of the detail out but all three of these will describe the process you are looking for.

What does it mean by buffer?

I see the word "BUFFER" everywhere, but I am unable to grasp what it exactly is.
Would anybody please explain what is buffer in layman's language?
When is it used?
How is it used?

Imagine that you're eating candy out of a bowl. You take one piece regularly. To prevent the bowl from running out, someone might refill the bowl before it gets empty, so that when you want to take another piece, there's candy in the bowl.
The bowl acts as a buffer between you and the candy bag.
If you're watching a movie online, the web service will continually download the next 5 minutes or so into a buffer, that way your computer doesn't have to download the movie as you're watching it (which would cause hanging).

The term "buffer" is a very generic term, and is not specific to IT or CS. It's a place to store something temporarily, in order to mitigate differences between input speed and output speed. While the producer is being faster than the consumer, the producer can continue to store output in the buffer. When the consumer gets around to it, it can read from the buffer. The buffer is there in the middle to bridge the gap.
If you average out the definitions at http://en.wiktionary.org/wiki/buffer, I think you'll get the idea.
For proof that we really did "have to walk 10 miles thought the snow every day to go to school", see TOPS-10 Monitor Calls Manual Volume 1, section 11.9, "Using Buffered I/O", at bookmark 11-24. Don't read if you're subject to nightmares.

A buffer is simply a chunk of memory used to hold data. In the most general sense, it's usually a single blob of memory that's loaded in one operation, and then emptied in one or more, Perchik's "candy bowl" example. In a C program, for example, you might have:
#define BUFSIZE 1024
char buffer[BUFSIZE];
size_t len = 0;
// ... later
while((len=read(STDIN, &buffer, BUFSIZE)) > 0)
write(STDOUT, buffer, len);
... which is a minimal version of cp(1). Here, the buffer array is used to store the data read by read(2) until it's written; then the buffer is re-used.
There are more complicated buffer schemes used, for example a circular buffer, where some finite number of buffers are used, one after the next; once the buffers are all full, the index "wraps around" so that the first one is re-used.

Buffer means 'temporary storage'. Buffers are important in computing because interconnected devices and systems are seldom 'in sync' with one another, so when information is sent from one system to another, it has somewhere to wait until the recipient system is ready.

Really it would depend on the context in each case as there is no one definition - but speaking very generally a buffer is an place to temporarily hold something. The best real world analogy I can think of would be a waiting area. One simple example in computing is when buffer refers to a part of RAM used for temporary storage of data.

That a buffer is "a place to store something temporarily, in order to mitigate differences between input speed and output speed" is accurate, consider this as an even more "layman's" way of understanding it.
"To Buffer", the verb, has made its way into every day vocabulary. For example, when an Internet connection is slow and a Netflix video is interrupted, we even hear our parents say stuff like, "Give it time to buffer."
What they are saying is, "Hit pause; allow time for more of the video to download into memory; and then we can watch it without it stopping or skipping."
Given the producer / consumer analogy, Netflix is producing the video. The viewer is consuming it (watching it). A space on your computer where extra downloaded video data is temporarily stored is the buffer.
A video progress bar is probably the best visual example of this:
That video is 5:05. Its total play time is represented by the white portion of the bar (which would be solid white if you had not started watching it yet.)
As represented by the purple, I've actually consumed (watched) 10 seconds of the video.
The grey portion of the bar is the buffer. This is the video data that that is currently downloaded into memory, the buffer, and is available to you locally. In other words, even if your Internet connection where to be interrupted, you could still watch the area you have buffered.

Buffer is temporary placeholder (variables in many programming languages) in memory (ram/disk) on which data can be dumped and then processing can be done.
The term "buffer" is a very generic term, and is not specific to IT or CS. It's a place to store something temporarily, in order to mitigate differences between input speed and output speed. While the producer is being faster than the consumer, the producer can continue to store output in the buffer. When the consumer speeds up, it can read from the buffer. The buffer is there in the middle to bridge the gap.

A buffer is a data area shared by hardware devices or program processes that operate at different speeds or with different sets of priorities. The buffer allows each device or process to operate without being held up by the other. In order for a buffer to be effective, the size of the buffer and the algorithms for moving data into and out of the buffer.
buffer is a "midpoint holding place" but exists not so much to accelerate the speed of an activity as to support the coordination of separate activities.
This term is used both in programming and in hardware. In programming, buffering sometimes implies the need to screen data from its final intended place so that it can be edited or otherwise processed before being moved to a regular file or database.

Buffer is temporary placeholder (variables in many programming languages) in memory (ram/disk) on which data can be dumped and then processing can be done.
There are many advantages of Buffering like it allows things to happen in parallel, improve IO performance, etc.
It also has many downside if not used correctly like buffer overflow, buffer underflow, etc.
C Example of Character buffer.
char *buffer1 = calloc(5, sizeof(char));
char *buffer2 = calloc(15, sizeof(char));

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart