Tracking address when writing to flash - memory

My system needs to store data in an EEPROM flash. Strings of bytes will be written to the EEPROM one at a time, not continuously at once. The length of strings may vary. I want the strings to be saved in order without wasting any space by continuing from the last write address. For example, if the first string of bytes was written at address 0x00~0x08, then I want the second string of bytes to be written starting at address 0x09.
How can it be achieved? I found that some EEPROM's write command does not require the address to be specified and just continues from lastly written point. But EEPROM I am using does not support that. (I am using Spansion's S25FL1-K). I thought about allocating part of memory to track the address and storing the address every time I write, but that might wear out flash faster. What is widely used method to handle such case?
Thanks.
EDIT:
What I am asking is how to track/save the address in a non-volatile way so that when next write happens, I know what address to start.

I never worked with this particular flash, but I've implemented something similar. Unfortunately, without knowing your constrains / priorities (memory or CPU efficient, how often write happens etc.) it is impossible to give a definite answer. Here are some techniques that you may want to consider. I don't know if they are widely used though.
Option 1: Write X bytes containing string length before the string. Then on initialization you could parse your flash: read the length n, jump n bytes forward; read the next byte. If it's empty (all ones for your flash according to the datasheet) then you got your first empty bit. Otherwise you've just read the length of the next string, so do the same over again.
This method allows you to quickly search for the last used sector, since the first byte of the used sector is guaranteed to have a value. The flip side here is overhead of extra n bytes (depending on the max string length) each time you write a string, and having to parse it to get the value (although this can only be done once on boot).
Option 2: Instead of prepending the size, append the unique "end-of-string" sequence, and then parse on boot for the last sequence before ones that represent empty flash.
Disadvantage here is longer parse, but you possibly could get away with just 1 byte-long overhead for each string.
Option 3 would be just what you already thought of: allocating a separate sector that would contain the value you need. To reduce flash wear you could also write these values back-to-back and search for the last one each time you boot. Also, you might consider the expected lifetime of the device that you program versus 100,000 erases that your flash can sustain (again according to the datasheet) - is wearing even a problem? That of course depends on how often data will be saved.
Hope that helps.

Related

How to convert hexadecimal data (stored in a string variable) to an integer value

Edit (abstract)
I tried to interpret Char/String data as Byte, 4 bytes at a time. This was because I could only get TComport/TDatapacket to interpret streamed data as String, not as any other data type. I still don't know how to get the Read method and OnRxBuf event handler to work with TComport.
Problem Summary
I'm trying to get data from a mass spectrometer (MS) using some Delphi code. The instrument is connected with a serial cable and follows the RS232 protocol. I am able to send commands and process the text-based outputs from the MS without problems, but I am having trouble with interpreting the data buffer.
Background
From the user manual of this instrument:
"With the exception of the ion current values, the output of the RGA are ASCII character strings terminated by a linefeed + carriage return terminator. Ion signals are represented as integers in units of 10^-16 Amps, and transmitted directly in hex format (four byte integers, 2's complement format, Least Significant Byte first) for maximum data throughput."
I'm not sure whether (1) hex data can be stored properly in a string variable. I'm also not sure how to (2) implement 2's complement in Delphi and (3) the Least Significant Byte first.
Following #David Heffernan 's advice, I went and revised my data types. Attempting to harvest binary data from characters doesn't work, because not all values from 0-255 can be properly represented. You lose data along the way, basically. Especially it your data is represented 4 bytes at a time.
The solution for me was to use the Async Professional component instead of Denjan's Comport lib. It handles datastreams better and has a built-in log that I could use to figure out how to interpret streamed resposes from the instrument. It's also better documented. So, if you're new to serial communications (like I am), rather give that a go.

Memory Locations of Variables when Using IA-32 Assembly Language

Quick question on memory locations in IA-32 assembly language that i cannot seem to find the answer for anywhere else.
On IA-32 each memory address is 4 bytes long (e.g. 0x0040120e). Each of these addresses points to a 1 byte value (or in the case of a larger value, the first byte of it). Now look at these two simple IA-32 assembly language statements:
var1 db 2
var2 db 3
This will place var1 and var2 in adjacent memory cells (let's say 0x0040120e and 0f). Now I realize that the define directive db allocates 1 byte to the value. But, in the case above I have two values (2 and 3) that in fact only requires two bits each, to be stored.
Questions:
When using the db directive, do these two values still consume a full byte, even though they are smaller than 1 byte?
Is using a full byte for values that could get away with less, still the common way to go (as we have so much memory that we don't care)?
Does integers 0 to 255 then generally take up 1 byte and integers 256 to (2^16 - 1) take up 2 bytes (a word), etc.?
Thank you,
Magnus
EDIT 1: Made questions more clear (apologies for the back and forth)
EDIT 2: Added a structured reply below, based on other posters' input
yes. the B in DB is for Byte.
You could use a nibble for each, like so:
combined db 0x23
but you'd have to
a) shift the result for 4 bits right if you need the "2".
b) mask the leftmost 4 bits if you need the "3".
Hardly worth the effort these days ;-)
Yes, since the architecture is byte-addressable and cannot address anything smaller than a byte.
This means that data requiring less than one byte will need to share its address with other data.
In practice this means that you're going to have to know which bits in the pointed-out byte are used for this particular value.
For hardware registers this sort of mapping is very common.
EDIT: Ah, you seem to mean "values of the same variable" when you said "2 and 3". I thought you meant 2-bit and 3-bit values. You need to decide how many bits are needed at most for a particular variable, for all the values you need that variable to be able to store. There are variable-length encodings for integers of course, but that's generally rarely used in assembly and not what you'd typically use for some general-purpose variable.
You generally should expect to reserve all bits required for all values that a variable need to hold, up front. Otherwise, if you're worried about "wasting memory", you would need to move all other variables as soon as you get some "vacant bits" somewhere. That would end up costing fantastically much. Also, knowing the size of a variable is constant makes it possible to generate (or write) the proper code to handle it, otherwise you would of course also need to explicitly store somewhere "the size of the value held in variable x is now y bits". That becomes extremely painful very very quickly.
My initial question was a bit unstructured, so for the benefit of other searchers stopping by here I will use the answers received from #unwind and #geert3 to create a structured response. Again, this was my fault due to the initial poor structuring and creds for the answers goes to #unwind and #geert3.
When using the db directive you allocate 1 byte to the variable, and even if the variable takes up less space than 1 byte, it will still consume that full 1-byte address spot. As one might guess, that wastes a few bits of memory, but that is okay as you have enough memory and not too bothered about wasting a couple of bits. The reason you want to use the full 1-byte memory location is that it is easier to reference the variable when it is alone in the address slot (see #geert3's note on how to access it if you use less than a byte), and additionally, in case you want to reuse the variable later, it is great to know you have space for any number up to 255.
Yes, see answer to 1
Yes, you would normally allocate multiples of a byte to a variable, in a byte-addressable system

Interview: System/API design

This question was asked in one of the big software company. I have come up with a simple solution and I want to know what others feel about the solution.
You are supposed to design an API and a backend for a system that can
allot phone numbers to people living in a city. The phone numbers will
start from 111-111-1111 and end at 999-999-9999. The API should enable
the clients (people in the city) to do the following:
When a client requests for a phone number, it allots one of the available numbers to them.
Some clients may want fancy numbers, so they can specifically ask for a number to be alloted to them. If the requested number is
available then the system allots it to them, otherwise the system
allots any available number.
The system need not have to know which number is alloted to which
client. The same client may make successive requests and get multiple
phone numbers for himself, but the system is not bothered. At any
point of time, the system only knows which phone numbers are alloted
and which phone numbers are free.
The numbers from 111-111-1111 to 999-999-9999 roughly corresponds to 8 billion numbers. Assuming that memory is not a constraint, I can think of the following two approaches (which are almost similar).
Maintain a huge boolean array of length 8 billion and have a next pointer that points to an array index (next is initialized to zero). If the value pointed by next is not free, then forward next until a free number is found. When fancy numbers are requested, just check whether the corresponding index position is free and return the number. The downside of this approach is, when allocating numbers in a regular way, if there is a huge chunk (say 1 billion) numbers in the middle that was allocated by fancy allocation, then the next pointer has to be moved 1 billion times.
To overcome the difficulty mentioned in the previos design, we can use some sort of a linked hashmap. We maintain a doubly linked list (this replaces the array in the previous design) and another array of the same length as the list where each element of the array points to a corresponding element in the list. So when allocating numbers in regular method, we advance a pointer in the linked list and mark nodes as and when we allocate (same as the previous method). When allocating fancy numbers, we can directly find the node in the list that corresponds to the special number requested by first indexing into the array and the following the pointer. Once the node is identified, short circuit the previous node and the next node so that we do not have to skip the used numbers one by one (which was the problem with the previous approach) when doing a regular allocation.
Let me know whether I am on the right track. Please enlighten me with any important details that I am missing.
You can do significantly better in the anser to this question.
First you should design you API. The one recommended by Icarus3 is perfectly good:
string acquireNextAvailableNumber();
boolean acquireRequestedNumber(string special);
The second one returns true (and reserves the number) if it is available, otherwise returns false.
The question doesn't specify how you allocate phone numbers, so allocate them to suit yourself. Make the first 'next available' request return "111-111-1111", the next "111-111-1112" etc. This means you can record all the numbers allocated through 'next' by just remembering the last one allocated. (You'll need to ask whether '111-111-1119" is followed by "111-111-1120" or 111-111-1121", but that's the sort of thing you should be asking anyway. In any case, the important thing is you can work out what is the next number knowing the last allocated one.)
Special requests you will need to store individually. A hash table work, but so does a BST or simply an ordered list. It depends on what tradeoffs you want between space and speed, and how often special numbers are likely to be requested. I'll use a BST (ordered by the number) in the rest of this, for reasons I'll come to.
So, how do you code this? For the next allocated number:
Look at the last allocated number, and find the next in sequence.
Check that number hasn't been allocated as a special number. You can do this very quickly with a BST because if it's there, it will be the lowest entry in the BST.
If the number was in the 'special numbers' database, increment the 'allocated numbers' value (to include that number) and remove the entry from the special numbers. Then repeat this process until you get a number that isn't in the special numbers.
Note that this process ensures that all 'special numbers' lower than the last one allocated by 'next' do not appear in the special numbers database. As the 'last normal number allocated' increases, it absorbs any special numbers allocated that were less than that, removing them from the table. This is what ensures that when we ask whether the next number in sequence is in the special numbers database, we only have to look at the lowest entry.
Checking for a special number is easy. If it is lower than the last 'normal' number allocated it isn't available. Otherwise you check to see if it exists in the BST. If it doesn't, you add it to the BST.
You can optimize this process by storing not just single numbers in the BST, but storing ranges of numbers. If the allocated special numbers are dense, then it reduces the amount of space in the tree and the number of accesses to find if one is in there. During the test to find if the 'next' number discovers a rnage of size n, then you can immediately increment the highest normal number by n, instead of having to go round the loop n times.
First, you did not prototype your APIs. For example, if I have to design these APIs I will publish 2 APIs.
string acquireNextAvailableNumber();
string acquireRequestedNumber(string special);
Second, you need to decide how you are going to implement it. code driven or data driven ?
You can maintain hash for all these numbers ( it will consume memory ) and quickly query the availability of the number. Or
you could maintain single list to store only distributed numbers ( less memory ). So, whenever request comes, you start searching 1 to n numbers in that list ( increased time-complexity ). if any first (or requested) number isn't there then you allocate it to client and add that entry in the list.
As, there are billion numbers, you will need to consider the trade-off between space and time.
You could also take the advantage of the database.
To enhance previous answers, any BST may not be good enough as insertions or deletions can make it unbalanced. A balanced BST, e.g. Red-Black Tree, should be a good choice.
So, a Red-Black Tree can be created and filled in the beginning to represent available numbers, and each allocation should remove an element from it.
init(from, to) - can be done in O(n) time, a straightforward implementation would be O(n log n). But that is a one-time initialization on your server's start
acquireNextAvailableNumber() - should remove smallest element, time cost O(1)
acquireRequestedNumber(special) - should make a search and remove element if found, guaranteed time cost O(log n)
In Java, a TreeSet<String> or TreeSet<Integer> could be used since it is implemented with Red-Black Tree.
The next question would probably have been that several request-processing threads would access your API, so since Java's TreeSet is not thread-safe, you should have wrapped it at initialization like so:
TreeSet numbers = init(...);
SortedSet availableNumbers = Collections.synchronizedSortedSet(numbers);

String to Byte [delphi]

I need to store my data into memory. My type data of my data is string. I want to minimize the memory usage. I guess I have to change string into byte. Am I right? If I convert string to byte, that means I have to convert string to TMemoryStream?
If you really want to convert it then this code will get it done
var
BinarySize: Integer;
InputString: string;
StringAsBytes: array of Byte;
begin
BinarySize := (Length(InputString) + 1) * SizeOf(Char);
SetLength(StringAsBytes, BinarySize);
Move(InputString[1], StringAsBytes[0], BinarySize);
But as already stated this will not save you memory. The ammount of it used will be practically the same. You will gain nothing from this alone. If you are having to many strings take a different approach. Like something from this list of choices:
Use a dictionary and only store each same string once
Only hold a portion of all strings in memory. Some sort of cache. Have others on hard drive and use streams to load them
If you have very large string consider compressing them.
If you are reading from file and you target is binary data, skip the string in the middle. Read the source directly into a byte buffer.
It is hard to give further help without knowing more about the problem.
EDIT:
If you really want a minimum memory footprint and you can live with a little lower speed (but still very fast) you can use Suffix Trie or B-Tree or event a simple Binary Tree. They can work directly from hard drive and can be very fast for searching. If you then cache a subset of the data to RAM, you get the optimal solution memory vs. speed wise.
Anyway given the ammount of data you claim to have it seems no memory optimization is needed at all. 22MB of RAM is hardly an issue and not worth optimizing.
Are you certain this is an optimization that is needed?
2000 lines that are 10 characters long is only 20000 characters.
In most environments, that's tiny. Most machines have considerably more RAM than that. Most disks are considerably larger than that. And, usually, sending and receiving that much information is trivial over the web.
Perhaps your situation is unique. Maybe you have large number of 20000 character data sets, or very slow web access over which to transmit this date, etc. But, I'd encourage you to consider whether you aren't perhaps trying to optimize something that even if you are very successful in implementing, won't significantly change your application's performance in the real world.
Make your storage type tutf8string. It can be simply assigned from tunicodestring, and conversion should be safe.

Buffering data for delimiter separated blocks

There is a question I have been wondering about for ages and I was hoping someone could give me an answer to rest my mind.
Let's assume that I have an input stream (like a file/socket/pipe) and want to parse the incoming data. Let's assume that each block of incoming data is split by a newline, like most common internet protocols. This application could just as well be parsing html, xml or any other smart data structure. The point is that the data is split into logical blocks by a delimiter rather than a fixed length. How can I buffer the data to wait for the delimiter to appear?
The answer seems simple enough: just have a large enough byte/char array to fit the entire thing.
But what if the delimiter comes after the buffer is full? This is actually a question about how to fit a dynamic block of data in a fixed size block. I can only really think of a few alternatives:
Increase the buffer size when needed. This may require heavy memory reallocation, and may lead to resource exhaustion from specially crafted streams (or perhaps even denial of service in the case of sockets where we want to protect ourselves against exhaustion attacks and drop connections that try to exhaust resources...and an attacker starts sending fake, oversized, packets to trigger the protection).
Start overwriting old data by using a circular buffer. Perhaps not an ideal method since the logical block would become incomplete.
Dump new data when the buffer is full. However, this way the delimiter will never be found, so this choice is obviously not a good option.
Just make the fixed size buffer damn large and assume all incoming logical data blocks is within its bounds...and if it ever fills, just interpret the full buffer as a logical block...
In either case I feel we must assume that the logical blocks will never exceed a certain size...
Any thoughts on this topic? Obviously there must be a way since the higher level languages offer some sort of buffering mechanisms with their readLine() stream methods.
Is there any "best way" to solve this or is there always a tradeoff? I really appreciate all thoughts and ideas on this topic since this question has been haunting me everytime I have needed to write a parser of some sort.
There are normally two techniques for this
1) What I think readline uses - if the buffer fills return the data without the delimiter on the end
2) When the buffer fills, remeber it filled, keep reading until you get the delimiter and report an error (or truncate the record at the buffer size)
Options (2) and (3) are out as you are losing data in both cases. Option (4) of a huge fixed size buffer would not solve the problem as it is just not possible to know what size is large enough? Is it all the physical memory + swap space + the free space available in all disks everywhere in the known universe?
Resizing the buffer looks like the best solution. Say realloc to twice the size and continue writing. There is always a chance of a specially constructed stream like a DoS that tries to bring down the system. My first thought was so set an arbitrarily large size as the max_size for the buffer. However, if we could do that, we could just set that as the size of the large buffer. So, resizing the buffer looks like the best option to me.
If the protocol or you do not define a upper bound for the length of each block then I don't see how you can prevent memory exhaustion edge cases.
Assuming that there is an upper bound using a fixed size block seems like a good approach for reasonably sized limits.
If the limits are high enough that a single fixed buffer will be inefficient then I would suggest using a data structure that is implemented internally as a linked list of fixed size buffers.
Why do you have to wait to start processing?
Generally alternative 4 is sound. It doesn't however, require an "assumption", rather a definition. You simply declare that blocks are smaller than 8K and be done with it. It's not difficult to do.
Further, there's alternative 5: Start processing partial buffers. This works unless you have designed a truly pathological protocol that sends critical data at the very end of the block.
HTML, XML, JSON/YAML, etc., can all be parsed incrementally. You don't require a delimeter to do useful processing.

Resources