How is memory managed in Ruby? For example, a C program during execution has the memory model below. Similar to this, how is memory handled in Ruby?
C:
 __________________
|                  |
|      stack       |
|                  |
|------------------|
|                  |
|  <unallocated    |
|     space>       |
|------------------|
|                  |
|                  |
|      heap        |
|                  |
|                  |
|------------------|
|      data        |
|------------------|
|      text        |
 ------------------
Ruby:
?
There is no such thing as "memory" in Ruby.
Class#allocate allocates an object and returns that object. And that is the entire extent of interaction that a programmer can have with the object space subsystem.
Where that object is allocated, how it is allocated, if it stays at the same place in memory or moves around, none of that is specified or observable. For example, on MagLev, an object may actually not be allocated in memory at all, but on disk, or in another computer's memory. JRuby, IronRuby, Opal, Cardinal, MacRuby, etc. "outsource" their memory management to a third party, they literally don't even know what's happening to their memory.
A Ruby implementation may use a separate stack and heap, it may use a heap-allocated stack, it may not even use a stack at all (e.g. Cardinal).
Note: the ObjectSpace module allows a limited amount of introspection and reflection of the object space. In general, when I say something is "impossible" in Ruby, there's always an implicit caveat "unless you use reflection". However, even ObjectSpace does not leak any information about the organization of memory.
In YARV, there is also the objspace library and the GC module, which provide internal implementation details about YARV. However, these are private internal implementation details of YARV: they are not even guaranteed to exist in other implementations, and they may change at any time without notice even within YARV.
You may note that I didn't write anything about garbage collection! Well, actually, Ruby only specifies when objects are referenced and when they aren't. What to do with un-referenced objects, it doesn't say. It makes sense for an implementation to reclaim the space used by those unreferenced objects, and all of them do to some degree (e.g. older versions of YARV would not reclaim unreferenced Symbols), but it is neither required nor specified. And all implementations use very different approaches. Again, JRuby, IronRuby, Opal, Cardinal, MacRuby, Topaz, MagLev, etc. "outsource" that problem to the underlying platform; Rubinius uses a generational, copying, moving, tracing collector based on the Immix collector; YARV uses a simple mark-and-sweep tracing collector.
My Redis version is 3.2.9, and I modified redis.conf:
hash-max-ziplist-entries 256
hash-max-ziplist-value 4096
However, the results do not behave as described in Memory Optimization (the Redis hash structure is supposed to make memory usage more efficient).
The capacity assessment also confuses me; I will show the results I get below.
As shown above for Redis string key-values: the first picture shows that 3085 and 4086 use the same amount of memory. The second picture shows that 4096 uses more memory (about 1024 bytes per key), not 4096 bytes per key. The allocator is jemalloc.
I hope someone can help me, thank you.
Internally, for optimisation purposes, Redis stores entries in a data structure called a ziplist, which works directly with memory addresses.
So the optimisation is really compaction: it reduces the memory wasted on storing and maintaining pointers.
ziplist:
+----+----+----+
| a | b | c |
+----+----+----+
Now, let's say we update the value of b and its size grows from, say, 10 to 20 bytes.
We have no way to fit that larger value in place, so we do a ziplist resize.
ziplist:
+----+--------+----+
| a | bb | c |
+----+--------+----+
So, when resizing, Redis creates a new block of memory of the larger size, copies the old data into that newly allocated memory, and then deallocates the old memory area.
Since memory is moved around in such cases, this leads to memory fragmentation.
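A rough sketch in C of that resize step (illustrative only, not the actual Redis ziplist code; the ziplist_t and ziplist_grow names are made up):

#include <stdlib.h>

/* Illustrative sketch only, not the actual Redis ziplist code: it shows why
   growing an entry forces the whole contiguous block to be reallocated and
   copied, which is what leaves holes (fragmentation) behind in the allocator. */
typedef struct {
    unsigned char *buf;  /* one contiguous block holding all entries */
    size_t len;          /* bytes currently used */
} ziplist_t;

/* Grow the list by 'extra' bytes. realloc() may have to move the block to a
   new, larger region and release the old one. */
static int ziplist_grow(ziplist_t *zl, size_t extra) {
    unsigned char *nbuf = realloc(zl->buf, zl->len + extra);
    if (nbuf == NULL) return -1;  /* old block is still valid on failure */
    zl->buf = nbuf;               /* may now point somewhere entirely new */
    zl->len += extra;
    return 0;
}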
This fragmentation is calculated as
(resident memory) / (allocated memory)
Redis also does memory defragmentation, which can bring this ratio down to less than 1.
How can resident memory be less than allocated memory, you ask?
Normally the allocated memory should be fully contained in the resident memory, nevertheless there are a few exceptions:
If parts of the virtual memory are paged out to disk, the resident memory can be smaller than the allocated memory.
There are cases of shared memory where the shared memory is marked as used, but not as resident.
I'm building an emulator for the MOS6502 processor, and at the moment I'm trying to simulate the stack in code, but I'm really failing to understand how the stack works in the context of the 6502.
One of the features of the 6502's stack structure is that when the stack pointer reaches the end of the stack it will wrap around, but I don't get how this feature even works.
Let's say we have a stack with a maximum of 64 values. If we push the values x, y and z onto the stack, we now have the structure below, with the stack pointer pointing at address 0x62, because that was the last value pushed onto the stack.
+-------+
| x | 0x64
+-------+
| y | 0x63
+-------+
| z | 0x62 <-SP
+-------+
| | ...
+-------+
All well and good. But now if we pop those three values off the stack, we have an empty stack, with the stack pointer pointing at address 0x64.
+-------+
| | 0x64 <-SP
+-------+
| | 0x63
+-------+
| | 0x62
+-------+
| | ...
+-------+
If we pop the stack a fourth time, the stack pointer wraps around to point at address 0x00, but what's even the point of doing this when there isn't a value at 0x00? There's nothing in the stack, so what's the point in wrapping the stack pointer around?
I can understand this process when pushing values: if the stack is full and a value needs to be pushed onto the stack, it will overwrite the oldest value present on the stack. This doesn't work for popping.
Can someone please explain this, because it makes no sense to me.
If we pop the stack a fourth time, the stack pointer wraps around to point at address 0x00, but what's even the point of doing this when there isn't a value at 0x00? There's nothing in the stack, so what's the point in wrapping the stack pointer around?
It is not done for a functional reason. The 6502 architecture was designed so that pushing and popping could be done by incrementing an 8 bit SP register without any additional checking. Checks for overflow or underflow of the SP register would involve more silicon to implement them, more silicon to implement the stack overflow / underflow handling ... and extra gate delays in a critical path.
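That is also why, in an emulator, the simplest faithful model is an 8-bit unsigned stack pointer that is allowed to wrap on its own. A minimal sketch in C (the cpu_t type and helper names are hypothetical, not taken from any particular emulator):

#include <stdint.h>

/* Minimal sketch of 6502-style stack handling in an emulator. Modelling SP
   as an 8-bit unsigned integer gives the wrap-around for free, exactly
   because the hardware does no range checking. */
typedef struct {
    uint8_t sp;               /* stack pointer: an offset into page 1 */
    uint8_t memory[0x10000];  /* 64 KiB address space */
} cpu_t;

static void push(cpu_t *cpu, uint8_t value) {
    cpu->memory[0x0100 + cpu->sp] = value;  /* stack lives at 0x0100..0x01FF */
    cpu->sp--;                              /* 0x00 wraps to 0xFF automatically */
}

static uint8_t pull(cpu_t *cpu) {
    cpu->sp++;                              /* 0xFF wraps to 0x00 automatically */
    return cpu->memory[0x0100 + cpu->sp];
}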
The 6502 was designed to be cheap and simple using 1975-era chip technology [1]. Not fast. Not sophisticated. Not easy to program [2].
[1] According to Wikipedia, the original design had ~3200 or ~3500 transistors. One of the selling points of the 6502 was that it was cheaper than its competitors. Fewer transistors meant smaller dies, better yields and lower production costs.
[2] Of course, this is relative. Compared to some ISAs, the 6502 is easy because it is simple and orthogonal, and you have so few options to choose from. But compared to others, the limitations that make it simple actually make it difficult. For example, there are at most 256 bytes in the stack page, and they have to be shared by everything; that gets awkward if you are implementing threads or coroutines. Compare this with an ISA where the SP is a 16-bit register or the stack can be anywhere.
In an embedded environment (using MSP430), I have seen some data corruption caused by partial writes to non-volatile memory. This seems to be caused by power loss during a write (to either FRAM or info segments).
I am validating data stored in these locations with a CRC.
My question is, what is the correct way to prevent this "partial write" corruption? Currently, I have modified my code to write to two separate FRAM locations. So, if one write is interrupted causing an invalid CRC, the other location should remain valid. Is this a common practice? Do I need to implement this double write behavior for any non-volatile memory?
A simple solution is to maintain two versions of the data (in separate pages for flash memory): the current version and the previous version. Each version has a header comprising a sequence number and a word that validates the sequence number - simply the one's complement of the sequence number, for example:
---------
| seq |
---------
| ~seq |
---------
| |
| data |
| |
---------
The critical thing is that when the data is written the seq and ~seq words are written last.
On start-up you read the data that has the highest valid sequence number (accounting for wrap-around perhaps - especially for short sequence words). When you write the data, you overwrite and validate the oldest block.
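A minimal C sketch of that selection logic, assuming two fixed blocks laid out as above (the struct and function names are illustrative):

#include <stdint.h>

/* Sketch of the two-copy scheme described above; names and sizes are
   illustrative. Each copy lives at a fixed non-volatile address and carries
   a header validated by the one's-complement word. */
typedef struct {
    uint16_t seq;       /* sequence number, written last together with nseq */
    uint16_t nseq;      /* ~seq, validates the header */
    uint8_t  data[64];  /* the payload */
} nv_block_t;

static int block_valid(const nv_block_t *b) {
    return b->nseq == (uint16_t)~b->seq;
}

/* Pick the copy with the higher valid sequence number; the subtraction is
   done modulo 2^16 so sequence wrap-around is tolerated. Returns NULL if
   neither copy is valid. */
static const nv_block_t *newest(const nv_block_t *a, const nv_block_t *b) {
    int va = block_valid(a), vb = block_valid(b);
    if (va && vb) return (int16_t)(a->seq - b->seq) >= 0 ? a : b;
    if (va) return a;
    if (vb) return b;
    return NULL;
}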
The solution you are already using is valid so long as the CRC is written last, but it lacks simplicity and imposes a CRC calculation overhead that may not be necessary or desirable.
On FRAM you have no concern about endurance, but this is an issue for Flash memory and EEPROM. In this case I use a write-back cache method, where the data is maintained in RAM, and when modified a timer is started or restarted if it is already running - when the timer expires, the data is written - this prevents burst-writes from thrashing the memory, and is useful even on FRAM since it minimises the software overhead of data writes.
Our engineering team takes a two-pronged approach to these problems: solve it in hardware and in software!
The first is a diode-and-capacitor arrangement to provide a few milliseconds of power during a brown-out. If we notice we've lost external power, we prevent the code from starting any non-volatile writes.
Second, our data is particularly critical for operation; it updates often, and we don't want to wear out our non-volatile flash storage (it only supports so many writes), so we actually store the data 16 times in flash and protect each record with a CRC code. On boot, we find the newest valid write and then start our erase/write cycles.
We've never seen data corruption since implementing our frankly paranoid system.
Update:
I should note that our flash is external to our CPU, so the CRC helps validate the data if there is a communication glitch between the CPU and the flash chip. Furthermore, if we experience several glitches in a row, the multiple writes protect against data loss.
We've used something similar to Clifford's answer but written in one write operation. You need two copies of the data and alternate between them. Use an incrementing sequence number so that effectively one location has even sequence numbers and one has odd.
Write the data like this (in one write command if you can):
---------
| seq |
---------
| |
| data |
| |
---------
| seq |
---------
When you read it back make sure both the sequence numbers are the same - if they are not then the data is invalid. At startup read both locations and work out which one is more recent (taking into account the sequence number rolling over).
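A minimal C sketch of that record layout and the consistency check on read (field sizes and names are illustrative, and whether the whole record can really go out in one write command depends on the device):

#include <stdint.h>
#include <string.h>

/* Sketch of the single-write variant above. The same sequence number sits at
   both ends of the record, so a torn write shows up as a head/tail mismatch. */
typedef struct {
    uint32_t seq_head;
    uint8_t  data[64];
    uint32_t seq_tail;  /* must equal seq_head for the record to be valid */
} record_t;

static int record_valid(const record_t *r) {
    return r->seq_head == r->seq_tail;
}

/* Alternate between two slots, giving each new write seq + 1; at startup the
   valid record with the newer sequence number wins. */
static void write_record(record_t *slot, uint32_t seq, const uint8_t *payload) {
    record_t r;
    r.seq_head = seq;
    memcpy(r.data, payload, sizeof r.data);
    r.seq_tail = seq;
    memcpy(slot, &r, sizeof r);  /* ideally issued as a single write command */
}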
Always store data using some kind of framing protocol, like START_BYTE, total bytes to write, data, END_BYTE.
Before writing to external / internal memory, always check the power-monitor registers / ADC.
If your data gets corrupted, the END byte will also be corrupted, so that entry will not be valid when the whole frame is validated.
A plain checksum is not a good idea; choose CRC16 instead if you want to include a CRC in your protocol.
What's the best way / least wait-causing way to synchronize read/write access to an instance variable in objective-c for iOS?
The variable gets read and written very often (let's say 1000 times per second read and written). It is not important that changes take effect immediately. It is not even important that reads get consistent data with one-another, but writes must sooner or later be reflected in the data acquired by reads. Is there some data structure which allows this?
I thought of this:
Create two variables instead of one variable; let's call them v[0] and v[1].
For each v[i], create a concurrent dispatch queue for constructing a readers-writer-locking mechanism around it. Let's call them q[i].
Now for a writing operation, only v[0] gets written to, adhering to the locking mechanism with q[0].
On a read operation, v[1] is read first, and only with a certain probability, e.g. 1%, does the read operation look into v[0] and update v[1] if necessary.
The following pseudo-code illustrates this:
typedef int VType;    // the type of the variable

VType* v;             // array of first and second variable
dispatch_queue_t* q;  // queues for synchronizing access to v[i]

- (void) setV:(VType)newV {
    [self setV:newV at:0];
}

- (void) setV:(VType)newV at:(int)i {
    dispatch_barrier_async(q[i], ^{
        v[i] = newV;
    });
}

- (VType) getV:(int)i {
    __block VType result;
    dispatch_sync(q[i], ^{
        result = v[i];
    });
    return result;
}

- (VType) getV {
    VType result = [self getV:1];
    if ([self random] < 0.01) {
        VType v0_result = [self getV:0];
        if (v0_result != result) {
            [self setV:v0_result at:1];
            result = v0_result;
        }
    }
    return result;
}

- (float) random {
    // some random number generator - fast, but not necessarily good
}
This has the following benefits:
v[0] is usually not occupied by a read operation. Therefore, a write operation usually does not block.
Most of the time, v[1] does not get written to, so read operations on it usually don't block.
Still, if many read operations occur, eventually the written values are propagated from v[0] into v[1]. Some values might be missed, but that doesn't matter for my application.
What do you guys think, does this work? Are there better solutions?
UPDATE:
Some performance benchmarking (reads and writes of one benchmark at a time are done as quickly as possible concurrently for 1 second, one reading queue, one writing queue):
On iPhone 4S with iOS 7:
runMissingSyncBenchmark: 484759 w/s
runMissingSyncBenchmark: 489558 r/s
runConcurrentQueueRWSyncBenchmark: 2303 w/s
runConcurrentQueueRWSyncBenchmark: 2303 r/s
runAtomicPropertyBenchmark: 460479 w/s
runAtomicPropertyBenchmark: 462145 r/s
In Simulator with iOS 7:
runMissingSyncBenchmark: 16303208 w/s
runMissingSyncBenchmark: 12239070 r/s
runConcurrentQueueRWSyncBenchmark: 2616 w/s
runConcurrentQueueRWSyncBenchmark: 2615 r/s
runAtomicPropertyBenchmark: 4212703 w/s
runAtomicPropertyBenchmark: 4300656 r/s
So far, atomic property wins. Tremendously. This was tested with an SInt64.
I expected that the approach with the concurrent queue would be similar in performance to the atomic property, since it is the standard approach for a read/write synchronization mechanism.
Of course, the runMissingSyncBenchmark sometimes produces reads which show that a write of the SInt64 is halfway done.
Perhaps a spin lock will be optimal (see man 3 spinlock).
Since a spin lock can be tested for whether it is currently locked (which is a fast operation), the reader task could just return the previous value if the spin lock is held by the writer task.
That is, the reader task uses OSSpinLockTry() and retrieves the actual value only if the lock could be obtained. Otherwise, the reader task uses the previous value.
The writer task uses OSSpinLockLock() and OSSpinLockUnlock() in order to atomically update the value.
From the man page:
NAME
OSSpinLockTry, OSSpinLockLock, OSSpinLockUnlock -- atomic spin lock synchronization primitives
SYNOPSIS
#include <libkern/OSAtomic.h>
bool
OSSpinLockTry(OSSpinLock *lock);
void
OSSpinLockLock(OSSpinLock *lock);
void
OSSpinLockUnlock(OSSpinLock *lock);
DESCRIPTION
Spin locks are a simple, fast, thread-safe synchronization primitive that is suitable in situations where contention is expected to be low. The spinlock operations use memory barriers to synchronize access to shared memory protected by the lock. Preemption is possible while the lock is held.
OSSpinLock is an integer type. The convention is that unlocked is zero, and locked is nonzero. Locks must be naturally aligned and cannot be in cache-inhibited memory.
OSSpinLockLock() will spin if the lock is already held, but employs various strategies to back off, making it immune to most priority-inversion livelocks. But because it can spin, it may be inefficient in some situations.
OSSpinLockTry() immediately returns false if the lock was held, true if it took the lock. It does not spin.
OSSpinLockUnlock() unconditionally unlocks the lock by zeroing it.
RETURN VALUES
OSSpinLockTry() returns true if it took the lock, false if the lock was already held.
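A minimal C sketch of the scheme described above (variable and function names are illustrative; note that on current Apple SDKs OSSpinLock is deprecated in favour of os_unfair_lock, but the pattern is the same):

#include <libkern/OSAtomic.h>
#include <stdint.h>

/* The writer always takes the lock; the reader only tries it and falls back
   to the last value it saw when the writer currently holds the lock. */
static OSSpinLock lock = OS_SPINLOCK_INIT;
static int64_t shared_value;       /* the protected variable */

static void write_value(int64_t newValue) {
    OSSpinLockLock(&lock);
    shared_value = newValue;
    OSSpinLockUnlock(&lock);
}

static int64_t read_value(void) {
    static int64_t last_seen;      /* the "previous value" from the scheme */
    if (OSSpinLockTry(&lock)) {
        last_seen = shared_value;
        OSSpinLockUnlock(&lock);
    }
    return last_seen;              /* possibly stale, which the asker allows */
}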
I think CouchDeveloper's suggestion of using try-checks in the synchronization locks is an intriguing possibility. In my particular experiments, it had negligible impact with spin locks, a modest gain with the pthread read-write lock, and the most significant impact with the simple mutex lock. I'd wager that different configurations would achieve some gain with spin locks, too, but I must have failed to generate enough contention with spin locks for the impact of using try to be observable.
If you're working with immutable or fundamental data types, you can also use the atomic property as described in the Synchronization Tools section in the Threading Programming Guide:
Atomic operations are a simple form of synchronization that work on simple data types. The advantage of atomic operations is that they do not block competing threads. For simple operations, such as incrementing a counter variable, this can lead to much better performance than taking a lock.
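For a simple scalar such as the SInt64 in the question, a rough C11 analogue of what an atomic property provides might look like this (sketch only; an Objective-C atomic property guarantees atomic get/set, not atomic read-modify-write):

#include <stdatomic.h>
#include <stdint.h>

/* Sketch only: lock-free load, store and increment on a 64-bit value. */
static _Atomic int64_t counter;

static void bump(void)       { atomic_fetch_add(&counter, 1); }
static int64_t current(void) { return atomic_load(&counter); }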
Unaware that you had done your own benchmarking, I benchmarked a couple of these techniques discussed in that document (doing mutex lock and pthread read/write lock both with and without the "try" algorithm), as well as the GCD reader-writer pattern. In my test, I did 5m reads while doing 500k writes of random values. This yielded the following benchmarks (measured in seconds, smaller being better).
+---------------------------+-----------+--------+
| Technique                 | Simulator | Device |
+---------------------------+-----------+--------+
| Atomic                    |       1.9 |    7.2 |
| Spinlock w/o try          |       2.8 |    8.0 |
| Pthread RW lock w/ try    |       2.9 |    9.1 |
| Mutex lock w/ try         |       2.9 |    9.4 |
| GCD reader-writer pattern |       3.2 |    9.1 |
| Pthread RW lock w/o try   |       7.2 |   22.2 |
| NSLock                    |      23.1 |   89.7 |
| Mutex lock w/o try        |      24.2 |   80.2 |
| @synchronized             |      25.2 |   92.0 |
+---------------------------+-----------+--------+
Bottom line, in this particular test, atomic properties performed the best. Obviously, atomic properties have significant limitations, but in your scenario, it sounds like this is acceptable. These results are obviously going to be subject to the specifics of your scenario, and it sounds like your testing has confirmed that atomic properties yielded the best performance for you.
I am preparing for a microprocessor exam. If the program counter is used to hold the address of the next instruction, what is the use of the stack pointer?
A stack is a LIFO data structure (last in, first out, meaning last entry you push on to the stack is the first one you get back when you pop). It is typically used to hold stack frames (bits of the stack that belong to the current function).
This may include, but is not limited to:
the return address.
a place for a return value.
passed parameters.
local variables.
You push items onto the stack and pop them off. In a microprocessor, the stack can be used for both user data (such as local variables and passed parameters) and CPU data (such as return addresses when calling subroutines).
The actual implementation of a stack depends on the microprocessor architecture. It can grow up or down in memory, and the stack pointer can be adjusted either before or after the push/pop operation stores or retrieves the data.
Operations which typically affect the stack are:
subroutine calls and returns.
interrupt calls and returns.
code explicitly pushing and popping entries.
direct manipulation of the stack pointer register, sp.
Consider the following program in my (fictional) assembly language:
Addr Opcodes Instructions ; Comments
---- -------- -------------- ----------
; 1: pc<-0000, sp<-8000
0000 01 00 07 load r0,7 ; 2: pc<-0003, r0<-7
0003 02 00 push r0 ; 3: pc<-0005, sp<-7ffe, (sp:7ffe)<-0007
0005 03 00 00 call 000b ; 4: pc<-000b, sp<-7ffc, (sp:7ffc)<-0008
0008 04 00 pop r0 ; 7: pc<-000a, r0<-(sp:7ffe[0007]), sp<-8000
000a 05 halt ; 8: pc<-000a
000b 06 01 02 load r1,[sp+2] ; 5: pc<-000e, r1<-(sp+2:7ffe[0007])
000e 07 ret ; 6: pc<-(sp:7ffc[0008]), sp<-7ffe
Now let's follow the execution, describing the steps shown in the comments above:
This is the starting condition where pc (the program counter) is 0 and sp is 8000 (all these numbers are hexadecimal).
This simply loads register r0 with the immediate value 7 and moves pc to the next instruction (I'll assume that you understand the default behavior will be to move to the next instruction unless otherwise specified).
This pushes r0 onto the stack by reducing sp by two then storing the value of the register to that location.
This calls a subroutine. What would have been pc in the next step is pushed on to the stack in a similar fashion to r0 in the previous step, then pc is set to its new value. This is no different to a user-level push other than the fact it's done more as a system-level thing.
This loads r1 from a memory location calculated from the stack pointer - it shows a way to pass parameters to functions.
The return statement extracts the value from where sp points and loads it into pc, adjusting sp up at the same time. This is like a system-level pop instruction (see next step).
Popping r0 off the stack involves extracting the value from where sp currently points, then adjusting sp up.
The halt instruction simply leaves pc where it is, an infinite loop of sorts.
Hopefully from that description, it will become clear. Bottom line is: a stack is useful for storing state in a LIFO way and this is generally ideal for the way most microprocessors do subroutine calls.
Unless you're a SPARC of course, in which case you use a circular buffer for your stack :-)
Update: Just to clarify the steps taken when pushing and popping values in the above example (whether explicitly or by call/return), see the following examples:
LOAD R0,7
PUSH R0
Adjust sp Store val
sp-> +--------+ +--------+ +--------+
| xxxx | sp->| xxxx | sp->| 0007 |
| | | | | |
| | | | | |
| | | | | |
+--------+ +--------+ +--------+
POP R0
Get value Adjust sp
+--------+ +--------+ sp->+--------+
sp-> | 0007 | sp->| 0007 | | 0007 |
| | | | | |
| | | | | |
| | | | | |
+--------+ +--------+ +--------+
The stack pointer stores the address of the most recent entry that was pushed onto the stack.
To push a value onto the stack, the stack pointer is adjusted (incremented or decremented, depending on which direction the stack grows on that architecture) to point to the next free memory address, and the new value is copied to that address in memory.
To pop a value from the stack, the value is copied from the address held in the stack pointer, and the stack pointer is then adjusted back the other way, so that it points at the next remaining item on the stack.
The most typical use of a hardware stack is to store the return address of a subroutine call. When the subroutine is finished executing, the return address is popped off the top of the stack and placed in the Program Counter register, causing the processor to resume execution at the next instruction following the call to the subroutine.
http://en.wikipedia.org/wiki/Stack_%28data_structure%29#Hardware_stacks
You got more preparing [for the exam] to do ;-)
The Stack Pointer is a register which holds the address of the next available spot on the stack.
The stack is an area of memory reserved to hold a stack, that is, a LIFO (Last In, First Out) type of container, where we store local variables and return addresses, allowing simple management of the nesting of function calls in a typical program.
See this Wikipedia article for a basic explanation of the stack management.
For the 8085: the stack pointer is a special-purpose 16-bit register in the microprocessor which holds the address of the top of the stack.
The stack pointer register in a computer is made available for general purpose use by programs executing at lower privilege levels than interrupt handlers. A set of instructions in such programs, excluding stack operations, stores data other than the stack pointer, such as operands, and the like, in the stack pointer register. When switching execution to an interrupt handler on an interrupt, return address data for the currently executing program is pushed onto a stack at the interrupt handler's privilege level. Thus, storing other data in the stack pointer register does not result in stack corruption. Also, these instructions can store data in a scratch portion of a stack segment beyond the current stack pointer.
Read this one for more info.
General purpose use of a stack pointer register
The Stack is an area of memory for keeping temporary data. The Stack is used by the CALL instruction to keep the return address for procedures; the RET instruction gets this value from the Stack and returns to that offset. The same thing happens when an INT instruction calls an interrupt: it stores the flag register, code segment and offset on the Stack. The IRET instruction is used to return from the interrupt call.
The Stack is a Last In First Out (LIFO) memory. Data is placed onto the Stack with a PUSH instruction and removed with a POP instruction. The Stack memory is maintained by two registers: the Stack Pointer (SP) and the Stack Segment (SS) register. When a word of data is PUSHed onto the Stack, the high-order 8-bit byte is placed in location SP-1 and the low-order 8-bit byte is placed in location SP-2; the SP is then decremented by 2. The SP is added to (SS x 10H) to form the physical stack memory address. The reverse sequence occurs when data is POPped from the Stack: the low-order byte is read from the location SP points to and the high-order byte from the location above it, and the SP is then incremented by 2.
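A small emulator-style C sketch of that address arithmetic (the names and the flat memory array are illustrative):

#include <stdint.h>

static uint8_t  mem[1 << 20];  /* 1 MiB real-mode address space */
static uint16_t ss, sp;        /* stack segment and stack pointer */

static uint32_t phys(uint16_t seg, uint16_t off) {
    return (((uint32_t)seg << 4) + off) & 0xFFFFF;  /* (SS x 10H) + offset */
}

static void push16(uint16_t value) {
    mem[phys(ss, sp - 1)] = value >> 8;    /* high byte at SP-1 */
    mem[phys(ss, sp - 2)] = value & 0xFF;  /* low byte at SP-2 */
    sp -= 2;
}

static uint16_t pop16(void) {
    uint16_t value = mem[phys(ss, sp)] | (mem[phys(ss, sp + 1)] << 8);
    sp += 2;
    return value;
}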
The stack pointer holds the address of the top of the stack. A stack allows functions to pass arguments stored on the stack to each other, and to create scoped variables. Scope in this context means that the variable is popped off the stack when the stack frame is gone and/or when the function returns. Without a stack, you would need to use explicit memory addresses for everything. That would make it impossible (or at least severely difficult) to design high-level programming languages for the architecture.
Also, each CPU mode usually has its own banked stack pointer, so when exceptions occur (interrupts, for example), the exception handler routine can use its own stack without corrupting the user process.
Should you ever crave deeper understanding, I heartily recommend Patterson and Hennessy as an intro and Hennessy and Patterson as an intermediate to advanced text. They're pricey, but truly non-pareil; I just wish either or both were available when I got my Masters' degree and entered the workforce designing chips, systems, and parts of system software for them (but, alas!, that was WAY too long ago;-). Stack pointers are so crucial (and the distinction between a microprocessor and any other kind of CPU so utterly meaningful in this context... or, for that matter, in ANY other context, in the last few decades...!-) that I doubt anything but a couple of thorough from-the-ground-up refreshers can help!-)
On some CPUs, there is a dedicated set of registers for the stack. When a call instruction is executed, one register is loaded with the program counter at the same time as a second register is loaded with the contents of the first, a third register is loaded with the second, a fourth with the third, and so on. When a return instruction is executed, the program counter is latched with the contents of the first stack register at the same time as that register is latched from the second; that second register is loaded from a third, etc. Note that such hardware stacks tend to be rather small (many of the smaller PIC-series micros, for example, have a two-level stack).
While a hardware stack does have some advantages (push and pop don't add any time to a call/return, for example) having registers which can be loaded with two sources adds cost. If the stack gets very big, it will be cheaper to replace the push-pull registers with an addressable memory. Even if a small dedicated memory is used for this, it's cheaper to have 32 addressable registers and a 5-bit pointer register with increment/decrement logic, than it is to have 32 registers each with two inputs. If an application might need more stack than would easily fit on the CPU, it's possible to use a stack pointer along with logic to store/fetch stack data from main RAM.
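A toy C model of such a small hardware stack, with 32 entries addressed by a 5-bit pointer that simply wraps (purely illustrative; real parts differ in depth and in how overflow is handled):

#include <stdint.h>

static uint16_t hw_stack[32];
static uint8_t  hw_sp;                /* only the low 5 bits are meaningful */

static void hw_push(uint16_t return_addr) {
    hw_stack[hw_sp] = return_addr;
    hw_sp = (hw_sp + 1) & 0x1F;       /* the 5-bit pointer wraps silently */
}

static uint16_t hw_pop(void) {
    hw_sp = (hw_sp - 1) & 0x1F;
    return hw_stack[hw_sp];
}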
A stack pointer is a small register that stores the address of the top of the stack.