Is it possible to protect certain data from a swap-out operation?

I have an issue: I want certain data to always be kept in main memory.
When I allocate memory with malloc or mmap, is it possible to protect that memory from being selected as the victim of a swap-out operation?
Can you give me some advice on this? Thank you.

Yes, use the mlock(2) system call.
However, read the man page carefully, including the NOTES section; there are a number of restrictions and caveats. In particular, there may be limits on the amount of memory that an unprivileged process may lock.
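If it's useful as a reference, the same call is reachable from Go through the golang.org/x/sys/unix package. A minimal sketch (the 1 MiB buffer size is just for illustration):

    package main

    import (
        "fmt"

        "golang.org/x/sys/unix"
    )

    func main() {
        buf := make([]byte, 1<<20) // 1 MiB buffer, size chosen for illustration

        // Ask the kernel to keep these pages resident in RAM so they are
        // never chosen as swap-out victims. This can fail with ENOMEM or
        // EPERM if RLIMIT_MEMLOCK is too low for an unprivileged process,
        // which is exactly the caveat from the NOTES section.
        if err := unix.Mlock(buf); err != nil {
            fmt.Println("mlock failed:", err)
            return
        }
        defer unix.Munlock(buf)

        // ... work with buf; its pages stay in main memory ...
        fmt.Println("buffer locked in memory")
    }

Note that mlock operates on whole pages, so every page the buffer touches gets locked, and the lock disappears automatically when the process exits.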

How do allocations work and how do you prevent them?

The go test tool has a profiler which can tell you the number of allocations performed inside the code.
However, seeing libraries such as this one:
https://github.com/valyala/fasthttp
stating "Zero memory allocations in hot paths"... what does that mean? And how do you achieve this in Go?
I personally don't like their use of language; it sounds like something a marketer would say. All they mean is that no allocations will occur in that code, because a buffer has been allocated in advance for use there.
So to be clear, they mean "in this limited scope, no allocations will occur". How do you achieve this? By allocating a sufficiently large buffer in advance and then leveraging it within that scope.
The package authors' intent is to speed up request handling by allocating up front, at the cost of using more memory (or at least holding on to memory more persistently; in theory the buffer could be the same size as what would otherwise be allocated).
If you're curious about the implementation details, take a look at files like byte_buffer.go and args.go. You'll find a pool of buffer objects allocated in advance so that your handler code doesn't have to allocate for the response body, etc. Instead you obtain an already-allocated buffer from the pool, write the response data to it, and release it back into the pool for reuse when you're done. In the standard scenario you would instead allocate space for the response body, and after the response is returned that object would go out of scope and the memory would be freed. As mentioned above, moving all of this up front means that when your service starts it will obtain and hold more memory than a similar service built on net/http, which obtains and releases memory on an as-needed basis.
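To make the technique concrete, here is a minimal sketch of the same idea using sync.Pool from the standard library (fasthttp has its own pool types; the names and the 4096-byte capacity below are purely illustrative):

    package main

    import (
        "fmt"
        "sync"
    )

    // bufPool hands out reusable byte buffers so the hot path never has
    // to allocate. New only runs when the pool has no free buffer.
    var bufPool = sync.Pool{
        New: func() any {
            b := make([]byte, 0, 4096) // capacity chosen for illustration
            return &b
        },
    }

    func handle(name string) string {
        bp := bufPool.Get().(*[]byte) // grab an already-allocated buffer
        b := (*bp)[:0]                // reset length, keep capacity

        // Build the response without allocating: append stays within the
        // existing capacity as long as 4096 bytes is enough.
        b = append(b, "hello, "...)
        b = append(b, name...)

        out := string(b) // note: this copy does allocate; kept for demo output
        *bp = b
        bufPool.Put(bp) // hand the buffer back for the next request
        return out
    }

    func main() {
        fmt.Println(handle("world"))
    }

You can check claims like this yourself: testing.AllocsPerRun, or go test -bench . -benchmem, reports how many allocations a piece of code performs per call.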

[ARM Cortex-A] Difference between Strongly-ordered and Device memory types

I am really new to Cortex-A. I am aware that ARM uses a weakly-ordered memory model and that there are three mutually exclusive memory types:
Strongly-ordered
Device
Normal
I roughly understand what Normal is for and what Strongly-ordered and Device mean. However, the difference between Strongly-ordered and Device is confusing to me.
According to the Cortex-A Series Programmer's Guide, the only difference is that:
A write to Strongly-ordered memory can complete only when it reaches the peripheral or memory component accessed by the write.
A write to Device memory is permitted to complete before it reaches the peripheral or memory component accessed by the write.
I am not quite sure what the real implication of this is. My guess is that accesses to memory typed as Strongly-ordered or Device must stay in the order given by the programmer's code (no out-of-order access), but that the CPU can potentially execute the next instruction while a Device access is still in flight, whereas it must simply wait until the access completes if the memory is Strongly-ordered.
Correct me if I am wrong, and please tell me what the point of this distinction is.
Thanks in advance.
One important bit to understand is that memory types have no guaranteed effect on the instruction stream as a whole - they affect only the ordering of memory accesses. (They may have a specific effect on a specific processor integrated in a specific way with a specific interconnect - but that can never be relied on by software.)
Another important thing to understand is that even Strongly-ordered memory provides implicit guarantees of ordering only with regards to accesses to the same peripheral. Any ordering requirements more strict than that require use of explicit barrier instructions.
A third important point is that any implicit memory access ordering that takes place due to memory types does not affect the ordering of accesses to other memory types. Again, if your application has dependencies like this, explicit barrier instructions are required.
Now, against that background - a simpler way of describing the difference between Device and Strongly-ordered memory is that Device memory accesses can be buffered - in the processor itself or in the interconnect. The difference being that a buffered access can be signalled as complete to the processor before it has completed (or even initiated) at the end point.
This provides better performance at the cost of losing the synchronous reporting of any error condition.

Memory defragmentation software. How does it work? Does it work?

I was reading an article on memory fragmentation when I recalled that there are several examples of software that claim to defragment memory. I got curious: how does it work? Does it work at all?
EDIT:
xappymah gave a good argument against memory defragmentation: a process might be very surprised to learn that its memory layout suddenly changed. But as I see it, there is still the possibility of the OS providing some sort of API for global memory control. That seems a bit unlikely, however, since such an API could be put to malicious use if badly designed. Does anyone know of an OS that supports something of the sort?
Real memory defragmentation at the process level is possible only in managed environments, such as Java VMs, where the runtime has some kind of access to the objects allocated in memory and can move them.
But if we are talking about unmanaged applications, then there is no way to control their memory with third-party tools, because every process (both the tool and the application) runs in its own address space and has no access to another's, at least not without help from the OS.
However, even if you got access to another process's memory (by hacking your OS or otherwise) and started modifying it, I think the target application would be very "surprised".
Just imagine: you allocate a chunk of memory, obtain its starting address, and a second later that chunk is moved somewhere else by "VeryCoolMemoryDefragmenter" :)
In my opinion, RAM is a bit like a flash drive: the chip doesn't get fragmented the way a hard disk does, because there are no spinning platters and moving heads recording and reading information scattered randomly across the surface. That mechanical seeking is what makes hard disk fragmentation costly, and it's also why SSDs are so fast, effective, reliable, and maintenance-free: an SSD is essentially one big piece of memory and behaves much the same way.

Can a Memcached daemon ever free() unused memory, without terminating the process?

I believe that you can't force a running Memcached instance to de-allocate memory, short of terminating that Memcached instance (and freeing all of the memory it held). Does anyone know of a definitive piece of documentation, or even a mailing list or blog posting from a reliable source, that can confirm or deny this impression?
As I understand it, a Memcached process initially allocates a chunk of memory (the exact initial allocation size is configurable), and then monotonically increases its memory utilization over its lifetime, limited by the daemon's maximum memory allocation size (also configurable). At no point does the Memcached daemon ever free any memory, regardless of whether the daemon has any ongoing need for the memory it holds.
I know that this question might sound a little whiny, with a tone of "I DEMAND that open source project X support my specific need!" That's not it, at all--I'm purely interested in the exact technical answer, here, and I swear I'm not harshing on Memcached. For the curious, this question came out of a discussion about possible methods for gracefully juggling multiple Memcached instances on a single server, given an application where the cost of a cache flush can be quite high.
However, I'd appreciate it if you save your application suggestions/advice for a different question (re-architecting my application, using a different caching implementation, etc.). I do appreciate a good brainstorm, but I think this question will be most valuable if it stays focused on the technical specifics of how Memcached does and does not work. If you don't have the answer to this specific question, there is probably still value in what you have to say, but I'd guess that there's a different, better place to post the more speculative comments/suggestions/advice.
This is probably the hardest problem we have to solve for memcached currently (well, a variation of it, anyway).
Freeing a chunk of memory requires us to know that a) nothing within the chunk is in use, and b) nothing will start using it while we're in the process of purging it for reuse/freeing. I've heard some really good ideas for how we might solve our slab rebalancing problems, which is basically the same problem, except we're not trying to free the memory but to give it to something else (a common problem in a few large installations).
Also, whether free actually reduces the RSS of your process is implementation dependent. In many cases, a malloc/fill/free will leave the memory mapped in (unless your allocator uses mmap instead of sbrk).
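The same implementation-dependence is easy to see in garbage-collected runtimes, too. In Go, for example, freeing an object at the language level does not by itself shrink the process footprint; the runtime decides when idle heap goes back to the OS, although you can force its hand. A sketch (the 512 MiB size is arbitrary):

    package main

    import (
        "fmt"
        "runtime"
        "runtime/debug"
    )

    // released reports how many heap bytes the runtime has handed back
    // to the operating system so far.
    func released() uint64 {
        var m runtime.MemStats
        runtime.ReadMemStats(&m)
        return m.HeapReleased
    }

    func main() {
        big := make([]byte, 512<<20) // 512 MiB, arbitrary demo size
        big[0] = 1                   // touch it so pages are really resident
        big = nil                    // drop the only reference

        runtime.GC() // the slice is now garbage at the language level...
        before := released()
        debug.FreeOSMemory() // ...but only this actively returns pages to the OS
        after := released()
        fmt.Printf("heap released to OS: before=%d MiB, after=%d MiB\n",
            before>>20, after>>20)
    }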
I'm pretty sure this isn't possible with memcached. I don't see any technical reason why it couldn't be implemented, though. Lock cache operations, expire enough keys to reach the desired size, update the size, unlock. (I'm sure there are nicer ways to avoid blocking the server during that time.)
The standard and default memory-management mechanism in memcached is the slab allocator. This means that memory is allocated for the process and never released to the operating system: when memory is no longer used to store some data, it is held by the process so it can be reused later when needed. The operating system only reclaims a process's memory when the process terminates, which is why the memory is released when you kill/stop memcached.
There is a compile-time option in memcached to enable a malloc/free mechanism, so that when free() is called, memory might be released to the operating system (this depends on the C standard library implementation). But doing so might worsen fragmentation and hurt performance.
Please read more about the issue here:
Why not use malloc/free
Memcached memory management
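To make the slab idea concrete, here is a toy sketch of the pattern (in Go rather than memcached's C, and purely illustrative, not memcached's actual code): one big preallocated slab is carved into fixed-size chunks, freed chunks go onto a free list for reuse, and nothing is ever handed back to the operating system while the process runs.

    package main

    import "fmt"

    // slab is one big allocation carved into fixed-size chunks. Freed
    // chunks are recycled through a free list; the slab itself is never
    // returned to the operating system while the process is alive.
    type slab struct {
        mem       []byte
        chunkSize int
        free      []int // offsets of chunks available for reuse
        next      int   // bump pointer over never-used chunks
    }

    func newSlab(chunkSize, nChunks int) *slab {
        return &slab{mem: make([]byte, chunkSize*nChunks), chunkSize: chunkSize}
    }

    func (s *slab) alloc() []byte {
        if n := len(s.free); n > 0 { // prefer recycling a freed chunk
            off := s.free[n-1]
            s.free = s.free[:n-1]
            return s.mem[off : off+s.chunkSize]
        }
        if s.next+s.chunkSize > len(s.mem) {
            return nil // slab exhausted; memcached would evict items instead
        }
        off := s.next
        s.next += s.chunkSize
        return s.mem[off : off+s.chunkSize]
    }

    // freeChunk puts a chunk back on the free list; the memory stays
    // with the process rather than going back to the OS.
    func (s *slab) freeChunk(c []byte) {
        off := cap(s.mem) - cap(c) // recover the chunk's offset inside mem
        s.free = append(s.free, off)
    }

    func main() {
        s := newSlab(64, 16)
        a := s.alloc()
        s.freeChunk(a)
        b := s.alloc()
        fmt.Println(&a[0] == &b[0]) // true: the chunk was reused, not freed
    }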

Determine whether memory location is in CPU cache

It is possible for an operating system to determine whether a page of memory is in DRAM or in swap; for example, simply try to access it and if a page fault occurs, it wasn't.
However, is the same thing possible with CPU cache?
Is there any efficient way to tell whether a given memory location has been loaded into a cache line, or to know when it does so?
In general, I don't think this is possible. It works for DRAM and the pagefile because those are OS-managed resources; the cache is managed by the CPU itself.
The OS could do a tight timing loop of a memory read and try to see if it completes fast enough to be in the cache or if it had to go out to main memory - this would be very error prone.
On multi-core/multi-processor systems, there are cache coherency protocols used between processors to determine when they need to invalidate each other's caches. I suppose you could have a custom device that snoops this protocol and that the OS could query.
What are you trying to do? If you want to force something into memory, current x86 processors support prefetching memory into the cache in a non-blocking way, for instance with Visual C++ you could use _mm_prefetch to fetch a line into the cache.
EDIT:
I haven't done this myself, so use at your own risk. To determine cache misses for profiling, you may be able to use some architecture-specific registers. http://download.intel.com/design/processor/manuals/253669.pdf, Appendix A gives "Performance Tuning Events". This can't be used to determine if an individual address is in the cache or when it is loaded in the cache, but can be used for overall stats. I believe this is what vTune (a phenomenal profiler for this level) uses.
If you try to determine this yourself then the very act of running your program could invalidate the relevant cache lines, hence rendering your measurements useless.
This is one of those cases that mirrors the scientific principle that you cannot measure something without affecting that which you are measuring.
x86: I don't know how to tell whether an address IS in the cache, BUT here is how to tell whether an address WAS in the cache:

    rdtsc                ; read the timestamp counter
    ; save the timestamp
    mov eax,[address]    ; load from the address being probed
    rdtsc                ; read the timestamp counter again
    ; compute the difference between the two timestamps

If the difference is below some threshold, the address was in the cache. The threshold has to be determined from documentation or empirically. Some machines have cache hit/miss counters which would serve equally well.
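If you want to play with the timing approach without dropping to assembly, here is a crude sketch in Go that shows the effect in aggregate by comparing average access times for a buffer that fits in cache against one that doesn't (the sizes and the assumed 64-byte line are illustrative, and the numbers will be noisy for all the reasons given above):

    package main

    import (
        "fmt"
        "time"
    )

    // probe touches one byte per assumed 64-byte cache line and returns
    // the average time per touch. It measures in aggregate only: it
    // cannot answer for a single address, and the probing itself drags
    // lines into the cache.
    func probe(buf []byte) time.Duration {
        const rounds = 50
        var sink byte
        start := time.Now()
        for r := 0; r < rounds; r++ {
            for i := 0; i < len(buf); i += 64 {
                sink += buf[i]
            }
        }
        _ = sink
        touches := rounds * (len(buf) / 64)
        return time.Since(start) / time.Duration(touches)
    }

    func main() {
        small := make([]byte, 16<<10) // 16 KiB: fits comfortably in L1
        large := make([]byte, 64<<20) // 64 MiB: exceeds most last-level caches
        probe(small)                  // warm-up passes fault the pages in
        probe(large)
        fmt.Println("small (stays cached): ", probe(small))
        fmt.Println("large (keeps missing):", probe(large))
    }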
