I've tried finding information about this but can't seem to find any. Any ideas whether this is true or not?
For 80x86, segment limit checks are done whenever the CPU is not running 64-bit code.
This includes when the CPU is in (32-bit or 16-bit) supervisor mode (CPL=0, kernel code), even if segmentation is "disabled as much as possible" (by setting all segments to "base = 0, limit = 4 GiB"), because it's still possible to exceed the maximum limit (e.g. by accessing 4 bytes starting at offset 0xFFFFFFFF).
It also includes when the CPU is in real mode (where there are no privilege levels, but all code runs as if it were CPL=0). E.g. in real mode, mov ax,[0xFFFF] will cause a general protection fault because the 2-byte access exceeds the 64 KiB segment limit.
For any other interpretation of "memory limit protection checks" (e.g. for architectures other than 80x86), I don't know.
I understand that cudaMallocManaged simplifies memory access by eliminating the need for explicit memory allocations on host and device. Consider a scenario where the host memory is significantly larger than the device memory, say 16 GB host and 2 GB device, which is fairly common these days. If I am dealing with input data of a large size, say 4-5 GB, read from an external data source, am I forced to resort to explicit host and device memory allocation (since the device memory is insufficient to accommodate it all at once), or does the CUDA unified memory model have a way to get around this (something like auto allocate/deallocate on a need basis)?
Am I forced to resort to explicit host and device memory allocation?
You are not forced to resort to explicit host and device memory allocation, but you will be forced to handle the amount of allocated memory manually. This is because, on current hardware at least, the CUDA unified virtual memory doesn't allow you to oversubscribe GPU memory. In other words, cudaMallocManaged will fail once you allocate more memory than what is available on the device. But that doesn't mean you can't use cudaMallocManaged, it merely means you have to keep track of the amount of memory allocated and never exceed what the device could support, by "streaming" your data instead of allocating everything at once.
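For illustration only, here is a minimal sketch of that chunked, "streaming" approach. The process kernel is a hypothetical stand-in for your real work, the file I/O is a placeholder, and the chunk size is arbitrary and would need tuning to the actual 2 GB device:

    #include <cuda_runtime.h>
    #include <cstdio>
    #include <cstring>

    // Hypothetical per-chunk kernel; stands in for whatever work you do per element.
    __global__ void process(float *data, size_t n)
    {
        size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            data[i] *= 2.0f;
    }

    int main()
    {
        const size_t totalElems = 1ull << 30;   // ~4 GB of floats: more than the device holds
        const size_t chunkElems = 1ull << 27;   // ~512 MB per chunk: well under device memory

        float *chunk = nullptr;
        // One managed buffer, sized to fit on the device, reused for every chunk.
        if (cudaMallocManaged(&chunk, chunkElems * sizeof(float)) != cudaSuccess) {
            fprintf(stderr, "managed allocation failed\n");
            return 1;
        }

        for (size_t offset = 0; offset < totalElems; offset += chunkElems) {
            size_t n = (totalElems - offset < chunkElems) ? totalElems - offset : chunkElems;

            // Placeholder: fill the chunk from the external data source here.
            memset(chunk, 0, n * sizeof(float));

            process<<<(unsigned)((n + 255) / 256), 256>>>(chunk, n);
            cudaDeviceSynchronize();            // finish this chunk before reusing the buffer

            // Placeholder: consume or write back the processed chunk here.
        }

        cudaFree(chunk);
        return 0;
    }

If you want to be defensive about how much you allocate, cudaMemGetInfo reports the free and total device memory, so you can check it before each cudaMallocManaged call.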
Pure speculation, as I can't speak for NVIDIA, but I believe automatic handling of oversubscription could be one of the future improvements in upcoming hardware.
And indeed, a year and a half after the above prediction, as of CUDA 8, Pascal GPUs are enhanced with a page-faulting capability that allows memory pages to migrate between the host and the device without explicit intervention from the programmer.
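To make that concrete, here is a hedged sketch that deliberately oversubscribes device memory with a single managed allocation. It only works where demand paging of managed memory is supported (Pascal or later with CUDA 8+, e.g. under Linux), and the grid size is arbitrary:

    #include <cuda_runtime.h>
    #include <cstdio>

    // Grid-stride kernel that touches every byte; each touched page migrates
    // to the GPU on demand instead of being copied up front.
    __global__ void touch(char *p, size_t n)
    {
        size_t stride = (size_t)blockDim.x * gridDim.x;
        for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
            p[i] = 1;
    }

    int main()
    {
        size_t freeMem = 0, totalMem = 0;
        cudaMemGetInfo(&freeMem, &totalMem);

        // Deliberately ask for 1.5x the whole device: valid only where
        // managed-memory oversubscription is supported.
        size_t n = totalMem + totalMem / 2;

        char *p = nullptr;
        if (cudaMallocManaged(&p, n) != cudaSuccess) {
            fprintf(stderr, "oversubscribed managed allocation not supported here\n");
            return 1;
        }

        touch<<<4096, 256>>>(p, n);
        cudaDeviceSynchronize();

        cudaFree(p);
        return 0;
    }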
After booting up in real mode, I would like to figure out the lowest and highest possible memory addresses I can use. I assume I need to find out the actual size of the RAM installed on the machine; then computing the addresses should be simple (but just for fun, how would I do it?). Then I could use these addresses as the base and limit of my GDT, right? That way, after loading the GDT, switching to protected mode, and setting up my segments, I'll have all the memory available to play with.
Real mode by definition can't address all the memory. You'd have to switch to protected mode first, with a safe GDT limit, scan the memory, then adjust the descriptor limit(s) accordingly.
As for detecting memory, just try reading from increasing physical addresses until an access faults. Dedicate a selector to the probe, update its descriptor in a loop, and go from there.
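Purely as an illustration of the probing idea, here is a rough sketch in C. Note it uses a write/read-back test rather than the fault-based variant described above, and it assumes you are already in protected mode with a flat 4 GiB data segment, paging off, and nothing important (ROM, MMIO) at the probed addresses:

    #include <stdint.h>

    /* Very rough RAM-top probe. A real detector would prefer the BIOS memory
     * map, or wrap the probe in an exception handler as described above. */
    static uint32_t probe_ram_top(void)
    {
        const uint32_t step    = 1024 * 1024;          /* probe every 1 MiB      */
        const uint32_t pattern = 0x55AA55AAu;

        uint32_t addr;
        for (addr = step; addr != 0; addr += step) {   /* wraps to 0 at 4 GiB    */
            volatile uint32_t *p = (volatile uint32_t *)(uintptr_t)addr;
            uint32_t saved = *p;

            *p = pattern;
            if (*p != pattern)                   /* nothing stored it: no RAM here */
                return addr;
            *p = saved;                          /* restore whatever was there     */
        }
        return 0xFFFFFFFFu;                      /* RAM (apparently) all the way up */
    }

Caches can make the read-back appear to succeed even where there is no RAM, so this kind of probe is normally done with caching disabled.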
I have a 32-bit Delphi application running with the /LARGEADDRESSAWARE flag on. This allows it to allocate up to 4 GB of address space on a 64-bit system.
I am using threads (in a pool) to process files, where each task loads a file into memory. When multiple threads are running (multiple files being loaded), at some point EOutOfMemory hits me.
What would be the proper way to get the available address space so I can check if I have enough memory before processing the next file?
Something like:
if TotalMemoryUsed {from GetMemoryManagerState} + FileSize <
"AvailableUpToMaxAddressSpace" then NoOutOfMemory
I've tried using TMemoryStatusEx.ullAvailVirtual for AvailableUpToMaxAddressSpace, but the results are not correct (sometimes 0, sometimes more than I actually have).
I don't think that you can reasonably and robustly expect to be able to predict ahead of time whether or not memory allocations will fail. At the very least you would probably need to write your own memory allocator that was dedicated to serving your application, and have a very strong understanding of the heap allocation requirements of your process.
Realistically the tractable way forward for you is to break free from the shackles of 32 bit address space. That is your fundamental problem. The way to escape from 32 bit address space is to compile for 64 bit. That requires XE2 or later.
You may need to continue supporting 32 bit versions of your application because you have users that are still on 32 bit systems. The modern versions of Delphi have 32 bit and 64 bit compilers and it is quite simple to write code that will compile and behave correctly under both scenarios.
For your 32 bit versions you are less likely to run into memory problems anyway because 32 bit systems tend to run on older hardware with fewer processors. In turn this means less demand on memory space because your thread pool tends to be smaller.
If you encounter machines with large enough processor counts to cause out of memory problems then one very simple and pragmatic approach is to give the user a mechanism to limit the number of threads used by your application's thread pool.
Even if the necessary infrastructure were available, it would be of no use, since there are other processes running on the target system.
There is no guarantee that another process does not allocate the memory after you have checked its availability and before you actually allocate it. The right thing to do is to write code that fails gracefully and catches the EOutOfMemory exception when it appears. Use it as a sign to stop creating more threads until some of them have terminated.
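The same fail-gracefully idea, sketched in C++ terms purely for illustration (in Delphi you would catch EOutOfMemory instead of std::bad_alloc; try_load is a made-up helper):

    #include <cstddef>
    #include <new>
    #include <vector>

    // Illustrative only: treat an allocation failure as a back-pressure signal,
    // not a fatal error, and retry the file once other tasks have freed memory.
    bool try_load(std::size_t fileSize, std::vector<char> &buffer)
    {
        try {
            buffer.resize(fileSize);   // may throw std::bad_alloc (EOutOfMemory in Delphi)
            // ... read the file into buffer here ...
            return true;
        } catch (const std::bad_alloc &) {
            return false;              // caller: stop queueing work until a task finishes
        }
    }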
The 32-bit Delphi compiler produces 32-bit processes, so you can't use memory addresses beyond the 32-bit (4 GB) address space.
Take a look at this:
What is a safe Maximum Stack Size or How to measure use of stack?
I'm doing some research with the MIPS architecture and was wondering how operating systems are implemented with the limited instructions and memory protection that mips offers. I'm specifically wondering about how an operating system would prevent certain addresses ranges from being executed. For example, how could an operating system limit PC to operate in a particular range? In other words, prevent something such as executing from dynamically allocated memory?
The first thing that came to mind is with TLBs, but TLBs only offer memory write protection (and not execute).
I don't quite see how it could be handled by the OS either, because that would imply that every instruction would result in an exception and then MANY cycles would be burned just checking to see if PC was in a sane address range.
If anyone knows, how is it typically done? Is it handled somehow by the hardware during initialization (e.g. it's given an address range and an exception is raised if the PC is out of range)?
Most protection checks are done in hardware, by the CPU itself, and do not need much involvement from the OS side.
The OS sets up some special tables (page tables or segment descriptors or some such) where memory ranges have associated read, write, execute and user/kernel permissions that the CPU then caches internally.
On every instruction the CPU then checks whether the memory accesses comply with the OS-established permissions and, if everything's OK, carries on. If there's an attempt to violate those permissions, the CPU raises an exception (a form of interrupt, similar to those raised by I/O devices external to the CPU) that the OS handles. In most cases the OS simply terminates the offending application when it gets such an exception.
In some other cases it tries to handle the exception and make the seemingly broken code work. One of these cases is support for virtual, on-disk memory. The OS marks a region as not-present/inaccessible when it isn't backed by physical memory and its data is somewhere on the disk. When the app tries to use that region, the OS catches the exception from the instruction that tried to access it, backs the region with physical memory, fills it in with data from the disk, marks it as present/accessible and restarts the faulting instruction. Whenever the OS is low on memory, it can offload data from certain ranges to the disk, mark those ranges as not-present/inaccessible again and reclaim the memory from those regions for other purposes.
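As a purely illustrative sketch (not any real architecture's page-table or TLB format), the per-page metadata the OS sets up and the CPU checks on each access might look like this:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical, simplified page-table entry: real formats (x86 PTEs, MIPS
     * TLB entries, ...) differ, but they carry some variant of these bits. */
    typedef struct {
        uint32_t frame   : 20;  /* physical frame number this page maps to      */
        uint32_t present : 1;   /* backed by physical memory right now?         */
        uint32_t read    : 1;
        uint32_t write   : 1;
        uint32_t execute : 1;   /* the NX/XI-style bit the question asks about  */
        uint32_t user    : 1;   /* usable from user mode, or kernel-only?       */
    } pte_t;

    typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_EXEC } access_t;

    /* Conceptually what the hardware does on every access, with no OS
     * involvement unless a check fails; a failed check raises the fault the
     * OS then handles (kill the process, or page the data in and retry). */
    static bool access_allowed(pte_t pte, access_t kind, bool user_mode)
    {
        if (!pte.present)                         return false;  /* page fault       */
        if (user_mode && !pte.user)               return false;  /* protection fault */
        if (kind == ACCESS_READ  && !pte.read)    return false;
        if (kind == ACCESS_WRITE && !pte.write)   return false;
        if (kind == ACCESS_EXEC  && !pte.execute) return false;
        return true;
    }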
There may also be specific memory ranges, hard-coded by the CPU, that are inaccessible to software running outside of the OS kernel, and the CPU can easily check for those as well.
This seems to be the case for MIPS (from "Application Note 235 - Migrating from MIPS to ARM"):
3.4.2 Memory protection
MIPS offers memory protection only to the extent described earlier i.e. addresses
in the upper 2GB of the address space are not permitted when in user mode.
No finer-grained protection regime is possible.
This document lists "MEM - page fault on data fetch; misaligned memory access; memory-protection violation" among the other MIPS exceptions.
If a particular version of the MIPS CPU doesn't have any more fine-grained protection checks, they can only be emulated by the OS, and at a significant cost. The OS would need to interpret code instruction by instruction, or translate it into almost equivalent code with address and access checks inserted, and execute that instead of the original code.
This is indeed done with TLBs. No Execute Bits (NX bits) became popular only a few years ago, so older MIPS processors do not support it. The latest version of the MIPS architecture (Release 3) and the SmartMIPS Application-Specific Extension support it as an optional feature under the name of XI (Execute Inhibit).
If you have a chip without this feature you are out of luck. Like Alex already said, there is no simple way to emulate this feature.
Can anybody tell me a Unix command that can be used to find the number of memory accesses that took place in a given interval? vmstat, top and sar only give the amount of physical memory occupied/available, but do not give the number of memory accesses in a given interval.
If I understand what you're asking, such a feature would almost certainly require hardware support at a very low level (e.g. a counter of some sort that monitors memory bus activity).
I don't think such support is available for the common architectures supported by Unix or Linux, so I'm going to go out on a limb and say that no such Unix command exists.
The situation is somewhat different when considering memory in units of pages, because most architectures that support virtual memory have dedicated MMU hardware which operates at that level of granularity and can be accessed by the operating system. But as far as I know, the sorts of counter data you'd get from the MMU would represent events like page faults, allocations, and releases, rather than individual reads or writes.
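To make that last point concrete: page-level events are about the closest you can get from the OS without dedicated hardware counters. For example, on Unix-like systems getrusage() reports your own process's page-fault counts; note that it counts faults, not individual loads and stores:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rusage before, after;
        getrusage(RUSAGE_SELF, &before);

        /* Touch a chunk of memory; what this generates is mostly minor page
         * faults as pages are first touched, not a count of every access. */
        size_t len = 64 * 1024 * 1024;
        char *buf = (char *)malloc(len);
        if (buf != NULL)
            memset(buf, 1, len);

        getrusage(RUSAGE_SELF, &after);

        printf("minor faults: %ld\n", after.ru_minflt - before.ru_minflt);
        printf("major faults: %ld\n", after.ru_majflt - before.ru_majflt);

        free(buf);
        return 0;
    }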