I was working on a PandaBoard when this question occurred to me. The PandaBoard uses the OMAP4430, a Harvard-based architecture, and the board has 1GB of memory (DDR2 RAM). But doesn't a Harvard architecture require two separate memories?
Here is what I understand:
The Linux kernel image is stored on the MMC/SD card and is then pulled into memory by the bootloader. Now, where is the bootloader itself running from (is it the 1GB RAM?), and where does the bootloader place the kernel image (again, is it the 1GB RAM?)?
ARM architecture is often called "modified Harvard". It has a single linear (4GB) memory space, but uses different buses (and caches) for code and data. This still allows it to read code as data or execute data as code, just like x86.
Note that this does not hold for all ARM chips. Some of them (e.g. Cortex-M0 cores) use a single bus for code and data, so they are actually von Neumann.
On a Linux machine, I need to count the number of read and write accesses to memory (DRAM) performed by a process. The machine has a NUMA configuration and I am binding the process to access memory from a single remote NUMA node using numactl. The process is running on CPUs in node 0 and accessing memory in node 1.
Currently, I am using perf to count LLC load miss and LLC store miss events as an estimate for read and write accesses to memory, my reasoning being that LLC misses have to be served by memory accesses. Is this approach correct, i.e., are these events relevant? And are there any alternatives for obtaining the read and write access counts?
Processor : Intel Xeon E5-4620
Kernel : Linux 3.9.0+
Depending on your hardware, you should be able to access performance counters located on the memory side in order to count memory accesses exactly. On Intel processors these events are called uncore events, and the same thing can be counted on AMD processors as well.
Counting LLC misses is not entirely accurate, because mechanisms such as the hardware prefetcher can generate a significant number of memory accesses that are not reflected in those miss counts.
Regarding your hardware, unfortunately you will have to use raw events (in perf terminology). These events cannot be generalized by perf because they are processor-specific, so you will have to look into your processor's manual to find the raw encoding of the event to pass to perf. For your Intel processor you should look at chapter 18.9.8, "Intel® Xeon® Processor E5 Family Uncore Performance Monitoring Facility", and chapter 19, "Performance-Monitoring Events", of the Intel Software Developer's Manual, available here. In these documents you will need the exact ID of your processor, which you can get from /proc/cpuinfo.
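As a rough illustration of how a raw event is programmed, here is a minimal sketch using the perf_event_open(2) syscall. The 0x1234 event code is a placeholder that must be replaced with the umask/event encoding taken from the manual, and note that uncore counters are generally attached to a CPU/socket (pid = -1, cpu >= 0) rather than to a single task, so the task-bound setup below is only a starting point:

    /* Minimal sketch of counting a raw (processor-specific) event with
     * perf_event_open(2). The event code 0x1234 is a PLACEHOLDER: the real
     * umask/event encoding must come from the Intel SDM chapters above.
     * Error handling is kept to a minimum. */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <linux/perf_event.h>

    static int perf_event_open(struct perf_event_attr *attr, pid_t pid,
                               int cpu, int group_fd, unsigned long flags)
    {
        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
    }

    int main(void)
    {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_RAW;   /* raw, processor-specific event        */
        attr.config = 0x1234;        /* placeholder: <umask><event> from SDM */
        attr.disabled = 1;
        attr.inherit = 1;

        /* Count for the current process on any CPU; for uncore events you
         * would typically pass pid = -1 and a specific cpu instead. */
        int fd = perf_event_open(&attr, 0, -1, -1, 0);
        if (fd < 0) { perror("perf_event_open"); return 1; }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

        /* ... run the workload to be measured here ... */

        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
        uint64_t count = 0;
        read(fd, &count, sizeof(count));
        printf("raw event count: %llu\n", (unsigned long long)count);
        close(fd);
        return 0;
    }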
My system has 32GB of ram, but the device information for the Intel OpenCL implementation says "CL_DEVICE_GLOBAL_MEM_SIZE: 2147352576" (~2GB).
I was under the impression that on a CPU platform the global memory is the "normal" RAM, and thus something like ~30+GB should be available to the OpenCL CPU implementation (of course I'm using the 64-bit version of the SDK).
Is there some sort of secret setting to tell the Intel OpenCL driver to increase global memory and use all of the system memory?
SOLVED: Got it working by recompiling everything as 64-bit. It seems obvious in hindsight, but I thought OpenCL worked similarly to OpenGL, where you can easily allocate e.g. 8GB of texture memory from a 32-bit process and the driver handles the details for you (of course you can't allocate 8GB in one sweep, but you can, for example, transfer multiple textures that add up to more than 4GB).
I still think that limiting the OpenCL memory abstraction to the address space of the process (at least for the Intel/AMD drivers) is irritating, but maybe there are subtle details or performance trade-offs behind this implementation choice.
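For reference, here is a minimal sketch (plain C against the standard OpenCL 1.x API) of how the reported limit can be queried; the point of the fix above is that the same code built as a 32-bit binary tends to report ~2GB, while a 64-bit build reports a much larger value:

    /* Sketch: print CL_DEVICE_GLOBAL_MEM_SIZE for every device of the
     * first platform. Build as a 64-bit binary and link against the
     * OpenCL library (e.g. -lOpenCL). */
    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_platform_id platform;
        cl_uint num_devices = 0;
        cl_device_id devices[8];

        if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS) return 1;
        if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 8, devices,
                           &num_devices) != CL_SUCCESS) return 1;

        for (cl_uint i = 0; i < num_devices; ++i) {
            cl_ulong mem = 0;
            clGetDeviceInfo(devices[i], CL_DEVICE_GLOBAL_MEM_SIZE,
                            sizeof(mem), &mem, NULL);
            printf("device %u: CL_DEVICE_GLOBAL_MEM_SIZE = %llu bytes\n",
                   i, (unsigned long long)mem);
        }
        return 0;
    }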
I have some questions about the physical storage of kernel data in Linux. I know that the upper 1GB of the VIRTUAL memory of each process points to the same PHYSICAL location, but:
Does this piece of data have to be contiguous in PHYSICAL memory, as it is in VIRTUAL memory?
Will kernel data take up ONLY 1GB of the PHYSICAL memory?
Can some pages of the kernel data be swapped out to disk? (For example page tables; the page global directory is always in physical memory and cannot be swapped out, as I understand it.)
The first GB of physical memory is mapped linearly onto the upper GB of virtual addresses, but the kernel can modify these mappings (see the sketch after this answer).
Yes, it is.
No, the Linux kernel is not swappable; only user-process memory can be swapped out.
Note that this is only valid for 32-bit systems. Mappings on 64-bit systems are different.
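To illustrate the first point, on a 32-bit kernel the lowmem linear mapping is just a fixed offset, which is essentially what the kernel's __pa()/__va() helpers compute for that region. A simplified user-space sketch of the arithmetic, assuming the common 3GB/1GB split (PAGE_OFFSET = 0xC0000000):

    /* Simplified illustration of the 32-bit lowmem linear mapping:
     * virtual = physical + PAGE_OFFSET for the directly mapped low memory.
     * This mirrors what __pa()/__va() do for lowmem; it is not usable from
     * user space, it is just the arithmetic for illustration. */
    #include <stdio.h>

    #define PAGE_OFFSET 0xC0000000UL   /* common 3GB/1GB split on 32-bit x86 */

    static unsigned long virt_to_phys_lowmem(unsigned long vaddr)
    {
        return vaddr - PAGE_OFFSET;
    }

    static unsigned long phys_to_virt_lowmem(unsigned long paddr)
    {
        return paddr + PAGE_OFFSET;
    }

    int main(void)
    {
        unsigned long phys = 0x01000000UL;              /* 16MB physical      */
        unsigned long virt = phys_to_virt_lowmem(phys); /* 0xC1000000 virtual */
        printf("phys 0x%08lx <-> virt 0x%08lx\n",
               virt_to_phys_lowmem(virt), virt);
        return 0;
    }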
Given a 2 processor Nehalem Xeon server with 12GB of RAM (6x2GB), how are memory addresses mapped onto the physical memory modules?
I would imagine that on a single processor Nehalem with 3 identical memory modules, the address space would be striped over the modules to give better memory bandwidth. But with what kind of stripe size? And how does the second processor (+memory) change that picture?
Intel is not very clear on that, you have to dig into their hardcore technical documentation to find out all the details. Here's my understanding. Each processor has an integrated memory controller. Some Nehalems have triple-channel controllers, some have dual-channel controllers. Each memory module is assigned to one of the processors. Triple channel means that accesses are interleaved across three banks of modules, dual channel = two banks.
The specific interleaving pattern is configurable to some extent, but, given their design, it's almost inevitable that you'll end up with 64 to 256 byte stripes.
If one of the processors wants to access memory that is attached to the IMC of the other processor, the access goes through both processors and incurs additional latency.
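Purely as an illustration of what such interleaving means (the real mapping is chipset-specific and configured by the BIOS, not this simple), a hypothetical 3-channel layout with a 64-byte stripe would assign consecutive cache lines to channels like this:

    /* Hypothetical illustration only: 3-channel interleaving with a 64-byte
     * stripe. The real Nehalem channel mapping is determined by the IMC and
     * BIOS configuration and is not documented at this level of simplicity. */
    #include <stdio.h>

    #define STRIPE   64u   /* assumed interleave granularity (one cache line) */
    #define CHANNELS 3u    /* triple-channel configuration */

    static unsigned channel_of(unsigned long long phys_addr)
    {
        return (unsigned)((phys_addr / STRIPE) % CHANNELS);
    }

    int main(void)
    {
        for (unsigned long long addr = 0; addr < 6 * STRIPE; addr += STRIPE)
            printf("address 0x%llx -> channel %u\n", addr, channel_of(addr));
        return 0;
    }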
Under Windows Server 2003, Enterprise Edition, SP2 (/3GB switch not enabled)
As I understand it, and I may be wrong, the maximum addressable memory for a process is 4GB.
Is that 2GB of private bytes and 2GB of virtual bytes?
Do you get "out of memory" errors when the private byte limit or virtual byte limit is reached?
It is correct that the maximum address space of a process is 4GB, in a sense. Half of the address space is, for each process, taken up by the operating system. This can be changed with the /3GB switch, but that may cause system instability. So we are left with 2GB of addressable memory for the process to use on its own. Well, not entirely: part of this space is taken up by other things such as DLLs and other common code, so the memory actually available to you as a programmer is around 1.5GB to 1.7GB.
I'm not sure how you can handle accidentally going above this limit, but I know of games which crash on large multiplayer maps for this reason. Another thing to note is that a 32-bit program cannot use more than 2GB of address space even on a 64-bit system unless it is built with the /LARGEADDRESSAWARE:YES linker flag.
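As a quick way to see this limit from inside a process, here is a small sketch using the Win32 GlobalMemoryStatusEx call; built without /LARGEADDRESSAWARE it reports roughly 2GB of total virtual address space, and roughly 4GB when the flag is enabled for a 32-bit binary running on 64-bit Windows:

    /* Sketch: query the calling process's virtual address space and the
     * system commit limit via GlobalMemoryStatusEx (Win32). No libraries
     * beyond kernel32 are needed. */
    #include <stdio.h>
    #include <windows.h>

    int main(void)
    {
        MEMORYSTATUSEX ms;
        ms.dwLength = sizeof(ms);
        if (!GlobalMemoryStatusEx(&ms)) return 1;

        printf("total virtual address space : %llu MB\n",
               (unsigned long long)(ms.ullTotalVirtual / (1024 * 1024)));
        printf("available virtual space     : %llu MB\n",
               (unsigned long long)(ms.ullAvailVirtual / (1024 * 1024)));
        printf("commit limit                : %llu MB\n",
               (unsigned long long)(ms.ullTotalPageFile / (1024 * 1024)));
        return 0;
    }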
Mark Russinovich started a series of posts on this:
Pushing the Limits of Windows: Physical Memory
While 4GB is the licensed limit for 32-bit client SKUs, the effective limit is actually lower and dependent on the system's chipset and connected devices. The reason is that the physical address map includes not only RAM, but device memory as well, and x86 and x64 systems map all device memory below the 4GB address boundary to remain compatible with 32-bit operating systems that don't know how to handle addresses larger than 4GB. If a system has 4GB RAM and devices, like video, audio and network adapters, that implement windows into their device memory that sum to 500MB, 500MB of the 4GB of RAM will reside above the 4GB address boundary.
You can only access 2GB of memory in total per process (without the /3GB switch) on 32-bit Windows platforms.
You could run multiple 32-bit VMs on a 64-bit OS so that each app has access to as much memory as possible if your machine has more than 4GB.
A lot of people are just starting to hit these barriers. I guess it's easier if your app is in .NET or Java, as those VMs happily go up to 32GB of memory on a 64-bit OS.
On 32-bit, if there is enough physical memory and disk space for virtual memory, memory runs out at around 3GB, since the kernel reserves the address space above 0xC0000000 for itself. On a 64-bit kernel running a 64-bit application, the limit is 8TB.
For more details, check out MSDN - Memory Limits for Windows Releases
The maximum addressable memory for a 32-bit machine is 4GB; a 64-bit machine can address vastly more. (Some 32-bit machines have extension systems, such as PAE, for accessing more, but I don't think this is worth bothering with or considering for use.)
You get out-of-memory errors when the virtual limit is reached. On Windows Server 2003, Task Manager shows this limit on the Performance tab, labelled 'Commit Charge Limit'.