Memory mapping behaviour in QNX

I’m working on porting our text-to-speech code to QNX (6.6). One challenge is keeping RAM consumption as low as possible, as our high-quality voices need access to around 200 MB of data (stored in a single file).
We hoped that we could memory map this 200 MB of data, but mmap() on QNX appears to behave differently than on other operating systems. What we observe is that once a (4 KB?) page has been mapped in, it is never released until the file is unmapped. Since access to this 200 MB data block is fairly random (it depends on the input text), the complete 200 MB ends up in RAM within a few seconds of generated speech.
Can this behaviour be confirmed? If so, are there other ways (maybe unique to QNX) to have old pages released back to the OS? We have to play nicely with the other processes that are running, such as the media player, navigation, GUI, etc.
A second observation is that memory mapping a big file takes several seconds. This is unexpected as well, since I thought one of the purposes of memory mapping was to skip the long load time of reading the data into RAM. We use this line of code to mmap() the data file:
pFileData = mmap(0, cFileData, PROT_READ, MAP_SHARED, hFile, 0);
I think PROT_READ and MAP_SHARED are not so special as to cause long mapping times (on Linux and iOS, mmap() returns immediately).
Any insights on this memory mapping behaviour on QNX would be greatly appreciated.
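Whether QNX can be made to drop pages from a live mapping is not confirmed here. One portable approach that bounds the mapped amount by construction is to map only a small window of the file around the data currently needed and unmap it afterwards. The following is a minimal sketch using only POSIX mmap() and munmap(); the file_window type and the window_open/window_close names are hypothetical, and the sketch assumes that repeated small mappings are affordable on the target, which would have to be measured given the slow mmap() calls described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper type: one page-aligned window onto the data file. */
typedef struct {
    void *base;                /* address returned by mmap() */
    size_t map_len;            /* bytes actually mapped, including alignment slack */
    const unsigned char *data; /* points at the requested file offset */
} file_window;

/* Map len bytes of fd starting at offset. Returns 0 on success, -1 on failure. */
int window_open(int fd, off_t offset, size_t len, file_window *w)
{
    assert(w != NULL && len > 0 && offset >= 0);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* mmap() requires a page-aligned file offset, so round down and
       remember how many extra bytes precede the requested data. */
    off_t aligned = offset - (offset % page);
    size_t slack = (size_t)(offset - aligned);

    void *p = mmap(NULL, len + slack, PROT_READ, MAP_SHARED, fd, aligned);
    if (p == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    w->base = p;
    w->map_len = len + slack;
    w->data = (const unsigned char *)p + slack;
    return 0;
}

/* Unmap the window so its pages are no longer mapped into this process. */
int window_close(file_window *w)
{
    assert(w != NULL);
    int rc = munmap(w->base, w->map_len);
    w->base = NULL;
    w->data = NULL;
    w->map_len = 0;
    return rc;
}
```

A caller would open a window for each unit of voice data it needs, read through w.data, and close the window again. The trade-off is extra system calls per lookup in exchange for a hard upper bound on how much of the file is mapped at any time.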

Related

How do I increase maximum download size in an MVC application?

I am frankly stumped. This is beyond my experience.
I have a C# MVC program that generates a zip file in a MemoryStream for downloading. The action method is called from JavaScript on a button click.
The only problem is that in some cases the potential file size can easily exceed 1 GB, and from my reading that is a common problem. I've tried upping the Maximum Allowed Content Length to 3000000000 in Request Filtering on IIS (IIS8). I've tried adding requestLimits maxAllowedContentLength to my web.config. I've even tried breaking up the zip through multiple calls to the action method (without success), although I have yet to get any confirmation/denial that this is even possible.
Is there any setting within IIS or my web.config that I could be overlooking? Could this be a company network issue, not solvable on an app developer's level?
Okay, so it's kind of hard to explain big concepts in 400 characters or less, so I think I'm just causing more confusion sticking in the comments section. Besides, I think we're close enough here to an "answer" as you're likely to get.
The default constructor of MemoryStream sets the initial capacity to 0. On the first write the buffer grows to at least 256 bytes, but since the stream doesn't actually claim that space until it's needed, it effectively starts at 0.
Each time you write to the stream, it checks how much is being written versus the remaining size of the buffer array. If it can't fit the write, it creates a new, larger buffer array and copies the old buffer array into that. In this way, setting an initial size can help somewhat, in that you start off with a larger initial buffer array and you may not need to grow that buffer. You might have a better chance of getting a contiguous block of memory, which I'll explain the importance of in a bit, but that actually kind of works against you, as well. If you only need 1MB for the file, but you're initializing with 100MB and there's not 100MB of contiguous memory, you'll get an OutOfMemoryException, even though there might be 1MB of contiguous memory available.
Regardless of whether you initialize or not, there remain certain immutable facts. First, MemoryStream requires contiguous memory. Even if you technically have memory available on the system, it's possible you might not have large blocks of available memory. In other words, if you have 4GB available but it's all fragmented, even trying to create a 1GB stream in memory could fail, simply because it can't reserve 1GB of contiguous memory. Obviously, the larger the file you're trying to create in memory, the greater the chances that you're going to run into this issue. For this reason alone, I would say you're out of luck without raising the amount of system RAM. With 8GB and probably only 4-6GB actually available to IIS and then split up between worker processes and threads, it is highly unlikely that you're going to be able to claim 25% or so of the available RAM as contiguous space.
The next immutable fact may or may not be relevant, but since you haven't specified, I'll mention it. If your web app is deployed as 32-bit, the process has a hard limit of 2GB of address space, meaning a MemoryStream could never house anywhere near 2GB (in practice the ceiling is around 1.3-1.6GB, as .NET code consumes some of that address space), and any attempt to make it do so will result in an OutOfMemoryException, even if you had some ridiculous amount of RAM on the system like 1TB+. If your app is 64-bit, address space is far less likely to be the issue, as you can address a ton more memory, assuming it's compiled properly. You'd have to pretty much try to screw that up, though, so you should be fine. Keep in mind that a MemoryStream is backed by a single byte array with an int length, so even in a 64-bit process it tops out at roughly 2GB.
Finally, multiple writes can cause an issue as well. As I said previously, the buffer array resizes (if necessary) in response to writes. Each time it resizes, the new buffer array must also be able to fit in contiguous address space. As a result, multiple resizes can cause you to bump into an OutOfMemoryException you wouldn't have hit if you had written all the data from the start. This is where initializing the MemoryStream can be helpful, but as I said before, it's also a double-edged sword, as your initial buffer size might be too great to begin with and you end up with an exception where you may have not had one letting it grow organically. Long and short, try to write everything to the stream in one go rather than piecemeal.
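The resize-and-copy behaviour described above can be illustrated with a small growable buffer. This is a sketch in C of the general doubling technique, not the actual .NET implementation; the growbuf type, its functions, and the grow_count counter are hypothetical names introduced for illustration, and the 256-byte minimum simply mirrors the figure mentioned earlier.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer that mimics a resize-and-copy stream. */
typedef struct {
    unsigned char *data;
    size_t length;       /* bytes written so far */
    size_t capacity;     /* bytes currently allocated */
    unsigned grow_count; /* number of reallocate-and-copy steps performed */
} growbuf;

/* Start with the given capacity; 0 means "allocate nothing until first write". */
int growbuf_init(growbuf *b, size_t initial_capacity)
{
    assert(b != NULL);
    b->data = NULL;
    b->length = 0;
    b->capacity = 0;
    b->grow_count = 0;
    if (initial_capacity > 0) {
        b->data = malloc(initial_capacity);
        if (b->data == NULL)
            return -1; /* the large up-front block could not be found */
        b->capacity = initial_capacity;
    }
    return 0;
}

/* Append n bytes, growing the buffer if the write does not fit. */
int growbuf_write(growbuf *b, const void *src, size_t n)
{
    assert(b != NULL);
    if (n == 0)
        return 0;

    size_t needed = b->length + n;
    if (needed > b->capacity) {
        /* Double the capacity, with a floor of 256 bytes and of `needed`. */
        size_t new_cap = b->capacity * 2;
        if (new_cap < 256)
            new_cap = 256;
        if (new_cap < needed)
            new_cap = needed;

        /* The new block must be contiguous, and the old block is still
           alive while the contents are copied across. */
        unsigned char *fresh = malloc(new_cap);
        if (fresh == NULL)
            return -1; /* the analogue of an OutOfMemoryException */
        if (b->length > 0)
            memcpy(fresh, b->data, b->length);
        free(b->data);
        b->data = fresh;
        b->capacity = new_cap;
        b->grow_count++;
    }

    memcpy(b->data + b->length, src, n);
    b->length += n;
    return 0;
}

void growbuf_free(growbuf *b)
{
    assert(b != NULL);
    free(b->data);
    b->data = NULL;
    b->length = 0;
    b->capacity = 0;
}
```

Writing 1000 bytes in 10-byte pieces into an unsized buffer triggers three growth steps in this sketch (256, 512, 1024), while a buffer initialized with a capacity of 1000 never grows. During each growth step the old and new blocks exist at the same time, which is where the pressure on contiguous space comes from.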

Delay between start and end of the RAM

Is the delay different when the CPU accesses the start of the RAM versus the end, or is there no difference?
There is no difference in performance when the CPU tries to access physical memory at some arbitrary address/range.
As far as the CPU is concerned, it's interacting with a memory controller over a bus. The values in random access memory are retrieved by means of an address without having to be concerned about where the address happens to be physically located within a memory module.
If we assume the CPU is requesting contents located in RAM after cache misses, then it doesn't matter if the requested address is x or x + 100. The time delays are expected to be within the same performance range.
The 'start' and 'end' locations would matter if you were to switch to media based on sequential access (e.g. tape drives, often used for backups).
Note that I'm avoiding the topic of a process' view of memory when executed by the OS (e.g. virtual memory, etc) and also the idea of trying different kinds/amounts of memory being tested and compared against each other. In other words, I'm assuming that a given system has a fixed amount of a given memory type for a given test.
In addition, when looking at memory module specs, I've never noticed any information indicating some sort of performance penalty/gain if a certain address range within the module were to be avoided/used.

How does browser GPU memory usage work?

By pressing F12 and then Esc on Chrome, you can see a few options to tick. One of them is show FPS meter, which allows us to see GPU memory usage in real time.
I have a few questions regarding this GPU memory usage:
1. Is it correct to say that this GPU memory is the memory the webpage needs to store its code: variables, methods, images, cached videos, etc.?
2. Is there a reason why it has an upper bound of 512 MB? Is there a way to reduce or increase it?
3. How much GPU memory usage is enough to cause a considerable slowdown in browser navigation?
4. If I have an array with millions of elements (just hypothetically), and I splice all the elements in the array, will it free the memory that was in use? Or will it not "really" free the memory, requiring an additional step to actually wipe it out?
1. What is stored in GPU memory
Although there are no hard-set rules on the type of data that can be stored in GPU memory, the bulk of GPU memory generally contains single-frame resources like textures, multi-frame resources like vertex buffers and index buffer data, and compiled programmable-shader code fragments. So while in theory it is possible to store videos in GPU memory, as well as all kinds of other bulk data, in practice only a handful of frames of each streamed video will ever be in GPU RAM.
The main reason for this soft-selection of texture-like data sets is that a GPU is a parallel hardware architecture, and it expects the data to be compatible with that philosophy, which means that there are no inter-dependencies between sets of data (i.e. pixels). Decoding images from a video stream is more or less the same as resolving interdependence between data-blocks.
2. Is 512MB enough for everyone?
No. It's probably based on your hardware.
3. When does GPU memory become slow?
You have to know that some parts of the GPU memory are so fast you can't even start to appreciate the speed. There is nothing wrong with the speed of a GPU card. What matters is the time it takes to get the data IN that memory in the first place. That is called bandwidth, and the operations usually need to be synchronized. In that case, the driver will lock the Northbridge bus so that data can flow from main memory into GPU memory, and this locking + transfer takes quite some time.
So to answer the question, once it is uploaded, the GUI will remain fast, no matter how much more memory is used on the GPU card. The only things that can slow it down are changes to the GUI and other GPU processes taking time to complete that may interfere with rendering operations.
4. Does splicing free up RAM?
I'm not quite sure what you mean by splicing. GPU memory is freed by applications that release that memory by using the API calls to do that. If you want to render your GPU memory blank, you'd have to grab the GPU handles of the resources first, upload 'clear' data into them, and then release the handles again, but (for normal single-threaded GPU applications) you can only do that in your own process context.

40 million page faults. How to fix this?

I have an application that loads 170 files (let’s say they are text files) from disk into individual objects that are kept in memory all the time. The memory is allocated once when I load those files from disk, so there is no memory fragmentation involved. I also use FastMM to make sure my application never leaks memory.
The application compares all these files with each other to find similarities. Over-simplified, we can say that we compare text strings, but the algorithm is way more complex as I have to allow some differences between strings. Each file is about 300KB. Loaded in memory (the object that holds it) it takes about 0.4MB of RAM. So, the running app takes about 60MB of RAM (working set). It processes the data for about 15 minutes. The thing is that it generates over 40 million page faults.
Why? I have about 2GB of free RAM. From what I know, page faults are slow. How much are they slowing down my program?
How can I optimize the program to reduce these page faults? I guess it has something to do with data locality. Does anybody know some example algorithms for this (Delphi)?
Update:
But looking at the number of page faults (no other application in Task Manager comes close to mine, not even by far) I guess that I could increase the speed of my application IF I manage to optimize memory layout (reduce the page faults).
Delphi 7, Win 7 32 bit, RAM 4GB (3GB visible, 2GB free).
Caveat - I'm only addressing the page faulting issue.
I cannot be sure, but have you considered using memory-mapped files? That way Windows will use the files themselves as the backing store (rather than the main paging file, pagefile.sys). If the files are read-only, then the number of page faults should theoretically decrease, as the pages won't need to be written out to disk via the paging file; Windows will just load the data from the file itself as needed.
Now, to reduce paging of file data in and out, you need to try to go through the data in one direction, so that as new data is read, older pages can be discarded for good. Here is where you trade off going over the files again against caching data - the cache has to be stored somewhere.
Note that memory-mapped files are how Windows loads .dlls and .exes, amongst other things. I've used them to scan through gigabyte files without hitting memory limits (we had MBs in those days and not GBs of RAM).
However, from the data you describe, I'd suggest that not going back over files will reduce the amount of repaging going on.
On my machine most pagefaults are reported for developer studio which is reported to have 4M page faults after 30+ minutes total CPU time. You get 10 times more, in half the time. And memory is scarce on my system. So 40M faults seems like a lot.
It could just maybe be you have a memory leak.
The working set is only the physical memory in use for your application. If you leak memory, and don't touch it, it will get paged out. You will see the virtual memory usage (or page file use) increase. These pages might be swapped back in when the memory manager walks the heap, only to get swapped out again by Windows.
Because you have a lot of RAM, the swapped out pages will stay in physical memory, as nobody else needs them. (a page recovered from RAM counts as a soft fault, from disk as a hard one)
Do you use an exponential resize scheme?
If you grow the block of memory in too small increments while loading, it might constantly request large blocks from the system, copy the data over, and then release the old block (assuming that fastmm (de)allocates very large blocks directly from the OS).
Maybe somehow this causes a loop where the OS releases memory from your app's process, and then adds it again, causing page faults on first write.
Also avoid the TStringList.Load* methods for very large files; IIRC these consume twice the space needed.

What will happen if an application is too large to be loaded into the available RAM?

There is a chance that a heavyweight application needs to be launched on a low-spec system (especially one with very little memory).
Also, what happens when we already have a lot of applications open and keep opening new ones?
I have only seen applications take a long time to respond or hang for a while when I use them on a low-spec system with little memory and an old processor.
How is the system able to accommodate many applications when memory is low (like 128 MB or less)?
Does it involve paging or something else?
Can someone please explain the theory behind this?
"Heavyweight" is a very vague term. When the OS loads your program, the EXE is mapped in your address space, but only the code pages that run (or data pages that are referenced) are paged in as necessary.
You will likely get horrible performance if pages need to constantly be swapped as the program runs (aka many hard page faults), but it should work.
If your commit charge is near the commit limit, and the commit limit has no room to grow, you will also likely receive many malloc()/VirtualAlloc(..., MEM_COMMIT)/HeapAlloc()/{Local|Global}Alloc() failures, so you need to watch the return codes in your program.
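Watching the return codes can be as simple as routing allocations through a wrapper that checks for failure at the call site. The sketch below covers only malloc(); the alloc_or_report name and its message format are hypothetical, and the Windows-specific calls mentioned above would need their own checks.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n bytes and report a failure immediately,
   instead of letting a NULL pointer cause a crash somewhere else later. */
void *alloc_or_report(size_t n, const char *what)
{
    assert(what != NULL);

    void *p = malloc(n);
    if (p == NULL && n != 0) {
        /* The caller still has to handle the NULL return. */
        fprintf(stderr, "allocation of %zu bytes for %s failed\n", n, what);
    }
    return p;
}
```

The caller checks the result before use, for example by skipping an optional cache or aborting the current operation cleanly when NULL comes back.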
Some keywords for search engines are: paging, swapping, virtual memory.
Wikipedia has an article called Paging (the Swap space entry redirects to it).
Operating systems generally use virtual memory. Virtual memory pages are mapped to physical memory when they are used. If a physical page is needed and none is available, another page is written to disk. This is called swapping, and it explains why crowded systems get slow and why memory upgrades improve performance.
