I'm developing a video wall system that uses the NVIDIA CUDA Decoder library for decoding and Direct3D 9 (D3D9) for rendering, so we assume that dozens of monitors can be installed in the system.
(System: Intel Core i7 processor, 4x NVIDIA GTX 780, Windows 8.)
But the IDirect3D9::GetAdapterCount API returns at most 12, even if more than 12 monitors are installed in the system. That is, if there are 11 monitors in the system, the API returns 11, and if there are 12 monitors, it returns 12. But if 13 monitors are installed, the API returns 12, not 13.
So, in that case, we cannot identify the adapter IDs of the extra monitors for rendering.
As far as I know, Windows supports up to 64 monitors, so I don't think this is an OS limitation.
I wonder if it is a limitation of D3D9. If you have any knowledge about this, please reply.
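For reference, here is roughly how we enumerate adapters (a minimal sketch, not our full code; the loop simply stops at whatever GetAdapterCount reports):

#include <d3d9.h>
#include <cstdio>

int main() {
    // Enumerate display adapters the way the video wall does; on the
    // 4x GTX 780 machine described above this never reports more than 12.
    IDirect3D9 *d3d = Direct3DCreate9(D3D_SDK_VERSION);
    if (!d3d) return 1;

    UINT count = d3d->GetAdapterCount();
    std::printf("GetAdapterCount() = %u\n", count);

    for (UINT i = 0; i < count; ++i) {
        D3DADAPTER_IDENTIFIER9 id;
        if (SUCCEEDED(d3d->GetAdapterIdentifier(i, 0, &id)))
            std::printf("adapter %u: %s on %s\n", i, id.Description, id.DeviceName);
    }

    d3d->Release();
    return 0;
}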
Thank you.
In our tests, it seems that memUsageLimit is fixed as a function of installed RAM and platform. For smartphones (mostly ARM processors) the limits are:
185 MB for 512 MB RAM device
390 MB for 1GB RAM device
900 MB for 2GB RAM device
For regular Windows + Intel platforms, we found the limit to be about 20% more than the physically available RAM, perhaps due to the ability to page to disk.
My question is about the first group of devices (phones): is it possible to change the memory limit for a given application? We need to process a JSON document received via OData V4, and when processing it with Newtonsoft, memory consumption is very significant: for every MB of raw JSON data, the app's process grows by about 9 MB, in a very linear fashion.
Windows 10 build 10586 does provide a new API, TrySetAppMemoryUsageLimit, to set the app's memory limit. However, based on internal discussion, this API only works for very limited scenarios right now, such as a VoIP application on a mobile device, and the sample code and documentation for this API are not quite ready yet.
I have tested this API on the UWP VoIP sample and it does work (note that the sample project's target must be set to 10586). The code looks like this:
// Read the current limit, then request a slightly higher one.
ulong limit = MemoryManager.AppMemoryUsageLimit;
bool result = MemoryManager.TrySetAppMemoryUsageLimit(limit + 10000);
As for your requirement, we will keep collecting feedback about this feature, and if there is strong demand we will communicate it to the internal team. However, my personal suggestion is this: the Windows Store has a very strict safety policy for apps, and it is really not recommended for an app to exceed its memory limit.
When executing a 32-bit program on a 64-bit CPU, can we assume that the underlying hardware will use 32-bit values in memory and in the CPU registers to store an integer?
Given these setups:
Intel i5 + Windows 32 bit + 32 bit application
Intel i5 + Windows 64 bit + 32 bit application
Please explain the drawbacks of each setup. Would using the second one be less efficient because the CPU does extra work to ignore half of the register's value?
When an x86_64 processor executes 32-bit code it is effectively running in i686 mode. This is all achieved in hardware, so there is no performance penalty.
A user-space 32-bit program will be executed in "32-bit mode" whether the operating system is 32-bit or 64-bit, so the behaviour should be identical.
The only potential drawback is that the program might need to dynamically link against 32-bit libraries which are not installed by default on the 64-bit OS. That scenario is more common on Linux, where programs much more commonly link against external libraries dynamically. The only example I can think of on Windows is the Visual C++ runtime: if the program needs it, you would have to install the 32-bit version (possibly alongside the 64-bit version, if another program needed that). So, in short, you might have to install more stuff on a 64-bit setup.
Also, the OS itself might consume more memory on a 64-bit setup. However, one drawback of a 32-bit system is that your total system memory is limited to 4 GB (practically about 3.5 GB, because some of the address space is mapped to hardware).
64-bit Knowledge Base: How much memory can an application access in Win32 and Win64?
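To make the earlier point about integer width concrete, here is a minimal C++ check (a sketch; built as 32-bit and as 64-bit on Windows it prints the same int size, while the pointer size differs):

#include <cstdio>

int main() {
    // On Windows (LLP64 model) int and long stay 4 bytes in both 32-bit
    // and 64-bit builds; only pointers (and size_t) grow to 8 bytes.
    std::printf("sizeof(int)   = %zu\n", sizeof(int));
    std::printf("sizeof(long)  = %zu\n", sizeof(long));
    std::printf("sizeof(void*) = %zu\n", sizeof(void*));
    return 0;
}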
I would like to know the difference between mach_vm_allocate and vm_allocate. I know mach_vm_allocate is only available in OS X and not iOS, but I'm not sure why. On iOS, the file that holds the function prototypes for the mach_vm_... functions (mach/mach_vm.h) contains only #error mach_vm.h unsupported.
The new Mach VM API was introduced in Mac OS X 10.4. The new API is essentially the same as the old API from the programmer's standpoint, with the following key differences:
- Routine names have the mach_ prefix; for example, vm_allocate() becomes mach_vm_allocate().
- Data types used in routines have been updated to support both 64-bit and 32-bit tasks. Consequently, the new API can be used with any task.
The new and old APIs are exported by different MIG subsystems: mach_vm and vm_map, respectively. The corresponding header files are <mach/mach_vm.h> and <mach/vm_map.h>, respectively.
The new Mach VM API was introduced in Mac OS X 10.4. The new API is essentially the same as the old API from the programmer's standpoint, with the following key differences:
- Routine names have the mach_ prefix; for example, vm_allocate() becomes mach_vm_allocate().
- Data types used in routines have been updated to support both 64-bit and 32-bit tasks. Consequently, the new API can be used with any task.
The new and old APIs are exported by different MIG subsystems: mach_vm and vm_map, respectively. The corresponding header files are <mach/mach_vm.h> and <mach/vm_map.h>, respectively.
This information is derived from the book Mac OS X Internals: A Systems Approach.
Those would be /usr/include/mach/mach_vm.h and /usr/include/mach/vm_map.h. You can grep in /usr/include/mach to get those.
In the kernel, the APIs are basically using the same implementation. On iOS, you can use the "old" APIs perfectly well.
I'm not happy with the other answers. Two of them seem to quote from the same source (one doesn't attribute it correctly), and this source is rather misleading.
The data types in the old API are variable size: if you compile for the i386 CPU architecture, they are 32-bit; if you compile for the x86_64 CPU architecture, they are 64-bit.
The data types in the new API are fixed size and always 64-bit.
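A quick way to see this difference is to print the sizes of the two address types (a minimal macOS sketch; as noted above, <mach/mach_vm.h> is unavailable on iOS):

#include <mach/mach.h>
#include <mach/mach_vm.h>
#include <cstdio>

int main() {
    // vm_address_t follows the task's pointer size (4 bytes when built
    // for i386, 8 bytes for x86_64); mach_vm_address_t is always 8 bytes.
    std::printf("sizeof(vm_address_t)      = %zu\n", sizeof(vm_address_t));
    std::printf("sizeof(mach_vm_address_t) = %zu\n", sizeof(mach_vm_address_t));
    return 0;
}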
As a 32-bit process only requires 32-bit data types and a 64-bit process already has 64-bit types even with the old API, it isn't obvious why the new API was required at all. But it becomes obvious if you look at Apple's kernel design:
The real kernel of macOS is a Mach 3 microkernel. Being a microkernel, it only takes care of process and thread management, IPC, virtual memory management, and process/thread scheduling. That's it. Usually everything else would be done in userspace when using a microkernel, but that is slow, and thus Apple took a different approach.
Apple took the FreeBSD monolithic kernel and wrapped it around the microkernel, which executes the FreeBSD kernel as a single process (a task) with special privileges (despite being a task, its code runs within kernel space, not userspace like all the other tasks in the system). The combined kernel has the name XNU.
And in the past, Apple supported any combination of 32 and 64 bits within a system: you could run a 32-bit kernel with 32- or 64-bit processes on it, or a 64-bit kernel with 32- or 64-bit processes on it. You might already see where this is going.
If the process and the kernel are of different bit width, the old API is of no use. For example, a 32-bit kernel cannot use it to interact with the memory of a 64-bit process, because all data types of the API calls would only be 32 bits wide in the 32-bit kernel task, while a 64-bit process has a 64-bit memory space even if the kernel itself does not.
Actually there is even a 3rd version of that API. To quote a comment from Apple's kernel source:
* There are three implementations of the "XXX_allocate" functionality in
* the kernel: mach_vm_allocate (for any task on the platform), vm_allocate
* (for a task with the same address space size, especially the current task),
* and vm32_vm_allocate (for the specific case of a 32-bit task). vm_allocate
* in the kernel should only be used on the kernel_task. vm32_vm_allocate only
* makes sense on platforms where a user task can either be 32 or 64, or the kernel
* task can be 32 or 64. mach_vm_allocate makes sense everywhere, and is preferred
* for new code.
As long as you are only using that API in userspace and only for memory management of your own process, using the old API is still fine, even for a 64-bit process, as all data types will be 64-bit in that case.
The new API is only required when working across process boundaries, and only if it's not certain that both processes have the same bit width. However, Apple dropped 32-bit kernel support long ago, and meanwhile has also dropped 32-bit userspace support: all system libraries ship only as 64-bit libraries starting with 10.15 (Catalina).
On iOS the new API was never required, as you could never write kernel code for iOS and the security concept of iOS forbids direct interaction with other processes' address spaces. On iOS you can only use that API on your own process space, and for that it will always have the correct data types.
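For completeness, here is a minimal sketch of using the new API on your own task (macOS; as described above, the old vm_allocate(mach_task_self(), ...) call works the same way within one process):

#include <mach/mach.h>
#include <mach/mach_vm.h>
#include <cstdio>

int main() {
    // Allocate one page in our own task with the "new" API.
    mach_vm_address_t addr = 0;
    kern_return_t kr = mach_vm_allocate(mach_task_self(), &addr,
                                        vm_page_size, VM_FLAGS_ANYWHERE);
    if (kr != KERN_SUCCESS) {
        std::fprintf(stderr, "mach_vm_allocate: %s\n", mach_error_string(kr));
        return 1;
    }
    std::printf("allocated one page at 0x%llx\n", (unsigned long long)addr);

    // Release it again; addr and size are always 64-bit here.
    mach_vm_deallocate(mach_task_self(), addr, vm_page_size);
    return 0;
}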
In recent years, we have used CUDA for time-critical tasks in many of our 64-bit projects. A few days ago I updated the NVIDIA drivers on my development system and found a disastrous slowdown in the algorithms that use CUDA. After some digging, it became clear that many sequential calls to cudaMalloc lead to increasing latency (with each successive call):
void *p[65000];
for (int n = 0; n < 65000; n++)
    cudaMalloc(&p[n], 256);  // latency grows with each successive call
This code runs in about 4 seconds on NVIDIA drivers before version 285, but starting with driver version 285 it takes more than 8 minutes (120 times slower). Tested on a GeForce GTX 560 Ti, a GeForce GTX 460, and a Quadro FX 4600 on different x64 systems.
Well, the question is: is it a bug in the new drivers? Or is it perhaps some attempt to deal with fragmentation and improve memory management in CUDA (through more complicated allocation)? Or something else?
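Here is a sketch of how the slowdown can be measured (a hypothetical harness, not my exact code, that times the loop above in batches of 1,000 allocations):

#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>

int main() {
    // Time each batch of 1000 cudaMalloc calls; on affected drivers the
    // per-batch cost grows steadily as the allocation count rises.
    static void *p[65000];
    auto t0 = std::chrono::steady_clock::now();
    for (int n = 0; n < 65000; n++) {
        cudaMalloc(&p[n], 256);
        if ((n + 1) % 1000 == 0) {
            auto t1 = std::chrono::steady_clock::now();
            auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
            std::printf("allocs %5d: %lld ms\n", n + 1, (long long)ms);
            t0 = t1;
        }
    }
    for (int n = 0; n < 65000; n++)
        cudaFree(p[n]);
    return 0;
}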
UPDATE:
I have reported this issue to NVIDIA and was told that they were able to reproduce it and have assigned it for investigation.
I tracked this down based on the OP's bug report. It turns out it was a known issue that had already been reported, and it is fixed in CUDA 5.0. If you download the CUDA 5.0 Release Candidate or later (available to registered CUDA developers), you should see an improvement.
edit: The fix will be in the CUDA 5 RC, not in the preview. So as of this edit (May 31, 2012), the fix is not yet available.
Jeff covered this a while back on his blog in terms of 32 bit Vista.
Does the same 32 bit 4 GB memory cap that applies in 32 bit Vista apply to 32 bit Ubuntu? Are there any 32 bit operating systems that have creatively solved this problem?
Ubuntu Server has PAE enabled in the kernel; the desktop version does not have this feature enabled by default.
This explains, by the way, why Ubuntu Server does not work in some hardware emulators whereas the desktop edition does.
Well, with Windows there's something called PAE, which means you can access up to 64 GB of memory on a Windows machine. The downside is that most apps don't actually support using more than 4 GB of RAM. Only a small number of apps, like SQL Server, are programmed to take advantage of the extra memory.
Yes, 32 bit ubuntu has the same memory limitations.
There are exceptions to the 4 GB limitation, but they are application specific... As in, Microsoft SQL Server can use 16 gigabytes with "Physical Address Extensions" [PAE] configured and supported and... ugh
http://forums.microsoft.com/TechNet/ShowPost.aspx?PostID=3703755&SiteID=17
Also, drivers in Ubuntu and Windows both reduce the amount of memory available within the 4 GB address space by mapping device memory into it. Graphics cards are particularly bad about this: your 256 MB graphics card uses up at least 256 MB of your address space...
If you can [your drivers support it, and your CPU is new enough], install a 64-bit OS. Your 32-bit applications and games will run fine.
There seems to be some confusion around PAE. PAE is "Physical Address Extension", and is by no means a Windows feature. It is a hack Intel put in their Pentium II (and newer) chips to allow machines to access up to 64 GB of memory. On Windows, applications need to support PAE explicitly, but in the open source world packages can be compiled and optimized to your liking. The packages that could use more than 4 GB of memory on Ubuntu (and other Linux distros) are compiled with PAE support. This includes all server-specific software.
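To check whether your CPU advertises PAE at all, you can look for the "pae" flag in /proc/cpuinfo on Linux; a small illustrative sketch:

#include <fstream>
#include <iostream>
#include <string>

int main() {
    // The "flags" line of /proc/cpuinfo lists "pae" on CPUs that
    // support Physical Address Extension.
    std::ifstream cpuinfo("/proc/cpuinfo");
    std::string line;
    while (std::getline(cpuinfo, line)) {
        if (line.rfind("flags", 0) == 0) {
            bool pae = line.find(" pae") != std::string::npos;
            std::cout << (pae ? "PAE supported\n" : "PAE not reported\n");
            return 0;
        }
    }
    std::cout << "no flags line found\n";
    return 1;
}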
In theory, all 32-bit OSes have that problem. You have 32 bits to do addressing.
2^32 bytes / 2^10 (bytes per KB) / 2^10 (KB per MB) / 2^10 (MB per GB) = 2^2 GB = 4 GB.
Although there are some ways around it. (Look up the jump from 16-bit computing to 32-bit computing. They hit the same problem.)
Linux supports a technology called PAE that lets you use more than 4 GB of memory; however, I don't know whether Ubuntu has it enabled by default. You may need to compile a new kernel.
Edit: Some threads on the Ubuntu forums suggest that the server kernel has PAE on by default, you could try installing that.