Do I need to worry about Nvidia SLI, with or without Surround, when enumerating adapters and outputs?
Assume a fictional build:
(2x) GTX 960 in SLI
(3x) 1920x1080 displays attached in landscape orientation via Surround, yielding 5760x1080
While enumerating adapters with IDXGIFactory::EnumAdapters, do the two GPUs show up as separate adapters or as one combined adapter (assuming SLI is enabled and we exclude any integrated GPUs and the Microsoft Basic Render Driver)? Same question for outputs: does IDXGIAdapter::EnumOutputs return 3 distinct outputs or just one (assuming Surround is enabled)? If it returns only one, I expect IDXGIOutput::GetDisplayModeList to include at least one display mode that is 5760x1080; prove me right or wrong. But if there are three distinct outputs, should I create one swap chain per output?
Notes:
This doesn't answer my question properly
I don't want any NvAPI-based approach, just pure Direct3D 11
The Surround Best Practices guide doesn't give much information
I currently don't have an SLI+Surround setup, so I can't answer my own question. And I have to deal with AMD Eyefinity as well
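For reference, a minimal enumeration sketch in plain DXGI/Direct3D 11 (no NvAPI) that lists every hardware adapter, its outputs, and the display modes each output reports. Running something like this on an actual SLI+Surround (or Eyefinity) rig is what would settle the adapter/output question; the format choice and the software-adapter filter are just illustrative assumptions.

#include <dxgi.h>
#include <wrl/client.h>
#include <vector>
#include <cstdio>
#pragma comment(lib, "dxgi.lib")

using Microsoft::WRL::ComPtr;

int main()
{
    ComPtr<IDXGIFactory1> factory;
    if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&factory))))
        return 1;

    ComPtr<IDXGIAdapter1> adapter;
    for (UINT a = 0; factory->EnumAdapters1(a, &adapter) != DXGI_ERROR_NOT_FOUND; ++a)
    {
        DXGI_ADAPTER_DESC1 desc{};
        adapter->GetDesc1(&desc);
        if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE)   // skip Microsoft Basic Render Driver
            continue;
        wprintf(L"Adapter %u: %s\n", a, desc.Description);

        ComPtr<IDXGIOutput> output;
        for (UINT o = 0; adapter->EnumOutputs(o, &output) != DXGI_ERROR_NOT_FOUND; ++o)
        {
            DXGI_OUTPUT_DESC od{};
            output->GetDesc(&od);
            wprintf(L"  Output %u: %s\n", o, od.DeviceName);

            // Query the mode list twice: first for the count, then for the data.
            UINT count = 0;
            output->GetDisplayModeList(DXGI_FORMAT_R8G8B8A8_UNORM, 0, &count, nullptr);
            std::vector<DXGI_MODE_DESC> modes(count);
            output->GetDisplayModeList(DXGI_FORMAT_R8G8B8A8_UNORM, 0, &count, modes.data());
            for (const DXGI_MODE_DESC& m : modes)
                wprintf(L"    %ux%u\n", m.Width, m.Height);
        }
    }
    return 0;
}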
I am a mathematician, not a programmer. I have a notion of the basics of programming and am a fairly advanced power user on both Linux and Windows.
I know some C and some Python, but not much beyond that.
I would like to make an overlay so that when I start a game it can collect information about AMD and NVIDIA GPUs, such as frame time and FPS. I am fairly certain that the benchmarks currently used to compare two GPUs are flawed: brief scenes that bump up the FPS momentarily (but are totally irrelevant in terms of user experience) inflate the average FPS number and mislead the market, intentionally or not. For example, in one game (I can't remember the name, probably a COD title) there was a highly tessellated entity on the map that wasn't even visible to the player, which made AMD GPUs seemingly underperform when roaming through that area, leading to a lower average FPS count.
I have an idea of how to calculate GPU performance in theory, but I don't know how to harvest the data from the GPU. Could you refer me to API manuals or references that would help me make such an overlay possible?
I would like to study as little as possible (by that I mean I would like to learn only what I absolutely have to learn in order to get the job done; I don't intend to become a programmer).
I thank you in advance.
This is generally what the Vulkan layer system is for: it lets you intercept API commands and inject your own. But it is nontrivial to write such a layer yourself. Here are some pre-existing open-source options for you:
To get at the timing info and draw your custom overlay, you can use (and modify) a tool like OCAT. It supports Direct3D 11, Direct3D 12, and Vulkan apps.
To just get the timing (and other interesting info) as a CSV, you can use a command-line tool like PresentMon. It works with D3D, and I have been using it with Vulkan apps too; it seems to accept them.
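Since the complaint above is that average FPS hides momentary spikes, percentile frame times are usually the more honest metric. Here is a minimal sketch that reads a PresentMon CSV and prints the mean and 99th-percentile frame time; it assumes the frame-time column is named MsBetweenPresents, which matches the PresentMon versions I have used, but check the header of your own CSV.

#include <algorithm>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main(int argc, char** argv)
{
    if (argc < 2) { std::cerr << "usage: frametimes <presentmon.csv>\n"; return 1; }
    std::ifstream in(argv[1]);
    std::string line;

    // Locate the "MsBetweenPresents" column in the CSV header.
    std::getline(in, line);
    std::vector<std::string> headers;
    std::stringstream hs(line);
    for (std::string cell; std::getline(hs, cell, ','); )
        headers.push_back(cell);
    auto it = std::find(headers.begin(), headers.end(), "MsBetweenPresents");
    if (it == headers.end()) { std::cerr << "column not found\n"; return 1; }
    size_t col = it - headers.begin();

    // Collect per-frame times in milliseconds.
    std::vector<double> ms;
    while (std::getline(in, line))
    {
        std::stringstream ss(line);
        std::string cell;
        for (size_t i = 0; std::getline(ss, cell, ','); ++i)
            if (i == col) { ms.push_back(std::stod(cell)); break; }
    }
    if (ms.empty()) { std::cerr << "no frames\n"; return 1; }

    double sum = 0;
    for (double v : ms) sum += v;
    std::sort(ms.begin(), ms.end());
    double p99 = ms[static_cast<size_t>(0.99 * (ms.size() - 1))];

    std::cout << "frames: " << ms.size()
              << "  avg: " << sum / ms.size() << " ms"
              << "  99th percentile: " << p99 << " ms\n";
    return 0;
}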
I am writing a Vulkan-based renderer. Currently I am trying to add MSAA for the color attachment.
I was pretty sure I could use VK_SAMPLE_COUNT_16_BIT, but limits.framebufferColorSampleCounts returns bit flags that only allow MSAA levels up to VK_SAMPLE_COUNT_8_BIT (inclusive).
I run on a brand-new NVIDIA Quadro RTX 3000 card with the latest NVIDIA driver, 441.28.
I checked the limits in OpenGL, and GPU Caps Viewer shows
GL_MAX_FRAMEBUFFER_SAMPLES = 32
How does that make sense? Is the limit dictated by the Vulkan API only? And if the hardware doesn't support more than 8x, does that mean the OpenGL driver simulates it on the CPU, e.g. via something like supersampling? That's what I was told by several rendering developers on the khronosdev Slack. Does that make sense? Doesn't a vendor have to comply with the standard and either implement MSAA the right way or not implement it at all?
Is it possible that OpenGL doesn't "really" support more than 8x MSAA, but the drivers simulate it via something like supersampling?
UPDATE
This page explains the whole state of MSAA implementation in OpenGL, and it actually becomes clear from it why Vulkan doesn't provide more than 8x samples on my card. Here is the punch line:
Some NVIDIA drivers support multisample modes which are internally implemented as a combination of multisampling and automatic supersampling in order to obtain a higher level of anti-aliasing than can be directly supported by hardware.
framebufferColorSampleCounts is a bitmask of flags, not a count. See this enum for the values: https://www.khronos.org/registry/vulkan/specs/1.1-extensions/man/html/VkSampleCountFlagBits.html
A value of 15 means VK_SAMPLE_COUNT_1_BIT, VK_SAMPLE_COUNT_2_BIT, VK_SAMPLE_COUNT_4_BIT, and VK_SAMPLE_COUNT_8_BIT are all supported (1 + 2 + 4 + 8 = 15).
This answers why you get 15 rather than a power of two, but it still raises the question of why the NVIDIA Vulkan driver limits you more than the OpenGL driver does. Perhaps a question for the NVIDIA forums. You should double-check that your driver is up to date and that you're actually picking your NVIDIA card and not an integrated one.
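For completeness, a minimal sketch of the usual way to interpret these flags and pick the highest sample count supported by both the color and depth framebuffer attachments (physicalDevice is assumed to have been selected already):

#include <vulkan/vulkan.h>

// Returns the highest sample count supported by both color and depth
// framebuffer attachments on the given physical device.
VkSampleCountFlagBits GetMaxUsableSampleCount(VkPhysicalDevice physicalDevice)
{
    VkPhysicalDeviceProperties props{};
    vkGetPhysicalDeviceProperties(physicalDevice, &props);

    VkSampleCountFlags counts = props.limits.framebufferColorSampleCounts &
                                props.limits.framebufferDepthSampleCounts;

    if (counts & VK_SAMPLE_COUNT_64_BIT) return VK_SAMPLE_COUNT_64_BIT;
    if (counts & VK_SAMPLE_COUNT_32_BIT) return VK_SAMPLE_COUNT_32_BIT;
    if (counts & VK_SAMPLE_COUNT_16_BIT) return VK_SAMPLE_COUNT_16_BIT;
    if (counts & VK_SAMPLE_COUNT_8_BIT)  return VK_SAMPLE_COUNT_8_BIT;
    if (counts & VK_SAMPLE_COUNT_4_BIT)  return VK_SAMPLE_COUNT_4_BIT;
    if (counts & VK_SAMPLE_COUNT_2_BIT)  return VK_SAMPLE_COUNT_2_BIT;
    return VK_SAMPLE_COUNT_1_BIT;
}

On the card in question this would return VK_SAMPLE_COUNT_8_BIT, consistent with the reported value of 15.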
I've also come across a similar problem (not Vulkan, but OpenGL, also on NVIDIA): on my NVIDIA GeForce GTX 750 Ti, the Linux nvidia driver reports GL_MAX_SAMPLES=32, but anything higher than 8 samples results in ugly blurring of everything, including e.g. text, even with glDisable(GL_MULTISAMPLE) for all rendering.
I remember seeing the same blurring problems when I enabled FXAA globally (via nvidia-settings --assign=fxaa=1) and ran KWin (KDE's compositing window manager) with this setting on. So I suspect this behavior with samples>=9 is because the driver enables FXAA in addition to (or instead of) MSAA.
I use OpenCL for image processing. For example, I have a 1000*800 image.
I use a 2D global size of 1000*800, and the local work size is 10*8.
In that case, will the GPU automatically create 100*100 work groups?
And do these 10000 work groups run at the same time, so that the work is parallel?
If the hardware doesn't have 10000 units, will one unit do the same thing more than once?
I tested the local size and found that a very small size (1*1) or a big size (100*80) is very slow, but a middle value (10*8) is faster. So, last question: why?
Thanks!
Work group sizes can be a tricky concept to grasp.
If you are just getting started and you don't need to share information between work items, ignore local work size and leave it NULL. The runtime will pick one itself.
Hardcoding a local work size of 10*8 is wasteful and won't utilize the hardware well. Some hardware, for example, prefers work group sizes that are multiples of 32.
OpenCL doesn't specify what order the work will be done in, just that it will be done. It might do one work group at a time, or it may do them in groups, or (for small global sizes) all of them together. You don't know and you can't control it.
To your question "why?": the hardware may run work groups in SIMD (single instruction multiple data) and/or in "Wavefronts" (AMD) or "Warps" (NVIDIA). Too small of a work group size won't leverage the hardware well. Too large and your registers may spill to global memory (slow). "Just right" will run fastest, but it is hard to pick this without benchmarking. So for now, leave it NULL and let the runtime pick for you. Later, when you become an OpenCL expert and understand more about how the hardware works, you can try specifying the work group size. However, be aware that the optimal size may be different for different hardware, and there are other rules (like global size must be a multiple of local size).
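To make that concrete, a minimal host-side sketch (assuming queue, kernel, and device were already created by the usual boilerplate for the 1000*800 image) that leaves the local size to the runtime and, for later tuning, queries the device's preferred work-group size multiple:

#include <CL/cl.h>
#include <stdio.h>

void run_image_kernel(cl_command_queue queue, cl_kernel kernel, cl_device_id device)
{
    size_t global[2] = {1000, 800};   // one work item per pixel

    // Pass NULL for the local size and let the runtime choose it.
    cl_int err = clEnqueueNDRangeKernel(queue, kernel, 2, NULL,
                                        global, NULL /* local */, 0, NULL, NULL);
    if (err != CL_SUCCESS)
        printf("clEnqueueNDRangeKernel failed: %d\n", err);

    // For tuning later: the device's preferred work-group size multiple
    // (typically the warp/wavefront width, e.g. 32 or 64).
    size_t preferred = 0;
    clGetKernelWorkGroupInfo(kernel, device,
                             CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
                             sizeof(preferred), &preferred, NULL);
    printf("preferred work-group size multiple: %zu\n", preferred);
}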
I saw the High-Performance Graphics presentation "High-Performance Software Rasterization on GPUs" and was very impressed by the work/analysis/comparison.
http://www.highperformancegraphics.org/previous/www_2011/media/Papers/HPG2011_Papers_Laine.pdf
http://research.nvidia.com/sites/default/files/publications/laine2011hpg_paper.pdf
My background is CUDA; I started learning OpenGL two years ago to develop the 3D interface of EMM-Check, a field-of-view analysis program that checks whether a vehicle fulfills a specific standard. Essentially, you load a vehicle (or individual parts), then you can move it as a whole or piece by piece, add mirrors/cameras, analyze the driver's point of view and the shadows seen from it, etc.
We are dealing with some transparent elements (mainly the fields of view, but the vehicles themselves might be transparent too), so I wrote a rough algorithm to sort the elements to be rendered on the fly (at the primitive level, a kind of Painter's algorithm), but of course there are cases in which it easily fails, although for most cases it is enough.
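(For illustration, a minimal sketch of the kind of per-primitive back-to-front sort described above; Triangle and viewDistance are hypothetical placeholders for the real renderer's data. A single depth per triangle is exactly why it fails for intersecting or cyclically overlapping primitives.)

#include <algorithm>
#include <vector>

// Hypothetical minimal types; the real renderer's structures would be used instead.
struct Vec3 { float x, y, z; };
struct Triangle { Vec3 v0, v1, v2; };

// Sort transparent triangles back-to-front by the distance of their centroid
// from the camera (a primitive-level Painter's algorithm). viewDistance() is a
// placeholder for "distance from the camera along the view direction".
void SortBackToFront(std::vector<Triangle>& tris, float (*viewDistance)(const Vec3&))
{
    auto centroidDepth = [&](const Triangle& t) {
        Vec3 c{ (t.v0.x + t.v1.x + t.v2.x) / 3.0f,
                (t.v0.y + t.v1.y + t.v2.y) / 3.0f,
                (t.v0.z + t.v1.z + t.v2.z) / 3.0f };
        return viewDistance(c);
    };
    std::sort(tris.begin(), tris.end(),
              [&](const Triangle& a, const Triangle& b) {
                  return centroidDepth(a) > centroidDepth(b);   // farthest first
              });
}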
For this reason I started googling and found many techniques, like (dual) depth peeling, A/R/K/F-buffers, etc.
But it looks like all of them suffer at high resolutions and/or with large numbers of triangles.
Since we also deal with millions of triangles (up to 10 million, more or less), I was looking for something else and ended up at software renderers: compared to the hardware ones, they offer full programmability, but they are slower.
So I wonder if it might be possible to implement something hybrid, that is, using the hardware renderer for the opaque elements and the software one (CUDA/OpenCL) for the transparent elements, and then combining the two results.
Or maybe a simple ray-tracing algorithm in CUDA/OpenCL (no complex visual effects required, just position, color, simple lighting, and proper transparency) might be much simpler from this point of view and also give us a lot of freedom/flexibility in the future?
I did not find anything on the net regarding this... is there maybe some particular obstacle?
I would like to hear every single thought/tip/idea/suggestion that you have regarding this.
PS: I also found "Single Pass Depth Peeling via CUDA Rasterizer" by Liu, but the solution from the first paper seems far faster.
http://webstaff.itn.liu.se/~jonun/web/teaching/2009-TNCG13/Siggraph09/content/talks/062-liu.pdf
I might suggest that you look at OpenRL, which will give you hardware-accelerated ray tracing.
I am importing source code for stereo vision. The author's code below works; it takes two camera sources. I currently have two different cameras and I receive images from each of them individually, but the code crashes at capture2. The interesting part is that if I change the order of the webcams (unplugging them and inverting the order), the first camera becomes the second one. Why doesn't it work? I also tested with Windows XP SP3 and Windows 7 x64; same problem.
//---------Starting WebCam----------
capture1 = cvCaptureFromCAM(1);
assert(capture1 != NULL);
cvWaitKey(100);
capture2 = cvCaptureFromCAM(2);
assert(capture2 != NULL);
Also, if I use -1 for the parameter, it just gives me the first camera (all the time).
Is there any other method to capture two cameras using cvCaptureFromCAM?
Firstly, the cameras are generally numbered from 0 - is that simply the problem?
Secondly, DirectShow with multiple USB webcams is notoriously bad on Windows. Sometimes it will work with two identical cameras, sometimes only if they are different.
You can also try a delay between initialising the cameras; sometimes one will lock the capture stream until it is sending data, preventing the other from being detected.
Often the drivers assume they are the only camera and make incorrect calls that lock up the entire capture graph. This isn't helped by it being extremely complicated to write correct drivers and DirectShow filters on Windows.
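A minimal sketch combining both suggestions (0-based indices plus a delay before opening the second camera), using the same legacy OpenCV C API as the question; untested with your particular cameras:

#include <assert.h>
#include <opencv/highgui.h>

int main()
{
    // Camera indices start at 0, not 1.
    CvCapture* capture1 = cvCaptureFromCAM(0);
    assert(capture1 != NULL);

    // Give the first camera time to start streaming before opening the second;
    // some drivers lock the capture graph until they are delivering data.
    cvWaitKey(500);

    CvCapture* capture2 = cvCaptureFromCAM(1);
    assert(capture2 != NULL);

    // ... grab frames with cvQueryFrame(capture1) / cvQueryFrame(capture2) ...

    cvReleaseCapture(&capture1);
    cvReleaseCapture(&capture2);
    return 0;
}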
Some motherboards cannot work with some USB 2.0 cameras; one USB 2.0 camera can take 40-60% of a USB controller's bandwidth. The solution is to connect the second USB 2.0 camera through a PCI-to-USB controller.
Get two PS3 Eyes, around EUR 10 each, and the free codelaboratories.com SDK; this gets you support for up to 2 cameras using C, C#, Java, and AS3, including examples etc. You also get fixed frame rates of up to 75 fps at 640*480. Their free driver-only version 5.1.1.0177 provides a decent DirectShow component, but for a single camera only.
Comment for the rest: multi-camera DirectShow drivers should be the default for any manufacturer; not providing them is a direct failure to implement the very basic purpose and feature of USB as an interface. It is also very easy to implement, compared to implementing the driver itself for a particular sensor/chipset.
Alternatives that are confirmed to work in identical pairs (via DirectShow):
Microsoft Lifecam HD Cinema (use general UVC driver if you can, less limited fps)
Logitech Webcam Pro 9000 (not to be confused with QuickCam Pro 9000, which DOES NOT work)
Creative VF0220
Creative VF0330
Canyon WCAMN-1N
If you're serious about your work, get a pair of machine vision cameras for real performance. Cheapest on the market, with German engineering quality: CCD, CMOS, mono, colour, GigE (Ethernet), USB, FireWire, and an excellent range of dedicated drivers:
http://www.theimagingsource.com