CUDA bounds checker? - memory

Is there a tool equivalent to a bounds checker or purify or valgrind for CUDA?
I'm basically looking for something that might tell me if I'm reading or writing outside of allocated memory.

NVIDIA has released CUDA memcheck, which does exactly this. It is available in the 3.0 beta toolkit; you need to be a registered developer to download it. In addition, NVIDIA has released Nexus, its debugger/profiler for Visual Studio 2008 (Vista/7/2008), which includes memory checking (see the features list).

If you compile in emulation mode, you can use Valgrind itself to detect memory access problems in your kernels.

Related

can we run digits or caffe on Mac without GPU?

I have seen Caffe installation instructions for Mac, but I have a question. If my Mac does not have a GPU, is there no way for me to use GPU mode, so that I have to run CPU-only?
Or could I use a (virtual!) GPU through the NVIDIA web driver?
Also, can I run DIGITS on my Mac? When I try to download it, there is no Mac option; it is offered only for Ubuntu.
I am very confused by these questions. Could you please clarify them for me?
The difference in architecture between CPU and GPU does not allow code written for one to be simply transformed for the other. GPU drivers are written specifically for the GPU architecture and cannot easily be virtualized. On the other hand, some software supports both. This includes OpenGL instructions and Caffe (http://caffe.berkeleyvision.org/). NVIDIA DIGITS is based on Caffe and can therefore work without a dedicated GPU (here is the thread on how to install it on Macs: https://github.com/NVIDIA/DIGITS/issues/88).
According to https://www.github.com/NVIDIA/DIGITS/issues/251, CUDA cannot be run on computers that do not have a dedicated NVIDIA GPU, but according to "How to run my CUDA application on ATI or Intel card in software mode?" there is a program, gpuocelot, that takes CUDA instructions and can run them on NVIDIA GPUs, AMD GPUs and x86.
In volunteer scientific computing, projects write separate programs for different devices; e.g. Einstein@Home has four separate programs to find gravitational waves: CPU, NVIDIA GPU (CUDA), AMD GPU and ARM.
To make DIGITS work, you need to build Caffe with CPU_ONLY and tell DIGITS not to use any GPUs by running digits-devserver with the --config flag (https://github.com/NVIDIA/caffe/blob/v0.13.2/Makefile.config.example#L9-L10, https://github.com/NVIDIA/DIGITS/issues/251).
Another possibility:
You can still use the --config flag with the web installer. Try this:
./runme.sh --config. Choose "N" to select none.
A further possibility:
I am trying to answer how you can choose between the CPU and GPUs. Within the caffe folder there is a Makefile.config.example file. Copy the contents of this file into a new file named "Makefile.config". If you want to use the CPU, then
1. comment out "USE_CUDNN := 1" within the "Makefile.config" file,
2. uncomment "CPU_ONLY := 1",
3. issue the make all command again within the caffe folder.
And if nothing else helps, you can run the procedure twice; that helped someone at the end of the thread.

How to enable CUDA 5.0 in opencv v2.4.4 and VC10 without CMake and solve error 'missing cudart32_42_9.dll'?

This is my first post, so please accept my apologies if I am unclear or fail to abide completely by the posting rules. In any case, I have searched far and wide in preparation for my own question.
Working with:
Windows 7 Enterprise version 6.1.7600
Intel Xeon CPU Quadcore 3.07GHz
NVidia Quadro 4000 GPU
CUDA v5.0 Toolkit for Windows x64 build
OpenCV v2.4.4
OpenCV Cuda Package belonging to opencv v2.4.4
Microsoft Visual Studios C++ 2010 Express ('vc10')
(!) Without CMake (!)
Steps, tutorials and checks I've done:
I have installed and configured the software I required for OpenCV 2.4.4 with vc10, following the opencv.org tutorial on building OpenCV in vc10 (applying the global method described there and placing the GPU-related DLLs on top), but I have not installed CMake and never had any need for it until I attempted to move calculations to the GPU.
I've furthermore copied all the .dll files I use in this vc10 solution into the 'Debug' folder (located in the same folder as the .sln file of this solution).
Lastly, I've followed the NVIDIA Developer Zone CUDA 5.0 Getting Started Guide up to the 'Verify Installation' paragraph, with a successful outcome, and also configured the build configurations to include CUDA compilation following the 'build customization for existing projects' instructions.
This question is about trying to speed up a Win32 console .cpp program that I built in debug mode by making it run on the GPU. (It is a Visual Studio solution using the Win32 OpenCV library: a rather simple image processing project, but with a large-kernel blur that takes a lot of time.) However, I am having trouble running OpenCV with CUDA 5.0 (even though the readme.txt of the OpenCV CUDA package tells me to download and install CUDA 5.0).
Upon compiling and running in vc10 (i.e. hitting F5, with the Win32 platform), or likewise upon running the corresponding .exe executable, I get a system error saying that "The program could not be started because cudart32_42_9.dll is missing on my computer".
Apparently, even though OpenCV's readme tells me to use CUDA 5.0, it is still looking for the CUDA libraries belonging to the 32-bit CUDA 4.2 toolkit (cudart32_42_9.dll), and obviously not finding them because they are not installed.
In this question it is mentioned that OpenCV v2.4.4 simply hasn't been compiled with cuda 5.0 and the only way to make this run is to compile my own libraries using CMake.
My Question:
I am wondering if in the meantime allowing OpenCV v2.4.4 to run using x64 cuda 5.0 has become possible but WITHOUT having to compile my own libraries using CMake.
I would kindly like to ask any of you to share with me precisely what steps to take. Please write your solution in detail, as this is only my third week of using the C++ language, compilers, libraries, DLLs and the like.
Many thanks in advance!
EDIT
This question has actually now (due to @talonmies's comment) become much more like a question asked by
user 'duttasankha' titled 'OpenCV with cuda MS Visual Studio 2008', and
user 'zebullon' titled 'Do I need a 64 bit SDK on a 64 bit machine'.
In order to fully answer my own question:
I have been able to get CUDA 5.0 running without having to compile anything myself (e.g. without having to use CMake) or reinstalling any GPU driver software.
I followed, among others, duttasankha's and zebullon's posts (named in the EDIT to my question) and took an extra, small leap of faith.
I downloaded the 32-bit CUDA 4.2 SDK (software development kit, available on the same site as the other CUDA downloads) and installed/extracted it. This is noteworthy because I had a newer CUDA Toolkit and driver version (5.0) installed, which was 64-bit!
I looked (using the Windows search function) for where the SDK files had been extracted and found cudart32_42_9.dll in the C:...\My Documents\NVIDIA GPU Computing SDK 4.2\C\common\bin folder.
I copied all of the 32-bit DLLs in this folder (all the DLLs whose names end in '32_42_9.dll') and placed them (together with the OpenCV DLLs I mention in the summary in my question above) in the folder named 'Debug', which is located in the same folder as the .sln solution file of this project (this is the folder where Visual Studio always places the .exe executable files belonging to the project). I copied all of them because, even though the only message I got was that this one cudart DLL was missing, the GPU functions in OpenCV need all of the copied DLLs.
I had already completed the directions concerning the required Visual Studio settings (see the opencv.org tutorial on enabling Visual Studio 2010, doing so the global (not local) way; also see this guide).
But now, in the Linker; Input; Additional Dependencies field, I completed my dependencies list with the CUDA-related libraries. It looked like this:
C:\opencv\build\x86\vc10\lib\opencv_gpu244d.lib
C:\opencv\build\x86\vc10\lib\opencv_core244d.lib
C:\opencv\build\x86\vc10\lib\opencv_highgui244d.lib
C:\opencv\build\x86\vc10\lib\opencv_video244d.lib
C:\opencv\build\x86\vc10\lib\opencv_ml244d.lib
C:\opencv\build\x86\vc10\lib\opencv_legacy244d.lib
C:\opencv\build\x86\vc10\lib\opencv_imgproc244d.lib
Notice that the CUDA-related libs 'opencv_gpu244d.lib' and 'opencv_core244d.lib' are at the top of this list. (Incidentally, this core244d.lib is CUDA-related because it is the OpenCV core library that came from the OpenCV-2.4.4-CUDA-vc10.7z package I downloaded from SourceForge. Instructions for unpacking and correct placement are available in the accompanying text file in this 7z package.)
In Visual Studio, in the Project-Folder Explorer, I right-clicked on the name of my project (the vc10 solution) and chose Build-configuration. Here I placed a check next to CUDA 5.0(.targets, .props), which showed the corresponding path "$(VCTargetsPath)\BuildCustomizations\CUDA 5.0.targets".
Now, running my code no longer prompts any system errors concerning missing DLLs, and the CUDA 'Initialization and Information' functions from the opencv.org documentation also work in a new test project I made to check the overall functioning of the CUDA set-up.
Apparently, the driver and CUDA Toolkit of a newer version know how to cooperate with the DLLs of the older CUDA SDK version.
Hope someone else will save some time when they read this. If I missed details in my description of the answer, please let me know.

OpenCv 2.4.3 prebuild seems not to use TBB/IPP

I am using OpenCV 2.4.3. I just downloaded it from their site and used the build that they provide; I did not want the headache of building it from source myself. On my machine, the Haar classifier is very slow at detecting faces. On another machine, my friend runs it fine (he built from source with TBB and IPP support turned on in CMake).
However, the release notes say: "You do not need TBB anymore on MacOSX, iOS and Windows. BTW, the binary package for Windows is now built without TBB support. Libraries and DLLs for Visual Studio 2010 use the Concurrency framework."
I do not know much about TBB and IPP. The only thing I understand is that making them available enables multi-threading and parallelism, which should speed up my program.
Do I need to compile the source with CMake, TBB, IPP and so on, or is there something else that I am missing? Any ideas?
What they say is that the pre-built binaries are compiled in a way that does not need TBB, because they use another concurrency framework. So if you don't want to meddle with the library's settings, you can use the pre-built version without sacrificing performance. But that applies to Windows, iOS and MacOS.
The performance might also depend on the machine's specifications (cascades are power hungry, you know), so if your friend has a stronger machine, he will probably get better results. It may also depend on the OS you are running, but I cannot tell you which is best, as I haven't tried OpenCV on anything besides Linux.

Bootable and cross platform applications and using delphi or Pascal

Is it possible to create a bootable application (an application for the MBR) using Delphi or Pascal? I know we can't use the VCL, the RTL and other such things because they depend on the OS, but can I at least use Readln and Writeln?
If so, can we run the program under other operating systems?
I know that the PE (Windows) and ELF (Linux) formats are different, but can I at least do it with some small modifications?
It's worth saying that PE is a very different format from ELF.
It is not just a few bytes to modify: the whole layout and library access are different, and binding is totally different.
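To make the point about the two formats concrete, here is a minimal sketch in C that only tells the containers apart by their leading magic bytes. The function and enum names are hypothetical and not from the thread. It checks only the "MZ" DOS stub that a PE file begins with; a complete PE check would also follow the header offset stored at 0x3C to the "PE\0\0" signature, which this sketch does not do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration only. */
enum exe_format { EXE_UNKNOWN, EXE_MZ, EXE_ELF };

/* Classify an executable image by its first bytes.
   EXE_MZ means "starts with a DOS MZ header", which every PE file does;
   it does not prove the file is a valid PE image. */
enum exe_format detect_exe_format(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    /* ELF files start with the four bytes 0x7F 'E' 'L' 'F'.
       The literal is split so that 'E' is not read as a hex digit. */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return EXE_ELF;

    /* PE files start with a DOS stub whose first two bytes are "MZ". */
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return EXE_MZ;

    return EXE_UNKNOWN;
}
```

Everything after these first bytes (section tables, import and relocation data, how libraries are bound) is laid out differently in the two formats, which is why a small patch cannot turn one into the other.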
In order to boot a Delphi application in console mode, you can install a small DOS system (take a look at FreeDOS, for instance) and then run your Delphi application using, for instance, DWPL. DWPL lets you run native 32-bit protected-mode DOS programs with Delphi 5-7, using the WDOSX DOS extender as the core. I used this on some old hardware with a network adapter, and it worked like a charm. If you are interested in it, I could post some updated DWPL code.
For such targets, you should take a look at Free Pascal. By nature, you can customize it for whatever target you want. There are even several draft operating systems written using FPC. See for instance Toro or ClassiOS; the latter uses Delphi executables as source.
You can see the boot code of Toro from here, and a "main program" source code created with it.
But for direct booting applications, booting is not so difficult. The real problem is the hardware layer.
The BIOS gives very little access to it.
Just for the network layer, you'll have to take a look at EtherBoot sites and such to get some low-level network access... but it could be very time consuming to rewrite all those drivers by hand!
In short: all those "pure pascal" OS are only theoretical, running a console and some low-performance network (emulating a poor network adapter like NE2000 or such). So those "pascal" OS are only proof of concept. FAR away from a working solution! But very nice technological challenge, in all cases, very inspiring.
Why reinvent the wheel? If you want a light and fast system, use a custom Linux kernel.
Then use CrossKylix to compile your Delphi application (with no User Interface) into Linux, or even better Free Pascal.
You don't really place "applications" in the MBR.
The entire size of an MBR is 512 bytes, of which you can only use 446 for code.
Good luck creating something useful in that if you don't even have an OS to delegate functionality to yet. Basically all that you can do in the MBR is place code to start a boot loader.
Here's a page with disassembly of an MBR:
http://www.dewassoc.com/kbase/hard_drives/master_boot_record.htm
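The size budget above can be written down as a C struct. This is a sketch of the classic MBR sector layout (446 bytes of bootstrap code, four 16-byte partition entries, and the two-byte 0x55 0xAA boot signature); the struct and function names are hypothetical, and the partition entries are left as raw bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the partition entry is kept opaque on purpose. */
struct mbr_partition_entry {
    uint8_t raw[16];
};

struct mbr_sector {
    uint8_t bootstrap[446];                    /* all the room there is for code */
    struct mbr_partition_entry partitions[4];  /* 4 * 16 = 64 bytes */
    uint8_t signature[2];                      /* 0x55, 0xAA */
};

/* Every member is a byte array, so the compiler inserts no padding. */
static_assert(sizeof(struct mbr_sector) == 512, "an MBR is one 512-byte sector");
static_assert(offsetof(struct mbr_sector, partitions) == 446, "code area is 446 bytes");

/* Returns 1 if the 512-byte sector ends with the boot signature, else 0. */
int mbr_has_boot_signature(const uint8_t sector[512])
{
    return sector[510] == 0x55 && sector[511] == 0xAA;
}
```

The arithmetic is 446 + 64 + 2 = 512, which is why the code in an MBR can realistically do little more than locate and jump to a boot loader.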
Why must you write the boot loader?
You could use a ready-made bootloader like GRUB and chainload your PE executable from it.
Of course, this is very ancient and hairy stuff, but in the good old days people did this with PE-format executables and a DOS extender.
For something a little more this-century, why not make your own bootable ReactOS disk, and add your own PE executable written in Delphi to handle the "user shell"?
You could also (but this would require licensing) use the Windows PXE. I think that projects like BartPE probably fall on the gray side of legal, or are at least, unlicensed. Thus, a completely MS-free solution (reactos) for a completely self-contained kiosk PC, with ReactOS, might be more what you are looking for.
Can you write your own operating system? your own UI layer? your own video device drivers? I didn't think so. So use DOS and TurboPascal, or ReactOS and a PE win executable. Or you can use FreePascal and just build your app on a very lightweight portable Linux kernel and root filesystem.

can I alloc() in the main thread and free() in another?

I have a program that runs fine on MacOS and Linux and cross-compiles to Windows with mingw. Recently I made the program multi-threaded.
The current design of the program has memory allocated in the main thread and freed in the slave "worker" threads. That's not a problem on MacOS and Linux because the malloc/free system is multi-threaded.
I'm concerned about the cross-compiling, however. The version of mingw that I'm using is built from MacOS ports. It's a pretty ancient version of G++ (version 3.4.5) from 2004. I've been unsuccessful in my attempts to build a more recent version (I'd like to build a 64-bit version, but gave up). I'm getting pthreads from http://sourceware.org/pthreads-win32.
My concern is that the malloc & free system in 3.4.5 is not multi-threaded.
Questions:
Should I rewrite my program so that the blocks of memory to be freed are passed back to the main thread to be freed there?
Should I try to upgrade to a more recent mingw?
Is there any way to find these concurrency problems other than massive amounts of testing? That just doesn't feel good to me.
Thanks!
Why do you say malloc & free are not multithreaded?
mingw32 by default will link with msvcrt.dll which is a multithread dll. See [1]. There was[2] a single-threaded library provided by Microsoft, but it was only available for static linking.
PS: You mention that you are cross-compiling, but you seem instead to be compiling the Windows program in Windows. In that case, why don't you download the binaries from www.mingw.org? (It's a pain to figure out which files you need from their downloads, though.)
1- http://msdn.microsoft.com/en-us/library/abx4dbyh%28v=VS.71%29.aspx
2- See [1]. Removed in Visual Studio 2005 http://msdn.microsoft.com/en-us/library/abx4dbyh%28v=VS.80%29.aspx
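For reference, the pattern the question describes can be sketched in C with pthreads as below. This is only an illustration, assuming a thread-safe C runtime as discussed above; the struct and function names are hypothetical. The main thread allocates a buffer, the worker uses and frees it, and the main thread does not touch the buffer between pthread_create and pthread_join.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical job record handed from the main thread to the worker. */
struct job {
    char  *buffer;    /* allocated by the main thread */
    size_t len;
    int    checksum;  /* written by the worker */
};

static void *worker(void *arg)
{
    struct job *j = arg;
    int sum = 0;

    for (size_t i = 0; i < j->len; i++)
        sum += (unsigned char)j->buffer[i];
    j->checksum = sum;

    free(j->buffer);  /* freed in a different thread than the one that called malloc */
    j->buffer = NULL;
    return NULL;
}

/* Allocates len bytes in the calling thread, lets a worker consume and free
   them, and returns the worker's checksum (or -1 on failure). */
int run_job(size_t len)
{
    struct job j = { malloc(len), len, 0 };
    pthread_t thread;

    if (j.buffer == NULL)
        return -1;
    memset(j.buffer, 1, len);  /* every byte is 1, so the checksum equals len */

    if (pthread_create(&thread, NULL, worker, &j) != 0) {
        free(j.buffer);
        return -1;
    }
    pthread_join(thread, NULL);  /* the worker has finished with j after this */

    assert(j.buffer == NULL);    /* ownership really did move to the worker */
    return j.checksum;
}
```

What matters here is clear ownership: once the pointer is handed over, exactly one thread is responsible for freeing it. Whether the allocator itself tolerates cross-thread frees depends on the C runtime that the program links against.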
I would avoid this. It sounds like you're trying to dodge the main issue.
Yes, that would be a good idea in any case...
One way to detect concurrency problems related to memory allocation/deallocation is a memory leak detector. I'm not sure if valgrind works on cygwin.
