OpenCL can not detect my AMD GPU using OpenCV

I am using AMD Radeon R9 M375. I tried following this answer https://stackoverflow.com/a/34250412/8731839 but it didn't work for me.
I followed this: http://answers.opencv.org/question/108646/opencl-can-not-detect-my-nvidia-gpu-via-opencv/?answer=108784#post-id-108784
Here is my output from clinfo.exe
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: AMD Radeon (TM) R9 M375
Device Topology: PCI[ B#4, D#0, F#0 ]
Max compute units: 10
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1015Mhz
Address bits: 32
Max memory allocation: 3019898880
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 3221225472
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Max pipe arguments: 0
Max pipe active reservations: 0
Max pipe packet size: 0
Max global variable size: 0
Max global variable preferred total size: 0
Max read/write image args: 0
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
Coarse grain buffer: No
Fine grain buffer: No
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: No
Profiling : No
Platform ID: 00007FFF209D0188
Name: Capeverde
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.2
Driver version: 2348.3
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (2348.3)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing
cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing
cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event cl_amd_liquid_flash
Device Type: CL_DEVICE_TYPE_CPU
Vendor ID: 1002h
Board name:
Max compute units: 4
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 8
Preferred vector width double: 4
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 8
Native vector width double: 4
Max clock frequency: 2200Mhz
Address bits: 64
Max memory allocation: 2147483648
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 64
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 32768
Global memory size: 8499593216
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 2147483648
Max global variable size: 1879048192
Max global variable preferred total size: 1879048192
Max read/write image args: 64
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
Coarse grain buffer: No
Fine grain buffer: No
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 1
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 465
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: No
Profiling : No
Platform ID: 00007FFF209D0188
Name: Intel(R) Core(TM) i5-5200U CPU @ 2.20GHz
Vendor: GenuineIntel
Device OpenCL C version: OpenCL C 1.2
Driver version: 2348.3 (sse2,avx)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (2348.3)
What works:
// requires #include <opencv2/core/ocl.hpp> and <iostream>
std::vector<cv::ocl::PlatformInfo> platforms;
cv::ocl::getPlatfomsInfo(platforms); // note: OpenCV really spells this API "getPlatfomsInfo"
//OpenCL Platforms
for (size_t i = 0; i < platforms.size(); i++)
{
    //Access to Platform
    const cv::ocl::PlatformInfo* platform = &platforms[i];
    //Platform Name
    std::cout << "Platform Name: " << platform->name().c_str() << "\n";
    //Access Device within Platform
    cv::ocl::Device current_device;
    for (int j = 0; j < platform->deviceNumber(); j++)
    {
        //Access Device
        platform->getDevice(current_device, j);
        //Device Type
        int deviceType = current_device.type();
        std::cout << "Device Number: " << platform->deviceNumber() << std::endl;
        std::cout << "Device Type: " << deviceType << std::endl;
    }
}
The above code displays
Platform Name: Intel(R) OpenCL
Device Number: 2
Device Type: 2
Device Number: 2
Device Type: 4
Platform Name: AMD Accelerated Parallel Processing
Device Number: 2
Device Type: 4
Device Number: 2
Device Type: 2
How do I go about making a Context from here using AMD as my GPU? The linked post says to use the method initializeContextFromHandler, but the OpenCV documentation for it is not sufficient. Documentation Link

The issue is resolved. I don't know exactly what I did, but the AMD GPU is working now.
Current settings (On Windows):
Environment Variable:
Name: OPENCV_OPENCL_DEVICE
Value: AMD:GPU:Capeverde
Using cv::ocl::setUseOpenCL(bool), declared in ocl.hpp, to select whether the OpenCL (GPU) or plain CPU path is used.
Most likely problem: in my actual code I wasn't doing any computation, but when I wrote a simple program that subtracts two matrices, the AMD GPU started being used.
Code:
#include <cstdio>
#include <iostream>
#include <opencv2/core/ocl.hpp>
#include <opencv2/opencv.hpp>

int main() {
    cv::UMat mat1 = cv::UMat::ones(10, 10, CV_32F);
    cv::UMat mat2 = cv::UMat::zeros(10, 10, CV_32F);
    cv::UMat output(10, 10, CV_32F);
    cv::subtract(mat1, mat2, output);
    // map the UMat back to a Mat for printing
    std::cout << output.getMat(cv::ACCESS_READ) << "\n";
    std::getchar();
    return 0;
}
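For anyone checking the same setup, here is a minimal sketch of my own (assuming OpenCV 3.x/4.x with the ocl module; not part of the original post) that confirms which OpenCL device OpenCV actually bound to once OPENCV_OPENCL_DEVICE is set:

#include <iostream>
#include <opencv2/core/ocl.hpp>

int main() {
    if (!cv::ocl::haveOpenCL()) {
        std::cout << "OpenCL is not available\n";
        return 1;
    }
    cv::ocl::setUseOpenCL(true);                          // opt in to the OpenCL code paths
    cv::ocl::Device dev = cv::ocl::Device::getDefault();  // device chosen by OpenCV
    std::cout << "Selected device: " << dev.name() << "\n";
    std::cout << "Vendor:          " << dev.vendorName() << "\n";
    std::cout << "OpenCL in use:   " << (cv::ocl::useOpenCL() ? "yes" : "no") << "\n";
    return 0;
}

With OPENCV_OPENCL_DEVICE=AMD:GPU:Capeverde in the environment this should report the Capeverde device; without it, OpenCV falls back to whichever platform it enumerates first.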

Related

cv::cuda::GpuMat::create allocates much more than requested

I'm using the latest OpenCV 4.x with CUDA support + CUDA 11.6.
I want to allocate GpuMat image in device memory by doing so:
cv::cuda::GpuMat test1;
test1.create(100, 1000000, CV_8UC1);
and I measure the consumed memory before and after the create call (using the nvidia-smi tool).
Before:
| 0 N/A N/A 372354 C ...aur/example_build/example 199MiB |
After:
| 0 N/A N/A 389636 C ...aur/example_build/example 295MiB |
So roughly +100 MB, which makes sense.
But when I allocate the image this way (changed W and H):
cv::cuda::GpuMat test1;
test1.create(1000000, 100, CV_8UC1);
I see this:
Before:
| 0 N/A N/A 379124 C ...aur/example_build/example 199MiB |
After:
| 0 N/A N/A 379124 C ...aur/example_build/example 689MiB |
I expected the same increment as in test1 though.
In various cases, consumption is 5x more than expected when the image is "high and narrow". What am I misunderstanding?
OpenCV GpuMat uses a pitched allocation. If the minimum pitch is for example 512 bytes, then allocating a "narrow" image is going to be extra-expensive.
On my Tesla V100, the minimum pitch (roughly, the minimum "width" in bytes of each line) for a pitched allocation is 512. 512/100 = 5x.
No, I don't have any suggestions for workarounds: allocate a wider image, or accept the extra cost.
I think most CUDA GPUs will have a minimum pitch of 512 bytes, because the minimum texture alignment is 512 bytes. You can use the following code to find yours:
$ cat t2060.cu
#include <iostream>
int main(){
char *d;
size_t p;
cudaMallocPitch(&d, &p, 1, 100);
std::cout << p << std::endl;
}
$ nvcc -o t2060 t2060.cu
$ compute-sanitizer ./t2060
========= COMPUTE-SANITIZER
512
========= ERROR SUMMARY: 0 errors
$
(As an aside, I don't know how you decided that your first example shows +100MB. I see 199MiB and 201MiB. The difference between those two appears to be 2MB. But this doesn't seem to be the crux of your question, and the 500MB allocation for a 100MB image of width 100 bytes is explained above.)
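As a quick way to see this yourself (my own sketch, not part of the original answer), cv::cuda::GpuMat exposes the pitch it received through its step member, so the actual footprint can be computed directly:

#include <iostream>
#include <opencv2/core/cuda.hpp>

int main() {
    cv::cuda::GpuMat tall;
    tall.create(1000000, 100, CV_8UC1);   // the "high and narrow" case from the question
    std::cout << "requested bytes:   " << 1000000 * 100 << "\n";
    std::cout << "pitch (bytes/row): " << tall.step << "\n";
    std::cout << "allocated bytes (approx): " << tall.step * tall.rows << "\n";
    return 0;
}

On a device whose minimum pitch is 512 bytes this prints a step of 512, which matches the roughly 5x overhead described above.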

MedianBlur() calculate max kernel size

I want to use the medianBlur function with a very high ksize, like 301 or more. But if I pass a ksize that is too high, the function sometimes crashes. The error message is:
OpenCV Error: (k < 16) in cv::medianBlur_8u_O1, in file ../opencv\modules\imgproc\src\smooth.cpp
(I use opencv4nodejs, but I also tried the original OpenCV 3.4.6).
I did reduce the ksize in a try/catch loop, but that is not very effective, since I have to work with videos.
I checked out the OpenCV source code and did some research.
In OpenCV 3.4.6, the crash comes from line 241 of opencv\modules\imgproc\src\median_blur.simd.hpp:
for ( k = 0; k < 16 ; ++k )
{
sum += H.coarse[k];
if ( sum > t )
{
sum -= H.coarse[k];
break;
}
}
CV_Assert( k < 16 ); // Error here
t is calculated based on ksize, but the calculations of sum and the H.coarse array are quite complicated.
Doing further research, I found a scientific paper about the algorithm: https://www.researchgate.net/publication/321690537_Efficient_Scalable_Median_Filtering_Using_Histogram-Based_Operations
I am trying to read it, but honestly I don't understand much of it.
How do I calculate the maximum ksize for a given image?
The maximum kernel size is determined from the bit depth of the image. As mentioned in the publication you cited:
"An 8-bit value is limited to a max value of 255. Our goal is to
support larger kernel sizes, including kernels that are greater in
size than 17 × 17, thus the larger 32-bit data type is used"
so for an image of data type CV_8U the maximum kernel size is 255.
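Based on that limit, a small sketch of my own (the helper name is made up, and the 255 bound is taken from the answer above) that clamps the kernel size before calling medianBlur instead of waiting for the assertion to fire:

#include <algorithm>
#include <opencv2/imgproc.hpp>

// Hypothetical helper: clamp the requested kernel size for a CV_8U image to the
// 255 limit mentioned above, and keep it odd as medianBlur requires.
static int safeMedianKsize(const cv::Mat& img, int requested)
{
    int k = requested;
    if (img.depth() == CV_8U)
        k = std::min(k, 255);
    if (k % 2 == 0)
        --k;                  // medianBlur needs an odd kernel size
    return std::max(k, 3);
}

// usage: cv::medianBlur(src, dst, safeMedianKsize(src, 301));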

GPU vs CPU end to end latency for dynamic image resizing

I have been using OpenCV and ImageMagick for some throughput benchmarking, and I am not finding the GPU to be much faster than the CPUs. Our use case in production is to resize a master copy dynamically to the size requested in a service call, and I am trying to evaluate whether having a GPU makes sense for resizing per service call.
Sharing the code I wrote for OpenCV. I am running the following function serially for all the images stored in a folder, and ultimately I run N such processes to achieve X image resizes. I want to understand whether my evaluation approach is incorrect, or whether the use case simply doesn't fit typical GPU use cases, and what exactly might be limiting GPU performance. I am not even getting utilization anywhere close to 100%.
resizeGPU.cpp:
{
    // fragment: body of the per-image resize function; `stream` is a cv::cuda::Stream
    // created by the caller, and `using namespace cv; using namespace std;` is assumed
cv::Mat::setDefaultAllocator(cv::cuda::HostMem::getAllocator (cv::cuda::HostMem::AllocType::PAGE_LOCKED));
auto t_start = std::chrono::high_resolution_clock::now();
Mat src = imread(input_file,CV_LOAD_IMAGE_COLOR);
auto t_end_read = std::chrono::high_resolution_clock::now();
if(!src.data){
std::cout<<"Image Not Found: "<< input_file << std::endl;
return;
}
cuda::GpuMat d_src;
d_src.upload(src,stream);
auto t_end_h2d = std::chrono::high_resolution_clock::now();
cuda::GpuMat d_dst;
cuda::resize(d_src, d_dst, Size(400, 400),0,0, CV_INTER_AREA,stream);
auto t_end_resize = std::chrono::high_resolution_clock::now();
Mat dst;
d_dst.download(dst,stream);
auto t_end_d2h = std::chrono::high_resolution_clock::now();
std::cout<<"read,"<<std::chrono::duration<double, std::milli>(t_end_read-t_start).count()<<",host2device,"<<std::chrono::duration<double, std::milli>(t_end_h2d-t_end_read).count()
<<",resize,"<<std::chrono::duration<double, std::milli>(t_end_resize-t_end_h2d).count()
<<",device2host,"<<std::chrono::duration<double, std::milli>(t_end_d2h-t_end_resize).count()
<<",total,"<<std::chrono::duration<double, std::milli>(t_end_d2h-t_start).count()<<endl;
}
resizeCPU.cpp:
auto t_start = std::chrono::high_resolution_clock::now();
Mat src = imread(input_file,CV_LOAD_IMAGE_COLOR);
auto t_end_read = std::chrono::high_resolution_clock::now();
if(!src.data){
std::cout<<"Image Not Found: "<< input_file << std::endl;
return;
}
Mat dst;
resize(src, dst, Size(400, 400),0,0, CV_INTER_AREA);
auto t_end_resize = std::chrono::high_resolution_clock::now();
std::cout<<"read,"<<std::chrono::duration<double, std::milli>(t_end_read-t_start).count()<<",resize,"<<std::chrono::duration<double, std::milli>(t_end_resize-t_end_read).count()
<<",total,"<<std::chrono::duration<double, std::milli>(t_end_resize-t_start).count()<<endl;
Compiling: g++ -std=c++11 resizeCPU.cpp -o resizeCPU `pkg-config --cflags --libs opencv`
I am running each program N number of times controlled by following code : runMultipleGPU.sh
#!/bin/bash
echo $1
START=1
END=$1
for (( c=$START; c<=$END; c++ ))
do
./resizeGPU "$c" &#>/dev/null #&disown;
done
wait
echo All done
Run : ./runMultipleGPU.sh
The timers above lead to the following aggregate data:
No_processes resizeCPU resizeGPU memcpyGPU totalresizeGPU
1 1.51 0.55 2.13 2.68
10 5.67 0.37 2.43 2.80
15 6.35 2.30 12.45 14.75
20 6.30 2.05 10.56 12.61
30 8.09 4.57 23.97 28.55
No of images run per process : 267
Average size of the image: 624Kb
According to the data above, as we increase the number of processes (leading to more simultaneous resizes), the resize time (which includes the actual resize plus the host-to-device and device-to-host copies) increases significantly on the GPU compared to the CPU.
Similar results were obtained with ImageMagick, which uses OpenCL underneath.
Code :
setenv("MAGICK_OCL_DEVICE","OFF",1); //Turn it ON to use GPU acceleration
Image image;
auto t_start_read = std::chrono::high_resolution_clock::now();
image.read( full_path );
auto t_end_read = std::chrono::high_resolution_clock::now();
image.resize( Geometry(400,400) );
auto t_end_resize = std::chrono::high_resolution_clock::now();
Results :
No_procs resizeCPU resizeGPU
1 63.23 8.54
10 76.16 31.04
15 76.56 50.79
20 76.58 71.68
30 86.29 140.17
Test Machine configuration:
4 GPU (Tesla P100) - but test only utilizes 1 GPU
64 CPU cores (Intel Xeon 2680 v4 CPUs)
OpenCV version : 3.4.0
ImageMagick version : 6.9.9-26 Q16 x86_64 2018-01-17
Cuda Toolkit : 9.0
This is highly probably too late to help you, but for people looking at this answer, here is my suggestion to improve performance: the way you are setting up pinned memory does not give you the boost you are looking for.
That is, using
//method 1
cv::Mat::setDefaultAllocator(cv::cuda::HostMem::getAllocator(cv::cuda::HostMem::AllocType::PAGE_LOCKED));
In the comments of this discussion somebody suggested doing it the way you did, and the person answering said it was slower. I was timing an implementation of Sobel derivatives close to the one in Coldvision.io Sobel. The main steps are:
Read the color image
Gaussian blurring of the color image using a radius of 3 and a delta of 1
Grayscale conversion
Computing the x and y gradients
Merging them into the final output image
Instead I implemented a version swapping the order of steps 2 and 3: converting to grayscale first, then denoising the result with a Gaussian blur.
I was running OpenCV 3.4 on Windows 10 with CUDA 9.0. My CPU is an i7-6820HQ and the GPU is a Quadro M1200.
I tried your method and this one:
//Method 2
//allocate pinned memory (`siz` is the input image size, known in advance)
cv::cuda::HostMem memory(siz.height, siz.width, CV_8U, cv::cuda::HostMem::PAGE_LOCKED);
//Read input image from the disk
Mat input = imread(input_file, CV_LOAD_IMAGE_COLOR);
if (input.empty())
{
std::cout << "Image Not Found: " << input_file << std::endl;
return;
}
input.copyTo(memory);
// copy the input image from CPU to GPU memory
cuda::GpuMat gpuInput;
cv::cuda::Stream stream;
gpuInput.upload(memory, stream);
//Do your processing...
//allocate pinned memory for output
cv::cuda::HostMem outMemory(siz.height, siz.width, CV_8U, cv::cuda::HostMem::PAGE_LOCKED);
//`gpuOutput` is the GpuMat produced by the processing steps above
gpuOutput.download(outMemory, stream);
cv::Mat output = outMemory.createMatHeader();
I calculated the gain as (t1 - t2)/t1 * 100, where t1 is the time running the code normally and t2 is the time running it with pinned memory. Negative values mean the method is slower than running with non-pinned memory.
Image size    Gain % (Method 1)    Gain % (Method 2)
800x600 2.9 8.2
1280x1024 2.5 15.3
1600x1200 0.2 7.0
2048x1536 -2.3 14.6
4096x3072 -1.0 17.2
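One practical takeaway (my reading of the numbers, not something the answer states explicitly): the pinned HostMem buffers only pay off when they are allocated once and reused, rather than re-created for every image. A rough sketch of that pattern, assuming all inputs share the same size and the 400x400 resize from the question:

#include <opencv2/opencv.hpp>
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudawarping.hpp>
#include <string>
#include <vector>

void resizeAll(const std::vector<std::string>& files, cv::Size inSize)
{
    // allocate the pinned staging buffers once, outside the per-image loop
    cv::cuda::HostMem inPinned(inSize.height, inSize.width, CV_8UC3, cv::cuda::HostMem::PAGE_LOCKED);
    cv::cuda::HostMem outPinned(400, 400, CV_8UC3, cv::cuda::HostMem::PAGE_LOCKED);
    cv::cuda::GpuMat d_src, d_dst;
    cv::cuda::Stream stream;

    for (const std::string& file : files) {
        cv::Mat src = cv::imread(file, cv::IMREAD_COLOR);
        if (src.empty() || src.size() != inSize) continue;   // sketch only: skip mismatched sizes
        src.copyTo(inPinned);                                 // host copy into pinned memory
        d_src.upload(inPinned, stream);                       // asynchronous H2D copy
        cv::cuda::resize(d_src, d_dst, cv::Size(400, 400), 0, 0, cv::INTER_AREA, stream);
        d_dst.download(outPinned, stream);                    // asynchronous D2H copy
        stream.waitForCompletion();
        cv::Mat dst = outPinned.createMatHeader();            // header over pinned memory, no extra copy
        // ... write or consume dst here ...
    }
}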

"NVENC Feature not available for current license key type" error from nvEncoder sample

When I try to run the nvEncoder sample application included in NV Encode SDK 2.0, it fails to open an encode session. Here is the output:
C:\Users\Timothy\Downloads\nvenc_2.0_pkg\Samples\nvEncodeApp>1080p_heavyhand_3sec.bat
C:\Users\Timothy\Downloads\nvenc_2.0_pkg\Samples\nvEncodeApp>nvEncoder -infile=..\yuv\1080p\HeavyHandIdiot.3sec.yuv -outfile=HeavyHandIdiot.3sec.264 -width=1920 -height=1080 -bitrate=6000000
> NVEncode configuration parameters for Encoder[0]
> GPU Device ID = 0
> Input File = ..\yuv\1080p\HeavyHandIdiot.3sec.yuv
> Output File = HeavyHandIdiot.3sec.264
> Frames [000--01] = 0 frames
> Multi-View Codec = No
> Width,Height = [1920,1080]
> Video Output Codec = 4 - H.264 Codec
> Average Bitrate = 6000000 (bps/sec)
> Peak Bitrate = 24000000 (bps/sec)
> BufferSize = 3000000
> Rate Control Mode = 2 - CBR (Constant Bitrate)
> Frame Rate (Num/Denom) = (30000/1001) 29.9700 fps
> GOP Length = 30
> Set Initial RC QP = 0
> Initial RC QP (I,P,B) = I(0), P(0), B(0)
> Number of B Frames = 0
> Display Aspect Ratio X = 1920
> Display Aspect Ratio Y = 1080
> Number of B-Frames = 0
> QP (All Frames) = 26
> QP (I-Frames) = 25
> QP (P-Frames) = 28
> QP (B-Frames) = 31
> Hiearchical P-Frames = 0
> Hiearchical B-Frames = 0
> SVC Temporal Scalability = 0
> Number of Temporal Layers = 0
> Outband SPSPPS = 0
> Video codec profile = 100
> Stereo 3D Mode = 0
> Stereo 3D Enable = No
> Number slices per Frame = 1
> Encoder Preset = 3 - High Performance (HP) Preset
> Asynchronous Mode = Yes
> YUV Input Format = NV12 (Semi-Planar UV Interleaved) Pitch Linear
> NVENC API Interface = 2 - CUDA
> Map Resource API Demo = No
> Dynamic Resolution Change = 0
> Dynamic Bitrate Change = 0
Input Filesize: 236390400 bytes
Input Filename: ..\yuv\1080p\HeavyHandIdiot.3sec.yuv
Auto-Detected (nvAppEncoderParams.endFrame = 76 frames)
>> GetNumberEncoders() has detected 1 CUDA capable GPU device(s) <<
[ GPU #0 - < GeForce GTX 670 > has Compute SM 3.0, NVENC Available ]
>> InitCUDA() has detected 1 CUDA capable GPU device(s)<<
[ GPU #0 - < GeForce GTX 670 > has Compute SM 3.0, Available NVENC ]
>> Select GPU #0 - < GeForce GTX 670 > supports SM 3.0 and NVENC
File: src\CNVEncoder.cpp, Line: 1380, nvEncOpenEncodeSessionEx() returned with error 21
Note: GUID key may be invalid or incorrect. Recommend to upgrade your drivers and obtain a new key
NVENC error at src\CNVEncoder.cpp:1382 code=21(NVENC Feature not available for current license key type) "nvStatus"
The API says error code 21 is NV_ENC_ERR_INCOMPATIBLE_CLIENT_KEY, with the comment:
/**
* This indicates that the client is attempting to use a feature
* that is not available for the license type for the current system.
*/
The programming guide says:
2. SETTING UP THE HARDWARE FOR ENCODING
2.1 Opening an Encode Session
After loading the NVENC Interface, the client should first call NvEncOpenEncodeSession to open an encoding session. The NVENC Interface will provide a encode session handle to the client, which must be used for all further API calls in the current session.
2.1.1 Using the License client Key GUID:
The client should pass a pointer to the key GUID that has been delivered with this SDK or has been purchased as part of a license separately, as NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS::clientKeyPtr
According to the guide, the sample code is invalid, as it doesn't set NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS::clientKeyPtr. But the SDK wasn't delivered with a key GUID like the guide said.
Someone had the same problem here and resolved it by using a free trial key. It seems to have been included with the 2.0 beta version of the SDK, which is no longer available.
I've also tried installing drivers 311.06, 312.07, and 314.22 with no success. I have a GeForce GTX 670.
Is there a solution?
Starting with the GeForce 334.67 driver, NVENC no longer requires a license key to use on GeForce cards.
Unfortunately, I have not been able to find the beta version of the SDK anywhere, only the final version. The only option would probably be to find someone who downloaded the beta version.
The other way would be to try to reverse engineer NVIDIA's drivers (especially with "Shadowplay" and SHIELD coming, both of which use NVENC) or existing encoding tools that are licensed to use NVENC on GeForce cards, in order to find a compatible key.
Another potential solution I've been watching is to simply hard mod the card into a Quadro/Tesla/GRID, which you should be able to do on your 670 (though unfortunately for me, nobody has tried it on a Titan): http://www.eevblog.com/forum/projects/hacking-nvidia-cards-into-their-professional-counterparts/
Annoyingly, NVIDIA advertised NVENC as a feature of consumer-level Kepler cards upon the launch of the GTX 680, and they've backed away from this to make it a pro-only feature. It doesn't even work with my "prosumer" $1k GTX Titans. Ironically, I don't even want to use the Titans long-term; even with NVENC, the Grid K1 or K2 would be far more suitable for my project. It would be great to get something working on my workstation/gaming rig before scaling it up (and buying a ton of NVIDIA GPUs...) instead of dropping more of my own money on GPUs... Guess it might be better to go the AMD/OpenCL route with their Open Video Encode engine instead, except Catalyst on GNU/Linux doesn't support it. Ugh.
You need a license key, which can be obtained by asking Nvidia (good luck!), or found by disassembling the shared library, or using gdb's rwatch with the bundled example code. Sorry I can't be more helpful than this.

Haartraining opencv

I am trying to train cascades using haartraining. I have used the following parameters:
C:\opencv\opencv_bin\bin>opencv_haartraining -data haar -vec train.vec -bg neg.txt -numPos 1000 -numNeg 2000 -nstages 10 -mem 2000 -mode all -w 30 -h 32
but I am getting the following error:
Data dir name: haar
Vec file name: train.vec
BG file name: neg.txt, is a vecfile: no
Num pos: 2000
Num neg: 2000
Num stages: 10
Num splits: 1 (stump as weak classifier)
Mem: 2000 MB
Symmetric: TRUE
Min hit rate: 0.995000
Max false alarm rate: 0.500000
Weight trimming: 0.950000
Equal weights: FALSE
Mode: BASIC
Width: 30
Height: 32
Applied boosting algorithm: GAB
Error (valid only for Discrete and Real AdaBoost): misclass
Max number of splits in tree cascade: 0
Min number of positive samples per cluster: 500
Required leaf false alarm rate: 0.000976563
Tree Classifier
Stage
+---+
| 0|
+---+
Number of features used : 234720
Parent node: NULL
*** 1 cluster ***
OpenCV Error: Unspecified error (Vec file sample size mismatch) in icvGetHaarTrainingDataFromVec, file C:\Downloads\Software\OpenCV-2.2.0-win\OpenCV-2.2.0\modules\haartraining\cvhaartraining.cpp, line 1929
terminate called after throwing an instance of 'cv::Exception'
what(): C:\Downloads\Software\OpenCV-2.2.0-win\OpenCV-2.2.0\modules\haartraining\cvhaartraining.cpp:1929: error: (-2) Vec file sample size mismatch in function icvGetHaarTrainingDataFromVec
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
I am using a vec file with 1000 samples, which I downloaded from the internet, and I have 2000 negative samples.
"Vec file sample size mismatch" - Try checking the site for the size of the samples. The vec file may not be the one for 30x32 images(which you are trying to pass as -w 30 -h 32).
This is just a guess. Try it. And try using traincascade object. It is there in $OpencvDir$/apps/traincascade/. Compile it like any other object. It can be used for LBP and HOG as well.
Hope this helps.
Regards,
Prasanna S
The ratio of w to h is different from the settings in info.txt. You should modify the w and h of all images in info.txt to match the 30:32 ratio.
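Building on both answers (a hedged suggestion of my own; positives.txt and the exact counts are placeholders), regenerating the vec file with opencv_createsamples at the same -w/-h you train with, and then using opencv_traincascade, avoids the size mismatch:

opencv_createsamples -info positives.txt -vec train.vec -num 1000 -w 30 -h 32
opencv_traincascade -data haar -vec train.vec -bg neg.txt -numPos 900 -numNeg 2000 -numStages 10 -w 30 -h 32

Here -numPos is kept a little below the number of samples in the vec file, since traincascade consumes extra positives in later stages.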

Resources