GPU vs CPU end to end latency for dynamic image resizing - opencv

I have been benchmarking throughput with OpenCV and ImageMagick, and I am not finding the GPU to be much faster than the CPU. Our use case is to resize a master copy dynamically to whatever size a service call requests, and I am trying to evaluate whether a GPU makes sense for resizing per service call.
Below is the code I wrote for OpenCV. I run the following function serially for all the images stored in a folder, and I run N such processes in parallel to achieve X image resizes. I want to understand whether my evaluation approach is incorrect or whether the use case doesn't fit typical GPU use cases, and what exactly might be limiting GPU performance. GPU utilization does not get anywhere close to 100%.
resizeGPU.cpp:
{
cv::Mat::setDefaultAllocator(cv::cuda::HostMem::getAllocator (cv::cuda::HostMem::AllocType::PAGE_LOCKED));
auto t_start = std::chrono::high_resolution_clock::now();
Mat src = imread(input_file,CV_LOAD_IMAGE_COLOR);
auto t_end_read = std::chrono::high_resolution_clock::now();
if(!src.data){
std::cout<<"Image Not Found: "<< input_file << std::endl;
return;
}
cuda::GpuMat d_src;
d_src.upload(src,stream);
auto t_end_h2d = std::chrono::high_resolution_clock::now();
cuda::GpuMat d_dst;
cuda::resize(d_src, d_dst, Size(400, 400),0,0, CV_INTER_AREA,stream);
auto t_end_resize = std::chrono::high_resolution_clock::now();
Mat dst;
d_dst.download(dst,stream);
auto t_end_d2h = std::chrono::high_resolution_clock::now();
std::cout<<"read,"<<std::chrono::duration<double, std::milli>(t_end_read-t_start).count()<<",host2device,"<<std::chrono::duration<double, std::milli>(t_end_h2d-t_end_read).count()
<<",resize,"<<std::chrono::duration<double, std::milli>(t_end_resize-t_end_h2d).count()
<<",device2host,"<<std::chrono::duration<double, std::milli>(t_end_d2h-t_end_resize).count()
<<",total,"<<std::chrono::duration<double, std::milli>(t_end_d2h-t_start).count()<<endl;
}
resizeCPU.cpp:
auto t_start = std::chrono::high_resolution_clock::now();
Mat src = imread(input_file,CV_LOAD_IMAGE_COLOR);
auto t_end_read = std::chrono::high_resolution_clock::now();
if(!src.data){
std::cout<<"Image Not Found: "<< input_file << std::endl;
return;
}
Mat dst;
resize(src, dst, Size(400, 400),0,0, CV_INTER_AREA);
auto t_end_resize = std::chrono::high_resolution_clock::now();
std::cout<<"read,"<<std::chrono::duration<double, std::milli>(t_end_read-t_start).count()<<",resize,"<<std::chrono::duration<double, std::milli>(t_end_resize-t_end_read).count()
<<",total,"<<std::chrono::duration<double, std::milli>(t_end_resize-t_start).count()<<endl;
Compiling : g++ -std=c++11 resizeCPU.cpp -o resizeCPU `pkg-config --cflags --libs opencv`
I run each program N times, controlled by the following script, runMultipleGPU.sh:
#!/bin/bash
echo $1
START=1
END=$1
for (( c=$START; c<=$END; c++ ))
do
./resizeGPU "$c" &#>/dev/null #&disown;
done
wait
echo All done
Run : ./runMultipleGPU.sh
Those timers lead to the following aggregate data:
No_processes  resizeCPU  resizeGPU  memcpyGPU  totalresizeGPU
1             1.51       0.55       2.13       2.68
10            5.67       0.37       2.43       2.80
15            6.35       2.30       12.45      14.75
20            6.30       2.05       10.56      12.61
30            8.09       4.57       23.97      28.55
No of images run per process : 267
Average size of the image: 624Kb
According to the data above, as we increase the number of processes (and therefore the number of simultaneous resizes), the resize time (which includes the actual resize plus the host-to-device and device-to-host copies) increases significantly more on the GPU than on the CPU.
Similar results after using ImageMagick which uses OpenCL beneath
Code :
setenv("MAGICK_OCL_DEVICE","OFF",1); // Set to ON to use GPU acceleration
Image image;
auto t_start_read = std::chrono::high_resolution_clock::now();
image.read( full_path );
auto t_end_read = std::chrono::high_resolution_clock::now();
image.resize( Geometry(400,400) );
auto t_end_resize = std::chrono::high_resolution_clock::now();
Results :
No_procs  resizeCPU  resizeGPU
1         63.23      8.54
10        76.16      31.04
15        76.56      50.79
20        76.58      71.68
30        86.29      140.17
Test Machine configuration:
4 GPU (Tesla P100) - but test only utilizes 1 GPU
64 CPU cores (over Intel Xeon 2680 v4 CPU )
OpenCV version : 3.4.0
ImageMagick version : 6.9.9-26 Q16 x86_64 2018-01-17
Cuda Toolkit : 9.0

This is most probably too late to help you. However, for people looking at this answer, here is my suggestion to improve performance. The way you are setting pinned memory does not give you the boost you are looking for.
That is, using:
//method 1
cv::Mat::setDefaultAllocator(cv::cuda::HostMem::getAllocator(cv::cuda::HostMem::AllocType::PAGE_LOCKED));
In the comments of this discussion, somebody suggested doing the same as you, and the person answering said that it was slower. I was timing an implementation of Sobel derivatives close to the one in Coldvision.io Sobel. The main steps are:
read the color image;
Gaussian blurring of the color image using a radius of 3 and a delta of 1;
grayscale conversion;
computing the x and y gradients;
merging them into the final output image.
Instead I implemented a version that swaps the order of steps 2 and 3: converting to grayscale first and then denoising the result with a Gaussian.
I was running OpenCV 3.4 on Windows 10 with CUDA 9.0. My CPU is an i7-6820HQ and my GPU is a Quadro M1200.
I tried your method and this one:
//Method 2
//allocate pinned memory
cv::cuda::HostMem memory(siz.height, siz.width, CV_8U, cv::cuda::HostMem::PAGE_LOCKED);
//Read input image from the disk
Mat input = imread(input_file, CV_LOAD_IMAGE_COLOR);
if (input.empty())
{
std::cout << "Image Not Found: " << input_file << std::endl;
return;
}
input.copyTo(memory);
// copy the input image from CPU to GPU memory
cuda::GpuMat gpuInput;
cv::cuda::Stream stream;
gpuInput.upload(memory, stream);
//Do your processing...
//allocate pinned memory for output
cv::cuda::HostMem outMemory(siz.height, siz.width, CV_8U, cv::cuda::HostMem::PAGE_LOCKED);
gpuOutput.download(outMemory, stream);
cv::Mat output = outMemory.createMatHeader();
I calculated the gain as (t1-t2)/t1*100, where t1 is the time running the code normally and t2 is the time running it with pinned memory. Negative values mean the method is slower than running with non-pinned memory.
image size  Gain % Method 1  Gain % Method 2
800x600     2.9              8.2
1280x1024   2.5              15.3
1600x1200   0.2              7.0
2048x1536   -2.3             14.6
4096x3072   -1.0             17.2
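To make the gain formula above concrete, here is a minimal Python sketch. The function name and the example timings are hypothetical and only illustrate the arithmetic; they are not the measurements behind the table.

```python
def pinned_memory_gain(t_normal_ms: float, t_pinned_ms: float) -> float:
    """Percentage gain of a pinned-memory run over a normal run.

    Implements (t1 - t2) / t1 * 100 from the text above.
    Positive means pinned memory was faster, negative means slower.
    """
    if t_normal_ms <= 0:
        raise ValueError("t_normal_ms must be positive")
    return (t_normal_ms - t_pinned_ms) / t_normal_ms * 100.0


# Hypothetical timings in milliseconds, for illustration only.
print(round(pinned_memory_gain(10.0, 8.5), 1))   # pinned run was faster
print(round(pinned_memory_gain(10.0, 10.2), 1))  # pinned run was slower
```

A negative result, as in the second call, corresponds to the negative entries in the Method 1 column.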


MedianBlur() calculate max kernel size

I want to use the medianBlur function with a very large ksize, like 301 or more. But if I pass a ksize that is too large, the function sometimes crashes. The error message is:
OpenCV Error: (k < 16) in cv::medianBlur_8u_O1, in file ../opencv\modules\imgproc\src\smooth.cpp
(I use opencv4nodejs, but I also tried the original OpenCV 3.4.6).
I reduced the ksize in a try/catch loop, but that is not very effective, since I have to work with videos.
I checked out the OpenCV source code and did some research.
In OpenCV 3.4.6, the crash comes from line 241 of opencv\modules\imgproc\src\median_blur.simd.hpp:
for ( k = 0; k < 16 ; ++k )
{
sum += H.coarse[k];
if ( sum > t )
{
sum -= H.coarse[k];
break;
}
}
CV_Assert( k < 16 ); // Error here
t is calculated based on ksize, but the calculations of sum and the H.coarse array are quite complicated.
After further research, I found a paper about the algorithm: https://www.researchgate.net/publication/321690537_Efficient_Scalable_Median_Filtering_Using_Histogram-Based_Operations
I am trying to read it but, honestly, I don't understand much of it.
How do I calculate the maximum ksize for a given image?
The maximum kernel size is determined from the bit depth of the image. As mentioned in the publication you cited:
"An 8-bit value is limited to a max value of 255. Our goal is to
support larger kernel sizes, including kernels that are greater in
size than 17 × 17, thus the larger 32-bit data type is used"
so for an image of data type CV_8U the maximum kernel size is 255.
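If you go with that limit, a small helper can clamp the requested size before calling medianBlur instead of relying on try/catch. This is only a sketch: the function name is made up, and the default limit of 255 is taken from the answer above rather than verified against the OpenCV source. The one thing it relies on from OpenCV itself is that medianBlur expects an odd ksize greater than 1.

```python
def clamp_median_ksize(requested: int, limit: int = 255) -> int:
    """Return the largest odd kernel size that is <= min(requested, limit).

    The limit of 255 for 8-bit images is an assumption taken from the
    answer above. The result is never smaller than 3, because medianBlur
    needs an odd ksize greater than 1.
    """
    k = max(3, min(requested, limit))
    if k % 2 == 0:
        k -= 1  # step down to the nearest odd value
    return k


print(clamp_median_ksize(301))  # clamped to the assumed limit
print(clamp_median_ksize(10))   # even request rounded down to odd
```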

python opencv create image from bytearray

I am capturing video from a Ricoh Theta V camera. It delivers the video as Motion JPEG (MJPEG). To get the video you have to do an HTTP POST, which unfortunately means I cannot use the cv2.VideoCapture(url) feature.
So the way to do this per numerous posts on the web and SO is something like this:
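buf = bytes()  # renamed from "bytes" so the builtin type is not shadowed
while True:
    buf += stream.read(1024)
    a = buf.find(b'\xff\xd8')  # JPEG start-of-image marker
    b = buf.find(b'\xff\xd9')  # JPEG end-of-image marker
    if a != -1 and b != -1:
        jpg = buf[a:b+2]
        buf = buf[b+2:]
        # np.frombuffer replaces the deprecated np.fromstring for binary data
        i = cv2.imdecode(np.frombuffer(jpg, dtype=np.uint8), cv2.IMREAD_COLOR)
        cv2.imshow('i', i)
        if cv2.waitKey(1) == 27:
            exit(0)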
That actually works, except it is slow. I'm processing a 1920x1080 JPEG stream on a MacBook Pro running macOS 10.12.6. The call to imdecode takes approximately 425000 microseconds per image.
Any idea how to do this without imdecode or make imdecode faster? I'd like it to work at 60FPS with HD video (at least).
I'm using Python3.7 and OpenCV4.
Updated Again
I looked into JPEG decoding from the memory buffer using PyTurboJPEG, the code goes like this to compare with OpenCV's imdecode():
#!/usr/bin/env python3
import cv2
import numpy as np
from turbojpeg import TurboJPEG, TJPF_GRAY, TJSAMP_GRAY
# Load image into memory
r = open('image.jpg','rb').read()
inp = np.asarray(bytearray(r), dtype=np.uint8)
# Decode JPEG from memory into Numpy array using OpenCV
i0 = cv2.imdecode(inp, cv2.IMREAD_COLOR)
# Use default library installation
jpeg = TurboJPEG()
# Decode JPEG from memory using turbojpeg
i1 = jpeg.decode(r)
cv2.imshow('Decoded with TurboJPEG', i1)
cv2.waitKey(0)
And the answer is that TurboJPEG is 7x faster! That is 4.6ms versus 32.2ms.
In [18]: %timeit i0 = cv2.imdecode(inp, cv2.IMREAD_COLOR)
32.2 ms ± 346 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [19]: %timeit i1 = jpeg.decode(r)
4.63 ms ± 55.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Kudos to @Nuzhny for spotting it first!
Updated Answer
I have been doing some further benchmarks on this and was unable to verify your claim that it is faster to save an image to disk and read it with imread() than it is to use imdecode() from memory. Here is how I tested in IPython:
import cv2
import numpy as np
# First use 'imread()'
%timeit i1 = cv2.imread('image.jpg', cv2.IMREAD_COLOR)
116 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Now prepare the exact same image in memory
r = open('image.jpg','rb').read()
inp = np.asarray(bytearray(r), dtype=np.uint8)
# And try again with 'imdecode()'
%timeit i0 = cv2.imdecode(inp, cv2.IMREAD_COLOR)
113 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
So, I find imdecode() around 3% faster than imread() on my machine. Even if I include the np.asarray() in the timing, it is still quicker from memory than from disk, and I have seriously fast 3 GB/s NVMe disks on my machine...
Original Answer
I haven't tested this but it seems to me that you are doing this in a loop:
read 1k bytes
append it to a buffer
look for JPEG SOI marker (0xffd8)
look for JPEG EOI marker (0xffd9)
if you have found both the start and the end of a JPEG frame, decode it
1) Now, most JPEG images with any interesting content that I have seen are between 30 kB and 300 kB, so you are going to do 30-300 append operations on a buffer. I don't know much about Python, but I guess that may cause a re-allocation of memory, which I guess may be slow.
2) Next you are going to look for the SOI marker in the first 1kB, then again in the first 2kB, then again in the first 3kB, then again in the first 4kB - even if you have already found it!
3) Likewise, you are going to look for the EOI marker in the first 1kB, the first 2kB...
So, I would suggest you try:
1) allocating a bigger buffer at the start and acquiring directly into it at the appropriate offset
2) not searching for the SOI marker if you have already found it - e.g. set it to -1 at the start of each frame and only try and find it if it is still -1
3) only look for the EOI marker in the new data on each iteration, not in all the data you have already searched on previous iterations
4) furthermore, actually, don't bother looking for the EOI marker unless you have already found the SOI marker, because the end of a frame without the corresponding start is no use to you anyway - it is incomplete.
I may be wrong in my assumptions, (I have been before!) but at least if they are public someone cleverer than me can check them!!!
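Suggestions 2) to 4) above can be sketched roughly as follows. The FrameExtractor class and its feed() method are hypothetical names of my own, not part of OpenCV or any library, and I have not timed this against the original loop. It still appends to a growing buffer, so it does not cover suggestion 1), and like the original code it assumes the SOI/EOI markers do not occur inside a frame (a JPEG with an embedded thumbnail would break that assumption).

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


class FrameExtractor:
    """Incrementally split a byte stream into JPEG frames (illustrative sketch)."""

    def __init__(self):
        self._buf = bytearray()
        self._start = -1    # index of the SOI marker, -1 while not found
        self._scanned = 0   # number of bytes already searched

    def feed(self, chunk):
        """Append a chunk and return the list of frames completed by it."""
        self._buf += chunk
        frames = []
        while True:
            if self._start < 0:
                # Search for SOI only in data not searched before. Back up
                # one byte in case a marker straddles two chunks.
                pos = self._buf.find(SOI, max(self._scanned - 1, 0))
                if pos < 0:
                    # No frame start yet: drop everything except the last
                    # byte, which may be the first half of a marker.
                    del self._buf[:-1]
                    self._scanned = len(self._buf)
                    break
                self._start = pos
                self._scanned = pos + 2
            # Only look for EOI once SOI is known, and only in new data.
            pos = self._buf.find(EOI, max(self._scanned - 1, self._start + 2))
            if pos < 0:
                self._scanned = len(self._buf)
                break
            frames.append(bytes(self._buf[self._start:pos + 2]))
            del self._buf[:pos + 2]
            self._start = -1
            self._scanned = 0
        return frames
```

In the loop from the question you would call extractor.feed(stream.read(1024)) and pass each returned frame to the decoder.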
I recommend using turbo-jpeg. It has a Python API: PyTurboJPEG.

Unhandled exception when using opencv HOG descriptor compute function with OpenCL

I've recently been looking into using OpenCL to reduce a bottleneck in our application, which is a call to cv::HOGDescriptor::compute(). Using the OpenCL version seems to be a simple case of using UMat instead of Mat and I've already had some success with the following:
// Replaced the following...
// _descriptor_calculator.compute(image, descriptor_flat);
// with...
cv::UMat input = image.getUMat(cv::ACCESS_RW);
_descriptor_calculator.compute(input, descriptor_flat);
This works for images with a resolution (up to and including) 2048x1536. However for an image of 2592x1936 I get an unhandled exception, and my debugger breaks in gshandlereh.c.
GSUnwindInfo = *(PULONG)GSHandlerData;
if (IS_DISPATCHING(ExceptionRecord->ExceptionFlags)
? (GSUnwindInfo & UNW_FLAG_EHANDLER)
: (GSUnwindInfo & UNW_FLAG_UHANDLER))
{
Disposition = __CxxFrameHandler3( // <--------- breaks here
ExceptionRecord,
EstablisherFrame,
ContextRecord,
DispatcherContext
);
}
else
{
Disposition = ExceptionContinueSearch;
}
Two important points to mention:
I can fix the error by downsampling the image, so it appears to be related to the size of the image
The CPU version of the code works just fine with 2592x1936 images, so I'm reasonably confident that there isn't an error with any of the input parameters
Downsampling larger images is my fallback option, but I find the conclusion that the OpenCL version of the HOG descriptor compute function can't handle images beyond a certain size a bit unsatisfactory - the fact that there is an unhandled exception rather than an assert and a sensible error message leads me to believe it is more complicated/sinister than this..!
Thanks in advance
Can anybody shed any light on this? Any input would be appreciated.
EDIT:
As requested, a self-contained example. I was able to reproduce the problem by feeding in a random google image that was big enough (searched for jpgs of size 2592x1936):
cv::UMat img = cv::imread("image3.jpg", cv::IMREAD_COLOR).getUMat(cv::ACCESS_RW);
uint16_t image_width = img.cols;
uint16_t image_height = img.rows;
int down_scale = 1;
uint16_t _image_width = (uint16_t)ceil((float)(image_width / down_scale) / 16) * 16;
uint16_t _image_height = (uint16_t)ceil((float)(image_height / down_scale) / 16) * 16;
std::cout << _image_height << " x " << _image_width << std::endl;
std::cout << (_image_width - 16) % 16 << std::endl;
std::cout << (_image_height - 16) % 16 << std::endl;
auto descriptor_calculator = cv::HOGDescriptor(cv::Size(_image_width, _image_height), cv::Size(16, 16), cv::Size(16, 16), cv::Size(16, 16), 9);
std::vector<float> descriptor_flat;
descriptor_calculator.compute(img, descriptor_flat);
Similar to the original problem, this works for small enough images and fails for large enough images. Unfortunately I did not manage to glean much more information from running outside of the IDE or from setting the IDE to break on all exceptions. The only clue I have is that the exception is an access violation. I wondered whether the call to ceil was causing the access violation, but the resolution is divisible by 16, so not surprisingly changing to floor made no difference.
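For reference, the size arithmetic in the example can be checked with a short Python sketch. Both function names are hypothetical, and the descriptor-length formula is my understanding of the usual HOG layout (blocks per window, times cells per block, times bins), so treat it as an assumption rather than OpenCV's exact code. It only shows how large the output gets when the window covers the whole image; it does not explain the access violation.

```python
import math


def round_up_to_multiple(value: int, multiple: int) -> int:
    """Round value up to the next multiple, like the ceil(...) * 16 lines above."""
    return math.ceil(value / multiple) * multiple


def hog_descriptor_length(win_w, win_h, block, stride, cell, nbins):
    """Number of floats in one HOG descriptor for a single window.

    Assumes square blocks and cells, as in the example
    (16x16 blocks, 16x16 stride, 16x16 cells, 9 bins).
    """
    blocks_x = (win_w - block) // stride + 1
    blocks_y = (win_h - block) // stride + 1
    cells_per_block = (block // cell) ** 2
    return blocks_x * blocks_y * cells_per_block * nbins


w = round_up_to_multiple(2592, 16)
h = round_up_to_multiple(1936, 16)
print(w, h, hog_descriptor_length(w, h, 16, 16, 16, 9))
```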
Additional info as requested:
Running in Windows
Compiler/IDE is Visual Studio 2015
OpenCV version is 3.4.0
Thanks for the help, happy to provide more info

CUDA not running in OpenCV even after successful build

I am trying to build OpenCV 2.4.10 on a Win 8.1 machine with CUDA 6.5. I have other third-party libraries as well, and they installed successfully. I ran a simple GPU-based program and got this error: No GPU found or the library was compiled without GPU support. I also ran sample executables such as performance_gpu.exe that were built during the installation and got the same error. I had the WITH_CUDA flag checked. The following CUDA-related flags were set during the CMake build.
WITH_CUDA : Checked
WITH_CUBLAS : Checked
WITH_CUFFT : Checked
CUDA_ARCH_BIN : 1.1 1.2 1.3 2.0 2.1(2.0) 3.0 3.5
CUDA_ARCH_PTX : 3.0
CUDA_FAST_MATH : Checked
CUDA_GENERATION : Auto
CUDA_HOST_COMPILER : $(VCInstallDir)bin
CUDA_SEPARABLE_COMPILATION : Unchecked
CUDA_TOOLKIT_ROOT_DIR : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v6.5
Another thing: I have read in some posts that building with CUDA takes a lot of time. My build takes about 3 hours, most of which is spent compiling the .cu files. As far as I know, I did not get any errors while compiling those files.
In some posts I have seen people talk about a directory named gpu inside the build directory, but I don't see one in mine!
I am using Visual Studio 2013.
What could be the issue? Please help!
UPDATE:
I again tried to build opencv and this time before starting the build I added the bin, lib and include directories of CUDA. After the build in E:\opencv\build\bin\Release I ran gpu_perf4au.exe and I got this output
[----------]
[ INFO ] Implementation variant: cuda.
[----------]
[----------]
[ GPU INFO ] Run test suite on GeForce GTX 860M GPU.
[----------]
Time compensation is 0
OpenCV version: 2.4.10
OpenCV VCS version: unknown
Build type: release
Parallel framework: tbb
CPU features: sse sse2 sse3 ssse3 sse4.1 sse4.2 avx avx2
[----------]
[ GPU INFO ] Run on OS Windows x64.
[----------]
*** CUDA Device Query (Runtime API) version (CUDART static linking) ***
Device count: 1
Device 0: "GeForce GTX 860M"
CUDA Driver Version / Runtime Version 6.50 / 6.50
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 2048 MBytes (2147483648 bytes)
GPU Clock Speed: 1.02 GHz
Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): Yes
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.50, CUDA Runtime Version = 6.50, NumDevs = 1
I thought that every thing was fine but after running this program where I had included all opencv and CUDA directories in its property files,
#include <cv.h>
#include <highgui.h>
#include <iostream>
#include <opencv2\opencv.hpp>
#include <opencv2\gpu\gpu.hpp>
using namespace std;
using namespace cv;
char key;
Mat thresholder (Mat input) {
gpu::GpuMat dst, src;
src.upload(input);
gpu::threshold(src, dst, 128.0, 255.0, CV_THRESH_BINARY);
Mat result_host(dst);
return result_host;
}
int main(int argc, char* argv[]) {
cvNamedWindow("Camera_Output", 1);
CvCapture* capture = cvCaptureFromCAM(CV_CAP_ANY);
while (1){
IplImage* frame = cvQueryFrame(capture);
IplImage* gray_frame = cvCreateImage(cvGetSize(frame), IPL_DEPTH_8U, 1);
cvCvtColor(frame, gray_frame, CV_RGB2GRAY);
Mat temp(gray_frame);
Mat thres_temp;
thres_temp = thresholder(temp);
//cvShowImage("Camera_Output", frame); //Show image frames on created window
imshow("Camera_Output", thres_temp);
key = cvWaitKey(10);
if (char(key) == 27){
break; //If you hit ESC key loop will break.
}
}
cvReleaseCapture(&capture);
cvDestroyWindow("Camera_Output");
return 0;
}
I got the error:
OpenCV Error: No GPU support (The library is compiled without CUDA support) in EmptyFuncTable::mallocPitch, file C:\builds\2_4_PackSlave-win64-vc12-shared\opencv\modules\dynamicuda\include\opencv2/dynamicuda/dynamicuda.hpp, line 126
Thanks to @BeRecursive for giving me a lead to solve my issue. The CMake build log lists three unavailable OpenCV modules, namely androidcamera, dynamicuda and viz. I could not find any information on dynamicuda, i.e. the module whose unavailability might have caused the error that I mentioned in the question. Instead I searched for the viz module and checked how it is installed.
After going through some blogs and forums I found out that the viz module is not included in the pre-built versions of OpenCV. It was recommended to build version 2.4.9 from source. I gave it a try with VS 2013 and CMake 3.0.1, but there were many build failures and warnings. Upon further searching I found that CMake 3.0.x versions aren't recommended for building OpenCV as they produce many warnings.
At last I decided to switch to VS 2010 and CMake 2.8.12.2. After building the source I got no errors and, luckily, after adding all executables, libraries and DLLs to the PATH, the program mentioned above ran without errors, but it ran very slowly! So I ran this program:
#include <cv.h>
#include <highgui.h>
#include <iostream>
#include <opencv2\opencv.hpp>
#include <opencv2\core\core.hpp>
#include <opencv2\gpu\gpu.hpp>
#include <opencv2\highgui\highgui.hpp>
using namespace std;
using namespace cv;
Mat thresholder(Mat input) {
cout << "Beginning thresholding using GPU" << endl;
gpu::GpuMat dst, src;
src.upload(input);
cout << "upload done ..." << endl;
gpu::threshold(src, dst, 128.0, 255.0, CV_THRESH_BINARY);
Mat result_host(dst);
cout << "Thresolding complete!" << endl;
return result_host;
}
int main(int argc, char** argv) {
Mat image, gray_image;
image = imread("desert.jpg", CV_LOAD_IMAGE_COLOR); // Read the file
if (!image.data) {
cout << "Could not open or find the image" << endl;
return -1;
}
cout << "Orignal image loaded ..." << endl;
cvtColor(image, gray_image, CV_BGR2GRAY);
cout << "Original image converted to Grayscale" << endl;
Mat thres_image;
thres_image = thresholder(gray_image);
namedWindow("Original Image", WINDOW_AUTOSIZE);// Create a window for display.
namedWindow("Gray Image", WINDOW_AUTOSIZE);
namedWindow("GPU Threshed Image", WINDOW_AUTOSIZE);
imshow("Original Image", image);
imshow("Gray Image", gray_image);
imshow("GPU Threshed Image", thres_image);
waitKey(0);
return 0;
}
Later I even tested the build on VS 2013 and it also worked.
The GPU based programs are slow due to reasons mentioned here.
So three important things I want to point out:
BUILD from source only
Use a little older version of CMAKE
Prefer VS 2010 for building the binaries.
NOTE:
This might sound weird, but all my first builds failed due to some linker error. I don't know whether this is a workaround or not, but try building opencv_gpu before anything else, then all other modules one by one, and then the ALL_BUILD and INSTALL projects.
When you build this way in DEBUG mode you might get an error if you are building OpenCV with Python support, i.e. "python27_d.lib"; otherwise all projects will build successfully.
WEB SOURCES:
Following are web sources that helped me in solving my problem:
http://answers.opencv.org/question/32502/opencv-249-viz-module-not-there/
http://home.eps.hw.ac.uk/~cgb7/opencv/opencv_tutorial.pdf
http://perso.uclouvain.be/allan.barrea/opencv/opencv.html
http://eavise.wikispaces.com/Building+OpenCV+yourself+on+Windows+7+x64+with+OpenCV+2.4.5+and+CUDA+5.0
https://devtalk.nvidia.com/default/topic/767647/how-do-i-enable-cuda-when-installing-opencv-/
So that is a runtime error being thrown by OpenCV. If you take a look at the CMake log from your previous question, you can see that one of the unavailable packages was dynamicuda, which appears to be what that error is complaining about.
However, I don't have a lot of experience with OpenCV on Windows, so that could be a red herring. My gut feeling says that you don't have all the libraries correctly on the path. Have you made sure that you have the CUDA lib/include/bin directories on the PATH? Have you made sure that your OpenCV build lib/include directories are on the PATH? Windows has a very simple search order that essentially covers the current directory, anything on the PATH and the main Windows directories. So I would make sure that everything is correctly on the PATH or that you have copied all the correct libraries into the folder.
A note: this is different from a compiling/linking error because it happens at RUNTIME. So setting the compiler paths will not help with runtime linking errors.

Matching image to images collection

I have a large collection of card images and one photo of a particular card. What tools can I use to find which image in the collection is most similar to mine?
Here's collection sample:
Abundance
Aggressive Urge
Demystify
Here's what I'm trying to find:
Card Photo
New method!
It seems that the following ImageMagick command, or maybe a variation of it once you have looked at a greater selection of your images, will extract the wording at the top of your cards:
convert aggressiveurge.jpg -crop 80%x10%+10%+10% crop.png
which takes the top 10% of your image and 80% of the width (starting 10% in from the top left corner) and stores it in crop.png.
If you run that through Tesseract OCR as follows:
tesseract crop.png agg
you get a file called agg.txt containing:
E‘ Aggressive Urge \L® E
which you can run through grep to clean up, looking only for upper and lower case letters adjacent to each other:
grep -Eo "\<[A-Za-z]+\>" agg.txt
to get
Aggressive Urge
:-)
Thank you for posting some photos.
I have coded an algorithm called Perceptual Hashing, which I found described by Dr Neal Krawetz. Comparing your images with the card, I get the following percentage measures of similarity:
Card vs. Abundance 79%
Card vs. Aggressive 83%
Card vs. Demystify 85%
So it is not an ideal discriminator for your image type, but it works to some extent. You may wish to play around with it to tailor it to your use case.
I would calculate a hash for each of the images in your collection, one at a time and store the hash for each image just once. Then, when you get a new card, calculate its hash and compare it to the stored ones.
#!/bin/bash
################################################################################
# Similarity
# Mark Setchell
#
# Calculate percentage similarity of two images using Perceptual Hashing
# See article by Dr Neal Krawetz entitled "Looks Like It" - www.hackerfactor.com
#
# Method:
# 1) Resize image to black and white 8x8 pixel square regardless
# 2) Calculate mean brightness of those 64 pixels
# 3) For each pixel, store "1" if pixel>mean else store "0" if less than mean
# 4) Convert resulting 64bit string of 1's and 0's, 16 hex digit "Perceptual Hash"
#
# If finding difference between Perceptual Hashes, simply total up number of bits
# that differ between the two strings - this is the Hamming distance.
#
# Requires ImageMagick - www.imagemagick.org
#
# Usage:
#
# Similarity image|imageHash [image|imageHash]
# If you pass one image filename, it will tell you the Perceptual hash as a 16
# character hex string that you may want to store in an alternate stream or as
# an attribute or tag in filesystems that support such things. Do this in order
# to just calculate the hash once for each image.
#
# If you pass in two images, or two hashes, or an image and a hash, it will try
# to compare them and give a percentage similarity between them.
################################################################################
function PerceptualHash(){
TEMP="tmp$$.png"
# Force image to 8x8 pixels and greyscale
convert "$1" -colorspace gray -quality 80 -resize 8x8! PNG8:"$TEMP"
# Calculate mean brightness and correct to range 0..255
MEAN=$(convert "$TEMP" -format "%[fx:int(mean*255)]" info:)
# Now extract all 64 pixels and build string containing "1" where pixel > mean else "0"
hash=""
for i in {0..7}; do
for j in {0..7}; do
pixel=$(convert "${TEMP}"[1x1+${i}+${j}] -colorspace gray text: | grep -Eo "\([0-9]+," | tr -d '(,' )
bit="0"
[ $pixel -gt $MEAN ] && bit="1"
hash="$hash$bit"
done
done
hex=$(echo "obase=16;ibase=2;$hash" | bc)
# Pad to 16 hex digits; tr covers printf implementations that pad %s with spaces
printf "%016s\n" $hex | tr ' ' '0'
#rm "$TEMP" > /dev/null 2>&1
}
function HammingDistance(){
# Convert input hex strings to upper case like bc requires
STR1=$(tr '[a-z]' '[A-Z]' <<< $1)
STR2=$(tr '[a-z]' '[A-Z]' <<< $2)
# Convert hex to binary and zero left pad to 64 binary digits
STR1=$(printf "%064s" $(echo "obase=2;ibase=16;$STR1" | bc) | tr ' ' '0')
STR2=$(printf "%064s" $(echo "obase=2;ibase=16;$STR2" | bc) | tr ' ' '0')
# Calculate Hamming distance between two strings, each differing bit adds 1
hamming=0
for i in {0..63};do
a=${STR1:i:1}
b=${STR2:i:1}
[ $a != $b ] && ((hamming++))
done
# Hamming distance is in range 0..64 and small means more similar
# We want percentage similarity, so we do a little maths
similarity=$((100-(hamming*100/64)))
echo $similarity
}
function Usage(){
echo "Usage: Similarity image|imageHash [image|imageHash]" >&2
exit 1
}
################################################################################
# Main
################################################################################
if [ $# -eq 1 ]; then
# Expecting a single image file for which to generate hash
if [ ! -f "$1" ]; then
echo "ERROR: File $1 does not exist" >&2
exit 1
fi
PerceptualHash "$1"
exit 0
fi
if [ $# -eq 2 ]; then
# Expecting 2 things, i.e. 2 image files, 2 hashes or one of each
if [ -f "$1" ]; then
hash1=$(PerceptualHash "$1")
else
hash1=$1
fi
if [ -f "$2" ]; then
hash2=$(PerceptualHash "$2")
else
hash2=$2
fi
HammingDistance $hash1 $hash2
exit 0
fi
Usage
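For anyone who prefers to see the core of the method without the shell plumbing, here is a rough Python sketch of steps 2) to 4) and the Hamming comparison. It assumes you already have the 64 grayscale values of the 8x8 thumbnail as a list (the resize step is not shown), and the function names are my own. It compares against the exact mean rather than the truncated integer mean used in the script, and it walks the pixels in list order, so hashes from the two versions are not guaranteed to match each other.

```python
def average_hash(pixels):
    """Return a 16-digit hex hash from 64 grayscale values (8x8 thumbnail)."""
    if len(pixels) != 64:
        raise ValueError("expected exactly 64 pixel values")
    mean = sum(pixels) / 64
    bits = 0
    for p in pixels:
        # Store 1 if the pixel is brighter than the mean, else 0
        bits = (bits << 1) | (1 if p > mean else 0)
    return f"{bits:016X}"


def similarity(hash1, hash2):
    """Percentage similarity derived from the Hamming distance of two hashes."""
    hamming = bin(int(hash1, 16) ^ int(hash2, 16)).count("1")
    # Hamming distance is in range 0..64; integer maths as in the script
    return 100 - hamming * 100 // 64


# Toy thumbnail: top half dark, bottom half bright
h = average_hash([0] * 32 + [255] * 32)
print(h, similarity(h, h))
```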
I also tried a normalised cross-correlation of each of your images with the card, like this:
#!/bin/bash
size="300x400!"
convert card.png -colorspace RGB -normalize -resize $size card.jpg
for i in *.jpg
do
cc=$(convert $i -colorspace RGB -normalize -resize $size JPG:- | \
compare - card.jpg -metric NCC null: 2>&1)
echo "$cc:$i"
done | sort -n
and I got this output (sorted by match quality):
0.453999:abundance.jpg
0.550696:aggressive.jpg
0.629794:demystify.jpg
which shows that the card correlates best with demystify.jpg.
Note that I resized all images to the same size and normalized their contrast so that they could be readily compared and effects resulting from differences in contrast are minimised. Making them smaller also reduces the time needed for the correlation.
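As a rough illustration of what a normalised cross-correlation score measures, here is a Python sketch of the textbook zero-mean form on two equal-length lists of pixel values. I have not checked that ImageMagick's -metric NCC computes exactly this, so treat it as an illustration of the idea rather than a reimplementation; the function name is mine.

```python
import math


def ncc(a, b):
    """Zero-mean normalised cross-correlation of two equal-length sequences.

    Returns a value in [-1, 1]; 1 means the two signals differ only by
    brightness offset and positive contrast scaling.
    """
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a flat image has no structure to correlate
    return sum(x * y for x, y in zip(da, db)) / denom


print(ncc([1, 2, 3], [2, 4, 6]))  # same pattern, different contrast
print(ncc([1, 2, 3], [3, 2, 1]))  # inverted pattern
```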
I tried this by arranging the image data as a vector and taking the inner-product between the collection image vectors and the searched image vector. The vectors that are most similar will give the highest inner-product. I resize all the images to the same size to get equal length vectors so I can take inner-product. This resizing will additionally reduce inner-product computational cost and give a coarse approximation of the actual image.
You can quickly check this with Matlab or Octave. Below is the Matlab/Octave script. I've added comments there. I tried varying the variable mult from 1 to 8 (you can try any integer value), and for all those cases, image Demystify gave the highest inner product with the card image. For mult = 8, I get the following ip vector in Matlab:
ip =
683007892
558305537
604013365
As you can see, it gives the highest inner-product of 683007892 for image Demystify.
% load images
imCardPhoto = imread('0.png');
imDemystify = imread('1.jpg');
imAggressiveUrge = imread('2.jpg');
imAbundance = imread('3.jpg');
% you can experiment with the size by varying mult
mult = 8;
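targetSize = [17 12]*mult; % named targetSize to avoid shadowing the built-in size function
% resize all images to the same size (note: MATLAB's imresize uses bicubic
% interpolation by default; pass 'nearest' for nearest neighbor)
smallCardPhoto = imresize(imCardPhoto, targetSize);
smallDemystify = imresize(imDemystify, targetSize);
smallAggressiveUrge = imresize(imAggressiveUrge, targetSize);
smallAbundance = imresize(imAbundance, targetSize);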
% image collection: each image is vectorized. if we have n images, this
% will be a (size_rows*size_columns*channels) x n matrix
collection = [double(smallDemystify(:)) ...
double(smallAggressiveUrge(:)) ...
double(smallAbundance(:))];
% vectorize searched image. this will be a (size_rows*size_columns*channels) x 1
% vector
x = double(smallCardPhoto(:));
% take the inner product of x and each image vector in collection. this
% will result in an n x 1 vector. the higher the inner product, the more
% similar the collection image is to the searched image (that is, x)
ip = collection' * x;
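For anyone without Matlab or Octave at hand, the same product can be sketched in plain C++. This is only an illustration of the collection' * x step on already vectorized images; innerProducts and bestMatch are hypothetical names, and loading and resizing the images is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <stdexcept>
#include <vector>

// Inner product of the query vector x with every image vector in the
// collection; all vectors are assumed to have the same length.
std::vector<double> innerProducts(const std::vector<std::vector<double>>& collection,
                                  const std::vector<double>& x) {
    std::vector<double> ip;
    ip.reserve(collection.size());
    for (const std::vector<double>& image : collection) {
        if (image.size() != x.size()) {
            throw std::invalid_argument("image and query lengths differ");
        }
        ip.push_back(std::inner_product(image.begin(), image.end(), x.begin(), 0.0));
    }
    return ip;
}

// Index of the collection image with the highest inner product.
std::size_t bestMatch(const std::vector<double>& ip) {
    if (ip.empty()) throw std::invalid_argument("no inner products given");
    return static_cast<std::size_t>(
        std::distance(ip.begin(), std::max_element(ip.begin(), ip.end())));
}
```

With a collection of {1,2,3}, {0,0,1} and {3,2,1} and a query of {1,2,3}, the inner products are 14, 3 and 10, so the first image is reported as the best match.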
EDIT
I tried another approach: taking the Euclidean distance (L2 norm) between each reference image and the card image. It gave me very good results with a large collection of reference images (383 images) I found at this link for your test card image.
Here instead of taking the whole image, I extracted the upper part that contains the image and used it for comparison.
In the following steps, all training images and the test image are resized to a predefined size before doing any processing.
1. extract the image regions from training images
2. perform morphological closing on these images to get a coarse approximation (this step may not be necessary)
3. vectorize these images and store in a training set (I call it training set even though there's no training in this approach)
4. load the test card image, extract the image region-of-interest (ROI), apply closing, then vectorize
5. calculate the Euclidean distance between each reference image vector and the test image vector
6. choose the minimum distance item (or the first k items)
I did this in C++ using OpenCV. I'm also including some test results using different scales.
#include <opencv2/opencv.hpp>
#include <iostream>
#include <algorithm>
#include <string>
#include <windows.h>
using namespace cv;
using namespace std;
// Both paths must end with a path separator: file names are appended to them directly
#define INPUT_FOLDER_PATH string("Your test image folder path")
#define TRAIN_IMG_FOLDER_PATH string("Your training image folder path")
void search()
{
WIN32_FIND_DATA ffd;
HANDLE hFind = INVALID_HANDLE_VALUE;
vector<Mat> images;
vector<string> labelNames;
int label = 0;
double scale = .2; // you can experiment with scale
Size imgSize(200*scale, 285*scale); // training sample images are all 200 x 285 (width x height)
Mat kernel = getStructuringElement(MORPH_ELLIPSE, Size(3, 3));
// get all training samples in the directory
hFind = FindFirstFile((TRAIN_IMG_FOLDER_PATH + string("*")).c_str(), &ffd);
if (INVALID_HANDLE_VALUE == hFind)
{
cout << "INVALID_HANDLE_VALUE: " << GetLastError() << endl;
return;
}
do
{
if (!(ffd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY))
{
Mat im = imread(TRAIN_IMG_FOLDER_PATH+string(ffd.cFileName));
if (im.empty()) continue; // skip files that cannot be read as images
Mat re;
resize(im, re, imgSize, 0, 0); // resize the image
// extract only the upper part that contains the image
Mat roi = re(Rect(re.cols*.1, re.rows*35/285.0, re.cols*.8, re.rows*125/285.0));
// get a coarse approximation
morphologyEx(roi, roi, MORPH_CLOSE, kernel);
images.push_back(roi.reshape(1)); // flatten the roi's channels into a single-channel matrix
labelNames.push_back(string(ffd.cFileName));
}
}
while (FindNextFile(hFind, &ffd) != 0);
FindClose(hFind); // release the search handle
// load the test image, apply the same preprocessing done for training images
Mat test = imread(INPUT_FOLDER_PATH+string("0.png"));
if (test.empty())
{
cout << "Test image not found" << endl;
return;
}
Mat re;
resize(test, re, imgSize, 0, 0);
Mat roi = re(Rect(re.cols*.1, re.rows*35/285.0, re.cols*.8, re.rows*125/285.0));
morphologyEx(roi, roi, MORPH_CLOSE, kernel);
Mat testre = roi.reshape(1);
struct imgnorm2_t
{
string name;
double norm2;
};
vector<imgnorm2_t> imgnorm;
for (size_t i = 0; i < images.size(); i++)
{
imgnorm2_t data = {labelNames[i],
norm(images[i], testre) /* take the l2-norm (euclidean distance) */};
imgnorm.push_back(data); // store data
}
// sort stored data based on euclidean-distance in the ascending order
sort(imgnorm.begin(), imgnorm.end(),
[] (const imgnorm2_t& first, const imgnorm2_t& second) { return (first.norm2 < second.norm2); });
for (size_t i = 0; i < imgnorm.size(); i++)
{
cout << imgnorm[i].name << " : " << imgnorm[i].norm2 << endl;
}
}
Results (the three smallest distances at each scale):
scale = 1.0;
demystify.jpg : 10989.6, sylvan_basilisk.jpg : 11990.7, scathe_zombies.jpg : 12307.6
scale = .8;
demystify.jpg : 8572.84, sylvan_basilisk.jpg : 9440.18, steel_golem.jpg : 9445.36
scale = .6;
demystify.jpg : 6226.6, steel_golem.jpg : 6887.96, sylvan_basilisk.jpg : 7013.05
scale = .4;
demystify.jpg : 4185.68, steel_golem.jpg : 4544.64, sylvan_basilisk.jpg : 4699.67
scale = .2;
demystify.jpg : 1903.05, steel_golem.jpg : 2154.64, sylvan_basilisk.jpg : 2277.42
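If only the first k matches are wanted, as in the last step of the list above, the full sort is not strictly needed. Here is a minimal sketch of that selection in standard C++, independent of OpenCV: it assumes the images have already been turned into equal-length vectors of doubles, and Match, euclideanDistance and nearestK are hypothetical names, not part of the code above.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Match {
    std::string name;
    double distance;
};

// Euclidean (L2) distance between two equal-length vectors.
double euclideanDistance(const std::vector<double>& a, const std::vector<double>& b) {
    if (a.size() != b.size()) throw std::invalid_argument("vector lengths differ");
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Return the k references closest to the query, nearest first.
std::vector<Match> nearestK(
    const std::vector<std::pair<std::string, std::vector<double>>>& references,
    const std::vector<double>& query, std::size_t k) {
    std::vector<Match> matches;
    matches.reserve(references.size());
    for (const auto& ref : references) {
        matches.push_back({ref.first, euclideanDistance(ref.second, query)});
    }
    k = std::min(k, matches.size());
    const auto middle = matches.begin() + static_cast<std::ptrdiff_t>(k);
    // Only the first k positions are put in order; the rest stay unsorted.
    std::partial_sort(matches.begin(), middle, matches.end(),
                      [](const Match& l, const Match& r) { return l.distance < r.distance; });
    matches.erase(middle, matches.end());
    return matches;
}
```

With 383 reference images the difference from a full sort is negligible, so this is mainly a matter of stating the intent; it may matter more for much larger collections.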
If I understand you correctly, you need to compare them as pictures. There is a very simple but effective solution for this: Sikuli.
What tools can I use to find which image of collection is most similar to mine?
This tool works very well for image processing: it can not only tell whether your card (image) is similar to a pattern you have already defined, but also search for partial image content (so-called rectangles).
By default you can extend its functionality via Python. Any ImageObject can be set to accept similarity_pattern in percentages, and by doing so you'll be able to find precisely what you are looking for.
Another big advantage of this tool is that you can learn the basics in one day.
Hope this helps.

Resources