I found that using pyrDown and pyrUp leaves my DownUp matrix full of zeros, for some odd reason. However, when I do the same thing on the CPU, the results are perfectly fine.
NOTE: I'm using opencv4tegra on the Jetson TK1, if that matters at all.
for (int i = 0; i < Pyramid_Size; i++) {
    cv::gpu::pyrDown(DownUp, DownUp);
}
for (int i = 0; i < Pyramid_Size; i++) {
    cv::gpu::pyrUp(DownUp, DownUp);
}
Anyone know why this may be?
EDIT: here is what I have now:

DownUp.upload(Input);
GpuMat buffer;
DownUp.copyTo(buffer);

for (int i = 0; i < Pyramid_Size; i++, DownUp.copyTo(buffer)) {
    cv::gpu::pyrDown(buffer, DownUp);
}
for (int i = 0; i < Pyramid_Size; i++, DownUp.copyTo(buffer)) {
    cv::gpu::pyrUp(buffer, DownUp);
    GpuMat a = GpuMat(DownUp.size(), CV_32F);
    a.setTo(20.0f);
    cv::gpu::add(DownUp, a, DownUp);
}
This is now working in my code, but it is SIGNIFICANTLY slower than the CPU version: the GPU version takes around 1.6-2 seconds total to run, while the CPU takes 0.1 seconds.
I also noticed that just sending the data from host to device takes longer than processing the whole thing on the CPU. Is there any way in OpenCV to speed this up? I'm definitely doing something wrong; even large 5 MP images are faster to down/upsample on the CPU.
Neither gpu::pyrDown nor gpu::pyrUp in OpenCV 2.4 can operate in-place. Please use separate GpuMat objects for input and output.
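For illustration, here is a minimal sketch of the separate-buffer pattern, assuming the OpenCV 2.4 gpu module; the downUp wrapper and the std::swap ping-pong are just one way to avoid writing in-place (the names are invented for this example, not taken from the question):

#include <opencv2/gpu/gpu.hpp>
#include <utility> // std::swap

// Alternate between two distinct GpuMats instead of writing in-place.
cv::Mat downUp(const cv::Mat& input, int pyramidSize)
{
    cv::gpu::GpuMat src, dst;
    src.upload(input);
    for (int i = 0; i < pyramidSize; i++) {
        cv::gpu::pyrDown(src, dst); // src and dst are different objects
        std::swap(src, dst);        // the result becomes the next input
    }
    for (int i = 0; i < pyramidSize; i++) {
        cv::gpu::pyrUp(src, dst);
        std::swap(src, dst);
    }
    cv::Mat result;
    src.download(result);
    return result;
}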
I wonder if there is a better way to reduce the quality of a Texture2D. My situation is that I want to send a captured photo to my server on iOS, but the captured photo is too big, so it takes a long time to get a response from the server. For some reason, the Texture2D obtained from the camera does not support mipmaps (otherwise I could use GetPixels(2) to directly read a smaller mip level). To still use mipmaps, I would have to create a temporary Texture2D (tmp), copy the captured photo into it, and then create another Texture2D from a higher mip level of tmp. But that costs too much memory on iOS, which may cause crashes. Any ideas that could solve my problem?
Thanks in advance.
Update
pixels = new Color[(int)(photo.width * photo.height / 16)];
pixels2 = photo.GetPixels();
for (int i = 0; i < pixels.Length; ++i) {
    // takes every 16th pixel of the flattened array, not a 4x4 grid sample
    pixels[i] = pixels2[16 * i];
}
tempPhoto = new Texture2D((int)photo.width / 4, (int)photo.height / 4);
tempPhoto.SetPixels(pixels);
tempPhoto.Apply();
Should this be okay?
Solved this problem by "hard coding" the pixel sampling:
pixels = new Color[(int)(photo.width * photo.height / 16)];
pixels2 = photo.GetPixels();
for (int i = 0; i < (int)photo.height / 4; ++i) {
    for (int j = 0; j < (int)photo.width / 4; ++j) {
        // sample every 4th pixel in both dimensions
        pixels[i * (int)(photo.width / 4) + j] = pixels2[i * 4 * (int)photo.width + j * 4];
    }
}
tempPhoto = new Texture2D((int)photo.width / 4, (int)photo.height / 4);
tempPhoto.SetPixels(pixels);
tempPhoto.Apply();
But I think I still need to be careful with the Color[] arrays. I cannot use "using" for Color[], since it does not implement "IDisposable". But according to here, it seems Unity handles it for us if I set both "pixels" and "pixels2" to null after using them.
Here is some simple CUDA code. I am testing the time of accessing global memory, both reads and writes.
Below is the kernel function (test1()).
__global__ void test1(int *direct_map)
{
    int index = 10;
    int index2;
    for (int j = 0; j < 1024; j++)
    {
        index2 = direct_map[index];   // read: each load depends on the previous one
        direct_map[index] = -1;       // write back to the same location
        index = index2;
    }
}
direct_map is a 683*1024 linear matrix; each pixel holds an offset value pointing to another pixel.
index and index2 are not contiguous addresses.
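For concreteness, here is a hedged host-side sketch of how such an offset map could be built so that every load depends on the previous one; the random-permutation construction is an assumption for illustration, not taken from the question:

#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Build a pointer-chase map: entry i holds the index of the next element to
// visit, so consecutive loads hit non-contiguous addresses.
std::vector<int> makeChaseMap(int n)
{
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::shuffle(order.begin(), order.end(), std::mt19937(42));
    std::vector<int> map(n);
    for (int i = 0; i < n; i++)
        map[order[i]] = order[(i + 1) % n]; // each entry points to the next
    return map; // e.g. n = 683 * 1024 to match the question
}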
This kernel function takes about 600 microseconds. But if I delete the line
direct_map[index] = -1;
it takes just 27 microseconds.
I think the code has already read the value of direct_map[index] from global memory through
index2 = direct_map[index];
so that location should then be resident in the L2 cache, and the subsequent "direct_map[index] = -1;" should be fast.
I also tested random writes to global memory (test2()).
It takes about 120 microseconds.
__global__ void test2(int *direct_map)
{
    int index = 10;
    for (int j = 0; j < 1024; j++)
    {
        direct_map[index] = -1;       // write only, no dependent read
        index = j * 683 + j / 3 - 1;  // scattered, non-contiguous addresses
    }
}
So I don't know why test1() takes more than 600 microseconds.
Thank you.
When you delete the code line:
direct_map[index] = -1;
your kernel isn't doing anything useful. The compiler can recognize this and eliminate most of the code associated with the kernel launch. That modification to the kernel code means that the kernel no longer affects any global state and the code is effectively useless, from the compiler's perspective.
You can verify this by dumping the assembly code that the compiler generates in each case, for example with cuobjdump -sass myexecutable.
Anytime you make a small change to the code and see a large change in timing, you should suspect that the change you made has allowed the compiler to make different optimization decisions.
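If the goal is to time the dependent-read chain on its own, one option is to keep its result live with a single store at the end, so the compiler cannot eliminate the loop. A hedged sketch (test1_readonly and the out parameter are names invented here, not from the question):

__global__ void test1_readonly(int *direct_map, int *out)
{
    int index = 10;
    for (int j = 0; j < 1024; j++)
    {
        index = direct_map[index]; // dependent loads only, no per-iteration store
    }
    *out = index; // a single store keeps the loop from being optimized away
}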
I need to compute the sum of the elements in each column separately.
Right now I'm using the code below, where the matrix cross_corr is the one being summed:
Mat cross_corr_summed;
for (int i = 0; i < cross_corr.cols; i++)
{
    double column_sum = 0;
    for (int k = 0; k < cross_corr.rows; k++)
    {
        column_sum += cross_corr.at<float>(k, i);
    }
    cross_corr_summed.push_back(column_sum);
}
The problem is that my program takes quite a long time to run, and this is one of the parts I suspect is causing it. Can you suggest a faster implementation?
Thanks!
You need a cv::reduce:
cv::reduce(cross_corr, cross_corr_summed, 0, CV_REDUCE_SUM, CV_32S);
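For reference, a minimal self-contained version; the toy matrix and the CV_32F output depth are assumptions for illustration:

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    cv::Mat cross_corr = cv::Mat::ones(4, 3, CV_32F); // toy 4x3 float matrix
    cv::Mat cross_corr_summed;
    // dim = 0 collapses the rows, leaving one sum per column (a 1x3 row)
    cv::reduce(cross_corr, cross_corr_summed, 0, CV_REDUCE_SUM, CV_32F);
    std::cout << cross_corr_summed << std::endl; // prints the row of sums: [4, 4, 4]
    return 0;
}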
If you know that your data is continuous and single-channel, you can access the matrix data directly:
int width = cross_corr.cols;
float* data = (float*)cross_corr.data;
Mat cross_corr_summed;
for (int i = 0; i < cross_corr.cols; i++)
{
    double column_sum = 0;
    for (int k = 0; k < cross_corr.rows; k++)
    {
        column_sum += data[i + k * width];
    }
    cross_corr_summed.push_back(column_sum);
}
which will be faster than your use of .at<float>(). In general, I avoid using .at() whenever possible because it is slower than direct access.
Also, although cv::reduce() (suggested by Andrey) is much more readable, I have found it to be slower than even your implementation in some cases.
Another option is summing each column with cv::sum():

Mat originalMatrix;
Mat columnSum;
for (int i = 0; i < originalMatrix.cols; i++)
    columnSum.push_back(cv::sum(originalMatrix.col(i))[0]);
I have read various posts here at StackOverflow regarding the execution of FFT on accelerometer data, but none of them helped me understand my problem.
I am executing this FFT implementation on my accelerometer data array in the following way:
int length = data.length;
double[] re = new double[256];
double[] im = new double[256];
for (int i = 0; i < length; i++) {
    re[i] = data[i];  // copy the samples into the real part; the imaginary part stays zero
}

FFT fft = new FFT(256);
fft.fft(re, im);

float[] outputData = new float[256];
for (int i = 0; i < 128; i++) {
    // magnitude of each of the first N/2 frequency bins
    outputData[i] = (float) Math.sqrt(re[i] * re[i] + im[i] * im[i]);
}
I plotted the contents of outputData (left), and also used R to perform the FFT on my data (right).
What am I doing wrong here? I am using the same code for executing the FFT that I see in other places.
EDIT: Following the advice of @PaulR to apply a windowing function, and the link provided by @BjornRoche (http://baumdevblog.blogspot.com.br/2010/11/butterworth-lowpass-filter-coefficients.html), I was able to solve my problem. The solution is pretty much what is described in that link. This is my graph now: http://imgur.com/wGs43
The low frequency artefacts are probably due to a lack of windowing. Try applying a window function.
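As an illustration, here is a minimal Hann-window sketch (in C++, although the question's code is Java; the idea carries over directly). Multiplying the time-domain samples by the window before the FFT reduces the spectral leakage that shows up as low-frequency artefacts:

#include <cmath>
#include <vector>

// Apply a Hann window in place to the time-domain samples.
void applyHann(std::vector<double>& samples)
{
    const double pi = 3.14159265358979323846;
    const std::size_t n = samples.size();
    for (std::size_t i = 0; i < n; i++)
        samples[i] *= 0.5 * (1.0 - std::cos(2.0 * pi * i / (n - 1)));
}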
The overall shift is probably due to different scaling factors in the two FFT implementations; my guess is that you are seeing a shift of 24 dB, which corresponds to a difference in scaling by a factor of 256 (10·log10(256) ≈ 24 dB).
Because all your data on the left are above 0, from a frequency-analysis point of view the signal has a large DC component. So after your FFT, the DC bin dominates the spectrum. For your case, you only need to cut off the DC component and keep the part of the signal above 0 Hz (the AC part); that makes sense.
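One simple way to do that is to subtract the mean of the samples before taking the FFT, which zeroes out bin 0. A small sketch (again in C++ for illustration, though the question's code is Java):

#include <numeric>
#include <vector>

// Subtract the mean so that bin 0 of the FFT (the DC component) becomes ~0.
void removeDC(std::vector<double>& samples)
{
    double mean = std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
    for (std::size_t i = 0; i < samples.size(); i++)
        samples[i] -= mean;
}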
I am a newbie to OpenCV. I have installed the OpenCV library on an Ubuntu system, compiled it, and am looking into some image/video processing apps in OpenCV to understand more.
I would like to know whether the OpenCV library has any algorithm/class for removing flicker from captured videos. If so, which documentation or code should I look into?
If OpenCV does not have one, are there standard implementations in some other video processing library/SDK/MATLAB that provide algorithms for flicker removal from video sequences?
Any pointers would be useful
Thank you.
-AD.
I don't know of any standard way to deflicker a video.
But VirtualDub is video processing software that has a filter for deflickering video. You can find its filter source and documentation (probably including an algorithm description) here.
I wrote my own deflicker function in C++. Here it is. You can cut and paste this code as is; no headers are needed other than the usual OpenCV ones.
Mat deflicker(Mat frame, int strengthcutoff = 20);
Mat prevdeflicker;

// deflicker - compares each pixel of the frame to the previously stored frame
// and throttles small changes in pixels (flicker)
Mat deflicker(Mat Mat1, int strengthcutoff) {
    // check if we stored a previous frame; if not, there's nothing we can do - clone and exit
    if (prevdeflicker.rows) {
        for (int i = 0; i < Mat1.rows; ++i) {
            uchar* p = Mat1.ptr<uchar>(i);
            uchar* prevp = prevdeflicker.ptr<uchar>(i);
            for (int j = 0; j < Mat1.cols; ++j) {
                int strength = abs((int)p[j] - (int)prevp[j]);
                // the strength of the stimulus must be greater than a certain point,
                // else we do not want to allow the change.
                // A cutoff of 25 works well for medium+ light; anything higher creates too
                // much blur around moving objects. In low light, however, this makes things
                // worse, since low light seems to increase the contrast of flicker - some
                // flickers go from 0 to 255 and back. :(
                // I need to write a way to track large group movements vs small pixels,
                // and only filter out the small pixel stuff. Maybe blur first?
                if (strength < strengthcutoff) {
                    // use the previous frame's value, changed by +/-1 - slow enough
                    // not to be noticeable as flicker
                    if (p[j] > prevp[j]) {
                        p[j] = prevp[j] + 1;
                    } else {
                        p[j] = prevp[j] - 1;
                    }
                }
            }
        }
    }
    prevdeflicker = Mat1.clone(); // clone the current frame as the old one
    return Mat1;
}
Call it as: src_grey = deflicker(src_grey). It needs to be called in a loop, on a greyscale image, like so:
for (;;) {
    cap >> frame;                            // get a new frame from the camera
    cvtColor(frame, src_grey, CV_RGB2GRAY);  // convert to greyscale - simplifies everything
    src_grey = deflicker(src_grey);          // this is the function call
    imshow("grey video", src_grey);
    if (waitKey(30) >= 0) break;
}