Need help in Parallelizing if and else condition in CUDA C program

Need help in Parallelizing if and else condition in CUDA C program - image-processing

I have written a filter for image blurring in C and it's working fine, I am trying to run in on GPU using CUDA C for faster processing. The program has a few if and else conditions as can be seen below for the C code version,
The input to the function being input image, output image, and size of columns.
void convolve_young1D(double * in, double * out, int datasize) {
int i, j;
/* Compute first 3 output elements */
out[0] = B*in[0];
out[1] = B*in[1] + bf[2]*out[0];
out[2] = B*in[2] + (bf[1]*out[0]+bf[2]*out[1]);
/* Recursive computation of output in forward direction using filter parameters bf and B */
for (i=3; i<datasize; i++) {
out[i] = B*in[i];
for (j=0; j<3; j++) {
out[i] += bf[j]*out[i-(3-j)];
}
}
}
//Calling function below
void convolve_young2D(int rows, int columns, int sigma, double ** ip_padded) {
/** \brief Filter radius */
w = 3*sigma;
/** \brief Filter parameter q */
double q;
if (sigma < 2.5)
q = 3.97156 - 4.14554*sqrt(1-0.26891*sigma);
else
q = 0.98711*sigma - 0.9633;
/** \brief Filter parameters b0, b1, b2, b3 */
double b0 = 1.57825 + 2.44413*q + 1.4281*q*q + 0.422205*q*q*q;
double b1 = 2.44413*q + 2.85619*q*q + 1.26661*q*q*q;
double b2 = -(1.4281*q*q + 1.26661*q*q*q);
double b3 = 0.422205*q*q*q;
/** \brief Filter parameters bf, bb, B */
bf[0] = b3/b0; bf[1] = b2/b0; bf[2] = b1/b0;
bb[0] = b1/b0; bb[1] = b2/b0; bb[2] = b3/b0;
B = 1 - (b1+b2+b3)/b0;
int i,j;
/* Convolve each row with 1D Gaussian filter */
double *out_t = calloc(columns+(2*w),sizeof(double ));
for (i=0; i<rows+2*w; i++) {
convolve_young1D(ip_padded[i], out_t, columns+2*w);
}
free(out_t);
Tried the same approach with blocks and threads in CUDA C but wasn't successful I have been getting zeros as output and even the input values seem to change to Zeros don't know where I am going wrong please do help. I am pretty new to CUDA C programming. Here is my attempted version of the CUDA Kernel.
__global__ void convolve_young2D( float *in, float *out,int rows,int columns, int j,float B,float bf[3],int w) {
int k;
int x = blockIdx.x * blockDim.x + threadIdx.x;
if((x>0) && (x<(rows+2*w)))
{
//printf("%d \t",x);
if(j ==0)
{
// Compute first output elements
out[x*columns] = B*in[x*columns];
}
else if(j==1)
{
out[x*columns +1 ] = B*in[x*columns +1] + bf[2]*out[x*columns];
}
else if (j== 2)
{
out[2] = B*in[x*columns +2] + (bf[1]*out[x*columns]+bf[2]*out[x*columns+1]);
}
else{
// Recursive computation of output in forward direction using filter parameters bf and B
out[x*columns+j] = B*in[x*columns+j];
for (k=0; k<3; k++) {
out[x*columns + j] += bf[k]*out[(x*columns+j)-(3-k)];
}
}
}
}
//Calling function below
void convolve_young2D(int rows, int columns, int sigma, const float * const ip_padded, float * const op_padded) {
float bf[3], bb[3];
float B;
int w;
/** \brief Filter radius */
w = 3*sigma;
/** \brief Filter parameter q */
float q;
if (sigma < 2.5)
q = 3.97156 - 4.14554*sqrt(1-0.26891*sigma);
else
q = 0.98711*sigma - 0.9633;
/** \brief Filter parameters b0, b1, b2, b3 */
float b0 = 1.57825 + 2.44413*q + 1.4281*q*q + 0.422205*q*q*q;
float b1 = 2.44413*q + 2.85619*q*q + 1.26661*q*q*q;
float b2 = -(1.4281*q*q + 1.26661*q*q*q);
float b3 = 0.422205*q*q*q;
/** \brief Filter parameters bf, bb, B */
bf[0] = b3/b0; bf[1] = b2/b0; bf[2] = b1/b0;
bb[0] = b1/b0; bb[1] = b2/b0; bb[2] = b3/b0;
B = 1 - (b1+b2+b3)/b0;
int p;
const int inputBytes = (rows+2*w) * (columns+2*w) * sizeof(float);
float *d_input, *d_output; // arrays in the GPU´s global memory
cudaMalloc(&d_input, inputBytes);
cudaMemcpy(d_input, ip_padded, inputBytes, cudaMemcpyHostToDevice);
cudaMalloc(&d_output,inputBytes);
for (p = 0; p<columns+2*w; p++){
convolve_young<<<4,500>>>(d_input,d_output,rows,columns,p,B,bf,w);
}
cudaMemcpy(op_padded, d_input, inputBytes, cudaMemcpyDeviceToHost);
cudaFree(d_input);

The first problem is that you call convolve_young<<<4,500>>>(d_input,d_output,rows,columns,p,B,bf,w); but you defined a kernel named convolve_young2D.
Another possible problem is that to do the convolution you do:
for (p = 0; p<columns+2*w; p++){
convolve_young<<<4,500>>>(d_input,d_output,rows,columns,p,B,bf,w);
}
Here you're looping over the columns instead of the rows compared to the CPU algorithm:
for (i=0; i<rows+2*w; i++) {
convolve_young1D(ip_padded[i], out_t, columns+2*w);
}
First you should try to do a direct port of your CPU algorithm, computing one line at the time, and then modify it to transfer the whole image.

Related

Implementation of LASSO in C

I am trying to understand the LASSO algorithm for linear regression. I have implemented the algorithm using naive coordinate descent method for optimization. However the coefficients that I obtained from my code, wasn't matching with those obtained from the 'glmnet'package for LASSO in R. I wanted to understand how I could make the algorithm more accurate, so that the coefficients match with those obtained from R. I think they use coordinate descent as well.
Note: I have generated some toy data with 11 observations, and 6
features(x,x^2 ,x^3,...,x^6). The last column contains the y values
generated from a dummy function (e^(-x^2)). I wanted to use LASSO to
estimate this function. Also, I have randomly picked the initial
weight vector, multiple times to crosscheck my results.
Here is my code:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<math.h>
#include<time.h>
int num_dim = 6;
int num_obs = 11;
/*Computes the normalization factor*/
float norm_feature(int j,double arr[][7],int n){
float sum = 0.0;
int i;
for(i=0;i<n;i++){
sum = sum + pow(arr[i][j],2);
}
return sum;
}
/*Computes the partial sum*/
float approx(int dim,int d_ignore,float weights[],double arr[][7],int
i){
int flag = 1;
if(d_ignore == -1)
flag = 0;
int j;
float sum = 0.0;
for(j=0;j<dim;j++){
if(j != d_ignore)
sum = sum + weights[j]*arr[i][j];
else
continue;
}
return sum;
}
/* Computes rho-j */
float rho_j(double arr[][7],int n,int j,float weights[7]){
float sum = 0.0;
int i;
float partial_sum ;
for(i=0;i<n;i++){
partial_sum = approx(num_dim,j,weights,arr,i);
sum = sum + arr[i][j]*(arr[i][num_dim]-partial_sum);
}
return sum;
}
float intercept(float arr1[7],double arr[][7],int dim) {
int i;
float sum =0.0;
for (i = 0; i < num_obs; i++) {
sum = sum + pow((arr[i][num_dim]) - approx(num_dim, -1, arr1, arr,
i), 1);
}
return sum;
}
int main(){
double data[num_obs][7];
int i=0,j=0;
float a = 1.0;
float lambda = 0.1; //Setting lambda
float weights[7]; //weights[6] contains the intercept
srand((unsigned int) time(NULL));
/*Generating the data matrix */
for(i=0;i<11;i++)
data[i][0] = ((float)rand()/(float)(RAND_MAX)) * a;
for(i=0;i<11;i++)
for(j=1;j<6;j++)
data[i][j] = pow(data[i][0],j+1);
for(i=0;i<11;i++)
data[i][6] = exp(-pow(data[i][0],2)); // the last column in the
datamatrix contains the y values generated by the dummy function
/*Printing the data matrix */
printf("Data Matrix:\n");
for(i=0;i<11;i++){
for(j=0;j<7;j++){
printf("%lf ",data[i][j]);}
printf("\n");}
printf("\n");
int seed =0;
while(seed<20) {
//Initializing the weight vector
for (i = 0; i < 7; i++)
weights[i] = ((float) rand() / (float) (RAND_MAX)) * a;
int iter = 500;
int t = 0;
int r, l;
double rho[num_dim];
for (i = 0; i < 6; i++) {
rho[i] = rho_j(data, num_obs, r, weights);
}
// Intercept initialization
weights[num_dim] = intercept(weights,data,num_dim);
printf("Weights initialization: ");
for (i = 0; i < (num_dim+1); i++)
printf("%f ", weights[i]);
printf("\n");
while (t < iter) {
for (r = 0; r < num_dim; r++) {
rho[r] = rho_j(data, num_obs, r, weights);
//printf("rho %d:%f ",r,rho[r]);
if (rho[r] < -lambda / 2)
weights[r] = (rho[r] + lambda / 2) / norm_feature(r,
data, num_obs);
else if (rho[r] > lambda / 2)
weights[r] = (rho[r] - lambda / 2) / norm_feature(r,
data, num_obs);
else
weights[r] = 0;
weights[num_dim] = intercept(weights, data, num_dim);
}
/* printf("Iter(%d): ", t);
for (l = 0; l < 7; l++)
printf("%f ", weights[l]);
printf("\n");*/
t++;
}
//printf("\n");
printf("Final Weights: ");
for (i = 0; i < 7; i++)
printf("%f ", weights[i]);
printf("\n");
printf("\n");
seed++;
}
return 0;
}
PseudoCode:

Multi otsu(multi-thresholding) with openCV

I am trying to carry out multi-thresholding with otsu. The method I am using currently is actually via maximising the between class variance, I have managed to get the same threshold value given as that by the OpenCV library. However, that is just via running otsu method once.
Documentation on how to do multi-level thresholding or rather recursive thresholding is rather limited. Where do I do after obtaining the original otsu's value? Would appreciate some hints, I been playing around with the code, adding one external for loop, but the next value calculated is always 254 for any given image:(
My code if need be:
//compute histogram first
cv::Mat imageh; //image edited to grayscale for histogram purpose
//imageh=image; //to delete and uncomment below;
cv::cvtColor(image, imageh, CV_BGR2GRAY);
int histSize[1] = {256}; // number of bins
float hranges[2] = {0.0, 256.0}; // min andax pixel value
const float* ranges[1] = {hranges};
int channels[1] = {0}; // only 1 channel used
cv::MatND hist;
// Compute histogram
calcHist(&imageh, 1, channels, cv::Mat(), hist, 1, histSize, ranges);
IplImage* im = new IplImage(imageh);//assign the image to an IplImage pointer
IplImage* finalIm = cvCreateImage(cvSize(im->width, im->height), IPL_DEPTH_8U, 1);
double otsuThreshold= cvThreshold(im, finalIm, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU );
cout<<"opencv otsu gives "<<otsuThreshold<<endl;
int totalNumberOfPixels= imageh.total();
cout<<"total number of Pixels is " <<totalNumberOfPixels<< endl;
float sum = 0;
for (int t=0 ; t<256 ; t++)
{
sum += t * hist.at<float>(t);
}
cout<<"sum is "<<sum<<endl;
float sumB = 0; //sum of background
int wB = 0; // weight of background
int wF = 0; //weight of foreground
float varMax = 0;
int threshold = 0;
//run an iteration to find the maximum value of the between class variance(as between class variance shld be maximise)
for (int t=0 ; t<256 ; t++)
{
wB += hist.at<float>(t); // Weight Background
if (wB == 0) continue;
wF = totalNumberOfPixels - wB; // Weight Foreground
if (wF == 0) break;
sumB += (float) (t * hist.at<float>(t));
float mB = sumB / wB; // Mean Background
float mF = (sum - sumB) / wF; // Mean Foreground
// Calculate Between Class Variance
float varBetween = (float)wB * (float)wF * (mB - mF) * (mB - mF);
// Check if new maximum found
if (varBetween > varMax) {
varMax = varBetween;
threshold = t;
}
}
cout<<"threshold value is: "<<threshold;

To extend Otsu's thresholding method to multi-level thresholding the between class variance equation becomes:
Please check out Deng-Yuan Huang, Ta-Wei Lin, Wu-Chih Hu, Automatic
Multilevel Thresholding Based on Two-Stage Otsu's Method with Cluster
Determination by Valley Estimation, Int. Journal of Innovative
Computing, 2011, 7:5631-5644 for more information.
http://www.ijicic.org/ijicic-10-05033.pdf
Here is my C# implementation of Otsu Multi for 2 thresholds:
/* Otsu (1979) - multi */
Tuple < int, int > otsuMulti(object sender, EventArgs e) {
//image histogram
int[] histogram = new int[256];
//total number of pixels
int N = 0;
//accumulate image histogram and total number of pixels
foreach(int intensity in image.Data) {
if (intensity != 0) {
histogram[intensity] += 1;
N++;
}
}
double W0K, W1K, W2K, M0, M1, M2, currVarB, optimalThresh1, optimalThresh2, maxBetweenVar, M0K, M1K, M2K, MT;
optimalThresh1 = 0;
optimalThresh2 = 0;
W0K = 0;
W1K = 0;
M0K = 0;
M1K = 0;
MT = 0;
maxBetweenVar = 0;
for (int k = 0; k <= 255; k++) {
MT += k * (histogram[k] / (double) N);
}
for (int t1 = 0; t1 <= 255; t1++) {
W0K += histogram[t1] / (double) N; //Pi
M0K += t1 * (histogram[t1] / (double) N); //i * Pi
M0 = M0K / W0K; //(i * Pi)/Pi
W1K = 0;
M1K = 0;
for (int t2 = t1 + 1; t2 <= 255; t2++) {
W1K += histogram[t2] / (double) N; //Pi
M1K += t2 * (histogram[t2] / (double) N); //i * Pi
M1 = M1K / W1K; //(i * Pi)/Pi
W2K = 1 - (W0K + W1K);
M2K = MT - (M0K + M1K);
if (W2K <= 0) break;
M2 = M2K / W2K;
currVarB = W0K * (M0 - MT) * (M0 - MT) + W1K * (M1 - MT) * (M1 - MT) + W2K * (M2 - MT) * (M2 - MT);
if (maxBetweenVar < currVarB) {
maxBetweenVar = currVarB;
optimalThresh1 = t1;
optimalThresh2 = t2;
}
}
}
return new Tuple(optimalThresh1, optimalThresh2);
}
And this is the result I got by thresholding an image scan of soil with the above code:
(T1 = 110, T2 = 147).
Otsu's original paper: "Nobuyuki Otsu, A Threshold Selection Method
from Gray-Level Histogram, IEEE Transactions on Systems, Man, and
Cybernetics, 1979, 9:62-66" also briefly mentions the extension to
Multithresholding.
https://engineering.purdue.edu/kak/computervision/ECE661.08/OTSU_paper.pdf
Hope this helps.

Here is a simple general approach for 'n' thresholds in python (>3.0) :
# developed by- SUJOY KUMAR GOSWAMI
# source paper- https://people.ece.cornell.edu/acharya/papers/mlt_thr_img.pdf
import cv2
import numpy as np
import math
img = cv2.imread('path-to-image')
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
a = 0
b = 255
n = 6 # number of thresholds (better choose even value)
k = 0.7 # free variable to take any positive value
T = [] # list which will contain 'n' thresholds
def sujoy(img, a, b):
if a>b:
s=-1
m=-1
return m,s
img = np.array(img)
t1 = (img>=a)
t2 = (img<=b)
X = np.multiply(t1,t2)
Y = np.multiply(img,X)
s = np.sum(X)
m = np.sum(Y)/s
return m,s
for i in range(int(n/2-1)):
img = np.array(img)
t1 = (img>=a)
t2 = (img<=b)
X = np.multiply(t1,t2)
Y = np.multiply(img,X)
mu = np.sum(Y)/np.sum(X)
Z = Y - mu
Z = np.multiply(Z,X)
W = np.multiply(Z,Z)
sigma = math.sqrt(np.sum(W)/np.sum(X))
T1 = mu - k*sigma
T2 = mu + k*sigma
x, y = sujoy(img, a, T1)
w, z = sujoy(img, T2, b)
T.append(x)
T.append(w)
a = T1+1
b = T2-1
k = k*(i+1)
T1 = mu
T2 = mu+1
x, y = sujoy(img, a, T1)
w, z = sujoy(img, T2, b)
T.append(x)
T.append(w)
T.sort()
print(T)
For full paper and more informations visit this link.

I've written an example on how otsu thresholding work in python before. You can see the source code here: https://github.com/subokita/Sandbox/blob/master/otsu.py
In the example there's 2 variants, otsu2() which is the optimised version, as seen on Wikipedia page, and otsu() which is more naive implementation based on the algorithm description itself.
If you are okay in reading python codes (in this case, they are pretty simple, almost pseudo code like), you might want to look at otsu() in the example and modify it. Porting it to C++ code is not hard either.

#Antoni4 gives the best answer in my opinion and it's very straight forward to increase the number of levels.
This is for three-level thresholding:
#include "Shadow01-1.cuh"
void multiThresh(double &optimalThresh1, double &optimalThresh2, double &optimalThresh3, cv::Mat &imgHist, cv::Mat &src)
{
double W0K, W1K, W2K, W3K, M0, M1, M2, M3, currVarB, maxBetweenVar, M0K, M1K, M2K, M3K, MT;
unsigned char *histogram = (unsigned char*)(imgHist.data);
int N = src.rows*src.cols;
W0K = 0;
W1K = 0;
M0K = 0;
M1K = 0;
MT = 0;
maxBetweenVar = 0;
for (int k = 0; k <= 255; k++) {
MT += k * (histogram[k] / (double) N);
}
for (int t1 = 0; t1 <= 255; t1++)
{
W0K += histogram[t1] / (double) N; //Pi
M0K += t1 * (histogram[t1] / (double) N); //i * Pi
M0 = M0K / W0K; //(i * Pi)/Pi
W1K = 0;
M1K = 0;
for (int t2 = t1 + 1; t2 <= 255; t2++)
{
W1K += histogram[t2] / (double) N; //Pi
M1K += t2 * (histogram[t2] / (double) N); //i * Pi
M1 = M1K / W1K; //(i * Pi)/Pi
W2K = 1 - (W0K + W1K);
M2K = MT - (M0K + M1K);
if (W2K <= 0) break;
M2 = M2K / W2K;
W3K = 0;
M3K = 0;
for (int t3 = t2 + 1; t3 <= 255; t3++)
{
W2K += histogram[t3] / (double) N; //Pi
M2K += t3 * (histogram[t3] / (double) N); // i*Pi
M2 = M2K / W2K; //(i*Pi)/Pi
W3K = 1 - (W1K + W2K);
M3K = MT - (M1K + M2K);
M3 = M3K / W3K;
currVarB = W0K * (M0 - MT) * (M0 - MT) + W1K * (M1 - MT) * (M1 - MT) + W2K * (M2 - MT) * (M2 - MT) + W3K * (M3 - MT) * (M3 - MT);
if (maxBetweenVar < currVarB)
{
maxBetweenVar = currVarB;
optimalThresh1 = t1;
optimalThresh2 = t2;
optimalThresh3 = t3;
}
}
}
}
}

#Guilherme Silva
Your code has a BUG
You Must Replace:
W3K = 0;
M3K = 0;
with
W2K = 0;
M2K = 0;
and
W3K = 1 - (W1K + W2K);
M3K = MT - (M1K + M2K);
with
W3K = 1 - (W0K + W1K + W2K);
M3K = MT - (M0K + M1K + M2K);
;-)
Regards
EDIT(1): [Toby Speight]
I discovered this bug by applying the effect to the same picture at different resoultions(Sizes) and seeing that the output results were to much different from each others (Even changing resolution a little bit)
W3K and M3K must be the totals minus the Previous WKs and MKs.
(I thought about this for Code-similarity with the one with one level less)
At the moment due to my lacks of English I cannot explain Better How and Why
To be honest I'm still not 100% sure that this way is correct, even thought from my outputs I could tell that it gives better results. (Even with 1 Level more (5 shades of gray))
You could try yourself ;-)
Sorry
My Outputs:
3 Thresholds
4 Thresholds

I found a useful piece of code in this thread. I was looking for a multi-level Otsu implementation for double/float images. So, I tried to generalize example for N-levels with double/float matrix as input. In my code below I am using armadillo library as dependency. But this code can be easily adapted for standard C++ arrays, just replace vec, uvec objects with single dimensional double and integer arrays, mat and umat with two-dimensional. Two other functions from armadillo used here are: vectorise and hist.
// Input parameters:
// map - input image (double matrix)
// mask - region of interest to be thresholded
// nBins - number of bins
// nLevels - number of Otsu thresholds
#include <armadillo>
#include <algorithm>
#include <vector>
mat OtsuFilterMulti(mat map, int nBins, int nLevels) {
mat mapr; // output thresholded image
mapr = zeros<mat>(map.n_rows, map.n_cols);
unsigned int numElem = 0;
vec threshold = zeros<vec>(nLevels);
vec q = zeros<vec>(nLevels + 1);
vec mu = zeros<vec>(nLevels + 1);
vec muk = zeros<vec>(nLevels + 1);
uvec binv = zeros<uvec>(nLevels);
if (nLevels <= 1) return mapr;
numElem = map.n_rows*map.n_cols;
uvec histogram = hist(vectorise(map), nBins);
double maxval = map.max();
double minval = map.min();
double odelta = (maxval - abs(minval)) / nBins; // distance between histogram bins
vec oval = zeros<vec>(nBins);
double mt = 0, variance = 0.0, bestVariance = 0.0;
for (int ii = 0; ii < nBins; ii++) {
oval(ii) = (double)odelta*ii + (double)odelta*0.5; // centers of histogram bins
mt += (double)ii*((double)histogram(ii)) / (double)numElem;
}
for (int ii = 0; ii < nLevels; ii++) {
binv(ii) = ii;
}
double sq, smuk;
int nComb;
nComb = nCombinations(nBins,nLevels);
std::vector<bool> v(nBins);
std::fill(v.begin(), v.begin() + nLevels, true);
umat ibin = zeros<umat>(nComb, nLevels); // indices from combinations will be stored here
int cc = 0;
int ci = 0;
do {
for (int i = 0; i < nBins; ++i) {
if(ci==nLevels) ci=0;
if (v[i]) {
ibin(cc,ci) = i;
ci++;
}
}
cc++;
} while (std::prev_permutation(v.begin(), v.end()));
uvec lastIndex = zeros<uvec>(nLevels);
// Perform operations on pre-calculated indices
for (int ii = 0; ii < nComb; ii++) {
for (int jj = 0; jj < nLevels; jj++) {
smuk = 0;
sq = 0;
if (lastIndex(jj) != ibin(ii, jj) || ii == 0) {
q(jj) += double(histogram(ibin(ii, jj))) / (double)numElem;
muk(jj) += ibin(ii, jj)*(double(histogram(ibin(ii, jj)))) / (double)numElem;
mu(jj) = muk(jj) / q(jj);
q(jj + 1) = 0.0;
muk(jj + 1) = 0.0;
if (jj>0) {
for (int kk = 0; kk <= jj; kk++) {
sq += q(kk);
smuk += muk(kk);
}
q(jj + 1) = 1 - sq;
muk(jj + 1) = mt - smuk;
mu(jj + 1) = muk(jj + 1) / q(jj + 1);
}
if (jj>0 && jj<(nLevels - 1)) {
q(jj + 1) = 0.0;
muk(jj + 1) = 0.0;
}
lastIndex(jj) = ibin(ii, jj);
}
}
variance = 0.0;
for (int jj = 0; jj <= nLevels; jj++) {
variance += q(jj)*(mu(jj) - mt)*(mu(jj) - mt);
}
if (variance > bestVariance) {
bestVariance = variance;
for (int jj = 0; jj<nLevels; jj++) {
threshold(jj) = oval(ibin(ii, jj));
}
}
}
cout << "Optimized thresholds: ";
for (int jj = 0; jj<nLevels; jj++) {
cout << threshold(jj) << " ";
}
cout << endl;
for (unsigned int jj = 0; jj<map.n_rows; jj++) {
for (unsigned int kk = 0; kk<map.n_cols; kk++) {
for (int ll = 0; ll<nLevels; ll++) {
if (map(jj, kk) >= threshold(ll)) {
mapr(jj, kk) = ll+1;
}
}
}
}
return mapr;
}
int nCombinations(int n, int r) {
if (r>n) return 0;
if (r*2 > n) r = n-r;
if (r == 0) return 1;
int ret = n;
for( int i = 2; i <= r; ++i ) {
ret *= (n-i+1);
ret /= i;
}
return ret;
}

WGS84 Geoid Height Altitude Offset for external GPS data on IOS

For an application I'm writing we are interfacing IOS devices with an external sensor which outputs GPS data over a local wifi network. This data comes across in a "raw" format with respect to altitude. In general all GPS altitude needs to have a correction factor applied related to the WGS84 geoid height based on the current location.
For example, in the following Geo Control Point (http://www.ngs.noaa.gov/cgi-bin/ds_mark.prl?PidBox=HV9830) which resides at Lat 38 56 36.77159 and a Lon 077 01 08.34929
HV9830* NAD 83(2011) POSITION- 38 56 36.77159(N) 077 01 08.34929(W) ADJUSTED
HV9830* NAD 83(2011) ELLIP HT- 42.624 (meters) (06/27/12) ADJUSTED
HV9830* NAD 83(2011) EPOCH - 2010.00
HV9830* NAVD 88 ORTHO HEIGHT - 74.7 (meters) 245. (feet) VERTCON
HV9830 ______________________________________________________________________
HV9830 GEOID HEIGHT - -32.02 (meters) GEOID12A
HV9830 NAD 83(2011) X - 1,115,795.966 (meters) COMP
HV9830 NAD 83(2011) Y - -4,840,360.447 (meters) COMP
HV9830 NAD 83(2011) Z - 3,987,471.457 (meters) COMP
HV9830 LAPLACE CORR - -2.38 (seconds) DEFLEC12A
You can see that the Geoid Height is -32 meters. So given a RAW GPS reading near this point one would have to apply a correction of -32 meters in order to calculate the correct altitude. (Note:corrections are negative so you would actually be subtracting a negative and thus shifting the reading up 32 meters).
As opposed to Android it is our understanding that with regards to coreLocation this GeoidHeight information is automagically calculated internally by IOS. Where we are running into difficulty is that we are using a local wifi network with a sensor that calculates uncorrected GPS and collecting both the external sensor data as well as coreLocation readings for GPS. I was wondering if anybody was aware of a library (C/Objective-C) which has the Geoid information and can help me do these calculations on the fly when I'm reading the raw GPS signal from our sensor package.
Thank you for your help.
Side note: Please don't suggest I look at the following post: Get altitude by longitude and latitude in Android This si a good solution however we do not have a live internet connection so we cannot make a live query to Goole or USGS.

I've gone ahead and solved my problems here. What I did was create an ObjectiveC implementation of a c implementation of fortran code to do what I needed. The original c can be found here: http://sourceforge.net/projects/egm96-f477-c/
You would need to download the project from source forge in order to access the input files required for this code: CORCOEF and EGM96
My objective-c implementation is as follows:
GeoidCalculator.h
#import <Foundation/Foundation.h>
#interface GeoidCalculator : NSObject
+ (GeoidCalculator *)instance;
-(double) getHeightFromLat:(double)lat andLon:(double)lon;
-(double) getCurrentHeightOffset;
-(void) updatePositionWithLatitude:(double)lat andLongitude:(double)lon;
#end
GeoidCalculator.m
#import "GeoidCalculator.h"
#import <stdio.h>
#import <math.h>
#define l_value (65341)
#define _361 (361)
#implementation GeoidCalculator
static int nmax;
static double currentHeight;
static double cc[l_value+ 1], cs[l_value+ 1], hc[l_value+ 1], hs[l_value+ 1],
p[l_value+ 1], sinml[_361+ 1], cosml[_361+ 1], rleg[_361+ 1];
+ (GeoidCalculator *)instance {
static GeoidCalculator *_instance = nil;
#synchronized (self) {
if (_instance == nil) {
_instance = [[self alloc] init];
init_arrays();
currentHeight = -9999;
}
}
return _instance;
}
- (double)getHeightFromLat:(double)lat andLon:(double)lon {
[self updatePositionWithLatitude:lat andLongitude:lon];
return [self getCurrentHeightOffset];
}
- (double)getCurrentHeightOffset {
return currentHeight;
}
- (void)updatePositionWithLatitude:(double)lat andLongitude:(double)lon {
const double rad = 180 / M_PI;
double flat, flon, u;
flat = lat; flon = lon;
/*compute the geocentric latitude,geocentric radius,normal gravity*/
u = undulation(flat / rad, flon / rad, nmax, nmax + 1);
/*u is the geoid undulation from the egm96 potential coefficient model
including the height anomaly to geoid undulation correction term
and a correction term to have the undulations refer to the
wgs84 ellipsoid. the geoid undulation unit is meters.*/
currentHeight = u;
}
double hundu(unsigned nmax, double p[l_value+ 1],
double hc[l_value+ 1], double hs[l_value+ 1],
double sinml[_361+ 1], double cosml[_361+ 1], double gr, double re,
double cc[l_value+ 1], double cs[l_value+ 1]) {/*constants for wgs84(g873);gm in units of m**3/s**2*/
const double gm = .3986004418e15, ae = 6378137.;
double arn, ar, ac, a, b, sum, sumc, sum2, tempc, temp;
int k, n, m;
ar = ae / re;
arn = ar;
ac = a = b = 0;
k = 3;
for (n = 2; n <= nmax; n++) {
arn *= ar;
k++;
sum = p[k] * hc[k];
sumc = p[k] * cc[k];
sum2 = 0;
for (m = 1; m <= n; m++) {
k++;
tempc = cc[k] * cosml[m] + cs[k] * sinml[m];
temp = hc[k] * cosml[m] + hs[k] * sinml[m];
sumc += p[k] * tempc;
sum += p[k] * temp;
}
ac += sumc;
a += sum * arn;
}
ac += cc[1] + p[2] * cc[2] + p[3] * (cc[3] * cosml[1] + cs[3] * sinml[1]);
/*add haco=ac/100 to convert height anomaly on the ellipsoid to the undulation
add -0.53m to make undulation refer to the wgs84 ellipsoid.*/
return a * gm / (gr * re) + ac / 100 - .53;
}
void dscml(double rlon, unsigned nmax, double sinml[_361+ 1], double cosml[_361+ 1]) {
double a, b;
int m;
a = sin(rlon);
b = cos(rlon);
sinml[1] = a;
cosml[1] = b;
sinml[2] = 2 * b * a;
cosml[2] = 2 * b * b - 1;
for (m = 3; m <= nmax; m++) {
sinml[m] = 2 * b * sinml[m - 1] - sinml[m - 2];
cosml[m] = 2 * b * cosml[m - 1] - cosml[m - 2];
}
}
void dhcsin(unsigned nmax, double hc[l_value+ 1], double hs[l_value+ 1]) {
// potential coefficient file
//f_12 = fopen("EGM96", "rb");
NSString* path2 = [[NSBundle mainBundle] pathForResource:#"EGM96" ofType:#""];
FILE* f_12 = fopen(path2.UTF8String, "rb");
if (f_12 == NULL) {
NSLog([path2 stringByAppendingString:#" not found"]);
}
int n, m;
double j2, j4, j6, j8, j10, c, s, ec, es;
/*the even degree zonal coefficients given below were computed for the
wgs84(g873) system of constants and are identical to those values
used in the NIMA gridding procedure. computed using subroutine
grs written by N.K. PAVLIS*/
j2 = 0.108262982131e-2;
j4 = -.237091120053e-05;
j6 = 0.608346498882e-8;
j8 = -0.142681087920e-10;
j10 = 0.121439275882e-13;
m = ((nmax + 1) * (nmax + 2)) / 2;
for (n = 1; n <= m; n++)hc[n] = hs[n] = 0;
while (6 == fscanf(f_12, "%i %i %lf %lf %lf %lf", &n, &m, &c, &s, &ec, &es)) {
if (n > nmax)continue;
n = (n * (n + 1)) / 2 + m + 1;
hc[n] = c;
hs[n] = s;
}
hc[4] += j2 / sqrt(5);
hc[11] += j4 / 3;
hc[22] += j6 / sqrt(13);
hc[37] += j8 / sqrt(17);
hc[56] += j10 / sqrt(21);
fclose(f_12);
}
void legfdn(unsigned m, double theta, double rleg[_361+ 1], unsigned nmx)
/*this subroutine computes all normalized legendre function
in "rleg". order is always
m, and colatitude is always theta (radians). maximum deg
is nmx. all calculations in double precision.
ir must be set to zero before the first call to this sub.
the dimensions of arrays rleg must be at least equal to nmx+1.
Original programmer :Oscar L. Colombo, Dept. of Geodetic Science
the Ohio State University, August 1980
ineiev: I removed the derivatives, for they are never computed here*/
{
static double drts[1301], dirt[1301], cothet, sithet, rlnn[_361+ 1];
static int ir;
int nmx1 = nmx + 1, nmx2p = 2 * nmx + 1, m1 = m + 1, m2 = m + 2, m3 = m + 3, n, n1, n2;
if (!ir) {
ir = 1;
for (n = 1; n <= nmx2p; n++) {
drts[n] = sqrt(n);
dirt[n] = 1 / drts[n];
}
}
cothet = cos(theta);
sithet = sin(theta);
/*compute the legendre functions*/
rlnn[1] = 1;
rlnn[2] = sithet * drts[3];
for (n1 = 3; n1 <= m1; n1++) {
n = n1 - 1;
n2 = 2 * n;
rlnn[n1] = drts[n2 + 1] * dirt[n2] * sithet * rlnn[n];
}
switch (m) {
case 1:
rleg[2] = rlnn[2];
rleg[3] = drts[5] * cothet * rleg[2];
break;
case 0:
rleg[1] = 1;
rleg[2] = cothet * drts[3];
break;
}
rleg[m1] = rlnn[m1];
if (m2 <= nmx1) {
rleg[m2] = drts[m1 * 2 + 1] * cothet * rleg[m1];
if (m3 <= nmx1)
for (n1 = m3; n1 <= nmx1; n1++) {
n = n1 - 1;
if ((!m && n < 2) || (m == 1 && n < 3))continue;
n2 = 2 * n;
rleg[n1] = drts[n2 + 1] * dirt[n + m] * dirt[n - m] *
(drts[n2 - 1] * cothet * rleg[n1 - 1] - drts[n + m - 1] * drts[n - m - 1] * dirt[n2 - 3] * rleg[n1 - 2]);
}
}
}
void radgra(double lat, double lon, double *rlat, double *gr, double *re)
/*this subroutine computes geocentric distance to the point,
the geocentric latitude,and
an approximate value of normal gravity at the point based
the constants of the wgs84(g873) system are used*/
{
const double a = 6378137., e2 = .00669437999013, geqt = 9.7803253359, k = .00193185265246;
double n, t1 = sin(lat) * sin(lat), t2, x, y, z;
n = a / sqrt(1 - e2 * t1);
t2 = n * cos(lat);
x = t2 * cos(lon);
y = t2 * sin(lon);
z = (n * (1 - e2)) * sin(lat);
*re = sqrt(x * x + y * y + z * z);/*compute the geocentric radius*/
*rlat = atan(z / sqrt(x * x + y * y));/*compute the geocentric latitude*/
*gr = geqt * (1 + k * t1) / sqrt(1 - e2 * t1);/*compute normal gravity:units are m/sec**2*/
}
double undulation(double lat, double lon, int nmax, int k) {
double rlat, gr, re;
int i, j, m;
radgra(lat, lon, &rlat, &gr, &re);
rlat = M_PI / 2 - rlat;
for (j = 1; j <= k; j++) {
m = j - 1;
legfdn(m, rlat, rleg, nmax);
for (i = j; i <= k; i++)p[(i - 1) * i / 2 + m + 1] = rleg[i];
}
dscml(lon, nmax, sinml, cosml);
return hundu(nmax, p, hc, hs, sinml, cosml, gr, re, cc, cs);
}
void init_arrays(void) {
int ig, i, n, m;
double t1, t2;
NSString* path1 = [[NSBundle mainBundle] pathForResource:#"CORCOEF" ofType:#""];
//correction coefficient file: modified with 'sed -e"s/D/e/g"' to be read with fscanf
FILE* f_1 = fopen([path1 cStringUsingEncoding:1], "rb");
if (f_1 == NULL) {
NSLog([path1 stringByAppendingString:#" not found"]);
}
nmax = 360;
for (i = 1; i <= l_value; i++)cc[i] = cs[i] = 0;
while (4 == fscanf(f_1, "%i %i %lg %lg", &n, &m, &t1, &t2)) {
ig = (n * (n + 1)) / 2 + m + 1;
cc[ig] = t1;
cs[ig] = t2;
}
/*the correction coefficients are now read in*/
/*the potential coefficients are now read in and the reference
even degree zonal harmonic coefficients removed to degree 6*/
dhcsin(nmax, hc, hs);
fclose(f_1);
}
#end
I've done some limited testing against the Geoid Height Calculator (http://www.unavco.org/community_science/science-support/geoid/geoid.html) and looks like everything is a match
UPDATE iOS8 or Greater
As of IOS8 This code might not work correctly. You may need to change how the bundle is loaded:
[[NSBundle mainBundle] pathForResource:#"EGM96" ofType:#""];
Do some googling or add a comment here.

Impressive stuff Jeef! I just used your code to create this sqlite which may be easier to add/use in a project, assuming integer precision for lat/lon is good enough:
https://github.com/vectorstofinal/geoid_heights

You could use GeoTrans.
Provided by http://earth-info.nga.mil/GandG/geotrans/index.html
The keyword is "vertical datum". So you want to convert from WGS84 to e.g EGM96 vertical datum. Make sure which Geoid modell you want to use. EGM96 is one of that.
Maybe these answer help you, too:
How to calculate the altitude above from mean sea level
Next read the ios Open Source License Text: Available in
Settings -> General -> About -> Legal -> License ...
There you get a list of all libs that ios uses. One of them I found was the calculation of magnetic decilination usung a sw of USGS. Chances are verry high that the Geoid height calculation is listed there too.

Pitch Shifting on Blackberry

I'm trying to change pitch of some recorded sound using PitchShifter.java class below:
package mypackage;
import net.rim.device.api.util.MathUtilities;
//package com.course.android.voicechanger;
//import android.util.Log;
/****************************************************************************
*
* NAME: PitchShift.cs
* VERSION: 1.2
* HOME URL: http://www.dspdimension.com
* KNOWN BUGS: none
*
* SYNOPSIS: Routine for doing pitch shifting while maintaining
* duration using the Short Time Fourier Transform.
*
* DESCRIPTION: The routine takes a pitchShift factor value which is between 0.5
* (one octave down) and 2. (one octave up). A value of exactly 1 does not change
* the pitch. numSampsToProcess tells the routine how many samples in indata[0...
* numSampsToProcess-1] should be pitch shifted and moved to outdata[0 ...
* numSampsToProcess-1]. The two buffers can be identical (ie. it can process the
* data in-place). fftFrameSize defines the FFT frame size used for the
* processing. Typical values are 1024, 2048 and 4096. It may be any value <=
* MAX_FRAME_LENGTH but it MUST be a power of 2. osamp is the STFT
* oversampling factor which also determines the overlap between adjacent STFT
* frames. It should at least be 4 for moderate scaling ratios. A value of 32 is
* recommended for best quality. sampleRate takes the sample rate for the signal
* in unit Hz, ie. 44100 for 44.1 kHz audio. The data passed to the routine in
* indata[] should be in the range [-1.0, 1.0), which is also the output range
* for the data, make sure you scale the data accordingly (for 16bit signed integers
* you would have to divide (and multiply) by 32768).
*
* COPYRIGHT 1999-2006 Stephan M. Bernsee <smb [AT] dspdimension [DOT] com>
*
* The Wide Open License (WOL)
*
* Permission to use, copy, modify, distribute and sell this software and its
* documentation for any purpose is hereby granted without fee, provided that
* the above copyright notice and this license appear in all source copies.
* THIS SOFTWARE IS PROVIDED "AS IS" WITHOUT EXPRESS OR IMPLIED WARRANTY OF
* ANY KIND. See www.dspguru.com/wol.htm for more information.
*
*****************************************************************************/
/****************************************************************************
*
* This code was converted to C# by Michael Knight madmik3 at gmail dot com. sites.google.com/site/mikescoderama/
*
*****************************************************************************/
public class PitchShifter2 {
private static int MAX_FRAME_LENGTH = 16000;
private static float[] gInFIFO = new float[MAX_FRAME_LENGTH];
private static float[] gOutFIFO = new float[MAX_FRAME_LENGTH];
private static float[] gFFTworksp = new float[2 * MAX_FRAME_LENGTH];
private static float[] gLastPhase = new float[MAX_FRAME_LENGTH / 2 + 1];
private static float[] gSumPhase = new float[MAX_FRAME_LENGTH / 2 + 1];
private static float[] gOutputAccum = new float[2 * MAX_FRAME_LENGTH];
private static float[] gAnaFreq = new float[MAX_FRAME_LENGTH];
private static float[] gAnaMagn = new float[MAX_FRAME_LENGTH];
private static float[] gSynFreq = new float[MAX_FRAME_LENGTH];
private static float[] gSynMagn = new float[MAX_FRAME_LENGTH];
private static long gRover, gInit;
public static void PitchShift2(float pitchShift, long numSampsToProcess, float sampleRate, float[] indata) {
// PitchShift2(pitchShift, numSampsToProcess, (long) 256, (long) 10, sampleRate, indata);
PitchShift2(pitchShift, numSampsToProcess, (long) 1024, (long) 32, sampleRate, indata);
}
public static void PitchShift2(float pitchShift, long numSampsToProcess, long fftFrameSize, long osamp, float sampleRate, float[] indata) {
double magn, phase, tmp, window, real, imag;
double freqPerBin, expct;
long i, k, qpd, index, inFifoLatency, stepSize, fftFrameSize2;
float[] outdata = indata;
/* set up some handy variables */
fftFrameSize2 = fftFrameSize / 2;
stepSize = fftFrameSize / osamp;
freqPerBin = sampleRate / (double) fftFrameSize;
expct = 2.0 * Math.PI * (double) stepSize / (double) fftFrameSize;
inFifoLatency = fftFrameSize - stepSize;
if (gRover == 0)
gRover = inFifoLatency;
int c = 0;
int round = 0;
/* main processing loop */
for (i = 0; i < numSampsToProcess; i++) {
/* As long as we have not yet collected enough data just read in */
gInFIFO[(int) gRover] = indata[(int) i];
outdata[(int) i] = gOutFIFO[(int) (gRover - inFifoLatency)];
gRover++;
/* now we have enough data for processing */
if (gRover >= fftFrameSize) {
c++;
if (c > 100) {
// Log.d("Liwei", "round= " + round++);
System.out.println("PitchShifter2.PitchShift(.....................): Liwei" + "round= " + round++);
c = 0;
}
gRover = inFifoLatency;
/* do windowing and re,im interleave */
for (k = 0; k < fftFrameSize; k++) {
window = -.5 * Math.cos(2.0 * Math.PI * (double) k / (double) fftFrameSize) + .5;
gFFTworksp[(int) (2 * k)] = (float) (gInFIFO[(int) k] * window);
gFFTworksp[(int) (2 * k + 1)] = 0.0F;
}
/* ***************** ANALYSIS ******************* */
/* do transform */
ShortTimeFourierTransform(gFFTworksp, fftFrameSize, -1);
/* this is the analysis step */
for (k = 0; k <= fftFrameSize2; k++) {
/* de-interlace FFT buffer */
real = gFFTworksp[(int) (2 * k)];
imag = gFFTworksp[(int) (2 * k + 1)];
/* compute magnitude and phase */
magn = 2.0 * Math.sqrt(real * real + imag * imag);
phase = MathUtilities.atan2(imag, real);
/* compute phase difference */
tmp = phase - gLastPhase[(int) k];
gLastPhase[(int) k] = (float) phase;
/* subtract expected phase difference */
tmp -= (double) k * expct;
/* map delta phase into +/- Pi interval */
qpd = (long) (tmp / Math.PI);
if (qpd >= 0)
qpd += qpd & 1;
else
qpd -= qpd & 1;
tmp -= Math.PI * (double) qpd;
/* get deviation from bin frequency from the +/- Pi interval */
tmp = osamp * tmp / (2.0 * Math.PI);
/* compute the k-th partials' true frequency */
tmp = (double) k * freqPerBin + tmp * freqPerBin;
/* store magnitude and true frequency in analysis arrays */
gAnaMagn[(int) k] = (float) magn;
gAnaFreq[(int) k] = (float) tmp;
}
/* ***************** PROCESSING ******************* */
/* this does the actual pitch shifting */
for (int zero = 0; zero < fftFrameSize; zero++) {
gSynMagn[zero] = 0;
gSynFreq[zero] = 0;
}
for (k = 0; k <= fftFrameSize2; k++) {
index = (long) (k * pitchShift);
if (index <= fftFrameSize2) {
gSynMagn[(int) index] += gAnaMagn[(int) k];
gSynFreq[(int) index] = gAnaFreq[(int) k] * pitchShift;
}
}
/* ***************** SYNTHESIS ******************* */
/* this is the synthesis step */
for (k = 0; k <= fftFrameSize2; k++) {
/* get magnitude and true frequency from synthesis arrays */
magn = gSynMagn[(int) k];
tmp = gSynFreq[(int) k];
/* subtract bin mid frequency */
tmp -= (double) k * freqPerBin;
/* get bin deviation from freq deviation */
tmp /= freqPerBin;
/* take osamp into account */
tmp = 2.0 * Math.PI * tmp / osamp;
/* add the overlap phase advance back in */
tmp += (double) k * expct;
/* accumulate delta phase to get bin phase */
gSumPhase[(int) k] += (float) tmp;
phase = gSumPhase[(int) k];
/* get real and imag part and re-interleave */
gFFTworksp[(int) (2 * k)] = (float) (magn * Math.cos(phase));
gFFTworksp[(int) (2 * k + 1)] = (float) (magn * Math.sin(phase));
}
/* zero negative frequencies */
for (k = fftFrameSize + 2; k < 2 * fftFrameSize; k++)
gFFTworksp[(int) k] = 0.0F;
/* do inverse transform */
ShortTimeFourierTransform(gFFTworksp, fftFrameSize, 1);
/* do windowing and add to output accumulator */
for (k = 0; k < fftFrameSize; k++) {
window = -.5 * Math.cos(2.0 * Math.PI * (double) k / (double) fftFrameSize) + .5;
gOutputAccum[(int) k] += (float) (2.0 * window * gFFTworksp[(int) (2 * k)] / (fftFrameSize2 * osamp));
}
for (k = 0; k < stepSize; k++)
gOutFIFO[(int) k] = gOutputAccum[(int) k];
/* shift accumulator */
// memmove(gOutputAccum, gOutputAccum + stepSize, fftFrameSize *
// sizeof(float));
for (k = 0; k < fftFrameSize; k++) {
gOutputAccum[(int) k] = gOutputAccum[(int) (k + stepSize)];
}
/* move input FIFO */
for (k = 0; k < inFifoLatency; k++)
gInFIFO[(int) k] = gInFIFO[(int) (k + stepSize)];
}
}
}
public static void ShortTimeFourierTransform(float[] fftBuffer, long fftFrameSize, long sign) {
float wr, wi, arg, temp;
float tr, ti, ur, ui;
long i, bitm, j, le, le2, k;
for (i = 2; i < 2 * fftFrameSize - 2; i += 2) {
for (bitm = 2, j = 0; bitm < 2 * fftFrameSize; bitm <<= 1) {
if ((i & bitm) != 0)
j++;
j <<= 1;
}
if (i < j) {
temp = fftBuffer[(int) i];
fftBuffer[(int) i] = fftBuffer[(int) j];
fftBuffer[(int) j] = temp;
temp = fftBuffer[(int) (i + 1)];
fftBuffer[(int) (i + 1)] = fftBuffer[(int) (j + 1)];
fftBuffer[(int) (j + 1)] = temp;
// temp = fftBuffer[i];
// fftBuffer[i] = fftBuffer[j];
// fftBuffer[j] = temp;
// temp = fftBuffer[i + 1];
// fftBuffer[i + 1] = fftBuffer[j + 1];
// fftBuffer[j + 1] = temp;
}
long max = (long) (MathUtilities.log(fftFrameSize) / MathUtilities.log(2.0) + .5);
for (k = 0, le = 2; k < max; k++) {
le <<= 1;
le2 = le >> 1;
ur = 1.0F;
ui = 0.0F;
arg = (float) Math.PI / (le2 >> 1);
wr = (float) Math.cos(arg);
wi = (float) (sign * Math.sin(arg));
for (j = 0; j < le2; j += 2) {
for (i = j; i < 2 * fftFrameSize; i += le) {
tr = fftBuffer[(int) (i + le2)] * ur - fftBuffer[(int) (i + le2 + 1)] * ui;
ti = fftBuffer[(int) (i + le2)] * ui + fftBuffer[(int) (i + le2 + 1)] * ur;
fftBuffer[(int) (i + le2)] = fftBuffer[(int) i] - tr;
fftBuffer[(int) (i + le2 + 1)] = fftBuffer[(int) (i + 1)] - ti;
fftBuffer[(int) i] += tr;
fftBuffer[(int) (i + 1)] += ti;
// tr = fftBuffer[i + le2] * ur - fftBuffer[i + le2 + 1]
// * ui;
// ti = fftBuffer[i + le2] * ui + fftBuffer[i + le2 + 1]
// * ur;
// fftBuffer[i + le2] = fftBuffer[i] - tr;
// fftBuffer[i + le2 + 1] = fftBuffer[i + 1] - ti;
// fftBuffer[i] += tr;
// fftBuffer[i + 1] += ti;
}
tr = ur * wr - ui * wi;
ui = ur * wi + ui * wr;
ur = tr;
}
}
}
}
}
I get byte array from sound file, convert it into float array and pass to PitchShift2() and
after that I convert that float array to byte array, form a stream from byte array and pass it to the player.
but it gives an exception "Unsupported file format".
I have also taken care of Endianness while converting bytes to floats and vise-versa.
can anyone tell me how to use this class properly.

Search for lines with a small range of angles in OpenCV

I'm using the Hough transform in OpenCV to detect lines. However, I know in advance that I only need lines within a very limited range of angles (about 10 degrees or so). I'm doing this in a very performance sensitive setting, so I'd like to avoid the extra work spent detecting lines at other angles, lines I know in advance I don't care about.
I could extract the Hough source from OpenCV and just hack it to take min_rho and max_rho parameters, but I'd like a less fragile approach (have to manually update my code w/ each OpenCV update, etc.).
What's the best approach here?

Well, i've modified the icvHoughlines function to go for a certain range of angles. I'm sure there's cleaner ways that plays with memory allocation as well, but I got a speed gain going from 100ms to 33ms for a range of angle going from 180deg to 60deg, so i'm happy with that.
Note that this code also outputs the accumulator value. Also, I only output 1 line because that fit my purposes but there was no gain really there.
static void
icvHoughLinesStandard2( const CvMat* img, float rho, float theta,
int threshold, CvSeq *lines, int linesMax )
{
cv::AutoBuffer<int> _accum, _sort_buf;
cv::AutoBuffer<float> _tabSin, _tabCos;
const uchar* image;
int step, width, height;
int numangle, numrho;
int total = 0;
float ang;
int r, n;
int i, j;
float irho = 1 / rho;
double scale;
CV_Assert( CV_IS_MAT(img) && CV_MAT_TYPE(img->type) == CV_8UC1 );
image = img->data.ptr;
step = img->step;
width = img->cols;
height = img->rows;
numangle = cvRound(CV_PI / theta);
numrho = cvRound(((width + height) * 2 + 1) / rho);
_accum.allocate((numangle+2) * (numrho+2));
_sort_buf.allocate(numangle * numrho);
_tabSin.allocate(numangle);
_tabCos.allocate(numangle);
int *accum = _accum, *sort_buf = _sort_buf;
float *tabSin = _tabSin, *tabCos = _tabCos;
memset( accum, 0, sizeof(accum[0]) * (numangle+2) * (numrho+2) );
// find n and ang limits (in our case we want 60 to 120
float limit_min = 60.0/180.0*PI;
float limit_max = 120.0/180.0*PI;
//num_steps = (limit_max - limit_min)/theta;
int start_n = floor(limit_min/theta);
int stop_n = floor(limit_max/theta);
for( ang = limit_min, n = start_n; n < stop_n; ang += theta, n++ )
{
tabSin[n] = (float)(sin(ang) * irho);
tabCos[n] = (float)(cos(ang) * irho);
}
// stage 1. fill accumulator
for( i = 0; i < height; i++ )
for( j = 0; j < width; j++ )
{
if( image[i * step + j] != 0 )
//
for( n = start_n; n < stop_n; n++ )
{
r = cvRound( j * tabCos[n] + i * tabSin[n] );
r += (numrho - 1) / 2;
accum[(n+1) * (numrho+2) + r+1]++;
}
}
int max_accum = 0;
int max_ind = 0;
for( r = 0; r < numrho; r++ )
{
for( n = start_n; n < stop_n; n++ )
{
int base = (n+1) * (numrho+2) + r+1;
if (accum[base] > max_accum)
{
max_accum = accum[base];
max_ind = base;
}
}
}
CvLinePolar2 line;
scale = 1./(numrho+2);
int idx = max_ind;
n = cvFloor(idx*scale) - 1;
r = idx - (n+1)*(numrho+2) - 1;
line.rho = (r - (numrho - 1)*0.5f) * rho;
line.angle = n * theta;
line.votes = accum[idx];
cvSeqPush( lines, &line );
}

If you use the Probabilistic Hough transform then the output is in the form of a cvPoint each for lines[0] and lines[1] parameters. We can get x and y co-ordinated for each of the two points by pt1.x, pt1.y and pt2.x and pt2.y.
Then use the simple formula for finding slope of a line - (y2-y1)/(x2-x1). Taking arctan (tan inverse) of that will yield that angle in radians. Then simply filter out desired angles from the values for each hough line obtained.

I think it's more natural to use standart HoughLines(...) function, which gives collection of lines directly in rho and theta terms and select nessessary angle range from it, rather than recalculate angle from segment end points.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Need help in Parallelizing if and else condition in CUDA C program - image-processing

Related

Implementation of LASSO in C

Multi otsu(multi-thresholding) with openCV

WGS84 Geoid Height Altitude Offset for external GPS data on IOS

Pitch Shifting on Blackberry

Search for lines with a small range of angles in OpenCV

Categories

Resources