EXC_BAD_ACCESS on double array - ios

writing an objective c (with c low level functionality) function to detect pitch. Getting a EXC_BAD_ACCESS code=2 on the line "w[i] = i;" in the first for loop. In our iteration, numberofSamples = 32768. The Error occurs in the first loop after i = 16332. For some reason after filling about half of the array, the pointer is no longer valid. Single threaded app, nothing else should be messing around with stuff while this is running. I was wondering if there were any fundamental errors in this setup that could cause this pointer error. I can provide more detail as requested. Thanks!
NSArray* getMainFrequency(NSMutableArray *input,int numberOfSamples){
const int log2n = log(numberOfSamples)/log(2);
const int n = 1 << log2n;
const int nOver2 = n / 2;
int sFreq=44100;
double fFreq=0,maxMag=0,realFreq=0,t,alpha=0.6;
double mag[n/2+1],dblInput[numberOfSamples],w[numberOfSamples];
FFTSetupD fftSetup = vDSP_create_fftsetupD (log2n, kFFTRadix2);
for(int i=0;i<numberOfSamples;i++){
dblInput[i]=[input[i] doubleValue];
w[i] = i;
}
DSPDoubleSplitComplex fft_data;
fft_data.realp = malloc(nOver2 * sizeof(double));
fft_data.imagp = malloc(nOver2 * sizeof(double));
vDSP_ctozD((DSPDoubleComplex *)dblInput, 2, &fft_data, 1, nOver2);
vDSP_fft_zripD (fftSetup, &fft_data, 1, log2n, kFFTDirection_Forward);
for (int i = 0; i < nOver2; ++i){
fft_data.realp[i] *= 0.5;
fft_data.imagp[i] *= 0.5;
}
for (int i = 48; i < 2974; ++i){
mag[i]=pow((pow(fft_data.realp[i],2)+pow(fft_data.imagp[i],2)),0.5);
}
for(int i=48;i <= 2974;i++){ //to get a max of 4000 hZ hopefully?
if (mag[i]>maxMag){
maxMag = mag[i];
fFreq = i;
}
}
realFreq=fFreq/(numberOfSamples/2+1)*sFreq/2;
return [NSArray arrayWithObjects: #(maxMag),#(realFreq),nil];
}
}

Related

Implementation of LASSO in C

I am trying to understand the LASSO algorithm for linear regression. I have implemented the algorithm using naive coordinate descent method for optimization. However the coefficients that I obtained from my code, wasn't matching with those obtained from the 'glmnet'package for LASSO in R. I wanted to understand how I could make the algorithm more accurate, so that the coefficients match with those obtained from R. I think they use coordinate descent as well.
Note: I have generated some toy data with 11 observations, and 6
features(x,x^2 ,x^3,...,x^6). The last column contains the y values
generated from a dummy function (e^(-x^2)). I wanted to use LASSO to
estimate this function. Also, I have randomly picked the initial
weight vector, multiple times to crosscheck my results.
Here is my code:
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<math.h>
#include<time.h>
int num_dim = 6;
int num_obs = 11;
/*Computes the normalization factor*/
float norm_feature(int j,double arr[][7],int n){
float sum = 0.0;
int i;
for(i=0;i<n;i++){
sum = sum + pow(arr[i][j],2);
}
return sum;
}
/*Computes the partial sum*/
float approx(int dim,int d_ignore,float weights[],double arr[][7],int
i){
int flag = 1;
if(d_ignore == -1)
flag = 0;
int j;
float sum = 0.0;
for(j=0;j<dim;j++){
if(j != d_ignore)
sum = sum + weights[j]*arr[i][j];
else
continue;
}
return sum;
}
/* Computes rho-j */
float rho_j(double arr[][7],int n,int j,float weights[7]){
float sum = 0.0;
int i;
float partial_sum ;
for(i=0;i<n;i++){
partial_sum = approx(num_dim,j,weights,arr,i);
sum = sum + arr[i][j]*(arr[i][num_dim]-partial_sum);
}
return sum;
}
float intercept(float arr1[7],double arr[][7],int dim) {
int i;
float sum =0.0;
for (i = 0; i < num_obs; i++) {
sum = sum + pow((arr[i][num_dim]) - approx(num_dim, -1, arr1, arr,
i), 1);
}
return sum;
}
int main(){
double data[num_obs][7];
int i=0,j=0;
float a = 1.0;
float lambda = 0.1; //Setting lambda
float weights[7]; //weights[6] contains the intercept
srand((unsigned int) time(NULL));
/*Generating the data matrix */
for(i=0;i<11;i++)
data[i][0] = ((float)rand()/(float)(RAND_MAX)) * a;
for(i=0;i<11;i++)
for(j=1;j<6;j++)
data[i][j] = pow(data[i][0],j+1);
for(i=0;i<11;i++)
data[i][6] = exp(-pow(data[i][0],2)); // the last column in the
datamatrix contains the y values generated by the dummy function
/*Printing the data matrix */
printf("Data Matrix:\n");
for(i=0;i<11;i++){
for(j=0;j<7;j++){
printf("%lf ",data[i][j]);}
printf("\n");}
printf("\n");
int seed =0;
while(seed<20) {
//Initializing the weight vector
for (i = 0; i < 7; i++)
weights[i] = ((float) rand() / (float) (RAND_MAX)) * a;
int iter = 500;
int t = 0;
int r, l;
double rho[num_dim];
for (i = 0; i < 6; i++) {
rho[i] = rho_j(data, num_obs, r, weights);
}
// Intercept initialization
weights[num_dim] = intercept(weights,data,num_dim);
printf("Weights initialization: ");
for (i = 0; i < (num_dim+1); i++)
printf("%f ", weights[i]);
printf("\n");
while (t < iter) {
for (r = 0; r < num_dim; r++) {
rho[r] = rho_j(data, num_obs, r, weights);
//printf("rho %d:%f ",r,rho[r]);
if (rho[r] < -lambda / 2)
weights[r] = (rho[r] + lambda / 2) / norm_feature(r,
data, num_obs);
else if (rho[r] > lambda / 2)
weights[r] = (rho[r] - lambda / 2) / norm_feature(r,
data, num_obs);
else
weights[r] = 0;
weights[num_dim] = intercept(weights, data, num_dim);
}
/* printf("Iter(%d): ", t);
for (l = 0; l < 7; l++)
printf("%f ", weights[l]);
printf("\n");*/
t++;
}
//printf("\n");
printf("Final Weights: ");
for (i = 0; i < 7; i++)
printf("%f ", weights[i]);
printf("\n");
printf("\n");
seed++;
}
return 0;
}
PseudoCode:

Why are my frequency values for iPhone FFT incorrect?

I've been trying to get exact frequencies using the FFT in Apple's Accelerate framework, but I'm having trouble working out why my values are off the true frequency.
I have been using this article http://www.dspdimension.com/admin/pitch-shifting-using-the-ft/ as the basis for my implementation, and after really struggling to get to the point I'm at now, I am totally stumped.
So far I've got audio in -> Hanning window -> FFT -> phase calculation -> weird final output. I'd think that there will be a problem with my maths somewhere, but I'm really out of ideas by now.
The outputs are a lot lower what they should be, e.g., I input 440Hz and it prints out 190Hz, or I input 880Hz and it prints out 400Hz. For the most part these results are consistent, but not always, and there doesn't seem to be any common factor between anything either...
Here is my code:
enum
{
sampleRate = 44100,
osamp = 4,
samples = 4096,
range = samples * 7 / 16,
step = samples / osamp
};
NSMutableArray *fftResults;
static COMPLEX_SPLIT A;
static FFTSetup setupReal;
static uint32_t log2n, n, nOver2;
static int32_t stride;
static float expct = 2*M_PI*((double)step/(double)samples);
static float phase1[range];
static float phase2[range];
static float dPhase[range];
- (void)fftSetup
{
// Declaring integers
log2n = 12;
n = 1 << log2n;
stride = 1;
nOver2 = n / 2;
// Allocating memory for complex vectors
A.realp = (float *) malloc(nOver2 * sizeof(float));
A.imagp = (float *) malloc(nOver2 * sizeof(float));
// Allocating memory for FFT
setupReal = vDSP_create_fftsetup(log2n, FFT_RADIX2);
// Setting phase
memset(phase2, 0, range * sizeof(float));
}
// For each sample in buffer...
for (int bufferCount = 0; bufferCount < audioBufferList.mNumberBuffers; bufferCount++)
{
// Declaring samples from audio buffer list
SInt16 *samples = (SInt16*)audioBufferList.mBuffers[bufferCount].mData;
// Creating Hann window function
for (int i = 0; i < nOver2; i++)
{
double hannMultiplier = 0.5 * (1 - cos((2 * M_PI * i) / (nOver2 - 1)));
// Applying window to each sample
A.realp[i] = hannMultiplier * samples[i];
A.imagp[i] = 0;
}
// Applying FFT
vDSP_fft_zrip(setupReal, &A, stride, log2n, FFT_FORWARD);
// Detecting phase
vDSP_zvphas(&A, stride, phase1, stride, range);
// Calculating phase difference
vDSP_vsub(phase2, stride, phase1, stride, dPhase, stride, range);
// Saving phase
memcpy(phase2, phase1, range * sizeof(float));
// Extracting DSP outputs
for (size_t j = 0; j < nOver2; j++)
{
NSNumber *realNumbers = [NSNumber numberWithFloat:A.realp[j]];
NSNumber *imagNumbers = [NSNumber numberWithFloat:A.imagp[j]];
[real addObject:realNumbers];
[imag addObject:imagNumbers];
}
// Combining real and imaginary parts
[resultsCombined addObject:real];
[resultsCombined addObject:imag];
// Filling FFT output array
[fftResults addObject:resultsCombined];
}
}
int fftCount = [fftResults count];
NSLog(#"FFT Count: %d",fftCount);
// For each FFT...
for (int i = 0; i < fftCount; i++)
{
// Declaring integers for peak detection
float peak = 0;
float binNumber = 0;
// Declaring integers for phase detection
float deltaPhase;
static float trueFrequency[range];
for (size_t j = 1; j < range; j++)
{
// Calculating bin magnitiude
float realVal = [[[[fftResults objectAtIndex:i] objectAtIndex:0] objectAtIndex:j] floatValue];
float imagVal = [[[[fftResults objectAtIndex:i] objectAtIndex:1] objectAtIndex:j] floatValue];
float magnitude = sqrtf(realVal*realVal + imagVal*imagVal);
// Peak detection
if (magnitude > peak)
{
peak = magnitude;
binNumber = (float)j;
}
// Getting phase difference
deltaPhase = dPhase[j];
// Subtract expected difference
deltaPhase -= (float)j*expct;
// Map phase difference into +/- pi interval
int qpd = deltaPhase / M_PI;
if (qpd >= 0)
qpd += qpd&1;
else
qpd -= qpd&1;
deltaPhase -= M_PI * (float)qpd;
// Getting bin deviation from +/i interval
float deltaFrequency = osamp * deltaPhase / (2 * M_PI);
// Calculating true frequency at the j-th partial
trueFrequency[j] = (j * (sampleRate/samples)) + (deltaFrequency * (sampleRate/samples));
}
UInt32 mag;
mag = binNumber;
// Extracting frequency at bin peak
float f = trueFrequency[mag];
NSLog(#"True frequency = %fHz", f);
float b = roundf(binNumber*(sampleRate/nOver2));
NSLog(#" Bin frequency = %fHz", b);
}
Note that the expected phase difference (even for a bin-centered frequency) depends on both the window offset or overlap of the FFT pairs, and the bin number or frequency of the FFT result. e.g. If you offset the windows by very little (1 sample), then the unwrapped phase difference between 2 FFTs will be smaller than with a larger offset. At the same offset, if the frequency is higher, the expected phase difference between the same bin of two FFTs will be greater (or it will wrap more).

Sampling custom float texture2D in HLSL

I am wondering how to actually sample the data I am passing to the shader file. I am using two methods, is it the same for both? Are there any resources online for me to actually read up on this sort of thing?
Compiling at 5.0 but the version number does not matter so much.
I have two methods to pass the data.
The first;
for( UINT row = 0; row < textureDesc.Height; row++ )
{
UINT rowStart = row * mappedResource.RowPitch;
for( UINT col = 0; col < textureDesc.Width; col++ )
{
//width * number of channels (r,g,b,a)
UINT colStart = col * 4;
pTexels[rowStart + colStart + 0] = 10.0f; // Red
pTexels[rowStart + colStart + 1] = 10.0f; // Green
pTexels[rowStart + colStart + 2] = 255.0f; // Blue
pTexels[rowStart + colStart + 3] = 255.0f; // Alpha
}
}
The second;
float elements[416][416];
int elementsCount = 416*416;
for( int i = 0; i < 416; i++ )
{
for( int k = 0; k < 416; k++ )
{
elements[i][k] = 0;
}
}
memcpy(mappedResource.pData, elements, sizeof(float) * elementsCount);
Seems that I missed an important part of all of this.
When creating a texture, in the texture description, the Format is the type that will be returned when the object is sampled. Many thanks to Drop for the help.

Cuda-memcheck not reporting out of bounds shared memory access

I am runnig the follwoing code using shared memory:
__global__ void computeAddShared(int *in , int *out, int sizeInput){
//not made parameters gidata and godata to emphasize that parameters get copy of address and are different from pointers in host code
extern __shared__ float temp[];
int tid = blockIdx.x * blockDim.x + threadIdx.x;
int ltid = threadIdx.x;
temp[ltid] = 0;
while(tid < sizeInput){
temp[ltid] += in[tid];
tid+=gridDim.x * blockDim.x; // to handle array of any size
}
__syncthreads();
int offset = 1;
while(offset < blockDim.x){
if(ltid % (offset * 2) == 0){
temp[ltid] = temp[ltid] + temp[ltid + offset];
}
__syncthreads();
offset*=2;
}
if(ltid == 0){
out[blockIdx.x] = temp[0];
}
}
int main(){
int size = 16; // size of present input array. Changes after every loop iteration
int cidata[] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
/*FILE *f;
f = fopen("invertedList.txt" , "w");
a[0] = 1 + (rand() % 8);
fprintf(f, "%d,",a[0]);
for( int i = 1 ; i< N; i++){
a[i] = a[i-1] + (rand() % 8) + 1;
fprintf(f, "%d,",a[i]);
}
fclose(f);*/
int* gidata;
int* godata;
cudaMalloc((void**)&gidata, size* sizeof(int));
cudaMemcpy(gidata,cidata, size * sizeof(int), cudaMemcpyHostToDevice);
int TPB = 4;
int blocks = 10; //to get things kicked off
cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);
cudaEventRecord(start, 0);
while(blocks != 1 ){
if(size < TPB){
TPB = size; // size is 2^sth
}
blocks = (size+ TPB -1 ) / TPB;
cudaMalloc((void**)&godata, blocks * sizeof(int));
computeAddShared<<<blocks, TPB,TPB>>>(gidata, godata,size);
cudaFree(gidata);
gidata = godata;
size = blocks;
}
//printf("The error by cuda is %s",cudaGetErrorString(cudaGetLastError()));
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);
float elapsedTime;
cudaEventElapsedTime(&elapsedTime , start, stop);
printf("time is %f ms", elapsedTime);
int *output = (int*)malloc(sizeof(int));
cudaMemcpy(output, gidata, sizeof(int), cudaMemcpyDeviceToHost);
//Cant free either earlier as both point to same location
cudaError_t chk = cudaFree(godata);
if(chk!=0){
printf("First chk also printed error. Maybe error in my logic\n");
}
printf("The error by threadsyn is %s", cudaGetErrorString(cudaGetLastError()));
printf("The sum of the array is %d\n", output[0]);
getchar();
return 0;
}
Clearly, the first while loop in computeAddShared is causing out of bounds error because I am allocating 4 bytes to shared memory. Why does cudamemcheck not catch this. Below is the output of cuda-memcheck
========= CUDA-MEMCHECK
time is 12.334816 msThe error by threadsyn is no errorThe sum of the array is 13
6
========= ERROR SUMMARY: 0 errors
Shared memory allocation granularity. The Hardware undoubtedly has a page size for allocations (probably the same as the L1 cache line side). With only 4 threads per block, there will "accidentally" be enough shared memory in a single page to let you code work. If you used a sensible number of threads block (ie. a round multiple of the warp size) the error would be detected because there would not be enough allocated memory.

Gaussian Blur Questions

I'm writing a gaussian filter, and my goal is to match the gaussian blur filter in photoshop as closely as possible. This is my first image processing endeavor. Some problems/questions I have are...
Further blurring an image with my filter darkens it, while photoshop’s seems to lighten it.
The deviation value (“sigma,” in my code) I’m using is r/3, which results in the gaussian curve having approached about 0.0001 within the matrix...is there a better way to determine this value?
How does photoshop (or most people) handle image borders for this type of blur?
int matrixDimension = (radius*2)+1;
float sigma = radius/3;
float twoSigmaSquared = 2*pow(sigma, 2);
float oneOverSquareRootOfTwoPiSigmaSquared = 1/(sqrt(M_PI*twoSigmaSquared));
float kernel[matrixDimension];
int index = 0;
for (int offset = -radius; offset <= radius; offset++) {
float xSquared = pow(offset, 2);
float exponent = -(xSquared/twoSigmaSquared);
float eToThePower = pow(M_E, exponent);
float multFactor = oneOverSquareRootOfTwoPiSigmaSquared*eToThePower;
kernel[index] = multFactor;
index++;
}
//Normalize the kernel such that all its values will add to 1
float sum = 0;
for (int i = 0; i < matrixDimension; i++) {
sum += kernel[i];
}
for (int i = 0; i < matrixDimension; i++) {
kernel[i] = kernel[i]/sum;
}
//Blur horizontally
for (int row = 0; row < imageHeight; row++) {
for (int column = 0; column < imageWidth; column++) {
int currentPixel = (row*imageWidth)+column;
int sum1 = 0;
int sum2 = 0;
int sum3 = 0;
int sum4 = 0;
int index = 0;
for (int offset = -radius; offset <= radius; offset++) {
if (!(column+offset < 0) && !(column+offset > imageWidth-1)) {
int firstByteOfPixelWereLookingAtInSrcData = (currentPixel+offset)*4;
int in1 = srcData[firstByteOfPixelWereLookingAtInSrcData];
int in2 = srcData[firstByteOfPixelWereLookingAtInSrcData+1];
int in3 = srcData[firstByteOfPixelWereLookingAtInSrcData+2];
int in4 = srcData[firstByteOfPixelWereLookingAtInSrcData+3];
sum1 += (int)(in1 * kernel[index]);
sum2 += (int)(in2 * kernel[index]);
sum3 += (int)(in3 * kernel[index]);
sum4 += (int)(in4 * kernel[index]);
}
index++;
}
int currentPixelInData = currentPixel*4;
destData[currentPixelInData] = sum1;
destData[currentPixelInData+1] = sum2;
destData[currentPixelInData+2] = sum3;
destData[currentPixelInData+3] = sum4;
}
}
//Blur vertically
for (int row = 0; row < imageHeight; row++) {
for (int column = 0; column < imageWidth; column++) {
int currentPixel = (row*imageWidth)+column;
int sum1 = 0;
int sum2 = 0;
int sum3 = 0;
int sum4 = 0;
int index = 0;
for (int offset = -radius; offset <= radius; offset++) {
if (!(row+offset < 0) && !(row+offset > imageHeight-1)) {
int firstByteOfPixelWereLookingAtInSrcData = (currentPixel+(offset*imageWidth))*4;
int in1 = destData[firstByteOfPixelWereLookingAtInSrcData];
int in2 = destData[firstByteOfPixelWereLookingAtInSrcData+1];
int in3 = destData[firstByteOfPixelWereLookingAtInSrcData+2];
int in4 = destData[firstByteOfPixelWereLookingAtInSrcData+3];
sum1 += (int)(in1 * kernel[index]);
sum2 += (int)(in2 * kernel[index]);
sum3 += (int)(in3 * kernel[index]);
sum4 += (int)(in4 * kernel[index]);
}
index++;
}
int currentPixelInData = currentPixel*4;
finalData[currentPixelInData] = sum1;
finalData[currentPixelInData+1] = sum2;
finalData[currentPixelInData+2] = sum3;
finalData[currentPixelInData+3] = sum4;
}
}
To reverse engineer a filter, you need to find its impulse response. On a background of a very dark value, say 32, place a nearly white pixel, say 223. You don't want to use 0 and 255 because some filters will try to create values beyond the starting values. Run the filter on this image, and take the output values and stretch them from 0.0 to 1.0: (value-32)/(223-32). Now you have the exact weights needed to emulate the filter.
There are lots of ways to treat the image edges. I would suggest taking the filter weights and summing them, then dividing the result by that sum; if you're trying to go beyond the edge, use 0.0 for both the pixel value and the filter weight on that pixel.
Boundary conditions sometimes depend on exactly what you're doing and what kind of data you're working with, but I think for general purpose image manipulation the best thing to do is to extend the values at the borders beyond the edges of the image. Not literally of course, but if the filter tries to read a pixel that's outside the image borders, you substitute the value of the nearest pixel on the edge of the image. Which is really the same as just clamping the row to be between 0 and height, and the column to be between 0 and width.

Resources