I am parsing a 3D file into OpenGL ES on an iOS device, and after I get the vertices I can't seem to add them to the GLfloat array containing my vertices. At the top of my file I declare this GLfloat array:
GLfloat gFileVertices[] = {
-0.686713, 0.346845, 3.725390, -0.000288, -0.000652, -0.000109,
-0.677196, 0.350971, 3.675733, -0.000288, -0.000652, -0.000109,
-0.673889, 0.340921, 3.726985, -0.000288, -0.000652, -0.000109,
-0.677424, 0.337048, 3.775731, -0.000283, -0.000631, -0.000071,
/* and so on... */
};
But how can I put that same data (x, y, z, normal.x, normal.y, normal.z) into that array when each of those values is a variable and there are a variable number of rows?
The solution is to allocate the vertices buffer dynamically at runtime, rather than statically at compile time. In your code there is no way to change the size of gFileVertices once the program is compiled.
For easier management, I will use separate vertex and normal arrays rather than interleaving the data into one.
Parse the file, determine the number of vertices, and allocate the buffers:
GLfloat* verticesBuff = malloc(sizeof(GLfloat) * vertCount * 3); /* 3 floats per vert */
GLfloat* normalsBuff = malloc(sizeof(GLfloat) * vertCount * 3); /* 3 floats per vert */
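If the format doesn't store a vertex count up front, vertCount can be obtained with a first pass over the file before those allocations. A hypothetical sketch, assuming a plain-text format with one vertex per line (adapt to whatever your real 3D format stores):
#include <stdio.h>

/* Hypothetical helper: count lines to get vertCount. */
int countVertices(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f) return 0;
    int count = 0;
    int c;
    while ((c = fgetc(f)) != EOF) {
        if (c == '\n') count++;   /* one vertex per line assumed */
    }
    fclose(f);
    return count;
}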
Then copy each element into the new array:
/* read from file or whatevs */
for (int i = 0; i < vertCount; i ++)
{
verticesBuff[i * 3] = ...
verticesBuff[i * 3 + 1] = ...
verticesBuff[i * 3 + 2] = ...
normalsBuff[i * 3] = ...
normalsBuff[i * 3 + 1] = ...
normalsBuff[i * 3 + 2] = ...
}
The dynamic arrays can be used in OpenGL just like the static ones:
glVertexPointer(3, GL_FLOAT, 0, verticesBuff);
glNormalPointer(GL_FLOAT, 0, normalsBuff);
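For completeness, a minimal draw sketch around those two calls, assuming an OpenGL ES 1.x context, GL_TRIANGLES topology, and the vertCount determined while parsing:
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_NORMAL_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, verticesBuff);
glNormalPointer(GL_FLOAT, 0, normalsBuff);
glDrawArrays(GL_TRIANGLES, 0, vertCount);   /* topology is an assumption */
glDisableClientState(GL_NORMAL_ARRAY);
glDisableClientState(GL_VERTEX_ARRAY);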
That's it! Just make sure to free the buffers when you are done:
free(verticesBuff);
free(normalsBuff);
I've downloaded Apple's TrueDepth Streamer example and am trying to add a compute pipeline. I think I'm retrieving the results of the computation, but I'm not sure, as they all seem to be zero.
I'm a beginner at iOS development, so there may be quite a few mistakes; please bear with me!
The pipeline setup (I wasn't quite sure how to create the results buffer, since the kernel outputs a float3):
int resultsCount = CVPixelBufferGetWidth(depthFrame) * CVPixelBufferGetHeight(depthFrame);
// because I will be outputting 3 floats for each value in the depth frame
id<MTLBuffer> resultsBuffer = [self.device newBufferWithLength:(sizeof(float) * 3 * resultsCount) options:MTLResourceOptionCPUCacheModeDefault];
_threadgroupSize = MTLSizeMake(16, 16, 1);
// Calculate the number of rows and columns of threadgroups given the width of the input image
// Ensure that you cover the entire image (or more) so you process every pixel
_threadgroupCount.width = (inTexture.width + _threadgroupSize.width - 1) / _threadgroupSize.width;
_threadgroupCount.height = (inTexture.height + _threadgroupSize.height - 1) / _threadgroupSize.height;
// Since we're only dealing with a 2D data set, set depth to 1
_threadgroupCount.depth = 1;
id<MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoder];
[computeEncoder setComputePipelineState:_computePipelineState];
[computeEncoder setTexture: inTexture atIndex:0];
[computeEncoder setBuffer:resultsBuffer offset:0 atIndex:1];
[computeEncoder setBytes:&intrinsics length:sizeof(intrinsics) atIndex:0];
[computeEncoder dispatchThreadgroups:_threadgroupCount
threadsPerThreadgroup:_threadgroupSize];
[computeEncoder endEncoding];
// Finalize rendering here & push the command buffer to the GPU
[commandBuffer commit];
//for testing
[commandBuffer waitUntilCompleted];
I have added the following compute kernel:
kernel void
calc(texture2d<float, access::read> inTexture [[texture(0)]],
device float3 *resultsBuffer [[buffer(1)]],
constant float3x3& cameraIntrinsics [[ buffer(0) ]],
uint2 gid [[thread_position_in_grid]])
{
float val = inTexture.read(gid).x * 1000.0f;
float xrw = (gid.x - cameraIntrinsics[2][0]) * val / cameraIntrinsics[0][0];
float yrw = (gid.y - cameraIntrinsics[2][1]) * val / cameraIntrinsics[1][1];
int vertex_id = ((gid.y * inTexture.get_width()) + gid.x);
resultsBuffer[vertex_id] = float3(xrw, yrw, val);
}
Code for seeing the buffer result (I tried two different ways and both are outputting all zeroes at the moment):
void *output = [resultsBuffer contents];
for (int i = 0; i < 10; ++i) {
NSLog(#"value is %f", *(float *)(output) ); //= *(float *)(output + 4 * i);
}
NSData *data = [NSData dataWithBytesNoCopy:resultsBuffer.contents length:(sizeof(float) * 3 * resultsCount) freeWhenDone:NO];
float *finalArray = new float [resultsCount * 3];
[data getBytes:&finalArray[0] length:sizeof(finalArray)];
for (int i = 0; i < 10; ++i) {
NSLog(#"here is output %f", finalArray[i]);
}
I see a couple of problems here, but neither of them is related to your Metal code per se.
In your first output loop, as written, you're just printing the first element of the results buffer 10 times. The first element may legitimately be 0, leading you to believe all of the results are zero. But when I changed the first log line to
NSLog(#"value is %f", ((float *)output)[i]);
I saw different values printed when running your kernel on a test image.
The other issue is related to your getBytes:length: call. You want to pass the number of bytes to copy, but sizeof(finalArray) is actually the size of the finalArray pointer itself (8 bytes on a 64-bit platform), not the total size of the buffer it points to. This is an extremely common error in C and C++ code.
Instead, you can use the same byte count as the one you used when allocating space:
[data getBytes:&finalArray[0] length:(sizeof(float) * 3 * resultsCount)];
You should then find that you get the same (non-zero) values printed as in the previous step.
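Putting both fixes together, a minimal read-back sketch (same resultsBuffer and resultsCount as above; it only prints the first 10 floats):
float *output = (float *)resultsBuffer.contents;
for (int i = 0; i < 10; ++i) {
    NSLog(@"value %d is %f", i, output[i]);   // indexes each element instead of reading element 0 repeatedly
}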
I would like to be able to define a MTLBuffer and populate data directly to the buffer (or as efficiently as possible).
If I do the following, the values used in the shader are 1.0 and 2.0 (for X and Y respectively), not 3.0 and 4.0 which are set after the MTLBuffer is created.
int bufferLength = 128 * 128;
float pointBuffer[bufferLength * 2]; // 2 for X and Y
//Populate array with test values
for (int i = 0; i < (bufferLength * 2); i += 2) {
pointBuffer[i] = 1.0; //X
pointBuffer[i + 1] = 2.0; //Y
}
id<MTLBuffer> pointDataBuffer = [device newBufferWithBytes:&pointBuffer length:sizeof(pointBuffer) options:MTLResourceOptionCPUCacheModeDefault];
//Populate array with updated test values
for (int i = 0; i < (bufferLength * 2); i += 2) {
pointBuffer[i] = 3.0; //X
pointBuffer[i + 1] = 4.0; //Y
}
//In the (Swift) class with the pipeline:
commandEncoder!.setBuffer(pointDataBuffer, offset: 0, index: 4)
Based on the docs, it seems like I need to call didModifyRange: but pointDataBuffer does not seem to recognize the selector.
Is there a way to update the array without having to recreate the MTLBuffer?
-newBufferWithBytes:... makes a copy of the passed in bytes. It does not keep referencing them. So, subsequent changes to pointBuffer do not affect it.
However, buffers like this one (whose storage mode is not private) provide access to their storage through the -contents method. So, you could do something like this:
float *points = pointDataBuffer.contents;
for (int i = 0; i < (bufferLength * 2); i += 2) {
points[i] = 3.0; //X
points[i + 1] = 4.0; //Y
}
Be careful, though. The CPU and GPU operate asynchronously relative to each other. If there might be commands being processed by the GPU that reference the buffer, then modifying it from the CPU may interfere with the operation of those commands. So, you'll want to take steps to synchronize access to the buffer or otherwise avoid simultaneous CPU and GPU access.
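For example, a minimal sketch of that synchronization point, assuming a commandBuffer whose commands read pointDataBuffer (waitUntilCompleted is the simplest option; addCompletedHandler: avoids blocking the CPU):
// Simplest: block the CPU until the GPU has finished with the buffer.
[commandBuffer commit];
[commandBuffer waitUntilCompleted];
float *points = (float *)pointDataBuffer.contents;
points[0] = 3.0f;   // safe to overwrite now

// Alternatively, register a handler before committing and update there.
// [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> cb) {
//     /* safe to update pointDataBuffer.contents here */
// }];
// [commandBuffer commit];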
I have a function defined by Intel IPP to operate on an Image / Region of Image.
The input to the image are the pointer to the image, parameters to define the size to process and parameters of the filter.
The IPP function is single threaded.
Now, I have an image of size M x N.
I want to apply the filter on it in parallel.
The main idea is simple, break the image into 4 sub images which are independent of each other.
Apply the filter to each sub image and write the result to a sub block of an empty image where each thread write to a distinct set of pixels.
It's really like processing 4 images, each on its own core.
This is the program I'm doing it with:
void OpenMpTest()
{
const int width = 1920;
const int height = 1080;
Ipp32f input_image[width * height];
Ipp32f output_image[width * height];
IppiSize size = { width, height };
int step = width * sizeof(Ipp32f);
/* Splitting the image */
IppiSize section_size = { width / 2, height / 2};
Ipp32f* input_upper_left = input_image;
Ipp32f* input_upper_right = input_image + width / 2;
Ipp32f* input_lower_left = input_image + (height / 2) * width;
Ipp32f* input_lower_right = input_image + (height / 2) * width + width / 2;
Ipp32f* output_upper_left = output_image;
Ipp32f* output_upper_right = output_image + width / 2;
Ipp32f* output_lower_left = output_image + (height / 2) * width;
Ipp32f* output_lower_right = output_image + (height / 2) * width + width / 2;
Ipp32f* input_sections[4] = { input_upper_left, input_upper_right, input_lower_left, input_lower_right };
Ipp32f* output_sections[4] = { output_upper_left, output_upper_right, output_lower_left, output_lower_right };
/* Filter Params */
Ipp32f pKernel[7] = { 1, 2, 3, 4, 3, 2, 1 };
omp_set_num_threads(4);
#pragma omp parallel for
for (int i = 0; i < 4; i++)
ippiFilterRow_32f_C1R(
input_sections[i], step,
output_sections[i], step,
section_size, pKernel, 7, 3);
}
Now, the issue is that I see no gain versus running single-threaded on the whole image.
I tried changing the image size and the filter size, and nothing changed the picture.
The most I could gain was nothing significant (10-20%).
I thought it might have something to do with the fact that I can't "promise" each thread that the zone it receives is read-only, and that the memory location it writes to belongs only to that thread.
I read about defining variables as private and shared, yet I couldn't find a guide on dealing with arrays and pointers.
What would be the proper way to deal with pointers and sub arrays in OpenMP?
How does the performance of threaded IPP compare?
Assuming no race conditions, performance problems with writing to shared arrays are most likely to occur in cache lines where part of the line is written by one thread and another part is read by another.
It's likely to require a data region larger than 10 megabytes or so before full parallel speedup is seen.
You would need deeper analysis, e.g. by Intel VTune Amplifier, to see whether memory bandwidth or data overlaps are limiting performance.
Using the Intel IPP filter, the best solution was:
int height = dstRoiSize.height;
int width = dstRoiSize.width;
Ipp32f *pSrc1, *pDst1;
int nThreads, cH, cT;
#pragma omp parallel shared( pSrc, pDst, nThreads, width, height, kernelSize,\
xAnchor, cH, cT ) private( pSrc1, pDst1 )
{
#pragma omp master
{
nThreads = omp_get_num_threads();
cH = height / nThreads;
cT = height % nThreads;
}
#pragma omp barrier
{
int curH;
int id = omp_get_thread_num();
pSrc1 = (Ipp32f*)( (Ipp8u*)pSrc + id * cH * srcStep );
pDst1 = (Ipp32f*)( (Ipp8u*)pDst + id * cH * dstStep );
if( id != ( nThreads - 1 )) curH = cH;
else curH = cH + cT;
IppiSize roiSize = { width, curH };
ippiFilterRow_32f_C1R( pSrc1, srcStep, pDst1, dstStep,
                       roiSize, pKernel, kernelSize, xAnchor );
}
}
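To make the snippet self-contained, here is a hypothetical caller sketch showing the variables it assumes (names match the snippet; allocating with ippiMalloc_32f_C1 so the rows are IPP-aligned is one reasonable choice):
const int width = 1920, height = 1080;
IppiSize dstRoiSize = { width, height };
int srcStep, dstStep;   /* row strides in bytes, filled in by ippiMalloc */
Ipp32f *pSrc = ippiMalloc_32f_C1(width, height, &srcStep);
Ipp32f *pDst = ippiMalloc_32f_C1(width, height, &dstStep);
Ipp32f pKernel[7] = { 1, 2, 3, 4, 3, 2, 1 };
int kernelSize = 7;
int xAnchor = 3;
/* ... fill pSrc with image data, then run the parallel region above ... */
ippiFree(pSrc);
ippiFree(pDst);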
Thank You.
I've read these question:
Using the Apple FFT and Accelerate Framework
How do I set up a buffer when doing an FFT using the Accelerate framework?
iOS FFT Accerelate.framework draw spectrum during playback
They all describe how to set up an FFT with the Accelerate framework. With their help I was able to set up the FFT and get a basic spectrum analyzer. Right now, I'm displaying all the values I get from the FFT. However, I only want to show 10-15, or a variable number, of bars representing certain frequencies, just like the iTunes or WinAmp level meter.
1. Do I need to average magnitude values from a range of frequencies? Or does each bar just show the magnitude of a specific frequency?
2. Also, do I need to convert my magnitude values to dB?
3. How do I map my data to a certain range? Do I map against the max dB range for my sound's bit depth? Using the max value of a bin leads to a jumping mapping maximum.
My RenderCallback:
static OSStatus PlaybackCallback(void *inRefCon,
AudioUnitRenderActionFlags *ioActionFlags,
const AudioTimeStamp *inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList *ioData)
{
UInt32 maxSamples = kAudioBufferNumFrames;
UInt32 log2n = log2f(maxSamples); //bins
UInt32 n = 1 << log2n;
UInt32 stride = 1;
UInt32 nOver2 = n/2;
COMPLEX_SPLIT A;
float *originalReal, *obtainedReal, *frequencyArray, *window, *in_real;
in_real = (float *) malloc(maxSamples * sizeof(float));
A.realp = (float *) malloc(nOver2 * sizeof(float));
A.imagp = (float *) malloc(nOver2 * sizeof(float));
memset(A.imagp, 0, nOver2 * sizeof(float));
obtainedReal = (float *) malloc(n * sizeof(float));
originalReal = (float *) malloc(n * sizeof(float));
frequencyArray = (float *) malloc(n * sizeof(float));
//-- window
UInt32 windowSize = maxSamples;
window = (float *) malloc(windowSize * sizeof(float));
memset(window, 0, windowSize * sizeof(float));
// vDSP_hann_window(window, windowSize, vDSP_HANN_DENORM);
vDSP_blkman_window(window, windowSize, 0);
vDSP_vmul(ioBuffer, 1, window, 1, in_real, 1, maxSamples);
//-- window
vDSP_ctoz((COMPLEX*)in_real, 2, &A, 1, maxSamples/2);
vDSP_fft_zrip(fftSetup, &A, stride, log2n, FFT_FORWARD);
vDSP_fft_zrip(fftSetup, &A, stride, log2n, FFT_INVERSE);
float scale = (float) 1.0 / (2 * n);
vDSP_vsmul(A.realp, 1, &scale, A.realp, 1, nOver2);
vDSP_vsmul(A.imagp, 1, &scale, A.imagp, 1, nOver2);
vDSP_ztoc(&A, 1, (COMPLEX *) obtainedReal, 2, nOver2);
vDSP_zvmags(&A, 1, obtainedReal, 1, nOver2);
Float32 one = 1;
vDSP_vdbcon(obtainedReal, 1, &one, obtainedReal, 1, nOver2, 0);
for (int i = 0; i < nOver2; i++) {
frequencyArray[i] = obtainedReal[i];
}
// Extract the maximum value
double fftMax = 0.0;
vDSP_maxmgvD((double *)obtainedReal, 1, &fftMax, nOver2);
float max = sqrt(fftMax);
}
Playing some music, I get values from -96 dB to 0 dB.
Plotting a point at:
CGPointMake(i, kMaxSpectrumHeight * (1 - frequencyArray[i]/-96.));
is giving me a rather rounded curve:
plot1
If I don't convert to dB, I can plot by multiplying my array value by 10000 and get nice peaks.
plot2
Am I doing something totally wrong? And how do I get to showing a variable number of bars?
Do I need to average magnitude values from a range of frequencies? Or does each bar just show the magnitude of a specific frequency?
Yes, you definitely need to average values across the bands you've defined. Showing just one FFT bin is madness.
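For instance, a minimal band-averaging sketch over the nOver2 magnitudes in your frequencyArray (kNumBars is a hypothetical bar count; linear grouping is shown, but log-spaced bands are common for level meters too):
const int kNumBars = 10;                    // hypothetical number of bars
int binsPerBar = nOver2 / kNumBars;         // adjacent bins averaged into each bar
float bars[kNumBars];
for (int bar = 0; bar < kNumBars; bar++) {
    float sum = 0.0f;
    for (int k = 0; k < binsPerBar; k++) {
        sum += frequencyArray[bar * binsPerBar + k];
    }
    bars[bar] = sum / binsPerBar;           // average magnitude for this band
}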
Also, do I need to convert my magnitude values to dB?
Yes: dB is a log scale. Not coincidentally, human hearing also works (roughly) on a log scale. The values will therefore look more natural to humans if you take log2() of the values before plotting them.
How do I map my data to a certain range? Do I map against the max dB range for my sound's bit depth? Using the max value of a bin leads to a jumping mapping maximum.
I find the easiest thing to do (conceptually at least) is to convert your values from whatever format into a normalised, scaled float value in the range 0..1. From there you can convert, if necessary, to whatever you need to plot. For example:
SInt16 rawValue = fft[0]; // let's say this comes back as 12990
float scaledValue = rawValue/32767.; // 32767 is the max value for a signed 16-bit sample;
// dividing we get .396435438 which is much easier for most people
// to see conceptually as 39% of our max possible value
float displayValue = log2(scaledValue);
my_fft[0] = displayValue;
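And to map a dB value onto a 0..1 bar height, one simple (hypothetical) choice is to clamp against a fixed floor, e.g. the -96 dB you're observing:
// Map dB in [-96, 0] onto a 0..1 bar height; values below the floor clamp to 0.
static inline float barHeightForDb(float db) {
    float h = (db + 96.0f) / 96.0f;   // -96 dB -> 0.0, 0 dB -> 1.0
    if (h < 0.0f) h = 0.0f;
    if (h > 1.0f) h = 1.0f;
    return h;
}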
UPDATE 2016-03-15
Please take a look at this project: https://github.com/ooper-shlab/aurioTouch2.0-Swift. It has been ported to Swift and contains every answer you're looking for, if you came here searching for one.
I did a lot of research and learned a lot about FFT and the Accelerate Framework. But after days of experiments I'm kind of frustrated.
I want to display the frequency spectrum of an audio file during playback in a diagram. For every time interval it should show the magnitude in dB on the Y-axis (displayed by a red bar) for every frequency (in my case 512 values), calculated by an FFT, on the X-axis.
The output should look like this:
I fill a buffer with 1024 samples extracting only the left channel for the beginning. Then I do all this FFT stuff.
Here is my code so far:
Setting up some variables
- (void)setupVars
{
maxSamples = 1024;
log2n = log2f(maxSamples);
n = 1 << log2n;
stride = 1;
nOver2 = maxSamples/2;
A.realp = (float *) malloc(nOver2 * sizeof(float));
A.imagp = (float *) malloc(nOver2 * sizeof(float));
memset(A.imagp, 0, nOver2 * sizeof(float));
obtainedReal = (float *) malloc(n * sizeof(float));
originalReal = (float *) malloc(n * sizeof(float));
setupReal = vDSP_create_fftsetup(log2n, FFT_RADIX2);
}
Doing the FFT. FrequencyArray is just a data structure that holds 512 float values.
- (FrequencyArry)performFastFourierTransformForSampleData:(SInt16*)sampleData andSampleRate:(UInt16)sampleRate
{
NSLog(#"log2n %i n %i, nOver2 %i", log2n, n, nOver2);
// n = 1024
// log2n 10
// nOver2 = 512
for (int i = 0; i < n; i++) {
originalReal[i] = (float) sampleData[i];
}
vDSP_ctoz((COMPLEX *) originalReal, 2, &A, 1, nOver2);
vDSP_fft_zrip(setupReal, &A, stride, log2n, FFT_FORWARD);
float scale = (float) 1.0 / (2 * n);
vDSP_vsmul(A.realp, 1, &scale, A.realp, 1, nOver2);
vDSP_vsmul(A.imagp, 1, &scale, A.imagp, 1, nOver2);
vDSP_ztoc(&A, 1, (COMPLEX *) obtainedReal, 2, nOver2);
FrequencyArry frequencyArray;
for (int i = 0; i < nOver2; i++) {
frequencyArray.frequency[i] = log10f(obtainedReal[i]); // Magnitude in db???
}
return frequencyArray;
}
The output always looks kind of weird, although it somehow seems to move with the music.
I'm happy that I came this far thanks to some very good posts here, like this one:
Using the apple FFT and accelerate Framework
But now I don't know what to do. What am I missing?
Firstly you're not applying a window function prior to the FFT - this will result in smearing of the spectrum due to spectral leakage.
Secondly, you're just using the real component of the FFT output bins to calculate dB magnitude - you need to use the complex magnitude:
magnitude_dB = 10 * log10(re * re + im * im);
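A minimal sketch of both fixes with vDSP, reusing the variable names from your FFT method (the vDSP_ctoz / vDSP_fft_zrip calls and scaling stay as they are; the Hann window here is an arbitrary but common choice):
// 1) Window the input before the FFT to reduce spectral leakage.
float window[1024];                               // maxSamples == 1024 in setupVars
float windowed[1024];
vDSP_hann_window(window, maxSamples, vDSP_HANN_NORM);
vDSP_vmul(originalReal, 1, window, 1, windowed, 1, maxSamples);

// ... feed 'windowed' into vDSP_ctoz, then vDSP_fft_zrip and the scaling as before ...

// 2) Use the complex magnitude of each bin when converting to dB.
for (int i = 0; i < nOver2; i++) {
    float re = A.realp[i];
    float im = A.imagp[i];
    frequencyArray.frequency[i] = 10.0f * log10f(re * re + im * im + 1e-12f);  // epsilon avoids log(0)
}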