This code mimics some image processing with malloc'ed memory; it is a distilled example of a problem. It runs fine at other optimization levels, including "Fastest, Smallest", but fails with GCC_OPTIMIZATION_LEVEL = 3, a.k.a. Fastest [-O3], and with Fastest, Aggressive. It crashes only on the device; I have seen it on an iPhone 6, 5s, and 5 running various iOS versions (9.3, 8.4).
There's something about the allocation sizes that aggravates the issue. There are some notes in the code about what helps make it fail.
To reproduce, create a single view app project, set the optimization level to "Fastest", and either paste this code into main and call it from inside the autorelease pool, or paste it into the view controller and call it from viewDidLoad or anywhere you like.
The debugger isn't very useful with optimizations turned on, but the crash comes in the while loop at "*writeIter = readIter->d;" with an EXC_BAD_ACCESS code=1.
That tells me it happens during a read, and the address that triggers the EXC_BAD_ACCESS is the same as readEnd. That should never happen, since that is exactly the condition the while loop is supposed to prevent. Optimizer bug or stupid mistake?
#import <stdlib.h>
#import <stdio.h>
/**
Requires this to fail -> GCC_OPTIMIZATION_LEVEL = 3
This won't do it -> GCC_OPTIMIZATION_LEVEL = s
*/
typedef struct {
unsigned char a, b, c, d;
} foo;
void boom()
{
char* memory[1000];
// these sizes are important to reproducing this issue, changing them by +-1 will make it go away
int height = 960; //480,960,1920
int width = 1280; //640,1280,2560
int depth = sizeof(foo);
printf("height = %d, width = %d, total = %d\n\n", height, width, height*width*depth);
for (int i = 0; i < 1000; ++i)
{
memory[i] = malloc(20000); // allocate memory to force the allocations of readBuf and writeBuf to move; sizes
// less than 15k don't affect the malloc'ed addresses of the bufs, so we keep getting
// the same ones and no boom.
foo* readBuf = malloc(height*width*depth);
unsigned char* writeBuf = malloc(height*width); // smaller than read
foo *readIter = readBuf;
foo *readEnd = readBuf + height*width; // only read size of smaller
unsigned char* writeIter = writeBuf;
printf("test: i = %d, readIter = %p, readEnd = %p, writeIter = %p\n", i, readIter, readEnd, writeIter);
while (readIter < readEnd)
{
*writeIter = readIter->d; // you died here during a read, and readIter == readEnd, look at the EXC_BAD_ACCESS address
// (printfed) it's readEnd, and that isn't supposed to happen with the conditional.
++writeIter;
++readIter;
}
free(readBuf);
free(writeBuf);
}
for (int i = 0; i < 1000; ++i)
{
free(memory[i]);
}
}
First, this is my code:
#pragma pack (4)
typedef struct _Login{
char user[32];
char pwd[32];
int userID;
}Login,*PLogin;
const unsigned long MSG_TAG_HEADER_YXHY = 0x59485859;
#pragma pack (2)
typedef struct tagTcpPacketHeader
{
int ulHtag;
char ucVersion;
char ucCmd;
int ulUserId;
short usPacketNum;
int ulDataLen;
}TcpPacketHeader,*LPTcpPacketHeader;
#pragma pack ()
const unsigned int TCP_HEADER_PACKET_LEN = sizeof(TcpPacketHeader);
- (NSData*)sendDataFileWithUserId:(const int)nUserId nCmd:(const int)nCmd pData:(NSData*)data{
NSData* sendData;
void* sendObj = malloc(data.length);
[data getBytes:sendObj length:data.length];
static int nPacketNum = 0;
int nLen = (int)data.length + TCP_HEADER_PACKET_LEN;
char *pTmpBuf = malloc(nLen);
LPTcpPacketHeader tcpHeader = (LPTcpPacketHeader)pTmpBuf;
tcpHeader->ulHtag = MSG_TAG_HEADER_YXHY;
tcpHeader->ucVersion = 1;
tcpHeader->ucCmd = nCmd;
tcpHeader->ulUserId = nUserId;
tcpHeader->usPacketNum = nPacketNum;
tcpHeader->ulDataLen = nLen;
memcpy(tcpHeader + TCP_HEADER_PACKET_LEN,sendObj, data.length);
sendData = [NSData dataWithBytes:pTmpBuf length:nLen];
nPacketNum++;
free(pTmpBuf);
free(sendObj);
return sendData;
}
- (NSData*)get_File_Login:(NSString*)userID{
int length = sizeof(Login);
Login log_in = {"123","456",userID.intValue};
NSData* login_data = [NSData dataWithBytes:&log_in length:length];
NSData* ret = [self sendDataFileWithUserId:log_in.userID nCmd:5 pData:login_data];
return ret;
}
Usage:
NSData* ms = [self get_File_Login:@"123"];
NSLog(@"%@",ms);
After frequent use, this can cause a problem.
Question
This problem is giving me a headache: why does the message "set a breakpoint in malloc_error_break to debug" appear?
I have added a breakpoint on malloc_error_break, but it doesn't work.
Can anyone tell me the answer?
When you use the pointer in memcpy this way
memcpy(tcpHeader + TCP_HEADER_PACKET_LEN,sendObj, data.length);
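This means that you are copying into the memory location pointed to by tcpHeader plus TCP_HEADER_PACKET_LEN times the size of the type the pointer points to. It is the same as doing &tcpHeader[TCP_HEADER_PACKET_LEN], i.e. TCP_HEADER_PACKET_LEN * sizeof(TcpPacketHeader) bytes past the start of the buffer, which overruns the allocation.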
Assuming you want to write to a location right after the header there are two ways to fix it:
1) Use a pointer whose element size is 1, meaning a char*. Your code already has such a pointer, pTmpBuf, so just change the code to:
memcpy(pTmpBuf + TCP_HEADER_PACKET_LEN, sendObj, data.length);
2) Use an offset of 1 in this calculation. Since the size of the type tcpHeader points to is the same as TCP_HEADER_PACKET_LEN, adding one advances the pointer to the correct location:
memcpy(tcpHeader + 1, sendObj, data.length);
I would recommend the first since it's clear what you are calculating. In the second it is unclear why you would add one, as well as using a pointer to one type when copying data that isn't that type.
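The scaling rule described above can be checked with a small C sketch. The demo_header type and both helper functions are hypothetical names made up for illustration; they are not the TcpPacketHeader from the question. The first helper measures how many bytes `header + n` actually advances, and the second packs a payload directly after a header using char* arithmetic, as in option 1.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte header, used only for illustration. */
typedef struct {
    int32_t tag;
    int32_t user_id;
    int32_t packet_num;
    int32_t data_len;
} demo_header;

/* Number of bytes that `header + n` advances for a demo_header pointer.
   The result is n * sizeof(demo_header), not n. Valid for n <= 8. */
static size_t header_step_bytes(size_t n)
{
    static demo_header slots[8];
    demo_header *header = slots;
    return (size_t)((char *)(header + n) - (char *)header);
}

/* Copies the header into buf and the payload right after it.
   buf must hold at least sizeof(demo_header) + len bytes.
   Returns the total number of bytes written. */
static size_t pack_packet(char *buf, const demo_header *h,
                          const void *payload, size_t len)
{
    memcpy(buf, h, sizeof *h);
    memcpy(buf + sizeof *h, payload, len);  /* char*: offset is in bytes */
    return sizeof *h + len;
}
```

With this layout, header_step_bytes(1) equals sizeof(demo_header), which is why `tcpHeader + 1` and `pTmpBuf + TCP_HEADER_PACKET_LEN` land on the same address.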
I am trying to build an iOS 7 application that detects the pitch (frequency) of a sound or song, for example 349.23 Hz, 392.00 Hz, 440.00 Hz, and so on.
So I downloaded the "Auto Correlation" project (from Musician's Kit, http://musicianskit.com/developer.php). It works fine on the iOS 7 Simulator: the Hanning FFT window has values (not NaN), and I can get the frequency in the end.
But it doesn't work on an iPhone device: the Hanning FFT window never gets any values.
Can anybody have a look at these classes by Kevin Murphy and tell me how I could modify them to work on an iPhone device (not the iOS Simulator)?
Many thanks!
I've pasted my code below:
// PitchDetector.m
-(id) initWithSampleRate: (float) rate lowBoundFreq: (int) low hiBoundFreq: (int) hi andDelegate: (id<PitchDetectorDelegate>) initDelegate {
self.lowBoundFrequency = low;
self.hiBoundFrequency = hi;
self.sampleRate = rate;
self.delegate = initDelegate;
bufferLength = self.sampleRate/self.lowBoundFrequency;
hann = (float*) malloc(sizeof(float)*bufferLength);
// applied the Hanning windows, the 'hann' is the Hanning fft Window
vDSP_hann_window(hann, bufferLength, vDSP_HANN_NORM);
sampleBuffer = (SInt16*) malloc(512);
samplesInSampleBuffer = 0;
result = (float*) malloc(sizeof(float)*bufferLength);
return self;
}
-(void) performWithNumFrames: (NSNumber*) numFrames;
{
int n = numFrames.intValue;
float freq = 0;
SInt16 *samples = sampleBuffer;
int returnIndex = 0;
float sum;
bool goingUp = false;
float normalize = 0;
for(int i = 0; i<n; i++) {
sum = 0;
for(int j = 0; j<n; j++) {
//here I found the hann[j] is NaN. seems doesn't have value in hann('hann' is the Hanning fft Window)
//if hann[j] is Not a Number (NaN), the value of sum also to be NaN.
sum += (samples[j]*samples[j+i])*hann[j];
}
if(i ==0 ) normalize = sum;
result[i] = sum/normalize;
}
......
......
}
I am using this same program from:
https://github.com/fotock/PitchDetectorExample/tree/1c68491f9c9bff2e851f5711c47e1efe4092f4de
Although I have not put this on an iPhone yet, only the simulator, I was having problems from time to time with the program crashing. I found that I needed to manually update it from a "fork" of the code on GitHub found here:
https://github.com/fotock/PitchDetectorExample/network
I added Jordan Liggitt's bug fixes manually and now the app does not crash. I hope this helps because if it does not, then I will be facing the same issues when I load this app on an iPhone.
Hope it works!
Update
I have now installed this on an iPhone vs the simulator and it works as it should without errors or crashing.
I want to count the total number of non-zero points in an image using OpenCL.
Since this is an accumulation, I used atom_inc.
The kernel code is shown here.
__kernel void points_count(__global unsigned char* image_data, __global int* total_number, __global int image_width)
{
size_t gidx = get_global_id(0);
size_t gidy = get_global_id(1);
if(0!=*(image_data+gidy*image_width+gidx))
{
atom_inc(total_number);
}
}
My question is: isn't using atom_inc like this wasteful?
Whenever we meet a non-zero point, we have to wait for the atom_inc.
My idea is to split each row into hundreds of groups, count the points in each group, and add the group counts at the end.
If we can do something like this:
__kernel void points_count(__global unsigned char* image_data, __global int* total_number_array, __global int image_width)
{
size_t gidx = get_global_id(0);
size_t gidy = get_global_id(1);
if(0!=*(image_data+gidy*image_width+gidx))
{
int stepy=gidy%10;
atom_inc(total_number_array+stepy);
}
}
This splits the whole problem into more groups.
Afterwards, we can add up the numbers in total_number_array one by one.
Theoretically speaking, this should give a big performance improvement, right?
So, does anyone have advice about the summing issue here?
Thanks!
As mentioned in the comments, this is a reduction problem.
The idea is to keep separate counts and then put them back together at the end.
Consider using local memory to store the values.
Declare a local buffer to be used by each work group.
Keep track of the number of occurrences in this buffer by using the local_id as the index.
Sum these values at the end of execution.
A very good introduction to the reduction problem using OpenCL is shown here:
http://developer.amd.com/resources/documentation-articles/articles-whitepapers/opencl-optimization-case-study-simple-reductions/
The reduction kernel could look like this (taken from the link above):
__kernel
void reduce(
__global float* buffer,
__local float* scratch,
__const int length,
__global float* result) {
int global_index = get_global_id(0);
int local_index = get_local_id(0);
// Load data into local memory
if (global_index < length) {
scratch[local_index] = buffer[global_index];
} else {
// Infinity is the identity element for the min operation
scratch[local_index] = INFINITY;
}
barrier(CLK_LOCAL_MEM_FENCE);
for(int offset = get_local_size(0) / 2;
offset > 0;
offset >>= 1) {
if (local_index < offset) {
float other = scratch[local_index + offset];
float mine = scratch[local_index];
scratch[local_index] = (mine < other) ? mine : other;
}
barrier(CLK_LOCAL_MEM_FENCE);
}
if (local_index == 0) {
result[get_group_id(0)] = scratch[0];
}
}
For further explanation see the proposed link.
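Note that the kernel above computes a minimum, while the question asks for a count, so the combining step would be a sum with 0 as the identity element. As a rough illustration of the two-stage idea only, here is a serial C sketch run on the host. It is not OpenCL; GROUP_SIZE and both function names are assumptions introduced for this example, with each "group" standing in for one work-group that produces a single partial count.

```c
#include <stddef.h>

/* Hypothetical group size, standing in for the OpenCL work-group size. */
enum { GROUP_SIZE = 64 };

/* Counts non-zero bytes in data[begin..end). This is the partial result
   one work-group would build up in local memory. */
static int count_group(const unsigned char *data, size_t begin, size_t end)
{
    int count = 0;
    for (size_t i = begin; i < end; ++i)
        count += (data[i] != 0);
    return count;
}

/* Two-stage count: one partial count per group, then a final sum. */
static int count_nonzero(const unsigned char *data, size_t length)
{
    int total = 0;
    for (size_t begin = 0; begin < length; begin += GROUP_SIZE) {
        size_t end = begin + GROUP_SIZE;
        if (end > length)
            end = length;               /* last group may be partial */
        total += count_group(data, begin, end);
    }
    return total;
}
```

On a device, the point of this structure is that only one value per group has to be combined globally, rather than one atomic increment per non-zero pixel.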
I have posted a screenshot of my error.
[screenshot: heights output]
Can anyone please help me?
I think the static analyzer is not seeing how _numberOfColumns can become non-zero, and hence its insistence that garbage is being assigned. You need to check that you are actually providing some means for _numberOfColumns to become non-zero.
Generally when I am writing loops that want to find the largest or the smallest value, I initialize the size variable to the largest (if I want the smallest) or smallest (if I want the largest) amount, and I think this will solve most of your issues:
float shortestHeight = FLT_MAX;
for (unsigned i = 0; i < _numberOfColumns; i++)
{
// etc.
}
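The initialization pattern above can be written out as a complete function. This is a minimal sketch in plain C; the function name and the choice to return FLT_MAX for an empty array are assumptions for illustration, not taken from the code in the screenshot.

```c
#include <float.h>
#include <stddef.h>

/* Returns the smallest value in heights[0..count).
   If count is 0, nothing is read and FLT_MAX is returned. */
static float shortest_height(const float *heights, size_t count)
{
    float shortest = FLT_MAX;   /* start at the largest possible value */
    for (size_t i = 0; i < count; ++i) {
        if (heights[i] < shortest)
            shortest = heights[i];
    }
    return shortest;
}
```

Because the result starts from FLT_MAX rather than from heights[0], the function never reads an element that might not exist.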
The analyzer is correct. Your code will access garbage memory if _numberOfColumns is 0, thus allocating 0 bytes for heights, making heights[0] garbage. The analyzer doesn't know what values _numberOfColumns can have, but you can tell it by using assert(_numberOfColumns>0).
Take this C program for example:
#include <stdlib.h>
int main(int argc, const char * argv[])
{
int n = argc-1;
int *a = malloc(n*sizeof(int));
for (int i=0; i<n; i++) {
a[i] = i;
}
int foo = a[0];
free(a);
return foo;
}
The size of a is determined by the number of arguments. If you pass no arguments, n == 0, so malloc is asked for 0 bytes and a[0] reads garbage. If you are sure that your program (or just that part of your program) will always assign something greater than 0 to n, you can use an assertion. Adding assert(n>0) will tell the analyzer exactly that.
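For reference, the same logic with the assertion in place might look like the sketch below. The function name first_element and the -1 return value on allocation failure are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Fills an n-element array with 0..n-1 and returns its first element. */
static int first_element(int n)
{
    assert(n > 0);  /* states the precondition for readers and the analyzer */
    int *a = malloc((size_t)n * sizeof *a);
    if (a == NULL)
        return -1;  /* allocation failed; -1 is an arbitrary sentinel here */
    for (int i = 0; i < n; i++)
        a[i] = i;
    int first = a[0];   /* safe: n > 0 guarantees a[0] was written */
    free(a);
    return first;
}
```

Keep in mind that assert is compiled out when NDEBUG is defined, so in release builds it documents the precondition but does not enforce it.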
I am trying to eliminate memory leaks in my following program
int main (int argc, char **argv) {
node_ref head = NULL;
for (int argi = 0; argi < 5; ++argi) {
node_ref node = malloc (sizeof (struct node));
assert (node != NULL);
node->word = argv[argi];
node->link = head;
head = node;
}
for (node_ref curr = head; curr->link != NULL; curr = curr->link) {
printf ("%p->node {word=%p->[%s], link=%p}\n",
curr, curr->word, curr->word, curr->link);
}
while(head != NULL){
struct node* temp;
temp = head;
head++;
free(temp);
}
return 9;
}
Yet when I run valgrind it goes crazy with memory leaks, any idea here on what I'm doing wrong?
You are allocating memory inside the loop which is resulting in multiple memory areas. It looks like you should be calling malloc() before your loop instead.
EDIT:
After looking at this again, I think the loop that frees the memory is incorrect. You are incrementing with head++ rather than setting head = temp->link before calling free(temp). It is incorrect to assume that malloc will give you contiguous memory segments.
In your free loop, you use head++ which will give you garbage. You want head = head->link
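A corrected free loop could look like the following sketch. The question does not show the definition of struct node, so the one below is an assumption based on the fields it uses (word and link); push and free_list are hypothetical helper names.

```c
#include <stdlib.h>

/* Assumed layout, inferred from the fields used in the question. */
struct node {
    const char *word;
    struct node *link;
};

/* Allocates a node in front of head. Returns the new head,
   or NULL if malloc fails (the existing list is left untouched). */
static struct node *push(struct node *head, const char *word)
{
    struct node *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->word = word;
    node->link = head;
    return node;
}

/* Frees every node in the list and returns how many were freed. */
static size_t free_list(struct node *head)
{
    size_t freed = 0;
    while (head != NULL) {
        struct node *next = head->link;  /* read the link before freeing */
        free(head);
        head = next;
        ++freed;
    }
    return freed;
}
```

The important detail is the order: the link is read while the node is still valid, and only then is the node freed.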