Need urgent help with threads. The goal here is that separateMask takes each image, separates the different contours, and for each contour in the image starts a handleObject thread. So every iteration of the for loop starts a handleObject thread. However, the object index variable needs to be passed to each thread, and only the last value of objectIndex is passed. This is because the separateMask function loops and replaces the value of obj.objindx, so only the last value of obj.objindx is passed to all the threads. Is there any way to pass each objectIndex value to handleObject? The code runs fine if we uncomment pthread_join(tid[objectIndex],NULL); but then the program is no longer parallel.
void separateMask(IplImage *maskImg)
{
for(r = contours; r != NULL; r = r->h_next)
{
cvSet(objectMaskImg, cvScalarAll(0), NULL);
CvScalar externalColor = cvScalarAll(0xff);
CvScalar holeColor = cvScalarAll(0x00);
int maxLevel = -1;
int thinkness = CV_FILLED;
int lineType = 8; /* 8-connected */
cvDrawContours(objectMaskImg, r, externalColor, holeColor, maxLevel, thinkness,lineType, cvPoint(0,0));;
obj.objectMaskImg1[objectIndex]=(IplImage *) malloc(sizeof(IplImage));
obj.objectMaskImg1[objectIndex]=objectMaskImg;
obj.objindx=objectIndex;
obj.intensityOut1=intensityOut;
obj.tasOut1=tasOut;
pthread_create(&tid[objectIndex],NULL,handleObject,(void *)&obj);
//pthread_join(tid[objectIndex],NULL);
printf("objectindx %d\n",obj.objindx);
objectIndex++;
}
// cvReleaseImage(&objectMaskImg);
//cvReleaseMemStorage(&storage);
printf("Exitng Separatemask\n");
}
void* handleObject(void *arg)
{
int i, j;
handle *hndl;
hndl=(handle *) malloc(sizeof(handle));
hndl=(handle*)arg;
pthread_mutex_t lock=PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_lock(&lock);
IplImage *pImg;
float statistics_ratio[3][9];
pthread_t tid3;
tas3 tas2;
pImg = cvLoadImage("image.tif", CV_LOAD_IMAGE_ANYCOLOR | CV_LOAD_IMAGE_ANYDEPTH);
if(pImg == NULL)
{
fprintf(stderr, "Fail to load image %s\n", "tiff file");
return ;
}
tas2.pImg1=pImg;
printf("tst%d\n",hndl->objindx);
tas2.x=hndl->objindx;
tas2.objectMaskImg1=hndl->objectMaskImg1[tas2.x];
tas2.statistics_ratio[3][9]=statistics_ratio[3][9];
double mean = average_intensity(pImg, tas2.objectMaskImg1);
int total = total_white(pImg, tas2.objectMaskImg1);
pthread_mutex_unlock(&lock);
printf("Exiting handle object thread_id %d\n\n", pthread_self());
}
This function appears to have issues
void* handleObject(void *arg)
Firstly
pthread_mutex_t lock=PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_lock(&lock);
This is a locally created mutex, created WITHIN the thread function. You lock it, but since nothing else can see the mutex, why do you need it? It provides no synchronization if no other thread can see it.
Secondly
float statistics_ratio[3][9];
pthread_t tid3;
tas3 tas2;
pImg = cvLoadImage("image.tif", CV_LOAD_IMAGE_ANYCOLOR | CV_LOAD_IMAGE_ANYDEPTH);
if(pImg == NULL){
fprintf(stderr, "Fail to load image %s\n", "tiff file");
return ;
}
tas2.pImg1=pImg;
printf("tst%d\n",hndl->objindx);
tas2.x=hndl->objindx;
tas2.objectMaskImg1=hndl->objectMaskImg1[tas2.x];
tas2.statistics_ratio[3][9]=statistics_ratio[3][9];
You create a local uninitialised 2D float array statistics_ratio, do nothing with it, then assign it to a member of another locally created object (and the index [3][9] is out of bounds for a [3][9] array, so that line copies one nonexistent element rather than the array). This appears to be meaningless, as does the declaration of another pthread_t instance, tid3.
It doesn't really matter since nothing else can see the mutex, but you return from inside this function if pImg == NULL without first unlocking it.
It is very hard to see why your code doesn't work or what it is meant to do, but perhaps the things highlighted above may help. You are creating a lot of local variables within your thread functions which are not being used. I am not sure if you need some of these to be global instead, particularly the mutex (if indeed you need one at all).
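To illustrate the point about visibility, here is a minimal sketch of a mutex that does provide synchronization, because every thread locks the same object. The names g_lock, g_total, addToTotal and countWithSharedMutex are invented for this example and do not come from the question's code.

```cpp
#include <pthread.h>
#include <cassert>
#include <vector>

// One mutex shared by all threads (file scope). A mutex declared inside the
// thread function would instead be a separate object in every thread.
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static long g_total = 0;  // shared state protected by g_lock

// Thread function: adds *arg to the shared total, one increment at a time.
extern "C" void *addToTotal(void *arg) {
    const long n = *static_cast<const long *>(arg);
    for (long i = 0; i < n; ++i) {
        pthread_mutex_lock(&g_lock);
        ++g_total;
        pthread_mutex_unlock(&g_lock);
    }
    return nullptr;
}

// Starts `threads` workers that each add `perThread`, waits for all of them,
// and returns the final total.
long countWithSharedMutex(int threads, long perThread) {
    assert(threads >= 0);
    g_total = 0;
    std::vector<pthread_t> tids;
    for (int i = 0; i < threads; ++i) {
        pthread_t tid;
        // perThread is only read by the workers and outlives them,
        // because every worker is joined before this function returns.
        if (pthread_create(&tid, nullptr, addToTotal, &perThread) == 0) {
            tids.push_back(tid);
        }
    }
    for (pthread_t tid : tids) {
        pthread_join(tid, nullptr);
    }
    return g_total;
}
```

With the lock shared, the increments from different threads cannot interleave, so the total is the sum of all per-thread counts.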
I think your initial problem is that you're reusing the obj structure that you're passing to the created threads so you'll have data races where the thread just created will read information that's been overwritten with data intended for another thread.
The loop that creates the threads has the following structure:
for(r = contours; r != NULL; r = r->h_next)
{
// initialize obj with information for the thread
// ...
// create a thread and pass it a pointer to obj
pthread_create(&tid[objectIndex],NULL,handleObject,(void *)&obj);
// some other bookkeeping
}
Since you immediately re-initialize obj on the next loop iteration, who knows what data the thread function is going to get? That's why things work if you join the thread after creating it - the obj structure remains stable because the loop blocks until the thread is finished.
Change the loop to look something like:
for(r = contours; r != NULL; r = r->h_next)
{
// instead of using `obj`, allocate a struct using malloc
handle* threaddata = malloc(sizeof(handle)); // note: I'm not sure if `handle` is the right type
// initialize *threaddata with information for the thread
// ...
// create a thread and pass it the threaddata pointer
pthread_create(&tid[objectIndex],NULL,handleObject,threaddata);
// some other bookkeeping
}
Then free() the data in the thread function after it is finished with it (i.e., the thread creation code creates and initializes the block of data, then passes ownership of it to the thread).
Note that this might not be as straightforward as it often is, because it looks like your obj structure already has some per-thread information in it (the objectMaskImg1 element looks to be an array, with each element intended for a separate thread). So you might need to do some refactoring of the data structure as well.
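The ownership hand-off described above can be sketched as follows. This is an illustration under assumptions, not the asker's code: ThreadData stands in for the `handle` struct (which is not shown in full), handleOne and runWorkers are hypothetical names, and the sketch uses C++ new/delete where the C code in the answer uses malloc()/free().

```cpp
#include <pthread.h>
#include <cassert>
#include <vector>

// Hypothetical per-thread payload standing in for the question's `handle`.
struct ThreadData {
    int objectIndex;  // the value this particular thread should see
    int *result;      // slot where the thread stores its output
};

// Thread function: takes ownership of the ThreadData it is given.
extern "C" void *handleOne(void *arg) {
    ThreadData *data = static_cast<ThreadData *>(arg);
    *data->result = data->objectIndex * 10;  // stand-in for the real work
    delete data;                             // the thread frees its own block
    return nullptr;
}

// Starts one thread per index, each with its own heap-allocated argument,
// then waits for all of them and returns what they produced.
std::vector<int> runWorkers(int count) {
    assert(count >= 0);
    std::vector<int> results(count, -1);  // never resized while threads run
    std::vector<pthread_t> tids;
    for (int i = 0; i < count; ++i) {
        ThreadData *data = new ThreadData{i, &results[i]};
        pthread_t tid;
        if (pthread_create(&tid, nullptr, handleOne, data) == 0) {
            tids.push_back(tid);
        } else {
            delete data;  // no thread took ownership, so clean up here
        }
    }
    for (pthread_t tid : tids) {
        pthread_join(tid, nullptr);
    }
    return results;
}
```

Because each thread receives its own block, the loop can move on to the next iteration immediately without changing what an already-started thread reads.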
Finally, there are several other outright bugs such as immediately overwriting pointers to blocks allocated by malloc():
obj.objectMaskImg1[objectIndex]=(IplImage *) malloc(sizeof(IplImage));
obj.objectMaskImg1[objectIndex]=objectMaskImg;
and
hndl=(handle *) malloc(sizeof(handle));
hndl=(handle*)arg;
This is in addition to the pointless use of the mutex in the handleObject() thread function, as mentioned by mathematician1975 (http://stackoverflow.com/a/11460092/12711).
A fair bit of the code in the thread function (copying or attempting to copy data locally, the mutex) appears to be stuff thrown in to try to fix problems without actually understanding what the problem is. I think you really need to get an understanding of where various data lives, how to copy it (as opposed to just copying a pointer to it), and how to manage the ownership of the data.
After reading the excellent blog post by Mike Ash "Friday Q&A 2014-05-09: When an Autorelease Isn't" on ARC, I decided to check out the details of the optimisations that ARC applies to speed up the retain/release process.
The trick I'm referring to is called "Fast autorelease", in which the caller and callee cooperate to keep the returned object out of the autorelease pool. This works best in situations like the following:
- (id) myMethod {
id obj = [MYClass new];
return [obj autorelease];
}
- (void) mainMethod {
obj = [[self myMethod] retain];
// Do something with obj
[obj release];
}
that can be optimised by skipping the autorelease pool completely:
- (id) myMethod {
id obj = [MYClass new];
return obj;
}
- (void) mainMethod {
obj = [self myMethod];
// Do something with obj
[obj release];
}
The way this optimisation is implemented is very interesting. I quote from Mike's post:
"There is some extremely fancy and mind-bending code in the Objective-C runtime's implementation of autorelease. Before actually sending an autorelease message, it first inspects the caller's code. If it sees that the caller is going to immediately call objc_retainAutoreleasedReturnValue, it completely skips the message send. It doesn't actually do an autorelease at all. Instead, it just stashes the object in a known location, which signals that it hasn't sent autorelease at all."
So far so good. The implementation for x86_64 in NSObject.mm is quite straightforward. The code inspects the instructions located after the return address of objc_autoreleaseReturnValue for a call to objc_retainAutoreleasedReturnValue.
static bool callerAcceptsFastAutorelease(const void * const ra0)
{
const uint8_t *ra1 = (const uint8_t *)ra0;
const uint16_t *ra2;
const uint32_t *ra4 = (const uint32_t *)ra1;
const void **sym;
//1. Navigate the DYLD stubs to get to the real pointer of the function to be called
// 48 89 c7 movq %rax,%rdi
// e8 callq symbol
if (*ra4 != 0xe8c78948) {
return false;
}
ra1 += (long)*(const int32_t *)(ra1 + 4) + 8l;
ra2 = (const uint16_t *)ra1;
// ff 25 jmpq *symbol#DYLDMAGIC(%rip)
if (*ra2 != 0x25ff) {
return false;
}
ra1 += 6l + (long)*(const int32_t *)(ra1 + 2);
sym = (const void **)ra1;
//2. Check that the code to be called belongs to objc_retainAutoreleasedReturnValue
if (*sym != objc_retainAutoreleasedReturnValue)
{
return false;
}
return true;
}
But when it comes to ARM, I just can't understand how it works. The code looks like this (I've simplified a little bit):
static bool callerAcceptsFastAutorelease(const void *ra)
{
// 07 70 a0 e1 mov r7, r7
if (*(uint32_t *)ra == 0xe1a07007) {
return true;
}
return false;
}
It looks like the code is identifying the presence of objc_retainAutoreleasedReturnValue not by looking up the presence of a call to that specific function, but by looking instead for a special no-op operation mov r7, r7.
Diving into LLVM source code I found the following explanation:
"The implementation of objc_autoreleaseReturnValue sniffs the instruction stream following its return address to decide whether it's a call to objc_retainAutoreleasedReturnValue. This can be prohibitively expensive, depending on the relocation model, and so on some targets it instead sniffs for a particular instruction sequence. This functions returns that instruction sequence in inline assembly, which will be empty if none is required."
Why is that so on ARM?
Having the compiler put a certain marker there so that a specific implementation of a library can find it sounds like strong coupling between the compiler and the library code. Why can't the "sniffing" be implemented the same way as on the x86_64 platform?
IIRC (been a while since I've written ARM assembly), ARM's addressing modes don't really allow for direct addressing across the full address space. The instructions used to do addressing -- loads, stores, etc... -- don't support direct access to the full address space as they are limited in bit width.
Thus, any kind of "go to this arbitrary address and check for that value, then use that value to go look over there" will be significantly slower on ARM, as you have to use indirect addressing, which involves math, and math eats CPU cycles.
By having the compiler emit a no-op instruction that can easily be checked, it eliminates the need for indirection through the DYLD stubs.
At least, I'm pretty sure that is what is going on. There are two ways to know for sure: take the code for those two functions and compile it with -Os for x86_64 vs. ARM and see what the resulting instruction streams look like (i.e. both functions on each architecture), or wait until Greg Parker shows up to correct this answer.
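For comparison with the x86_64 routine quoted in the question, here is a small simulation of the marker check over a plain byte buffer. It is only a sketch: callerHasMarker and kMarker are invented names, the code is not taken from the Objective-C runtime, and it does not confirm the explanation above. It merely shows that the ARM-style check amounts to one 32-bit compare, with no displacement to decode and no stub to follow.

```cpp
#include <cassert>
#include <cstdint>

// Encoding of `mov r7, r7` as quoted in the question's ARM snippet.
constexpr std::uint32_t kMarker = 0xe1a07007u;

// Reads the four bytes at `returnAddress` as a little-endian 32-bit word
// (assembled byte by byte, so the result does not depend on the host's
// endianness or alignment rules) and compares it with the marker.
bool callerHasMarker(const std::uint8_t *returnAddress) {
    assert(returnAddress != nullptr);
    const std::uint32_t word =
        static_cast<std::uint32_t>(returnAddress[0]) |
        (static_cast<std::uint32_t>(returnAddress[1]) << 8) |
        (static_cast<std::uint32_t>(returnAddress[2]) << 16) |
        (static_cast<std::uint32_t>(returnAddress[3]) << 24);
    return word == kMarker;
}
```

The x86_64 version, by contrast, has to read a call displacement, jump to the stub, read a second displacement, and only then compare a pointer.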
Is there an easy way to create a new pthread each time a method is called?
I have a method that is activated in certain circumstances, and it is the only way to communicate with another program. After that method is called I need to sleep and then execute another method, and another call may arrive while I am waiting. This is the reason I wanted to use threads.
I wanted to use standard:
pthread_create(&new_thread, NULL, threadbody() );
Put like this
std::vector<pthread_t> threads(20);
...
pthread_t new_thread;
int rc;
rc = pthread_create(&new_thread, NULL, threadbody() );
threads.push_back(new_thread);
But I get errors about incorrect use of (void *) functions:
argument of type ‘void* (App::)(void*)’ does not match ‘void* (*)(void*)
What am I doing wrong?
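Your function is a non-static member of the class. This is a no-no since pthread_create is a C API: it expects a plain function pointer of type void* (*)(void*), and a non-static member function does not match because it takes an implicit this argument.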
To offset this, you can have
class X
{
public:
static void* callback(void* p)
{
X* x = reinterpret_cast<X*>(p);
x->function();
return 0;
}
void function(void)
{
// do work here
}
};
X obj;
pthread_create(&new_thread, NULL, &X::callback, &obj);
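A fuller sketch of the static-callback approach, with the thread joined so the result can be checked. Task, trampoline and runOnNewThread are illustrative names, not part of the question. Note the argument order of pthread_create: thread handle, attributes (NULL for defaults), start routine, then the pointer handed to the start routine.

```cpp
#include <pthread.h>
#include <cassert>

class Task {
public:
    // Static member: it has the plain `void *(*)(void *)` shape that
    // pthread_create expects, and forwards to the instance it was given.
    static void *trampoline(void *self) {
        static_cast<Task *>(self)->run();
        return nullptr;
    }

    void run() { ++runs_; }  // the real work would go here
    int runs() const { return runs_; }

private:
    int runs_ = 0;
};

// Runs task.run() on a new thread and waits for it to finish.
// Returns true if the thread was created and joined successfully.
bool runOnNewThread(Task &task) {
    pthread_t tid;
    if (pthread_create(&tid, nullptr, &Task::trampoline, &task) != 0) {
        return false;
    }
    return pthread_join(tid, nullptr) == 0;
}
```

The object passed as the last argument must stay alive until the thread has finished using it; here the join guarantees that.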
Why am I deadlocking?
- (void)foo
{
static dispatch_once_t onceToken;
dispatch_once(&onceToken, ^{
[self foo];
});
// whatever...
}
I expect foo to be executed twice on first call.
Neither of the existing answers is quite accurate (one is dead wrong, the other is a bit misleading and misses some critical details). First, let's go right to the source:
void
dispatch_once_f(dispatch_once_t *val, void *ctxt, dispatch_function_t func)
{
struct _dispatch_once_waiter_s * volatile *vval =
(struct _dispatch_once_waiter_s**)val;
struct _dispatch_once_waiter_s dow = { NULL, 0 };
struct _dispatch_once_waiter_s *tail, *tmp;
_dispatch_thread_semaphore_t sema;
if (dispatch_atomic_cmpxchg(vval, NULL, &dow)) {
dispatch_atomic_acquire_barrier();
_dispatch_client_callout(ctxt, func);
dispatch_atomic_maximally_synchronizing_barrier();
//dispatch_atomic_release_barrier(); // assumed contained in above
tmp = dispatch_atomic_xchg(vval, DISPATCH_ONCE_DONE);
tail = &dow;
while (tail != tmp) {
while (!tmp->dow_next) {
_dispatch_hardware_pause();
}
sema = tmp->dow_sema;
tmp = (struct _dispatch_once_waiter_s*)tmp->dow_next;
_dispatch_thread_semaphore_signal(sema);
}
} else {
dow.dow_sema = _dispatch_get_thread_semaphore();
for (;;) {
tmp = *vval;
if (tmp == DISPATCH_ONCE_DONE) {
break;
}
dispatch_atomic_store_barrier();
if (dispatch_atomic_cmpxchg(vval, tmp, &dow)) {
dow.dow_next = tmp;
_dispatch_thread_semaphore_wait(dow.dow_sema);
}
}
_dispatch_put_thread_semaphore(dow.dow_sema);
}
}
So what really happens is, contrary to the other answers, the onceToken is changed from its initial state of NULL to point to an address on the stack of the first caller &dow (call this caller 1). This happens before the block is called. If more callers arrive before the block is completed, they get added to a linked list of waiters, the head of which is contained in onceToken until the block completes (call them callers 2..N). After being added to this list, callers 2..N wait on a semaphore for caller 1 to complete execution of the block, at which point caller 1 will walk the linked list signaling the semaphore once for each caller 2..N. At the beginning of that walk, onceToken is changed again to be DISPATCH_ONCE_DONE (which is conveniently defined to be a value that could never be a valid pointer, and therefore could never be the head of a linked list of blocked callers.) Changing it to DISPATCH_ONCE_DONE is what makes it cheap for subsequent callers (for the rest of the lifetime of the process) to check the completed state.
So in your case, what's happening is this:
The first time you call -foo, onceToken is nil (which is guaranteed by virtue of statics being guaranteed to be initialized to 0), and gets atomically changed to become the head of the linked list of waiters.
When you call -foo recursively from inside the block, your thread is considered to be "a second caller" and a waiter structure, which exists in this new, lower stack frame, is added to the list and then you go to wait on the semaphore.
The problem here is that this semaphore will never be signaled because in order for it to be signaled, your block would have to finish executing (in the higher stack frame), which now can't happen due to a deadlock.
So, in short, yes, you're deadlocked, and the practical takeaway here is, "don't try to call recursively into a dispatch_once block." But the problem is most definitely NOT "infinite recursion", and the flag is most definitely not only changed after the block completes execution -- changing it before the block executes is exactly how it knows to make callers 2..N wait for caller 1 to finish.
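The key point, that the token changes before the block runs, can be shown with a deliberately simplified model. This is not libdispatch's implementation: OnceState, OnceResult and tryOnce are invented for the sketch, there is no waiter list or semaphore, and where the real dispatch_once would block, this version just reports that it would have had to wait.

```cpp
#include <atomic>
#include <cassert>

enum class OnceState { NotStarted, Running, Done };
enum class OnceResult { RanBlock, WouldWait, AlreadyDone };

// Simplified once-guard. The token is flipped to Running BEFORE the block is
// called, which is how later callers know somebody else is already inside.
template <typename Block>
OnceResult tryOnce(std::atomic<OnceState> &token, Block block) {
    OnceState expected = OnceState::NotStarted;
    if (token.compare_exchange_strong(expected, OnceState::Running)) {
        block();
        token.store(OnceState::Done);
        return OnceResult::RanBlock;
    }
    // On failure, `expected` holds the state that was actually observed.
    // Running means the block has started but not finished: the real
    // dispatch_once would now wait for it, and a recursive caller on the
    // same thread would therefore wait forever.
    return expected == OnceState::Done ? OnceResult::AlreadyDone
                                       : OnceResult::WouldWait;
}
```

Calling tryOnce from inside its own block lands in the WouldWait branch, which is exactly the situation in which the recursive -foo call goes to sleep on a semaphore nobody can signal.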
You could alter the code a little, so that the recursive call happens outside the block and there's no deadlock, something like this:
- (void)foo
{
static dispatch_once_t onceToken;
__block BOOL shouldRunTwice = NO; // __block so the block can assign to it
dispatch_once(&onceToken, ^{
shouldRunTwice = YES;
});
if (shouldRunTwice) {
[self foo];
}
// whatever...
}
I've spent the last week trying to figure out this memory leak and I am desperate at this point. I'd be glad for any help.
I have a class Solver which creates an instance of class PartialGraph in every iteration of the method solve (performing Depth First Search). In every iteration the PartialGraph should be copied to the stack and then destroyed.
Solver.h
class Solver {
public:
Solver(Graph pg);
PartialGraph solve(PartialGraph p, int bestest);
Graph pg;
stack<PartialGraph> stackk;
bool isSpanningTree(PartialGraph* p);
Solver(const Solver& orig);
~Solver();
Solver.cpp
Solver:: Solver(const Solver& orig){
this->pg=*new Graph(orig.pg);
}
Solver::Solver(Graph gpg) {
this->pg=gpg;
}
PartialGraph Solver::solve(PartialGraph init, int bestest){
int best=bestest;
int iterace=0;
PartialGraph bestGraph;
stackk.push(init);
while(stackk.size()!=0) {
PartialGraph m = stackk.top();
stackk.pop();
for(int i=m.rightestEdge+1;i<pg.edgeNumber;i++){
*******(line 53 )PartialGraph* pnew= m.addEdge(pg.edges[i]);
if(m.generatedNodes==pnew->generatedNodes){
pnew->~PartialGraph();
continue; }
if(isSpanningTree(pnew)){
if(best>pnew->maxDegree){
best=pnew->maxDegree;
bestGraph=*pnew;
}
if(pnew->maxDegree==2){
pnew->~PartialGraph();
return bestGraph;
}
pnew->~PartialGraph();
continue;
}
if(pnew->maxDegree==best){
pnew->~PartialGraph();
continue; }
stackk.push(*pnew);
*******(line 101 )pnew->~PartialGraph();
}
}
return bestGraph;
}
bool Solver::isSpanningTree(PartialGraph* p){
if(p->addedEdges!=this->pg.nodeNumber-1){return false;}
return p->generatedNodes==this->pg.nodeNumber;
}
Solver::~Solver(){
this->pg.~Graph();
};
PartialGraph looks like this, it has two arrays, both deleted in destructor. Every constructor and operator= allocates new memory for the arrays. (Class Edge holds three ints)
PartialGraph::PartialGraph(int nodeNumber,int edgeNumber) {
nodeCount=nodeNumber;
edgeCount=0;
nodes=new int[nodeCount];
edges=new Edge[0];
rightestEdge=-1;
generatedNodes=0;
addedEdges=0;
for(int i=0;i<nodeCount;i++){
this->nodes[i]=0;
}
maxDegree=0;
}
PartialGraph::PartialGraph(const PartialGraph& orig){
this->nodes=new int[orig.nodeCount];
edges=new Edge[orig.edgeCount];
this->nodeCount=orig.nodeCount;
this->rightestEdge=orig.rightestEdge;
this->edgeCount=orig.edgeCount;
this->maxDegree=orig.maxDegree;
this->addedEdges=orig.addedEdges;
this->generatedNodes=orig.generatedNodes;
for(int i=0;i<this->nodeCount;i++){
this->nodes[i]=orig.nodes[i];
}
for(int i=0;i<this->edgeCount;i++){
this->edges[i]=orig.edges[i];
}
}
PartialGraph::PartialGraph(){
}
PartialGraph::PartialGraph(const PartialGraph& orig, int i){
this->nodes=new int[orig.nodeCount];
edges=new Edge[orig.edgeCount+1];
this->nodeCount=orig.nodeCount;
this->rightestEdge=orig.rightestEdge;
this->edgeCount=orig.edgeCount;
this->maxDegree=orig.maxDegree;
this->addedEdges=orig.addedEdges;
this->generatedNodes=orig.generatedNodes;
for(int i=0;i<this->nodeCount;i++){
this->nodes[i]=orig.nodes[i];
}
for(int i=0;i<this->edgeCount;i++){
this->edges[i]=orig.edges[i];
}
}
PartialGraph &PartialGraph::operator =(const PartialGraph &orig){
nodes=new int[orig.nodeCount];
edges=new Edge[orig.edgeCount];
this->nodeCount=orig.nodeCount;
this->rightestEdge=orig.rightestEdge;
this->edgeCount=orig.edgeCount;
this->maxDegree=orig.maxDegree;
this->addedEdges=orig.addedEdges;
this->generatedNodes=orig.generatedNodes;
for(int i=0;i<this->nodeCount;i++){
this->nodes[i]=orig.nodes[i];
}
for(int i=0;i<this->edgeCount;i++){
this->edges[i]=orig.edges[i];
}
}
PartialGraph* PartialGraph::addEdge(Edge e){
PartialGraph* npg=new PartialGraph(*this, 1);
npg->edges[this->edgeCount]=e;
npg->addedEdges++;
npg->edgeCount++;
if(e.edgeNumber>npg->rightestEdge){npg->rightestEdge=e.edgeNumber;}
npg->nodes[e.node1]=npg->nodes[e.node1]+1;
npg->nodes[e.node2]=npg->nodes[e.node2]+1;
if(npg->nodes[e.node1]>npg->maxDegree){npg->maxDegree=npg->nodes[e.node1];}
if(npg->nodes[e.node2]>npg->maxDegree){npg->maxDegree=npg->nodes[e.node2];}
if(npg->nodes[e.node1]==1){npg->generatedNodes++;}
if(npg->nodes[e.node2]==1){npg->generatedNodes++;}
return npg;
}
PartialGraph:: ~PartialGraph() //destructor
{
delete [] nodes;
delete [] edges;
};
PartialGraph.h
class PartialGraph {
public:
PartialGraph(int nodeCount,int edgeCount);
PartialGraph* addEdge(Edge e);
PartialGraph(const PartialGraph& orig);
PartialGraph();
~PartialGraph();
static int counter;
PartialGraph(const PartialGraph& orig, int i);
void toString();
int nodeCount;
int edgeCount;
int generatedNodes;
int *nodes;
Edge *edges;
int maxDegree;
int rightestEdge;
int addedEdges;
PartialGraph &operator =(const PartialGraph &other); // Assn. operator
};
It runs fine, but when the input data is too big, I get bad_alloc. Valgrind says I am leaking on line 53 of Solver.cpp, but I'm almost sure all instances are destroyed at line 101, or earlier in the iteration.
(244,944 direct, 116 indirect) bytes in 5,103 blocks are definitely lost in
at 0x4C2AA37: operator new(unsigned long)
(in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x4039F6: PartialGraph::addEdge(Edge) (PartialGraph.cpp:107)
by 0x404197: Solver::solve(PartialGraph, int) (Solver.cpp:53)
by 0x4016BA: main (main.cpp:35)
LEAK SUMMARY:
definitely lost: 246,305 bytes in 5,136 blocks
indirectly lost: 1,364 bytes in 12 blocks
I have even made an instance counter and it seemed that I destroy all of the instances. As I said, I am really desperate, and any help would be welcome.
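The short answer: you should never be calling the destructor directly. Use delete instead, so everywhere where you have pnew->~PartialGraph();, you should have delete pnew;. An explicit destructor call runs ~PartialGraph(), which frees the nodes and edges arrays, but it never frees the PartialGraph object itself that addEdge() allocated with new, and that is the block valgrind reports. In general, every new should have a corresponding delete somewhere. Just be careful, this rule has some trickiness to it, because multiple deletes may map to one new, and vice versa.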
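A small sketch of why an instance counter can look clean while memory still leaks. Tracked and the two helper functions are invented for this illustration. The class counts constructed objects and allocated blocks separately, using class-specific operator new and operator delete.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

struct Tracked {
    static int liveObjects;  // constructed minus destroyed
    static int liveBlocks;   // blocks allocated minus blocks freed

    int *payload;

    Tracked() : payload(new int[4]) { ++liveObjects; }
    ~Tracked() { delete[] payload; --liveObjects; }
    Tracked(const Tracked &) = delete;
    Tracked &operator=(const Tracked &) = delete;

    // Class-specific allocation functions, so blocks can be counted.
    static void *operator new(std::size_t size) {
        void *p = std::malloc(size);
        if (p == nullptr) throw std::bad_alloc();
        ++liveBlocks;
        return p;
    }
    static void operator delete(void *p) noexcept {
        if (p != nullptr) {
            --liveBlocks;
            std::free(p);
        }
    }
};

int Tracked::liveObjects = 0;
int Tracked::liveBlocks = 0;

// Explicit destructor call: the object is destroyed, its block is not freed.
int blocksLeftAfterDestructorCall() {
    Tracked *p = new Tracked;
    p->~Tracked();
    const int left = Tracked::liveBlocks;  // storage for *p is still allocated
    Tracked::operator delete(p);  // release it so the sketch itself is clean
    return left;
}

// delete: runs the destructor AND frees the block.
int blocksLeftAfterDelete() {
    Tracked *p = new Tracked;
    delete p;
    return Tracked::liveBlocks;
}
```

After the explicit destructor call, liveObjects is back to zero even though one block is still outstanding, which matches a counter that says everything was destroyed while valgrind still reports lost blocks from operator new.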
Bonus leaks that I found while looking at the code:
The first line of executable code in your post: this->pg=*new Graph(orig.pg);. Another general rule: if you have code that does *new Foo(...), you're probably creating a leak (and probably creating unnecessary work for yourself). In this case, this->pg = orig.pg should work fine. Your current code copies orig.pg into a newly allocated object, and then copies the newly created object into this->pg, which results in two copy operations. this->pg = orig.pg is just one copy.
The first couple of lines of PartialGraph::operator=(). Copy constructors, assignment operators, and destructors can be difficult to get right. In all of your constructors except the default one, you new nodes and edges, and you have matching deletes in the destructor, so that's ok. But when you do the assignment operator, you overwrite the pointers to the existing arrays but don't delete them. You need to delete the existing arrays before creating new ones. The operator also never returns *this, which it must.
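A minimal sketch of an assignment operator that releases the old array, using a stand-in class rather than PartialGraph itself (IntBuffer and its members are invented for the example). It allocates the replacement first, so a failed allocation leaves the object unchanged, and it guards against self-assignment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

class IntBuffer {
public:
    explicit IntBuffer(std::size_t n) : size_(n), data_(new int[n]()) {}

    IntBuffer(const IntBuffer &other)
        : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + other.size_, data_);
    }

    IntBuffer &operator=(const IntBuffer &other) {
        if (this != &other) {  // self-assignment: nothing to do
            int *fresh = new int[other.size_];
            std::copy(other.data_, other.data_ + other.size_, fresh);
            delete[] data_;    // release the array this object already owned
            data_ = fresh;
            size_ = other.size_;
        }
        return *this;
    }

    ~IntBuffer() { delete[] data_; }

    std::size_t size() const { return size_; }
    int &operator[](std::size_t i) {
        assert(i < size_);
        return data_[i];
    }

private:
    std::size_t size_;
    int *data_;
};
```

Holding the arrays in std::vector members instead would make the compiler-generated copy constructor, assignment operator and destructor correct without any of this code.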
Lastly, yikes. I know it can be a pain to format your code for StackOverflow, but trying to read the code in Solver::solve() is beyond painful because the indentation doesn't match the code structure. When I looked at this post, there were 23 views and no responses. That's 23 people that were interested in solving your problem, but were probably put off by the formatting. If you spent an extra 23 minutes formatting the code, and it saved each of those people more than one minute, you would have saved the universe some time (besides probably getting your answer faster).
Here is a simple question. Can you help me find the memory leak in this Vala code?
If it helps I can post the generated C code too ^^
using GLib;
using Gtk;
using Gee;
void test1 ()
{
Gee.ArrayList<Gdk.Pixbuf> list = new Gee.ArrayList<Gdk.Pixbuf>();
for( int a = 0; a < 10000; a++)
{
string path = "/usr/share/icons/gnome/48x48/stock/data/stock_lock.png";
list.add( new Gdk.Pixbuf.from_file( path ) );
}
list.clear();
// when the function returns it *should* free all allocated memory, or am I missing something?
}
int main (string[] args)
{
Gtk.init( ref args);
// the memory usage here is ~3mb
test1();
// here it is ~94mb
Gtk.main();
return 0;
}
I've reproduced this on the latest versions of Vala (0.10.1 and 0.11.1). I've looked over the .c code valac generates and don't see an obvious problem there, although it's obvious the pixbufs are leaking (valgrind confirms this). I reported it as a bug here:
https://bugzilla.gnome.org/show_bug.cgi?id=633869
Update: The bug is closed. Investigation shows there is no memory leak per se, but (most likely) that the memory is being allocated and held when it's freed by a suballocator or some-such. As Evan pointed out, if you call the test function in a loop, the total memory size never exceeds 90MB, indicating it's not a memory leak.