I have a C++ object obj which I want to access within a block:
MyCppClass obj;
void(^myBlock)() = ^{
obj.test();
};
The problem here is that obj gets copied into the block, but I want to use a reference to the original object. I could use a pointer instead:
MyCppClass obj;
MyCppClass *objP = &obj;
void(^myBlock)() = ^{
objP->test();
};
But I think dereferencing pointers takes more time than using references. (Performance is critical, because in my project such a block is called for each pixel of a large image.)
How can I access the object within the block?
Dereferencing a pointer is faster than a function call or a C++ method call, which involves several pointer dereferences to look up the method in the class, followed by an actual function call that allocates stack space, saves some registers to RAM, and then undoes all of that when it returns. So don't worry about performance here. If the profiler shows you that a single pointer dereference causes performance issues, you have bigger problems, and you should probably drop down to hand-crafted assembler and forget about C++ classes or blocks. Such cases are also very rare.
Use a pointer, or a C++ reference (the latter is the same under the hood, but it's harder to shoot yourself in the foot with it, and code stays more readable). Just keep in mind that this is only a solution if you are going to use the block in the same function as (or a function called by) the current function.
Because the C++ object is on the stack, it will go away, so if whoever you hand the block to copies it and calls it later, the pointer will be invalid. In the best case you will crash. In the worst case, the spot on the RAM chip that was used for that object is already in use for something else again, and you'll overwrite some other bit of memory, and cause a seemingly unrelated crash.
So, if you want to keep the block around, you'll have to do 'objP = new MyCppClass' to create the object, and later do 'delete objP' (maybe in the last call to the block, if you can detect that and don't need the object after that) to get rid of it.
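For illustration, here is a minimal Objective-C++ sketch of that arrangement; MyCppClass stands in for the question's class, and the placement of the copy and the delete is just one possibility:
#import <Foundation/Foundation.h>

struct MyCppClass {                          // hypothetical stand-in for the question's class
    void test() { /* per-pixel work */ }
};

void runBlockDemo(void) {
    MyCppClass *objP = new MyCppClass;       // heap-allocated, so it outlives this scope
    void (^myBlock)(void) = [^{
        objP->test();                        // the block captures only the pointer value
    } copy];

    // ... hand myBlock off, call it once per pixel, etc. ...
    myBlock();

    delete objP;                             // only once no further calls can happen
}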
It is hard to find a good heading for this, but I think my problem becomes clear if I post a small code snippet:
SomeObject instance = SomeObject(importantParameter);
// -> "instance" is now a reference to the instance somewhere in RAM
instance = SomeObject(anotherImportantParameter);
// -> "instance" is now a reference to a different instance somewhere in RAM
My question is now, is the used RAM that was allocated at the first construction reused at the second construction? Or is the RAM of the first instance marked as unused for the garbage collector and the second construction is done with a completely new instance with a different portion of RAM?
If the first is true, what about this:
while(true) {
final SomeObject instance = SomeObject(importantParameter);
}
Will the RAM then be reused each time the while loop repeats?
It's unspecified. The answer is a resounding "maybe".
The language specification never says what happens to unreachable objects, since it's unobservable to the program. (That's what being unreachable means).
In practice, the native Dart implementation uses a generational garbage collector.
The default behavior would be to allocate a new object in "new-space" and overwrite the reference to the previous object. That makes the previous object unreachable (as long as you haven't stored other references to it), and it can therefore be garbage collected. If you really go through objects quickly, that will be cheap, since the unreachable object is completely ignored on the next new-space garbage collection.
Allocating a lot of short-lived objects still has an overhead since it causes new-space GC to happen more often, even if the individual objects don't themselves cost anything.
There are also a number of optimizations that may change this behavior.
If your object is sufficiently simple and the compiler can see that no reference to it ever escapes, or is used in an identical() check, or ... any number of other relevant restrictions, then it might "allocation sink" the object. That means it never actually allocates the object; it just stores the contents somewhere, perhaps even on the stack, and it also inlines the methods so they refer to the data directly instead of going through a this pointer.
In that case, your code may actually reuse the memory of the previous object, because the compiler recognizes that it can.
Do not try to predict whether an optimization like this happens. The requirements can change at any time. Just write code that is correct and not unnecessarily complex, then the compiler will do its best to optimize in all the ways that it can.
Are there any conditions in Objective-C (Objective-C++) where the compiler can detect that a variable capture in a block is never used and thus decide to not capture the variable in the first place?
For example, assume you have an NSArray that contains a large number of items which might take a long time to deallocate. You need to access the NSArray on the main thread, but once you're done with it, you're willing to deallocate it on a background queue. The background block only needs to capture the array and then immediately deallocate. It doesn't actually have to do anything with it. Can the compiler detect this and, "erroneously", skip the block capture altogether?
Example:
// On the main thread...
NSArray *outgoingRecords = self.records;
self.records = incomingRecords;
dispatch_async(background_queue, ^{
(void)outgoingRecords;
// After this do-nothing block exits, then outgoingRecords
// should be deallocated on this background_queue.
});
Am I guaranteed that outgoingRecords will always be captured in that block and that it will always be deallocated on the background_queue?
Edit #1
I'll add a bit more context to better illustrate my issue:
I have an Objective-C++ class that contains a very large std::vector of immutable records. This could easily be 1+ million records. They are basic structs in a vector and accessed on the main thread to populate a table view. On a background thread, a different set of database records might be read into a separate vector, which could also be quite large.
Once the background read has occurred, I jump over to the main thread to swap Objective-C objects and repopulate the table.
At that point, I don't care at all about the contents of the older vector or its parent Objective-C class. There are no fancy destructors or object graph to tear down, but deallocating hundreds of megabytes, maybe even gigabytes, of memory is not instantaneous. So I'm willing to punt it off to a background_queue and have the memory deallocation occur there. In my tests, that appears to work fine and gives me a little bit more time on the main thread to do other stuff before 16ms elapses.
I'm trying to understand if I can get away with simply capturing the object in an "empty" block or if I should do some sort of no-op operation (like call count) so that the compiler cannot optimize it away somehow.
Edit #2
(I originally tried to keep the question as simple as possible, but it seems like it's more nuanced than that. Based on Ken's answer below, I'll add another scenario.)
Here's another scenario that doesn't use dispatch_queues but still uses blocks, which is the part I'm really interested in.
id<MTLCommandBuffer> commandBuffer = ...
// A custom class that manages an MTLTexture that is backed by an IOSurface.
__block MyTextureWrapper *wrapper = ...
// Issue some Metal calls that use the texture inside the wrapper.
// Wait for the buffer to complete, then release the wrapper.
[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> cb) {
wrapper = nil;
}];
In this scenario, the order of execution is guaranteed by Metal. Unlike the example above, in this scenario performance is not the issue. Rather, the IOSurface that is backing the MTLTexture is being recycled into a CVPixelBufferPool. The IOSurface is being shared between processes and, from what I can tell, MTLTexture does not appear to increase the useCount on the surface. My wrapper class does. When my wrapper class is deallocated, the useCount is decremented and the buffer pool is then free to recycle the IOSurface.
This is all working as expected but I end up with silly code like above just out of uncertainty whether I need to "use" the wrapper instance in the block to ensure it's captured or not. If the wrapper is deallocated before the completion handler runs, then the IOSurface will be recycled and the texture will get overwritten.
Edit to address question edits:
From the Clang Language Specification for Blocks:
Local automatic (stack) variables referenced within the compound
statement of a Block are imported and captured by the Block as const
copies. The capture (binding) is performed at the time of the Block
literal expression evaluation.
The compiler is not required to capture a variable if it can prove
that no references to the variable will actually be evaluated.
Programmers can force a variable to be captured by referencing it in a
statement at the beginning of the Block, like so:
(void) foo;
This matters when capturing the variable has side-effects, as it can
in Objective-C or C++.
(Emphasis added.)
Note that using this technique guarantees that the referenced object lives at least as long as the block, but does not guarantee it will be released with the block, nor by which thread.
There's no guarantee that the block submitted to the background queue will be the last code to hold a strong reference to the array (even ignoring the question of whether the block captures the variable).
First, the block may in fact run before the context which submitted it returns and releases its strong reference. That is, the code which called dispatch_async() could be swapped off the CPU and the block could run first.
But even if the block runs somewhat later than that, a reference to the array may be in an autorelease pool somewhere and not released for some time. Or there may be a strong reference someplace else that will eventually be cleared but not under your explicit control.
The code below will crash because of EXC_BAD_ACCESS:
typedef void(^myBlock)(void);
- (void)viewDidLoad {
[super viewDidLoad];
NSArray *tmp = [self getBlockArray];
myBlock block = tmp[0];
block();
}
- (id)getBlockArray {
int val = 10;
//crash version
return [[NSArray alloc] initWithObjects:
^{NSLog(#"blk0:%d", val);},
^{NSLog(#"blk1:%d", val);}, nil];
//won't crash version
// return #[^{NSLog(#"block0: %d", val);}, ^{NSLog(#"block1: %d", val);}];
}
The code runs on iOS 9 with ARC enabled, and I was trying to figure out the reason for the crash.
By running po tmp in lldb I found:
(lldb) po tmp
<__NSArrayI 0x7fa0f1546330>(
<__NSMallocBlock__: 0x7fa0f15a0fd0>,
<__NSStackBlock__: 0x7fff524e2b60>
)
whereas in the won't crash version
(lldb) po tmp
<__NSArrayI 0x7f9db481e6a0>(
<__NSMallocBlock__: 0x7f9db27e09a0>,
<__NSMallocBlock__: 0x7f9db2718f50>
)
So the most likely reason I could come up with is that the crash happens when ARC releases the NSStackBlock. But why would that be?
First, you need to understand that if you want to store a block past the scope where it's declared, you need to copy it and store the copy instead.
The reason for this is an optimization whereby blocks which capture variables are initially located on the stack, rather than dynamically allocated like a regular object. (Let's ignore blocks which don't capture variables for the moment, since they can be implemented as a global instance.) So when you write a block literal, like foo = ^{ ... };, that's effectively like assigning to foo a pointer to a hidden local variable declared in that same scope, something like some_block_object_t hiddenVariable; foo = &hiddenVariable; This optimization reduces the number of object allocations in the many cases where a block is used synchronously and never outlives the scope where it was created.
Like a pointer to a local variable, if you bring the pointer outside the scope of the thing it pointed to, you have a dangling pointer, and dereferencing it leads to undefined behavior. Performing a copy on a block moves a stack block to the heap if necessary, where it is memory-managed like all other Objective-C objects, and returns a pointer to the heap copy (and if the block is already a heap block or global block, it simply returns the same pointer).
Whether the particular compiler uses this optimization or not in a particular circumstance is an implementation detail, but you cannot assume anything about how it's implemented, so you must always copy if you store a block pointer in a place that will outlive the current scope (e.g. in an instance or global variable, or in a data structure that may outlive the scope). Even if you knew how it was implemented, and knew that in a particular case copying is not necessary (e.g. it is a block that doesn't capture variables, or copying must already have been done), you should not rely on that, and you should still always copy when you store it in a place that will outlive the current scope, as good practice.
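As a concrete sketch of this rule (Worker and Callback are invented names; here it is the copy-qualified property that performs the stack-to-heap copy):
#import <Foundation/Foundation.h>

typedef void (^Callback)(void);

@interface Worker : NSObject
@property (nonatomic, copy) Callback callback;   // 'copy' moves a stack block to the heap
@end
@implementation Worker
@end

static Worker *makeWorker(void) {
    Worker *worker = [Worker new];
    int val = 10;
    // The block literal may start out on this function's stack; the copy done by
    // the property setter gives the stored block a heap lifetime.
    worker.callback = ^{ NSLog(@"val = %d", val); };
    return worker;
}

int main(void) {
    @autoreleasepool {
        Worker *worker = makeWorker();
        Callback cb = worker.callback;
        cb();   // safe: the heap copy outlives makeWorker()'s stack frame
    }
    return 0;
}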
Passing a block as an argument to a function or method is somewhat complicated. If you pass a block pointer as an argument to a function parameter whose declared compile-time type is a block-pointer type, then that function would in turn be responsible for copying it if it were to outlive its scope. So in this case, you wouldn't need to worry about copying it, without needing to know what the function did.
If, on the other hand, you pass a block pointer as an argument to a function parameter whose declared compile-time type is a non-block object pointer type, then that function wouldn't be taking responsibility for any block copying, because for all it knows it's just a regular object, that just needs to be retained if stored in a place that outlives the current scope. In this case, if you think that the function may possibly store the value beyond the end of the call, you should copy the block before passing it, and pass the copy instead.
By the way, this is also true for any other case where a block-pointer type is assigned or converted to a regular object-pointer type; the block should be copied and the copy assigned, because anyone who gets the regular object-pointer value wouldn't be expected to do any block copying considerations.
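For example, a small sketch of that case, assuming the block ends up behind a plain id parameter such as the one taken by NSMutableArray's addObject: method:
int val = 10;
NSMutableArray *handlers = [NSMutableArray array];

// -addObject: takes id, so the callee has no idea it received a block and will
// only retain it; copy explicitly so a stack block becomes a heap block first.
[handlers addObject:[^{ NSLog(@"val = %d", val); } copy]];

void (^stored)(void) = handlers[0];
stored();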
ARC complicates the situation somewhat. The ARC specification specifies some situations where blocks are implicitly copied. For example, when storing to a variable of compile-time block-pointer type (or any other place where ARC requires a retain on a value of compile-time block-pointer type), ARC requires that the incoming value be copied instead of retained, so the programmer doesn't have to worry about explicitly copying blocks in those cases.
With the exception of retains done as part of initializing a
__strong parameter variable or reading a __weak variable, whenever
these semantics call for retaining a value of block-pointer type, it
has the effect of a Block_copy.
However, as an exception, the ARC specification does not guarantee that blocks only passed as arguments are copied.
The optimizer may remove such copies when it sees that the result is
used only as an argument to a call.
So whether to explicitly copy blocks passed as arguments to a function is still something the programmer has to consider.
Now, the ARC implementation in recent versions of Apple's Clang compiler has an undocumented feature where it will add implicit block copies to some of the places where blocks are passed as arguments, even though the ARC specification doesn't require it. ("undocumented" because I cannot find any Clang documentation to this effect.) In particular, it appears that it defensively always adds implicit copies when passing an expression of block-pointer type to a parameter of non-block object pointer type. In fact, as demonstrated by CRD, it also adds an implicit copy when converting from a block-pointer type to a regular object-pointer type, so this is the more general behavior (since it includes the argument passing case).
However, it appears that the current version of the Clang compiler does not add implicit copies when passing a value of block-pointer type as varargs. C varargs are not type-safe, and it is impossible for the caller to know what types the function expects. Arguably, if Apple wants to err on the side of safety, since there's no way of knowing what the function expects, they should always add implicit copies in this case too. However, since this whole thing is an undocumented feature anyway, I wouldn't say it's a bug. In my opinion, the programmer should never rely on blocks that are only passed as arguments being implicitly copied in the first place.
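Applied to the question's code, a hedged sketch of what that explicit copying looks like when the blocks go into a variadic argument list:
- (id)getBlockArray {
    int val = 10;
    // Explicit copies: initWithObjects: is variadic, so the compiler does not
    // insert implicit block copies for these arguments.
    return [[NSArray alloc] initWithObjects:
            [^{ NSLog(@"blk0:%d", val); } copy],
            [^{ NSLog(@"blk1:%d", val); } copy],
            nil];
}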
Short Answer:
You have found a compiler bug, possibly a re-introduced one, and you should report it at http://bugreport.apple.com.
Longer Answer:
This wasn't always a bug, it used to be a feature ;-) When Apple first introduced blocks they also introduced an optimisation in how they implemented them; however, unlike normal compiler optimisations, which are essentially transparent to the code, this one required programmers to sprinkle calls to a special function, Block_copy(), in various places to make the optimisation work.
Over the years Apple removed the need for this, but only for programmers using ARC (though they could have done so for MRC users as well), and today the optimisation should be just that and programmers should no longer need to help the compiler along.
But you've just found a case where the compiler gets it wrong.
Technically you have a case of type loss, in this case where something known to be a block is passed as id, reducing the known type information; in particular, type loss involving the second or subsequent argument in a variable argument list. When you look at your array with po tmp you see the first value is correct, the compiler gets that one right despite there being type loss, but it fails on the next argument.
The literal syntax for an array does not rely on variadic functions and the code produced is correct. However initWithObjects: does, and it goes wrong.
Workaround:
If you add a cast to id to the second (and any subsequent) blocks then the compiler produces the correct code:
return [[NSArray alloc] initWithObjects:
^{NSLog(#"blk0:%d", val);},
(id)^{NSLog(#"blk1:%d", val);},
nil];
This appears to be sufficient to wake the compiler up.
HTH
The question here is more of an educational one. I began to think of this an hour ago while
flipping around a lego block (silly, I know).
A block is an object created on the stack, from what I understand.
Let say A is an object, which means we can do:
[A message];
Based on that, if a block is an object, we could also do:
[block message];
Am I correct?
And when the runtime sees that, it would call:
objc_msgSend(block, @selector(message), nil);
So my question is, how can we send a block a message?
And if that is possible, I would imagine that it would also be possible to send a block a message with the arguments being blocks as well?
And, if we could call a block by doing:
block();
Does that mean we could even make a block as a message (SEL) as well, because blocks have the signature void (^)(void) which resembles that of a method?
Because if it would be possible, then the following would really surprise me:
objc_msgSend(block, @selector(block), block);
or:
objc_msgSend(block1, @selector(block2), block3);
I hope my imagination is not running a bit wild and that my understanding is not off here (correct me, if it is).
Blocks are objects only for the purposes of storage and referencing. By making them objects, blocks can be retain/release'd and can, therefore, be shoved into arrays or other collection classes. They also respond to copy.
That's about it. Even that a block starts on the stack is largely a compiler implementation detail.
When a block's code is invoked, that is not done through objc_msgSend(). If you were to read the source for the block runtime and the llvm compiler, then you'd find that:
- a block is really a C structure that contains a description of the data that has been captured (so it can be cleaned up) and a pointer to a function -- to a chunk of code -- that is the executable portion of the block (a simplified layout is sketched after this list)
- the block function is a standard C function where the first argument must always be a reference to the block. The rest of the argument list is arbitrary and works just like any old C function or Objective-C method.
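A simplified sketch of that structure, following the field order in clang's public block ABI write-up (this is an implementation detail, not an interface to program against; BlockLiteral is an invented name):
#import <Foundation/Foundation.h>

// Simplified from clang's block ABI documentation; real blocks carry a richer
// descriptor (size, copy/dispose helpers, optional signature string).
struct BlockLiteral {
    void *isa;                        // e.g. _NSConcreteStackBlock or _NSConcreteMallocBlock
    int flags;
    int reserved;
    void (*invoke)(void *block, ...); // the compiled body; argument 0 is the block itself
    void *descriptor;
    // captured variables are laid out after this point
};

int main(void) {
    @autoreleasepool {
        int captured = 7;
        void (^blk)(void) = ^{ NSLog(@"captured = %d", captured); };
        struct BlockLiteral *raw = (__bridge struct BlockLiteral *)blk;  // __bridge for ARC
        raw->invoke(raw);             // call the block's function pointer directly
    }
    return 0;
}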
So, your manual calls to objc_msgSend() treat the block like any other random ObjC object and, thus, won't call the code in the block, nor, if it did (and there is SPI that can do this from a method... but, don't use it) could it pass an argument list that was fully controllable.
One aside, though of relevance.
imp_implementationWithBlock() takes a block reference and returns an IMP that can be plugged into an Objective-C class (see class_addMethod()) such that when the method is invoked, the block is called.
The implementation of imp_implementationWithBlock() takes advantage of the layout of the call site of an objective-c method vs. a block. A block is always:
blockFunc(blockRef, ...)
And an ObjC method is always:
methodFunc(selfRef, SEL, ...)
Because we want the imp_implementationWithBlock() block to always take the target object to the method as the first block parameter (i.e. the self), imp_implementationWithBlock() returns a trampoline function that when called (via objc_msgSend()):
- slides the self reference into the slot for the selector (i.e. arg 0 -> arg 1)
- finds the implementing block and puts the pointer to that block into arg 0
- JMPs to the block's implementation pointer
The finds the implementing block bit is kinda interesting, but irrelevant to this question (hell, the imp_implementationWithBlock() is a bit irrelevant, too, but may be of interest).
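For what it's worth, here is a small sketch of that machinery in use (Greeter and greet are invented names):
#import <Foundation/Foundation.h>
#import <objc/runtime.h>

@interface Greeter : NSObject
@end
@implementation Greeter
@end

// Declare the method we are about to add dynamically so it can be called normally.
@interface Greeter (Dynamic)
- (void)greet;
@end

int main(void) {
    @autoreleasepool {
        // The block's first parameter becomes the receiver (self); no _cmd is passed.
        IMP imp = imp_implementationWithBlock(^(Greeter *me) {
            NSLog(@"greet called on %@", me);
        });
        class_addMethod([Greeter class], @selector(greet), imp, "v@:");

        [[Greeter new] greet];   // dispatched via objc_msgSend, lands in the block
    }
    return 0;
}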
Thanks for the response. It's definitely an eye opener. The part about block calling not being done through objc_msgSend() tells me that it is because blocks are not part of the normal object hierarchy (but coda's mention of NSBlock seems to refute what I understand so far, because NSBlock would make it part of the object hierarchy). Feel free to take a stab at me if my understanding is still off. I am very interested in hearing more about the following: 1) the SPI and how to call that method; 2) the underlying mechanism of sliding the self reference into the slot; 3) finding the implementing block and putting the pointer to that block into arg 0. If you have time to share and write a bit more about those in detail, I am all ears; I find this all very fascinating. Thanks very much in advance.
The blocks, themselves, are very much just a standard Objective-C object. A block instance contains a pointer to some executable code, any captured state, and some helpers used to copy said state from stack to heap (if requested) and clean up the state on the block's destruction.
The block's executable code is not invoked like a method. A block has other methods -- retain, release, copy, etc... -- that can be invoked directly like any other method, but the executable code is not publicly one of those methods.
The SPI doesn't do anything special; it only works for blocks that take no arguments and it is nothing more than simply doing block().
If you want to know how the whole argument slide thing works (and how it enables tail calling through to the block), I'd suggest reading this or this. As well, the source for the blocks runtime, the objc runtime, and llvm are all available.
That includes the fun bit where the IMP grabs the block and shoves it into arg0.
Yes, blocks are objects. And yes, that means you can send messages to them.
But what message do you think a block responds to? We are not told of any messages that a block supports, other than the memory management messages retain, release, and copy. So if you send an arbitrary message to a block, chances are that it will throw a "does not recognize selector" exception (the same thing that would happen if you sent an arbitrary message to any object you don't know the interface of).
Invoking a block happens through a different mechanism than message sending, and is magic implemented by the compiler, and not exposed to the programmer otherwise.
Blocks and selectors are very different. (A selector is just an interned string, the string of the method name.) Blocks and IMPs (functions implementing methods) are somewhat similar. However, they are still different in that methods receive the receiver (self) and the selector called as special parameters, whereas blocks do not have them (the function implementing the block only receives the block itself as a hidden parameter, not accessible to the programmer).
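A quick sketch of that distinction (the commented-out line would raise an unrecognized-selector exception at runtime):
int x = 42;
void (^blk)(void) = ^{ NSLog(@"x = %d", x); };

// Ordinary message sends: blocks answer the usual NSObject messages.
void (^heapBlk)(void) = [blk copy];
id asObject = heapBlk;                 // a block pointer converts to id
NSLog(@"%@", [asObject class]);        // something like __NSMallocBlock__

// Invocation: a compiler-generated call through the block's function pointer,
// not objc_msgSend with some "invoke" selector.
heapBlk();

// [asObject performSelector:@selector(message)];  // would throw at runtime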
1. Will ARC always release an object on the line after the last strong pointer to it is removed? Or is it undetermined, and the object will be released at some unspecified point in the future? Similarly, assuming that you don't change anything in your program, will ARC always behave the same each time you compile and run it?
2. How do you deal with handing an object off to other classes? For example, suppose we are creating a Cake object in a Bakery class. This process would probably take a long time and involve many different methods, so it may be reasonable for us to put the cake in a strong property. Now suppose we want to hand this cake object off to a customer. The customer would also probably want a strong pointer to it. Is this OK, having two classes with strong pointers to the same object? Or should we nil out the Bakery's pointer as soon as we hand it off?
Your code should be structured so the answer to this doesn't matter - if you want to use an object, keep a pointer to it, don't rely on ARC side effects to keep it around :) And these side effects might change with different compilers.
Two strong pointers is absolutely fine. ARC will only release the object when both pointers are pointing to something else (or nothing!)
ARC inserts the proper retains and releases at compile time. It will not behave any differently than if you had put them in there yourself, so it will always produce the same compiled code and, to answer your question, should always behave the same. That said, this does not mean that your object will always be released immediately after the last pointer is removed. Because you never call dealloc directly in any form of Objective-C, you are only telling the runtime that there are no remaining references and that it is safe to release. In practice this usually means it will be released right away, though.
If you pass an object from one class to another, and the receiving class has a strong property holding it, then even after the class that passed it off nils its own pointer, the object will still have a reference count of at least 1 and will be fine.
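A minimal sketch of the bakery hand-off from the question (all names are hypothetical):
#import <Foundation/Foundation.h>

@interface Cake : NSObject
@end
@implementation Cake
- (void)dealloc { NSLog(@"Cake deallocated"); }
@end

@interface Bakery : NSObject
@property (nonatomic, strong) Cake *cake;
@end
@implementation Bakery
@end

@interface Customer : NSObject
@property (nonatomic, strong) Cake *cake;
@end
@implementation Customer
@end

int main(void) {
    @autoreleasepool {
        Bakery *bakery = [Bakery new];
        Customer *customer = [Customer new];

        bakery.cake = [Cake new];        // one strong reference
        customer.cake = bakery.cake;     // two strong references, perfectly fine

        bakery.cake = nil;               // the customer's reference keeps the cake alive
        NSLog(@"customer still has %@", customer.cake);
    }                                    // the cake is released once the customer goes away
    return 0;
}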
OK, first, this answer might also help you a little bit: ARC equivalent of autorelease?
Generally, after the last strong reference is nilled, the object is released immediately. If you store it in a property, you can nil the property; if you still need the object, assign it to something like __strong Foo *temp = self.bar; before you nil it, and return that local __strong variable (although ARC normally detects the return and infers the __strong by itself).
Some more details on that: Handling Pointer-to-Pointer Ownership Issues in ARC
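A sketch of the pattern just described (Foo, bar, and takeBar are hypothetical names):
// Detach the object from the property without letting it be released mid-method.
- (Foo *)takeBar {
    __strong Foo *temp = self.bar;   // local strong reference (__strong is the default)
    self.bar = nil;                  // clearing the property no longer deallocates the object
    return temp;                     // ARC manages the ownership of the returned value
}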
DeanWombourne's answer is correct; but to add to (1).
In particular, the compiler may significantly re-order statements as a part of optimization. While method calls will always occur in the order written in code (because any method call may have side effects), any atomic expression may be re-ordered by the compiler as long as that re-order doesn't impact behavior. Same thing goes for local variable re-use, etc...
Thus, the ARC compiler will guarantee that a pointer is valid for as long as it is needed, no more. But there is no guarantee when the pointed-to object might be released, other than that it isn't going to happen beyond the scope of declaration. There is also no guarantee that object A is released before B simply because A is declared and last used before B.
In other words, as long as you write your code without relying on side effects and race conditions, it should all just work.
Please keep your code proper, as it has different behaviour on different compilers.