Best way to "intercept" result of Metal vertex/fragment shader

I currently have a MTLTexture for input and am rendering that piece-wise using a set of 20-30 vertices. This is currently done at the tail end of my drawRect handler of an MTKView:
[encoder setVertexBuffer:mBuff offset:0 atIndex:0]; // buffer of vertices
[encoder setVertexBytes:&_viewportSize length:sizeof(_viewportSize) atIndex:1];
[encoder setFragmentTexture:inputTexture atIndex:0];
[encoder drawPrimitives:MTLPrimitiveTypeTriangle vertexStart:0 vertexCount:_vertexInfo.metalVertexCount];
[encoder endEncoding];
[commandBuffer presentDrawable:self.currentDrawable];
[commandBuffer commit];
However, before doing the final presentDrawable I would like to intercept the resulting texture (I'm going to send a region of it off to a separate MTKView). In other words, I need access to some kind of output MTLTexture after the drawPrimitives call.
What is the most efficient way to do this?
One idea is to introduce an additional drawPrimitives pass that renders to an intermediate output MTLTexture instead. I'm not sure how to do this, but I'd grab that output texture in the process. I suspect this pass could even be done elsewhere (i.e. off-screen).
Then I'd issue a second drawPrimitives using a single large textured quad with that output texture, followed by a presentDrawable on it. That code would live where my previous code was.
There may be a simple method in the Metal API (that I'm missing) that will allow me to capture an output texture of drawPrimitives.
I have looked into using an MTLBlitCommandEncoder, but there are some issues around that on certain macOS hardware.
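For reference, here is a minimal sketch of what that intermediate pass might look like. This is only a sketch: offscreenTexture, offscreenPassDesc, the pixel format and the size are assumptions, while the pipeline state, vertex buffer, viewport bytes and input texture are the ones from the snippet above:
id<MTLTexture> offscreenTexture; // sketch only: render into this instead of the drawable
MTLTextureDescriptor *texDesc =
    [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatBGRA8Unorm
                                                       width:(NSUInteger)self.drawableSize.width
                                                      height:(NSUInteger)self.drawableSize.height
                                                   mipmapped:NO];
texDesc.usage = MTLTextureUsageRenderTarget | MTLTextureUsageShaderRead;
offscreenTexture = [self.device newTextureWithDescriptor:texDesc];

MTLRenderPassDescriptor *offscreenPassDesc = [MTLRenderPassDescriptor renderPassDescriptor];
offscreenPassDesc.colorAttachments[0].texture = offscreenTexture;
offscreenPassDesc.colorAttachments[0].loadAction = MTLLoadActionClear;
offscreenPassDesc.colorAttachments[0].storeAction = MTLStoreActionStore; // keep the result around

id<MTLRenderCommandEncoder> offscreenEncoder =
    [commandBuffer renderCommandEncoderWithDescriptor:offscreenPassDesc];
[offscreenEncoder setRenderPipelineState:metalVertexPipelineState];     // your existing pipeline
[offscreenEncoder setVertexBuffer:mBuff offset:0 atIndex:0];
[offscreenEncoder setVertexBytes:&_viewportSize length:sizeof(_viewportSize) atIndex:1];
[offscreenEncoder setFragmentTexture:inputTexture atIndex:0];
[offscreenEncoder drawPrimitives:MTLPrimitiveTypeTriangle
                     vertexStart:0
                     vertexCount:_vertexInfo.metalVertexCount];
[offscreenEncoder endEncoding];
// offscreenTexture now holds the rendered result; a second pass can draw it as a
// full-screen quad into self.currentDrawable, and the other MTKView can sample a region of it.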
UPDATE#1:
idoogy, here is the code you were requesting:
Here is where I create the initial "brightness output" texture... we're mid-flight in the render pass that produces it:
...
[encoder setFragmentTexture:brightnessOutput atIndex:0];
[encoder drawPrimitives:MTLPrimitiveTypeTriangle vertexStart:0 vertexCount:_vertexInfo.metalVertexCount];
[encoder endEncoding];
for (AltMonitorMTKView *v in self.downstreamOutputs) // ancillary MTKViews
    [v setInputTexture:brightnessOutput];

__block dispatch_semaphore_t block_sema = d.hostedAssetsSemaphore;
[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer) {
    dispatch_semaphore_signal(block_sema);
}];
[commandBuffer presentDrawable:self.currentDrawable];
[commandBuffer commit];
Below, we're in the ancillary view's drawRect handler, with inputTexture as the texture being transferred, displaying a subregion of it. I should mention that this MTKView is configured to draw in response to setNeedsDisplay rather than from an internal timer:
id<MTLRenderCommandEncoder> encoder = [commandBuffer renderCommandEncoderWithDescriptor:renderPassDescriptor];
encoder.label = @"Vertex Render Encoder";
[encoder setRenderPipelineState:metalVertexPipelineState];
// draw main content
NSUInteger vSize = _vertexInfo.metalVertexCount*sizeof(AAPLVertex);
id<MTLBuffer> mBuff = [self.device newBufferWithBytes:_vertexInfo.metalVertices
                                               length:vSize
                                              options:MTLResourceStorageModeShared];
[encoder setVertexBuffer:mBuff offset:0 atIndex:0];
[encoder setVertexBytes:&_viewportSize length:sizeof(_viewportSize) atIndex:1];
[encoder setFragmentTexture:self.inputTexture atIndex:0];
[encoder drawPrimitives:MTLPrimitiveTypeTriangle vertexStart:0 vertexCount:_vertexInfo.metalVertexCount];
[encoder endEncoding];
[commandBuffer presentDrawable:self.currentDrawable];
[commandBuffer commit];
The above code seems to work fine. Having said that, the Xcode debugger tells a different story: it's pretty obvious that I'm wasting huge swaths of time doing things this way. That long command buffer is the ancillary monitor view doing a LOT of waiting...

This should be doable. Before you call commit on your commandBuffer, add a completion handler for the command buffer by calling [commandBuffer addCompletedHandler:], then from within the completion handler, grab the color attachments from your renderPassDescriptor.
renderPassDescriptor holds the current set of attachments being drawn to, and is configured automatically by MTKView. The actual textures rotate per frame because MTKView is using triple-buffering to ensure continuous utilization of the GPU, but as long as you're within your completion handler, that particular attachment won't be released for use in a future frame, so you can safely read from it, copy it, etc.
NOTE: Make sure your handler completes reasonably quickly or your frame rate will drop (because MTKView will quickly run out of render targets and will just sit there until they are released).
Here's a generic code snippet to get you started:
// Grab the current render pass descriptor from MTKView so it's accessible from within the completion block:
__block MTLRenderPassDescriptor *renderPassDescriptor = self.currentRenderPassDescriptor;
[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> commandBuffer) {
    // This will be called once the GPU has completed rendering your frame.
    // This is your output texture:
    id<MTLTexture> outputTexture = renderPassDescriptor.colorAttachments[0].texture;
}];
[commandBuffer commit];
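As a follow-up, if the goal is to hand a subregion of that attachment to the second MTKView, one option is to copy it into a texture you own from inside that completion handler. This is only a sketch under a few assumptions: the view's framebufferOnly property has been set to NO so the drawable's texture can be read, commandQueue / regionTexture / regionX / regionY / regionW / regionH are placeholders for your own objects, and the blit-encoder caveats mentioned in the question may still apply on some macOS hardware:
// Sketch: copy a subregion of the rendered attachment into an app-owned texture.
id<MTLTexture> src = renderPassDescriptor.colorAttachments[0].texture;
id<MTLCommandBuffer> copyBuffer = [commandQueue commandBuffer];   // your existing MTLCommandQueue
id<MTLBlitCommandEncoder> blit = [copyBuffer blitCommandEncoder];
[blit copyFromTexture:src
          sourceSlice:0
          sourceLevel:0
         sourceOrigin:MTLOriginMake(regionX, regionY, 0)          // region values are placeholders
           sourceSize:MTLSizeMake(regionW, regionH, 1)
            toTexture:regionTexture                               // a texture with a matching pixel format
     destinationSlice:0
     destinationLevel:0
    destinationOrigin:MTLOriginMake(0, 0, 0)];
[blit endEncoding];
[copyBuffer commit];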

Related

Using the GPU on iOS for Overlaying one image on another Image (Video Frame)

I am working on some image processing in my app. I'm taking live video and adding an image on top of it to use as an overlay. Unfortunately this takes a massive amount of CPU, which causes other parts of the program to slow down and not work as intended. Essentially I want to make the following code use the GPU instead of the CPU.
- (UIImage *)processUsingCoreImage:(CVPixelBufferRef)input {
    CIImage *inputCIImage = [CIImage imageWithCVPixelBuffer:input];

    // Use Core Graphics for this
    UIImage *ghostImage = [self createPaddedGhostImageWithSize:CGSizeMake(1280, 720)]; //[UIImage imageNamed:@"myImage"];
    CIImage *ghostCIImage = [[CIImage alloc] initWithImage:ghostImage];

    CIFilter *blendFilter = [CIFilter filterWithName:@"CISourceAtopCompositing"];
    [blendFilter setValue:ghostCIImage forKeyPath:@"inputImage"];
    [blendFilter setValue:inputCIImage forKeyPath:@"inputBackgroundImage"];
    CIImage *blendOutput = [blendFilter outputImage];

    EAGLContext *myEAGLContext = [[EAGLContext alloc] initWithAPI:kEAGLRenderingAPIOpenGLES2];
    NSDictionary *contextOptions = @{ kCIContextWorkingColorSpace : [NSNull null],
                                      kCIContextUseSoftwareRenderer : [NSNumber numberWithBool:NO] };
    CIContext *context = [CIContext contextWithEAGLContext:myEAGLContext options:contextOptions];

    CGImageRef outputCGImage = [context createCGImage:blendOutput fromRect:[blendOutput extent]];
    UIImage *outputImage = [UIImage imageWithCGImage:outputCGImage];
    CGImageRelease(outputCGImage);
    return outputImage;
}
Suggestions in order:
do you really need to composite the two images? Is an AVCaptureVideoPreviewLayer with a UIImageView on top insufficient? You'd then just apply the current ghost transform to the image view (or its layer) and let the compositor glue the two together, for which it will use the GPU.
if not, then the first port of call should be Core Image, which wraps GPU image operations into a relatively easy Swift/Objective-C package. There is a simple compositing filter, so all you need to do is turn the two things into CIImages and use -imageByApplyingTransform: to adjust the ghost (see the sketch after this list).
failing both of those, then you're looking at an OpenGL solution. You specifically want to use CVOpenGLESTextureCache to push Core Video frames to the GPU, and the ghost will simply live there permanently. Start from the GLCameraRipple sample for that, then look into GLKBaseEffect to save yourself from needing to know GLSL if you don't already. All you should need to do is package up some vertices and make a drawing call.
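To make the Core Image suggestion concrete, here is a rough sketch using the names from the question's method (currentGhostTransform is a placeholder for however you position the ghost; this is untested):
// Sketch of the Core Image route: composite the ghost over the video frame on the GPU.
CIImage *frameCIImage = [CIImage imageWithCVPixelBuffer:input];
CIImage *ghostCIImage = [[CIImage alloc] initWithImage:ghostImage];
ghostCIImage = [ghostCIImage imageByApplyingTransform:currentGhostTransform]; // position/scale the ghost

CIFilter *composite = [CIFilter filterWithName:@"CISourceAtopCompositing"];
[composite setValue:ghostCIImage forKey:kCIInputImageKey];
[composite setValue:frameCIImage forKey:kCIInputBackgroundImageKey];
CIImage *blended = composite.outputImage;
// Render 'blended' with a CIContext that is created once and reused (see the next answer).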
The biggest performance issue is that on each frame you create the EAGLContext and CIContext. These need to be created only once, outside of your processUsingCoreImage: method.
Also, if you want to avoid the CPU-GPU round trip of creating a Core Graphics image (createCGImage means CPU processing), you can render directly into the EAGL layer like this:
[context drawImage:blendOutput inRect:/* destination rect */ fromRect:/* source rect */];
[myEAGLContext presentRenderbuffer:GL_RENDERBUFFER];
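A minimal sketch of that create-once pattern, assuming a hypothetical FrameProcessor class that owns the contexts:
// Sketch: create the expensive contexts once and reuse them for every frame.
@interface FrameProcessor ()
@property (nonatomic, strong) EAGLContext *eaglContext;
@property (nonatomic, strong) CIContext *ciContext;
@end

@implementation FrameProcessor

- (instancetype)init {
    if ((self = [super init])) {
        _eaglContext = [[EAGLContext alloc] initWithAPI:kEAGLRenderingAPIOpenGLES2];
        _ciContext = [CIContext contextWithEAGLContext:_eaglContext
                                               options:@{ kCIContextWorkingColorSpace : [NSNull null] }];
    }
    return self;
}

// processUsingCoreImage: then uses self.ciContext instead of creating a new context per frame.

@end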

Is it safe to call [EAGLContext presentRenderbuffer:] on a secondary thread?

I have (multiple) UIViews with layers of type CAEAGLLayer, and am able to call [EAGLContext presentRenderbuffer:] on renderbuffers attached to these layers, on a secondary thread, without any kind of graphical glitches.
I would have expected to see at least some tearing, since other UI with which these UIViews are composited is updated on the main thread.
Does CAEAGLLayer (I have kEAGLDrawablePropertyRetainedBacking set to NO) do some double-buffering behind the scenes?
I just want to understand why it is that this works...
Example:
BView is a UIView subclass that owns a framebuffer with renderbuffer storage assigned to its OpenGLES layer, in a shared EAGLContext:
@implementation BView

- (id)initWithFrame:(CGRect)frame context:(EAGLContext *)context
{
    self = [super initWithFrame:frame];

    // Configure layer
    CAEAGLLayer *eaglLayer = (CAEAGLLayer *)self.layer;
    eaglLayer.opaque = YES;
    eaglLayer.drawableProperties = @{ kEAGLDrawablePropertyRetainedBacking : [NSNumber numberWithBool:NO],
                                      kEAGLDrawablePropertyColorFormat : kEAGLColorFormatSRGBA8 };

    // Create framebuffer with renderbuffer attached to layer
    [EAGLContext setCurrentContext:context];
    glGenFramebuffers( 1, &FrameBuffer );
    glBindFramebuffer( GL_FRAMEBUFFER, FrameBuffer );
    glGenRenderbuffers( 1, &RenderBuffer );
    glBindRenderbuffer( GL_RENDERBUFFER, RenderBuffer );
    [context renderbufferStorage:GL_RENDERBUFFER fromDrawable:(id<EAGLDrawable>)self.layer];
    glFramebufferRenderbuffer( GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, RenderBuffer );

    return self;
}

+ (Class)layerClass
{
    return [CAEAGLLayer class];
}

@end
A UIViewController adds a BView instance on the main thread at init time:
BView *view = [[BView alloc] initWithFrame:(CGRect){ 0.0, 0.0, 75.0, 75.0 } context:Context];
[self.view addSubview:view];
On a secondary thread, render to the framebuffer in the BView and present it; in this case it's in a callback from a video AVCaptureDevice, called regularly:
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    [EAGLContext setCurrentContext:bPipe->Context.GlContext];

    // Render into framebuffer ...

    // Present renderbuffer
    glBindRenderbuffer( GL_RENDERBUFFER, BViewsRenderBuffer );
    [Context presentRenderbuffer:GL_RENDERBUFFER];
}
It used not to work; there used to be several issues with updating the view if the buffer was presented on any thread but the main thread. It seems this has been working for some time now, but you implement it at your own risk: later versions may have issues with it, as some older ones probably still do (not that you need to support very old OS versions anyway).
Apple has always been a bit closed about how things work internally, but we can guess quite a few things. Since iOS seems to be the only platform that exposes your main buffer as an FBO (framebuffer object), I would expect the actual main framebuffer to be inaccessible to developers, with your main FBO being redrawn into it when you present the render buffer. The last time I checked, presenting the render buffer blocks the calling thread and appears to be limited to the screen refresh rate (60 FPS in most cases), which implies there is still some locking mechanism. More testing should be done, but I would expect there is some sort of pool of buffers waiting to be redrawn to the main buffer, where only one unique buffer id can be in the pool at a time or the calling thread is blocked. The result would be that the first call to present the render buffer is not blocked at all, but each subsequent call is blocked if the previous buffer has not yet been redrawn.
If this is true then yes, some double buffering is mandatory at some point, since you may immediately continue drawing to your buffer. Since the render buffer has the same id across frames, it may not be swapped (as far as I know), but it could be redrawn/copied to another buffer (most likely a texture), which can be done on the fly at any given time. In that procedure, when you first present the buffer it is copied to the texture, which is then locked; when the screen refreshes, the texture is consumed and unlocked. So if the texture is locked, your present call blocks the thread; otherwise it continues smoothly. It is hard to call this double buffering, then: there are two buffers, but it still works with a locking mechanism.
I do hope this helps you understand why it works. It is pretty much the same procedure you would use when loading large data structures on a separate shared context running on a separate thread.
Still, most of this is unfortunately just guessing.

GPUImage apply filter to a buffer of images

In GPUImage there are some filters that work only on a stream of frames from a camera, for instance the low pass filter or the high pass filter, but there are plenty of them.
I'm trying to create a buffer of UIImages with a fixed time rate that makes it possible to apply those filters between just two images as well, and that for each pair of images produces a single filtered image. Something like this:
FirstImage+SecondImage-->FirstFilteredImage
SecondImage+ThirdImage-->SecondFilteredImage
I've found that filters that work with frames use a GPUImageBuffer, which is a subclass of GPUImageFilter (most probably just to inherit some methods and protocols) and which loads a passthrough fragment shader. From what I understood, this is a buffer that keeps the incoming frames already "texturized"; the textures are generated by binding the texture in the current context.
I've also found a -conserveMemoryForNextFrame method that sounds good for what I want to achieve, but I didn't get how it works.
Is it possible to do that? In which method are images converted into textures?
I made something close to what I'd like to achieve, but first I must say that I probably misunderstood some aspects of how the current filters work.
I thought that some filters could perform operations that take the time variable into account in their shader. That's because when I saw the low pass filter and the high pass filter I instantly thought about time. The reality seems to be different: they take time into account, but it doesn't seem to affect the filtering operations.
Since I'm developing a time-lapse application on my own, one that saves single images and reassembles them into a different timeline to make a video without audio, I imagined that filters that are a function of time could be fun to apply to subsequent frames. This is the reason why I posted this question.
Now the answer: to apply a two-input filter to still images, you must do it like in this snippet:
[sourcePicture1 addTarget:twoinputFilter];
[sourcePicture1 processImage];
[sourcePicture2 addTarget:twoinputFilter];
[sourcePicture2 processImage];
[twoinputFilter useNextFrameForImageCapture];
UIImage * image = [twoinputFilter imageFromCurrentFramebuffer];
If you forget to call -useNextFrameForImageCapture, the returned image will be nil due to the buffer reuse.
Not happy with that, I thought that maybe in the future the good Brad will make something like this, so I've created a GPUImagePicture subclass that, instead of returning kCMTimeInvalid from the appropriate methods, returns a new ivar containing the frame's CMTime, called frameTime.
@interface GPUImageFrame : GPUImagePicture
@property (assign, nonatomic) CMTime frameTime;
@end

@implementation GPUImageFrame

- (BOOL)processImageWithCompletionHandler:(void (^)(void))completion;
{
    hasProcessedImage = YES;

    //  dispatch_semaphore_wait(imageUpdateSemaphore, DISPATCH_TIME_FOREVER);
    if (dispatch_semaphore_wait(imageUpdateSemaphore, DISPATCH_TIME_NOW) != 0)
    {
        return NO;
    }

    runAsynchronouslyOnVideoProcessingQueue(^{
        for (id<GPUImageInput> currentTarget in targets)
        {
            NSInteger indexOfObject = [targets indexOfObject:currentTarget];
            NSInteger textureIndexOfTarget = [[targetTextureIndices objectAtIndex:indexOfObject] integerValue];

            [currentTarget setCurrentlyReceivingMonochromeInput:NO];
            [currentTarget setInputSize:pixelSizeOfImage atIndex:textureIndexOfTarget];
            [currentTarget setInputFramebuffer:outputFramebuffer atIndex:textureIndexOfTarget];
            [currentTarget newFrameReadyAtTime:_frameTime atIndex:textureIndexOfTarget];
        }

        dispatch_semaphore_signal(imageUpdateSemaphore);

        if (completion != nil) {
            completion();
        }
    });

    return YES;
}

- (void)addTarget:(id<GPUImageInput>)newTarget atTextureLocation:(NSInteger)textureLocation;
{
    [super addTarget:newTarget atTextureLocation:textureLocation];

    if (hasProcessedImage)
    {
        [newTarget setInputSize:pixelSizeOfImage atIndex:textureLocation];
        [newTarget newFrameReadyAtTime:_frameTime atIndex:textureLocation];
    }
}

@end
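For completeness, here is roughly how the subclass might be driven for a pair of still frames. This is untested; firstImage, secondImage and twoInputFilter are placeholders, and the timestamps are just illustrative 30 fps values:
// Sketch: feed two still frames through a two-input filter, giving each a distinct timestamp.
GPUImageFrame *frameA = [[GPUImageFrame alloc] initWithImage:firstImage];
frameA.frameTime = CMTimeMake(0, 30);   // frame 0 at 30 fps (illustrative)
GPUImageFrame *frameB = [[GPUImageFrame alloc] initWithImage:secondImage];
frameB.frameTime = CMTimeMake(1, 30);   // frame 1

[frameA addTarget:twoInputFilter];
[frameA processImage];
[frameB addTarget:twoInputFilter];
[frameB processImage];

[twoInputFilter useNextFrameForImageCapture];
UIImage *filtered = [twoInputFilter imageFromCurrentFramebuffer];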

A faster way to update interface using background threads

I'm building a spectrograph and would like to know how I can improve the performance of my UIView-based code. I know that I cannot update the user interface for iPhone/iPad from a background thread, so I'm doing most of my processing using GCD. The issue I'm running into is that my interface still updates far too slowly.
With the code below, I'm trying to take 32 stacked 4x4 pixel UIViews and change their background color (see the green squares on the attached image). The operation produces visible lag in the rest of the user interface.
Is there a way I can "prepare" these colors from some kind of background thread and then ask the main thread to refresh the interface all at once?
//create a color intensity map used to color pixels
static dispatch_once_t onceToken;
dispatch_once(&onceToken, ^{
    colorMap = [[NSMutableDictionary alloc] initWithCapacity:128];
    for (int i = 0; i < 128; i++)
    {
        [colorMap setObject:[UIColor colorWithHue:0.2 saturation:1 brightness:i/128.0 alpha:1]
                     forKey:[NSNumber numberWithInt:i]];
    }
});

-(void)updateLayoutFromMainThread:(id)sender
{
    for (UIView *tempView in self.markerViews)
    {
        tempView.backgroundColor = [colorMap objectForKey:[NSNumber numberWithInt:arc4random()%128]];
    }
}

//called from background, would do heavy processing and fourier transforms
-(void)updateLayout
{
    //update the interface from the main thread
    [self performSelectorOnMainThread:@selector(updateLayoutFromMainThread:) withObject:nil waitUntilDone:NO];
}
I ended up pre-calculating a dictionary of 256 colors and then asking the dictionary for the color based on the value that the circle is trying to display. Trying to allocate colors on the fly was the bottleneck.
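A minimal sketch of that prepare-then-apply pattern, reusing the precomputed colorMap from the question (spectrumValues stands in for whatever intensities the FFT produces):
// Sketch: compute the colors off the main thread, then apply them all in one main-thread pass.
dispatch_async(dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^{
    NSMutableArray<UIColor *> *colors = [NSMutableArray arrayWithCapacity:self.markerViews.count];
    for (NSUInteger i = 0; i < self.markerViews.count; i++) {
        int level = spectrumValues[i] % 128;        // spectrumValues: placeholder for the real intensities
        [colors addObject:colorMap[@(level)]];      // cheap lookup into the precomputed map
    }
    dispatch_async(dispatch_get_main_queue(), ^{
        [self.markerViews enumerateObjectsUsingBlock:^(UIView *v, NSUInteger idx, BOOL *stop) {
            v.backgroundColor = colors[idx];        // just assigning cached colors on the main thread
        }];
    });
});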
Yes, a couple of points.
While you shouldn't manipulate UIViews from a background thread, you can instantiate views on a background thread before using them. Not sure if that will help you at all. However, beyond instantiating a view on a background thread, UIViews are really just a metadata wrapper for CALayer objects and are optimised for flexibility rather than performance.
Your best bet is to draw to a layer object or an image object on a background thread (which is a slower process because drawing uses the CPU as well as the GPU), pass the layer object or image to the main thread, then draw the pre-rendered image to your view's layer (much faster because a simple call is made to get the Graphics Processor to blit the image to the UIView's backing store directly).
see this answer:
Render to bitmap then blit to screen
The code:
- (void)drawRect:(CGRect)rect {
    CGContextRef context = UIGraphicsGetCurrentContext();
    CGContextDrawImage(context, rect, image);
}
executes far faster than if you were to execute other drawing operations, such as drawing bezier curves, in the same method.
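For illustration, a rough sketch of that render-on-a-background-thread approach; drawSpectrogramInContext: and the renderedImage property are hypothetical stand-ins for the heavy drawing and for whatever -drawRect: blits:
// Sketch: render the expensive drawing into a bitmap off the main thread,
// then hand the finished image to the main thread for a cheap blit.
CGSize size = self.bounds.size;                       // capture geometry while still on the main thread
dispatch_async(dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^{
    UIGraphicsBeginImageContextWithOptions(size, YES, 0);
    CGContextRef ctx = UIGraphicsGetCurrentContext();
    [self drawSpectrogramInContext:ctx];              // hypothetical heavy drawing routine
    UIImage *rendered = UIGraphicsGetImageFromCurrentImageContext();
    UIGraphicsEndImageContext();

    dispatch_async(dispatch_get_main_queue(), ^{
        self.renderedImage = rendered;                // hypothetical UIImage property the view draws from
        [self setNeedsDisplay];                       // -drawRect: then just blits renderedImage.CGImage
    });
});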

CIFilter with UISlider

So I have a UISlider that changes a hue using CIFilter.
It's insanely slow because I'm filtering the base image while the UISlider is in use.
Does anyone have suggestions on how to do this more efficiently?
// UI SLIDER
- (IBAction)changeSlider:(id)sender {
    [self doHueAdjustFilterWithBaseImage:currentSticker.image
                               hueAdjust:[(UISlider *)sender value]];
}

// Change HUE
- (void)doHueAdjustFilterWithBaseImage:(UIImage *)baseImage hueAdjust:(CGFloat)hueAdjust {
    CIImage *inputImage = [[CIImage alloc] initWithImage:baseImage];

    CIFilter *controlsFilter = [CIFilter filterWithName:@"CIHueAdjust"];
    [controlsFilter setValue:inputImage forKey:kCIInputImageKey];
    [controlsFilter setValue:[NSNumber numberWithFloat:hueAdjust] forKey:@"inputAngle"];
    //NSLog(@"%@", controlsFilter.attributes);

    CIImage *displayImage = controlsFilter.outputImage;
    CIContext *context = [CIContext contextWithOptions:nil];

    if (displayImage == nil) {
        NSLog(@"Display NADA");
    } else {
        NSLog(@"RETURN Image");
        currentSticker.image = [UIImage imageWithCGImage:[context createCGImage:displayImage fromRect:displayImage.extent]];
    }

    displayImage = nil;
    inputImage = nil;
    controlsFilter = nil;
}
You can set the UISlider's continuous property to NO, so that your changeSlider: action only gets called when the user releases their finger. See Apple's documentation for that property.
Your problem here is that you keep re-creating the context. Make the context a property and initialize it once in your initializer, then use it over and over and you'll see a massive performance gain.
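A minimal sketch of that change (the view-controller and property names are illustrative):
// Sketch: create the CIContext once and reuse it on every slider change.
@interface StickerEditorViewController ()
@property (nonatomic, strong) CIContext *ciContext;
@end

@implementation StickerEditorViewController

- (void)viewDidLoad {
    [super viewDidLoad];
    self.ciContext = [CIContext contextWithOptions:nil];   // expensive: do this exactly once
}

- (void)doHueAdjustFilterWithBaseImage:(UIImage *)baseImage hueAdjust:(CGFloat)hueAdjust {
    // ... build the filter exactly as before, then render with the cached context:
    // CGImageRef cgImage = [self.ciContext createCGImage:displayImage fromRect:displayImage.extent];
    // currentSticker.image = [UIImage imageWithCGImage:cgImage];
    // CGImageRelease(cgImage);
}

@end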
Rather than changing the behaviour you have, I'd keep it: frequent updates while sliding are good and best for the user experience. So don't change that behaviour; instead, work on your algorithm to achieve greater optimisation. Check this previously asked question, which has some good tips:
How to implement fast image filters on iOS platform
My approach would be: keep the slider updating, but only recompute the image when the slider passes a round value, i.e. detect when it passes through 10.000, 20.000, 30.000, and only update the image then rather than for every single point. Hope this makes sense.
EDIT:
Make your input image and filter variables ivars, and check whether they have already been allocated; if so, reuse them rather than creating them every single time.
