We have a VoIP app for iOS in which we use TPCircularBuffer for audio buffering, and its performance is very good.
So I was wondering whether it's possible to use TPCircularBuffer for video buffering as well. I have searched a lot but didn't find anything useful on "Using TPCircularBuffer for Video". Is that even possible? If so, can anyone shed some light on it? Any code sample would be highly appreciated.
I guess you could copy your video frame's pixels into a TPCircularBuffer, and you'd technically have a video ring buffer, but you've already lost the efficiency race at that point because you don't have time to copy that much data around. You need to keep a reference to your frames.
Or, if you really wanted to mash a solution into TPCircularBuffer, you could write the CMSampleBuffer pointers into the buffer (carefully respecting retain and release). But that seems heavy-handed, as you're not really gaining anything from TPCircularBuffer's magical memory-mapped wraparound when the items are as small as pointers.
I would simply make my own CMSampleBufferRef ring buffer. You can grab a prebuilt circular buffer or do the clock arithmetic yourself:
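CMSampleBufferRef ringBuffer[10] = { NULL }; // or some other capacity
size_t slot = (++i) % 10;
if (ringBuffer[slot]) CFRelease(ringBuffer[slot]); // release the frame being overwritten
ringBuffer[slot] = (CMSampleBufferRef)CFRetain(frame); // keep the new frame alive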
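A slightly fuller sketch of the same idea, in portable C so it can be compiled anywhere. The names `FrameRing`, `frame_hook` and `DemoFrame` are made up for illustration; with real Core Media frames the two hooks would wrap CFRetain and CFRelease, and the slots would hold CMSampleBufferRef values.

```c
#include <assert.h>
#include <stddef.h>

#define FRAME_RING_CAPACITY 10 /* same arbitrary size as above */

/* Hypothetical ownership hooks; for CMSampleBufferRef these would call
   CFRetain and CFRelease. */
typedef void (*frame_hook)(void *frame);

typedef struct {
    void *slots[FRAME_RING_CAPACITY];
    size_t next; /* total number of frames pushed so far */
    frame_hook retain;
    frame_hook release;
} FrameRing;

static void frame_ring_init(FrameRing *ring, frame_hook retain, frame_hook release) {
    assert(ring && retain && release);
    for (size_t i = 0; i < FRAME_RING_CAPACITY; i++) ring->slots[i] = NULL;
    ring->next = 0;
    ring->retain = retain;
    ring->release = release;
}

/* Store a reference to the frame, releasing whichever frame it overwrites. */
static void frame_ring_push(FrameRing *ring, void *frame) {
    size_t slot = ring->next % FRAME_RING_CAPACITY;
    if (ring->slots[slot]) ring->release(ring->slots[slot]);
    ring->retain(frame);
    ring->slots[slot] = frame;
    ring->next++;
}

/* Age 0 is the most recent frame; returns NULL if that frame is not held. */
static void *frame_ring_peek(const FrameRing *ring, size_t age) {
    if (age >= FRAME_RING_CAPACITY || age >= ring->next) return NULL;
    return ring->slots[(ring->next - 1 - age) % FRAME_RING_CAPACITY];
}

/* Stand-in frame type with a visible reference count, for experimenting. */
typedef struct { int refcount; } DemoFrame;
static void demo_retain(void *frame) { ((DemoFrame *)frame)->refcount++; }
static void demo_release(void *frame) { ((DemoFrame *)frame)->refcount--; }
```

Only pointers move through the ring, so pushing a frame costs the same whatever the resolution. This sketch is not thread-safe; if the capture callback and the consumer run on different threads, access needs to be serialized.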
Of course your real problem is not the ring buffer itself, but dealing with the fact that decompressed video is very high bandwidth, e.g. each frame is 8MB for 1080p, or 200MB to store 1 second's worth at 24fps, so you're going to have to get pretty creative if you need anything other than a microscopic video buffer.
Some suggestions:
the above numbers are for RGBA, so try working in YUV, where they drop to about 3MB per frame and 75MB per second
try lower resolutions
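To see where those figures come from, and how few frames a given memory budget really holds, here is a small sketch. It assumes "YUV" means 8-bit 4:2:0 (1.5 bytes per pixel), which is what the 3MB figure implies; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one uncompressed RGBA frame (4 bytes per pixel). */
static uint64_t rgba_frame_bytes(uint32_t width, uint32_t height) {
    return (uint64_t)width * height * 4;
}

/* Bytes in one 8-bit YUV 4:2:0 frame: a full-resolution Y plane plus U and V
   planes subsampled by two in each direction, i.e. 1.5 bytes per pixel.
   Assumes even width and height. */
static uint64_t yuv420_frame_bytes(uint32_t width, uint32_t height) {
    assert(width % 2 == 0 && height % 2 == 0);
    uint64_t luma = (uint64_t)width * height;
    return luma + luma / 2;
}

/* How many whole frames fit in a memory budget. */
static uint64_t frames_in_budget(uint64_t budget_bytes, uint64_t frame_bytes) {
    assert(frame_bytes > 0);
    return budget_bytes / frame_bytes;
}
```

For 1920x1080 this gives 8,294,400 bytes per RGBA frame and 3,110,400 bytes per YUV 4:2:0 frame, so a 64 MiB budget holds 8 RGBA frames or 21 YUV frames, well under a second of video either way.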
I'm using WebGL to do YUV to RGB conversions on a custom video codec.
The video has to play at 30 fps. To make this happen I'm doing all my math on every other requestAnimationFrame.
This works great, but I noticed when profiling that uploading the textures to the GPU takes the longest amount of time.
So I split the upload into two: the "Y" texture and the "UV" texture are uploaded separately.
Now the first "requestAnimationFrame" will upload the "Y" texture like this:
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, yTextureRef);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.LUMINANCE, textureWidth, textureHeight, 0, gl.LUMINANCE, gl.UNSIGNED_BYTE, yData);
The second "requestAnimationFrame" will upload the "UV" texture in the same way, and make a draw call to the fragment shader doing the math between them.
But this doesn't change anything in the profiler. I still see nearly 0 GPU time on the frame that uploads the "Y" texture, and the same amount of time as before on the frame that uploads the "UV" texture.
However, if I add a draw call to my "Y" texture upload function, then the profiler shows the expected results: every frame has nearly half the GPU time.
From this I'm guessing the Y texture isn't really uploaded to the GPU by the texImage2D call.
However, I don't really want to draw the Y texture on the screen, as it doesn't have the correct UV texture to do anything with until a frame later. So is there any way to force the GPU to upload this texture without performing a draw call?
Update
I misunderstood the question.
It really depends on the driver. The problem is OpenGL/OpenGL ES/WebGL's texture API really sucks. Sucks is a technical term for 'has unintended consequences'.
The issue is that the driver can't really fully upload the data until you draw, because it doesn't know what you're going to change. You could set all the mip levels in any order and at any size and then fix them all afterwards, so until you draw it has no idea which other functions you're going to call to manipulate the texture.
Consider you create a 4x4 level 0 mip
gl.texImage2D(
gl.TEXTURE_2D,
0, // mip level
gl.RGBA,
4, // width
4, // height
...);
What memory should it allocate? 4 (width) * 4 (height) * 4 (RGBA)? But what if you call gl.generateMipmap? Now it needs 4*4*4 + 2*2*4 + 1*1*4. OK, but now you allocate an 8x8 mip on level 3. You intend to then replace levels 0 to 2 with 64x64, 32x32 and 16x16 respectively, but you did level 3 first. What should it do when you set level 3 before replacing the levels above it? You then add levels 4 as 4x4, 5 as 2x2, and 6 as 1x1.
As you can see the API lets you change mips in any order. In fact I could allocate level 7 as 723x234 and then fix it later. The API is designed to not care until draw time when all the mips must be the correct size at which point they can finally allocate memory on the GPU and copy the mips in.
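The size rule the driver can only check at draw time can be sketched in a few lines of C. This is an illustration of the rule (each level is half the previous one, rounded down, never below 1), not how any particular driver implements it, and it ignores other completeness conditions such as formats and filtering. `MipSize`, `mip_chain_is_complete` and `mip_chain_bytes` are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t width, height; } MipSize;

/* Next level's dimension: halve, round down, never below 1. */
static uint32_t half_floor(uint32_t v) { return v > 1 ? v / 2 : 1; }

/* True if levels[0..count) form a full chain down to 1x1 with the expected
   size at every level. Levels after the 1x1 level are ignored. */
static bool mip_chain_is_complete(const MipSize *levels, size_t count) {
    if (count == 0 || levels[0].width == 0 || levels[0].height == 0) return false;
    MipSize expected = levels[0];
    for (size_t i = 0; i < count; i++) {
        if (levels[i].width != expected.width || levels[i].height != expected.height)
            return false; /* wrong size for this level */
        if (expected.width == 1 && expected.height == 1) return true;
        expected.width = half_floor(expected.width);
        expected.height = half_floor(expected.height);
    }
    return false; /* ran out of levels before reaching 1x1 */
}

/* Total bytes for a full chain starting at `base`. */
static uint64_t mip_chain_bytes(MipSize base, uint32_t bytes_per_pixel) {
    assert(base.width > 0 && base.height > 0);
    uint64_t total = 0;
    MipSize s = base;
    for (;;) {
        total += (uint64_t)s.width * s.height * bytes_per_pixel;
        if (s.width == 1 && s.height == 1) break;
        s.width = half_floor(s.width);
        s.height = half_floor(s.height);
    }
    return total;
}
```

For a 4x4 RGBA base level, `mip_chain_bytes` returns 84, the 4*4*4 + 2*2*4 + 1*1*4 from above. The point is that none of this can be evaluated while the application is still free to replace any level.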
You can see a demonstration and test of this issue here. The test uploads mips out of order to verify that WebGL implementations correctly fail when they are not all the correct size and correctly start working once they are the correct sizes.
You can see this was arguably a bad API design.
They added gl.texStorage2D to fix it, but gl.texStorage2D is not available in WebGL1, only in WebGL2. gl.texStorage2D has new issues though :(
TLDR; textures get uploaded to the driver when you call gl.texImage2D but the driver can't upload to the GPU until draw time.
Possible solution: use gl.texSubImage2D. Since it does not allocate memory, it's possible the driver could upload sooner. I suspect most drivers don't, because you can use gl.texSubImage2D before drawing. Still, it's worth a try.
Let me also add that gl.LUMINANCE might be a bottleneck as well. IIRC DirectX doesn't have a corresponding format and neither does OpenGL Core Profile. Both support a RED-only format but WebGL1 does not. So LUMINANCE has to be emulated by expanding the data on upload.
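To make "expanding the data on upload" concrete, here is a sketch of what such an emulation could look like on the CPU. Whether a given browser or driver does exactly this is implementation-specific and is only an assumption here; what is certain is that sampling a LUMINANCE texel yields (L, L, L, 1). The function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand 8-bit luminance to 8-bit RGBA so that each texel reads back as
   (L, L, L, 1). dst must have room for 4 * pixel_count bytes. */
static void expand_luminance_to_rgba(const uint8_t *src, size_t pixel_count, uint8_t *dst) {
    assert(src && dst);
    for (size_t i = 0; i < pixel_count; i++) {
        uint8_t l = src[i];
        dst[4 * i + 0] = l;   /* red */
        dst[4 * i + 1] = l;   /* green */
        dst[4 * i + 2] = l;   /* blue */
        dst[4 * i + 3] = 255; /* alpha is always fully opaque */
    }
}
```

If an implementation takes this route, every upload touches each pixel on the CPU and sends four times as many bytes, which would explain a LUMINANCE upload being slower than expected.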
Old Answer
Unfortunately there is no way to upload video to WebGL except via texImage2D and texSubImage2D
Some browsers try to make that happen faster. I notice you're using gl.LUMINANCE. You might try using gl.RGB or gl.RGBA and see if things speed up. It's possible browsers only optimize for the more common case. On the other hand it's possible they don't optimize at all.
Two extensions that would allow using video without a copy have been proposed, but AFAIK no browser has ever implemented them.
WEBGL_video_texture
WEBGL_texture_source_iframe
It's actually a much harder problem than it sounds like.
Video data can be in various formats. You mentioned YUV but there are others. Should the browser tell the app the format or should the browser convert to a standard format?
The problem with telling is that lots of devs will get it wrong, and then a user will provide a video in a format they don't support.
The WEBGL_video_texture extension converts to a standard format by re-writing your shaders. You tell it uniform samplerVideoWEBGL video and then it knows it can re-write your color = texture2D(video, uv) to color = convertFromVideoFormatToRGB(texture(video, uv)). It also means they'd have to re-write shaders on the fly if you play videos in different formats.
Synchronization
It sounds great to get the video data to WebGL but now you have the issue that by the time you get the data and render it to the screen you've added a few frames of latency so the audio is no longer in sync.
How to deal with that is out of the scope of WebGL as WebGL doesn't have anything to do with audio but it does point out that it's not as simple as just giving WebGL the data. Once you make the data available then people will ask for more APIs to get the audio and more info so they can delay one or both and keep them in sync.
TLDR; there is no way to upload video to WebGL except via texImage2D and texSubImage2D
I have to extract all frames from a video file and then save them to files.
I tried to use AVAssetImageGenerator, but it's very slow: it takes 1-3 s per frame (for a sample 1280x720 MPEG4 video), not counting the time to save to a file.
Is there any way to make it much faster?
OpenGL, GPU, (...)?
I would be very grateful if someone could point me in the right direction.
AVAssetImageGenerator is a random-access (seeking) interface, and seeking takes time, so one optimisation could be to use an AVAssetReader, which will vend you frames quickly and sequentially. You can also choose to work in a YUV format, which will give you smaller frames and, I think, faster decoding.
However, those raw frames are enormous: 1280px * 720px * 4 bytes/pixel (if in RGBA) is about 3.7MB each. You're going to need some pretty serious compression if you want to keep them all (MPEG4 @ 720p comes to mind :).
So what are you trying to achieve?
Are you sure you want to fill up your users' disks at a rate of about 110MB/s (at 30fps) or 885MB/s (at 240fps)?
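The arithmetic behind those rates, computed exactly rather than rounded, plus how long a given amount of free space would last. This is only a sketch; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second produced by writing every raw frame to disk. */
static uint64_t raw_video_bytes_per_second(uint32_t width, uint32_t height,
                                           uint32_t bytes_per_pixel, uint32_t fps) {
    return (uint64_t)width * height * bytes_per_pixel * fps;
}

/* Seconds of raw footage that fit in free_bytes of storage. */
static double seconds_until_full(uint64_t free_bytes, uint64_t bytes_per_second) {
    assert(bytes_per_second > 0);
    return (double)free_bytes / (double)bytes_per_second;
}
```

For 1280x720 RGBA this is 110,592,000 bytes per second at 30fps and 884,736,000 at 240fps, so 1GB of free space lasts roughly nine seconds at 30fps.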
I have a 30fps Quicktime .mov of still images I created with AVAssetWriter. (It's only about 10 frames long). I would like the user to be able to slow it down using a UISlider to about 1fps, but when I adjust the AVPlayer .rate property from 1 down to 0, it doesn't get anywhere near 1fps, it just stops playback (because a 0 rate is effectively stopping/pausing it, which makes sense). But how can I slow the player down to about 1fps? I think I'd need to do some math to calculate the actual rate, but that's where I'm stuck. Would it end up being something like 0.000000000000001?
Thanks!
If this were a requirement of mine I would approach it as follows (also suggested by Inafziger in the comments). Use AVAssetReader and roll my own viewer for the images. This would give you precise control using a timer, as stated in your comments. Make sure you reuse some preallocated image memory area (you can probably get away with space for a single image). I would probably take a pull approach like Core Audio: when you need an image, pull it from some image buffer manager class which calls AVAssetReader's read function. This way you can have N buffers that will always be available. This may be a little overkill. I do believe AVAssetReader pre-decodes some amount of the movie upon initialization, which is why I say you can more than likely get away with using a single buffer for reading image data into.
Regarding your comment about memory issues: I believe some functions in AVAssetReader and its associated classes follow the Create Rule, so you own the objects they return and must release them yourself.
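As for the math asked about in the question: a playback rate scales the timeline, so the rate for a target frame rate is simply the target divided by the native frame rate, which for 1fps out of 30fps is 1/30, about 0.033. A sketch of that mapping, with made-up function names and a slider assumed to run from 0 to 1:

```c
#include <assert.h>

/* Rate at which content authored at native_fps shows target_fps frames
   per second. */
static double rate_for_target_fps(double native_fps, double target_fps) {
    assert(native_fps > 0.0 && target_fps >= 0.0);
    return target_fps / native_fps;
}

/* Map a slider position in [0, 1] linearly onto [min_fps, native_fps] and
   return the matching playback rate. Out-of-range positions are clamped. */
static double rate_for_slider(double slider, double native_fps, double min_fps) {
    assert(native_fps > 0.0 && min_fps > 0.0 && min_fps <= native_fps);
    if (slider < 0.0) slider = 0.0;
    if (slider > 1.0) slider = 1.0;
    double fps = min_fps + slider * (native_fps - min_fps);
    return rate_for_target_fps(native_fps, fps);
}
```

Whether AVPlayer actually plays a given item smoothly at such a low rate is a separate matter (AVPlayerItem exposes canPlaySlowForward for this), which is one reason the timer-driven viewer above gives more predictable results.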
I have used the method from "iOS4: how do I use video file as an OpenGL texture?" to get video frames rendering in OpenGL successfully.
This method however seems to fall down when you want to scrub (jump to a certain point in the playback) as it only supplies you with video frames sequentially.
Does anyone know a way this behaviour can successfully be achieved?
One easy way to implement this is to export the video to a series of frames, store each frame as a PNG, and then "scrub" by seeking to the PNG at a specific offset. That gives you random access into the image stream at the cost of decoding the entire video first and holding all the data on disk. It also means decoding each PNG as it is accessed, which eats up CPU, but modern iPhones and iPads can handle it as long as you are not doing too much else.
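The "seek to a PNG at a specific offset" step comes down to turning a scrub time into a frame index and then a file name. A minimal sketch, assuming a constant frame rate and a hypothetical naming scheme of frame_000000.png, frame_000001.png and so on (both function names are made up):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Index of the frame shown at `seconds`, clamped to the available frames. */
static size_t frame_index_for_time(double seconds, double fps, size_t frame_count) {
    assert(fps > 0.0 && frame_count > 0);
    if (seconds <= 0.0) return 0;
    double position = seconds * fps;
    if (position >= (double)frame_count) return frame_count - 1;
    return (size_t)position; /* truncation picks the frame already on screen */
}

/* Build the path of the exported PNG for a frame index. Returns the number
   of characters the full path needs, as snprintf does. */
static int frame_path(char *out, size_t out_size, const char *dir, size_t index) {
    return snprintf(out, out_size, "%s/frame_%06zu.png", dir, index);
}
```

With 30fps footage, scrubbing to 1.5 seconds selects frame 45, so the slider handler only needs to load one image per position change.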
I'm looking for an implementation that uses a ring buffer in RemoteIO to output a very large audio file.
I have come across the CARingBuffer from Apple, but I've had a nightmare trying to implement it in my iOS project.
As an alternative I came across this ring buffer, which I've been using (unsuccessfully).
Ring Buffer
How I tried to implement this is as follows.
Open an audio file, which is perfectly cut, using ExtAudioFileRef.
Fully fill my ring buffer by reading from the file (number of frames % inTimeSamples = readpoint).
In my callback, if the ring buffer is less than 50% full, I call performSelectorInBackground to add more samples.
If there are enough samples I just read from the buffer.
This all seems to work fine until I come close to the end of the file and want to loop it. When the readpoint plus the number of samples needed to fill the ring buffer exceeds the total number of frames, I extract some audio from the remainder of the file, seek to frame 0, then read the rest.
This always sounds glitchy. I think it may have something to do with the fact that the RemoteIO callback is running much quicker than the background thread, so by the time the background thread has completed, not only has the calculated readpoint changed but the head and tail of the buffer are not what they should be.
If example code would be too immense to post I would accept pseudo code as an answer. My methodology to solve this is lacking.
This may not be the answer you're looking for, but SFBAudioEngine compiles and runs on iOS and will handle this use case easily. It's basically a higher-level abstraction for the RemoteIO AU and supports many more formats than Core Audio does natively.
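If you do want to keep your own ring buffer, the wraparound read described in the question can be sketched as below. An in-memory array of mono float frames stands in for the file here, which is an assumption made so the code is self-contained; in the real app the memcpy would be a file read and the wrap would be a seek to frame 0. `read_looped` is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `count` frames from a looping source of `total_frames` frames into
   dest, starting at *read_point and wrapping to frame 0 as often as needed.
   On return *read_point is where the next read should start. */
static void read_looped(const float *source, size_t total_frames,
                        size_t *read_point, float *dest, size_t count) {
    assert(source && dest && read_point && total_frames > 0);
    size_t pos = *read_point % total_frames;
    while (count > 0) {
        size_t until_end = total_frames - pos;
        size_t chunk = count < until_end ? count : until_end;
        memcpy(dest, source + pos, chunk * sizeof *dest);
        dest += chunk;
        count -= chunk;
        pos = (pos + chunk) % total_frames; /* becomes 0 when the end is hit */
    }
    *read_point = pos;
}
```

The design point is that the read point is owned and advanced only by the thread that refills the ring buffer. The render callback never calculates it and only consumes from the ring buffer, so a slow refill cannot leave the two with different ideas of where the file position is.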