Implementing audio waveform view and audio timeline view in iOS? - ios

I am working on an app that will allow users to record from the mic, and I am using audio units for the purpose. I have the audio backend figured out and working, and I am starting to work on the views/controls etc.
There are two things I am yet to implement:
1) I will be using OpenGL ES to draw the waveform of the audio input, since there seems to be no easier way to do real-time drawing. I will be drawing inside a GLKView. After something is recorded, the user should be able to scroll back and forth and see the waveform without glitches. I know it's doable, but I am having a hard time understanding how it can be implemented. Suppose the user is scrolling: would I need to re-read the recorded audio every time and redraw everything? I obviously don't want to store the whole recording in memory, and reading from disk is slow.
2) For scrolling, the user should see a timeline, and even if I get an idea for question 1, I don't know how to implement the timeline.
All the functionality I'm describing is do-able since it can be seen in the Voice Memos app. Any help is always appreciated.

I have done just this. The way I did it was to create a data structure that holds data for different "zoom levels" of the audio. Unless you are displaying the audio at a resolution of 1 sample per pixel, you don't need every sample to be read from disk, so what you do is downsample your samples ahead of time to a much smaller array that can be stored in memory. A naive example is a waveform that displays audio at a ratio of 64 samples per pixel. Let's say you have an array of 65536 interleaved stereo samples (32768 L/R pairs): you would average each L/R pair of samples into a positive mono value, then average each run of 64 positive mono values into one float. Your array of 65536 audio samples can then be visualized with an array of 512 "visual samples". My real-world implementation became much more complicated than this, as I have ways to display all zoom levels, live resampling and such, but this is the basic idea. It's essentially a mipmap for audio.
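To make the idea concrete, here is a minimal sketch of that downsampling, written in Python for brevity rather than as iOS code. It treats the 65536-element array as interleaved L/R values (an assumption that makes the 512 figure work out), and the names `to_visual_samples` and `build_zoom_levels` are hypothetical, not from the answerer's implementation.

```python
from typing import List


def to_visual_samples(interleaved: List[float], samples_per_pixel: int = 64) -> List[float]:
    """Downsample interleaved stereo audio to one positive value per pixel column."""
    # Fold each L/R pair into a single positive mono value
    # (assumption: "positive" means averaging the absolute values).
    mono = [
        (abs(interleaved[i]) + abs(interleaved[i + 1])) / 2.0
        for i in range(0, len(interleaved) - 1, 2)
    ]
    # Average each run of `samples_per_pixel` mono values into one float.
    visual = []
    for start in range(0, len(mono), samples_per_pixel):
        chunk = mono[start:start + samples_per_pixel]
        visual.append(sum(chunk) / len(chunk))
    return visual


def build_zoom_levels(interleaved: List[float], base: int = 64, levels: int = 4) -> List[List[float]]:
    """Precompute progressively coarser arrays, similar to mipmap levels."""
    pyramid = [to_visual_samples(interleaved, base)]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        # Each coarser level averages adjacent pairs of the previous one.
        coarser = []
        for start in range(0, len(prev), 2):
            pair = prev[start:start + 2]
            coarser.append(sum(pair) / len(pair))
        pyramid.append(coarser)
    return pyramid
```

While scrolling, the view would then read only the slice of the appropriate level that covers the visible pixel columns, so the original audio never has to be re-read from disk.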

Related

iOS Swift: buffer 30 FPS video for real-time object detection

I have trained an ObjectDetector for iOS. Now I want to use it on a Video with a frame rate of 30FPS.
The ObjectDetector is a bit too slow, needs 85ms for one frame. For the 30FPS it should be below 33ms.
Now I am wondering if it is possible to buffer the frames and the predictions for a specified time x and then play the video on the screen?
If you have already tried using a smaller/faster model (and have also ensured that your model is fully optimized to run in Core ML on the Neural Engine), note that we had success doing inference only every nth frame.
The results were suitable for our use-case and you couldn't really tell that we were only doing it at 5 fps because we were able to continue to display the camera output at full frame-rate.
If you don't need realtime then yes, certainly you could store the video and do the processing per frame afterwards; this would let you parallelize things into bigger batch sizes as well.
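The "every nth frame" approach can be sketched roughly as follows. This is a language-agnostic illustration in Python, not Core ML code; `detect_every_nth` and the `detect` callback are hypothetical names, and a real app would run the detector off the main thread instead of inline.

```python
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

Frame = TypeVar("Frame")
Detection = TypeVar("Detection")


def detect_every_nth(
    frames: Iterable[Frame],
    detect: Callable[[Frame], List[Detection]],
    n: int = 6,
) -> Iterator[Tuple[Frame, List[Detection]]]:
    """Yield every frame for display, paired with the most recent detections."""
    last: List[Detection] = []
    for index, frame in enumerate(frames):
        # Only pay the inference cost on every nth frame.
        if index % n == 0:
            last = detect(frame)
        # All frames are still shown, so the video keeps its full frame rate.
        yield frame, last
```

With 30 FPS input and `n = 6`, the detector runs 5 times per second, which leaves a 200 ms budget per inference and so fits the 85 ms model from the question.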

Is there a quality difference between output of AVCaptureMovieFileOutput and AVCaptureVideoDataOutput?

In the process of capturing a light trail photo, I noticed that for fast moving objects, there is slightly more discontinuity between successive frames if I use the sample buffers from AVCaptureVideoDataOutput compared to if I record a movie and extract frames and run the same algo.
Is there a refresh rate/frame rate difference if the two modes are used?
A colleague who has experience in professional photography claims that there is a visible lag even in Apple's default camera app when comparing the preview in Photo mode and Video mode but it is not something very obvious to me.
Furthermore, I am actually capturing video at a low frame rate (close to the highest exposure).
To conclude these experiments, I need to know if there is any definitive proof that confirms or disproves this.

Converting raw data to displayable video for iOS

I have an interesting problem I need to research related to very low level video streaming.
Has anyone had any experience converting a raw stream of bytes (separated into per-pixel information, but not in a standard video format) into a low-resolution video stream? I believe that I can map the data into RGB values per pixel, as the color values that correspond to the values in the raw data will be determined by us. I'm not sure where to go from there, or what the RGB format needs to be per pixel.
I've looked at FFmpeg, but its documentation is massive and I don't know where to start.
Specific questions I have include: is it possible to create a CVPixelBuffer with that pixel data? If I were to do that, what sort of format would I need to convert the per-pixel data to?
Also, should I be looking deeper into OpenGL, and if so, where would be the best place to look for information on this topic?
What about CGBitmapContextCreate? For example, if I went with something like this, what would a typical pixel byte need to look like? Would this be fast enough to keep the frame rate above 20 FPS?
EDIT:
I think with the excellent help of you two, and some more research on my own, I've put together a plan for how to construct the raw RGBA data, then construct a CGImage from that data, and in turn create a CVPixelBuffer from that CGImage, as described in the linked question "CVPixelBuffer from CGImage".
However, to then play that live as the data comes in, I'm not sure what kind of FPS I would be looking at. Do I paint them to a CALayer, or is there some class similar to AVAssetWriter that I could use to play it as I append CVPixelBuffers? The experience that I have is using AVAssetWriter to export constructed Core Animation hierarchies to video, so the videos are always constructed before they begin playing, and not displayed as live video.
I've done this before, and I know that you found my GPUImage project a little while ago. As I replied on the issues there, the GPUImageRawDataInput is what you want for this, because it does a fast upload of RGBA, BGRA, or RGB data directly into an OpenGL ES texture. From there, the frame data can be filtered, displayed to the screen, or recorded into a movie file.
Your proposed path of going through a CGImage to a CVPixelBuffer is not going to yield very good performance, based on my personal experience. There's too much overhead when passing through Core Graphics for realtime video. You want to go directly to OpenGL ES for the fastest display speed here.
I might even be able to improve my code to make it faster than it is right now. I currently use glTexImage2D() to update texture data from local bytes, but it would probably be even faster to use the texture caches introduced in iOS 5.0 to speed up refreshing data within a texture that maintains its size. There's some overhead in setting up the caches that makes them a little slower for one-off uploads, but rapidly updating data should be faster with them.
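Whichever upload path is used, the raw stream first has to become a tightly packed RGBA (or BGRA) byte buffer. A minimal sketch of that mapping step follows, in Python for illustration only. It assumes one byte of raw data per pixel and a 256-entry lookup table chosen by the app; both are assumptions, since the question does not describe the raw format, and `raw_to_rgba` and `GRAYSCALE` are hypothetical names.

```python
from typing import List, Tuple

# Hypothetical palette: maps each raw byte value to an (R, G, B) color.
GRAYSCALE: List[Tuple[int, int, int]] = [(v, v, v) for v in range(256)]


def raw_to_rgba(raw: bytes, palette: List[Tuple[int, int, int]]) -> bytes:
    """Expand one raw byte per pixel into 4 bytes per pixel (R, G, B, A)."""
    out = bytearray(len(raw) * 4)
    for i, value in enumerate(raw):
        r, g, b = palette[value]
        j = i * 4
        out[j] = r
        out[j + 1] = g
        out[j + 2] = b
        out[j + 3] = 255  # fully opaque
    return bytes(out)
```

The resulting buffer has the layout that a raw RGBA texture upload expects: width * height * 4 bytes, row by row, with no padding.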
My 2 cents:
I made an OpenGL game which lets the user record a 3D scene. Playback was done by replaying the scene (instead of playing a video) because realtime encoding did not yield a comfortable FPS.
There is a technique which could help out, unfortunately I didn't have time to implement it:
http://allmybrain.com/2011/12/08/rendering-to-a-texture-with-ios-5-texture-cache-api/
This technique should cut down time on getting pixels back from openGL. You might get an acceptable video encoding rate.

iOS video playback

In my application I need to play video in an unusual way.
Something like an interactive player for special purposes.
The main requirements are:
video resolution can be from 200*200 px up to 1024*1024 px
I should be able to change the speed from -60 FPS to 60 FPS (the video should play slower or faster depending on the selected speed; a negative value means the video plays backwards)
I should draw lines and objects over the video and scale them with the image.
I should be able to zoom the image and pan it if its content is larger than the screen
I should be able to change the brightness and contrast and invert the colors of this video
This is what I am doing now:
I split my video into JPG frames
I created a timer that fires N times per second (play speed control)
on each timer tick I draw a new texture (the next JPG frame) with OpenGL
for zoom and pan I use OpenGL ES transformations (translate, scale)
Everything looks fine as long as I use 320*240 px, but with 512*512 px my playback rate drops. Maybe it is a timer behaviour problem, maybe OpenGL. Sometimes, if I try to open big textures at a high playback rate (more than 10-15 FPS), the application just crashes with memory warnings.
What is the best practice to solve this issue? In what direction should I dig? Would cocos2d or another game engine help? Maybe JPG is not the best format for textures and I should use PNG, PVR, or something else?
Keep the video data as a video and use AVAssetReader to get the raw frames. Use kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange as the pixel format, and do the YUV->RGB colorspace conversion in GLES. It will mean keeping less data in memory, and it makes much of your image processing somewhat simpler (since you'll be working with luma and chroma data rather than RGB values).
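For reference, the per-pixel math that the fragment shader would perform looks roughly like this. It is a CPU-side sketch in Python, not shader code, and it assumes BT.601 video-range coefficients (HD content often uses BT.709 instead) and planes without row padding, which real CVPixelBuffers do not guarantee. The function names are hypothetical.

```python
from typing import Tuple


def ycbcr_video_range_to_rgb(y: int, cb: int, cr: int) -> Tuple[int, int, int]:
    """Convert one video-range YCbCr sample to 8-bit RGB (BT.601, assumed)."""
    # Video range: luma spans 16..235, chroma is centered on 128.
    yf = 1.164 * (y - 16)
    r = yf + 1.596 * (cr - 128)
    g = yf - 0.392 * (cb - 128) - 0.813 * (cr - 128)
    b = yf + 2.017 * (cb - 128)

    def clamp(v: float) -> int:
        return max(0, min(255, int(round(v))))

    return clamp(r), clamp(g), clamp(b)


def pixel_from_planes(luma: bytes, chroma: bytes, width: int, x: int, y: int) -> Tuple[int, int, int]:
    """Sample a bi-planar 4:2:0 frame: full-size Y plane, half-size interleaved CbCr plane."""
    y_value = luma[y * width + x]
    # Each CbCr pair is shared by a 2x2 block of luma samples.
    c = ((y // 2) * (width // 2) + (x // 2)) * 2
    return ycbcr_video_range_to_rgb(y_value, chroma[c], chroma[c + 1])
```

Because brightness and contrast mostly act on the luma value, those adjustments can be applied to `y` before the conversion.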
You don't need to bother with Cocos2D or any game engine for this. I strongly recommend doing a little bit of experimenting with OpenGL ES 2.0 and shaders. Using OpenGL for video is very simple and straightforward; adding a game engine to the mix is unnecessary overhead and abstraction.
When you upload image data to the textures, do not create a new texture every frame. Instead, create two textures: one for luma, and one for chroma data, and simply reuse those textures every frame. I suspect your memory issues are arising from using many images and new textures every frame and probably not deleting old textures.
JPEG frames will be incredibly expensive to uncompress. First step: use PNG.
But wait! There's more.
Cocos2D could help you mostly through its great support for sprite sheets.
The biggest help, however, may come from packed textures a la TexturePacker. Using PVR.CCZ compression can speed things up by insane amounts, enough for you to get better frame rates at bigger video sizes.
Vlad, the short answer is that you will likely never be able to get all of the features you have listed working at the same time. Playing 1024 x 1024 video at 60 FPS is really going to be a stretch; I highly doubt that iOS hardware is going to be able to keep up with that kind of data transfer rate at 60 FPS. Even the H.264 hardware on the device can only do 30 FPS at 1080p. It might be possible, but to then layer graphics rendering over the video and also expect to be able to edit the brightness/contrast at the same time is just too many things at once.
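A rough back-of-the-envelope calculation puts numbers on that. The 4 bytes per decoded pixel used below is an assumption (RGBA or BGRA frames); the function name is hypothetical.

```python
def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Raw pixel throughput needed to show every frame."""
    return width * height * fps


requested = pixels_per_second(1024, 1024, 60)  # 62,914,560 pixels/s
reference = pixels_per_second(1920, 1080, 30)  # 62,208,000 pixels/s

# Assumption: decoded frames are 4 bytes per pixel.
BYTES_PER_PIXEL = 4
bytes_per_second = requested * BYTES_PER_PIXEL  # 251,658,240 bytes/s, i.e. 240 MiB/s
```

So 1024 x 1024 at 60 FPS lands slightly above the 1080p at 30 FPS figure, before any overlay drawing or color adjustment is added on top.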
You should focus on what is actually possible instead of attempting to do every feature. If you want to see an example Xcode app that pushes iPad hardware right to the limits, please have a look at my Fireworks example project. This code displays multiple already decoded H.264 videos on screen at the same time. The implementation is built around CoreGraphics APIs, but the key thing is that Apple's implementation of texture uploading to OpenGL is very fast because of a zero-copy optimization. With this approach, a lot of video can be streamed to the device.

Add constant latency to graphical output in XNA 4

Does anyone know of an easy way to add a constant latency (about 30 ms) to the graphical output of an XNA 4 application?
I want to keep my graphical output in sync with a real-time buffered audio stream which inherently has a constant latency.
Thanks for any ideas on this!
Max
If you really need to delay your graphics, then what you could do is render your game to a cycling series of render-targets. So on frame n you display the frame you rendered at frame n-2. This will only work for small latencies, and requires a large amount of additional graphics memory and a small amount of extra GPU time.
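The cycling idea boils down to a small ring of frames. Here is a minimal sketch in Python rather than XNA code; `FrameDelay` is a hypothetical name, and the "frames" stand in for render targets.

```python
from collections import deque
from typing import Deque, Generic, Optional, TypeVar

T = TypeVar("T")


class FrameDelay(Generic[T]):
    """Keeps the last few rendered frames and hands back the one from `delay_frames` ago."""

    def __init__(self, delay_frames: int) -> None:
        self._delay = delay_frames
        # One slot for the frame being rendered plus one per frame of delay.
        self._ring: Deque[T] = deque(maxlen=delay_frames + 1)

    def push(self, frame: T) -> Optional[T]:
        """Store the newest frame; return the frame to display, or None while filling."""
        self._ring.append(frame)
        if len(self._ring) <= self._delay:
            return None
        return self._ring[0]
```

At 60 FPS a delay of 2 frames is about 33 ms, which is close to the 30 ms asked for. The memory cost is `delay_frames + 1` full-size render targets.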
A far better method is not to delay the graphical output at all, but delay the audio that is being used to generate the graphical output. Either by buffering it or having two read positions in your audio buffer. The "audio" read being X ms (the latency) ahead of the "game" read.
So if your computer's audio hardware has 100ms of latency (not uncommon), and your graphics hardware has a latency of 16ms: As you are feeding the sample at 100ms into the audio system, you are feeding the audio sample at 16ms into your graphics calculation. At the same time, the audio from 0ms is hitting the speakers, and the matching graphic is hitting the screen.
Obviously this won't work if the thing generating the graphical output is also generating the audio. But the general principle of both these methods is that you have to buffer the input somewhere along your graphics chain, in order to introduce a delay that corresponds to the one you are experiencing for audio. Where along that chain it is easiest to insert a buffer is up to you.
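The two read positions can be sketched as follows. This is an illustration in Python under simplifying assumptions: the whole sample stream is available as a list, both latencies are known constants, and the names `lead_in_samples` and `DualReadBuffer` are hypothetical.

```python
from typing import List, Sequence, Tuple


def lead_in_samples(audio_latency_ms: float, graphics_latency_ms: float, sample_rate: int) -> int:
    """How far the audio read runs ahead of the game read, in samples."""
    return int((audio_latency_ms - graphics_latency_ms) * sample_rate / 1000)


class DualReadBuffer:
    """One sample buffer with an 'audio' read head running ahead of a 'game' read head."""

    def __init__(self, samples: Sequence[float], lead: int) -> None:
        self._samples = samples
        self._lead = lead
        self._audio_pos = 0

    def step(self, count: int) -> Tuple[List[float], List[float]]:
        """Advance both heads by `count` samples; return (audio_chunk, game_chunk)."""
        audio_start = self._audio_pos
        game_start = audio_start - self._lead
        audio = list(self._samples[audio_start:audio_start + count])
        # Until the game head reaches sample 0 it has nothing (or less) to read.
        game = list(self._samples[max(game_start, 0):max(game_start + count, 0)])
        self._audio_pos += count
        return audio, game
```

With 100 ms of audio latency and 16 ms of graphics latency, the lead is 84 ms of samples, so the sample at 100 ms goes to the audio system while the sample at 16 ms drives the graphics.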
For latencies of <100ms, I wouldn't worry about it for most games. You only really care about this kind of latency for audio programs and rhythm games.
I might not understand the question, but couldn't you keep track of how many times update is called and mod 2? 60fps mod 2 is 30...
