Which API should I use to play audio from a buffer on iOS and OS X?

I would like to do this very simple thing: playing PCM audio data from memory.
The audio samples will come from sound-synthesis algorithms, pre-loaded sample files or whatever. My question is really about how to play the buffers, not how to fill them with data.
So I'm looking for the best way to re-implement my old, deprecated AudioWrapper (which was based on AudioUnits V1), but I could not find an API in the Apple documentation that fulfills all of the following:
Compatible with 10.5 through 10.7.
Available on iOS.
Does not rely on a third-party library.
Be future-proof (for example: not based on Carbon, 64-bit ready...).
I'm considering using OpenAL, but is it really the best option? I've seen negative opinions about it: it might be too complex and overkill for this, and it might add performance overhead.
At worst, I could have two different implementations of that AudioWrapper, but if possible I'd really like to avoid having one version per system (iOS, 10.5, 10.6, 10.7...). Also, it will be written in C++.
EDIT: I need low latency; the system must respond to user interactions in under 20 ms (so the buffers must be between 128 and 512 samples at 44 kHz).

AudioQueues are quite common. However, their I/O buffer sizes are large enough that they are not ideal for interactive I/O (e.g. a synth).
For lower latency, try AudioUnits -- the MixerHost sample may be a good starting point.
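To make that concrete, here is a minimal sketch of an output Audio Unit with a render callback, in plain C. It is illustrative only: MyRenderProc and the mono 44.1 kHz float format are assumptions, error checking is omitted, and on OS X 10.5 you would need the older Component Manager calls (FindNextComponent/OpenAComponent) instead of the AudioComponent API shown here.

    #include <TargetConditionals.h>
    #include <AudioUnit/AudioUnit.h>

    // Render callback: the output unit pulls samples from us whenever it needs them.
    static OSStatus MyRenderProc(void *inRefCon,
                                 AudioUnitRenderActionFlags *ioActionFlags,
                                 const AudioTimeStamp *inTimeStamp,
                                 UInt32 inBusNumber,
                                 UInt32 inNumberFrames,
                                 AudioBufferList *ioData)
    {
        Float32 *out = (Float32 *)ioData->mBuffers[0].mData;
        for (UInt32 i = 0; i < inNumberFrames; ++i)
            out[i] = 0.0f;                  // silence; replace with your synth/buffer output
        return noErr;
    }

    AudioUnit CreateOutputUnit(void)
    {
        AudioComponentDescription desc = {0};
        desc.componentType         = kAudioUnitType_Output;
    #if TARGET_OS_IPHONE
        desc.componentSubType      = kAudioUnitSubType_RemoteIO;
    #else
        desc.componentSubType      = kAudioUnitSubType_DefaultOutput;
    #endif
        desc.componentManufacturer = kAudioUnitManufacturer_Apple;

        AudioComponent comp = AudioComponentFindNext(NULL, &desc);
        AudioUnit unit;
        AudioComponentInstanceNew(comp, &unit);

        // Assumed client format: 44.1 kHz, mono, packed 32-bit float PCM.
        AudioStreamBasicDescription fmt = {0};
        fmt.mSampleRate       = 44100.0;
        fmt.mFormatID         = kAudioFormatLinearPCM;
        fmt.mFormatFlags      = kAudioFormatFlagsNativeFloatPacked;
        fmt.mChannelsPerFrame = 1;
        fmt.mFramesPerPacket  = 1;
        fmt.mBitsPerChannel   = 32;
        fmt.mBytesPerFrame    = 4;
        fmt.mBytesPerPacket   = 4;
        AudioUnitSetProperty(unit, kAudioUnitProperty_StreamFormat,
                             kAudioUnitScope_Input, 0, &fmt, sizeof(fmt));

        AURenderCallbackStruct cb = { MyRenderProc, NULL };
        AudioUnitSetProperty(unit, kAudioUnitProperty_SetRenderCallback,
                             kAudioUnitScope_Input, 0, &cb, sizeof(cb));

        AudioUnitInitialize(unit);
        AudioOutputUnitStart(unit);
        return unit;
    }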

Not sure about OS X 10.5, but I'm directly using the Audio Units API for low-latency audio analysis and synthesis on OS X 10.6, 10.7, and iOS 3.x through 5.x. My wrapper file to generalize the API across platforms came to only a few hundred lines of plain C, with a few #ifdefs.
The latency of Audio Queues was too high for my low-latency work on iOS, whereas the iOS RemoteIO Audio Unit seems to allow buffers as short as 256 samples (though sometimes only down to 1024 when the display turns off) at a 44100 Hz sample rate.
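For what it's worth, the buffer size you actually get on iOS is negotiated through the audio session. A hedged sketch of asking for roughly 256-frame buffers with the old C-level Audio Session API follows (the 44100 Hz rate is an assumption, and the system is free to grant a larger duration than requested):

    #include <AudioToolbox/AudioToolbox.h>   // C-level Audio Session API (iOS)

    // Ask for roughly 256 frames per I/O cycle at 44.1 kHz; the OS may still hand back
    // more (e.g. 1024-frame buffers when the display is off, as noted above).
    void RequestSmallIOBuffers(void)
    {
        AudioSessionInitialize(NULL, NULL, NULL, NULL);

        Float32 preferred = 256.0f / 44100.0f;   // ~5.8 ms
        AudioSessionSetProperty(kAudioSessionProperty_PreferredHardwareIOBufferDuration,
                                sizeof(preferred), &preferred);
        AudioSessionSetActive(true);

        // Check what was actually granted.
        Float32 actual = 0.0f;
        UInt32 size = sizeof(actual);
        AudioSessionGetProperty(kAudioSessionProperty_CurrentHardwareIOBufferDuration,
                                &size, &actual);
    }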

Related

What does the Streaming stand for in Streaming SIMD Extensions (SSE)?

I've looked everywhere and I still can't figure it out. I know of two associations you can make with streams:
Wrappers for backing data stores meant as an abstraction layer between consumers and suppliers
Data becoming available with time, not all at once
SIMD stands for Single Instruction, Multiple Data; in the literature the instructions are often said to come from a stream of instructions. This corresponds to the second association.
I don't exactly understand why the Streaming is there in Streaming SIMD Extensions (or in Streaming Multiprocessor, either), however. The instructions come from a stream, but can they come from anywhere else? Do we, or could we, have just SIMD extensions or just multiprocessors?
Tl;dr: can CPU instructions be non-streaming, i.e. not come from a stream?
SSE was introduced as an instruction set to improve performance in multimedia applications. The aim of the instruction set was to quickly stream in some data (a piece of a DVD to decode, for example), process it quickly (using SIMD), and then stream the result to an output (e.g. the graphics RAM). (Almost) all SSE instructions have a variant that allows them to read 16 bytes from memory. The instruction set also contains instructions to control the CPU cache and the hardware prefetcher. It's pretty much just a marketing term.
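As a rough illustration of that "stream in, process with SIMD, stream out past the cache" pattern, here is a small C sketch using SSE intrinsics (the aligned, multiple-of-four buffer layout and the prefetch distance are arbitrary assumptions):

    #include <stddef.h>
    #include <xmmintrin.h>   // SSE intrinsics

    // Scale a float buffer, 4 elements (16 bytes) at a time, the way a media codec
    // might push data through the CPU. Assumes n is a multiple of 4 and the pointers
    // are 16-byte aligned.
    void scale_stream(float *dst, const float *src, size_t n, float gain)
    {
        __m128 g = _mm_set1_ps(gain);
        for (size_t i = 0; i < n; i += 4) {
            _mm_prefetch((const char *)(src + i + 64), _MM_HINT_T0); // hint the prefetcher
            __m128 v = _mm_load_ps(src + i);   // one 16-byte SIMD load
            v = _mm_mul_ps(v, g);              // one instruction, four data elements
            _mm_stream_ps(dst + i, v);         // non-temporal store: bypass the cache
        }
        _mm_sfence();                          // make the streaming stores globally visible
    }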

MTAudioProcessingTap on iOS - Should I use VLC or build a video player from scratch?

I'm trying to build an iOS app that plays video files and does some interesting things using MTAudioProcessingTap. I need it to be able to play all sorts of formats, including some that are not supported by Apple. I'm thinking of branching out from VLC, but I can't figure out if it uses Core Audio/Video at any point or if it's running something else completely.
If it's not, is there a library I can use to take care of the 203572964 codecs being used out there?
Thanks.
Preliminary note: I'm the developer of VLC for iOS so the following may be biased.
MobileVLCKit for iOS includes two different audio output modules. One of them is a high-level module based on AudioQueue, which is fairly simple but a bit slow. The other is based on AudioUnit, the low-level framework of CoreAudio; it is quite a bit more complex, but way faster. Depending on your current experience, either module would be a good way to start.
Regarding the one-library-supporting-all-codecs thing: basically there are two forks of the same library, libav and FFmpeg. VLC supports either flavor and abstracts away the complexity and the ever-changing APIs (which are a real pain if you intend to keep maintaining your app across multiple releases of those libraries). Additionally, we include a well-performing OpenGL ES 2 video output module which uses shaders to do chroma conversion. All you need to do is embed a UIView; MobileVLCKit handles the rest.
Speaking of MobileVLCKit: this is a thin ObjC layer on top of libvlc simplifying the use of this library in third party applications by abstracting most commonly used features.
As implicitly mentioned by HalR, libvlc does not use hardware-accelerated decoding on iOS yet. We are working with the libav developers on a generic approach, but we are not quite there yet. Thus, we have to do all the decoding on the CPU, which leads to the heating but allows us to play virtually anything, not just the H.264/MP4 that the default accelerated API supports.
If you can't figure out how it's playing the video at its lower levels, that is perhaps a sign that you should keep working with it instead of trying to outdo it. Video processing is pretty difficult, and formats are often unsupported due to patent issues. I really haven't seen anything better than VLC that is publicly available.
VLC 2.1.x appears to use AudioToolbox and AVFoundation.
One other issue, though: when I was doing work with VLC, I was stunned by how it turned my iPod Touch into a miniature iron, because it was working so hard to process the video. Manually processing video is very processor-intensive and really is a drain. So either your own approach or VLC could still have some additional issues.

Digital Audio Workstation Architecture on iOS

I am developing an architecture for a digital audio workstation that works on iOS (mainly, but I'm trying to support OS X too). I'm slowly working through miles of Apple documentation and framework references.
I have experience with DSP, but iOS is newer to me, and there are so many objects, tutorials (even for older versions of iOS), and different frameworks with different APIs. I would just like to make sure I choose the right one, or the right combination, from the start.
The goals of the architecture are:
Sound track sample access (access samples in files)
  iPod library songs
  local file songs
  songs on a remote server
  radio stations (infinite-length songs)
Effect chaining (multiple equalizers, or pitch & tempo change at the same time)
Multiple channels and mixing (even surround)
Portability
  Mac OS X at least
  iOS 6+ support (iOS 5 or lower not needed)
Sample access as 32-bit floats, not signed integers
Easy Objective-C API (DSP and processing done in C++, of course)
Recording, playing
  Record to file (codec of choice), or send over the network (VoIP)
  Playing on different outputs (on Mac) or speakers/headphones on iOS
  Changing of volume/mute
  Background audio support
Real-time sample processing
  Equalizer on whatever song is currently playing
  Real-time sample manipulation
Multi-threading
I hope I did not miss anything, but those are the most important goals.
My research
I have looked through most of the frameworks (though not in much detail) and here is what I have figured out. Apple lists the following frameworks for using audio on iOS:
Media Player framework
AV Foundation framework
Audio Toolbox framework
Audio Unit framework
OpenAL framework
Media Player and AV Foundation are too high-level APIs and do not allow direct sample access. OpenAL, on the other hand, cannot record audio. So that leaves the Audio Toolbox and Audio Unit frameworks. Many of the differences are explained here: What's the difference between all these audio frameworks?
As far as I understand, Audio Toolbox would be the way to go, since MIDI is currently not required. But there are very few tutorials and little information on Audio Toolbox for more professional control, such as recording, playing, etc. There is much more on Audio Units, though.
My first question: What exactly are Audio Queue Services, and which framework do they belong to?
And then the final question:
Which framework should be used to achieve most of the desired goals?
You can even suggest a mix and match of frameworks and classes, but I kindly ask you to explain your answer and detail which classes you would use to achieve each goal. I prefer the highest-level API possible, but as low-level as needed to achieve the goals. Links to sample code are also welcome.
Thank you very much for your help.
Audio Units is the lowest-level iOS audio API and the one that Audio Queues are built upon. Audio Units will give an app the lowest latency, and thus the closest to real-time processing possible. It is a C API, though, so an app may have to do some of its own audio memory management.
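For the mixing and effect-chaining goals, the usual shape with Audio Units is an AUGraph whose MultiChannelMixer feeds the output unit, with one render callback per track. A rough C sketch under those assumptions (stream formats, bus count, error checking, and the per-track callbacks are omitted; names are illustrative):

    #include <TargetConditionals.h>
    #include <AudioToolbox/AudioToolbox.h>

    // Sketch: a multichannel mixer node feeding the hardware output node.
    AUGraph BuildMixerGraph(void)
    {
        AUGraph graph;
        NewAUGraph(&graph);

        AudioComponentDescription mixerDesc = {0};
        mixerDesc.componentType         = kAudioUnitType_Mixer;
        mixerDesc.componentSubType      = kAudioUnitSubType_MultiChannelMixer;
        mixerDesc.componentManufacturer = kAudioUnitManufacturer_Apple;

        AudioComponentDescription outDesc = {0};
        outDesc.componentType         = kAudioUnitType_Output;
    #if TARGET_OS_IPHONE
        outDesc.componentSubType      = kAudioUnitSubType_RemoteIO;
    #else
        outDesc.componentSubType      = kAudioUnitSubType_DefaultOutput;
    #endif
        outDesc.componentManufacturer = kAudioUnitManufacturer_Apple;

        AUNode mixerNode, outputNode;
        AUGraphAddNode(graph, &mixerDesc, &mixerNode);
        AUGraphAddNode(graph, &outDesc, &outputNode);
        AUGraphOpen(graph);

        // Mixer output bus 0 -> output unit input bus 0.
        AUGraphConnectNodeInput(graph, mixerNode, 0, outputNode, 0);

        // Each track would attach its own render callback to one mixer input bus, e.g.:
        // AUGraphSetNodeInputCallback(graph, mixerNode, busNumber, &trackCallback);

        AUGraphInitialize(graph);
        AUGraphStart(graph);
        return graph;
    }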
The AVFoundation framework may provide an app with easier access to music library assets.
An app can only process sound from other apps that explicitly publish their audio data, which does not include the Music player app, but does include some of the apps using Apple's Inter-App Audio API, and the 3rd party Audiobus API.

Improve compression ratio with Delphi 6 app that uses the Windows AVIFile functions?

I have a Delphi 6 app that makes movies from an incoming video and audio stream from a robot. The PC receives the video stream as a series of JPEG frames and the audio as blocks of PCM audio data. I am using the Windows AVIFile functions (AVIStreamCreate, etc.) to create the movie. For the choice of video compressor I use the AVISaveOptions() function and let the user select one of the compressors available on their system, for example Microsoft Video 1, Cinepak Codec by Radius, etc. Note that several of the other available ones, like Microsoft H.263 or H.261, fail with AVIERR_BADFORMAT errors, so I could not test with them. The audio is compressed using the GSM 6.10 compressor.
The problem is I can't seem to get anywhere near the compression ratio that I can with a tool like Adobe Premiere. Note that I am aware Premiere compresses using a different overall process than mine, and to a different file format such as MPEG or QuickTime, but I would like to get a comparable compression ratio if I can.
No matter which compressor I choose from AVISaveOptions(), and no matter how low I crank the compressor's available quality settings (for example, Temporal Quality Ratio & Compression Quality for Microsoft Video 1), a minute's worth of video always ends up creating an AVI file of approximately 14 MB. For comparison, the file I can create using Adobe Premiere is less than 1 MB and has about the same visual quality (in other words, good enough for my purposes; I don't care about the actual quality loss here).
If I examine the file output from my use of the Windows AVI API, I see that none of the compressor settings I change affect the frame rate; it is always identical to the input frame rate. If necessary I can obviously drop frames on the input side, but that would be a bit messy since the video is synced to the audio, and I'd like to avoid it if I can.
More important, though, is the data rate. I can never get it below approximately 2.3 kbps no matter how low I crank the compressor settings. The videos I create with Premiere, and other videos I've played with that have a healthy file-size-to-duration ratio, are all about 1.2 kbps.
Overall, the difference between the file size of my AVI files and the well-compressed ones I create with Premiere or that other people have sent me is 10 to 1. So my compression ratio is 10 times worse than that of other video files, and those other files show no unpleasant difference in video quality.
What can I do to get a comparable compression ratio?
UPDATE: The reply by David Heffernan contains a quick solution that worked for me. I am highlighting it because it also contains a vital licensing warning. For those of you who, like me, want to make using the XVid codec as convenient as possible for your users, read the article below. It explains how to re-use a user's compressor choice, along with their chosen compression settings, in future sessions without bothering the user again:
http://msdn.microsoft.com/en-us/magazine/hh580739.aspx
For the curious, the change in size from my previous output AVI file to the file created using the XVid codec was from 12.231 MB to 632 KB, and the video quality was more than reasonable.
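For context, the key to remembering the user's choice is that AVISaveOptions() fills an AVICOMPRESSOPTIONS record whose lpParms blob holds the codec's private configuration, and that blob can simply be written out and restored in a later session. A hedged sketch in C (a Delphi app would use the equivalent Pascal record; the flat file layout here is an illustrative assumption, not the exact scheme from the MSDN article):

    #include <windows.h>
    #include <vfw.h>
    #include <stdio.h>

    // Persist the user's compressor choice so AVISaveOptions() need not be shown again.
    BOOL SaveCompressorChoice(const AVICOMPRESSOPTIONS *opts, const char *path)
    {
        FILE *f = fopen(path, "wb");
        if (!f) return FALSE;
        fwrite(&opts->fccHandler,      sizeof(opts->fccHandler),      1, f); // chosen codec FOURCC
        fwrite(&opts->dwQuality,       sizeof(opts->dwQuality),       1, f);
        fwrite(&opts->dwKeyFrameEvery, sizeof(opts->dwKeyFrameEvery), 1, f);
        fwrite(&opts->cbParms,         sizeof(opts->cbParms),         1, f);
        if (opts->cbParms && opts->lpParms)
            fwrite(opts->lpParms, 1, opts->cbParms, f);  // codec-specific configuration blob
        fclose(f);
        return TRUE;
    }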
The truly simple answer is to install the XVID encoder. None of the codecs that are supplied with Windows are fit for your purpose. XVID is both high quality and free.
Regarding distribution and licensing implications, the XVID FAQ has this to say:
Can I distribute Xvid together with my proprietary program?
If your program calls Xvid functionality upon run-time it’s a derived work and hence, the terms of the GPL apply to the work as a whole including your program. So no, you cannot distribute Xvid together with your proprietary program then. If you want to distribute, you’ll have to publish your program under the GPL as well. That also requires e.g. the provision of the full apps source code. Refer to the GPL license text for more information.
We don’t link to Xvid at all, just call through the VfW interface upon run-time – can we distribute with our proprietary software?
No. It doesn’t matter in which way you link to Xvid or what you count as linking and what not. The GPL doesn’t focus on the term ‘linking’ at all but rather requires combined/derived works to be published as a whole under the terms of the GPL. Basically any two (or more) pieces make up a combined work when they are distributed for use in combination. Hence, if your program calls upon Xvid functionality at run-time it would make up a derived work - no matter how you technically implement the calls to Xvid. If you don’t want to publish your program under the GPL then refrain from distributing it in combination with Xvid.
What this means for you is that you could only distribute XVID with your program if your program is also licensed under the GPL. But it is perfectly fine for you to suggest to your users that they obtain XVID for themselves.

iOS: Audio Units vs OpenAL vs Core Audio

Could someone explain to me how OpenAL fits in with the schema of sound on the iPhone?
There seem to be APIs at different levels for handling sound. The higher level ones are easy enough to understand.
But my understanding gets murky towards the bottom. There is Core Audio, Audio Units, OpenAL.
What is the connection between these? Is OpenAL the substratum upon which Core Audio rests (with Audio Units as one of its lower-level components)?
OpenAL doesn't seem to be documented by Xcode, yet I can run code that uses its functions.
This is what I have figured out:
The substratum is Core Audio. Specifically, Audio Units.
So Audio Units form the base layer, some low-level framework has been built on top of that, and the whole caboodle is termed Core Audio.
OpenAL is a multiplatform API -- the creators are trying to mirror the portability of OpenGL. A few companies are sponsoring OpenAL, including Creative Labs and Apple!
So Apple has provided this API, basically as a thin wrapper over Core Audio. I am guessing this is to allow developers to pull over existing code easily. Be warned, it is an incomplete implementation: if you want OpenAL to do something that Core Audio can do, it will do it; otherwise it won't.
Kind of counterintuitive -- just looking at the source, it looks as if OpenAL is lower level. Not so!
Core Audio covers a lot of things, such as reading and writing various file formats, converting between encodings, pulling frames out of streams, and so on. Much of this functionality is collected as the "Audio Toolbox". Core Audio also offers multiple APIs for processing streams of audio, for playback, capture, or both.
The lowest-level one is Audio Units, which works with uncompressed (PCM) audio and has some nice stuff for applying effects, mixing, etc. Audio Queues, implemented atop Audio Units, are a lot easier because they work with compressed formats (not just PCM) and save you from some threading challenges.
OpenAL is also implemented atop Audio Units; you still have to use PCM, but at least the threading isn't scary. The difference is that, since it's not from Apple, its programming conventions are totally different from Core Audio and the rest of iOS (most obviously, it's a push API: if you want to stream with OpenAL, you poll your sources to see if they've exhausted their buffers and push in new ones; by contrast, Audio Queues and Audio Units are pull-based, in that you get a callback when new samples are needed for playback).
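To illustrate the pull model, here is a bare-bones Audio Queue playback sketch in C (the buffer size and buffer count are arbitrary, the format is assumed to be filled in elsewhere, and the callback just enqueues silence where real samples would go):

    #include <string.h>
    #include <AudioToolbox/AudioToolbox.h>

    // The queue calls this back whenever it needs another buffer refilled ("pull").
    static void MyAQCallback(void *userData, AudioQueueRef queue, AudioQueueBufferRef buffer)
    {
        buffer->mAudioDataByteSize = buffer->mAudioDataBytesCapacity;
        memset(buffer->mAudioData, 0, buffer->mAudioDataByteSize);   // silence placeholder
        AudioQueueEnqueueBuffer(queue, buffer, 0, NULL);             // hand it back to the queue
    }

    void StartPlaybackQueue(AudioStreamBasicDescription fmt)
    {
        AudioQueueRef queue;
        AudioQueueNewOutput(&fmt, MyAQCallback, NULL, NULL, NULL, 0, &queue);

        // Prime a few buffers; after that the callback keeps them cycling.
        for (int i = 0; i < 3; ++i) {
            AudioQueueBufferRef buf;
            AudioQueueAllocateBuffer(queue, 16384, &buf);
            MyAQCallback(NULL, queue, buf);
        }
        AudioQueueStart(queue, NULL);
    }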
Higher level, as you've seen, is nice stuff like Media Player and AV Foundation. These are a lot easier if you're just playing a file, but probably aren't going to give you deep enough access if you want to do some kind of effects, signal processing, etc.
