Realtime audio input and output streaming in iOS

I am a newbie to multimedia work. I want to capture audio sample by sample and transfer it to another iOS device over the network. How should I start? I have gone through Apple's multimedia guide and the SpeakHere example, but it is full of C++ code, and it writes to a file before starting its services, whereas I need to work with buffers directly. Please help me start my work in the correct way.
Thanks in advance

I just spent a bunch of time working on real-time audio. You can use AudioQueue, but it has latency issues of around 100-200 ms.
If you want to do something like the T-Pain app, you have to use one of:
RemoteIO API
Audio Unit API
They are about equally difficult to implement, so I would just pick the RemoteIO path.
Source can be found here:
http://atastypixel.com/blog/using-remoteio-audio-unit/
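To give a flavour of what that path involves, here is a rough Swift sketch (mine, not from the linked tutorial) of obtaining the RemoteIO unit and enabling its input bus. The stream formats and render/input callbacks that the tutorial walks through are omitted, and error codes are not checked:

    import AudioToolbox

    var desc = AudioComponentDescription(
        componentType: kAudioUnitType_Output,
        componentSubType: kAudioUnitSubType_RemoteIO,
        componentManufacturer: kAudioUnitManufacturer_Apple,
        componentFlags: 0,
        componentFlagsMask: 0)

    guard let component = AudioComponentFindNext(nil, &desc) else {
        fatalError("RemoteIO unit not found")
    }

    var remoteIO: AudioUnit?
    AudioComponentInstanceNew(component, &remoteIO)

    // Bus 1 is the microphone side; input is disabled by default, so enable it.
    var enableInput: UInt32 = 1
    AudioUnitSetProperty(remoteIO!,
                         kAudioOutputUnitProperty_EnableIO,
                         kAudioUnitScope_Input,
                         1,                                  // input element (bus 1)
                         &enableInput,
                         UInt32(MemoryLayout<UInt32>.size))

    // ...set stream formats and install an input/render callback here, then:
    AudioUnitInitialize(remoteIO!)
    AudioOutputUnitStart(remoteIO!)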

I have upvoted the answer above, but I wanted to add a piece of information that took me a while to figure out. When using AudioQueue for recording, the intuitive assumption is that the callback fires at regular intervals corresponding to the requested number of samples. That assumption is incorrect: AudioQueue seems to gather samples for a long period of time and then deliver them in very fast iterations of the callback.
In my case, I was requesting 20 ms of samples and receiving 320 samples per callback. When printing out the timestamps of the calls, I noticed a pattern of one call every 2 ms, then after a while a gap of ~180 ms. Since I was doing VoIP, this presented itself as an increasing delay on the receiving end. Switching to Remote I/O seems to have solved the issue.
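If you want to observe this pattern yourself, a tiny timing helper like the following (my own sketch, not part of the AudioQueue API) can be called at the top of the input callback to print the interval since the previous callback:

    import QuartzCore   // for CACurrentMediaTime()

    // Hypothetical helper: call it first thing in your AudioQueue input callback
    // to log how much time has passed since the previous callback.
    private var lastCallbackTime: CFTimeInterval = 0

    func logCallbackInterval() {
        let now = CACurrentMediaTime()
        if lastCallbackTime > 0 {
            print(String(format: "callback interval: %.1f ms", (now - lastCallbackTime) * 1000))
        }
        lastCallbackTime = now
    }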

Related

How many sounds can be played at a time on iOS - AVAudioPlayer vs. AVAudioEngine & AVAudioPlayerNode

I have an application in which there is a set of about 50 sounds, which range in length from about 300 ms to about 4 seconds. Various combinations of sounds need to be played at precise times (up to 10 of them can be triggered at once). Some sounds need to be repeated at intervals as short as 100 ms.
I've implemented this as a two-dimensional array of AVAudioPlayers, all of which are loaded with sounds at application launch. There are several players for each sound, to accommodate rapidly repeating sounds. The players for a particular sound are reused in strict rotation. When a new sound is scheduled, the oldest player for that sound is stopped and its current time is set to 0, so the sound will repeat from the start the next time it's scheduled using player.play(atTime:). There's a thread that schedules new sets of sounds about 300 ms before they are to be played.
It all works quite nicely, up to a point that varies with the device. Eventually, as sounds are played more rapidly, and/or more simultaneous sounds are scheduled, some sounds will refuse to play.
I'm contemplating switching to AVAudioEngine and AVAudioPlayerNodes, using a mixer node. Does anyone know if that approach is likely to handle more simultaneous sounds? My guess is that both approaches translate into a rather similar set of CoreAudio functions, but I haven't actually written the code to test that hypothesis - before I do that, I'm hoping that someone else may have explored this issue before me. I've been deep into CoreAudio before, and I'm hoping to be able to use these handy high-level functions instead!
Also, does anyone know of a way to trigger a closure when a sound starts playing? The documented functionality only allows for a completion closure, and the only way I've been able to trigger events when the sounds start is to schedule them on a high-quality-of-service DispatchQueue. Unfortunately, depending on the system load, queued events may be executed at times that vary from the scheduled times by up to about 50 ms, which is not quite as precise as I'd prefer.
Using AVAudioEngine with AVAudioPlayerNodes provides much better performance, albeit at the cost of a bit of code complexity. I was able to easily increase the playback rate by a factor of five, with better buffer control.
The main drawback in switching to this approach was that Apple's documentation is less than stellar. A few additions to Apple's documentation would have made this task a LOT easier:
Mixer nodes are documented as being able to convert sample rates and channel counts, so I attempted to configure audioEngine.mainMixerNode to convert mono buffers to the output node's settings. Setting the main mixer node's output to the output node's format appeared to be accepted, but threw opaque errors at run time that complained about channel count mismatches.
It appears that the main mixer node is not actually a fully functional mixer node. To get this to work, I had to insert another mixer node that performed the channel conversion, and connect it to the main mixer node. If Apple's documentation had actually mentioned this, it would have saved me a lot of experimentation.
Also, just scheduling a buffer does not cause anything to play. You need to call play() on the player node before anything will happen. Apple's documentation is confusing here - it says that calling play() with no arguments will cause playback to occur immediately, which wasn't what I wanted. It took some experimentation to determine that play() just tells the player node to wake up, and that scheduled buffers will actually be played at the scheduled time, rather than immediately.
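Putting those two points together, here is a minimal sketch of my own (with an illustrative 44.1 kHz mono format and a silent buffer standing in for real audio) showing the extra mixer node and the play-then-schedule behaviour:

    import AVFoundation

    let engine = AVAudioEngine()
    let player = AVAudioPlayerNode()
    let channelConverter = AVAudioMixerNode()   // extra mixer that does the mono-to-output conversion

    engine.attach(player)
    engine.attach(channelConverter)

    // Illustrative mono source format; a real app would use the format of its buffers.
    let monoFormat = AVAudioFormat(standardFormatWithSampleRate: 44_100, channels: 1)!
    engine.connect(player, to: channelConverter, format: monoFormat)
    engine.connect(channelConverter, to: engine.mainMixerNode,
                   format: engine.mainMixerNode.outputFormat(forBus: 0))

    do {
        try engine.start()
    } catch {
        print("engine failed to start: \(error)")
    }

    // play() just "wakes up" the node; scheduled buffers still start at their scheduled times.
    player.play()

    // A silent one-second buffer stands in for real audio here.
    let buffer = AVAudioPCMBuffer(pcmFormat: monoFormat, frameCapacity: 44_100)!
    buffer.frameLength = 44_100

    // Schedule it to start roughly half a second into the player's timeline.
    let startTime = AVAudioTime(sampleTime: AVAudioFramePosition(monoFormat.sampleRate * 0.5),
                                atRate: monoFormat.sampleRate)
    player.scheduleBuffer(buffer, at: startTime, options: [], completionHandler: nil)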
It would have been enormously helpful if Apple had provided more than the auto-generated class documentation. A bit of human-generated documentation would have saved me an awful lot of frustrating experimentation.
Chris Adamson's well-written "Learning Core Audio" was very helpful when I was working with Core Audio - it's a shame that the newer AVAudioEngine functionality isn't documented nearly as well.

Swift - How to remove delay when recording audio using AVFoundation

I'm working on an app that records audio and streams it to another user; it's basically a VoIP call. The problem I'm running into is that the audio I'm streaming to the peer is delayed by about 0.5 seconds. This is quite noticeable, and a little annoying when you both try to talk at the same time.
I'm wondering if this is common among AVFoundation's AVAudioEngine, or if possibly it's something to do with the way I set it up.
I can include source code if this is NOT a known problem with AVAudioEngine, otherwise can you please suggest the best route to record audio with the least delay?
I would also prefer something that is fairly high-level and compatible with Swift 3/3.1. However, if there is no solution that meets these needs, then please recommend whatever tool you think is the best fit.
Thank you!
Ensure that you call AVAudioEngine.inputNode's installTap(onBus:bufferSize:format:block:) with the minimum supported bufferSize of 100 ms, i.e. (sampleRate * 0.1) samples.
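A minimal sketch of such a tap, using the default input format and the 0.1 s figure from the answer above (what you do with each buffer, e.g. encode it and send it to the peer, is up to you):

    import AVFoundation

    let engine = AVAudioEngine()
    let input = engine.inputNode
    let format = input.outputFormat(forBus: 0)

    // Roughly 100 ms worth of frames per tap callback, per the answer above.
    let bufferSize = AVAudioFrameCount(format.sampleRate * 0.1)

    input.installTap(onBus: 0, bufferSize: bufferSize, format: format) { buffer, time in
        // Hand `buffer` (an AVAudioPCMBuffer) to your encoder / network layer here.
        print("received \(buffer.frameLength) frames")
    }

    do {
        try engine.start()
    } catch {
        print("engine failed to start: \(error)")
    }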

How to find the offset between two audio files? One is noisy and one is clear

I have a scenario in which the user captures a concert scene, recording the real-time audio of the performer, while at the same time the device downloads a live stream from an audio broadcaster device. Later I replace the noisy real-time audio (captured while recording) with the good-quality audio I streamed and saved on the phone. Right now I set the audio offset manually, by trial and error, while merging, so I can sync the audio and video activity at the exact position.
Now what I want to do is automate this synchronisation. Instead of merging the video with the clean audio at a manually chosen offset, I want to merge them automatically with proper sync.
For that I need to find the offset at which I should replace the noisy audio with the clean audio. E.g. when the user starts and stops the recording, I will take that sample of real-time audio, compare it with the live-streamed audio, extract the matching part, and sync it at the right time.
Does anyone have any idea how to find the offset by comparing the two audio files and syncing with the video?
Here's a concise, clear answer.
• It's not easy - it will involve signal processing and math.
• A quick Google gives me this solution, code included.
• There is more info on the above technique here.
• I'd suggest gaining at least a basic understanding before you try and port this to iOS.
• I would suggest you use the Accelerate framework on iOS for fast Fourier transforms, etc.
• I don't agree with the other answer about doing it on a server - devices are plenty powerful these days. A user wouldn't mind a few seconds of processing for something seemingly magic to happen.
Edit
As an aside, I think it's worth taking a step back for a second. While math and fancy signal processing like this can give great results, and do some pretty magical stuff, there can be outlying cases where the algorithm falls apart (hopefully not often).
What if, instead of getting complicated with signal processing, there's another way? After some thought, there might be. If you meet all the following conditions:
• You are in control of the server component (audio broadcaster device)
• The broadcaster is aware of the 'real audio' recording latency
• The broadcaster and receiver are communicating in a way that allows accurate time synchronisation
...then the task of calculating audio offset becomes reasonably trivial. You could use NTP or some other more accurate time synchronisation method so that there is a global point of reference for time. Then, it is as simple as calculating the difference between audio stream time codes, where the time codes are based on the global reference time.
This could prove to be a difficult problem, as even though the signals are of the same event, the presence of noise makes a comparison harder. You could consider running some post-processing to reduce the noise, but noise reduction in itself is an extensive, non-trivial topic.
Another problem could be that the signals captured by the two devices could actually differ a lot. For example, the good-quality audio (I guess the output from the live mixing console?) will be fairly different from the live version (which I guess is coming out of the on-stage monitors / FOH system, captured by a phone mic?).
Perhaps the simplest possible approach to start would be to use cross correlation to do the time delay analysis.
A peak in the cross correlation function would suggest the relative time delay (in samples) between the two signals, so you can apply the shift accordingly.
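As a starting point, here is an unnormalized, brute-force cross-correlation sketch of my own; a real implementation would use FFT-based correlation via the Accelerate framework's vDSP routines, as suggested above, and would likely normalize the scores:

    // Returns the lag (in samples) at which `clean` lines up best with `noisy`.
    func estimateOffset(noisy: [Float], clean: [Float], maxLag: Int) -> Int {
        var bestLag = 0
        var bestScore = -Float.infinity
        for lag in 0...maxLag {
            let count = min(clean.count, noisy.count - lag)
            guard count > 0 else { break }
            var score: Float = 0
            for i in 0..<count {
                score += clean[i] * noisy[i + lag]
            }
            if score > bestScore {
                bestScore = score
                bestLag = lag
            }
        }
        return bestLag
    }

    // Usage: offsetInSeconds = Double(estimateOffset(noisy: a, clean: b, maxLag: n)) / sampleRate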
I don't know a lot about the subject, but I think you are looking for "audio fingerprinting". Similar question here.
An alternative (and more error-prone) way is to run both recordings through a speech-to-text library (or an API) and match the relevant parts. This would of course not be very reliable: sentences frequently repeat in songs, and the concert may be instrumental.
Also, doing audio processing on a mobile device may not go well (because of low performance, high battery drain, or both). I suggest you use a server if you go that way.
Good luck.

Using Audio Units to play several short audio files with overlap

I have run through an audio units tutorial for a sine wave generator and done a bit of reading, and I understand basically how it works. What I would actually like to do in my app is play a short sound file in response to some external event. These sounds would be about 1-2 seconds in duration and occur at a rate of about 1-2 per second.
Basically where I am at right now is trying to figure out how to play an actual audio file using my audio unit, rather than generating a sine wave. So basically my question is, how do I get an audio unit to play an audio file?
Do I simply read bytes from the audio file into the buffer in the render callback?
(if so what class do I need to deal with to open / convert / decompress / read the audio file)
or is there some simpler method where I could maybe just hand off the entire buffer and tell it to play?
Any names of specific classes or APIs I will need to look at to accomplish this would be very helpful.
OK, check this:
http://developer.apple.com/library/ios/samplecode/MixerHost/Introduction/Intro.html
EDIT: That is a sample project. This page has detailed instructions with inline code to set up common configurations: http://developer.apple.com/library/ios/ipad/#DOCUMENTATION/MusicAudio/Conceptual/AudioUnitHostingGuide_iOS/ConstructingAudioUnitApps/ConstructingAudioUnitApps.html#//apple_ref/doc/uid/TP40009492-CH16-SW1
If you don't mind being tied to iOS 5+, you should look into AUFilePlayer. It is much easier than using the callbacks, and you don't have to worry about setting up your own ring buffer (something you would need to do if you want to avoid loading all of your audio data into memory at start-up).
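If you do decide to load the audio data into memory yourself (for example to feed a render callback), here is a short sketch of one way to do it with AVFoundation's AVAudioFile. This is not from the answers above, and it assumes a bundled file named tick.caf:

    import AVFoundation

    // Loads a short sound entirely into memory so its samples can be handed to a
    // player or mixed in a render callback.
    func loadPCM(named name: String, withExtension ext: String) throws -> AVAudioPCMBuffer? {
        guard let url = Bundle.main.url(forResource: name, withExtension: ext) else { return nil }
        let file = try AVAudioFile(forReading: url)
        guard let buffer = AVAudioPCMBuffer(pcmFormat: file.processingFormat,
                                            frameCapacity: AVAudioFrameCount(file.length)) else { return nil }
        try file.read(into: buffer)
        return buffer   // buffer.floatChannelData points at the decoded samples
    }

    let tick = try? loadPCM(named: "tick", withExtension: "caf")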

iOS: Sample code for simultaneous record and playback

I'm designing a simple proof of concept for multitrack recorder.
The obvious starting point is to play from file A.caf to headphones while simultaneously recording microphone input into file B.caf.
This question -- Record and play audio Simultaneously -- points out that there are three levels at which I can work:
AVFoundation API (AVAudioPlayer + AVAudioRecorder)
Audio Queue API
Audio Unit API (RemoteIO)
What is the best level to work at? Obviously the generic answer is to work at the highest level that gets the job done, which would be AVFoundation.
But I'm taking this job over from someone who gave up due to latency issues (he was getting a 0.3 s delay between the files), so maybe I need to work at a lower level to avoid these issues?
Furthermore, what source code is available to springboard from? I have been looking at the SpeakHere sample ( http://developer.apple.com/library/ios/#samplecode/SpeakHere/Introduction/Intro.html ). If I can't find something simpler, I will use this.
But can anyone suggest something simpler/else? I would rather not work with C++ code if I can avoid it.
Is anyone aware of some public code that uses AVFoundation to do this?
EDIT: AVFoundation example here: http://www.iphoneam.com/blog/index.php?title=using-the-iphone-to-record-audio-a-guide&more=1&c=1&tb=1&pb=1
EDIT(2): Much nicer looking one here: http://www.switchonthecode.com/tutorials/create-a-basic-iphone-audio-player-with-av-foundation-framework
EDIT(3): How do I record audio on iPhone with AVAudioRecorder?
To avoid latency issues, you will have to work at a lower level than AVFoundation, alright. Check out this sample code from Apple: aurioTouch. It uses Remote I/O.
As suggested by Viraj, here is the answer.
Yes, you can achieve very good results using AVFoundation. Firstly, you need to pay attention to the fact that for both the player and the recorder, activating them is a two-step process.
First you prime it.
Then you play it.
So, prime everything. Then play everything.
This will get your latency down to about 70ms. I tested by recording a metronome tick, then playing it back through the speakers while holding the iPhone up to the speakers and simultaneously recording.
The second recording had a clear echo, which I found to be ~70ms. I could have analysed the signal in Audacity to get an exact offset.
So in order to line everything up, I just call performSelector:withObject:afterDelay: with a delay of 70.0/1000.0.
There may be hidden snags; for example, the delay may differ from device to device. It may even differ depending on device activity. It is even possible the thread could get interrupted/rescheduled between starting the player and starting the recorder.
But it works, and is a lot tidier than messing around with audio queues / units.
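For what it's worth, a minimal sketch of that prime-then-start pattern; the file URLs and recorder settings here are purely illustrative:

    import AVFoundation
    import AudioToolbox   // for kAudioFormatLinearPCM

    let playURL = URL(fileURLWithPath: NSTemporaryDirectory()).appendingPathComponent("A.caf")
    let recordURL = URL(fileURLWithPath: NSTemporaryDirectory()).appendingPathComponent("B.caf")

    do {
        let player = try AVAudioPlayer(contentsOf: playURL)
        let recorder = try AVAudioRecorder(url: recordURL, settings: [
            AVFormatIDKey: kAudioFormatLinearPCM,
            AVSampleRateKey: 44_100,
            AVNumberOfChannelsKey: 1
        ])

        // Prime both first: buffers are filled and the hardware is prepared.
        player.prepareToPlay()
        recorder.prepareToRecord()

        // Then start them back to back, so the remaining latency stays small and fairly constant.
        recorder.record()
        player.play()
    } catch {
        print("setup failed: \(error)")
    }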
I had this problem and I solved it in my project simply by changing the PreferredHardwareIOBufferDuration parameter of the audio session. I think I have just 6 ms of latency now, which is good enough for my app.
Check this answer, which has a good explanation.
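For reference, here is a minimal sketch using the modern AVAudioSession equivalent of that property; the hardware may grant a slightly different value, so read back the actual duration:

    import AVFoundation

    // Request a small I/O buffer (about 6 ms) before activating the session.
    let session = AVAudioSession.sharedInstance()
    do {
        try session.setCategory(.playAndRecord, mode: .default, options: [.defaultToSpeaker])
        try session.setPreferredIOBufferDuration(0.006)
        try session.setActive(true)
        print("actual IO buffer duration: \(session.ioBufferDuration) s")
    } catch {
        print("audio session configuration failed: \(error)")
    }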
