AKsampler in combination with AKmicrophonetracker - audiokit

I have merged the AKsampler example and the AKmicrophone(tracker), adding the microphone in the”conductor” Class, zo MIDI io is running and it is working but....
The volume of the sampler is reduced to about half
I tried the mix>tracker>booster setup but got random frequencies from the tracker (high values with the correct ones spread around)
With the AKmicrophonetracker it is working but the "getNote" algorithm of the microphone analysis example is consistent about half a note wrong
I had to raise the notefrequencytable entrys with ca 145 cents toget the notes right
My guess is its a samplerate-buffer-hop problem: anyone any suggestion? I have tried several different AKsettings and AKsessionsettings, stopping and restarting audiokit and connecting and disconnecting the nodes.

Related

AudioUnit recording glitches every 30 seconds

I've used this sample code to create an audio recorder. http://www.stefanpopp.de/capture-iphone-microphone/
I'm finding I get glitches about every 30 seconds. They sound a bit like buffer glitches to me, although I might be wrong. I've tried contacting the author of the article but not having much success. I'm really struggling to follow some of this code. I think it's missing a circular buffer but I'm not sure how important that is here. I'm hoping someone can point me in the right direction to either:
Point me to some different example code or suggest what I need to add to this (high level suggestion is fine - I'm happy to research and do the work, I'm just not confident what the work is)
Suggest some better values to use for things like the buffer data size.
Tell me that there's nothing wrong with this code and my bug is almost certainly elsewhere.
Suggest a library I can use that should take care of it (Amazing Audio Engine 2 looks good for me but I'm a bit worried about the note saying it's retired. AudioKit looks great too but it's missing a peak power reading, which would be a shame to have to implement myself after having imported such a complex library)
Why aren't I using AVAudioSession? I need the user to be able to set mic level while recording and to be able to listen back at the same time. Previously I did this with AVAudioSession but on more recent devices isInputGainSettable returns NO. It also returns NO for many hardware mics plugged in via lightning cable, which we're seeing more and more now the headphone jack is gone.
Several problems.
Apple recommends that object methods not be called in the audio context (the callbacks). Your code has several. Use C functions instead.
Newer iOS devices likely use a hardware sample rate of 48000, not 44100. Resampling potentially causes buffers to change sizes.
The code seems to assume that the play callback buffer was the same size as the input callback buffer. This is not guaranteed. Thus the playback might end up with too few samples, causing periodic glitches.
In my experience (iPhone 6) sample rate from microphone can be 48000 when a headset is not plugged in, and change to 44100 when a headset is plugged in.
If your audiounit is expecting a samplerate of 44100 then glitches like these are to be expected. To verify, you could try if your problem remains when you plug in a headset.
A workaround for the glitch problem seems to be to use an AVAudioEngine. Connect its inputNode to its mainMixerNode using the inputFormat of the inputNode. Connect the mainMixerNode to your AudioUnit in your desired format. Connect your AudioUnit to outputNode of the AVAudioEngine.
Using this mixerNode between inputNode and audioUnit is essential in this workaround.

how to find an offset from two audio file ? one is noisy and one is clear

I have once scenario in which user capturing the concert scene with the realtime audio of the performer and at the same time device is downloading the live streaming from audio broadcaster device.later i replace the realtime noisy audio (captured while recording) with the one i have streamed and saved in my phone (good quality audio).right now i am setting the audio offset manually with trial and error basis while merging so i can sync the audio and video activity at exact position.
Now what i want to do is to automate the process of synchronisation of audio.instead of merging the video with clear audio at given offset i want to merge the video with clear audio automatically with proper sync.
for that i need to find the offset at which i should replace the noisy audio with clear audio.e.g. when user start the recording and stop the recording then i will take that sample of real time audio and compare with live streamed audio and take the exact part of that audio from that and sync at perfect time.
does any one have any idea how to find the offset by comparing two audio files and sync with the video.?
Here's a concise, clear answer.
• It's not easy - it will involve signal processing and math.
• A quick Google gives me this solution, code included.
• There is more info on the above technique here.
• I'd suggest gaining at least a basic understanding before you try and port this to iOS.
• I would suggest you use the Accelerate framework on iOS for fast Fourier transforms etc
• I don't agree with the other answer about doing it on a server - devices are plenty powerful these days. A user wouldn't mind a few seconds of processing for something seemingly magic to happen.
Edit
As an aside, I think it's worth taking a step back for a second. While
math and fancy signal processing like this can give great results, and
do some pretty magical stuff, there can be outlying cases where the
algorithm falls apart (hopefully not often).
What if, instead of getting complicated with signal processing,
there's another way? After some thought, there might be. If you meet
all the following conditions:
• You are in control of the server component (audio broadcaster
device)
• The broadcaster is aware of the 'real audio' recording
latency
• The broadcaster and receiver are communicating in a way
that allows accurate time synchronisation
...then the task of calculating audio offset becomes reasonably
trivial. You could use NTP or some other more accurate time
synchronisation method so that there is a global point of reference
for time. Then, it is as simple as calculating the difference between
audio stream time codes, where the time codes are based on the global
reference time.
This could prove to be a difficult problem, as even though the signals are of the same event, the presence of noise makes a comparison harder. You could consider running some post-processing to reduce the noise, but noise reduction in its self is an extensive non-trivial topic.
Another problem could be that the signal captured by the two devices could actually differ a lot, for example the good quality audio (i guess output from the live mix console?) will be fairly different than the live version (which is guess is coming out of on stage monitors/ FOH system captured by a phone mic?)
Perhaps the simplest possible approach to start would be to use cross correlation to do the time delay analysis.
A peak in the cross correlation function would suggest the relative time delay (in samples) between the two signals, so you can apply the shift accordingly.
I don't know a lot about the subject, but I think you are looking for "audio fingerprinting". Similar question here.
An alternative (and more error-prone) way is running both sounds through a speech to text library (or an API) and matching relevant part. This would be of course not very reliable. Sentences frequently repeat in songs and concert maybe instrumental.
Also, doing audio processing on a mobile device may not play well (because of low performance or high battery drain or both). I suggest you to use a server if you go that way.
Good luck.

Creating a rtsp client for live audio and video broadcasting in objective C

I am trying to create a RTSP client which live broadcast Audio and Video. I modified the iOS code at link http://www.gdcl.co.uk/downloads.htm and able to broadcast the Video to server properly. But now i am facing issues in broadcasting the audio part. In the link example the code is written in such a way that it writes the Video data to file and than reads the data from the file and upload the NALU's video packets to RTSP server.
For Audio part i am not sure how to proceed on it. Right now what i have tried is that get the audio buffer from mic and than broadcast it to the server directly by adding RTP headers and ALU.. but This approach is not properly working as Audio starts lagging behind and lag increases with time. Can someone let me know if there is some better approach to achieve this and with lip sycn audio/video.
Are you losing any packets on the client? If so, you need to leave "space." If you receive packet 1,2,3,4,6,7, You need to leave space for the missing packet (5).
The other possibility is a what is known as a clock drift problem. The clock (crystal) on your client and server are not perfectly in sync with each other.
This can be caused by environment, temperature changes, etc.
Let's say in a perfect world your server is producing audio samples 20ms audio samples at 48000 hz. Your client is playing them back using a sample rate of 48000 hz. Realistically your client and server are not exactly 48000hz. Your server might be 48000.001 and your client might be 47999.9998. So your server might be delivering faster than your client or vise versa. You would either consume packets too fast and under run the buffer or lag too far behind and overflow the client buffer. In your case, it sounds like the client is playing back too slow and slowly lagging behind the server. You might only lag a couple milliseconds per minute but the issue will keep continuing and it will look like a 1970s lip synced Kung Fu movie.
In other devices, there is often a common clock line to keep things in sync. For example, Video camera clocks, midi clocks. multitrack recorder clocks.
When you deliver data over IP, there is no common clock shared between a client and server. So your issue concerns syncing clocks between disparate devices with no. I have successfully solved this problem using this general approach:
A) Let the client count the rate of packets that come in over a period of time.
B) Let the client count the rate that the packets are consumed (played back).
C) Adjust the sample rate of the client based on A and B.
So your client requires that you adjust the sample rate of the playback. So yes you play it faster or slower. Note that the playback rate change will be very very subtle. You might set the sample rate to be 48000.0001 hz instead of 48000 hz. The difference in pitch would be undetectable by humans as it would only cause a fraction a cent difference in pitch. I gave an explanation of a very simplified approach. There many other nuances and edge cases that must be considered when developing such a control system. You don't just set it and forget it. You need a control system to manage the playback.
An interesting test to demonstrate this is to take two devices with the exact same file. A long recording (say 3 hours) is best. Start them at the same time. After 3 hours of playback, you will notice that one is ahead of the other.
This post explains that it is NOT a trivial task to stream audio and video.

Establishing synchronized music streaming across devices

I am attempting to stream audio files from a server to iOS devices and play them completely synchronized. For example on my phone I might be 20 secs into a song and then my friend next to me should also be 20 secs into the song as well. I know this is not an easy problem to solve, but I am attempting to do so.
I can currently get them within one second of each other by calculating the difference in time between the devices and then have them sync up, however that is not good enough because the human ear can detect a major difference in a second and this is over WIFI.
My next approach is going to be to unicast the one file from the server and then have the all devices pick it up directly from the server and then implement some type of buffer system similar to netflix so that network connectivity would be a limiting factor. http://www.wowza.com/ is what I would use to help with that.
I know this can be done, because http://lysn.in/ is does it with their app and I want to be able to do something similar.
Any other recommendations after I try my unicast option?
Would implementing firebase help solve a lot of the heavy lifting problems?
(1) In answer to ONE of your questions (the final one):
Firebase is not "realtime" in "that sense" -- PubNub is probably (almost certainly) the fastest "realtime" messaging for and between apps/browser/etc.
But they don't mean real-time in the sense of real-time, say, as race game engineers mean it or indeed in your use-case.
So firebase is not relevant to you here and won't help.
(2) Regarding your second general question: "how to synchronise time on two or more devices, given that we have communications delays."
Now, this is a really well-travelled problem in computer science.
It would be pointless outlining it here, because it is fully explained here http://www.ntp.org/ntpfaq/NTP-s-algo.htm if you click on "How is time synchronised"?
So in fact, to get a good time base on both machines, you should use that! Have both machines really accurately set a time to NTP using the existing (perfected for decades) NTP synchronisation.
(So for example https://stackoverflow.com/a/6744978/294884 )
In fact are you doing this?
It's possible that doing that will solve all your problems; then just agree to start at a certain exact time.
Hope it helps!
I would recommend against using the data movement to synchronize the playback. This should be straightforward to do with a buffer and a periodic "sync" signal that is sent at a period of < 1/2 the buffer size. Worst case this should generate a small blip on devices that get ahead or behind relative to the sync signal.

iOS: Sample code for simultaneous record and playback

I'm designing a simple proof of concept for multitrack recorder.
Obvious starting point is to play from file A.caf to headphones while simultaneously recording microphone input into file B.caf
This question -- Record and play audio Simultaneously -- points out that there are three levels at which I can work:
AVFoundation API (AVAudioPlayer + AVAudioRecorder)
Audio Queue API
Audio Unit API (RemoteIO)
What is the best level to work at? Obviously the generic answer is to work at the highest level that gets the job done, which would be AVFoundation.
But I'm taking this job on from someone who gave up due to latency issues (he was getting a 0.3sec delay between the files), so maybe I need to work at a lower level to avoid these issues?
Furthermore, what source code is available to springboard from? I have been looking at SpeakHere sample ( http://developer.apple.com/library/ios/#samplecode/SpeakHere/Introduction/Intro.html ). if I can't find something simpler I will use this.
But can anyone suggest something simpler/else? I would rather not work with C++ code if I can avoid it.
Is anyone aware of some public code that uses AVFoundation to do this?
EDIT: AVFoundation example here: http://www.iphoneam.com/blog/index.php?title=using-the-iphone-to-record-audio-a-guide&more=1&c=1&tb=1&pb=1
EDIT(2): Much nicer looking one here: http://www.switchonthecode.com/tutorials/create-a-basic-iphone-audio-player-with-av-foundation-framework
EDIT(3): How do I record audio on iPhone with AVAudioRecorder?
To avoid latency issues, you will have to work at a lower level than AVFoundation alright. Check out this sample code from Apple - Auriotouch. It uses Remote I/O.
As suggested by Viraj, here is the answer.
Yes, you can achieve very good results using AVFoundation. Firstly you need to pay attention to the fact that for both the player and the recorder, activating them is a two step process.
First you prime it.
Then you play it.
So, prime everything. Then play everything.
This will get your latency down to about 70ms. I tested by recording a metronome tick, then playing it back through the speakers while holding the iPhone up to the speakers and simultaneously recording.
The second recording had a clear echo, which I found to be ~70ms. I could have analysed the signal in Audacity to get an exact offset.
So in order to line everything up I just performSelector:x withObject: y afterDelay: 70.0/1000.0
There may be hidden snags, for example the delay may differ from device to device. it may even differ depending on device activity. It is even possible the thread could get interrupted/rescheduled in between starting the player and starting the recorder.
But it works, and is a lot tidier than messing around with audio queues / units.
I had this problem and I solved it in my project simply by changing the PreferredHardwareIOBufferDuration parameter of the AudioSession. I think I have just 6ms latency now, that is good enough for my app.
Check this answer that has a good explanation.

Resources