Distorted sound after sample rate change - ios

This one keeps me awake:
I have an OS X audio application which has to react if the user changes the current sample rate of the device.
To do this I register a callback for both in- and output devices on ‘kAudioDevicePropertyNominalSampleRate’.
So if the sample rate of one of the devices changes, I get the callback and set the new sample rate on the devices with 'AudioObjectSetPropertyData', using 'kAudioDevicePropertyNominalSampleRate' as the selector.
The next steps were mentioned on the Apple mailing list, and I followed them:
1. Stop the input AudioUnit and the AUGraph, which consists of a mixer and the output AudioUnit.
2. Uninitialize them both.
3. Check the node count, step over the nodes, and use AUGraphDisconnectNodeInput to disconnect the mixer from the output.
4. Set the new sample rate on the output scope of the input unit.
5. Set the new sample rate on the input and output scopes of the mixer unit.
6. Reconnect the mixer node to the output unit.
7. Update the graph.
8. Initialize the input unit and the graph.
9. Start the input unit and the graph.
Render and Output callbacks start again but now the audio is distorted. I believe it's the input render callback which is responsible for the signal but I'm not sure.
What did I forget?
The sample rate doesn't affect the buffer size as far as I know.
If I start my application with the other sample rate everything is OK, it's the change that leads to the distorted signal.
I look at the stream format (kAudioUnitProperty_StreamFormat) before and after. Everything stays the same except the sample rate which of course changes to the new value.
As I said I think it's the input render callback which needs to be changed. Do I have to notify the callback that more samples are needed? I checked the callbacks and buffer sizes with 44k and 48k and nothing was different.
I wrote a small test application so if you want me to provide code, I can show you.
Edit: I recorded the distorted audio (a sine) and looked at it in Audacity.
What I found was that after every 495 samples the audio drops for another 17 samples.
I think you see where this is going: 495 samples + 17 samples = 512 samples. Which is the buffer size of my devices.
But I still don't know what I can do with this finding.
I checked my input and output render procs and their access to the ring buffer (I'm using the fixed version of CARingBuffer).
Both store and fetch 512 frames so nothing is missing here...
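The 495 + 17 pattern can also be checked programmatically instead of by eye in Audacity. The following is only a minimal sketch, assuming the dropouts show up as runs of (near-)zero samples in the recording; `find_dropouts`, `threshold` and `min_run` are hypothetical names chosen for illustration and are not part of Core Audio.

```python
def find_dropouts(samples, threshold=1e-4, min_run=4):
    """Return (start_index, length) for each run of near-silent samples.

    Runs shorter than min_run are ignored so that the ordinary zero
    crossings of a sine are not reported as dropouts.
    """
    runs = []
    start = None
    for i, s in enumerate(samples):
        if abs(s) <= threshold:
            if start is None:
                start = i          # a silent run begins here
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - start))
            start = None
    # A run that lasts until the end of the recording
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples) - start))
    return runs
```

If the distance between consecutive run starts equals the device buffer size (512 here), the gaps are aligned with the I/O buffers rather than with the signal itself.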

Got it!
After disconnecting the Graph it seems to be necessary to tell both devices the new sample rate.
I already did this before the callback but it seems this has to be done at a later time.

Related

FSK demodulation with GNU Radio

I'm trying to demodulate a signal using GNU Radio Companion. The signal is FSK (Frequency-shift keying), with mark and space frequencies at 1200 and 2200 Hz, respectively.
The data in the signal is text data generated by a device called GeoStamp Audio. The device generates audio from GPS data fed into it in real time, and it can also decode that audio. I have the decoded text version of the audio for reference.
I have set up a flow graph in GNU Radio (see below), and it runs without error, but with all the variations I've tried, I still can't get the data.
The output of the flow graph should be binary (1s and 0s) that I can later convert to normal text, right?
Is it correct to feed in a wav audio file the way I am?
How can I recover the data from the demodulated signal -- am I missing something in my flow graph?
This is an FFT plot of the wav audio file before demodulation:
This is the result of the scope sink after demodulation (maybe looks promising?):
UPDATE (August 2, 2016): I'm still working on this problem (occasionally), and unfortunately still cannot retrieve the data. The result is a promising-looking string of 1's and 0's, but nothing intelligible.
If anyone has suggestions for figuring out the settings on the Polyphase Clock Sync or Clock Recovery MM blocks, or the gain on the Quad Demod block, I would greatly appreciate it.
Here is one version of an updated flow graph based on Marcus's answer (also trying other versions with polyphase clock recovery):
However, I'm still unable to recover data that makes any sense. The result is a long string of 1's and 0's, but not the right ones. I've tried tweaking nearly all the settings in all the blocks. I thought maybe the clock recovery was off, but I've tried a wide range of values with no improvement.
So, at first sight, my approach here would look something like:
What happens here is that we take the input, shift it in the frequency domain so that mark and space are at ±500 Hz, and then use quadrature demod.
"Logically", we can then just make a "sign decision". I'll share the configuration of the Xlating FIR here:
Notice that the signal is first shifted so that the center frequency (middle between 2200 and 1200 Hz) ends up at 0Hz, and then filtered by a low pass (gain = 1.0, Stopband starts at 1 kHz, Passband ends at 1 kHz - 400 Hz = 600 Hz). At this point, the actual bandwidth that's still present in the signal is much lower than the sample rate, so you might also just downsample without losses (set decimation to something higher, e.g. 16), but for the sake of analysis, we won't do that.
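The chain described above (shift to baseband, low pass, quadrature demod, sign decision) can be tried outside GNU Radio with a few lines of plain Python. This is only a rough sketch under stated assumptions: the baud rate of 300, the 48 kHz sample rate and the moving-average filter are my own placeholders (the real flow graph uses an Xlating FIR with a proper low pass), symbol timing is assumed to be known, and all function names are hypothetical.

```python
import cmath
import math

def fsk_modulate(bits, fs, baud, f_mark, f_space):
    """Phase-continuous FSK test signal: 1 -> mark tone, 0 -> space tone."""
    sps = int(fs // baud)                      # samples per symbol
    phase, out = 0.0, []
    for bit in bits:
        step = 2 * math.pi * (f_mark if bit else f_space) / fs
        for _ in range(sps):
            out.append(math.cos(phase))
            phase += step
    return out

def fsk_demodulate(samples, fs, baud, f_mark, f_space, taps=48):
    sps = int(fs // baud)
    f_center = (f_mark + f_space) / 2.0
    # 1) Frequency shift: move the center between mark and space to 0 Hz.
    shifted = [s * cmath.exp(-2j * math.pi * f_center * n / fs)
               for n, s in enumerate(samples)]
    # 2) Crude low pass (moving average) to suppress the mirror image
    #    that the real-valued input leaves far away from 0 Hz.
    filtered, acc = [], 0j
    for n, x in enumerate(shifted):
        acc += x
        if n >= taps:
            acc -= shifted[n - taps]
        filtered.append(acc / taps)
    # 3) Quadrature demod: phase advance between consecutive samples,
    #    which is proportional to the instantaneous frequency.
    freq = [0.0] + [cmath.phase(filtered[n] * filtered[n - 1].conjugate())
                    for n in range(1, len(filtered))]
    # 4) Sign decision, integrated over each symbol period.
    bits = []
    for k in range(len(samples) // sps):
        total = sum(freq[k * sps:(k + 1) * sps])
        bits.append(1 if (total < 0) == (f_mark < f_space) else 0)
    return bits
```

With mark at 1200 Hz and space at 2200 Hz, mark ends up at -500 Hz after the shift, so a negative average frequency is read as a 1. On a real recording the symbol boundaries are not known in advance, which is exactly the job of the clock recovery block.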
The time sink should now show better values. Have a look at the edges; they are probably not extremely steep. For clock sync I'd hence recommend just trying the polyphase clock recovery instead of Mueller & Müller; choosing just about any "somewhat round" pulse shape could work.
For fun and giggles, I clicked together a quick demo demod (GRC here):
which shows:

Get peak volume of audio input on iOS

On iOS 7, how do I get the current microphone input volume in a range between 0 and 1?
I've seen several approaches like this one, but the results I get baffle me.
The return values of peakPowerForChannel: are documented to be in the range of -160 to 0 dB, with 0 being the loudest and -160 near absolute silence.
Problem: Given a quiet room and a short but loud noise, the power goes all the way up in an instant but takes a very long time to drop back to the quiet level (way longer than the actual noise...)
What I want: Essentially I want an exact copy of the Audio Input patch of Quartz Composer with its Volume Peak output. Any tips?
To get a similar volume peak measurement, you might have to input raw audio via the iOS Audio Queue API (or the RemoteIO Audio Unit), and analyze the raw PCM waveform samples in each audio callback, looking for the maximum magnitude over your desired frame width or analysis time.
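The two measurements discussed here can be sketched in a few lines. This is a language-neutral illustration in Python, not iOS code: `db_to_linear` assumes the metering value is in dB relative to full scale (0 dB = loudest) and maps it to a 0..1 amplitude, while `peak_level` assumes the callback delivers signed 16-bit PCM samples. Both function names are hypothetical.

```python
def db_to_linear(db):
    """Map a dB reading (0 = full scale, -160 = near silence) to 0..1."""
    return min(1.0, 10.0 ** (db / 20.0))

def peak_level(pcm16):
    """Largest sample magnitude in one buffer of signed 16-bit PCM, as 0..1."""
    if not pcm16:
        return 0.0
    return min(1.0, max(abs(s) for s in pcm16) / 32768.0)
```

Calling `peak_level` once per callback buffer gives a value that drops back as soon as the loud buffer has passed, because no decay or smoothing is applied between buffers.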

ios endless video recording

I'm trying to develop an iPhone app that will use the camera to record only the last few minutes/seconds.
For example, you record a movie for 5 minutes, click "save", and only the last 30s will be saved. I don't want to actually record five minutes and then chop off the last 30s (this won't work for me). This idea is called "loop recording".
This results in an endless video recording, of which only the last part is kept.
The Precorder app does what I want to do. (I want to use this feature in another context.)
I think this should be easily simulated with a Circular buffer.
I started a project with AVFoundation. It would be awesome if I could somehow redirect video data to a circular buffer (which I will implement). I found information only on how to write it to a file.
I know I can chop video into intervals and save them, but saving it and restarting camera to record another part will take time and it is possible to lose some important moments in the movie.
Any clues how to redirect data from camera would be appreciated.
Important! As of iOS 8 you can use VTCompressionSession and have direct access to the NAL units instead of having to dig through the container.
Well, luckily you can do this and I'll tell you how, but you're going to have to get your hands dirty with either the MP4 or MOV container. A helpful resource for this (though more MOV-specific) is Apple's QuickTime File Format Introduction manual:
http://developer.apple.com/library/mac/#documentation/QuickTime/QTFF/QTFFPreface/qtffPreface.html#//apple_ref/doc/uid/TP40000939-CH202-TPXREF101
First things first: you're not going to be able to start your saved movie from an arbitrary point 30 seconds before the end of the recording; you'll have to start at some I-Frame at approximately 30 seconds. Depending on what your keyframe interval is, it may be several seconds before or after that 30 second mark. You could use all I-frames and start from an arbitrary point, but then you'll probably want to re-encode the video afterward because it will be quite large.
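The keyframe constraint shapes how the circular buffer has to discard old data: it can only drop frames up to a keyframe, never in the middle of a group. A minimal sketch of that bookkeeping, independent of AVFoundation; `LoopRecorder`, its fields and the frame tuple layout are hypothetical, and timestamps are assumed to be in seconds.

```python
from collections import deque

class LoopRecorder:
    """Keeps at least the last keep_seconds of frames, starting on a keyframe."""

    def __init__(self, keep_seconds):
        self.keep_seconds = keep_seconds
        self.frames = deque()   # entries: (timestamp, is_keyframe, payload)

    def push(self, timestamp, is_keyframe, payload):
        self.frames.append((timestamp, is_keyframe, payload))
        cutoff = timestamp - self.keep_seconds
        # Find the newest keyframe that is still at or before the cutoff.
        start = None
        for i, (ts, key, _) in enumerate(self.frames):
            if ts > cutoff:
                break
            if key:
                start = i
        # Frames before that keyframe can never be part of the saved clip.
        for _ in range(start or 0):
            self.frames.popleft()

    def snapshot(self):
        """Frames to save: begins on a keyframe, ends with the newest frame."""
        return list(self.frames)
```

The buffer therefore holds between `keep_seconds` and `keep_seconds` plus one keyframe interval of video, which matches the "several seconds before or after" caveat above.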
So, knowing that, let's move on.
First step is when you set up your AVAssetWriter, you will want to set its AVAssetWriterInput's expectsMediaDataInRealTime property to YES.
In the captureOutput callback you'll be able to do an fread from the file you are writing to. The first fread will get you a little bit of MP4/MOV (whatever format you're using) header (i.e. 'ftyp' atom, 'wide' atom, and the beginning of the 'mdat' atom). You want what's inside the 'mdat' section. So the offset you'll start saving data from will be 36 or so.
Each read will get you 0 or more AVC NAL Units. You can find a listing of NAL unit types from ISO/IEC 14496-10 Table 7-1. They will be in a slightly different format than specified in Annex B, but it's fine. Additionally, there will only be IDR slices and non-IDR slices in the MP4/MOV file. IDR will be the I-Frame you're looking to hang onto.
The NAL unit format in the MP4/MOV container is as follows:
4 bytes - Size
[Size] bytes - NALU Data
data[0] & 0x1F - NALU Type
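A parser for that layout is short. The sketch below assumes the 4-byte size field is big-endian (as is usual in MP4/MOV) and that `data` holds bytes read from inside the 'mdat' section; `split_nal_units` is a hypothetical helper name. It also returns how many bytes were consumed, since a read may end in the middle of a unit.

```python
import struct

def split_nal_units(data):
    """Split length-prefixed NAL units.

    Returns (units, consumed) where units is a list of (nal_type, nal_bytes)
    and consumed is the number of bytes that formed complete units.
    """
    units = []
    offset = 0
    while offset + 4 <= len(data):
        (size,) = struct.unpack_from(">I", data, offset)   # 4 bytes - Size
        end = offset + 4 + size
        if size == 0 or end > len(data):
            break                       # incomplete unit: wait for more data
        nal = data[offset + 4:end]      # [Size] bytes - NALU Data
        units.append((nal[0] & 0x1F, nal))
        offset = end
    return units, offset
```

In H.264, NAL unit type 5 is an IDR slice and type 1 is a non-IDR slice, so the frames to hang onto are the ones where the first element of the tuple is 5.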
So now you have the data you're looking for. When you go to save this file, you'll have to update the MP4/MOV container with the correct length and sample count, update the 'stsz' atom with the correct sizes for each sample, and update the media headers and track headers with the correct duration of the movie, and so on. What I would probably recommend doing is creating a sample container on first run that you can more or less just overwrite/augment with the appropriate data for that particular movie. You'll want to do this because the encoders on the various iDevices don't all have the same settings and the 'avcC' atom contains encoder information.
You don't really need to know much about the AVC stream in this case, so you'll probably want to concentrate your experimenting around updating the container format you choose correctly. Good luck.

Why is DirectShow dragging in unnecessary intermediate filters when making multiple input connections to my DirectShow Transform filter?

I have a DirectShow Transform filter written in Delphi 6 using the DSPACK component library. It is a simple audio mixer that creates a new input pin whenever a new connection is attempted. I say simple because once its media format is set, all connections to its input pins or singular output pin are forced to conform to that media format. I build the filter chain manually, making all pin connections explicitly myself. I do not use any of the "intelligent rendering" calls, unless there is some way to trigger that unwanted behavior (in my case) accidentally.
NOTE: The Capture Filter is a standard DirectShow filter external to my application. My push source audio filter and simple audio mixer filters are being used as private, unregistered filters and are internal to my application.
I am having a weird problem that only occurs when I try to make multiple input connections to my mixer, which does indeed accept them. Currently, I am attempting to connect both a Capture Filter and my custom Push Source audio filter to my mixer filter. Whenever I try to do that the second upstream filter connection fails. Regardless of whether I connect the Capture Filter first or Push Source audio filter first, the second upstream filter connection always fails.
The first test I ran was to try connecting just the Capture Filter to the mixer. That worked fine.
The second test I ran was to try connecting just the Push Source audio filter to the mixer. That worked fine.
But as soon as I try to do both, I get a "no combination of intermediate filters could be found" error. I did several hours of deep digging into the media negotiation calls hitting my filter from the graph builder, and then I found the problem. For some reason, the filter graph is dragging the ancient "Indeo (R) Audio Software" codec into the chain.
I discovered this because, although that codec did have a media format that matched my filter in almost every regard (major type, sub type, format type, wave format parameters), it had an extra 2 bytes at the end of its pbFormat data member. That was enough to fail the equals test, since that test compares the cbFormat value of each media type when comparing the source and target pbFormat areas. The Indeo codec has a cbFormat value of 20 while my filter has a cbFormat value of 18, which is the size of a _tWAVEFORMATEX data structure. In a way it's a good thing the Indeo pbFormat has that weird size, because the first 18 bytes of its 20 byte area were exactly equal to the pbFormat area of my mixer filter's supported media type. Without that anomaly I never would have known that ancient codec was being dragged in. I'm surprised it's being dragged in at all since it has known exploits and vulnerabilities. What is most confusing is that this is happening on my mixer filter's output pin, not one of the input pins, and I have not made a single downstream connection yet when building up my pin connections.
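To make the size mismatch concrete, here is a small model of the comparison in Python. It is only an illustration and not DirectShow or DSPACK code: `pack_waveformatex` and `formats_equal` are hypothetical helpers, the PCM parameters are placeholders, and the two trailing bytes of the "Indeo-like" block are arbitrary because their real content is not known.

```python
import struct

def pack_waveformatex(channels, sample_rate, bits_per_sample, cb_size=0):
    """Pack the 18-byte WAVEFORMATEX layout (little-endian, PCM format tag)."""
    block_align = channels * bits_per_sample // 8
    return struct.pack("<HHIIHHH",
                       1,                          # wFormatTag (PCM)
                       channels,                   # nChannels
                       sample_rate,                # nSamplesPerSec
                       sample_rate * block_align,  # nAvgBytesPerSec
                       block_align,                # nBlockAlign
                       bits_per_sample,            # wBitsPerSample
                       cb_size)                    # cbSize

def formats_equal(a, b):
    """Model of an equality test that checks the size before the bytes."""
    return len(a) == len(b) and a == b
```

Two format blocks whose first 18 bytes are identical still compare as different when one of them is 20 bytes long, which is the situation described above.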
Can anyone tell me why DirectShow is trying to drag in that codec despite the fact that the media formats for both incoming filters, the Capture Filter and the Push Source filter, are identical and don't need any intermediate filters at all, since they match my mixer filter's input pins' supported format exactly? How can I fix this problem?
Also, I noticed that even in the single filter attachment tests above that succeeded, my mixer output pin was still getting queried for media formats. Why is that when as I said, at this point in building up my pin connections I have not connected anything to the output pin of my mixer filter?
--------------------------- UPDATE: 1 ----------------------------
I have learned that you can avoid the "intelligent connection" behavior entirely by using IFilterGraph.ConnectDirect() instead of IGraphBuilder.Connect(). I switched over to ConnectDirect() and it turns out that the input pin on my mixer filter is coming back as "already connected". That may be what is causing the graph builder to drag in the Indeo codec filter. Now that I have this new diagnostic information I will correct the problem and update this post with my results.
--------------------------- RESOLUTION ----------------------------
The root problem of all of this was my re-use of the input pin I obtained from the first destination/downstream filter I connected to my simple audio mixer filter, at the top of my application code. In other words my filter was working correctly, but I was not getting a fresh input pin with each upstream filter I tried to connect to it. Once I started doing that the connection process worked fine. I don't know why the code behind the IGraphBuilder.Connect() interface tried to bring in the Indeo codec filter, perhaps something to do with trying to connect to an already connected input pin, but it did. For my needs, I prefer the tight control that IFilterGraph.ConnectDirect() provides since it eliminates any interference from the intelligent connection code in IGraphBuilder, but I could see when video filters get involved it could become useful.

iOS: Sound generation on iPad given Hz parameter?

Is there an API in one of the iOS layers that I can use to generate a tone by just specifying its frequency in hertz? What I'm looking to do is generate a DTMF tone. This link explains how DTMF tones consist of 2 tones:
http://en.wikipedia.org/wiki/Telephone_keypad
Which basically means that I should need playback of 2 tones at the same time...
So, does something like this exist:
SomeCleverPlayerAPI(697, 1336);
I've spent the whole morning searching for this, and have found a number of ways to play back a sound file, but nothing on how to generate a specific tone. Does anyone know, please...
Check out the AU (AudioUnit) API. It's pretty low-level, but it can do what you want. A good intro (that probably already gives you what you need) can be found here:
http://cocoawithlove.com/2010/10/ios-tone-generator-introduction-to.html
There is no iOS API to do this audio synthesis for you.
But you can use the Audio Queue or Audio Unit RemoteIO APIs to play raw audio samples: generate an array of samples of 2 sine waves summed (say 44100 samples for 1 second's worth), and then copy the results in the audio callback (1024 samples, or whatever the callback requests, at a time).
See Apple's aurioTouch and SpeakHere sample apps for how to use these audio APIs.
The samples can be generated by something as simple as:
sample[i] = (short int)(v1*sinf(2.0*pi*i*f1/sr) + v2*sinf(2.0*pi*i*f2/sr));
where sr is the sample rate, f1 and f2 are the 2 frequencies, and v1 + v2 sum to less than 32767.0. You can add rounding or noise dithering to this for cleaner results.
Beware of clicking if your generated waveforms don't taper to zero at the ends.
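The sample formula and the tapering advice can be combined into one small generator. This is a sketch in Python rather than iOS code, meant only to show the arithmetic; `dtmf_samples`, the amplitude of 12000 per tone and the 5 ms linear fade are assumptions picked for illustration. The resulting list of 16-bit values is what would be copied into the audio callback's buffer.

```python
import math

def dtmf_samples(f1, f2, seconds=1.0, sample_rate=44100,
                 amplitude=12000.0, fade_seconds=0.005):
    """Sum of two sines as signed 16-bit sample values, faded at both ends."""
    total = int(seconds * sample_rate)
    fade = max(1, int(fade_seconds * sample_rate))
    samples = []
    for i in range(total):
        value = amplitude * (math.sin(2 * math.pi * i * f1 / sample_rate) +
                             math.sin(2 * math.pi * i * f2 / sample_rate))
        # Linear fade-in and fade-out so the waveform starts and ends at
        # zero, which avoids the click mentioned above.
        gain = min(1.0, i / fade, (total - 1 - i) / fade)
        samples.append(int(round(value * gain)))
    return samples
```

For example, `dtmf_samples(697, 1336)` produces one second of the tone pair from the question; the two amplitudes add up to 24000, which stays below the 32767 limit.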

Resources