AudioConverterRef with different number of channels - ios

I have an audio format with 4 channels that I want to convert into 2 channels format. kAudioConverterChannelMap can only discard the extra inputs:
When In > Out, the first Out inputs are routed to the first Out outputs, and the remaining inputs are discarded.
Is it possible to specify channel map with kAudioConverterChannelMap to merge 1 + 2 channel into one and 3 + 4 channels into the second of the output format?
If not, what should I use for this conversion?

AFAIK Apple does not provide an API that reduces the channel count of an audio stream, apart from discarding extra channels.
Why? I guess because how you do this is such a personal choice that if they did think about it at all they may have come to the conclusion that there was no one approach would make everyone happy and so best instead to choose the approach that would make everyone angry: discard.
So what's the big deal? I think it's mainly to do with the "meaning" of the individual channels. Consider the "obviously simple" case of converting stereo to mono - it's surprisingly nuanced, depending on how the stereo audio was recorded. And that's in an ideal case where your 2-channel audio is actually declared to be "stereo" (e.g. via an AudioChannelLayoutTag like kAudioChannelLayoutTag_Stereo).
So even "tagged" multi-channel audio is terribly ambiguous, so the only person who knows how to interpret and reduce the channel count of your 4 channel audio is you.
That said, would it have killed them to provide a function that summed even channels with even and odd with odd using some Accelerate/vDSP array functions, like vDSP_vadd? Because that's probably what you want.

Related

Splitting a futures::Stream into multiple streams based on a property of the stream item

I have a Stream of items (u32, Bytes) where the integer is an index in the range 0..n I would like to split this stream into n streams, basically filtering by the integer.
I considered several possibilities, including
creating n streams each of which peeks at the underlying stream to determine if the next item is for it
pushing the items to one of n sinks when they arrive, and then use the other side of the sink as a stream again. (This seems to be related to
Forwarding from a futures::Stream to a futures::Sink.).
I feel that neither of these possibilities is convincing. The first one seems to create unnecessary overhead and the second one is just not elegant (if it even works, I am not sure).
What's a good way of splitting the stream?
At one point I had a similar requirement and wrote a group_by operator for Stream.
I haven't yet published this to crates.io as I didn't really feel it was ready for consumption but feel free to take a look at the code at https://github.com/Lukazoid/lz_stream_tools or attempt to use it for yourself.
Add the following to your cargo.toml:
[dependencies]
lz_stream_tools = { git = "https://github.com/Lukazoid/lz_stream_tools" }
And extern crate lz_stream_tools; to your bin.rs/lib.rs.
Then from your code you may use it like so:
use lz_stream_tools::StreamTools;
let groups = some_stream.group_by(|x| x.0);
groups will now be a Stream of (u32, Stream<Item=Bytes)).
You could use channels to represent the index-specific streams. You'd have to spawn one Task that pulls from the original stream and has a map of Senders.

FSK demodulation with GNU Radio

I'm trying to demodulate a signal using GNU Radio Companion. The signal is FSK (Frequency-shift keying), with mark and space frequencies at 1200 and 2200 Hz, respectively.
The data in the signal text data generated by a device called GeoStamp Audio. The device generates audio of GPS data fed into it in real time, and it can also decode that audio. I have the decoded text version of the audio for reference.
I have set up a flow graph in GNU Radio (see below), and it runs without error, but with all the variations I've tried, I still can't get the data.
The output of the flow graph should be binary (1s and 0s) that I can later convert to normal text, right?
Is it correct to feed in a wav audio file the way I am?
How can I recover the data from the demodulated signal -- am I missing something in my flow graph?
This is a FFT plot of the wav audio file before demodulation:
This is the result of the scope sink after demodulation (maybe looks promising?):
UPDATE (August 2, 2016): I'm still working on this problem (occasionally), and unfortunately still cannot retrieve the data. The result is a promising-looking string of 1's and 0's, but nothing intelligible.
If anyone has suggestions for figuring out the settings on the Polyphase Clock Sync or Clock Recovery MM blocks, or the gain on the Quad Demod block, I would greatly appreciate it.
Here is one version of an updated flow graph based on Marcus's answer (also trying other versions with polyphase clock recovery):
However, I'm still unable to recover data that makes any sense. The result is a long string of 1's and 0's, but not the right ones. I've tried tweaking nearly all the settings in all the blocks. I thought maybe the clock recovery was off, but I've tried a wide range of values with no improvement.
So, at first sight, my approach here would look something like:
What happens here is that we take the input, shift it in frequency domain so that mark and space are at +-500 Hz, and then use quadrature demod.
"Logically", we can then just make a "sign decision". I'll share the configuration of the Xlating FIR here:
Notice that the signal is first shifted so that the center frequency (middle between 2200 and 1200 Hz) ends up at 0Hz, and then filtered by a low pass (gain = 1.0, Stopband starts at 1 kHz, Passband ends at 1 kHz - 400 Hz = 600 Hz). At this point, the actual bandwidth that's still present in the signal is much lower than the sample rate, so you might also just downsample without losses (set decimation to something higher, e.g. 16), but for the sake of analysis, we won't do that.
The time sink should now show better values. Have a look at the edges; they are probably not extremely steep. For clock sync I'd hence recommend to just go and try the polyphase clock recovery instead of Müller & Mueller; chosing about any "somewhat round" pulse shape could work.
For fun and giggles, I clicked together a quick demo demod (GRC here):
which shows:

Objective-C: Cross correlation of two audio files

I want to perform a cross-correlation of two audio files (which are actually NSData objects). I found a vDSP_convD function in accelerate framework. NSData has a property bytes which returns a pointer to an array of voids - that is the parameter of the filter and signal vector.
I struggled with other parameters. What is the length of these vectors or the length of the result vectors?
I guess:
it's the sum of the filter and signal vector.
Could anyone give me an example of using the vDSP_convD function?
Apple reference to the function is here
Thanks
After reading a book - Learning Core Audio, I have made a demo which demonstrates delay between two audio files. I used new iOS 8 API to get samples from the audio files and a good performance optimization.
Github Project.
a call would look like this:
vDSP_conv ( signal *, signalStride, filter *, filterStride, result*, resultStride, resultLenght, filterLength );
where we have:
signal*: a pointer to the first element of your signal array
signalStride: the lets call it "step size" throug your signal array. 1 is every element, 2 is every second ...
same for filter and result array
length for result and filter array
How long do the arrays have to be?:
As stated in the docs you linked our signal array has to be lenResult + lenFilter - 1 which it is where it gets a little messy. You can find a demonstration of this by Apple here or a shorter answer by SO user Rasman here.
You have to do the zero padding of the signal array by yourself so the vector functions can apply the sliding window without preparation.
Note: You might consider using the Fast-Fourier-Transformation for this, because when you work with audio files i assume, that you have quite some data and there is a significant performance increase from a certain point onwards when using:
FFT -> complex multiplication in frequency domain (which results in a correlation in time domain) -> reverse FFT
here you can find a useful piece of code for this!

FMOD FMOD_DSP_READCALLBACK - specifying channels

I would like to create a DSP plugin which takes an input of 8 channels (the 7.1 speaker mode), does some processing then returns the data to 2 output channels. My plan was to use setspeakermode to FMOD_SPEAKERMODE_7POINT1 and FMOD_DSP_DESCRIPTION.channels to 2 but that didnt work, both in and out channels were showing as 2 in my FMOD_DSP_READCALLBACK function.
How can I do this?
You cannot perform a true downmix in FMODEx using the DSP plugin interface. The best you can do is process the incoming 8ch data, then fill just the front left and front right parts of the output buffer leaving the rest silent.
Setting the channel count to 2 tells FMOD your DSP can only handle stereo signal, setting the count to 0 means any channel count.

iOS: Sound generation on iPad given Hz parameter?

Is there an API in one of the iOS layers that I can use to generate a tone by just specifying its Hertz. What I´m looking to do is generate a DTMF tone. This link explains how DTMF tones consists of 2 tones:
http://en.wikipedia.org/wiki/Telephone_keypad
Which basically means that I should need playback of 2 tones at the same time...
So, does something like this exist:
SomeCleverPlayerAPI(697, 1336);
If spent the whole morning searching for this, and have found a number of ways to playback a sound file, but nothing on how to generate a specific tone. Does anyone know, please...
Check out the AU (AudioUnit) API. It's pretty low-level, but it can do what you want. A good intro (that probably already gives you what you need) can be found here:
http://cocoawithlove.com/2010/10/ios-tone-generator-introduction-to.html
There is no iOS API to do this audio synthesis for you.
But you can use the Audio Queue or Audio Unit RemoteIO APIs to play raw audio samples, generate an array of samples of 2 sine waves summed (say 44100 samples for 1 seconds worth), and then copy the results in the audio callback (1024 samples, or whatever the callback requests, at a time).
See Apple's aurioTouch and SpeakHere sample apps for how to use these audio APIs.
The samples can be generated by something as simple as:
sample[i] = (short int)(v1*sinf(2.0*pi*i*f1/sr) + v2*sinf(2.0*pi*i*f2/sr));
where sr is the sample rate, f1 and f1 are the 2 frequencies, and v1 + v2 sum to less than 32767.0. You can add rounding or noise dithering to this for cleaner results.
Beware of clicking if your generated waveforms don't taper to zero at the ends.

Resources