What is the supported format for a compressed 4-channel audio file in iOS?

First of all I'm a noob in both iOS and audio programming, so bear with me if I don't use the correct technical terms, but I'll do my best!
What we want to do:
In an iOS app we are developing, we want to be able to play sounds through 4 different outputs to have a mini surround system. That is, we want the Left and Right channels to play through the headphones, while the Center and Center Surround channels play through audio hardware connected to the Lightning port. Since the audio files will be streamed/downloaded from a remote server, using raw (PCM) audio files is not an option.
The problem:
Apple has, since iOS 6, made it possible to play an audio file using a multiroute configuration... and that is great and exactly what we need... but whenever we try to play a 4-channel audio file, AAC-encoded and encapsulated in an M4A (or CAF) container, we get the following error:
ERROR: [0x19deee000] AVAudioFile.mm:86: AVAudioFileImpl: error 1718449215
(which is the status code for kAudioFileUnsupportedDataFormatError)
We get the same error when we use the same audio encoded as lossless (ALAC) instead, but we don't get it when playing the same audio before encoding (PCM format).
Nor do we get the error with a stereo or a 5.1 audio file encoded the same way as the 4-channel one, in both AAC and ALAC.
What we tried:
The encoding
The file was encoded using afconvert, Apple's audio conversion tool provided with Mac OS X, with this command:
afconvert -v -f 'm4af' -d "aac#44100" 4ch_master.caf 4ch_44100_AAC.m4a
and
afconvert -v -f 'caff' -d "alac#44100" 4ch_master.caf 4ch_44100_ALAC.caf
in the case of lossless encoding.
The audio format, as given by afinfo for the master (PCM) audio file:
File: 4ch_master.caf
File type ID: caff
Num Tracks: 1
----
Data format: 4 ch, 44100 Hz, 'lpcm' (0x0000000C) 16-bit little-endian signed integer
no channel layout.
estimated duration: 582.741338 sec
audio bytes: 205591144
audio packets: 25698893
bit rate: 2822400 bits per second
packet size upper bound: 8
maximum packet size: 8
audio data file offset: 4096
optimized
audio 25698893 valid frames + 0 priming + 0 remainder = 25698893
source bit depth: I16
The AAC-encoded format info:
File: 4ch_44100_AAC.m4a
File type ID: m4af
Num Tracks: 1
----
Data format: 4 ch, 44100 Hz, 'aac ' (0x00000000) 0 bits/channel, 0 bytes/packet, 1024 frames/packet, 0 bytes/frame
Channel layout: Quadraphonic
estimated duration: 582.741338 sec
audio bytes: 18338514
audio packets: 25099
bit rate: 251730 bits per second
packet size upper bound: 1039
maximum packet size: 1039
audio data file offset: 106496
optimized
audio 25698893 valid frames + 2112 priming + 371 remainder = 25701376
source bit depth: I16
format list:
[ 0] format: 4 ch, 44100 Hz, 'aac ' (0x00000000) 0 bits/channel, 0 bytes/packet, 1024 frames/packet, 0 bytes/frame
Channel layout: Quadraphonic
----
And for the lossless encoded audio file:
File: 4ch_44100_ALAC.caf
File type ID: caff
Num Tracks: 1
----
Data format: 4 ch, 44100 Hz, 'alac' (0x00000001) from 16-bit source, 4096 frames/packet
Channel layout: 4.0 (C L R Cs)
estimated duration: 582.741338 sec
audio bytes: 83333400
audio packets: 6275
bit rate: 1143862 bits per second
packet size upper bound: 16777
maximum packet size: 16777
audio data file offset: 20480
optimized
audio 25698893 valid frames + 0 priming + 3507 remainder = 25702400
source bit depth: I16
----
The code
In the code, we initially followed the implementation presented in session 505 of WWDC12 using the AVAudioPlayer API. At that level, multirouting didn't seem to work reliably. We didn't suspect that this might be related to the audio format, so we moved on to experimenting with the AVAudioEngine API presented in session 502 of WWDC14 and its associated sample code. We made multirouting work for the master 4-channel audio file (after some adaptations), but then we hit the error mentioned above when calling scheduleFile, as shown in the code snippet below. (Note: we are using Swift, and all the necessary audio graph setup is done but not shown here.)
var playerNode: AVAudioPlayerNode!
...
...
// Open the compressed file and schedule it on the already-attached player node
let audioFileToPlay = try AVAudioFile(forReading: URLOfTheAudioFile)
playerNode.scheduleFile(audioFileToPlay, at: nil, completionHandler: nil)
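The session side of that setup (not shown above) is essentially the standard multiroute configuration; this is a minimal sketch rather than our exact code, with error handling omitted:
import AVFoundation

// The multi-route category is what lets the engine address the headphones and
// the Lightning-connected hardware as separate outputs at the same time.
let session = AVAudioSession.sharedInstance()
try session.setCategory(.multiRoute, mode: .default, options: [])
try session.setActive(true)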
Does anyone have a hint about what could be wrong with the audio data format?

After contacting Apple Developer Technical Support, the answer was that this is not possible given the currently shipping system configurations:
"Thank you for contacting Apple Developer Technical Support (DTS). Our engineers have reviewed your request and have concluded that there is no supported way to achieve the desired functionality given the currently shipping system configurations."

Related

Why enabling Voice Processing on AVAudioInputNode changes channels count on its format?

I've noticed that enabling voice processing on AVAudioInputNode changes the node's format, most noticeably the channel count.
let inputNode = avEngine.inputNode
print("Format #1: \(inputNode.outputFormat(forBus: 0))")
// Format #1: <AVAudioFormat 0x600002bb4be0:  1 ch,  44100 Hz, Float32>
try! inputNode.setVoiceProcessingEnabled(true)
print("Format #2: \(inputNode.outputFormat(forBus: 0))")
// Format #2: <AVAudioFormat 0x600002b18f50:  3 ch,  44100 Hz, Float32, deinterleaved>
Is this expected? How can I interpret these channels?
My input device is an aggregate device where each channel comes from a different microphone. I then record each channel to a separate file.
But when voice processing messes with the channel layout, I can no longer rely on this.
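While the input format stays put, the per-channel recording is straightforward; a simplified sketch of that part (file names, buffer size and error handling are illustrative, not my exact code):
import AVFoundation

// Split the input tap's deinterleaved Float32 buffer into one mono file per channel.
let engine = AVAudioEngine()
let input = engine.inputNode
let inputFormat = input.outputFormat(forBus: 0)

let monoFormat = AVAudioFormat(standardFormatWithSampleRate: inputFormat.sampleRate, channels: 1)!
let files: [AVAudioFile] = try (0..<Int(inputFormat.channelCount)).map { ch in
    let url = FileManager.default.temporaryDirectory.appendingPathComponent("mic_\(ch).caf")
    return try AVAudioFile(forWriting: url, settings: monoFormat.settings)
}

input.installTap(onBus: 0, bufferSize: 4096, format: inputFormat) { buffer, _ in
    guard let src = buffer.floatChannelData else { return }
    for ch in 0..<Int(buffer.format.channelCount) {
        guard let mono = AVAudioPCMBuffer(pcmFormat: monoFormat, frameCapacity: buffer.frameLength) else { continue }
        mono.frameLength = buffer.frameLength
        // Copy one channel of the tap buffer into the mono buffer
        for frame in 0..<Int(buffer.frameLength) {
            mono.floatChannelData![0][frame] = src[ch][frame]
        }
        try? files[ch].write(from: mono)
    }
}

try engine.start()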

Nvidia codec SDK samples: can't decode an encoded file correctly

I'm trying out the sample applications in the Nvidia Video Codec SDK, and am having trouble getting a usable decoded result.
My input file is YUV 4:2:0, taken from here, which is 352x288px.
I'm encoding using the AppEncD3D12.exe sample, with the following command:
.\AppEncD3D12.exe -i D:\akiyo_cif.y4m -s 352x288 -o D:\akiyo_out.mp4
This gives the output
GPU in use: NVIDIA GeForce RTX 2080 Super with Max-Q Design
[INFO ][17:46:39] Encoding Parameters:
codec : h264
preset : p3
tuningInfo : hq
profile : (default)
chroma : yuv420
bitdepth : 8
rc : vbr
fps : 30/1
gop : 250
bf : 1
multipass : 0
size : 352x288
bitrate : 0
maxbitrate : 0
vbvbufsize : 0
vbvinit : 0
aq : disabled
temporalaq : disabled
lookahead : disabled
cq : 0
qmin : P,B,I=0,0,0
qmax : P,B,I=0,0,0
initqp : P,B,I=0,0,0
Total frames encoded: 112
Saved in file D:\akiyo_out.mp4
That looks promising. However, using the decode sample, a single frame of the output contains what looks like 12 smaller copies of the input frame, in monochrome.
I'm running the decode sample like this:
PS D:\Nvidia\Video_Codec_SDK_11.1.5\Samples\build\Debug> .\AppDecD3D.exe -i D:\akiyo_out.mp4
GPU in use: NVIDIA GeForce RTX 2080 Super with Max-Q Design
Display with D3D9.
[INFO ][17:58:58] Media format: raw H.264 video (h264)
Session Initialization Time: 23 ms
[INFO ][17:58:58] Video Input Information
Codec : AVC/H.264
Frame rate : 30000/1000 = 30 fps
Sequence : Progressive
Coded size : [352, 288]
Display area : [0, 0, 352, 288]
Chroma : YUV 420
Bit depth : 8
Video Decoding Params:
Num Surfaces : 7
Crop : [0, 0, 0, 0]
Resize : 352x288
Deinterlace : Weave
Total frame decoded: 112
Session Deinitialization Time: 8 ms
I'm quite new to this, so I could be doing something stupid. Right now I don't know whether to look at the encode or the decode side! Any ideas or tips are most appreciated.
I've tried other YUV files with the same result. I read that 4:2:2 is not supported; the file above is 4:2:0.
Using the AppEncCuda sample, the decoded video (played with AppDecD3D.exe) is the correct size and in colour, but the video appears to scroll to the right as it plays, with the colour information not scrolling at the same rate as the image.
You have 2 problems:
According to the code and remarks in the AppEncD3D12 sample, it expects the input frames to be in ARGB format, but your input file is YUV, so the sample reads data from the YUV file and treats it as ARGB. If you want AppEncD3D12 to work with this file, you need to either convert each YUV frame to ARGB or change the code to accept YUV input. The AppEncCuda sample expects YUV as input, and that is the reason it gives you better results. You can also see that AppEncD3D12 encoded a total of 112 frames while AppEncCuda encoded a total of 300 frames; this is because YUV frames are smaller than ARGB frames.
The 2nd problem is that both samples save the output as raw H.264. The file is not really an MP4 despite the name you gave it. There are a few players that can play a file of raw H.264 data, and you can try using one of them to play the output file. Another option is to use FFmpeg to create a valid MP4 file and pass the raw H.264 stream to it: the NVIDIA encoder encodes the video, but it does not handle the creation of video container files (there are too many container types: avi, mpg, mp4, mkv, ts, etc.), so you should use FFmpeg or another solution for that. The SDK samples contain a file FFmpegStreamer.h under the Utils folder that shows how to use FFmpeg to output H.264 video in an MPEG-2 transport stream to a file (*.ts) or over the network.
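For example, assuming the encoder output really is raw H.264 (as it is here, despite the .mp4 file name), a command along these lines should re-wrap it into a proper MP4 container without re-encoding (file names are just placeholders):
ffmpeg -framerate 30 -f h264 -i D:\akiyo_out.mp4 -c copy D:\akiyo_real.mp4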

Stereo mic passthrough with AVAudioEngine?

I'm trying to create a graph with AVAudioEngine that takes input from the device microphone, and supports stereo output.
Right now the graph looks like this:
[audioEngine.inputNode] --> [Mic input mixer node] --> [Summing mixer node] --> [audioEngine.mainMixerNode]
As the output of audioEngine.inputNode is 1 ch, 48000 Hz, Float32, I wanted to run it through a mixer node after the mic so I can a) send the signal to multiple audio units and b) convert to a stereo signal for more interesting routing, panning, etc.
However, the audio passthrough only works if I set the entire graph to match the input format of 1 ch, 48000 Hz, Float32, like this:
audioEngine.connect(microphoneNode!, to: microphoneInputMixer, format: inputSessionFormat!)
audioEngine.connect(microphoneInputMixer, to: summingMixer, format: inputSessionFormat!)
audioEngine.connect(summingMixer, to: mainMixerNode!, format: inputSessionFormat!)
I have two format variables: the mono inputSessionFormat and the stereo outputSettingFormat, which is 2 ch, 44100 Hz, Float32, non-interleaved.
As I understood from the docs, AVAudioMixerNode should handle format conversion, but in this case it refuses to play audio unless all connections are set to the same (mono) format. I've also tested this code with AVAudioConnectionPoint, to no avail.
Only AVAudioEngine's inputNode and outputNode have fixed formats. Use nil when connecting to/from these. Mixers will implicitly convert between their input and output formats. Connect to microphoneInputMixer using nil, and from microphoneInputMixer using the desired format. The mainMixerNode will accept any input format, and its output format cannot be set.
let desiredFormat = AVAudioFormat(standardFormatWithSampleRate: 44100, channels: 2)
audioEngine.connect(microphoneNode!, to: microphoneInputMixer, format: nil)
audioEngine.connect(microphoneInputMixer, to: summingMixer, format: desiredFormat)
audioEngine.connect(summingMixer, to: mainMixerNode!, format: desiredFormat)

Extract Mpeg TS from Wireshark

I need to extract an MPEG-TS stream from a Wireshark capture. I have managed to do this, but when I play it back using VLC the output is garbage: just a green window with some jitter on the top rows.
Here is how I did it:
Captured using ip.dest filter for the multicast stream.
Analyze -> Decode As -> UDP port (field), portnumber (value), MP2T (current)
Tools -> Dump MPEG TS Packets.
It does not play back correctly. Is there any other way of doing this?
When I need to dump a TS from a pcap file I do the following:
1. If the TS is in plain UDP (the protocol column shows MPEG TS for each packet), jump to step 3.
2. If the TS is packed in RTP, right click on any packet -> Decode As -> choose RTP under the "Current" field.
3. Use the MPEG dump tool: Tools -> Dump MPEG TS Packets.
I do not use MP2T packet decoding; it usually doesn't work.
If the TS is in plain UDP, it can happen that TS packets are shuffled, and the 4-bit continuity counter field in the TS packet header is not long enough to reorder them correctly. This can result in corrupted playback of the dumped TS.
I've added two filtering options to the original pcap2mpeg.
You can find it on: https://github.com/bugre/pcap2mpegts
So you can:
filter by udp destination port
filter by mcast group IP and destination port
for the cases where the captured file has multiple TS streams on the same IP but on different ports, or on different multicast IPs.
You would run it as:
pcap2mpegts.pl -y -i 239.100.0.1 -p 2000 -l multi_ts_capture.pcap -o single-stream-output.ts
Not using Wireshark, you can use pcap2mpeg.pl. I tested it and it works well if there is a single MPEG stream in the PCAP.
Here is the output of ffprobe on an MPEG file with 2 streams that was successfully extracted:
Input #0, mpegts, from 'test.mpeg':
Duration: 00:27:59.90, start: 4171.400000, bitrate: 8665 kb/s
Program 1
Metadata:
service_name : Service01
service_provider: FFmpeg
Stream #0:0[0x100]: Video: h264 (Main) ([27][0][0][0] / 0x001B), yuv420p(progressive), 4096x2176 [SAR 1:1 DAR 32:17], 10 fps, 10 tbr, 90k tbn, 20 tbc
Stream #0:1[0x1001]: Data: bin_data ([6][0][0][0] / 0x0006)

Detecting a low frequency tone in an audio file

I know this question has been asked a hundred times... But I am getting frustrated with my results, so I wanted to ask again. Before I dive deep into FFT, I need to figure this simple task out.
I need to detect a 20 Hz tone in an audio file. I insert the 20 Hz tone myself, like in the picture. (It can be any frequency as long as the listener can't hear it, so I thought I should choose a frequency around 20 Hz to 50 Hz.)
Info about the audio file:
afinfo 1.m4a
File: 1.m4a
File type ID: adts
Num Tracks: 1
----
Data format: 1 ch, 22050 Hz, 'aac ' (0x00000000) 0 bits/channel, 0 bytes/packet, 1024 frames/packet, 0 bytes/frame
Channel layout: Mono
estimated duration: 8.634043 sec
audio bytes: 42416
audio packets: 219
bit rate: 33364 bits per second
packet size upper bound: 768
maximum packet size: 319
audio data file offset: 0
optimized
format list:
[ 0] format: 1 ch, 22050 Hz, 'aac ' (0x00000000) 0 bits/channel, 0 bytes/packet, 1024 frames/packet, 0 bytes/frame
Channel layout: Mono
----
I followed these three tutorials and came up with working code that reads the audio buffer and gives me FFT doubles.
http://blog.bjornroche.com/2012/07/frequency-detection-using-fft-aka-pitch.html
https://github.com/alexbw/iPhoneFFT
How do I obtain the frequencies of each value in an FFT?
I read the data as follows:
// If there are more packets, read them
inCompleteAQBuffer->mAudioDataByteSize = numBytes;
CheckError(AudioQueueEnqueueBuffer(inAQ,
                                   inCompleteAQBuffer,
                                   (sound->packetDescs ? nPackets : 0),
                                   sound->packetDescs),
           "couldn't enqueue buffer");
sound->packetPosition += nPackets;

int numFrequencies = 2048;
int kNumFFTWindows = 10;

// Raw 16-bit samples from the buffer that was just enqueued
SInt16 *testBuffer = (SInt16 *)inCompleteAQBuffer->mAudioData;

OouraFFT *myFFT = [[OouraFFT alloc] initForSignalsOfLength:numFrequencies * 2
                                             andNumWindows:kNumFFTWindows];
for (long i = 0; i < myFFT.dataLength; i++) {
    myFFT.inputData[i] = (double)testBuffer[i];
}
[myFFT calculateWelchPeriodogramWithNewSignalSegment];
for (int i = 0; i < myFFT.dataLength / 2; i++) {
    NSLog(@"the spectrum data %d is %f", i, myFFT.spectrumData[i]);
}
and my output log looks something like:
Everything checks out for 4096 samples of data
Set up all values, about to init window type 2
the spectrum data 0 is 42449.823771
the spectrum data 1 is 39561.024361
.
.
.
.
the spectrum data 2047 is -42859933071799162597786649755206634193030992632381393031503716729604050285238471034480950745056828418192654328314899253768124076782117157451993697900895932215179138987660717342012863875797337184571512678648234639360.000000
I know I am not calculating the magnitude yet, but how can I detect that the sound has 20 Hz in it? Do I need to learn the Goertzel algorithm?
There are many ways to convey information that gets inserted into and then retrieved from some preexisting wave pattern. The information going in can vary things like the amplitude (amplitude modulation) or the frequency (frequency modulation), etc. Do you have a strategy here? Note that the density of information you wish to convey is influenced by factors such as the modulating frequency (higher frequencies can naturally convey more information, as they can resolve more changes per second).
Another approach is possible if both the sender and receiver have the source audio (a reference). In this case the receiver could do a diff between the reference and the actual received audio to resolve out the transmitted extra information. A variation on this would be to have the sender send the audio twice: first the untouched reference audio, followed by a modulated version of that same reference audio. The receiver then just diffs these two audibly identical clips to resolve out the embedded information.
Going back to your original question: if the sender and receiver have an agreement, say for some time period X the pure 20 Hz reference tone is sent, followed by another period X where that 20 Hz tone is modulated by your input information to alter its amplitude or frequency, then you just repeat this pattern. On the receiving side, you diff each such pair of time periods to resolve the modulated information. For this to work, the source audio cannot contain any tones below some frequency, say 100 Hz (you remove that frequency band if needed), just to eliminate interference from the source audio. You have not mentioned what kind of data you wish to transmit. If it's voice, you would first need to stretch it out, in effect lowering its frequency range from roughly the 1 kHz range down to your low 20 Hz range; once the result of the diff is available on the receiving side, you squeeze that curve to restore it back to the normal voice range of about 1 kHz. Maybe more work than you have time for, but this might just work: real AM/FM radio uses modulation to send voice over megahertz carrier frequencies, so it can work.
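As for the Goertzel algorithm mentioned in the question: it is a cheap way to measure the energy at one known frequency without computing a full FFT, which fits the "is the 20 Hz marker present in this block?" case. A minimal sketch (in Swift for brevity; the sample source, block length and detection threshold are assumptions you would tune):
import Foundation

// Goertzel power of one block of mono samples at a single target frequency.
func goertzelPower(samples: [Float], sampleRate: Float, targetHz: Float) -> Float {
    let bin = (Float(samples.count) * targetHz / sampleRate).rounded()
    let omega = 2 * Float.pi * bin / Float(samples.count)
    let coeff = 2 * cos(omega)
    var s1: Float = 0, s2: Float = 0
    for x in samples {
        let s0 = x + coeff * s1 - s2
        s2 = s1
        s1 = s0
    }
    // Squared magnitude of the DFT bin closest to targetHz
    return s1 * s1 + s2 * s2 - coeff * s1 * s2
}

// Example decision: compare the 20 Hz bin against a nearby bin that should be quiet.
// let tonePresent = goertzelPower(samples: block, sampleRate: 22050, targetHz: 20)
//                 > 10 * goertzelPower(samples: block, sampleRate: 22050, targetHz: 60)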
