I need to detect the number of channels and the audio format (interleaved or non-interleaved) from an AVAssetTrack. I tried the following code to detect the number of channels. As can be seen in the code, there are two ways to detect the number of channels. I want to know which one is more reliable and correct, or whether neither of them is (irrespective of audio format)?
if let formatDescriptions = track.formatDescriptions as? [CMAudioFormatDescription],
   let audioFormatDesc = formatDescriptions.first,
   let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(audioFormatDesc)
{
    // First way to detect number of channels
    numChannels = asbd.pointee.mChannelsPerFrame

    var aclSize: size_t = 0
    var currentChannelLayout: UnsafePointer<AudioChannelLayout>? = nil
    currentChannelLayout = CMAudioFormatDescriptionGetChannelLayout(audioFormatDesc, sizeOut: &aclSize)

    if let currentChannelLayout = currentChannelLayout, aclSize > 0 {
        let channelLayout = currentChannelLayout.pointee
        // Second way of detecting number of channels
        numChannels = AudioChannelLayoutTag_GetNumberOfChannels(channelLayout.mChannelLayoutTag)
    }
}
I also don't know how to get the format details (interleaved or non-interleaved). Any help on this would be appreciated.
Use the AudioStreamBasicDescription. Every audio CMFormatDescription has one, while the AudioChannelLayout is optional:
https://developer.apple.com/documentation/coremedia/1489137-cmaudioformatdescriptiongetchann?language=objc
AudioChannelLayouts are optional; this API returns NULL if one doesn’t exist.
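As for interleaved vs. non-interleaved: that is also carried by the AudioStreamBasicDescription, in mFormatFlags, and it is only meaningful for linear PCM (compressed formats are delivered as packets). A minimal sketch, reusing the asbd pointer unwrapped in the question's code:

import AVFoundation

// `asbd` is the UnsafePointer<AudioStreamBasicDescription> obtained above.
let desc = asbd.pointee

// Channel count straight from the ASBD; this field is always present,
// whereas the AudioChannelLayout may be missing entirely.
let channelCount = desc.mChannelsPerFrame

// Interleaving only applies to linear PCM.
if desc.mFormatID == kAudioFormatLinearPCM {
    let isNonInterleaved = (desc.mFormatFlags & kAudioFormatFlagIsNonInterleaved) != 0
    print(isNonInterleaved ? "non-interleaved PCM" : "interleaved PCM")
}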
Related
Hello fellow AudioKit users,
I'm trying to set up AudioKit 5 with a playback time indication, and am having trouble.
If I use AudioPlayer's duration property, this is the total time of the audio file, not the current playback time.
ex:
let duration = player.duration
Always gives the file's total time.
Looking at old code from AKAudioPlayer, it seemed to have a "currentTime" property.
The migration guide (https://github.com/AudioKit/AudioKit/blob/v5-main/docs/MigrationGuide.md) mentions some potentially helpful classes from the old version; however, "AKTimelineTap" has no replacement and no comments from the developers... nice.
I'm also still not sure how to manipulate the current playback time...
I've also checked out AudioKit 5's Cookbook; however, it covers adding effects and nodes, not necessarily playback display, etc.
Thanks for any help with this new version of AudioKit.
You can find playerNode in AudioPlayer; it's an AVAudioPlayerNode.
Using its lastRenderTime and playerTime(forNodeTime:), you can calculate the current time.
ex:
// get playerNode in AudioPlayer.
let playerNode = player.playerNode
// get lastRenderTime, and transform to playerTime.
guard let lastRenderTime = playerNode.lastRenderTime else { return }
guard let playerTime = playerNode.playerTime(forNodeTime: lastRenderTime) else { return }
// use sampleRate and sampleTime to calculate current time in seconds.
let sampleRate = playerTime.sampleRate
let sampleTime = playerTime.sampleTime
let currentTime = Double(sampleTime) / sampleRate
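If you want this as a reusable value, you could wrap the same calculation in a small extension. A sketch, assuming AudioKit 5's AudioPlayer exposes playerNode as described above; the property name currentPlaybackTime is my own:

import AudioKit
import AVFoundation

extension AudioPlayer {
    /// Elapsed playback time in seconds, or 0 before the engine has rendered anything.
    var currentPlaybackTime: TimeInterval {
        guard let nodeTime = playerNode.lastRenderTime,
              let playerTime = playerNode.playerTime(forNodeTime: nodeTime) else {
            return 0
        }
        return Double(playerTime.sampleTime) / playerTime.sampleRate
    }
}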
I want to count the number of audio and subtitle tracks in a given AVURLAsset. How do I do that using Swift?
For the given manifest, the expected answer should be 2 subtitle tracks and 3 audio tracks:
#EXTM3U
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subtitle",NAME="#1 Fre",DEFAULT=YES,FORCED=NO,LANGUAGE="fre",URI="subtitles/planete_interdite_subtitle3_fre_vtt.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subtitle",NAME="#3 Eng",DEFAULT=NO,FORCED=NO,LANGUAGE="eng",URI="subtitles/planete_interdite_subtitle5_eng_vtt.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="hdready",NAME="#2 Eng",DEFAULT=NO,AUTOSELECT=YES,LANGUAGE="eng",URI="hdready/planete_interdite_4160_n264_720p_audio2_eng.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="fullhd",NAME="#1 Fre",DEFAULT=YES,AUTOSELECT=YES,LANGUAGE="fre",URI="fullhd/planete_interdite_8256_n264_1080p_audio1_fre.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="fullhd",NAME="#2 Eng",DEFAULT=NO,AUTOSELECT=YES,LANGUAGE="eng",URI="fullhd/planete_interdite_8256_n264_1080p_audio2_eng.m3u8"
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=314000,CODECS="avc1.66.30,mp4a.40.2",RESOLUTION=256x144,AUDIO="low",SUBTITLES="subtitle"
low/planete_interdite_228_h264_144p.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=638000,CODECS="avc1.66.30,mp4a.40.2",RESOLUTION=426x240,AUDIO="medium",SUBTITLES="subtitle"
medium/planete_interdite_500_h264_240p.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1942000,CODECS="avc1.66.30,mp4a.40.2",RESOLUTION=640x360,AUDIO="high",SUBTITLES="subtitle"
high/planete_interdite_1228_q264_360p.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=3274000,CODECS="avc1.66.30,mp4a.40.2",RESOLUTION=854x480,AUDIO="veryhigh",SUBTITLES="subtitle"
veryhigh/planete_interdite_2080_q264_480p.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=4814000,CODECS="avc1.4d001f,mp4a.40.2",RESOLUTION=1280x720,AUDIO="hdready",SUBTITLES="subtitle"
hdready/planete_interdite_4160_n264_720p.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=9501000,CODECS="avc1.640028,mp4a.40.2",RESOLUTION=1920x1080,AUDIO="fullhd",SUBTITLES="subtitle"
fullhd/planete_interdite_8256_n264_1080p.m3u8
Thanks
It looks like you will need to use the function AVAsset.tracks(withMediaType:):
func tracks(withMediaType mediaType: AVMediaType) -> [AVAssetTrack]
Then you can specify the mediaType argument from the selection provided here. From those options, it looks like you will want AVMediaType.audio and AVMediaType.subtitle.
e.g., to get the number of audio tracks:
let audioTracks = YourAVURLAsset.tracks(withMediaType: .audio)
let nAudioTracks = audioTracks.count
or the number of subtitle tracks:
let subtitleTracks = YourAVURLAsset.tracks(withMediaType: .subtitle)
let nSubtitleTracks = subtitleTracks.count
Note: the code is untested, so it may need to be adjusted a bit, but hopefully this gets you going in the right direction.
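As a side note, for a streaming (HLS) asset like the manifest above, the track arrays may be empty until the asset has loaded, and alternate audio/subtitle renditions are typically surfaced as media selection options rather than tracks. A sketch of both approaches using the async loading APIs (available on iOS 15 / macOS 12 and later; the function name countMedia is just for illustration):

import AVFoundation

func countMedia(in asset: AVURLAsset) async throws -> (audio: Int, subtitles: Int) {
    // For file-based assets, the track lists are usually what you want.
    let audioTracks = try await asset.loadTracks(withMediaType: .audio)
    let subtitleTracks = try await asset.loadTracks(withMediaType: .subtitle)

    // For HLS, alternate renditions show up as media selection options instead.
    let audibleGroup = try await asset.loadMediaSelectionGroup(for: .audible)
    let legibleGroup = try await asset.loadMediaSelectionGroup(for: .legible)

    let audioCount = max(audioTracks.count, audibleGroup?.options.count ?? 0)
    let subtitleCount = max(subtitleTracks.count, legibleGroup?.options.count ?? 0)
    return (audioCount, subtitleCount)
}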
I would like to make a 5-band audio equalizer (60Hz, 230Hz, 910Hz, 4kHz, 14kHz) using AVAudioEngine. I would like to have the user input gain per band through a vertical slider and accordingly adjust the audio that is playing. I tried using AVAudioUnitEQ to do this, but I hear no difference when playing the audio. I tried to hardcode in values to specify a gain at each frequency, but it still does not work. Here is the code I have:
var audioEngine: AVAudioEngine = AVAudioEngine()
var equalizer: AVAudioUnitEQ!
var audioPlayerNode: AVAudioPlayerNode = AVAudioPlayerNode()
var audioFile: AVAudioFile!
// in viewDidLoad():
equalizer = AVAudioUnitEQ(numberOfBands: 5)
audioEngine.attach(audioPlayerNode)
audioEngine.attach(equalizer)
let bands = equalizer.bands
let freqs = [60, 230, 910, 4000, 14000]
audioEngine.connect(audioPlayerNode, to: equalizer, format: nil)
audioEngine.connect(equalizer, to: audioEngine.outputNode, format: nil)
for i in 0...(bands.count - 1) {
    bands[i].frequency = Float(freqs[i])
}
bands[0].gain = -10.0
bands[0].filterType = .lowShelf
bands[1].gain = -10.0
bands[1].filterType = .lowShelf
bands[2].gain = -10.0
bands[2].filterType = .lowShelf
bands[3].gain = 10.0
bands[3].filterType = .highShelf
bands[4].gain = 10.0
bands[4].filterType = .highShelf
do {
    if let filepath = Bundle.main.path(forResource: "song", ofType: "mp3") {
        let filepathURL = NSURL.fileURL(withPath: filepath)
        audioFile = try AVAudioFile(forReading: filepathURL)
        audioEngine.prepare()
        try audioEngine.start()
        audioPlayerNode.scheduleFile(audioFile, at: nil, completionHandler: nil)
        audioPlayerNode.play()
    }
} catch _ {}
Since the low frequencies have a gain of -10 and the high frequencies have a gain of 10, there should be a very noticeable difference when playing any media. However, when the media starts playing, it sounds the same as if played without any equalizer attached.
I'm not sure why this is happening, but I tried several different things to debug. I thought that it might be the order of the functions so I tried switching it so that audioEngine.connect is called after adjusting all of the bands, but that did not make a difference either.
I tried this same code with an AVAudioUnitTimePitch, and it worked perfectly, so I am dumbfounded as to why it does not work with AVAudioUnitEQ.
I do not want to use any third-party libraries or cocoa pods for this project, I would like to do it using AVFoundation alone.
Any help would be greatly appreciated!
Thanks in advance.
Looking through the AVAudioUnitEQFilterParameters documentation, I noticed that I had messed with all of the parameters except bypass, and it seems that changing this flag fixed everything!
So, I believe the main issue here is that each AVAudioUnitEQ band is bypassed by default; you have to clear the bypass flag yourself, since setting the other values alone is not enough.
So, I changed
for i in 0...(bands.count - 1) {
    bands[i].frequency = Float(freqs[i])
}
to
for i in 0...(bands.count - 1) {
    bands[i].frequency = Float(freqs[i])
    bands[i].bypass = false
    bands[i].filterType = .parametric
}
and everything started working. Furthermore, to make an effective equalizer that allows the user to modify individual frequencies, the filterType for each band should be set to .parametric.
I am still unsure what I should set the bandwidth to, but I can probably check online for that or just experiment until the sound matches a different equalizer application.
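For completeness, here is how the whole band setup plus a slider hookup might look. This is a sketch, not the original poster's code; the 1.0-octave bandwidth and the slider-to-band mapping are assumptions to experiment with:

import AVFoundation

let equalizer = AVAudioUnitEQ(numberOfBands: 5)
let centerFrequencies: [Float] = [60, 230, 910, 4000, 14000]

for (index, band) in equalizer.bands.enumerated() {
    band.filterType = .parametric
    band.frequency = centerFrequencies[index]
    band.bandwidth = 1.0      // in octaves; an assumed starting point, tune by ear
    band.gain = 0.0           // flat until the user moves a slider
    band.bypass = false       // bands are bypassed by default, so clear the flag
}

// Call this from each vertical slider, e.g. with the slider's tag as the band index
// and its value mapped to a range such as -24...24 dB.
func setGain(_ gainInDecibels: Float, forBand index: Int) {
    equalizer.bands[index].gain = gainInDecibels
}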
I am working on an application that plays back video and allows the user to scrub forwards and backwards in the video. The scrubbing has to happen smoothly, so we always re-write the video with SDAVAssetExportSession with the video compression property AVVideoMaxKeyFrameIntervalKey: @1 so that each frame will be a keyframe and allow smooth reverse scrubbing. This works great and provides smooth playback. The application uses video from a variety of sources; it can be recorded on Android or iOS devices and even downloaded from the web and added to the application, so we end up with quite different encodings, some of which are already suited for scrubbing (each frame is a keyframe). Is there a way to detect the keyframe interval of a video file so I can avoid needless video processing? I have been through much of AVFoundation's docs and don't see an obvious way to get this information. Thanks for any help on this.
You can quickly parse the file without decoding the images by creating an AVAssetReaderTrackOutput with nil outputSettings. The frame sample buffers you encounter have an attachments array containing a dictionary with useful information, including whether the frame depends on other frames, or whether other frames depend on it. I would interpret the former (a frame that does not depend on others) as indicating a keyframe, although it gives me some low number (4% keyframes in one file?). Anyway, the code:
let asset = AVAsset(url: inputUrl)
let reader = try! AVAssetReader(asset: asset)
let videoTrack = asset.tracks(withMediaType: AVMediaTypeVideo)[0]
let trackReaderOutput = AVAssetReaderTrackOutput(track: videoTrack, outputSettings: nil)

reader.add(trackReaderOutput)
reader.startReading()

var numFrames = 0
var keyFrames = 0

while true {
    if let sampleBuffer = trackReaderOutput.copyNextSampleBuffer() {
        // NB: not every sample buffer corresponds to a frame!
        if CMSampleBufferGetNumSamples(sampleBuffer) > 0 {
            numFrames += 1
            if let attachmentArray = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, false) as? NSArray {
                let attachment = attachmentArray[0] as! NSDictionary
                // print("attach on frame \(frame): \(attachment)")
                if let depends = attachment[kCMSampleAttachmentKey_DependsOnOthers] as? NSNumber {
                    if !depends.boolValue {
                        keyFrames += 1
                    }
                }
            }
        }
    } else {
        break
    }
}

print("\(keyFrames) on \(numFrames)")
N.B. This only works for local file assets.
p.s. you don't say how you're scrubbing or playing. An AVPlayerViewController and an AVPlayer?
Here is the Objective-C version of the same answer. After implementing this and using it, videos that should have all keyframes are returning about 96% keyframes from this code. I'm not sure why, so I am using that number as a determining factor even though I would like it to be more accurate. I am also only looking through the first 600 frames or to the end of the video (whichever comes first), since I don't need to read through a whole 20-minute video to make this determination.
+ (BOOL)videoNeedsProcessingForSlomo:(NSURL *)fileUrl {
    BOOL needsProcessing = YES;
    AVAsset *anAsset = [AVAsset assetWithURL:fileUrl];
    NSError *error;
    AVAssetReader *assetReader = [AVAssetReader assetReaderWithAsset:anAsset error:&error];
    if (error) {
        DLog(@"Error: %@", error.localizedDescription);
        return YES;
    }

    AVAssetTrack *videoTrack = [[anAsset tracksWithMediaType:AVMediaTypeVideo] objectAtIndex:0];
    AVAssetReaderTrackOutput *trackOutput = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:videoTrack outputSettings:nil];
    [assetReader addOutput:trackOutput];
    [assetReader startReading];

    float numFrames = 0;
    float keyFrames = 0;

    while (numFrames < 600) { // If the video is long - only parse through 20 seconds worth.
        CMSampleBufferRef sampleBuffer = [trackOutput copyNextSampleBuffer];
        if (sampleBuffer) {
            // NB: not every sample buffer corresponds to a frame!
            if (CMSampleBufferGetNumSamples(sampleBuffer) > 0) {
                numFrames += 1;
                NSArray *attachmentArray = (__bridge NSArray *)CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, false);
                if (attachmentArray) {
                    NSDictionary *attachment = attachmentArray[0];
                    NSNumber *depends = attachment[(__bridge NSString *)kCMSampleAttachmentKey_DependsOnOthers];
                    if (depends) {
                        if (!depends.boolValue) {
                            keyFrames += 1;
                        }
                    }
                }
            }
            CFRelease(sampleBuffer); // copyNextSampleBuffer follows the Copy rule, so release it.
        }
        else {
            break;
        }
    }

    needsProcessing = keyFrames / numFrames < 0.95f; // If more than 95% of the frames are keyframes - don't decompress.
    return needsProcessing;
}
Using kCMSampleAttachmentKey_DependsOnOthers was giving me 0 key frames in some cases, when ffprobe would return key frames.
To get the same number of key frames as ffprobe shows, I used:
if attachment[CMSampleBuffer.PerSampleAttachmentsDictionary.Key.notSync] == nil {
    keyFrames += 1
}
In the CoreMedia header it says:
/// Boolean (absence of this key implies Sync)
public static let notSync: CMSampleBuffer.PerSampleAttachmentsDictionary.Key
and for the dependsOnOthers key it says:
/// `true` (e.g., non-I-frame), `false` (e.g. I-frame), or absent if
/// unknown
public static let dependsOnOthers: CMSampleBuffer.PerSampleAttachmentsDictionary.Key
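For reference, here is a sketch of the same counting loop keyed on the NotSync attachment instead, using the C-style CoreMedia call (the function name countKeyframes is just for illustration; absence of kCMSampleAttachmentKey_NotSync means the sample is a sync sample, i.e. a keyframe):

import AVFoundation

func countKeyframes(at url: URL) throws -> (keyFrames: Int, totalFrames: Int) {
    let asset = AVAsset(url: url)
    let reader = try AVAssetReader(asset: asset)
    guard let videoTrack = asset.tracks(withMediaType: .video).first else { return (0, 0) }

    let output = AVAssetReaderTrackOutput(track: videoTrack, outputSettings: nil)
    reader.add(output)
    reader.startReading()

    var numFrames = 0
    var keyFrames = 0

    while let sampleBuffer = output.copyNextSampleBuffer() {
        // Not every sample buffer corresponds to a frame.
        guard CMSampleBufferGetNumSamples(sampleBuffer) > 0 else { continue }
        numFrames += 1

        let attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, createIfNecessary: false) as? [[String: Any]]
        let notSync = attachments?.first?[kCMSampleAttachmentKey_NotSync as String] as? Bool ?? false
        if !notSync {
            keyFrames += 1   // no NotSync attachment (or false) means a sync sample / keyframe
        }
    }
    return (keyFrames, numFrames)
}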
I've been stuck on this problem for days now and have looked through nearly every related Stack Overflow page. Through this, I now have a much greater understanding of what FFT is and how it works. Despite this, I'm having extreme difficulties implementing it into my application.
In short, what I am trying to do is make a spectrum visualizer for my application (similar to this). From what I've gathered, I'm pretty sure I need to use the magnitudes of the sound as the heights of my bars. So with all this in mind, currently I am able to analyze an entire .caf file all at once. To do this, I am using the following code:
let audioFile = try! AVAudioFile(forReading: soundURL!)
let frameCount = UInt32(audioFile.length)
let buffer = AVAudioPCMBuffer(PCMFormat: audioFile.processingFormat, frameCapacity: frameCount)
do {
    try audioFile.readIntoBuffer(buffer, frameCount: frameCount)
} catch {
}
let log2n = UInt(round(log2(Double(frameCount))))
let bufferSize = Int(1 << log2n)
let fftSetup = vDSP_create_fftsetup(log2n, Int32(kFFTRadix2))
var realp = [Float](count: bufferSize/2, repeatedValue: 0)
var imagp = [Float](count: bufferSize/2, repeatedValue: 0)
var output = DSPSplitComplex(realp: &realp, imagp: &imagp)
vDSP_ctoz(UnsafePointer<DSPComplex>(buffer.floatChannelData.memory), 2, &output, 1, UInt(bufferSize / 2))
vDSP_fft_zrip(fftSetup, &output, 1, log2n, Int32(FFT_FORWARD))
var fft = [Float](count:Int(bufferSize / 2), repeatedValue:0.0)
let bufferOver2: vDSP_Length = vDSP_Length(bufferSize / 2)
vDSP_zvmags(&output, 1, &fft, 1, bufferOver2)
This works fine and outputs a long array of data. However, the problem with this code is it analyzes the entire audio file at once. What I need is to be analyzing the audio file as it is playing, very similar to this video: Spectrum visualizer.
So I guess my question is this: How do you perform FFT analysis while the audio is playing?
Also, on top of this, how do I go about converting the output of an FFT analysis to actual heights for a bar? One of the outputs I received for an audio file using the FFT analysis code from above was this: http://pastebin.com/RBLTuGx7. The only reason for the pastebin is due to how long it is. I'm assuming I average all these numbers together and use those values instead? (Just for reference, I got that array by printing out the 'fft' variable in the code above)
I've attempted reading through the EZAudio code; however, I am unable to find how they read in samples of audio in real time. Any help is greatly appreciated.
Here's how it is done in AudioKit, using EZAudio's FFT tools:
Create a class for your FFT that will hold the data:
@objc public class AKFFT: NSObject, EZAudioFFTDelegate {

    internal let bufferSize: UInt32 = 512
    internal var fft: EZAudioFFT?

    /// Array of FFT data
    public var fftData = [Double](count: 512, repeatedValue: 0.0)

    ...
}
Initialize the class and set up the FFT. Also install the tap on the appropriate node.
public init(_ input: AKNode) {
    super.init()
    fft = EZAudioFFT.fftWithMaximumBufferSize(vDSP_Length(bufferSize), sampleRate: 44100.0, delegate: self)
    input.avAudioNode.installTapOnBus(0, bufferSize: bufferSize, format: AKManager.format) { [weak self] (buffer, time) -> Void in
        if let strongSelf = self {
            buffer.frameLength = strongSelf.bufferSize
            let offset: Int = Int(buffer.frameCapacity - buffer.frameLength)
            let tail = buffer.floatChannelData[0]
            strongSelf.fft!.computeFFTWithBuffer(&tail[offset], withBufferSize: strongSelf.bufferSize)
        }
    }
}
Then implement the callback to load your internal fftData array:
@objc public func fft(fft: EZAudioFFT!, updatedWithFFTData fftData: UnsafeMutablePointer<Float>, bufferSize: vDSP_Length) {
    dispatch_async(dispatch_get_main_queue()) { () -> Void in
        for i in 0...511 {
            self.fftData[i] = Double(fftData[i])
        }
    }
}
AudioKit's implementation may change, so you should check https://github.com/audiokit/AudioKit/ to see if any improvements were made. EZAudio is at https://github.com/syedhali/EZAudio
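If you would rather avoid EZAudio and AudioKit entirely, the same tap-plus-FFT idea can be sketched with just AVFoundation and Accelerate. This is a minimal illustration, not AudioKit's implementation: the SpectrumTap class name, the 512-point size, and the bin handling are my own assumptions. It taps the engine's main mixer, runs a real FFT on each buffer, and leaves squared magnitudes you could group and convert to dB for bar heights:

import AVFoundation
import Accelerate

final class SpectrumTap {
    private let engine: AVAudioEngine
    private let fftSize = 512                 // power of two
    private let log2n: vDSP_Length
    private let fftSetup: FFTSetup

    /// Latest squared magnitudes, one per frequency bin (fftSize / 2 of them).
    private(set) var magnitudes = [Float](repeating: 0, count: 256)

    init(engine: AVAudioEngine) {
        self.engine = engine
        log2n = vDSP_Length(log2(Float(fftSize)))
        fftSetup = vDSP_create_fftsetup(log2n, FFTRadix(kFFTRadix2))!
    }

    deinit { vDSP_destroy_fftsetup(fftSetup) }

    func start() {
        let format = engine.mainMixerNode.outputFormat(forBus: 0)
        engine.mainMixerNode.installTap(onBus: 0,
                                        bufferSize: AVAudioFrameCount(fftSize),
                                        format: format) { [weak self] buffer, _ in
            self?.process(buffer)
        }
    }

    private func process(_ buffer: AVAudioPCMBuffer) {
        guard let channelData = buffer.floatChannelData?[0],
              Int(buffer.frameLength) >= fftSize else { return }

        var realp = [Float](repeating: 0, count: fftSize / 2)
        var imagp = [Float](repeating: 0, count: fftSize / 2)

        realp.withUnsafeMutableBufferPointer { realPtr in
            imagp.withUnsafeMutableBufferPointer { imagPtr in
                var split = DSPSplitComplex(realp: realPtr.baseAddress!,
                                            imagp: imagPtr.baseAddress!)
                // Pack the first fftSize samples into split-complex form, then FFT.
                channelData.withMemoryRebound(to: DSPComplex.self, capacity: fftSize / 2) { complexPtr in
                    vDSP_ctoz(complexPtr, 2, &split, 1, vDSP_Length(fftSize / 2))
                }
                vDSP_fft_zrip(fftSetup, &split, 1, log2n, FFTDirection(FFT_FORWARD))
                vDSP_zvmags(&split, 1, &magnitudes, 1, vDSP_Length(fftSize / 2))
            }
        }
        // `magnitudes` now holds squared magnitudes per bin; group bins and
        // convert to dB (10 * log10) on the main queue before updating your bars.
    }
}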