Audio length not consistent between librosa and pydub - machine-learning

I am new to audio processing in machine learning.
I processed a batch of audio files; taking one as an example, I used pydub to make the file exactly 6 s long, either by slicing the AudioSegment or by appending silence to the original audio.
However, when converting the file to MFCCs with librosa.feature.mfcc, I realized that the file pydub reports as 6 s is not shown as 6 s by librosa.load. Can someone tell me why that is? The sample rate didn't change, though; before and after the slicing it is always 44100 Hz.
Here is a picture of my code checking the length, along with the output.
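For reference, a minimal sketch of this kind of length check (the file name is a placeholder):

import librosa
from pydub import AudioSegment

path = "example.wav"  # placeholder file name

# pydub reports duration in milliseconds
seg = AudioSegment.from_file(path)
print("pydub length (s):", len(seg) / 1000.0)

# sr=None keeps the native rate; by default librosa.load resamples to 22050 Hz
y, sr = librosa.load(path, sr=None)
print("librosa length (s):", len(y) / sr, "at", sr, "Hz")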

Related

Transcoding fMP4 to HLS while writing on iOS using FFmpeg

TL;DR
I want to convert fMP4 fragments to TS segments (for HLS) as the fragments are being written using FFmpeg on an iOS device.
Why?
I'm trying to achieve live uploading on iOS while maintaining a seamless, HD copy locally.
What I've tried
Rolling AVAssetWriters where each writes for 8 seconds, then concatenating the MP4s together via FFmpeg.
What went wrong - There are blips in the audio and video at times. I've identified 3 reasons for this.
1) Priming frames for audio written by the AAC encoder creating gaps.
2) Since video frames are 33.33ms long, and audio frames 0.022ms long, it's possible for them to not line up at the end of a file.
3) The lack of frame-accurate encoding, which is available on Mac OS but not on iOS. Details Here
FFmpeg muxing a large video-only MP4 file with raw audio into TS segments. The work was based on the Kickflip SDK.
What went wrong - Every once in a while an audio-only file would get uploaded, with no video whatsoever. We were never able to reproduce it in-house, but it was pretty upsetting to our users when they didn't record what they thought they did. There were also issues with accurate seeking in the final segments, almost as if the TS segments were incorrectly timestamped.
What I'm thinking now
Apple was pushing fMP4 at WWDC this year (2016) and I hadn't looked into it much at all before that. Since an fMP4 file can be read, and played while it's being written, I thought that it would be possible for FFmpeg to transcode the file as it's being written as well, as long as we hold off sending the bytes to FFmpeg until each fragment within the file is finished.
However, I'm not familiar enough with the FFmpeg C API; I've only used it briefly in attempt #2.
What I need from you
Is this a feasible solution? Is anybody familiar enough with fMP4 to know if I can actually accomplish this?
How will I know that AVFoundation has finished writing a fragment within the file so that I can pipe it into FFmpeg?
How can I take data from a file on disk, chunk at a time, pass it into FFmpeg and have it spit out TS segments?
Strictly speaking, you don't need to transcode the fMP4 if it contains H.264 + AAC; you just need to repackage the sample data as TS (using ffmpeg -codec copy, or GPAC).
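For illustration, a stream-copy repackaging command would look something like the line below; the input name, segment length, and playlist name are placeholders, and this assumes the HLS muxer in a reasonably recent ffmpeg build:

% ffmpeg -i fragmented_input.mp4 -codec copy -f hls -hls_time 8 -hls_list_size 0 playlist.m3u8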
Wrt. alignment (points 1 and 2), I suppose this all depends on your encoder settings (frame rate, sample rate and GOP size). It is certainly possible to make sure that audio and video align exactly at fragment boundaries (see for example this table). If you're targeting iOS, I would recommend using HLS protocol version 3 (or 4), which allows timing to be represented more accurately. This also allows you to stream audio and video separately (non-multiplexed).
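As a small illustration, a media playlist only needs a version tag of 3 or higher to carry fractional segment durations; the segment names and durations below are made up:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:9
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:8.341,
fileSequence0.ts
#EXTINF:8.008,
fileSequence1.ts
#EXT-X-ENDLIST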
I believe ffmpeg should be capable of pushing a live fMP4 stream (i.e. using a long-running HTTP POST), but playout requires the origin software to do something meaningful with it (i.e. stream to HLS).

Changing audio bit rate after recording

I'm recording audio files at a sample rate of 44.1 kHz. I like having high-quality audio for playback purposes. However, when I want to export via text or email, the audio files fail to export because they're larger than 15 MB (usually audio files over 3 minutes). Is there a way to reduce the bit rate only when I want to export? I've seen the following tutorial, but I'd rather keep my files as m4a rather than converting to aac:
http://atastypixel.com/blog/easy-aac-compressed-audio-conversion-on-ios/.
You can use AVAssetReader and AVAssetWriter to transcode an audio file to one with different parameters (lower bit rate, higher compression, etc.). Just because you create a new (temporary?) audio file for export doesn't force you to delete the current higher quality audio file you want for playback.

iOS: 44k audio file should play at 22k sample rate

In my audio app I need to be able to change the format of an audio file (AIFF), more specifically the sample rate. The audio session is running at 22050 Hz, and the audio file itself is created in libpd/Pure Data also running the same sample rate. The problem is that the file appears to be a 44100 Hz audio file, which means that when played back on the device it plays twice as fast.
Is it possible to change the header of the file or something so that its sample rate becomes 22050 Hz, without resampling the audio?
I have seen other related topics where one suggestion is to play the file at half speed. However, this will not solve my problem, as the file will be further compressed to AAC for uploading to a server, and it needs to play back at the correct speed on other devices.
Thanks!
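For an uncompressed file, changing the declared rate is purely a header rewrite and involves no resampling. A rough sketch of the idea using Python's wave module (the file names are placeholders, and it targets WAV rather than AIFF, though the principle is the same):

import wave

# Copy the sample data untouched, but declare a different sample rate.
with wave.open("labelled_44100.wav", "rb") as src:
    params = src.getparams()
    frames = src.readframes(params.nframes)

with wave.open("relabelled_22050.wav", "wb") as dst:
    dst.setnchannels(params.nchannels)
    dst.setsampwidth(params.sampwidth)
    dst.setframerate(22050)  # new declared rate; the samples themselves are unchanged
    dst.writeframes(frames)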
I discovered that the problem was caused by a bug in a file-creating object in Pure Data. No matter what sample rate I set the file to, it ended up being 44100 Hz. So I simply switched to using wav files; those ended up with the correct sample rate of 22050 Hz and now play back at the correct speed.
All good now!

AUGraph setup on iOS

I am designing an AUGraph for an iOS application and would appreciate help on the following things.
If I want to play a number of audio files at once, does each file need an audio unit?
From the Core-Audio docs
Linear PCM and IMA/ADPCM (IMA4) audio You can play multiple linear PCM or IMA4 format sounds simultaneously in iOS without incurring CPU resource problems.
AAC, MP3, and Apple Lossless (ALAC) audio Playback for AAC, MP3, and Apple Lossless (ALAC) sounds uses efficient hardware-based decoding on iPhone and iPod touch. You can play only one such sound at a time.
So multiple AAC or MP3 files cannot be played at the same time. What is the optimal LPCM format to play multiple sounds at once?
Does this apply to Audio Units too, since this is under the AudioQueue documentation?
Can an audio unit in an AUGraph be inactive? If an AUGraph looks like this
Speaker/output < recorder unit < mixer unit < number of audio file playing units
what happens if the recorder is not active? Would it still pull, just not write the buffers to a file?
No; you need to use the mixer audio unit. Check this:
http://developer.apple.com/library/ios/DOCUMENTATION/MusicAudio/Conceptual/AudioUnitHostingGuide_iOS/ConstructingAudioUnitApps/ConstructingAudioUnitApps.html#//apple_ref/doc/uid/TP40009492-CH16-SW1
Mostly reading the document above, wrapping the sample code in a class and creating a pair of utility structures, I coded this 'Simple Sound Engine' from scratch:
http://nicolasmiari.com/blog/a-simple-sound-engine-for-ios-using-the-audio-unit-framework/
(Link to an article on my blog containing the source code.) Sorry, I moved the blog to Jekyll/GitHub and this article didn't make the cut.
...I was going to start a repo on github, but it's too much trouble. I am a visual guy, still pretty much git-phobic. Okay, that was a long time ago... Now I use git from the command line :-)
You can use it as-is, or extract the Audio Unit-related code and adapt it to your project.
I believe the Cocos Denshion 'Simple Audio Engine' does pretty much the same thing, but haven't checked the source code.
Known issues
If you have an exception breakpoint set for C++ exceptions, when debugging, the code will stop 2 or 3 times on AUGraphInitialize(). This is a 'non-crashing' exception, so you can click on continue and the code works OK.
To convert your wav files to the uncompressed .caf format, use this command on the Terminal:
% afconvert -f caff -d LEI16 mysoundFile.wav mySoundFile.caf
EDIT: So I created a GitHub repo after all:
https://github.com/nicolas-miari/Sound-Engine
Both ordinary .wav and .caf files contain raw PCM audio samples and can be played without hardware assist or DSP processing if they are already at the destination sample rate.
When there's no audio file or other synthesized data to feed an audio unit that's pulling buffers, the usual practice is to feed it buffers of silence (or perhaps a taper to zero if the previous buffer ended with non-zero amplitude).

Flex 4 Sound class -- detect sample rate of .mp3

I'm working on an Adobe AIR application written in Flex 4 that plays .mp3 audio files on the user's computer. Note: these are not audio files shipped with the application; they are .mp3s on the user's computer that the user selects for playback through the application.
The application works fine for .mp3s encoded at 44.1 kHz, but can give unpredictable results if other sample rates are used. I've done plenty of research to know the limitations of the Sound class and how .mp3 will basically be my only option in Flex.
My question is: Is there a way to detect the sample rate of the .mp3 audio in Flex 4 ActionScript?
Rather than worry about making the application work well with non-standard sample rates, at this point I'd like to just catch those cases and prevent files with non-44.1 kHz sample rates from loading.
To be specific: if a user selects an .mp3 for playback that has been encoded at 48 kHz, for example, I'd like to be able to detect that case and take action preventing the file from loading and then announce to the user that this is not a supported audio file.
Thanks in advance,
Fitz
Use mp3infoutil
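If a helper library isn't an option, the sample rate can also be read straight out of the first MPEG audio frame header, and that header layout is the same whatever language parses it. Here is a rough Python sketch of the idea; the file name is a placeholder, the ID3v2 handling is deliberately simplistic, and an ActionScript port would perform the same bit tests on a ByteArray:

SAMPLE_RATES = {
    0: [11025, 12000, 8000],   # MPEG 2.5
    2: [22050, 24000, 16000],  # MPEG 2
    3: [44100, 48000, 32000],  # MPEG 1
}

def mp3_sample_rate(path):
    with open(path, "rb") as f:
        data = f.read(64 * 1024)  # the first frame sits near the start of the file

    pos = 0
    # Skip an ID3v2 tag if present (its size is a 4-byte synchsafe integer)
    if data[:3] == b"ID3":
        size = ((data[6] & 0x7F) << 21) | ((data[7] & 0x7F) << 14) | \
               ((data[8] & 0x7F) << 7) | (data[9] & 0x7F)
        pos = 10 + size

    # Scan for the 11-bit frame sync, then read the version and sample-rate bits
    while pos + 4 <= len(data):
        if data[pos] == 0xFF and (data[pos + 1] & 0xE0) == 0xE0:
            version_id = (data[pos + 1] >> 3) & 0x03  # value 1 is reserved
            rate_index = (data[pos + 2] >> 2) & 0x03  # value 3 is reserved
            if version_id != 1 and rate_index != 3:
                return SAMPLE_RATES[version_id][rate_index]
        pos += 1
    return None

print(mp3_sample_rate("example.mp3"))  # e.g. 44100 or 48000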
