Audio Fingerprinting for Music Box - signal-processing

I am working on an iOS app that needs to recognize 4 different songs played on a music box. I tried using echoprint's fingerprint codegen (https://github.com/rexstjohn/echoprint-ios-sample), and while it works great if I play back an exact recording of the same song, it doesn't work when the music box is played "live". Evidently whatever differences in timing/volume/etc. arise between different performances on the music box lead it to detect them as distinct songs.
Does anyone know a library or technique that would work best for this particular application? Or perhaps a way to hack echoprint to be more forgiving for this music box application?

Related

How to manipulate (slow down/change pitch) audio from Spotify?

From what I've found online so far, it seems as though the Spotify SDK does not allow developers to manipulate audio (by slowing down songs or changing their pitch); all it allows you to do is play the audio at its original pitch and speed.
What I'm wondering is how apps like the Amazing Slow Downer (https://apps.apple.com/us/app/amazing-slow-downer/id308998718) are able to manipulate audio pitch/tempo from Spotify/Apple Music?
I am trying to accomplish this for an iOS app I am building, but I have no idea where to start -- I would appreciate any help in pointing me in the right direction to learn how I can do this!
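As far as I know, Spotify's SDK never exposes raw audio you are allowed to process, so apps like this generally work on audio they can decode locally. For a plain, non-DRM local file, the standard iOS building block for tempo/pitch manipulation is AVAudioEngine with an AVAudioUnitTimePitch node. A minimal sketch (the "song.m4a" file name is hypothetical):

```swift
import AVFoundation

// Minimal sketch: time-stretch / pitch-shift a local, non-DRM audio file.
// In a real app, keep `engine` and `player` alive (e.g. as properties) so
// playback continues after this function returns.
func playSlowedDown() throws {
    let engine = AVAudioEngine()
    let player = AVAudioPlayerNode()
    let timePitch = AVAudioUnitTimePitch()
    timePitch.rate = 0.75      // 75% of the original tempo
    timePitch.pitch = -200     // pitch shift in cents (two semitones down)

    engine.attach(player)
    engine.attach(timePitch)

    let fileURL = Bundle.main.url(forResource: "song", withExtension: "m4a")!
    let file = try AVAudioFile(forReading: fileURL)

    // player -> time/pitch effect -> main mixer -> speaker
    engine.connect(player, to: timePitch, format: file.processingFormat)
    engine.connect(timePitch, to: engine.mainMixerNode, format: file.processingFormat)

    player.scheduleFile(file, at: nil)
    try engine.start()
    player.play()
}
```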

Designing a library for hardware-accelerated playback of unsupported containers on iOS (and AirPlay)

I'm trying to put together an open source library that allows iOS devices to play files with unsupported containers, as long as the track formats/codecs are supported, e.g. a Matroska video (MKV) file with an H.264 video track and an AAC audio track. I'm making an app that could surely use that functionality, and I bet there are many more out there that would benefit from it. Any help you can give (by commenting here or, even better, collaborating with me) is much appreciated. This is where I'm at so far:
I did a bit of research trying to find out how players like AVPlayerHD or Infuse can play non-standard containers and still have hardware acceleration. It seems like they transcode small chunks of the whole video file and play those in sequence instead.
It's a good solution. But if you want to throw that video to an Apple TV, things don't work as planned since the video is actually a bunch of smaller chunks being played as a playlist. This site has way more info, but at its core streaming to Apple TV is essentially a progressive download of the MP4/MPV file being played.
I'm thinking a sort of streaming proxy is the way to go. For the playing side of things, I've been investigating AVSampleBufferDisplayLayer (more info here) as a way of playing the video track. I haven't gotten to audio yet. Things get interesting when you think about the AirPlay side of things: by having a "container proxy", we can make any file look like it has the right container without the file size implications of transcoding.
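For reference, the playing side with AVSampleBufferDisplayLayer looks roughly like the sketch below; the demuxer that pulls H.264 samples out of the MKV and wraps them as CMSampleBuffers is assumed and only stubbed out here:

```swift
import Foundation
import AVFoundation

// Rough sketch of the playback side only: hand demuxed H.264 samples to
// AVSampleBufferDisplayLayer and let it decode/display them in hardware.
// `nextSampleBuffer()` is a hypothetical stand-in for whatever pulls samples
// out of the MKV and wraps them as CMSampleBuffers (with an H.264
// CMVideoFormatDescription attached).
let displayLayer = AVSampleBufferDisplayLayer()
displayLayer.videoGravity = .resizeAspect

func nextSampleBuffer() -> CMSampleBuffer? {
    // ... demuxer glue goes here ...
    return nil
}

displayLayer.requestMediaDataWhenReady(on: DispatchQueue(label: "video.feed")) {
    while displayLayer.isReadyForMoreMediaData {
        guard let sample = nextSampleBuffer() else { return }
        displayLayer.enqueue(sample)
    }
}
```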
It seems like GStreamer might be a good starting point for the proxy. I need to read up on it; I've never used it before. Does this approach sound like a good one for a library that could be used for App Store apps?
Thanks!
Finally got some extra time to go over GStreamer, especially this article about how it has already been updated to use the hardware decoding provided by iOS 8. So no need to develop this from scratch; GStreamer seems to be the answer.
Thanks!
The 'chunked' solution is no longer necessary in iOS 8. You should simply set up a video decode session and pass in NALUs.
https://developer.apple.com/videos/wwdc/2014/#513
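A rough sketch of that decode-session approach, assuming the demuxer already supplies a CMVideoFormatDescription (built from the stream's SPS/PPS) and AVCC-framed sample buffers:

```swift
import CoreMedia
import VideoToolbox

// Rough sketch: open a VTDecompressionSession for the stream and feed it
// sample buffers containing AVCC-framed NALUs. `formatDesc` and the sample
// buffers are assumed to come from whatever demuxes the original container.
func makeDecoder(formatDesc: CMVideoFormatDescription) -> VTDecompressionSession? {
    var session: VTDecompressionSession?
    var callback = VTDecompressionOutputCallbackRecord(
        decompressionOutputCallback: { _, _, status, _, imageBuffer, _, _ in
            // Each decoded frame lands here as a CVImageBuffer (imageBuffer);
            // hand it to your renderer or display layer.
            guard status == noErr, imageBuffer != nil else { return }
        },
        decompressionOutputRefCon: nil)

    VTDecompressionSessionCreate(allocator: kCFAllocatorDefault,
                                 formatDescription: formatDesc,
                                 decoderSpecification: nil,
                                 imageBufferAttributes: nil,
                                 outputCallback: &callback,
                                 decompressionSessionOut: &session)
    return session
}

func decode(_ sampleBuffer: CMSampleBuffer, with session: VTDecompressionSession) {
    // Decode synchronously for simplicity; the callback above fires per frame.
    VTDecompressionSessionDecodeFrame(session,
                                      sampleBuffer: sampleBuffer,
                                      flags: [],
                                      frameRefcon: nil,
                                      infoFlagsOut: nil)
}
```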

iOS process audio stream while playing video

I am trying to create a video player for iOS, but with some additional audio track reading. I have been checking out MPMoviePlayerController, and also AVPlayer in AVFoundation, but it's all kinda vague.
What I am trying to do is play a video (from a local .mp4), and while the movie is playing get the current audio buffer/frames, so I can do some calculations and other (not video/audio related) actions that depend on the currently played audio. This means that the video should keep on playing, with its audio tracks, but I also want the live raw audio data for calculations (e.g. getting the amplitude at certain frequencies).
Does anyone have an example or hints on how to do this? Of course I checked out Apple's AVFoundation documentation, but it was not clear enough for me.
After a really (really) long time Googling, I found a blog post that describes MTAudioProcessingTap. Introduced in iOS 6.0, it solves my problem perfectly.
The how-to/blogpost can be found here : http://chritto.wordpress.com/2013/01/07/processing-avplayers-audio-with-mtaudioprocessingtap/
I hope it helps someone else now... The only thing popping up for me when Googling (with a lot of different terms) is my own post here. And as long as you don't know MTAudioProcessingTap exists, you don't know how to Google for it :-)
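For anyone landing here later, a minimal Swift sketch of the MTAudioProcessingTap setup that the blog post describes; the player item and the analysis code are placeholders:

```swift
import AVFoundation
import MediaToolbox

// Sketch: attach an MTAudioProcessingTap to the player item's audio track so
// every audio buffer passes through `process` while the video keeps playing.
// `playerItem` is assumed to be the AVPlayerItem for the local .mp4.
func installTap(on playerItem: AVPlayerItem) {
    guard let audioTrack = playerItem.asset.tracks(withMediaType: .audio).first else { return }

    var callbacks = MTAudioProcessingTapCallbacks(
        version: kMTAudioProcessingTapCallbacksVersion_0,
        clientInfo: nil,
        init: nil,
        finalize: nil,
        prepare: nil,
        unprepare: nil,
        process: { tap, numberFrames, _, bufferListInOut, numberFramesOut, flagsOut in
            // Pull the source audio into bufferListInOut; it is then available
            // for analysis here and still passed on to the normal output.
            let status = MTAudioProcessingTapGetSourceAudio(tap, numberFrames, bufferListInOut,
                                                            flagsOut, nil, numberFramesOut)
            guard status == noErr else { return }
            // ... run amplitude/FFT calculations on bufferListInOut here ...
        })

    var tap: Unmanaged<MTAudioProcessingTap>?
    MTAudioProcessingTapCreate(kCFAllocatorDefault, &callbacks,
                               MTAudioProcessingTapCreationFlags(kMTAudioProcessingTapCreationFlag_PostEffects),
                               &tap)

    let inputParams = AVMutableAudioMixInputParameters(track: audioTrack)
    inputParams.audioTapProcessor = tap?.takeRetainedValue()

    let audioMix = AVMutableAudioMix()
    audioMix.inputParameters = [inputParams]
    playerItem.audioMix = audioMix
}
```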

Multiple HTML5 media elements on one page in iOS (iPad)

My research has led me to learn that Apple's media element handler is a singleton, meaning I can't have a video playing while audio plays in the background. I'm tasked with building a slideshow presentation framework, and the client wants a background audio track, timed audio voice-overs that match bullet points, and variable media which can be either an image or a video - or a timed cycle of multiple media elements.
Of course, none of the media works on iOS. Each media element cancels out the previous.
My initial thought is to embed the voice-over audio into the video when there's a video present, but there's an existing Flash version of this setup which depends on existing assets so I pretty much have to use what's delivered.
Is there ANY work-around for this? I'm testing on iOS 4.3.5. The smartest devs in the world are on this site - we've got to be able to come up with something.
EDIT: Updated my iPad to iOS 5.0.1 and the issue remains.
How about doing the trick with CSS?
Maybe you know about a company called vdopia that distributes video ads on mobile.
http://mobile.vdopia.com/index.php?page=mobilewebsolutions
They claim to have developed a so-called vdo video format, which is really just a CSS sprite animation under the hood :D
I mean you could have your "video" as a framed image, then attach an HTML5 audio tag to it.
I would like to know your response.
Are you working on a Web App or on a Native Application?
If you are working on a Web App you're in a world of hurt, because you simply do not have much control beyond what Mobile Safari provides out of the box.
If this is the case I would come forward and be honest with the stakeholders.
If you are working on a Native Application you can resort to a mechanism that involves some back and forth communication between UIWebView and ObjC. It's actually doable.
The idea is the following:
Insert special <object> elements in your HTML5 documents, handcrafted according to your needs, taking special care to use the data-* naming convention for non-standard attributes.
Here you could insert IDs, paths and other control variables in the multimedia artifacts that you want to play.
Then you could build some JavaScript (on top of jQuery, for example) that communicates with ObjC through the delegation mechanism on the UIWebView or through HTTP. I'll go over this choice further down.
Say that on $(document).ready() you go through all the objects that have a special class, one that you carefully choose to identify all the special <object> elements.
You build a list of such objects and pass them on to the ObjC part of your application. You could easily serialize such list using JSON.
Then in ObjC you can do what you want with them. Play them through AVPlayer or some other framework whenever you want them played (again you would resort to a JS - ObjC bridge to actually signal the native part to play a particular element).
You can "communicate" with ObjC through the delegation pattern in UIWebView or through HTTP.
You would then have a JS - ObjC bridge in place.
The HTTP approach makes sense in some cases but it involves a lot of extra code and is resource hungry.
If you are building an ObjC application and want further details on how to actually build an ObjC - JS bridge that fits these needs, get back to us :) (there's a rough sketch of the idea at the end of this answer).
I'm halting this post as of now because it would be nice to know if it is in fact a Native App.
Cheers.
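For what it's worth, a minimal sketch of the UIWebView delegation bridge described above; the bridge:// scheme, the query parameter names, and the playNatively helper are made-up for illustration:

```swift
import UIKit
import AVFoundation

// Sketch of the UIWebView delegation bridge. The JS side signals the native
// side by navigating to a custom scheme, e.g.:
//   window.location = "bridge://play?src=intro.mp4";
class SlideshowViewController: UIViewController, UIWebViewDelegate {
    let webView = UIWebView()
    let audioPlayer = AVPlayer()

    override func viewDidLoad() {
        super.viewDidLoad()
        webView.frame = view.bounds
        webView.delegate = self
        view.addSubview(webView)
    }

    // Every JS-triggered navigation passes through here; intercept our scheme,
    // act on it natively, and cancel the load so the page stays put.
    func webView(_ webView: UIWebView, shouldStartLoadWith request: URLRequest,
                 navigationType: UIWebView.NavigationType) -> Bool {
        guard let url = request.url, url.scheme == "bridge" else { return true }

        if url.host == "play",
           let components = URLComponents(url: url, resolvingAgainstBaseURL: false),
           let src = components.queryItems?.first(where: { $0.name == "src" })?.value {
            playNatively(fileNamed: src)
        }
        return false   // swallow the fake navigation
    }

    func playNatively(fileNamed name: String) {
        // Hypothetical helper: look the asset up in the bundle and play it
        // with AVPlayer, outside the single HTML5 media pipeline.
        if let url = Bundle.main.url(forResource: name, withExtension: nil) {
            audioPlayer.replaceCurrentItem(with: AVPlayerItem(url: url))
            audioPlayer.play()
        }
    }
}
```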
This is currently not possible. As you've noticed, when a video plays it takes over the full screen with QuickTime and moves the browser to the background. The only solution at this time is to merge the audio and video together into a single MP4 and play that one item.
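If the presentation runs inside a native wrapper rather than plain Mobile Safari, that merge can even happen on the device: AVMutableComposition can splice the voice-over track into the video so iOS only ever sees one media item. A sketch with hypothetical asset names:

```swift
import AVFoundation

// Sketch: combine a video asset and a separate voice-over into one playable
// item. "slide.mp4" and "voiceover.m4a" are hypothetical bundled assets.
func makeMergedItem() throws -> AVPlayerItem {
    let video = AVURLAsset(url: Bundle.main.url(forResource: "slide", withExtension: "mp4")!)
    let audio = AVURLAsset(url: Bundle.main.url(forResource: "voiceover", withExtension: "m4a")!)

    let composition = AVMutableComposition()
    let videoTrack = composition.addMutableTrack(withMediaType: .video,
                                                 preferredTrackID: kCMPersistentTrackID_Invalid)!
    let audioTrack = composition.addMutableTrack(withMediaType: .audio,
                                                 preferredTrackID: kCMPersistentTrackID_Invalid)!

    // Lay both source tracks down starting at time zero.
    try videoTrack.insertTimeRange(CMTimeRange(start: .zero, duration: video.duration),
                                   of: video.tracks(withMediaType: .video)[0], at: .zero)
    try audioTrack.insertTimeRange(CMTimeRange(start: .zero, duration: audio.duration),
                                   of: audio.tracks(withMediaType: .audio)[0], at: .zero)

    return AVPlayerItem(asset: composition)
}
```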
If I understand you correctly, you are not able to merge the audio and video together because the existing setup relies on Flash? Since iOS can't play Flash, you should merge the audio and video together and use Flash as a backup. There are numerous HTML5 players which use JavaScript to try to play the HTML5 video first and then fall back to Flash.
You mention there is an existing Flash setup of the video - is it an SWF file? Could you import it into video/audio editing software and add an audio track on top?
Something like this: http://www.youtube.com/watch?v=J2vvH7oi8m8&feature=youtube_gdata_player
Also, if it is a Flash file, will you be converting it to an AVI or the like for iOS? If you have to do that anyway, there is your chance to add an audio track.
Could you use a webservice to merge the streams in real time with FFMpeg and then stream one output to quicktime?
To elaborate, maybe a library like http://directshownet.sourceforge.net/about.html could also work. It looks like they have a method:
DESCombine – A class library that uses DirectShow Editing Services to combine video and audio files (or pieces of files) into a single output file. A help file (DESCombine.chm) is provided for using the class.
The resulting data could then be returned as the response to the call and loaded via the HTML5 player.

Virtual Instrument App Recording Functionality With RemoteIO

I'm developing a virtual instrument app for iOS and am trying to implement a recording function so that the app can record and playback the music the user makes with the instrument. I'm currently using the CocosDenshion sound engine (with a few of my own hacks involving fades etc) which is based on OpenAL. From my research on the net it seems I have two options:
1. Keep a record of the user's inputs (i.e. which notes were played at what volume) so that the app can recreate the sound (but this cannot be shared/emailed).
2. Hack together my own low-level sound engine using Audio Units, specifically RemoteIO, so that I manually mix all the sounds and populate the final output buffer by hand, and hence can save that buffer to a file. This can then be shared by email etc.
I have implemented a RemoteIO callback for rendering the output buffer in the hope that it would give me the previously played data in the buffer, but alas the buffer is always all zeros.
So my question is: is there an easier way to sniff/listen to what my app is sending to the speakers than my option 2 above?
Thanks in advance for your help!
I think you should use RemoteIO. I had a similar project several months ago and wanted to avoid RemoteIO and Audio Units as much as possible, but in the end, after I wrote tons of code and read lots of documentation for third-party libraries (including CocosDenshion), I ended up using Audio Units anyway. More than that, they're not that hard to set up and work with. If you are looking for a library to do most of the work for you, look for one written on top of Core Audio, not OpenAL.
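To make the "sniffing" part concrete: once the mixing goes through your own RemoteIO unit, a render-notify callback sees every buffer on its way to the speaker and can append it to a file. A sketch under those assumptions; the unit, the URL, and the stream format are placeholders, and the format must match the unit's actual output format:

```swift
import Foundation
import AudioToolbox

// Sketch: tap the output of an existing RemoteIO unit and write it to a CAF
// file. `remoteIOUnit` is assumed to be your already-configured output unit.
var captureFile: ExtAudioFileRef?

func startCapturing(from remoteIOUnit: AudioUnit, to url: URL) {
    // Illustrative format: 16-bit interleaved stereo PCM at 44.1 kHz. This
    // must match what the RemoteIO unit actually renders.
    var format = AudioStreamBasicDescription(
        mSampleRate: 44100, mFormatID: kAudioFormatLinearPCM,
        mFormatFlags: kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked,
        mBytesPerPacket: 4, mFramesPerPacket: 1, mBytesPerFrame: 4,
        mChannelsPerFrame: 2, mBitsPerChannel: 16, mReserved: 0)

    ExtAudioFileCreateWithURL(url as CFURL, kAudioFileCAFType, &format, nil,
                              AudioFileFlags.eraseFile.rawValue, &captureFile)

    // The notify proc fires before and after each render; the post-render pass
    // is when ioData holds the samples that are about to reach the hardware.
    AudioUnitAddRenderNotify(remoteIOUnit, { _, ioActionFlags, _, _, inNumberFrames, ioData in
        if ioActionFlags.pointee.contains(.unitRenderAction_PostRender),
           let ioData = ioData, let file = captureFile {
            ExtAudioFileWriteAsync(file, inNumberFrames, ioData)
        }
        return noErr
    }, nil)
}
```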
You might want to take a look at the AudioCopy framework. It does a lot of what you seem to be looking for, and will save you from potentially reinventing some wheels.

Resources