I have a movie sample with an audio description track (for blind people: there is a narrator explaining what is going on in the movie). I want to extract that narration.
What I have tried so far:
1- I also have the sample without the narration, so I just imported both samples into Audacity, inverted one, and mixed. But it simply doesn't work (normalization is also applied).
2- I took the sample with the audio description, split it to mono, took one channel, inverted it, and mixed again. Now I have the movie without the narration. My intuition was that if I inverted this result and mixed it with the actual file, the other sounds should cancel out and I would be left with the narrator's voice. But it did not happen! What should I do now? Any suggestions?
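As I understand it, invert-and-mix amounts to sample-wise subtraction, something like this sketch (assuming both files are decoded to sample-aligned Float arrays; any lossy re-encode, resampling, or offset between the two files would break the cancellation):

// Mixing A with inverted B computes A - B per sample; only material that is
// time-aligned and identical in both files cancels.
func extractResidual(withNarration a: [Float], withoutNarration b: [Float]) -> [Float] {
    precondition(a.count == b.count, "tracks must be sample-aligned")
    return zip(a, b).map { $0 - $1 }
}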
I have checked the following links so far:
http://www.howtogeek.com/61250/how-to-isolate-and-save-vocals-from-music-tracks-using-audacity/
http://www.labnol.org/software/tutorials/remove-vocals-song-mp3-music-instruments/1301/
I have a 4-channel audio format that I want to convert into a 2-channel format. kAudioConverterChannelMap can only discard the extra inputs:
When In > Out, the first Out inputs are routed to the first Out outputs, and the remaining inputs are discarded.
Is it possible to specify a channel map with kAudioConverterChannelMap that merges channels 1 + 2 into the first channel of the output format and channels 3 + 4 into the second?
If not, what should I use for this conversion?
AFAIK Apple does not provide an API that reduces the channel count of an audio stream, apart from discarding extra channels.
Why? I guess because how you do this is such a personal choice that, if they thought about it at all, they may have concluded that no single approach would make everyone happy, so it was better to choose the approach that would make everyone equally angry: discard.
So what's the big deal? I think it's mainly to do with the "meaning" of the individual channels. Consider the "obviously simple" case of converting stereo to mono - it's surprisingly nuanced, depending on how the stereo audio was recorded. And that's in an ideal case where your 2-channel audio is actually declared to be "stereo" (e.g. via an AudioChannelLayoutTag like kAudioChannelLayoutTag_Stereo).
Even "tagged" multi-channel audio is terribly ambiguous, so the only person who knows how to interpret and reduce the channel count of your 4-channel audio is you.
That said, would it have killed them to provide a function that summed even channels with even and odd with odd using some Accelerate/vDSP array functions, like vDSP_vadd? Because that's probably what you want.
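For example, a minimal vDSP sketch of that fold-down, assuming interleaved Float32 quad input (the downmixQuadToStereo name is just for illustration; you may also want to halve the sums to avoid clipping):

import Accelerate

// Fold interleaved 4-channel audio to interleaved stereo:
// left = ch0 + ch1, right = ch2 + ch3.
func downmixQuadToStereo(interleaved input: [Float], frameCount: Int) -> [Float] {
    precondition(input.count == frameCount * 4)
    var output = [Float](repeating: 0, count: frameCount * 2)
    input.withUnsafeBufferPointer { src in
        output.withUnsafeMutableBufferPointer { dst in
            let n = vDSP_Length(frameCount)
            // Strides of 4 walk one source channel; strides of 2 walk one output channel.
            vDSP_vadd(src.baseAddress!, 4, src.baseAddress! + 1, 4,
                      dst.baseAddress!, 2, n)
            vDSP_vadd(src.baseAddress! + 2, 4, src.baseAddress! + 3, 4,
                      dst.baseAddress! + 1, 2, n)
        }
    }
    return output
}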
I want to create a theremin-like sound using touch coordinates on the screen. I'm using the y axis as frequency and the x axis as amplitude.
From my brief research, I believe I can create it using AKOscillator or AKFMOscillator from the AudioKit framework (please let me know if any other oscillator works better in this case). I'm open to other frameworks such as the built-in AudioToolbox (MIDINoteMessage etc.) if I can create a theremin-like sound with them.
Here it says a theremin has two oscillators: one fixed at 260 kHz and one that varies between 257 and 260 kHz. It superimposes their outputs (taking their difference, I guess?) and the result is a frequency between 0 and 3 kHz.
When I create sounds using AKFMOscillator with a baseFrequency between 257 and 260 kHz, it sounds high-pitched.
When I try a single oscillator ranging between 0 and 3 kHz, it sounds very robotic. How can I simulate the timbre of a theremin?
How can I make it sound better? Should I mix two oscillators? I tried mixing with AKMixer, but when both oscillators use the same frequency and amplitude, it makes no difference.
I tried mapping to the nearest note (auto-tune) and limiting the frequency to a 3-4 octave range. It sounds better, but still not as good as a theremin.
What should I use (AKOscillator, AKFMOscillator, OscillatorBank), and with which parameters (rampDuration, baseFrequency, modulationIndex, amplitude), to simulate a more theremin-like sound?
Update:
I did some more research and played with the Synth One presets. Now I know I need two oscillators mixed, both set to a saw-shaped wave. Changing the ADSR (envelope) values to specific ranges creates a richer sound (this gives it the instrumental character), and an LFO creates the wavy (or spooky) effect. Playing notes (specific frequencies) sounds good; if you play every frequency in between the note frequencies, it doesn't sound good.
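For future readers, a rough AudioKit 4-style sketch of that recipe; the specific ADSR numbers and the 0.5% detune are assumptions for illustration, not Synth One's actual preset values, and the LFO is left out for brevity:

import AudioKit

// Two slightly detuned sawtooth oscillators, mixed, give the richer "instrumental" tone.
let saw = AKTable(.sawtooth)
let osc1 = AKOscillator(waveform: saw)
let osc2 = AKOscillator(waveform: saw)
let mixer = AKMixer(osc1, osc2)

// An ADSR envelope smooths the note on/off instead of a raw robotic gate.
let env = AKAmplitudeEnvelope(mixer,
                              attackDuration: 0.08,
                              decayDuration: 0.1,
                              sustainLevel: 0.8,
                              releaseDuration: 0.3)

AudioKit.output = env
do { try AudioKit.start() } catch { print(error) }
osc1.start(); osc2.start()

// On each touch move: map y to a (quantized) note frequency, x to amplitude.
func update(frequency: Double, amplitude: Double) {
    osc1.frequency = frequency
    osc2.frequency = frequency * 1.005   // slight detune thickens the sound
    osc1.amplitude = amplitude
    osc2.amplitude = amplitude
}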
In my code I am using two AVAudioRecorders, one for monitoring the audio and another one for recording. The recording itself is good, but the problem is that the second recorder misses the first couple of words (about 2 seconds). For example, if I say "hi how are you", it only records "are you". I wrote the recording functionality based on this tutorial: http://purplelilgirl.tumblr.com/post/9377269385/making-that-talking-app . Is anyone facing this same issue? Please let me know.
I think you should keep recording all the time, regardless of the threshold; what you need to do is simply start playback from the time when your lowPassResults exceeds 0.5, as in ajDanny's example.
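A minimal sketch of that lowPassResults metering, assuming the smoothing approach from the linked tutorial (ALPHA and the 0.5 threshold are the tutorial's values, not anything Apple defines):

import AVFoundation

// recorder is an AVAudioRecorder with isMeteringEnabled = true, polled on a timer.
let ALPHA = 0.05
var lowPassResults = 0.0

func levelTimerFired(_ recorder: AVAudioRecorder) {
    recorder.updateMeters()
    // Convert the dB peak power to a 0...1 linear value and low-pass filter it.
    let peak = pow(10.0, 0.05 * Double(recorder.peakPower(forChannel: 0)))
    lowPassResults = ALPHA * peak + (1.0 - ALPHA) * lowPassResults
    if lowPassResults > 0.5 {
        // Speech detected: note recorder.currentTime and start playback/trimming
        // from here instead of spinning up a second recorder.
    }
}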
I'm trying to develop an iPhone app that will use the camera to record only the last few minutes/seconds.
For example, you record a movie for 5 minutes, click "save", and only the last 30 s are saved. I don't want to actually record five minutes and then chop off the last 30 s (that won't work for me). This idea is called "loop recording".
This results in an endless video recording where only the last part is remembered.
The Precorder app does what I want to do (I want to use this feature in another context).
I think this could easily be simulated with a circular buffer.
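Something like this minimal Swift sketch, holding the last N encoded chunks (the element type and the capacity bookkeeping are placeholders):

// Fixed-capacity ring buffer: appending past capacity drops the oldest chunk.
struct RingBuffer<Element> {
    private var storage: [Element] = []
    let capacity: Int
    init(capacity: Int) { self.capacity = capacity }
    mutating func append(_ element: Element) {
        storage.append(element)
        if storage.count > capacity {
            storage.removeFirst()   // O(n); a head-index implementation avoids this
        }
    }
    var elements: [Element] { return storage }
}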
I started a project with AVFoundation. It would be awesome if I could somehow redirect video data to a circular buffer (which I will implement). I found information only on how to write it to a file.
I know I can chop the video into intervals and save them, but saving one part and restarting the camera to record the next takes time, and it is possible to lose some important moments in the movie.
Any clues how to redirect data from camera would be appreciated.
Important! As of iOS 8 you can use VTCompressionSession and have direct access to the NAL units instead of having to dig through the container.
Well, luckily you can do this, and I'll tell you how; but you're going to have to get your hands dirty with either the MP4 or MOV container. A helpful resource for this (though more MOV-specific) is Apple's QuickTime File Format introduction:
http://developer.apple.com/library/mac/#documentation/QuickTime/QTFF/QTFFPreface/qtffPreface.html#//apple_ref/doc/uid/TP40000939-CH202-TPXREF101
First things first: you're not going to be able to start your saved movie from an arbitrary point 30 seconds before the end of the recording; you'll have to start at some I-frame approximately 30 seconds back. Depending on your keyframe interval, that may be several seconds before or after the 30-second mark. You could encode everything as I-frames and start from an arbitrary point, but then you'll probably want to re-encode the video afterward because it will be quite large.
So, knowing that, let's move on.
The first step: when you set up your AVAssetWriter, set its AVAssetWriterInput's expectsMediaDataInRealTime property to YES.
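For example (a sketch in Swift; movieURL and outputSettings are whatever you already use for your writer):

import AVFoundation

let writer = try AVAssetWriter(outputURL: movieURL, fileType: .mov)
let videoInput = AVAssetWriterInput(mediaType: .video, outputSettings: outputSettings)
videoInput.expectsMediaDataInRealTime = true   // don't stall live capture
writer.add(videoInput)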
In the captureOutput callback you'll be able to do an fread from the file you are writing to. The first fread will get you a little bit of the MP4/MOV header (whatever format you're using), i.e. the 'ftyp' atom, the 'wide' atom, and the beginning of the 'mdat' atom. You want what's inside the 'mdat' section, so the offset you'll start saving data from will be 36 or so.
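In Swift terms, that read might look like this sketch (FileHandle in place of fread; the 36-byte offset is approximate, as noted):

// Tail the file the writer is appending to, skipping the header atoms.
let handle = try FileHandle(forReadingFrom: movieURL)
handle.seek(toFileOffset: 36)   // past 'ftyp'/'wide' and the 'mdat' size+type
func readNextChunk() -> Data {
    return handle.readData(ofLength: 64 * 1024)   // zero or more (possibly partial) NAL units
}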
Each read will get you 0 or more AVC NAL units. You can find a listing of NAL unit types in ISO/IEC 14496-10, Table 7-1. They will be in a slightly different format than specified in Annex B, but that's fine. Additionally, there will only be IDR slices and non-IDR slices in the MP4/MOV file; the IDR slices are the I-frames you're looking to hang onto.
The NAL unit format in the MP4/MOV container is as follows:
4 bytes - Size
[Size] bytes - NALU Data
data[0] & 0x1F - NALU Type
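A sketch of walking those length-prefixed units in a chunk of 'mdat' data (buffering a unit that is split across two reads is omitted here):

// Walk AVCC-style (length-prefixed) NAL units in a chunk read out of 'mdat'.
func scanNALUnits(in data: Data) {
    var offset = data.startIndex
    while offset + 4 <= data.endIndex {
        // 4-byte big-endian size field.
        let size = data[offset..<offset + 4].reduce(0) { ($0 << 8) | Int($1) }
        guard offset + 4 + size <= data.endIndex else { break }  // partial unit
        let naluType = data[offset + 4] & 0x1F
        if naluType == 5 {
            print("IDR slice at \(offset)")  // a candidate I-frame to start the saved 30 s from
        }
        offset += 4 + size
    }
}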
So now you have the data you're looking for. When you go to save this file, you'll have to update the MP4/MOV container with the correct length and sample count: you'll have to update the 'stsz' atom with the correct size for each sample, and update the media headers and track headers with the correct duration of the movie, and so on. What I would recommend is creating a sample container on first run that you can more or less just overwrite/augment with the appropriate data for each particular movie. You'll want to do this because the encoders on the various iDevices don't all have the same settings, and the 'avcC' atom contains the encoder information.
You don't really need to know much about the AVC stream in this case, so you'll probably want to concentrate your experimenting around updating the container format you choose correctly. Good luck.
Is there an API in one of the iOS layers that I can use to generate a tone by just specifying its frequency in hertz? What I'm looking to do is generate a DTMF tone. This link explains how a DTMF tone consists of two tones:
http://en.wikipedia.org/wiki/Telephone_keypad
Which basically means that I need to play back two tones at the same time...
So, does something like this exist:
SomeCleverPlayerAPI(697, 1336);
I've spent the whole morning searching for this and have found a number of ways to play back a sound file, but nothing on how to generate a specific tone. Does anyone know, please...
Check out the AU (AudioUnit) API. It's pretty low-level, but it can do what you want. A good intro (that probably already gives you what you need) can be found here:
http://cocoawithlove.com/2010/10/ios-tone-generator-introduction-to.html
There is no iOS API to do this audio synthesis for you.
But you can use the Audio Queue or Audio Unit RemoteIO APIs to play raw audio samples: generate an array of samples of two summed sine waves (say 44100 samples for 1 second's worth), then copy the results into the audio callback's buffer (1024 samples, or however many the callback requests, at a time).
See Apple's aurioTouch and SpeakHere sample apps for how to use these audio APIs.
The samples can be generated by something as simple as:
sample[i] = (short int)(v1*sinf(2.0*pi*i*f1/sr) + v2*sinf(2.0*pi*i*f2/sr));
where sr is the sample rate, f1 and f2 are the two frequencies, and v1 + v2 sum to less than 32767.0. You can add rounding or noise dithering to this for cleaner results.
Beware of clicking if your generated waveforms don't taper to zero at the ends.
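Putting the pieces together, a small Swift sketch of that generation step (697 Hz / 1336 Hz is the DTMF pair for the "5" key; the 5 ms linear taper at each end is there to avoid the clicks mentioned above):

import Foundation

// One second of a two-tone (DTMF-style) signal as 16-bit PCM samples.
func dtmfSamples(f1: Double = 697, f2: Double = 1336,
                 sampleRate: Double = 44100, duration: Double = 1.0) -> [Int16] {
    let count = Int(sampleRate * duration)
    let fade = Int(sampleRate * 0.005)        // 5 ms taper at each end
    let v1 = 16000.0, v2 = 16000.0            // v1 + v2 < 32767 avoids clipping
    return (0..<count).map { i in
        let t = Double(i) / sampleRate
        var s = v1 * sin(2.0 * .pi * f1 * t) + v2 * sin(2.0 * .pi * f2 * t)
        let edge = min(i, count - 1 - i)      // distance from nearest end
        if edge < fade { s *= Double(edge) / Double(fade) }
        return Int16(s.rounded())
    }
}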