Text to Speech framework for iOS with kids voice - ios

I am trying to build a kids' game using Swift. I want to use a text-to-speech API in my app, but all the APIs I came across had either a male or a female robotic voice. Is there any API available that converts text to speech with a kid's voice or something similar?
Thanks!

You can just use the standard AVSpeechSynthesizer and increase the pitch:
let utterance = AVSpeechUtterance(string: "Hi, uh.. I'm a.. um kid!")
utterance.pitchMultiplier = 1.3 // or whatever value you find works well (allowed range is 0.5 to 2.0)
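For completeness, here is a minimal sketch of how the utterance could actually be spoken. The names `clampedPitch` and `KidVoiceSpeaker` are hypothetical helpers made up for this example, not part of AVFoundation; the sketch assumes you keep the synthesizer in a property so it is not deallocated while it is still speaking.

```swift
import Foundation
#if canImport(AVFoundation)
import AVFoundation
#endif

/// Hypothetical helper: clamps a requested pitch to the range
/// that AVSpeechUtterance.pitchMultiplier accepts (0.5 to 2.0).
func clampedPitch(_ requested: Float) -> Float {
    return min(max(requested, 0.5), 2.0)
}

#if canImport(AVFoundation)
/// Hypothetical wrapper that owns the synthesizer for as long as the object lives.
final class KidVoiceSpeaker {
    private let synthesizer = AVSpeechSynthesizer()

    func speak(_ text: String, pitch: Float = 1.3) {
        let utterance = AVSpeechUtterance(string: text)
        utterance.pitchMultiplier = clampedPitch(pitch)
        synthesizer.speak(utterance)
    }
}
#endif
```

You would then call something like `speaker.speak("Hi, I'm a kid!")` on an instance stored in your view controller or game scene, and tune the pitch by ear.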

Related

AKFrequencyTracker with tempo

When using AKFrequencyTracker, I would like to add a "tempo" feature that recognizes notes according to their pace.
I tried AKPeriodicFunction and AKMetronome, but it looks like they are meant for playback rather than analysis.
tracker = AKFrequencyTracker(mic)
if tracker.amplitude > 0.1 {
var frequency = Float(tracker.frequency)
...
How can I add the "tempo" feature to the tracker?
Thanks
AudioKit doesn't have beat detection, which would be the first step toward a tempo tracker. We recently incorporated Aubio with AudioKit for a client project:
https://github.com/aubio/aubio
There are other open source alternatives, just do some Google searches for "tempo detection GitHub":
https://www.google.com/search?client=safari&rls=en&q=tempo+detection+github&ie=UTF-8&oe=UTF-8
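Once some beat or onset detector gives you timestamps, turning them into a tempo is the easy part. The following is only a sketch under that assumption: `estimateBPM` is a hypothetical function name, and the onset times are assumed to come from elsewhere (for example from Aubio, or from noting the time whenever the tracker's amplitude jumps above your threshold). It takes the median interval between consecutive onsets and converts it to beats per minute.

```swift
import Foundation

/// Estimates tempo in beats per minute from onset timestamps (in seconds).
/// Returns nil when there are fewer than two onsets or the median interval is not positive.
func estimateBPM(onsetTimes: [Double]) -> Double? {
    guard onsetTimes.count >= 2 else { return nil }
    let sorted = onsetTimes.sorted()

    // Inter-onset intervals between consecutive onsets.
    var intervals: [Double] = []
    for i in 1..<sorted.count {
        intervals.append(sorted[i] - sorted[i - 1])
    }
    intervals.sort()

    // The median is less sensitive to one missed or spurious onset than the mean.
    let mid = intervals.count / 2
    let median: Double
    if intervals.count % 2 == 0 {
        median = (intervals[mid - 1] + intervals[mid]) / 2
    } else {
        median = intervals[mid]
    }

    guard median > 0 else { return nil }
    return 60.0 / median
}
```

This treats every onset as a beat, so it will over-estimate the tempo for passages with many short notes; a real tempo tracker does considerably more than this.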

How to identify audio files containing same set of words?

I want to implement an application where, given an audio clip containing speech as a query, it returns the most similar audio clip already submitted by another user.
Here two audio clips are similar if they contain approximately the same set of words.
For example, if the query speech is "Hello World!":
it returns "Hello my World!", "Hello Worlds!"
it doesn't necessarily return "Hello Earth" or "Bye world!"
it doesn't return "Trump is a dickhead" (even if it is true, but that is another story :) )
Notice that this "audio detector" MUST be robust against different timbres (different users' voices). It would be cool if it were also robust against noise (like reasonable outdoor noise) and melody distortion (like matching "Hello World!" with "Hellooo World!").

Poor recognition due to background noise using OpenEars on iOS

I'm using OpenEars in my app to recognize some words and sentences. I followed the basic tutorial for offline speech recognition and ported it to Swift. This is the setup procedure:
self.openEarsEventsObserver = OEEventsObserver()
self.openEarsEventsObserver.delegate = self
let lmGenerator: OELanguageModelGenerator = OELanguageModelGenerator()
addWords()
let name = "LanguageModelFileStarSaver"
lmGenerator.generateLanguageModelFromArray(words, withFilesNamed: name, forAcousticModelAtPath: OEAcousticModel.pathToModel("AcousticModelEnglish"))
lmPath = lmGenerator.pathToSuccessfullyGeneratedLanguageModelWithRequestedName(name)
dicPath = lmGenerator.pathToSuccessfullyGeneratedDictionaryWithRequestedName(name)
The recognition works well in a quiet room for both single words and whole sentences (I would say it has a 90% hit rate). However, when I tried it in a quiet pub with light background noise, the app had serious difficulty recognising even a single word.
Is there any way to improve the speech recognition when there is background noise?
If the background noise is more or less uniform (i.e. has a regular pattern), you can try adapting the acoustic model. Otherwise it's an open problem, sometimes referred to as the cocktail party effect, which can be partly solved using DNNs.
Try this setting, works well for me.
try? OEPocketsphinxController.sharedInstance().setActive(true)
OEPocketsphinxController.sharedInstance().secondsOfSilenceToDetect = 2
OEPocketsphinxController.sharedInstance().setSecondsOfSilence()
OEPocketsphinxController.sharedInstance().vadThreshold = 3.5
OEPocketsphinxController.sharedInstance().removingNoise = true
Or you can try the iSphinx library.

Automatically start voice recording like Talking Tom: the first word is missing

In my code I am using two AVAudioRecorder instances, one for monitoring the audio and another for recording. The recording itself is good, but the second recorder fails to capture the first two words, or roughly the first 2 seconds. For example, if I say "hi how are you", it records only "are you". I wrote the recording functionality following this tutorial: http://purplelilgirl.tumblr.com/post/9377269385/making-that-talking-app. Is anyone else facing the same issue? Please let me know.
I think you should keep recording all the time, whether the level is below or above the threshold. Then all you need to do is start playback from the time when your lowPassResults exceeds 0.5, as in ajDanny's example.
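A rough sketch of that idea, detached from AVAudioRecorder so it stays self-contained: it assumes you log a linear meter level (0 to 1) at a fixed interval while recording continuously, then find the point where the low-pass filtered level first exceeds the threshold. The function name `speechStartTime` and the `preRoll` parameter are my own additions for illustration; `preRoll` backs the start time up a little, since the smoothed level lags behind the actual start of speech.

```swift
import Foundation

/// Returns the time (in seconds) from which playback should start, or nil if the
/// smoothed level never exceeds `threshold`.
/// `levels` are linear meter readings in 0...1, taken every `interval` seconds.
func speechStartTime(levels: [Double],
                     interval: Double,
                     alpha: Double = 0.05,
                     threshold: Double = 0.5,
                     preRoll: Double = 0.0) -> Double? {
    var lowPass = 0.0
    for (index, level) in levels.enumerated() {
        // Simple exponential low-pass filter over the meter readings.
        lowPass = alpha * level + (1.0 - alpha) * lowPass
        if lowPass > threshold {
            // Back up by preRoll, but never before the start of the recording.
            return max(0.0, Double(index) * interval - preRoll)
        }
    }
    return nil
}
```

With the returned time you could, for example, set the player's currentTime before calling play. The default values for alpha and threshold here are placeholders, so expect to tune them for your microphone and environment.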

iOS: Sound generation on iPad given Hz parameter?

Is there an API in one of the iOS layers that I can use to generate a tone by just specifying its frequency in hertz? What I'm looking to do is generate a DTMF tone. This link explains how a DTMF tone consists of 2 tones:
http://en.wikipedia.org/wiki/Telephone_keypad
Which basically means that I would need to play back 2 tones at the same time...
So, does something like this exist:
SomeCleverPlayerAPI(697, 1336);
I've spent the whole morning searching for this and have found a number of ways to play back a sound file, but nothing on how to generate a specific tone. Does anyone know, please...
Check out the AU (AudioUnit) API. It's pretty low-level, but it can do what you want. A good intro (that probably already gives you what you need) can be found here:
http://cocoawithlove.com/2010/10/ios-tone-generator-introduction-to.html
There is no iOS API to do this audio synthesis for you.
But you can use the Audio Queue or Audio Unit RemoteIO APIs to play raw audio samples: generate an array of samples of 2 sine waves summed (say 44100 samples for 1 second's worth), and then copy the results in the audio callback (1024 samples, or whatever the callback requests, at a time).
See Apple's aurioTouch and SpeakHere sample apps for how to use these audio APIs.
The samples can be generated by something as simple as:
sample[i] = (short int)(v1*sinf(2.0*pi*i*f1/sr) + v2*sinf(2.0*pi*i*f2/sr));
where sr is the sample rate, f1 and f2 are the 2 frequencies, and v1 and v2 sum to less than 32767.0. You can add rounding or noise dithering to this for cleaner results.
Beware of clicking if your generated waveforms don't taper to zero at the ends.
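The same sample generation could be sketched in Swift as below. This only builds the sample buffer; feeding it to an Audio Queue or Audio Unit callback is not shown. The function name `dualToneSamples`, the default amplitude of 16000 per tone, and the 5 ms linear fade are assumptions chosen for illustration, with the fade there to taper the waveform toward zero at both ends.

```swift
import Foundation

/// Generates 16-bit mono samples for two summed sine waves (e.g. one DTMF digit).
/// `amplitude` applies to each tone, so 2 * amplitude must stay below 32767.
func dualToneSamples(f1: Double, f2: Double, duration: Double,
                     sampleRate: Double = 44100,
                     amplitude: Double = 16000,
                     fade: Double = 0.005) -> [Int16] {
    let count = Int(duration * sampleRate)
    guard count > 0 else { return [] }
    let fadeCount = max(1, Int(fade * sampleRate))

    var samples = [Int16](repeating: 0, count: count)
    for i in 0..<count {
        let t = Double(i) / sampleRate
        let toneA = sin(2.0 * Double.pi * f1 * t)
        let toneB = sin(2.0 * Double.pi * f2 * t)

        // Linear fade in and out so the buffer starts and ends at zero (avoids clicks).
        let fromStart = Double(i) / Double(fadeCount)
        let fromEnd = Double(count - 1 - i) / Double(fadeCount)
        let gain = min(1.0, fromStart, fromEnd)

        samples[i] = Int16((amplitude * (toneA + toneB) * gain).rounded())
    }
    return samples
}

// Example: half a second of the 697 Hz + 1336 Hz pair from the question.
let digitSamples = dualToneSamples(f1: 697, f2: 1336, duration: 0.5)
```

In a real player you would copy slices of this array into the buffers the audio callback hands you, as described above.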
