I am using Xamarin to develop an iOS app. I need to be able to record the audio from an AVSpeechUtterance to a file, or stream the audio buffers from the internal audio to a method that will populate a byte array.
I've searched high and low for how to record internal audio (from the text-to-speech API). I'm only able to use the internal microphone for recording, not the audio being played by the app.
I'm currently using Xamarin's text-to-speech example:
public void Speak (string text)
{
    var speechSynthesizer = new AVSpeechSynthesizer ();
    var speechUtterance = new AVSpeechUtterance (text) {
        Rate = AVSpeechUtterance.MaximumSpeechRate / 2,
        Voice = AVSpeechSynthesisVoice.FromLanguage ("en-US"),
        Volume = 0.5f,
        PitchMultiplier = 1.0f
    };

    speechSynthesizer.SpeakUtterance (speechUtterance);
}
Hints or solutions can be in C#, Swift, or Objective-C. I just need to be pointed in the right general direction.
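One direction that may help (shown in Swift, which the question allows): from iOS 13, AVSpeechSynthesizer has a write(_:toBufferCallback:) method that hands you the rendered PCM buffers instead of playing them, and those buffers can be appended to an AVAudioFile. A minimal sketch; renderToFile and outputURL are names made up for illustration:

import AVFoundation

let synthesizer = AVSpeechSynthesizer()

func renderToFile(text: String, outputURL: URL) {
    let utterance = AVSpeechUtterance(string: text)
    utterance.voice = AVSpeechSynthesisVoice(language: "en-US")

    var outputFile: AVAudioFile?

    synthesizer.write(utterance) { buffer in
        guard let pcmBuffer = buffer as? AVAudioPCMBuffer, pcmBuffer.frameLength > 0 else {
            // A zero-length buffer signals the end of the utterance.
            return
        }
        do {
            // Create the file lazily so it matches the format the synthesizer delivers.
            if outputFile == nil {
                outputFile = try AVAudioFile(forWriting: outputURL, settings: pcmBuffer.format.settings)
            }
            try outputFile?.write(from: pcmBuffer)
        } catch {
            print("Failed to write buffer: \(error)")
        }
    }
}

Xamarin.iOS should expose the same selector in its AVSpeechSynthesizer binding, so the pattern ought to translate to C#, though I haven't verified the exact method name there.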
We have a requirement for audio processing on the output of AVSpeechSynthesizer, so we started using the write method of the AVSpeechSynthesizer class to apply processing on top of it. What we currently have:
var synthesizer = AVSpeechSynthesizer()
var playerNode: AVAudioPlayerNode = AVAudioPlayerNode()
func play(audioCue: String) {
    let utterance = AVSpeechUtterance(string: audioCue)
    synthesizer.write(utterance, toBufferCallback: { [weak self] buffer in
        // We do our processing, including conversion from the pcmFormatInt16 format to the
        // pcmFormatFloat32 format, which is supported by AVAudioPlayerNode.
        self?.playerNode.scheduleBuffer(buffer as! AVAudioPCMBuffer, completionCallbackType: .dataPlayedBack)
    })
}
All of it was working fine before iOS 16 but with iOS 16 we started getting this exception:
[AXTTSCommon] TTSPlaybackEnqueueFullAudioQueueBuffer: error -66686 enqueueing buffer
Not sure what this exception means exactly, so we are looking for a way to address it, or maybe a better way of playing the buffers.
UPDATE:
Created an empty project for testing, and it turns out that the write method, even when called with an empty block, generates the same logs.
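I can't say whether this resolves error -66686, but one thing worth trying: skip the zero-length buffer that write(_:toBufferCallback:) delivers at the end of an utterance, and connect the player node using the format the synthesizer actually produces, so no manual conversion is needed. A rough sketch with a hypothetical UtterancePlayer wrapper:

import AVFoundation

final class UtterancePlayer {
    private let engine = AVAudioEngine()
    private let playerNode = AVAudioPlayerNode()
    private let synthesizer = AVSpeechSynthesizer()
    private var isEngineConfigured = false

    func play(audioCue: String) {
        let utterance = AVSpeechUtterance(string: audioCue)

        synthesizer.write(utterance) { [weak self] buffer in
            guard let self = self,
                  let pcmBuffer = buffer as? AVAudioPCMBuffer,
                  pcmBuffer.frameLength > 0 else { return }

            if !self.isEngineConfigured {
                // Wire the graph using the synthesizer's own output format.
                self.engine.attach(self.playerNode)
                self.engine.connect(self.playerNode, to: self.engine.mainMixerNode, format: pcmBuffer.format)
                try? self.engine.start()
                self.playerNode.play()
                self.isEngineConfigured = true
            }
            self.playerNode.scheduleBuffer(pcmBuffer, completionHandler: nil)
        }
    }
}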
Code I have used for a Swift project:
let synth = AVSpeechSynthesizer()
let myUtterance = AVSpeechUtterance(string: message)
myUtterance.rate = 0.4
synth.speak(myUtterance)
You can move let synth = AVSpeechSynthesizer() out of this method and declare it at the top of the class to reuse it.
Settings to enable for Xcode 14 & iOS 16: If you are using Xcode 14 and iOS 16, it may be that the voices under Spoken Content are not downloaded, and you will get an error on the console saying identifier, source, content nil. All you need to do is go to Accessibility in Settings -> Spoken Content -> Voices -> select any language and download any profile. After this, run your code again and you will be able to hear the speech for the passed text.
It is working for me now.
My app is a speech-to-text app that also plays back an mp3 of another user's dictated voice through the iPhone speaker at the same time. It's like a pseudo phone call.
I'm facing the issue that, while the mp3 is playing, the iOS Speech framework detects it as the user's speech, which is not what I want. I want the framework to detect only the actual user's voice, i.e. the person speaking in front of the iPhone.
I'm not sure if it's possible in the first place. However, I guess it somehow is, since when I use the iOS Notes app and the dictation feature on the keyboard, it doesn't detect the recorded voice (even when I play it very loudly!). So how can I achieve the same?
My code, for reference:
private var speechRecognizer: SFSpeechRecognizer?
private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
private var audioEngine: AVAudioEngine?

func start() {
    self.speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "ja-JP"))
    self.recognitionRequest = .init()
    self.audioEngine = .init()

    let inputNode = self.audioEngine?.inputNode
    let recordingFormat = inputNode?.outputFormat(forBus: 0)
    inputNode?.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
        self.recognitionRequest?.append(buffer)
    }

    self.audioEngine?.prepare()
    try? self.audioEngine?.start()
}
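One approach worth trying (my suggestion; I can't confirm it is what Notes does): enable Apple's voice processing on the engine's input node, which turns on echo cancellation for audio the device itself plays (available from iOS 13). A sketch assuming a playAndRecord session; startWithEchoCancellation is just an illustrative name:

import AVFoundation
import Speech

func startWithEchoCancellation() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(.playAndRecord, mode: .voiceChat, options: [.defaultToSpeaker])
    try session.setActive(true)

    let audioEngine = AVAudioEngine()
    let inputNode = audioEngine.inputNode
    // Must be enabled before the engine starts; throws if the hardware doesn't support it.
    try inputNode.setVoiceProcessingEnabled(true)

    let recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
        recognitionRequest.append(buffer)
    }

    audioEngine.prepare()
    try audioEngine.start()
}

Whether the echo cancellation fully suppresses the mp3 in practice depends on the device and routing, so it needs testing.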
I am using the 'YbridPlayerSDK' library to play Opus audio:
let myEndpoint = MediaEndpoint(mediaUri: audioURL.absoluteString)
try AudioPlayer.open(for: myEndpoint, listener: nil) { (control) in
    // called asynchronously
    control.play()
}
The above code plays the audio, but I am not able to find out its duration. How can I get the audio duration?
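I don't know whether YbridPlayerSDK itself exposes the duration. As a possible fallback, if the URL points to a file AVFoundation can parse (Opus support depends on the container), the asset metadata carries the duration. A hedged sketch; audioDuration is an illustrative helper and the async load requires iOS 15+:

import AVFoundation

func audioDuration(of audioURL: URL) async throws -> TimeInterval {
    let asset = AVURLAsset(url: audioURL)
    let duration = try await asset.load(.duration) // CMTime
    return duration.seconds
}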
I use AVSpeechSynthesizer to read books aloud as audio.
private lazy var synthesizer: AVSpeechSynthesizer = {
    let synthesizer = AVSpeechSynthesizer()
    synthesizer.delegate = self
    return synthesizer
}()

let utterance = AVSpeechUtterance(string: text)
utterance.voice = AVSpeechSynthesisVoice(
    language: languageIdentifier(from: language)
)
synthesizer.speak(utterance)
I want to update the information shown in the iPhone's default player view (probably the naming is wrong 🙏):
indicate the currently playing chapter with some text
enable the next button to play the next chapter
How can I accomplish this?
I really don't think you want to hack your way through this. But if you really do, I would (a rough sketch follows this list):
Listen to remote commands (UIApplication.sharedApplication().beginReceivingRemoteControlEvents(); see the Apple sample project).
Set your properties on MPNowPlayingInfoCenter: MPNowPlayingInfoCenter.default().nowPlayingInfo[MPMediaItemPropertyTitle] = "Title"
Implement AVSpeechSynthesizerDelegate and try to map the delegate functions to playback states; you could estimate the playback progress using speechSynthesizer(_:willSpeakRangeOfSpeechString:utterance:) (not sure how well that works).
You might have to play with the usesApplicationAudioSession property of AVSpeechSynthesizer to have more control over the audio session (set categories etc.).
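A rough sketch of how those pieces could fit together; ChapterReader and its chapter handling are invented for illustration, and progress estimation via willSpeakRangeOfSpeechString is left out:

import AVFoundation
import MediaPlayer

final class ChapterReader: NSObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()
    private var chapters: [String] = []
    private var currentChapter = 0

    override init() {
        super.init()
        synthesizer.delegate = self

        // React to the lock-screen / Control Center "next" button.
        let commandCenter = MPRemoteCommandCenter.shared()
        commandCenter.nextTrackCommand.isEnabled = true
        _ = commandCenter.nextTrackCommand.addTarget { [weak self] _ in
            self?.playNextChapter()
            return .success
        }
    }

    func play(chapters: [String]) {
        self.chapters = chapters
        currentChapter = 0
        speakCurrentChapter()
    }

    private func playNextChapter() {
        guard currentChapter + 1 < chapters.count else { return }
        currentChapter += 1
        synthesizer.stopSpeaking(at: .immediate)
        speakCurrentChapter()
    }

    private func speakCurrentChapter() {
        let utterance = AVSpeechUtterance(string: chapters[currentChapter])
        synthesizer.speak(utterance)

        // Publish what the default player view should show.
        MPNowPlayingInfoCenter.default().nowPlayingInfo = [
            MPMediaItemPropertyTitle: "Chapter \(currentChapter + 1)",
            MPNowPlayingInfoPropertyPlaybackRate: 1.0
        ]
    }

    // MARK: - AVSpeechSynthesizerDelegate

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        // Mark playback as stopped on the lock screen when the chapter ends.
        MPNowPlayingInfoCenter.default().nowPlayingInfo?[MPNowPlayingInfoPropertyPlaybackRate] = 0.0
    }
}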
Is it possible to create a buffer concept similar to Audio Queue Services using AVAudioRecorder or the rest of AVFoundation? In my application, I need to capture the audio buffer and send it over the Internet. The server connection part is done, but I want to know if there is a way to record the voice continuously in the foreground and pass the audio, buffer by buffer, to the server in the background using Swift.
Comments are appreciated.
AVAudioRecorder records to a file, so you can't easily use it to stream audio data out of your app. AVAudioEngine on the other hand can call you back as it captures audio buffers:
var engine = AVAudioEngine()

func startCapturingBuffers() {
    let input = engine.inputNode
    let bus = 0

    input.installTap(onBus: bus, bufferSize: 512, format: input.inputFormat(forBus: bus)) { (buffer, time) in
        // buffer.floatChannelData contains the captured audio samples
    }

    try! engine.start()
}
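For the "send it over the Internet" part of the question, a small hedged sketch of turning a tapped buffer into bytes; the networking call is whatever you already have, and this only copies the first channel's float samples into a Data payload:

import AVFoundation

func pcmData(from buffer: AVAudioPCMBuffer) -> Data? {
    guard let channelData = buffer.floatChannelData else { return nil }
    let frameCount = Int(buffer.frameLength)
    // Copy only the first channel; interleave the channels yourself if you need more.
    return Data(bytes: channelData[0], count: frameCount * MemoryLayout<Float>.size)
}

Call this inside the tap block and hand the resulting Data to your upload code.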