I want to use Google's speech recognition service with a video file as the input instead of a microphone.
For example, a video file is playing on my computer and a Google speech recognition program recognizes the video's audio stream.
YouTube's auto-caption feature is an example of what I mean.
How can I use Google speech recognition this way?
This is a great question. Google does provide a way of doing that through the Web Speech API. Here's a link to an example usage, and a demo site from Google here.
However, you would have to extract the audio from the video first and then feed the audio to the API.
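As a rough sketch of the extraction step, the audio track can be pulled out of the video with the ffmpeg command-line tool. This assumes ffmpeg is installed and on the PATH. The function names, the file names, and the choice of 16 kHz mono 16-bit PCM are illustrative assumptions, not something the answer specifies.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that writes the audio track as mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", str(video_path),   # input video
        "-vn",                   # drop the video stream
        "-ac", "1",              # downmix to one channel
        "-ar", str(sample_rate), # resample (16 kHz is an assumed, common choice)
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        str(wav_path),
    ]


def extract_audio(video_path, wav_path, sample_rate=16000):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_ffmpeg_command(video_path, wav_path, sample_rate), check=True)
    return Path(wav_path)
```

Calling `extract_audio("talk.mp4", "talk.wav")` (hypothetical file names) would leave a WAV file that can then be passed to a speech recognition service.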
There's also the Cloud Speech API, which is free up to a certain point. It can be found here.
Related
I'm looking for a solution to be able to retrieve music from the Audio Library on YouTube. The Audio Library section on YouTube allows you to filter by genre and attribution requirements. Can someone confirm if this is exposed via their API? I haven't seen any documentation addressing this.
Is it possible to convert MP3 files into text without playing them back through the microphone, for example when listening to an audiobook on a mobile device? I was looking for a relevant API in IBM Watson but couldn't find a solution.
There is no good, direct way to grab the audio output on Android.
Record Android Audio Output
For speech to text, you could use the Google API.
Although if you have the mp3, it should be no problem to convert it to text with Google API.
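To illustrate what sending a file to the Google API might look like, here is a minimal sketch that builds the JSON body for the Cloud Speech REST `speech:recognize` method. It assumes the MP3 has already been converted to 16 kHz mono 16-bit PCM, and that the request will be authenticated separately (for example with an API key or OAuth token, not shown). The helper name `build_recognize_request` is hypothetical.

```python
import base64
import json

# v1 synchronous recognition endpoint of the Cloud Speech REST API.
SPEECH_ENDPOINT = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(audio_bytes, sample_rate=16000, language_code="en-US"):
    """Return the request body as a dict; audio is sent inline as base64 text."""
    return {
        "config": {
            "encoding": "LINEAR16",          # assumes 16-bit PCM input
            "sampleRateHertz": sample_rate,  # must match the actual audio
            "languageCode": language_code,
        },
        "audio": {
            "content": base64.b64encode(audio_bytes).decode("ascii"),
        },
    }


if __name__ == "__main__":
    # "book.wav" is a placeholder file name.
    with open("book.wav", "rb") as f:
        body = build_recognize_request(f.read())
    print(json.dumps(body)[:200])  # POST this JSON to SPEECH_ENDPOINT
```

Synchronous recognition is meant for short clips, so a full audiobook would likely need to be split into pieces or sent through the long-running variant of the API.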
Take a look here for that.
If I use the audio of a good YouTube video as input for the Google Cloud Speech API, would you say that I will get the "same" transcript as the one automatically provided by YouTube?
If you're interested in how YouTube's automatic captions work, read their blog post, Automatic captions in YouTube:
To help address this challenge, we've combined Google's automatic
speech recognition (ASR) technology with the YouTube caption system to
offer automatic captions, or auto-caps for short. Auto-caps use the
same voice recognition algorithms in Google Voice to automatically
generate captions for video. The captions will not always be perfect
(check out the video below for an amusing example), but even when
they're off, they can still be helpful—and the technology will
continue to improve with time.
Credits to this Quora post.
YouTube has a feature where you can submit everything that is spoken in the video as text, and YouTube will automatically sync that transcript into subtitles.
Is voice recognition used, or do they work out the sync from displacements in the audio spectrum? There are several similar services online as well.
How can such a system be developed?
I'm building an app that converts speech to text. I have searched and found that the Google Speech API is a good choice. Now I have a question: when the user speaks to an iOS device, how can I capture the audio to a file? Do any frameworks or APIs need to be added? And what should the raw audio file type be, WAV or MP3? Thank you.
Why don't you take a look at some of the existing StackOverflow questions on this subject? Try Speech to text Conversion.? or What is the current best speech recognition API for ios to match few keywords?