Speech to text recognition - machine-learning

Is it possible to convert mp3 files into text without playing them back through the microphone, for example while listening to an audiobook on a mobile device? I looked for a relevant API in IBM Watson but couldn't find one.

There is no good, direct way to capture the audio output on Android.
Record Android Audio Output
For speech to text you could use the Google API.
If you already have the mp3 file, though, converting it to text with the Google API should be no problem.
Take a look here for that.

Related

Google Speech Recognition on video file

I want to use Google's voice service with a video file as the input instead of the microphone.
For example, a video file is playing on my computer and a Google speech recognition program recognizes the video's audio stream.
E.g., the auto-caption function of YouTube.
How can I use Google Speech Recognition (G.S.R.) this way?
This is a great question. Google does provide a way of doing that through the Web Speech API. Here's a link to an example usage, and a demo site from Google here.
However, you would have to extract the audio from the video first and then feed the audio to the API.
There's also the Cloud Speech API, which is free up to a certain point. It can be found here.
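The audio-extraction step mentioned above can be sketched with the ffmpeg command-line tool driven from Python. This is only a sketch under assumptions: the function names, the file names, and the choice of 16 kHz mono 16-bit PCM are illustrative and not taken from the answer; check which formats and sample rates the speech API you pick actually accepts.

```python
import subprocess
from pathlib import Path


def build_extract_command(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that drops the video and writes mono PCM WAV."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", str(video_path),     # input video file
        "-vn",                     # discard the video stream
        "-ac", "1",                # downmix to a single (mono) channel
        "-ar", str(sample_rate),   # resample; 16 kHz is an assumption
        "-c:a", "pcm_s16le",       # 16-bit little-endian PCM audio
        str(wav_path),
    ]


def extract_audio(video_path, wav_path, sample_rate=16000):
    """Run ffmpeg; raises CalledProcessError if ffmpeg reports a failure."""
    cmd = build_extract_command(video_path, wav_path, sample_rate)
    subprocess.run(cmd, check=True)
    return Path(wav_path)
```

Calling `extract_audio("talk.mp4", "talk.wav")` requires ffmpeg to be installed and on the PATH; the resulting WAV file is what you would then upload to the recognition service.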

Selecting other qualities in a hls live streaming

I am trying to develop an iOS application that plays a stream over the HLS protocol. Because HLS is adaptive by default, the player only picks the quality best suited to my connection, and I cannot access the other qualities from my application. Is there a way to access all four streams from an m3u8 link? I am using Objective-C.
m3u8 is a text-based file format, so you can simply open the file and parse the sources it describes.
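As a rough illustration of that parsing, here is a minimal sketch that reads a master playlist and lists its variant streams. It is written in Python only for brevity; the same logic ports directly to Objective-C. The function name and the sample playlist are made up for the example, and the parser only handles `#EXT-X-STREAM-INF` entries, not the full HLS specification.

```python
import re

# KEY=VALUE pairs; a VALUE may be a quoted string that itself contains commas.
_ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')

# Hypothetical master playlist used only for illustration.
SAMPLE = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,CODECS="avc1.4d401f,mp4a.40.2"
high/index.m3u8
"""


def parse_master_playlist(text):
    """Return one dict per variant: its STREAM-INF attributes plus 'uri'."""
    variants = []
    pending = None  # attributes waiting for the URI line that follows them
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            attrs = line.split(":", 1)[1]
            pending = {k: v.strip('"') for k, v in _ATTR_RE.findall(attrs)}
        elif line and not line.startswith("#") and pending is not None:
            pending["uri"] = line  # first non-comment line is the variant URI
            variants.append(pending)
            pending = None
    return variants


if __name__ == "__main__":
    for variant in parse_master_playlist(SAMPLE):
        print(variant.get("BANDWIDTH"), variant.get("RESOLUTION"), variant["uri"])
```

Each `uri` is relative to the master playlist's own URL unless it is absolute, so resolve it against that URL before handing a specific quality to the player.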

Save audio stream to mp3 file (iOS)

I have an AVSpeechSynthesizer which converts text to speech, but I've encountered a problem.
I don't know how to save the audio that it generates to a music file, which I would quite like to be able to do!
So here's my question: how do you save the AVSpeechSynthesizer output? If this isn't possible, can I use AVFoundation, CoreMedia or another public API to capture the speaker output before it is actually played?
Thanks!
Unfortunately, no. There is no public API for capturing the speaker output, and looking over the docs for AVSpeechSynthesizer and related classes, I don't see a way to capture any audio from it. You may want to look at third-party libraries to help with this.
Related questions:
Recording audio output only from speaker of iphone excluding microphone
Text-to-speech libraries for iPhone

How to display RTSP from IP Camera/CCTV in iOS

There is obviously a way to do this, because so many applications are already doing it: NetCamViewer and iCamviewer, to name just two.
I have searched and searched, but I'm not finding anything of value that gives a hint as to how this is done. I'm reaching out hoping that someone will give me a clue.
I'm trying to connect to a video security camera (Y-CAM), which supports the RTSP protocol, and display the video from my iPhone/iPad application. The camera has an IP address, and I can view the video from a web browser and from QuickTime running on my Mac. The problem is that RTSP is not supported on iOS, so even trying to connect using Safari on an iPad doesn't work.
I've read that some people are trying to use Live555, but I haven't seen an article that describes whether it has been done successfully and how.
An alternative is to capture the RTSP stream on a server, convert it to an HTTP Live stream and then connect to the HTTP Live stream from iOS. Unfortunately, this hasn't proved as easy as it sounds.
I'd prefer to go directly to the camera, as other applications I've seen do. Converting RTSP to HTTP Live Streaming is a fallback if I have to.
Any hints are greatly appreciated. Thanks!
This is wrong :) or at least not necessary: "An alternative is to capture the RTSP stream on a server, convert it to an HTTP Live stream and then connect to the HTTP Live stream from iOS. Unfortunately, this hasn't proved as easy as it sounds."
You should use the ffmpeg library, as it can connect to any streaming server (supporting rtsp, mms, tcp, udp, rtmp ...) and then draw pictures to the screen (for drawing you can use OpenGL ES; UIImage also works).
First of all, use avformat_open_input to connect to your IP address.
Then use avcodec_find_decoder and avcodec_open2 to find the codecs and open them (you should call them for both audio and video).
Then, in a while loop, read packets from the server using av_read_frame.
When you get a frame, if it is audio, send it to AudioUnit or AudioQueue;
if it is video, convert it from YUV to RGB using sws_scale and draw the picture to the screen.
That's all.
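To make the YUV-to-RGB step less abstract, here is a per-pixel sketch of the kind of color conversion that sws_scale performs. It is not ffmpeg code: it is plain Python, and it assumes full-range BT.601 (JPEG-style) coefficients, which is an assumption for illustration. Real camera streams often use limited-range values or BT.709, and sws_scale additionally handles chroma subsampling, scaling and whole frames at once.

```python
def _clamp(value):
    """Round to the nearest integer and clip to the 0..255 byte range."""
    return max(0, min(255, int(round(value))))


def yuv_to_rgb(y, u, v):
    """Convert one full-range BT.601 YUV pixel to an (R, G, B) tuple."""
    d = u - 128  # blue-difference chroma, centered on zero
    e = v - 128  # red-difference chroma, centered on zero
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return (_clamp(r), _clamp(g), _clamp(b))


if __name__ == "__main__":
    # Neutral chroma (u == v == 128) leaves only the luma, i.e. a gray pixel.
    print(yuv_to_rgb(128, 128, 128))
```

In an actual iOS player you would let sws_scale (or an OpenGL ES shader) do this for every pixel, since a Python-style loop per pixel would be far too slow for video.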
Also look at this wrapper (http://www.videostreamsdk.com); it is built on the ffmpeg library and supports iOS.
You really need to search Stack Overflow before posting; this question has been asked many times. Yes, Live555 sort of works, and some of us have gotten it to work.
There are other players too, including ours http://www.streammore.tv/
You can find an open source FFmpeg decoder for iOS (and some samples) on GitHub: https://github.com/mooncatventures-group
Sample use of this library: http://sol3.typepad.com/exotic_particles/
There are two general approaches to displaying RTSP video in iOS Safari:
RTSP / HLS (H.264+AAC)
RTSP / Websocket (H.264+AAC ==> MPEG+G.711 or H.264+?)
For HLS you can consider Wowza server.
For Websocket playback in iOS Safari you can use WCS4 server.
The main idea of Websocket playback is to render directly to an HTML5 Canvas element and audio context on the page. In the case of MPEG playback, video decoding is done on the iOS Safari side in plain JavaScript.
Another option is to install a WebRTC plugin with getUserMedia support and play the stream via WebRTC. In that case you will still need a server-side RTSP-to-WebRTC transcoder.

ios capture audio file from user's speech

I'm building an app that converts speech to text. I have searched and found that the Google Speech API is a good choice. Now I have a question: when the user speaks to the iOS device, how can I capture the audio to a file? Do I need to include any frameworks or APIs? And what type should the raw audio file be, WAV or MP3? Thank you.
Why don't you take a look at some of the existing Stack Overflow questions on this subject? Try Speech to text Conversion.? or What is the current best speech recognition API for ios to match few keywords?

Resources