Can Google's Speech API accept an external Video URL? - google-cloud-speech

I recently figured out that Google's Vision API can accept an external image URL, and I was curious whether anyone knows if Google's Speech API can accept an external video URL, such as a YouTube video?
The code I have in my mind would look something like this:
def transcribe_gcs(youtube_url):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types

    client = speech.SpeechClient()
    # swapped out gcs_uri with youtube_url
    audio = types.RecognitionAudio(uri=youtube_url)
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        # sample_rate_hertz=16000,
        language_code='en-US')

    operation = client.long_running_recognize(config, audio)
    print('Waiting for operation to complete...')
    response = operation.result(timeout=90)

    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print(u'Transcript: {}'.format(result.alternatives[0].transcript))
        print('Confidence: {}'.format(result.alternatives[0].confidence))

"I was curious if anyone knew if Google's Speech could accept an external video URL such as a YouTube video?"
It needs to be a local path to your audio file (for audio shorter than 1 minute) or a GCS URI for audio longer than 1 minute. What you have in mind is not possible: the audio/video file needs to be in GCS.
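In practice you would have to stage the audio in GCS yourself. A minimal sketch of that workflow, assuming the yt-dlp downloader and the google-cloud-storage library are installed and using a hypothetical bucket name (both are assumptions, not part of the Speech API):

import subprocess
from google.cloud import speech, storage
from google.cloud.speech import enums, types

def transcribe_youtube(youtube_url, bucket_name):
    # Download the audio track as FLAC (yt-dlp is an assumption here;
    # any downloader plus ffmpeg extraction would do).
    subprocess.run(['yt-dlp', '-x', '--audio-format', 'flac',
                    '-o', 'audio.%(ext)s', youtube_url], check=True)

    # Stage the file in GCS: long audio must be read from a bucket.
    bucket = storage.Client().bucket(bucket_name)
    bucket.blob('audio.flac').upload_from_filename('audio.flac')

    # Transcribe from the gs:// URI, as in the snippet above.
    client = speech.SpeechClient()
    audio = types.RecognitionAudio(uri='gs://{}/audio.flac'.format(bucket_name))
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
        language_code='en-US')
    operation = client.long_running_recognize(config, audio)
    return operation.result(timeout=90)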

I think you can achieve this by restreaming the same video (for example on Wowza, or on any server of your choice), then extracting the audio with, say, ffmpeg and passing that to Google. It should work. Use StreamingRecognizeRequest instead of RecognitionAudio.
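A rough sketch of that idea, with ffmpeg decoding a hypothetical stream URL to raw 16 kHz mono PCM on stdout and the chunks fed to the streaming API (the stream URL and chunk size are assumptions):

import subprocess
from google.cloud import speech
from google.cloud.speech import enums, types

client = speech.SpeechClient()
config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code='en-US')
streaming_config = types.StreamingRecognitionConfig(config=config)

# ffmpeg pulls the stream and writes 16 kHz mono PCM to stdout.
ffmpeg = subprocess.Popen(
    ['ffmpeg', '-i', 'http://example.com/live/stream',  # hypothetical URL
     '-f', 's16le', '-ac', '1', '-ar', '16000', '-'],
    stdout=subprocess.PIPE)

def requests():
    while True:
        chunk = ffmpeg.stdout.read(4096)
        if not chunk:
            return
        yield types.StreamingRecognizeRequest(audio_content=chunk)

for response in client.streaming_recognize(streaming_config, requests()):
    for result in response.results:
        if result.is_final:
            print(result.alternatives[0].transcript)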

Related

Record audio (FLAC or WAV) in ReactJs and use Speech2Text from google using ruby backend

I need to record audio through the front-end. I'm using the React Mic and mic-recorder-to-mp3 ReactJS libraries. Everything works: I can download the audio blob and play it back. However, I need to upload it to my back-end so I can use the Google Speech-to-Text API to extract text from the audio.
This is the script I'm using, but it's not returning any result; I think that's because the recorded audio doesn't have the right encoding.
require "google/cloud/speech"
require 'json'
# Instantiates a client
speech = Google::Cloud::Speech.new
# The name of the audio file to transcribe
file_name = "./newmp3.mp3"
# The raw audio
audio_file = File.binread file_name
encoding = :LINEAR16
# The audio file's encoding and sample rate
config = {
encoding: "LINEAR16",
language_code: "pt-BR",
model: "default",
sample_rate_hertz: 16000
}
audio = { content: audio_file }
# Detects speech in the audio file
response = speech.recognize(config, audio)
results = response.results
puts response
You are sending an mp3 file to the API, but you are telling it that the file is encoded as LINEAR16 (PCM data). This will not work.
According to the speech API docs, MP3 is only supported through the beta API.
One easy way to resolve this is to use a simple external audio encoder like ffmpeg, and convert it to something like FLAC before sending it over:
ffmpeg -i input.mp3 output.flac
Then set FLAC as the encoding in your config (encoding: "FLAC" instead of "LINEAR16") and make sure sample_rate_hertz matches the converted file. But remember you cannot upload more than 1 minute of audio using this method. Uploading longer files has to be done asynchronously, using Google Cloud Storage to hold the audio.

Google cloud speech very inaccurate and misses words on clean audio

I am using Google Cloud Speech through Python and finding many transcriptions are inaccurate and miss several words. This is a simple script I'm using to return a transcript of an audio file, in this case 'out307.wav':
import io
from google.cloud import speech

client = speech.SpeechClient()

with io.open('out307.wav', 'rb') as audio_file:
    content = audio_file.read()

audio = speech.types.RecognitionAudio(content=content)
config = speech.types.RecognitionConfig(
    enable_word_time_offsets=True,
    language_code='en-US',
    audio_channel_count=1)

response = client.recognize(config, audio)

for result in response.results:
    alternative = result.alternatives[0]
    print(u'Transcript: {}'.format(alternative.transcript))
This returns the following transcript:
to do this the tensions and suspicions except
This is very far from what the actual audio says (I've uploaded it at https://vocaroo.com/i/s1zdZ0SOH1Ki). The audio is a .wav and very clear, with no background noise. This is worse than average: in some cases it gets the transcription fully correct on a 10-second audio file, or it may miss just a couple of words. Is there anything I can do to improve results?
This is weird: I tried your audio file with your code and got the same result, but if I change the language_code to "en-UK" I am able to get the full response.
I'm working for Google Cloud and I created a public issue for you here; you can track the updates there.
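For reference, the change the answer describes is just the language_code field in the config from the question (note that the BCP-47 tag documented by the Speech API for British English is 'en-GB'; 'en-UK' is what the answer reports using):

config = speech.types.RecognitionConfig(
    enable_word_time_offsets=True,
    language_code='en-UK',  # the documented tag is 'en-GB'
    audio_channel_count=1)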

How to change url from m3u8 to .ts

I'm trying to make an IPTV link work on my receiver.
This is the original link that I want to convert:
http://s7.iapi.com:8000/re-NBA/index.m3u8?token=BzyIVQOtO77MTw
and this is the format that I want to reach in the end:
http://pro-vision.dyndns.pro:12580/live/laurent/laurent/2791.ts
An m3u8 file is just a text file that acts as an index for media streams - it will contain 'pointers' to the location of video and audio streams themselves.
A TS file is a 'container' that contains the video and audio streams themselves - i.e. the actual video and audio data.
You can't simply convert any m3u8 to a ts file or stream, but you can extract a ts file URL from the m3u8 file, which may be what you want.
If you look at the overview section of the m3u8 definition, there is a very simple example which is perhaps the best way to understand this:
https://datatracker.ietf.org/doc/html/draft-pantos-http-live-streaming-19
The m3u8 file includes the ts references, as can be seen in this extract from the above document:
#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:9.009,
http://media.example.com/first.ts
#EXTINF:9.009,
http://media.example.com/second.ts
#EXTINF:3.003,
http://media.example.com/third.ts
The numbers here (the #EXTINF tags) give the duration in seconds of each segment. More complex examples allow you to have multiple variants of a particular stream, for example to offer different bit-rate versions of a video for Adaptive Bit Rate (ABR) streaming.
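Extracting the segment URLs is then just a matter of reading the playlist's non-comment lines. A minimal sketch in Python, using the playlist URL from the question (note that a master playlist's entries may themselves be variant .m3u8 playlists rather than .ts segments, so you may need to follow one level down):

import urllib.request
from urllib.parse import urljoin

playlist_url = 'http://s7.iapi.com:8000/re-NBA/index.m3u8?token=BzyIVQOtO77MTw'
with urllib.request.urlopen(playlist_url) as resp:
    playlist = resp.read().decode('utf-8')

# Lines that are not #-tags are media URIs; relative paths are resolved
# against the playlist's own location.
segments = [urljoin(playlist_url, line.strip())
            for line in playlist.splitlines()
            if line.strip() and not line.startswith('#')]
print(segments)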

How to play RTSP url from within app in ios

I have found many suggestions on Stack Overflow regarding the usage of FFmpeg, and a link to the DFURTSPPlayer project on GitHub, but it is not compiling. After integrating FFmpeg, though, what do I have to write? Suppose I have HTTP URLs; then I write:
moviePath = "http:/path.mp4"
movieURL = NSURL.URLWithString(moviePath!)
moviePlayer = MPMoviePlayerController(contentURL: movieURL)
moviePlayer!.play()
So what kind of code should I write to use RTSP URLs?
Here is another post with example FFmpeg code that receives an RTSP stream (that example also decodes the stream to YUV420, stores it in pic, then converts the frame to RGB24, stores it in picrgb and writes it to a file). To achieve something similar to what you have for HTTP, you should:
1) Write a wrapper Objective-C class for the FFmpeg C code, or just wrap the code in C functions that you call directly from Objective-C. You should have a way to pass the RTSP URL to the class or function, and a way to provide a callback for a new frame. In the class/function, start a new thread that actually executes something similar to the code in the example and calls the callback for each new decoded frame. NOTE: FFmpeg can perform asynchronous I/O using your own custom I/O context, which would actually allow you to avoid creating the thread, but if you are new to FFmpeg, maybe start with the basics and improve your code later on.
2) In the callback, update the view (or whatever you are using for display) with the decoded frame data.
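The decode loop itself is easy to prototype outside iOS first. As a minimal illustration of the thread-plus-callback structure described above (not iOS code), here is a Python sketch using OpenCV, whose capture backend wraps FFmpeg, with a hypothetical RTSP URL:

import threading
import cv2  # OpenCV's VideoCapture demuxes/decodes RTSP via its FFmpeg backend

def decode_rtsp(url, on_frame):
    """Read frames from an RTSP stream and hand each one to a callback."""
    cap = cv2.VideoCapture(url)
    try:
        while True:
            ok, frame = cap.read()   # one decoded frame (BGR array)
            if not ok:
                break                # stream ended or network error
            on_frame(frame)          # e.g. hand the frame to the display
    finally:
        cap.release()

# Run the decode loop on its own thread, as step 1) suggests.
url = 'rtsp://example.com/stream'    # hypothetical URL
threading.Thread(target=decode_rtsp, args=(url, lambda f: None)).start()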

How to extract the song name from a live audio stream on the Blackberry Storm?

Hi, I am new to BlackBerry.
I am developing an application that gets the song name from a live audio stream. I am able to get the MP3 stream bytes from the particular radio server. To get the song name I add the header "Icy-MetaData: 1", so the server includes metadata in the stream. To find the MP3 block size I read the "icy-metaint" response header. But how do I recognize the metadata blocks using this MP3 block size? I am using the following code; can anyone help me get it working? Here b[off+k] holds the bytes that come from the server. I am converting the whole stream into a char array, which is wrong, but how do I pick out the metadata headers according to the MP3 block size?
b[off + k] = buffers[PlayBuf][PlayByte];
String metaSt = httpConn.getHeaderField("icy-metaint");
metaInt = Integer.parseInt(metaSt);
for (int i = 0; i < b[off + k]; i++)
{
    metadataHeader += (new String(b)).toCharArray();
    System.out.println(metadataHeader);
    metadataLength--;
}
Blackberry has no native regex functionality; I would recommend grabbing the regexp-me library (http://code.google.com/p/regexp-me/) and compiling it into your code. I've used it before and its regex support is pretty good. I think the regex in the code you posted would work just fine.
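Independent of the regex question, the framing the code needs to handle is: after every icy-metaint audio bytes comes one length byte, then length * 16 bytes of metadata such as StreamTitle='...';. A rough Python illustration of that framing, with a hypothetical stream URL (note that some Shoutcast servers answer with a non-standard 'ICY 200 OK' status line that strict HTTP clients reject):

import re
import requests

url = 'http://radio.example.com:8000/stream'  # hypothetical server
r = requests.get(url, headers={'Icy-MetaData': '1'}, stream=True)
metaint = int(r.headers['icy-metaint'])  # audio bytes between metadata blocks

r.raw.read(metaint)                 # skip exactly one block of MP3 audio
length = r.raw.read(1)[0] * 16      # one length byte, in units of 16 bytes
metadata = r.raw.read(length).decode('utf-8', 'ignore')

match = re.search(r"StreamTitle='([^']*)';", metadata)
if match:
    print(match.group(1))           # the current song name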
