Why does my python script not recognize speech from audio file?

Why does my python script not recognize speech from audio file? - google-cloud-speech

I have the following piece of code successfully recognizing short (less than 1 min) test audio file, but failing with recognition another long audiofile (1.5h).
from google.cloud import speech
def run_quickstart():
speech_client = speech.Client()
sample = speech_client.sample(source_uri="gs://linear-arena-2109/zoom0070.flac", encoding=speech.Encoding.FLAC)
alternatives = sample.recognize('uk-UA')
for alternative in alternatives:
print(u'Transcript: {}'.format(alternative.transcript))
with open("Output.txt", "w") as text_file:
for alternative in alternatives:
text_file.write(alternative.transcript.encode('utf8'))
if __name__ == '__main__':
run_quickstart()
Both files are uploaded to Google Cloud.
The first one:
https://storage.googleapis.com/linear-arena-2109/sample.flac
The second one:
https://storage.googleapis.com/linear-arena-2109/zoom0070.flac
Both were converted from mp3 with ffmpeg utility:
ffmpeg -i sample.mp3 -ac 1 sample.flac
ffmpeg -i zoom0070.mp3 -ac 1 zoom0070.flac
First file was successfully recognized, but second file outputs the following error:
google.gax.errors.RetryError: GaxError(Exception occurred in retry method that was not classified as transient, caused by <_Rendezvous of RPC that terminated with (StatusCode.INVALID_ARGUMENT, Sync input too long. For audio longer than 1 min use LongRunningRecognize with a 'uri' parameter.)>)
But I have already used uri parameter in my python script. What is wrong?
update
#NieDzejkob helped to understand the error. So, method long_running_recognize should be used instead of recognize. The comprehensive long_running_recognize usage example can be found on the corresponding document page

For any audio file longer than 1 minute, you need to use Asynchronous Speech Recognition and the file has to be uploaded to Google Cloud Storage so that you can pass in a gcs_uri.
In addition, you will need to use the .long_running_recognize method in your script. An example from GCP documentation can be found here.
I realize that OP figured it out but thought it would be useful to provide an answer and generalize it a bit.

Related

NVIDIA DALI : unable to load videos using readers.video in NVIDIA DALI pipeline

Trying to load the video for NVIDIA DALI pipeline for video processing but not able to load the .mp4 video.
import os
import numpy as np
from nvidia.dali import pipeline_def
import nvidia.dali.fn as fn
import nvidia.dali.types as types
batch_size=2
sequence_length=8
initial_prefetch_size=16
video_directory=['sintel_trailer-720p_0.mp4']
n_iter=6
print(video_directory)
#pipeline_def
def video_pipe(file_root):
video, labels = fn.readers.video(device="gpu", file_root=file_root, sequence_length=sequence_length,
random_shuffle=True, initial_fill=initial_prefetch_size)
return video, labels
pipe = video_pipe(batch_size=batch_size, num_threads=2, device_id=0, file_root=video_directory, seed=12345)
pipe.build()
Above DALI pipeline shows the following issue while loading the video:
RuntimeError: Critical error when building pipeline: Error when
constructing operator: readers__Video encountered:
[/opt/dali/dali/operators/reader/loader/video_loader.cc:117] Assert on
"dir != nullptr" failed: Directory ['sintel_trailer-720p_0.mp4'] could
not be opened.
I have referred the documentation from NVIDIA DALI for video processing but not to able solve,
Please check for reference : NVIDIA DALI DOCS VIDEO PROCESSING

The file_root argument points to the root directory, where DALI should search for videos, and the file_list argument should point to a file listing all samples to be loaded.
However, from your example, the filenames argument must be the one that suits your needs better.
Your example should work as expected, with the following pipeline definition:
#pipeline_def
def video_pipe(file_root):
video, labels = fn.readers.video(device="gpu", filenames=file_root, labels=[], sequence_length=sequence_length,
random_shuffle=True, initial_fill=initial_prefetch_size)
return video, labels
I added the labels argument too. Without it, the operator returns just one output. Please see the DALI manual if you want to understand the operator better.

After some research and forum discussion from NVIDIA DALI got this Answer, Please refer to issues/3503 the link for a detailed answer discussion.
Thank you

Rospy message_filter ApproximateTimeSynchronizer issue

I installed ROS melodic version in Ubuntu 18.04.
I'm running a rosbag in the background to mock cameras in messages rostopics.
I set the camera names in rosparams and iterated through it to capture each camera topics.
I'm using message_filter ApproximateTimeSynchronizer to get time synchronized data as mentioned in the official documentation,
http://wiki.ros.org/message_filters
But most of the time the callback function to ApproximateTimeSynchronizer is not being called/is having delay. The code snippet I'm using is given below:
What am I doing wrong here?
def camera_callback(*args):
pass # Other logic comes here
rospy.init_node('my_listener', anonymous=True)
camera_object_data = []
for camera_name in rospy.get_param('/my/cameras'):
camera_object_data.append(message_filters.Subscriber(
'/{}/hd/camera_info'.format(camera_name), CameraInfo))
camera_object_data.append(message_filters.Subscriber(
'/{}/hd/image_color_rect'.format(camera_name), Image))
camera_object_data.append(message_filters.Subscriber(
'/{}/qhd/image_depth_rect'.format(camera_name), Image))
camera_object_data.append(message_filters.Subscriber(
'/{}/qhd/points'.format(camera_name), PointCloud2)
topic_list = [filter_obj for filter_obj in camera_object_data]
ts = message_filters.ApproximateTimeSynchronizer(topic_list, 10, 1, allow_headerless=True)
ts.registerCallback(camera_callback)
rospy.spin()

Looking at your code, it seems correct. There is, however, a trouble with perhaps bad timestamps and ergo this synchronizer as well, see http://wiki.ros.org/message_filters/ApproximateTime for algorithm assumptions.
My recommendation is to write a corresponding node that publishes empty versions of these four msgs all at the same time. If it's still not working in this perfect scenario, there is an issue with the code above. If it is working just fine, then you need to pay attention to the headers.
Given that you have it as a bag file, you can step through the msgs on the command line and observe the timestamps as well. (Can also step within python).
$ rosbag play --pause recorded1.bag # step through msgs by pressing 's'
On time-noisy msgs with small payloads, I've just written a node to listen to all these msgs, and republish them all with the latest time found on any of them (for sync'd logging to csv). Not optimal, but it should reveal where the issue lies.

mp3 # 0x1768ac0 Header missing

I have a school project using Opencv, a Raspberry pi 2, a Raspicam and C++. Our project needs to use a video stream from the camera to detect objects.
That's why I assume to use pipe and fifo by this line :
mkfifo fifo;
raspivid -t 0 -o fifo & ./Detection fifo
As a result I get :
[mp3 # 0x1768ac0] Header missing
But when a well saved video is sent to the program, our project perfectly runs.
Example of good behavioured instructions :
raspivid -t 10000 -o video.h264 ; ./Detection video.h264
Do someone have an idea ?
All result I found where not suitable for our project. I maybe miss some important information.
Thank you
PS : Hope I am understandable, my english is not that good

Use stderr in lua io.popen to determine faulty function call

I'm making a function that can read the metadata of the current song playing in spotify. This is being programmed in lua since it is an implementation for awesome wm. I got the following line to get all the metadata that I can later use.
handle = io.popen('qdbus org.mpris.MediaPlayer2.spotify /org/mpris/MediaPlayer2 org.mpris.MediaPlayer2.Player.Metadata | awk -F: \'{$1=\"\";$2=\"\";print substr($0,4)}\'')
However when Spotify is not running I don't get the expected information and qdbus writes an error to the stderr stream. I wanted to use the fact that qdbus writes to the error stream to determine a fault and stop the program there. (This should also catch any other errors not related to wheter spotify is running or not)
My understanding is that lua popen uses popen3 that can subdivide between stdout and stderr. but all my efforts so far are fruitless and my error stream is always empty. Is it possible to check for a non nil value in the stderr in order to determine a faulty call to qdbus (or awk)?
thanks!

I think you can redirect stderr to stdout in the call to popen like this:
handle = io.popen("somecommand 2>&1")

If you want to differentiate stderr and stdout, you cannot do it with the io library but you can with luaposix. See this answer for instance.

You can checkout juci.exec which I wrote for JUCI webgui. I struggled with the same problem and I ended up using luaposix for this kind of thing when I really need two separate streams. My implementation also gives you the program exit code which is good for testing for errors: https://github.com/mkschreder/juci/blob/master/juci/lua/core.lua

Is there a good way to tell if HandBrakeCLI actually encoded anything?

I'm working on a system to convert a bunch of .mov files to H.264 (using HandBrakeCLI) and webm (using ffmpeg) as the .mov files are created. In general, things are going very well. I'm hung up on a bit of error detection. I want to know if one of the encodings failed so that we can investigate, try again, etc.
To test encoding failure, I copied a text file into a file with a .mov extention, and set the programs about trying to encode it. Naturally, they both fail to encode the file (I'm not sure what "success" would mean in this context...) However, while ffmpeg reports this failure by setting its exit code to 1, HandBrakeCLI sets the exit code to 0, because it exited cleanly. This is consistent with the HandBrakeCLI documentation but it leaves me wondering how I can tell if HandBrakeCLI knows if it failed to encode anything. That same documentation page suggests "If you want to monitor HandBrake's process, you should monitor the standard pipes", so I'm now getting the effect that I want by doing something like this:
HandBrakeCLI --preset 'Normal' --input bad.mov --output out.mv4 2>&1 | grep 'Encode done'
grep then sets its exit code to 0 if it found a match, and 1 if it didn't. But, this seems rather barbaric: for instance, the text "Encode done!" could change in a future release of HandBrake.
So, anyone have a better way to tell if HandBrake encoded something or not?
Some edited shell output is included below for reference...
$ ffmpeg -i 'develqueuedir/B_BH_120409.mov' 'develqueuedir/B_BH_120409.webm'
FFmpeg version 0.6.4-4:0.6.4-0ubuntu0.11.04.1, Copyright (c) 2000-2010 the Libav Developers
[snip]
develqueuedir/B_BH_120409.mov: Invalid data found when processing input
$ echo $?
1
$ HandBrakeCLI --preset 'Normal' --maxWidth 720 --optimize --input 'develqueuedir/B_BH_120409.mov' --output 'develqueuedir/B_BH_120409.mv4'
Output format couldn't be guessed from file name, using default.
[11:45:45] hb_init: starting libhb thread
HandBrake 0.9.6 (2012022900) - Linux x86_64 - http://handbrake.fr
Opening develqueuedir/B_BH_120409.mov...
[snip]
[11:45:45] libhb: scan thread found 0 valid title(s)
No title found.
HandBrake has exited.
$ echo $?
0

Short answer is no , you can find detailed explanation at HandBrake forum https://forum.handbrake.fr/viewtopic.php?f=12&t=18559&p=85529&hilit=return+code#p85529
adddition:
I think that there is a patch from user fonkprop that is rejected by developers , if you really need it contact that guy

Good news! It appears that this feature is about to be implemented in HandBrake-CLI 0.10. As you can read on the roadmap for the 0.10 milestone:
Basic support for return codes from the CLI. (0 = No Error, 1 = Cancelled, 2 = Invalid Input, 3 = Initialization error, 4 = Unknown Error")

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart