I am trying to transcribe audio from a stream using this tutorial (section, "Performing streaming speech recognition on a local file"): https://cloud.google.com/speech-to-text/docs/streaming-recognize
The file is an M3U file, so I am trying to use the RecognitionConfig.AudioEncoding.MP3 option, but the MP3 attribute is being rejected. When I try to autocomplete the option, MP3 does not appear either.
The documentation shows that the MP3 attribute is only available in version v1beta1 (https://cloud.google.com/text-to-speech/docs/reference/rpc/google.cloud.texttospeech.v1beta1#google.cloud.texttospeech.v1beta1.AudioEncoding), and I have already run the pip upgrade.
Is there something else I need to do to install v1beta1?
Note that the second link you shared, regarding v1beta1, is for the Text-to-Speech API, which is the reverse of the API used in the examples you are following (Speech-to-Text).
In that case, to use RecognitionConfig.AudioEncoding.MP3, you'll need to use the v1p1beta1 version instead. No changes are needed to the pip command (pip install --upgrade google-cloud-speech) but you need to import the right version (speech_v1p1beta1) in your Python code:
# [START speech_transcribe_streaming]
def transcribe_streaming(stream_file):
    """Streams transcription of the given audio file."""
    import io
    from google.cloud import speech_v1p1beta1
    from google.cloud.speech_v1p1beta1 import enums
    from google.cloud.speech_v1p1beta1 import types

    client = speech_v1p1beta1.SpeechClient()
And now you can use the MP3 encoding:
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.MP3,
        sample_rate_hertz=16000,
        language_code='en-US')
    streaming_config = types.StreamingRecognitionConfig(config=config)
Full code here but it's just the base example with the previous changes.
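If you don't want to follow the link, here is a minimal sketch of how that base example continues inside the function above, using the same v1p1beta1 types (in a real stream you would yield smaller chunks instead of one big one):
    # Continue the function above: read the file and stream it with the MP3 config.
    with io.open(stream_file, 'rb') as audio_file:
        content = audio_file.read()

    # In practice, this should be a generator yielding chunks of audio data.
    requests = (types.StreamingRecognizeRequest(audio_content=chunk)
                for chunk in [content])

    responses = client.streaming_recognize(streaming_config, requests)

    for response in responses:
        for result in response.results:
            print('Finished: {}'.format(result.is_final))
            print('Transcript: {}'.format(result.alternatives[0].transcript))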
Tested with an MP3 sample:
$ python mp3.py sample.mp3
Finished: True
Stability: 0.0
Confidence: 0.9875912666320801
Transcript: I'm sorry Dave I'm afraid I can't do that
I just want to extract the audio (Opus codec) from a WebM file.
I tried to research what the WebM format is and how to parse it, but I couldn't find good information.
I read that the WebM format is derived from MKV, so should I study MKV first?
There is just one GitHub project, but I can't figure out how to parse the audio out of a WebM file with it:
https://github.com/webmproject/libwebm/tree/master/webm_parser
You're really going to want MKVToolNix, which includes the mkvextract tool mentioned in another answer.
MKVToolNix is actually a suite of tools (mkvmerge, mkvinfo, mkvextract, mkvpropedit). First, you asked how to parse the info. You can see the details using:
mkvinfo file.webm
mkvinfo file.webm -a
The first command will parse the overall structure. The second gives the detail of each frame. Use the --help switch if you want all commands.
To extract the audio, do
mkvextract file.webm tracks X:newfile.opus
where X is the track number you identified with mkvinfo. WebM and MKV files can contain multiple tracks. "newfile.opus" is the output file to create; choose whatever name you want.
There is also a MKVToolNix GUI, but I've never used it.
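If you would rather drive this from code instead of the shell, one option is simply to wrap the command-line tools. Here is a small Python sketch, assuming mkvinfo and mkvextract are on your PATH and that you have already confirmed the audio track number with mkvinfo:
import subprocess

def extract_opus(webm_path, out_path="audio.opus", track=0):
    """Extract one audio track from a WebM/MKV file using mkvextract."""
    # Print the container structure so you can confirm the track number.
    subprocess.run(["mkvinfo", webm_path], check=True)
    # Extract the chosen track into an Ogg Opus file.
    subprocess.run(
        ["mkvextract", webm_path, "tracks", "{}:{}".format(track, out_path)],
        check=True,
    )

extract_opus("file.webm")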
mkvextract can extract the audio for you, and I recommend having a look at the MKVToolNix source code.
For example, you can extract audio from a WebM file into an Ogg Opus file like this:
$ mkvextract ~/audio/bubbles.webm tracks 0:audio.opus
Extracting track 0 with the CodecID 'A_OPUS' to the file 'audio.opus'. Container format: Ogg (Opus in Ogg)
Progress: 100%
I have added this (https://github.com/kewlbear/FFmpeg-iOS-build-script) version of ffmpeg to my project. I can't see the entry point to the library in the headers included.
How do I get access to the same text command based system that the stand alone application has, or an equivalent?
I would also be happy if someone could point me towards documentation that allows you to use FFmpeg without the command line interface.
This is what I am trying to execute (I have it working on Windows and Android using the CLI version of ffmpeg):
ffmpeg -framerate 30 -i snap%03d.jpg -itsoffset 00:00:03.23333 -itsoffset 00:00:05 -i soundEffect.WAV -c:v libx264 -vf fps=30 -pix_fmt yuv420p result.mp4
Actually, you can build the ffmpeg library so that it includes the ffmpeg binary's code (ffmpeg.c). The only thing to take care of is renaming the function main(int argc, char **argv), for example to ffmpeg_main(int argc, char **argv); then you can call it with arguments just as if you were executing the ffmpeg binary. Note that argv[0] should contain the program name; just "ffmpeg" should work.
The same approach was used in the library VideoKit for Android.
To do what you want, you have to use your compiled FFmpeg library in your code.
What you are looking for is exactly the code provided in the FFmpeg documentation, libavformat/output-example.c (which means the AVFormat and AVCodec FFmpeg libraries in general).
Stack Overflow is not a "do it for me, please" platform, so I prefer to explain here what you have to do, and I will try to be precise and answer all your questions.
I assume that you already know how to link your compiled (static or shared) library to your Xcode project, this is not the topic here.
So, let's talk about this code. It creates a video (containing a randomly generated video stream and audio stream) based on a duration. You want to create a video based on a list of pictures and a sound file. Perfect, there are only three main modifications you have to make:
The end condition is not reaching a duration, but reaching the end of your file list (in the code there is already a #define STREAM_NB_FRAMES you can use to iterate over all your frames).
Replace the dummy void fill_yuv_image with your own method that loads and decodes an image buffer from a file.
Replace the dummy void write_audio_frame with your own method that loads and decodes the audio buffer from your file.
(You can find a "how to load audio file content" example in the documentation starting at line 271, easily adaptable for video content.)
Comparing this code to your CLI command, you can see that:
const char *filename; in the main should be your output file, "result.mp4".
#define STREAM_FRAME_RATE 25 (replace it with 30).
For MP4 generation, video frames will be encoded in H.264 by default (in this code, the GOP is 12), so there is no need to specify libx264.
#define STREAM_PIX_FMT PIX_FMT_YUV420P represents your desired yuv420p pixel format.
Now, with these official examples and the related documentation, you can achieve what you want. Be aware that there are some differences between the FFmpeg version used in these examples and the current FFmpeg version. For example:
st = av_new_stream(oc, 1); // line 60
Could be replaced by:
st = avformat_new_stream(oc, NULL);
st->id = 1;
Or:
if (avcodec_open(c, codec) < 0) { // line 97
Could be replaced by:
if (avcodec_open2(c, codec, NULL) < 0) {
Or again:
dump_format(oc, 0, filename, 1); // line 483
Could be replaced by:
av_dump_format(oc, 0, filename, 1);
Or CODEC_ID_NONE replaced by AV_CODEC_ID_NONE, etc.
Ask your questions, but you got all the keys! :)
MobileFFMpeg is an easy-to-use pod for this purpose. Instructions on how to use MobileFFMpeg are at: https://stackoverflow.com/a/59325680/1466453
MobileFFMpeg gives you a very simple way to run ffmpeg commands from your iOS Objective-C program.
Virtually all ffmpeg commands and switches are supported. However, you have to get the pod with the appropriate license, e.g. min-gpl will not give you the features of libiconv. libiconv is covered in the video, gpl, and full-gpl licenses.
Please highlight if you have specific issues regarding the use of MobileFFMpeg.
I am trying to read the following video, downloaded from http://www.sample-videos.com/,
which is http://www.sample-videos.com/video/mp4/720/big_buck_bunny_720p_5mb.mp4
Here is my code:
import cv2

cap = cv2.VideoCapture('big_buck_bunny_720p_5mb.mp4')
if cap.isOpened() == False:
    print("Error opening video stream or file")

count = 0
while cap.isOpened():
    # capture frame by frame
    ret, frame = cap.read()
    if ret == True:
        # Display the resulting frame
        cv2.imshow('Frame', frame)
        cv2.imwrite("frame%d.jpg" % count, frame)
        count += 1
        print(count)
However, I get "Error opening video stream or file" at cap = cv2.VideoCapture('big_buck_bunny_720p_5mb.mp4'),
and ret always equals False.
My OpenCV version is 3.1.0
There may be one of the following issues on your machine (a quick check for the first two is sketched after this list):
configure the video path
check the permission to access the file
install an additional codec
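As a quick sanity check for the first two points, a short Python snippet like the one below can rule out a wrong path or a missing read permission (the path is whatever you pass to cv2.VideoCapture):
import os

path = 'big_buck_bunny_720p_5mb.mp4'

# Does the file exist where OpenCV is looking for it?
print("exists:", os.path.exists(path))

# Does the current user have read permission on it?
print("readable:", os.access(path, os.R_OK))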
You might have installed OpenCV, but there are some prerequisites that need to be installed before reading an .mp4 video file with OpenCV.
You can verify this by simply reading an .avi file and an .mp4 file
(it can read the .avi file but not the .mp4 file).
To read an .mp4 file:
Install an ffmpeg package compiled with the H.264 codec:
H.264/MPEG-4 Part 10 or AVC (Advanced Video Coding) is a standard for video compression, and is currently one of the most commonly used formats for the recording, compression, and distribution of high definition video.
Ref : https://www.debiantutorials.com/how-to-install-ffmpeg-with-h-264mpeg-4-avc/
A few suggestions to make sure all prerequisites are available:
1. Check whether an ffmpeg package compiled with H.264 is already installed on the machine using the command below (you can also check the OpenCV build itself from Python, as sketched after this list):
ffmpeg -version
2. Installing OpenCV through Anaconda reduces the hassle of installing an ffmpeg package compiled with H.264.
3. Make sure that the user on the machine has enough privileges to read and write in the relevant application directories.
a. Check the read and write permissions using one of the commands below:
ls -ld <folder-path>
or
namei -mo <folder-path>
b. Alter the access rights based on the user privileges required (sudo access is needed; otherwise, ask an admin to alter the permissions),
e.g.: sudo chmod -R 740 <folder-path> [recursive rwx for the user, r for the group]
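As mentioned in point 1, you can also verify directly from Python whether your OpenCV build was compiled with FFmpeg support (and can therefore decode .mp4 files). A minimal sketch using cv2.getBuildInformation():
import cv2

# Print only the FFMPEG-related lines of the build information; if they say
# "NO", this OpenCV build cannot decode .mp4 files via FFmpeg.
for line in cv2.getBuildInformation().splitlines():
    if "FFMPEG" in line:
        print(line.strip())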
I want to extract images from an m4v video sent from a mobile device to my Rails server. These images will later be used for face recognition purposes. There is a gem called "streamio-ffmpeg" that does this job nicely and easily, but the problem is that it does not support JRuby 1.7.13, which I am currently using on my server. It's a big application, and upgrading the JRuby version is not desirable at the moment.
Can someone please suggest JRuby 1.7.13-compatible alternative solutions/gems to extract the images from a video file?
From the source code, it looks like streamio-ffmpeg logs the underlying command by default:
FFMPEG.logger.info("Running transcoding...\n#{command}\n")
So all you have to do is execute:
movie.screenshot("screenshot_%d.jpg", { vframes: 50, frame_rate: '6/2' }, validate: false)
on a system where streamio-ffmpeg is installed.
You look at the output, extract the command, and use it somewhere else with:
system("ffmpeg arguments_you_extracted_from_the_logs")
without having to install streamio-ffmpeg.
I followed this page:
https://cloud.google.com/speech/docs/getting-started
and I could reach the end of it without problems.
In the example though, the file
'uri':'gs://cloud-samples-tests/speech/brooklyn.flac'
is processed.
What if I want to process a local file? If this is not possible, how can I upload my .flac via the command line?
Thanks
You're now able to process a local file by specifying a local path instead of the Google Storage one:
gcloud ml speech recognize '/Users/xxx/cloud-samples-tests/speech/brooklyn.flac' \
    --language-code='en-US'
You can send this command by using the gcloud tool (https://cloud.google.com/speech-to-text/docs/quickstart-gcloud).
Solution found:
I created my own bucket (my_bucket_test) and uploaded the file there via:
gsutil cp speech.flac gs://my_bucket_test
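If the audio is short, you can also skip the bucket entirely and send the local file's bytes in a synchronous recognize call. This is a minimal sketch using the google-cloud-speech Python client; the FLAC encoding and file name are assumptions based on the question:
from google.cloud import speech

client = speech.SpeechClient()

# Read the local file and send its raw bytes instead of a gs:// URI.
with open("speech.flac", "rb") as audio_file:
    content = audio_file.read()

audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    language_code="en-US",
)

response = client.recognize(config=config, audio=audio)
for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))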
If you don't want to create a bucket (which costs extra time and money), you can stream the local file instead. The following code is copied directly from the Google Cloud docs:
def transcribe_streaming(stream_file):
    """Streams transcription of the given audio file."""
    import io

    from google.cloud import speech

    client = speech.SpeechClient()

    with io.open(stream_file, "rb") as audio_file:
        content = audio_file.read()

    # In practice, stream should be a generator yielding chunks of audio data.
    stream = [content]

    requests = (
        speech.StreamingRecognizeRequest(audio_content=chunk) for chunk in stream
    )

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )

    streaming_config = speech.StreamingRecognitionConfig(config=config)

    # streaming_recognize returns a generator.
    responses = client.streaming_recognize(
        config=streaming_config,
        requests=requests,
    )

    for response in responses:
        # Once the transcription has settled, the first result will contain the
        # is_final result. The other results will be for subsequent portions of
        # the audio.
        for result in response.results:
            print("Finished: {}".format(result.is_final))
            print("Stability: {}".format(result.stability))
            alternatives = result.alternatives
            # The alternatives are ordered from most likely to least.
            for alternative in alternatives:
                print("Confidence: {}".format(alternative.confidence))
                print(u"Transcript: {}".format(alternative.transcript))
Here is the URL in case the package's function names change over time: https://cloud.google.com/speech-to-text/docs/streaming-recognize