Using avconv, how do you make a LINEAR16 file for Google Speech to Text?

I'm trying to use avconv to make a LINEAR16 raw file for Google's Speech to Text, but whenever I try, the resulting file plays back very slowly when I use the play command from the documentation:
play --rate=16000 --bits=16 --endian=little --encoding=signed-integer --channels=1 out.raw
What's the right way to make this kind of a conversion?

It took some experimentation, but I was able to get it working by explicitly stating the sample rate, number of channels, and output format:
avconv -i michael_queen_v._ed_schultz_cl.mp3 -f s16le -ac 1 -ar 16k out.raw
-f s16le: Forces the output format to signed 16-bit little-endian raw PCM, since the .raw extension alone isn't enough for avconv to infer what to write.
-ac 1: Mono.
-ar 16k: Sets the sample rate to 16,000 Hz, matching the --rate=16000 in the play command. (It also sounds like a gun, which is depressing.)
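On systems where avconv has been superseded by ffmpeg, the same flags should carry over unchanged; a sketch, assuming a hypothetical input.mp3:
ffmpeg -i input.mp3 -f s16le -ac 1 -ar 16000 out.raw
You can then verify the result with the same play command from the question.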

Related

Video Morph Between Two Images, FFMPEG/Minterpolate

I am trying to make a quick and easy morph video using two frames (png images) with ffmpeg's minterpolate filter, in a bash script on Ubuntu Linux. The intent is to use the morphs as transitions between similar video in a different video editor later.
It works with 3+ frames/images, but fails with just 2 frames/images.
First, the code that works: 3 frames
This is using three 1080p png files:
test01_01.png
test01_02.png
test01_03.png
input01="test01_%02d.png"
ffmpeg -y -fflags +genpts -r 30 -i $input01 -vf "setpts=100*PTS,minterpolate=fps=24:scd=none" -pix_fmt yuv420p "test01.mp4"
This takes a bit of processing time, then creates a 414 KB, roughly three-second mp4 video of a morph that starts with the first frame, morphs to the second, then morphs to the third.
The code that fails: 2 frames
This is using just two of the same 1080p png files:
test02_01.png
test02_02.png
input01="test02_%02d.png"
ffmpeg -y -fflags +genpts -r 30 -i $input01 -vf "setpts=100*PTS,minterpolate=fps=24:scd=none" -pix_fmt yuv420p "test02.mp4"
This almost immediately creates a corrupt 262-byte mp4 file. There are no differences except the number of frames.
Things I've tried:
I have tried this with the Ubuntu default repo version of ffmpeg, and with the static 64-bit 5.0 and git-20220108-amd64 builds, all with the same result.
I have also tried with a 2-frame mp4 file as the input, with the same result.
Thoughts?
Is this a bug in ffmpeg or am I doing something wrong?
I am also open to any suggestions for creating a morph like this using other Linux-compatible software.
Thank you for any insight!
It is not documented, but it looks like the minterpolate filter requires at least 3 input frames.
We can work around this by creating a longer video from 5 input frames and keeping only the relevant part.
To get the same output as applying the minterpolate filter to only two input images, we may use the following solution:
Define two input streams:
Set test02_01.png as the first input and test02_02.png as the second input.
Loop each image at least twice, using -stream_loop
(test02_01.png is repeated twice and test02_02.png is repeated 3 times).
Set the input frame rate to 0.3 fps (it is equivalent to -r 30 and setpts=100*PTS).
The input arguments are as follows: -r 0.3 -stream_loop 1 -i test02_01.png -r 0.3 -stream_loop 2 -i test02_02.png.
Concatenate the two input streams using the concat filter.
Apply the minterpolate filter to the concatenated output.
The output of the above stage is a video with a few redundant seconds at the beginning and a few at the end.
Apply the trim filter to keep only the relevant part.
Add setpts=PTS-STARTPTS at the end (as recommended when using the trim filter).
Suggested command:
ffmpeg -y -r 0.3 -stream_loop 1 -i test02_01.png -r 0.3 -stream_loop 2 -i test02_02.png -filter_complex "[0][1]concat=n=2:v=1:a=0[v];[v]minterpolate=fps=24:scd=none,trim=3:7,setpts=PTS-STARTPTS" -pix_fmt yuv420p test02.mp4
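As a sanity check on the trim window, ffprobe (from the ffmpeg suite) can report the duration and frame count of the result; given trim=3:7 and fps=24, you should expect roughly 4 seconds and 96 frames:
ffprobe -v error -select_streams v:0 -show_entries stream=nb_frames:format=duration -of default=noprint_wrappers=1 test02.mp4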
Sample output (animated GIF) and the two input images, test02_01.png and test02_02.png, are omitted here.

iOS CoreAudio Producing Low Quality vs Comparable FFmpeg Function?

In my iOS app, I'm converting an MP3 file to a 16-bit mono 22,050 Hz WAV file. I'm defining the output like this:
AudioStreamBasicDescription outputFormat = new AudioStreamBasicDescription();
outputFormat.setFormat(AudioFormat.LinearPCM);
outputFormat.setFormatFlags(AudioFormatFlags.Canonical);
outputFormat.setBitsPerChannel(16);
outputFormat.setChannelsPerFrame(1);
outputFormat.setFramesPerPacket(1);
outputFormat.setBytesPerFrame(2);
outputFormat.setBytesPerPacket(2);
outputFormat.setSampleRate(22050);
In a different project, I'm using FFmpeg to accomplish the exact same thing with the exact same input - convert an MP3 file to a 16-bit mono 22,050Hz PCM (not WAV but close enough) file:
ffmpeg -i input.mp3 -f s16le -acodec pcm_s16le -ac 1 -ar 22050 output.raw
The file produced with FFmpeg sounds very similar if not identical to the MP3 file. The file produced by iOS CoreAudio sounds noticeably low-quality, though.
I'm wondering if there are some parameters or something I'm missing with my iOS setup?

Convert a series of jpg into an mov file in Ruby (or using any language)

I am making a site in Ruby in which I have a series of images (almost like a PowerPoint), and I need to automatically convert those images into one continuous video file (mov, mpeg) that shows each image for 5 seconds or so. Anyone have any clues where to start?
I'm also open to using another language if there are tools to get the job done.
You could probably use FFmpeg to do this. Here's an example from the FFmpeg Wiki on the subject:
ffmpeg -framerate 1/5 -i img%03d.jpg -c:v libx264 -r 30 -pix_fmt yuv420p -movflags +faststart out.mp4
What this would do is...
-framerate 1/5
Read input at a rate of one frame per five seconds...
-i img%03d.jpg
...from a series of JPEG files named img001.jpg, img002.jpg and so on...
-c:v libx264
...then turn it into H.264/MPEG-4 AVC...
-r 30
...at thirty frames per second...
-pix_fmt yuv420p
...with the yuv420p pixel format (other FFmpeg pixel formats work here too, but yuv420p gives the broadest player compatibility)...
-movflags +faststart
...after encoding completes, relocate some data to the beginning of the file so playback can begin before the file is completely downloaded...
out.mp4
...and store it into out.mp4.
If you were using this from Ruby you'd likely launch a subprocess. The flags would be similar if you really want a (QuickTime) .mov file instead of H.264 MPEG-4.
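If you do want a .mov container, the flags should be nearly identical, since ffmpeg picks the QuickTime muxer from the output extension; a sketch, not tested:
ffmpeg -framerate 1/5 -i img%03d.jpg -c:v libx264 -r 30 -pix_fmt yuv420p out.mov
From Ruby, you would pass such a command line to a subprocess (e.g. via system or Open3).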

Something wrong with my m3u8 bandwidth value

I use ffmpeg to encode my sample videos following the recommended bitrates in Technical Note TN2224, then use the HLS tools to segment them and create playlists, and finally create the variant playlist file "all.m3u8".
I used the validation tool to validate my HLS content. It reported that, aside from the 64k audio-only stream (whose bandwidth is low), all the other streams show the same bandwidth. I opened "all.m3u8" in a text editor and saw that all the other bitrate variants use the same bandwidth value. No matter how I change the parameters in the ffmpeg command, I can't correct them. The following command is the one I used to encode the content:
ffmpeg -i input.m4v -acodec libfaac -vcodec libx264 -s 480x360 -b 350k -r 29.97 -vpre medium output.mp4
The following command is for generating the segments and playlists:
mediafilesegmenter -b http://www.example.com/stream/ -I -f ~/Documents/sample/ output.mp4
The following command is for generating all.m3u8:
variantplaylistcreator -o all.m3u8 http://www.example.com/stream/110/prog_index.m3u8 ~/Documents/sample/110/prog_index.m3u8 -iframe-url http://www.freeyourteam.com/stream/110/iframe_index.m3u8 http://www.example.com/stream/200/prog_index.m3u8 ~/Documents/sample/200/prog_index.m3u8 -iframe-url http://www.freeyourteam.com/stream/200/iframe_index.m3u8 http://www.example.com/stream/350/prog_index.m3u8 ~/Documents/sample/350/prog_index.m3u8 -iframe-url http://www.freeyourteam.com/stream/350/iframe_index.m3u8 http://www.example.com/stream/550/prog_index.m3u8 ~/Documents/sample/550/prog_index.m3u8 -iframe-url http://www.freeyourteam.com/stream/550/iframe_index.m3u8 http://www.example.com/stream/64/prog_index.m3u8 ~/Documents/sample/64/prog_index.m3u8
In my "all.m3u8", the bandwidths are all 523894.
Please allow me to ask two more basic questions:
In the tech note, the recommended bitrates are 64 Kbps, 110 Kbps, 200 Kbps, 350 Kbps, and 550 Kbps; I wonder whether these values include or exclude the audio bitrate.
How do you insert a keyframe into each segment? The document says: "You must include at least one keyframe per segment, preferably more. If you only include one, put it at the beginning of the segment." I don't quite get how you can do that.
Thank you very much for your help and I do appreciate your time.
Jason,
To create all.m3u8, shouldn't it be given multiple m3u8 files, each corresponding to a different bitrate?
I am guessing you run ffmpeg, say, 4 times to create the 4 bitrate files. Then you run the segmenter 4 times to create 4 sets of segments and their individual m3u8 files.
Finally, you have to tell variantplaylistcreator the locations of the various per-bitrate m3u8 files so it can create a single master m3u8 file.
E.g.:
variantplaylistcreator -o mymedia_all.m3u8 http://mywebserver/mymedia_lo/prog_index.m3u8 mymedia_lo.plist http://mywebserver/mymedia_med/prog_index.m3u8 mymedia_med.plist http://mywebserver/mymedia_hi/prog_index.m3u8 mymedia_hi.plist
I don't see you providing the various files separately. I hope you get the picture.
EDIT: To answer your other questions:
Bitrates include audio. What you need to do is ensure you have a fixed keyframe interval in your encoding; this will allow the segmenter to cut the files at regular intervals. You don't insert anything manually (see the sketch below).
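For example, with libx264 in ffmpeg you can pin the keyframe interval by fixing the GOP size and disabling scene-cut keyframes; a sketch, assuming a 29.97 fps source and roughly 3-second keyframe spacing (90 frames), with hypothetical file names:
ffmpeg -i input.m4v -acodec libfaac -ab 64k -vcodec libx264 -b 350k -g 90 -keyint_min 90 -sc_threshold 0 output.mp4
The segmenter can then cut cleanly at those regularly spaced keyframes.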
Out of curiosity, why not use ffmpeg directly to produce the segmented output files? It supports that as well; a sketch follows.
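Recent ffmpeg builds ship a segment muxer that writes the .ts segments and an index playlist in one pass; a rough sketch, assuming the encoded output.mp4 from above and 10-second segments:
ffmpeg -i output.mp4 -codec copy -map 0 -f segment -segment_time 10 -segment_list prog_index.m3u8 -segment_format mpegts seg%03d.ts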
Thanks for everybody's attention and suggestions. I finally figured it out. The reason the bandwidth stayed the same across different bitrates is that my ffmpeg command was missing a couple of settings. I ended up using the following command:
ffmpeg -i inputVideo.m4v -f mpegts -acodec libfaac -ar 44100 -ab 64k -vcodec libx264 -b 350k -s 480x360 -r 29.97 -flags +loop -cmp +chroma -partitions +parti4x4+partp8x8+partb8x8 -subq 5 -trellis 1 -refs 1 -coder 0 -me_range 16 -keyint_min 25 -sc_threshold 40 -i_qfactor 0.71 -bt 200k -maxrate 350k -bufsize 350k -rc_eq 'blurCplx^(1-qComp)' -qcomp 0.6 -qmin 10 -qmax 51 -qdiff 4 -level 30 -aspect 4:3 -g 30 -async 2 output.ts
I put it here so that other people who have the same problem as me will have a reference.
It sounds like you may have uncovered a bug in variantplaylistcreator. I recommend verifying that the sub-streams really are the bitrates you expect, and if the tool is really writing the wrong value, reporting it to Apple.
It might have something to do with using multiple -iframe-url options; I can't see why it would be necessary to specify it more than once. Adaptive streaming won't work if the substreams have different I-frame positions; at a minimum, all of the segment boundaries must be aligned.
If you need to fix the playlist up programmatically, I recommend using ffprobe (from the ffmpeg suite) to extract the bitrate of each substream and replacing the bandwidth number with the extracted value, as sketched below.
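A minimal sketch of that extraction step, assuming a hypothetical variant at 350/output.ts:
ffprobe -v error -show_entries format=bit_rate -of default=noprint_wrappers=1:nokey=1 350/output.ts
This prints the overall bitrate in bits per second, which is the unit the EXT-X-STREAM-INF BANDWIDTH attribute expects.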

Is it possible to retrieve frames as images out of FMS live stream?

Has anyone tried this ?
What's the best practice for this?
FMS live streams use the RTMP protocol:
ffmpeg -i rtmp://server/path -acodec copy -vcodec copy -y captured.flv
Here, we are saving the whole stream to an FLV file, which is Flash's static movie file format and so can always preserve all RTMP audio and video codecs without conversion.
You can then extract any frames you want, e.g.
ffmpeg -i captured.flv -ss starttime -vframes 1 -f image2 -vcodec mjpeg captured.jpg
If you are ambitious and know exactly what time offsets and intervals you want to capture in advance, you can do both steps at once, e.g. one frame every second:
ffmpeg -i rtmp://server/path -r 1 -f image2 -vcodec mjpeg captured%d.jpg
These command lines have not been tested and will need some fixing, but they should give you a good starting point.
