Google Speech API Won't Accept Large Audio Files - google-cloud-speech

I'm receiving a server error when trying to process large audio files. The audio files are originally audio/m4a @ 32kHz and, per the recommendations in the documentation, I am converting/compressing them to audio/amr_wb @ 16kHz. These files are well below the 180-minute audio limit, yet I'm still receiving a server error when processing them.
GaxError Exception occurred in retry method that was not classified as transient, caused by 8:Received message larger than max (5371623 vs. 4194304)
I'm using version V1p1beta and the method long_running_recognize to transcribe these audio files. My files are hosted on Google Cloud Storage and I'm providing the URI in my API call.
How can I send large audio files to the API without the server enforcing a size restriction? It seems wrong to recommend FLAC or WAV and to advertise a 180-minute audio limit if the server can't even handle my hour-long audio file that has been encoded to AMR_WB.
Thanks for any help

The Speech-to-Text API has now released the v1 endpoint, so I suggest trying that version. I was able to get a proper response with a 90-minute audio file.

Related

Does AVPlayer support live footage served directly from a fragmented MP4 file?

Overview
I have a server generating a livestream of video that is exposed as a fragmented MP4 file.
That file is being served to an iOS emulator trying to play the video using react-native-video, which I believe uses AVPlayer.
The first request the emulator makes is a range request for bytes 0-1. I record the X-Playback-Session-Id and respond with: 206 partial content, bytes 0-1, and the content-range bytes 0-1/*. According to the specification, the size of * indicates that the value is unknown.
I then receive an error on the AVPlayer stating that the server is not correctly configured. According to the Apple docs, this indicates the server does not support range requests.
I have implemented support for range requests. As an experiment, I set the content-range to respond with a very large size instead of * (bytes 0-1/17179869176), which works to an extent. The AVPlayer follows through with multiple range requests for different byte ranges (0-17179869175), though sometimes it only requests a single range.
This buffers for a while and displays nothing until I stop the server (with a breakpoint); a short while after, the video stops buffering (but does not close any active connections) and plays what it has loaded so far. Given that this is a livestream, that's not acceptable.
Playing the livestream inside Chrome or an Android emulator works exactly as I'd expect: the video is played as soon as it gets all the necessary data. But Chrome also does not require any support for byte-range requests to be able to play a video.
I can understand that without any source of content-length the AVPlayer is unable to make range requests as it doesn't know where the file ends. However, as the media I'm exposing is a live stream I don't have a meaningful content-length to give it. So there must be something I can specify either in headers on the server or as AVPlayer settings on the client that states the video is a livestream and so cannot be handled through range requests, or that it must request chunks of footage at a time.
I've looked online and found some useful documents on the subject of livestreaming, though all of them are about HLS and m3u playlist files. However, changing the back-end to generate m3u playlist files and to decode the video to work out the durations for the chunks correctly would probably take weeks or months more of development time, and I don't understand why it'd be necessary, given that I'm only exposing a single resolution of a single video file that does not need to seek, and that it works perfectly fine on Android.
After having spent so long and having come across so many different hard to resolve issues it's starting to feel like I've somehow gone down the wrong path and that I'm going about this completely the wrong way.
My question is two-fold
Does AVPlayer support live footage served directly from a fragmented MP4 file?
If so, how do I implement it?

How can I stream MP4 videos from S3 without AVPlayer downloading the files before playing them?

I have a lot of long (45 mins - 90 mins) MP4 videos in a public S3 bucket and I want to play them in my iOS app using AVPlayer.
I am using AVPlayerViewController to play them but I need to wait several minutes before they start playing as it downloads the whole video rather than streaming it.
I am caching it locally so this is only happening the first time but I would love to stream the video so the user doesn't have to wait for the entire video to download.
Some people are pointing out that I need CloudFront to stream videos, but in the documentation I've read that this is only necessary when you have many people streaming the same file. I'm building an MVP so I only need a simple solution.
Is there any way to stream an MP4 video from an S3 bucket with AVPlayerViewController without it fully downloading the file before playing it to the user?
TLDR
AVPlayer does not support 'streaming' (HTTP range requests) as you would define it, so either use an alternative video player that does or use a real media streaming protocol like HLS which is supported by AVPlayer & would start the video before downloading it all.
CloudFront is great for delivery in general but is not truly needed - you may have seen it mentioned due to CloudFront RTMP distributions but they now have been discontinued.
Detailed Answer
S3 supports a concept called byte-range fetches using HTTP range requests - you can verify this by doing a HEAD request to your video file & seeing that the Accept-Ranges header exists with a value of bytes (rather than none).
Load your MP4 file in the browser & notice that it can start as soon as you click play. You're also able to move to the end of the video file and yet, you haven't really downloaded the entire video file. HTTP range requests are what allow this mechanism to work. Small chunks of the video can be downloaded as & when the user gets to that part of the video. This saves the file server & the user bandwidth while providing a much better user experience than the client downloading the entire file.
The server would need to support byte-range fetches in the first instance before the client can then decide to make range requests (or not to). The key is that, once the server supports it, it is up to the HTTP client to decide whether it wants to fetch the data in chunks or all in one go.
This isn't really 'streaming' as you know it & are referring to in your question but it is more 'downloading the video from the server in chunks and playing it back' using HTTP 206 Partial Content responses.
You can see this in the Network tab of your browser as a series of multiple 206 responses when seeking in the video. The entire video is not downloaded but the video is streamed from whichever position that you skip to.
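To make the 206 mechanism concrete, here is a minimal sketch of what a server does with a Range header. It is not S3's implementation: parse_range and partial_response are hypothetical names, only a single byte range is handled, and an unparseable header is treated the same as an unsatisfiable one (416) to keep the example short.

```python
import re

def parse_range(header, size):
    """Return an inclusive (start, end) pair for a single 'bytes=' range, or None."""
    m = re.fullmatch(r"bytes=(\d*)-(\d*)", header.strip())
    if not m or (m.group(1) == "" and m.group(2) == ""):
        return None
    first, last = m.groups()
    if first == "":
        # Suffix form "bytes=-N": the last N bytes of the resource.
        n = int(last)
        if n == 0:
            return None
        start, end = max(size - n, 0), size - 1
    else:
        start = int(first)
        # An open-ended or too-large end is clamped to the last byte.
        end = min(int(last), size - 1) if last else size - 1
    if start > end or start >= size:
        return None
    return start, end

def partial_response(body, range_header=None):
    """Build (status, headers, payload) for a request against an in-memory body."""
    if not range_header:
        # No Range header: send everything and advertise range support.
        headers = {"Accept-Ranges": "bytes", "Content-Length": str(len(body))}
        return 200, headers, body
    rng = parse_range(range_header, len(body))
    if rng is None:
        # Simplification: malformed and unsatisfiable ranges both get 416.
        return 416, {"Content-Range": f"bytes */{len(body)}"}, b""
    start, end = rng
    payload = body[start:end + 1]
    headers = {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {start}-{end}/{len(body)}",
        "Content-Length": str(len(payload)),
    }
    return 206, headers, payload
```

Seeking in the browser corresponds to a new request such as "bytes=90-", which this sketch answers with only the tail of the body rather than the whole file.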
The problem with AVPlayer
Unfortunately, AVPlayer does not support 'streaming' using HTTP range requests & HTTP 206 Partial Content responses. I've verified this manually by creating a demo iOS app in Xcode.
This has nothing to do with S3 - if you stored these files on any other cloud provider or file server, you'd see that the file is still fully loaded before playing.
The possible solutions
Now that the problem is clear, there are 2 solutions.
Using an alternative video player
The easiest solution is to use an alternative video player which does support byte-range fetches. I'm not an expert in iOS development so I sadly can't help in recommending an alternative but I'm sure there'll be a popular library that the industry prefers over the in-built AVPlayer. This would provide you with your (extremely common) definition of 'streaming'.
Using a video streaming protocol
However, if you must use AVPlayer, the solution is to implement true media streaming with a video streaming protocol - true streaming also allows you to leverage features like adaptive bitrate switching, live audio switching, licensing etc.
There are quite a few of these protocols available like DASH (Dynamic Adaptive Streaming over HTTP), SRT (Secure Reliable Transport) & last but not least, HLS (HTTP Live Streaming).
Today, the most widely used streaming protocol on the internet is HLS, created by Apple themselves (hey, maybe the reason to not support range requests is to force you to use the protocol). Apple's own documentation is really wonderful for delving deeper if you are interested.
Without getting too much into protocol detail, HLS generally allows playback to start more quickly, makes fast-forwarding much quicker & delivers video as it is being watched for the true streaming experience.
To go ahead with HLS:
Use AWS Elemental MediaConvert to convert your MP4 file to HLS format - the resulting output will be 1 (or more) .M3U8 manifest files in addition to .ts media segment file(s)
Upload the resulting output to S3
Point AVPlayer to the .M3U8 file
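import AVFoundation
// AVURLAsset takes a URL, not a String, so build the URL first
let url = URL(string: "https://ermiya.s3.eu-west-1.amazonaws.com/videos/video1playlist.m3u8")!
let asset = AVURLAsset(url: url)
let item = AVPlayerItem(asset: asset)
...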
Enjoy near-instant loading of the video
CloudFront
In regards to Amazon CloudFront, it isn't required per se & S3 is sufficient in this case but a quick Google search will mention loads of benefits that it provides, especially caching which can help you save on S3 costs later on.
Conclusion
I would go with converting to HLS if you can, as it will yield more possibilities down the line & is a better true streaming experience in general, but using an alternative video player will work just as well given the iOS AVPlayer restrictions.
Whether to use CloudFront or not will depend on your user base, usage of S3 and other factors.
As you're creating an MVP, I would recommend just doing a batch conversion of your MP4 files to HLS format & not using CloudFront which would add additional complexity to your cloud configuration.
Like @ErmiyaEskandary said, you could just use HLS to solve your problem, which is probably a good idea, but you should not have to wait for the entire MP4 file to download before playing it with AVPlayer. The issue is actually not with AVPlayer or byte-range requests at all, but rather with how your MP4 files are formatted.
Your MP4 file could be configured incorrectly for streaming. MP4 files have a metadata section called the MOOV atom. By default, many encoders put this at the back of the file. In this case, the player would have to download the entire file before it could begin playing.
For streaming use cases, this needs to be put at the front of the file. The player then only needs to buffer the MOOV atom, and it can begin playing the video as the data is loaded.
You can use ffmpeg with the fast start flag enabled (-movflags +faststart) to move the MOOV atom to the beginning of the file.
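If you want to check which of your files are affected before re-muxing them, a rough sketch like the following walks the top-level boxes of an MP4 and reports whether moov comes before mdat. The function names are made up for this example; it reads only the box headers (so it is cheap even for long videos) and assumes a plain, non-fragmented MP4 opened in binary mode.

```python
import struct

def top_level_boxes(f):
    """Yield (type, offset, size) for each top-level box of a binary file object."""
    f.seek(0, 2)              # jump to the end to learn the file size
    end = f.tell()
    offset = 0
    while offset + 8 <= end:
        f.seek(offset)
        size, box_type = struct.unpack(">I4s", f.read(8))
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            ext = f.read(8)
            if len(ext) < 8:
                break
            size = struct.unpack(">Q", ext)[0]
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            size = end - offset
        if size < header:
            break             # malformed box, stop rather than loop forever
        yield box_type.decode("latin-1"), offset, size
        offset += size

def is_fast_start(f):
    """True if the moov box appears before the mdat box."""
    order = [box_type for box_type, _, _ in top_level_boxes(f)]
    return "moov" in order and "mdat" in order and order.index("moov") < order.index("mdat")
```

Usage would be along the lines of opening the file with open(path, "rb") and passing the handle to is_fast_start; a False result suggests the file is a candidate for the ffmpeg step above.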

Google cloud speech API streaming: RequestObserver.onCompleted() still causes "OUT_OF_RANGE: Exceeded maximum allowed stream duration" error

Streaming with the Google Cloud Speech API requires the length of a streaming session to be less than 60 seconds. To handle streaming beyond this limit, we need to split the streaming data into several chunks, using something like single_utterance, for example. When submitting such a chunk, we use "RequestObserver.onCompleted()" to mark the end of the streaming session. However, it seems the gRPC threads that were created to handle streaming keep running even after the final results are returned, causing the error "io.grpc.StatusRuntimeException: OUT_OF_RANGE: Exceeded maximum allowed stream duration of 65 seconds".
Are there any other mechanisms that could be used to properly terminate the gRPC threads, so that they won't run until the allowed 60-second limit? (It seems we could use SpeechClient.close() or SpeechClient.shutdown() to release all background resources, but that would require recreating the SpeechClient instances, which would be somewhat heavy.)
Are there any other recommended ways to stream data beyond the 60-second limit without splitting the stream into several chunks?
Parameters : [Encoding=LINEAR16,Rate=44100]
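For reference, the chunking described above can be sketched roughly as follows. This is only the bookkeeping, not the Speech client code: split_into_sessions is a hypothetical helper, mono audio is an assumption (the parameters above give only encoding and rate), and 55 seconds is an arbitrary safety margin under the 60-second limit. LINEAR16 means 2 bytes per sample, so 44100 Hz mono is 88200 bytes per second.

```python
def bytes_per_second(sample_rate_hz, channels=1, bytes_per_sample=2):
    """Raw PCM data rate; LINEAR16 uses 2 bytes per sample."""
    return sample_rate_hz * channels * bytes_per_sample

def split_into_sessions(chunks, sample_rate_hz=44100, max_seconds=55, channels=1):
    """Group raw audio chunks into lists that each stay within max_seconds of audio.

    Assumes individual chunks are small (for example 100 ms each); a single
    chunk longer than max_seconds would still end up in a session on its own.
    """
    limit = bytes_per_second(sample_rate_hz, channels) * max_seconds
    session, used = [], 0
    for chunk in chunks:
        if session and used + len(chunk) > limit:
            # Close this group; the caller would start a new streaming call here.
            yield session
            session, used = [], 0
        session.append(chunk)
        used += len(chunk)
    if session:
        yield session
```

Each yielded list would be sent over its own streaming call and completed before the next one starts; whether that avoids the lingering-thread error is exactly what the question is asking, so this only shows how to keep each session's audio under the limit.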

How to build a simple video streaming server?

I am a newbie in video streaming and I just built a sample website which plays videos. Here I just give the video file location to the video tag in HTML5. I noticed that on YouTube the video tag contains a blob URL and had a look into this. I found that the video data comes in segments and came across a term called pseudo streaming, whereas it seems like the website that I built downloads the whole file and then plays the video. I am not trying to do any live streaming, just trying to stream local videos. I thought maybe the way video data is received in segments is handled by a video streaming server. I came across the RED5 open source streaming server, but most of the examples given are for live streaming, which I am not experimenting with. It's been a few days and I am not sure whether I am on the right track.
The segmented approach you refer to is to support Adaptive Bit Rate streaming - ABR.
ABR allows the client device or player to download the video in chunks, e.g. 10-second chunks, and select the next chunk from the bit rate most appropriate to the current network conditions. See here for an example:
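As a rough illustration of that selection step (not how any particular player does it), the sketch below picks the highest bit rate that fits within a fraction of the measured throughput. pick_rendition and the 0.8 safety factor are assumptions for the example; real players also look at buffer levels and recent history.

```python
def pick_rendition(bitrates_bps, measured_bps, safety=0.8):
    """Return the highest available bit rate that fits the throughput budget.

    Falls back to the lowest bit rate when nothing fits, so playback can continue.
    """
    budget = measured_bps * safety
    fitting = [b for b in bitrates_bps if b <= budget]
    return max(fitting) if fitting else min(bitrates_bps)
```

With a ladder of 400 kbps, 1.2 Mbps and 3 Mbps and a measured 2 Mbps connection, the budget is 1.6 Mbps, so the next chunk would be requested from the 1.2 Mbps rendition.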
https://stackoverflow.com/a/42365034/334402
For your existing site, so long as your server supports range requests then you probably are not actually downloading the whole video. With Range Requests, the browser or player will request just part of the file at a time so it can start playback before the whole file is downloaded.
For MP4 files, it is worth noting that you need to have the header information, which is contained in a 'block' or 'atom' called the MOOV atom, at the start of the file rather than the end - it is at the end for regular MP4 files. There are a number of tools which will allow you to move it to the start - e.g.:
http://multimedia.cx/eggs/improving-qt-faststart/
You are definitely on the right track with your investigations - video hosting and streaming is a specialist area so it is generally easier to leverage existing streaming technologies and services rather than to build them yourself. Some good places to look to get a feel for open source solutions:
https://gstreamer.freedesktop.org
http://www.videolan.org/vlc/streaming.html

optimize upload videos in different signal strength

I have a question. My app is a short-video sharing application, just like Vine, but I run into problems when it is used in the subway or other places with a weak signal: uploads sometimes fail, which makes for a poor user experience.
I am a newbie to network programming and iOS. I did a lot of searching on Google and have a general sense of the options; let me sum up my findings, and please give some suggestions.
My requirements are: 1. support resuming when an upload is interrupted; 2. upload successfully on a weak signal. I do NOT need to think about realtime problems or how to compress the video; treating the video as a plain file is totally OK. BTW, the server is REST style and I use POST to upload data.
Questions:
Which is the better way for my requirements: using a stream (stream does NOT mean live video streaming, just a data stream like NSOutputStream & NSInputStream; the video is played only after all of it has uploaded, not played while it is still being transferred), or dividing the whole file into several chunks and uploading chunk by chunk?
Someone said that using a stream is good for resource efficiency, since the stream reads the file into memory and controls the size of the buffer, and after setting up the connection with the server we use a delegate to handle failures, so it is easy to use.
Uploading chunk by chunk is said to be good for speed, but I am puzzled by this statement: after successfully uploading one chunk, we need to release the connection resources and set up another connection before uploading the next, and I think this preparation takes time.
If uploading by chunks, which chunk size would be good? One video file is almost 1 MB; someone said 8k is a safe choice, but......
Since the app needs to adapt to different signal strengths, is there any way to do that? For example, making the chunk size depend on the bandwidth, or some other way.
Is there any private API that already supports resuming an interrupted upload, or is there any Apple API that supports this? My app needs to run on iOS 5 and above, so I can NOT use NSURLSession.
Is concurrent uploading a way to speed things up? If so, how do I implement it, and is there any API available?
Thank you in advance for helping a newbie like me. Thank you very much.
Your question covers a lot of topics. iOS doesn't have a public API to stream video (such as the FaceTime components). The main issue here is that sending frame by frame requires a lot of network traffic; if you instead use the normal video writer you get hardware compression, which is a lot better. There's more and you can check here: Realtime Audio/Video Streaming FROM iPhone to another device (Browser, or iPhone), Upload live streaming video from iPhone like Ustream or Qik, How send to stream video from iOS device to server? and here
If real time is not your problem, I would suggest you just use a good network manager such as MKNetworkKit or AFNetworking 2.0. They will take care of most of the aspects that you asked about.
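On the chunk-by-chunk option with resume, the bookkeeping is small enough to sketch independently of any networking library. This is only an outline under assumptions: plan_chunks and upload are hypothetical names, send_chunk stands in for whatever POST call you use, and the server is assumed to accept chunks addressed by byte offset, which a plain REST endpoint may not do out of the box.

```python
def plan_chunks(total_size, chunk_size, resume_from=0):
    """Yield (offset, length) pairs covering the bytes from resume_from to the end."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = resume_from
    while offset < total_size:
        length = min(chunk_size, total_size - offset)
        yield offset, length
        offset += length

def upload(data, send_chunk, chunk_size, resume_from=0, max_retries=3):
    """Send data chunk by chunk and return the offset reached.

    send_chunk(offset, payload) is a hypothetical callable that returns True
    when the server acknowledged the chunk. If a chunk keeps failing, the
    returned offset is where a later call should resume.
    """
    for offset, length in plan_chunks(len(data), chunk_size, resume_from):
        for _ in range(max_retries):
            if send_chunk(offset, data[offset:offset + length]):
                break
        else:
            return offset     # give up for now; resume from here later
    return len(data)
```

If the returned offset is smaller than the file size, store it and pass it back as resume_from once the signal returns, so only the missing part is sent again. A smaller chunk_size on a weak signal means less data is lost per failure at the cost of more requests.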

Resources