I'm currently experiencing an intermittent issue with some VOIP WebRTC voice calls.
The symptom is that the outbound audio sometimes fades in and out, sounds extremely muffled, or even disappears momentarily. The two audio files referenced here contain a snippet from a good call and a snippet from a bad call, recorded very close together. The audio is captured server side.
Good quality call - https://s3-eu-west-1.amazonaws.com/audio-samples-mlcl/Good.mp3
Poor quality call - https://s3-eu-west-1.amazonaws.com/audio-samples-mlcl/Poor.mp3
The tech stack consists of:
Electron application running on Mac/Windows
Electron wraps Chromium v66
WebRTC used within Chromium
OPUS codec used from client to server.
Wired network connection (stats show no packet loss; jitter, RTT and delay are all very low)
SRTP used for media between client and TURN server (Coturn)
Coturn
Janus WebRTC Gateway
Freeswitch
Users are on high-quality USB headsets, and we have tested headsets from several different manufacturers connected to the Mac/Windows machines.
Any ideas/help would be greatly appreciated.
This might be a result of auto gain control. Try disabling it by passing autoGainControl: false to getUserMedia. In Chrome/Electron the legacy googAutoGainControl and googAutoGainControl2 constraints might still work.
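A minimal sketch of the constraint, assuming nothing about the rest of your getUserMedia setup; the goog* names are legacy Chrome-only constraints, and where exactly they have to live has varied between Chromium versions, so treat them as best-effort:

```js
// Sketch only: request the mic with auto gain control disabled.
// `autoGainControl` is the standard constraint; the goog* names are legacy
// Chrome/Electron-only and are silently ignored where unsupported.
async function getMicWithoutAGC() {
  return navigator.mediaDevices.getUserMedia({
    audio: {
      autoGainControl: false,
      googAutoGainControl: false,
      googAutoGainControl2: false,
    },
  });
}
```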
I'm looking for a way to implement real-time streaming of video (and optionally audio) from an iOS device to a browser. In this case the iOS device is the server and the browser is the client.
Video resolution must be in the range 800x600 to 1920x1080. Probably the most important criterion is lag, which should be less than 500 ms.
I've tried a few approaches so far.
1. HLS
Server: Objective-C, AVFoundation, UIKit, custom HTTP-server implementation
Client: JS, VIDEO tag
Works well. Streams smoothly. The VIDEO tag in the browser handles the incoming video stream out of the box. This is great! However, it has lag that is hard to minimize. It feels like this protocol was built for non-interactive video streaming, something like Twitch, where a few seconds of lag is fine.
I tried enabling Low-Latency HLS: a lot of requests, a lot of hassle with the playlist. Let me know if this is the right option and I just have to push harder in this direction.
2. Compress every frame into JPEG and send to a browser via WebSockets
Server: Objective-C, AVFoundation, UIKit, custom HTTP-server implementation, WebSockets server
Client: JS, rendering via IMG tag
Works super fast and super smooth. Latency is 20-30 ms! However, when I receive a frame in the browser, I have to load it from a Blob via a base64-encoded URL. At the start all of this works fast and smoothly, but after a while the browser starts to slow down and lag. I'm not sure why; I haven't investigated too deeply yet. Another issue is that frames compressed as JPEGs are much larger (60-120 KB per frame) than the MP4 video stream of HLS. This means more data is pumped through Wi-Fi, and other Wi-Fi consumers start to struggle. This approach works but doesn't feel like a perfect solution.
Any ideas or hints (frameworks, protocols, libraries, approaches, etc.) are appreciated!
HLS
… It feels like this protocol was built for non-interactive video streaming …
Yep, that's right. The whole point of HLS was to utilize generic HTTP servers as media streaming infrastructure, rather than using proprietary streaming servers. As you've seen, several tradeoffs are made. The biggest problem is that media is chunked, which naturally causes latency of at least the size of the chunk. In practice, it ends up being the size of a couple chunks.
"Low latency" HLS is a hack to return to the methods we had before HLS, with servers that just stream content from the origin, in a way compatible with all the HLS stuff we have to deal with now.
Compress every frame into JPEG and send to a browser via WebSockets
In this case, you've essentially recreated a video codec and added the overhead of WebSockets. Also, by using base64 encoding rather than sending binary data, you're adding extra CPU and memory load, as well as ~33% overhead in bandwidth.
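If you do stay on the WebSocket/JPEG path for a while, here is a small sketch of the binary alternative (the URL and element id are made up); note that never revoking old object URLs is also a classic cause of the gradual slowdown described above:

```js
// Sketch: the server sends each JPEG as a binary WebSocket message.
const img = document.getElementById('frame');            // <img id="frame">
const ws = new WebSocket('wss://example.local/frames');  // placeholder URL
ws.binaryType = 'blob';

ws.onmessage = (event) => {
  const url = URL.createObjectURL(event.data);           // event.data is a Blob
  img.onload = () => URL.revokeObjectURL(url);           // free it once decoded
  img.src = url;
};
```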
If you really wanted to go this route, you could simply use MediaRecorder and an HTTP PUT request: stream the output of the recorder to the server and have the server relay it on to the client over HTTP. The client then just needs a <video> tag referencing some URL on the server, and nothing special for playback. You'll get nice low latency without all the overhead and hassle.
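A rough sketch of the recording side of that idea, assuming a browser context on the capture end; '/ingest' is a placeholder endpoint, the server-side relay is not shown, and a chunk-per-request upload is used here instead of a single streaming PUT for simplicity:

```js
// Sketch only: record the stream and push small chunks to the server,
// which appends them and relays one continuous WebM stream to viewers.
function startSending(stream) {
  const recorder = new MediaRecorder(stream, {
    mimeType: 'video/webm; codecs=vp8,opus',
  });
  recorder.ondataavailable = (event) => {
    if (event.data.size === 0) return;
    fetch('/ingest', { method: 'PUT', body: event.data }); // placeholder endpoint
  };
  recorder.start(250); // emit a chunk roughly every 250 ms
}
```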
However, don't go that route. Suppose the bandwidth drops out? What if some packets are lost and you need to re-sync? How will you set up communication between each end to continually adjust quality, buffering, codec negotiation, etc.? What if peer-to-peer connections are advantageous?
Use WebRTC
It's a full purpose-built stack for maintaining low latency. Libraries are available for most any stack on most any platform. It works in browsers.
Rather than reinventing all of this, you can take advantage of what's there.
The downside is complexity... it isn't easy to get started with, but well worth it for most low latency use cases.
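For a sense of scale, a minimal sketch of the sending side; `sendToPeer` stands in for whatever signaling channel you already have (WebSocket, HTTP polling, etc.), and the STUN server is just an example:

```js
// Sketch of the sender only; signaling and the receiving side are not shown.
async function startWebRtcSend(stream, sendToPeer) {
  const pc = new RTCPeerConnection({
    iceServers: [{ urls: 'stun:stun.l.google.com:19302' }], // example server
  });
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));

  pc.onicecandidate = ({ candidate }) => {
    if (candidate) sendToPeer({ type: 'candidate', candidate });
  };

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendToPeer({ type: 'offer', sdp: pc.localDescription });

  // Apply the remote answer and candidates with pc.setRemoteDescription(...)
  // and pc.addIceCandidate(...) when they arrive over signaling.
  return pc;
}
```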
I need a multi-window app to share media streams. Is there any way to do that? In NW.js I can create a proof of concept where a MediaStream created in one window can be played in the other, but it appears I cannot do this in Electron. Am I correct?
I know for certain that it's possible with WebRTC to stream audio/video from a MediaStream to another window process. Been there, done that, using the electron-peer-connection library (it actually makes the process quite easy).
Unfortunately, there are a lot of limitations to consider if you take this approach (WebRTC will compress your audio with lossy compression, you'll have a big latency, an Electron bug currently causes the audio to become mono, things like that).
So this is fine for things like voice, but not for e.g. high-end native-quality audio processing.
Additionally, if your app is not a monster beast with insane performance requirements, you can also use the Web Audio API and a ScriptProcessorNode (AudioWorklet is still not available in Electron) to access the audio sample data from the MediaStream directly, and send it over standard Electron IPC.
You can then rebuild the MediaStream in the other window process using the Web Audio API and a MediaStreamAudioDestinationNode.
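A rough sketch of that idea, assuming the main process relays the hypothetical 'audio-chunk' channel between the two windows (see the relay sketch after the next answer); the channel name and buffer size are arbitrary, and the naive back-to-back scheduling on the receiving end would need a proper queue in practice:

```js
// Sending window: tap the MediaStream's samples and push them over IPC.
const { ipcRenderer } = require('electron');

function forwardStreamAudio(mediaStream) {
  const ctx = new AudioContext();
  const source = ctx.createMediaStreamSource(mediaStream);
  const processor = ctx.createScriptProcessor(4096, 1, 1); // mono, 4096 frames
  processor.onaudioprocess = (event) => {
    // Copy the samples; the underlying buffer is reused by the audio engine.
    ipcRenderer.send('audio-chunk', Array.from(event.inputBuffer.getChannelData(0)));
  };
  source.connect(processor);
  processor.connect(ctx.destination); // keeps the processor pulling data
}
```

And on the receiving end:

```js
// Receiving window: rebuild a MediaStream from the raw samples.
const { ipcRenderer } = require('electron');

const ctx = new AudioContext();
const destination = ctx.createMediaStreamDestination();
// destination.stream is a MediaStream you can play, record, or forward.

ipcRenderer.on('audio-chunk', (_event, samples) => {
  const buffer = ctx.createBuffer(1, samples.length, ctx.sampleRate);
  buffer.getChannelData(0).set(samples);
  const node = ctx.createBufferSource();
  node.buffer = buffer;
  node.connect(destination);
  node.start(); // naive: assumes chunks arrive just in time, no jitter buffer
});
```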
You should be able to communicate between windows using the ipc modules by emitting events through the main process and adding listeners for them in the windows.
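A minimal sketch of such a relay in the main process, reusing the hypothetical 'audio-chunk' channel from the sketch above:

```js
// Main process: forward messages from one window to the other.
const { ipcMain } = require('electron');

function relayAudio(fromWindow, toWindow) {
  ipcMain.on('audio-chunk', (event, samples) => {
    if (event.sender === fromWindow.webContents) {
      toWindow.webContents.send('audio-chunk', samples);
    }
  });
}
```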
I am trying to create an RTSP client which live-broadcasts audio and video. I modified the iOS code at http://www.gdcl.co.uk/downloads.htm and am able to broadcast the video to the server properly. But now I am facing issues with the audio part. In the linked example the code is written in such a way that it writes the video data to a file, then reads the data back from the file and uploads the video NALU packets to the RTSP server.
For the audio part I am not sure how to proceed. Right now what I have tried is to get the audio buffer from the mic and broadcast it to the server directly by adding RTP headers and ALU, but this approach is not working properly: the audio starts lagging behind, and the lag increases with time. Can someone let me know if there is a better approach to achieve this, with lip-synced audio/video?
Are you losing any packets on the client? If so, you need to leave "space". If you receive packets 1, 2, 3, 4, 6, 7, you need to leave space for the missing packet (5).
The other possibility is what is known as a clock-drift problem: the clocks (crystals) on your client and server are not perfectly in sync with each other.
This can be caused by environment, temperature changes, etc.
Let's say in a perfect world your server is producing 20 ms audio packets at 48000 Hz, and your client is playing them back at a sample rate of 48000 Hz. Realistically, your client and server are not running at exactly 48000 Hz: your server might be at 48000.001 and your client at 47999.9998. So your server might be delivering faster than your client, or vice versa. You would either consume packets too fast and underrun the buffer, or lag too far behind and overflow the client buffer. In your case, it sounds like the client is playing back too slowly and gradually lagging behind the server. You might only lag a couple of milliseconds per minute, but the issue keeps accumulating and it will end up looking like a badly lip-synced 1970s kung fu movie.
In other systems there is often a common clock line to keep things in sync, for example video camera clocks, MIDI clocks, or multitrack recorder clocks.
When you deliver data over IP, there is no common clock shared between client and server, so your issue becomes one of syncing clocks between disparate devices with no common reference. I have successfully solved this problem using this general approach:
A) Let the client count the rate of packets that come in over a period of time.
B) Let the client count the rate that the packets are consumed (played back).
C) Adjust the sample rate of the client based on A and B.
So your client needs to adjust the sample rate of the playback; yes, you play it faster or slower. Note that the playback-rate change will be very, very subtle. You might set the sample rate to 48000.0001 Hz instead of 48000 Hz. The difference in pitch would be undetectable by humans, as it would only be a fraction of a cent. This is a very simplified explanation; there are many other nuances and edge cases that must be considered when developing such a control system. You don't just set it and forget it; you need a control system to manage the playback.
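A deliberately simplified sketch of that control loop (plain JavaScript for illustration, not iOS code; the packet counters, target depth, and correction gain are all assumptions):

```js
// Illustration only: nudge the playback sample rate so the jitter buffer
// stays near a target depth instead of slowly draining or overflowing.
const NOMINAL_RATE = 48000;   // Hz
const TARGET_DEPTH = 3;       // desired buffer depth, in packets (assumption)
let packetsReceived = 0;
let packetsPlayed = 0;

function onPacketReceived() { packetsReceived += 1; }
function onPacketPlayed()   { packetsPlayed += 1; }

// Run periodically (e.g. every few seconds).
function adjustedSampleRate() {
  const depth = packetsReceived - packetsPlayed;   // current backlog
  const error = depth - TARGET_DEPTH;
  // Tiny proportional correction; a real system needs smoothing, clamping,
  // and handling of packet loss and reordering.
  return NOMINAL_RATE * (1 + 1e-6 * error);
}
```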
An interesting test to demonstrate this is to take two devices with the exact same file. A long recording (say 3 hours) is best. Start them at the same time. After 3 hours of playback, you will notice that one is ahead of the other.
This post explains that it is NOT a trivial task to stream audio and video.
We have an FFmpeg stream being delivered to mobile devices. We're using the HTML5 <video src="..." webkit-playsinline> tag to display the video inline (inside a real-time streaming app). We've managed to reduce the delay at the FFmpeg end to the minimum, but there's still a lag at the iOS end, where the player presumably buffers for a couple of seconds.
Is there any way to reduce the client-side delay?
We need as close to real-time as possible and skipping is acceptable.
If you are using an HTML5 video tag, the iOS device will use QuickTime to play back the video. Apple offers no control over internal mechanisms like buffer settings for its QuickTime player. For a project on Apple TV I even worked with someone at Apple in Cupertino, and they just won't allow any access to the information you would require on their device.
Typically if you use HLS:
Is this a real-time delivery system?
No. It has inherent latency corresponding to the size and duration of the media files containing stream segments. At least one segment must fully download before it can be viewed by the client, and two may be required to ensure seamless transitions between segments. In addition, the encoder and segmenter must create a file from the input; the duration of this file is the minimum latency before media is available for download. Typical latency with recommended settings is in the neighborhood of 30 seconds.
What is the latency?
Approximately 30 seconds, with recommended settings. See question #15.
For a live streaming scenario on iOS you are better off tuning the streaming chain before the actual player:
capture -> transcoding -> upload -> streaming server -> delivery -> playback
Using ffmpeg you can tune for zero-latency streaming at the transcoding level, which I understand you have already done. After that, using a well-established streaming server like Wowza and CDN delivery will help you get there (at a certain cost, of course, and assuming you need a streaming server, which you may not).
If you go all native for your iOS app, you may look at MPMoviePlayerController. I have no experience with native iOS app code, so I'll let you decide if it is worth the time (though I doubt it will be possible because of the underlying QuickTime/HLS layer).
I also came across this which sounds interesting but I have not tested it and even with such an approach you will face limitations.
Even if it may not be the answer you were looking for I hope this helps.
I would like to write a web application that allows me to sync audio playback of an MP3 down to ~50ms, or close enough that the human ear can't detect the difference.
The idea would be that two or more smartphones could each be paired to a bluetooth speaker, and two or more speakers would play the same audio at the exact same time.
How would you suggest I go about setting this up, both client-side and server-side? I'm planning to use Rails/Ruby for backend, and iOS/obj c for mobile dev.
I had thought of syncing to a global/atomic clock on the server, and having the server provide instructions to clients on when to start playing or jump into an already-playing track. My concern is that, if I want to stream the audio, it will be impossible to load a song into memory and start playback accurately at the millisecond level.
Thoughts?
The jitter in Internet packet delivery will be too large, so forget about syncing over the Internet. However, you could check the accuracy of NTP, which is still used by the OS (I believe; I know older UNIX systems used it) when you switch on automatic date/time in Settings, but my guess is that it won't be good enough either. Perhaps the OS also uses other time sources like GPS; I don't know how iOS does it, but accuracy within 20 ms is not to be expected. You could create an experimental app to check it out.
So, what's left is a sync closer to home, meaning between the devices directly. Of course, you need to make sure that all devices have loaded (enough of) the song and have preloaded it in AVAudioPlayer, or whatever you're using, so they can start playing immediately. (It may actually not be the best idea to use the higher-level AVAudioPlayer APIs, as they may give higher delays and, more importantly, higher jitter than lower-level APIs.)
Here are three ideas (one device needs to be the master that triggers the start of playback; the others are slaves waiting for the trigger):
Use an audio trigger pulse, like a high tone of a defined length and frequency. Then use FFT to recognise this tone.
Connect the devices via GameKit Bluetooth and transmit the trigger on these connections.
Use the iPhone 4+ flash as a trigger: flash in a certain pattern. This would require you to sample the video data, which is quite doable and can be very fast.
I'm going with a solution that uses an atomic clock for synchronization, and an external service that allows server instructions/messages to be sent to all devices in close sync.
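For the browser side of that plan, a sketch of the scheduling idea (the '/time' endpoint and its payload are made up, and a production version would average several round trips rather than trust a single one):

```js
// Sketch: estimate the offset to the server clock NTP-style, then have every
// client start the same pre-decoded buffer at an agreed server timestamp.
async function estimateServerOffsetMs() {
  const t0 = Date.now();
  const res = await fetch('/time');            // placeholder: returns { serverTime }
  const t1 = Date.now();
  const { serverTime } = await res.json();
  return serverTime + (t1 - t0) / 2 - t1;      // offset ≈ serverClock - localClock
}

function playAtServerTime(ctx, audioBuffer, serverStartMs, offsetMs) {
  const localStartMs = serverStartMs - offsetMs;
  const delaySec = Math.max(0, (localStartMs - Date.now()) / 1000);
  const src = ctx.createBufferSource();
  src.buffer = audioBuffer;                    // already downloaded and decoded
  src.connect(ctx.destination);
  src.start(ctx.currentTime + delaySec);       // sample-accurate scheduling
}
```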