When programs such as Skype stream video from one user to another and vice versa, how is that usually accomplished?
Does client A stream to a server, which then sends it to client B?
Or does it go directly from client A to B?
Feel free to correct me if I am way off and neither of those is correct.
Skype is much more complicated than that, because it is peer-to-peer: your stream may travel through several other Skype clients, which act as servers. Skype does not have a huge central system for this. Instead, it always keeps track of multiple places it can deliver your stream to, so that if one of them disappears (because that Skype client goes offline), it continues sending through another Skype client. This is done so efficiently that you don't notice the interruption.
Basically, this is how it's achieved:
1) Encode the video/audio using the best compression you can get. Use lossy compression to throw away the portions of the video and audio that are not useful, such as background hiss.
2) Pack the video/audio into packets and put a timestamp on each one. The packets are usually datagrams (e.g. UDP).
3) Send the packets directly to the destination, using the most appropriate route. You don't have to send all packets the same way; use many routes if possible. P2P networks often use many routes to the same destination.
4) Decode at the destination. If a packet is too old, throw it away. If packets are lost, don't bother with them, since it's too late to retransmit them.
5) Reassemble the video and fill in the missing frames as best you can.
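Steps 2 and 4 can be sketched in a few lines. This is a minimal Python illustration, not Skype's actual (proprietary) protocol. The 12-byte header layout (sequence number plus millisecond timestamp) and the 200 ms maximum age are assumptions. The sketch also assumes the receiver compares timestamps on the same time base as the sender; protocols such as RTP instead map media timestamps onto the receiver's own playout clock.

```python
import struct

# Hypothetical header: 4-byte sequence number + 8-byte capture timestamp (ms), network byte order.
HEADER = struct.Struct("!IQ")

def make_packet(seq: int, timestamp_ms: int, payload: bytes) -> bytes:
    """Step 2: prefix an encoded chunk with its sequence number and timestamp."""
    return HEADER.pack(seq, timestamp_ms) + payload

def parse_packet(packet: bytes) -> tuple[int, int, bytes]:
    """Split a packet back into (sequence number, timestamp, payload)."""
    seq, timestamp_ms = HEADER.unpack_from(packet)
    return seq, timestamp_ms, packet[HEADER.size:]

def is_playable(timestamp_ms: int, now_ms: int, max_age_ms: int = 200) -> bool:
    """Step 4: a packet older than max_age_ms has missed its playout time, so drop it."""
    return now_ms - timestamp_ms <= max_age_ms

pkt = make_packet(7, 1_000, b"frame")
seq, ts, payload = parse_packet(pkt)
print(seq, ts, payload)                # 7 1000 b'frame'
print(is_playable(ts, now_ms=1_150))   # True
print(is_playable(ts, now_ms=1_500))   # False
```

The sequence number lets the receiver notice gaps and reordering independently of the timestamps. Actually sending these as UDP datagrams is left out to keep the sketch self-contained.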
I am trying to build an iOS application that streams audio coming directly from the input (or mic) of a device. My idea is to send the audio buffer to the server at regular intervals, so that the server can forward it to any other client that wants to listen. I am planning to use WebSockets for the server-side implementation.
Is there a way to grab a specific buffer from the input (mic) of the iOS device and send it to the server while the user speaks the next bit, and so on? I was thinking of starting an AVAudioRecorder, or perhaps using AVAudioEngine, and recording every 1 second or half second, but I suspect that would create too much delay and possibly lose audio in the transition between recordings.
Is there a better way to accomplish this? I am really interested in understanding the science behind it. If this is not the best approach, please tell me what is, and maybe give a basic idea of its implementation or something that could point me in the right direction.
I found the answer to my own question! It lies in the AVFoundation framework, specifically AVCaptureAudioDataOutput and its delegate, which hands you each buffer as soon as the input source captures it.
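The delegate delivers buffers continuously, so there is no stop/restart gap like the one you would get by recording 0.5 to 1 second files. What usually remains is regrouping the delivered buffers into fixed-size frames before sending them. Below is a language-neutral sketch of that regrouping in Python (not AVFoundation code). The 640-byte frame size assumes 20 ms of 16 kHz mono 16-bit PCM, and `send` is a hypothetical stand-in for your WebSocket send call.

```python
from typing import Callable

class FrameChunker:
    """Collects arbitrarily sized capture buffers and emits fixed-size frames.

    Meant to be fed from a capture callback. `send` is a hypothetical
    transport function (e.g. a WebSocket send).
    """

    def __init__(self, frame_bytes: int, send: Callable[[bytes], None]) -> None:
        self.frame_bytes = frame_bytes
        self.send = send
        self._pending = bytearray()

    def on_buffer(self, data: bytes) -> None:
        # Append the new capture buffer, then flush every complete frame.
        self._pending += data
        while len(self._pending) >= self.frame_bytes:
            frame = bytes(self._pending[: self.frame_bytes])
            del self._pending[: self.frame_bytes]
            self.send(frame)

sent = []
chunker = FrameChunker(frame_bytes=640, send=sent.append)
chunker.on_buffer(b"\x00" * 1000)  # one frame sent, 360 bytes pending
chunker.on_buffer(b"\x00" * 1000)  # 1360 pending -> two frames sent, 80 pending
print(len(sent))  # 3
```

Leftover bytes stay in the buffer until the next callback, so no audio falls between chunks.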
I'm trying to learn how to do pseudo streaming for MP4 files. I can't think of a good way to do it, but I just found a great example app that has a similar implementation (although I don't understand how it works yet).
Here's the scenario:
Alice can send a video to Bob in the app
Bob can open it immediately and see Alice's video, from beginning, while Alice is still recording it
Also, Bob can choose to view the video later, after Alice has finished recording. But Bob should be able to view the video instantly, without waiting long, even when the video file is large.
So my hunch is that it uses some form of pseudo streaming for MP4.
Here are screenshots of the requests Alice's phone makes while using the example app:
The screenshots suggest that the example app sends a series of PATCH requests to its server every 0.x seconds, and that the very last request is a PATCH that updates the moov information for this MP4.
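Here is an educated guess that fits those requests. When an MP4 is written progressively, the media data (the mdat box) grows while recording. The moov box holds the sample tables a player needs, and it can only be completed once recording stops, so it is written last. Players need moov before they can start, which is why "fast start" files move moov ahead of mdat. The hypothetical Python helper below walks the top-level boxes of an MP4 byte string and reports whether moov precedes mdat. Each box starts with a 4-byte big-endian size and a 4-byte type.

```python
import struct
from typing import Iterator

def iter_top_level_boxes(data: bytes) -> Iterator[tuple[str, int, int]]:
    """Yield (type, offset, size) for each top-level ISO BMFF (MP4) box."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # a 64-bit "largesize" follows the type field
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # the box extends to the end of the file
            size = len(data) - offset
        if size < header:
            raise ValueError(f"corrupt box at offset {offset}")
        yield box_type.decode("latin-1"), offset, size
        offset += size

def moov_before_mdat(data: bytes) -> bool:
    """True if the file could start playing before all media data has arrived."""
    order = [t for t, _, _ in iter_top_level_boxes(data)]
    return "moov" in order and "mdat" in order and order.index("moov") < order.index("mdat")
```

Fragmented MP4 files (moof/mdat pairs at the top level) are the standard way to make an MP4 playable before recording ends. The same walker would list those moof boxes too.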
So my question is: how is this implemented (any educated guess is welcome)? Or is there an existing protocol or iOS encoder that already does this that I'm not aware of?
Thanks a lot!
Reading the text of your question rather than the title, I think there are a number of likely steps:
Alice is recording video
She is sending the video to a streaming server
Alice notifies Bob that the stream is available and sends him the URL on the streaming server from which Bob can retrieve the stream
Bob's video client requests the stream, using range requests to download it chunk by chunk
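The last step relies on standard HTTP range requests. Below is a minimal Python sketch of the header handling, using a fixed 1,000-byte chunk size for illustration. Range values are inclusive on both ends. A server may also answer with a total length of "*" in Content-Range when it doesn't know the complete length yet, e.g. while Alice is still recording.

```python
import re
from typing import Optional

def range_headers(start: int, chunk_size: int, count: int) -> list[str]:
    """Build successive HTTP Range header values; both ends of a byte range are inclusive."""
    return [
        f"bytes={start + i * chunk_size}-{start + (i + 1) * chunk_size - 1}"
        for i in range(count)
    ]

_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")

def parse_content_range(value: str) -> tuple[int, int, Optional[int]]:
    """Parse a Content-Range response header; total is None when the server sends '*'."""
    m = _CONTENT_RANGE.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last, total = m.groups()
    return int(first), int(last), None if total == "*" else int(total)

print(range_headers(0, 1000, 2))                    # ['bytes=0-999', 'bytes=1000-1999']
print(parse_content_range("bytes 1000-1999/*"))     # (1000, 1999, None)
```

The client keeps requesting the next range. Once a response carries a numeric total, it knows where the file ends.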
Having a server in the middle like this is a typical approach for any stream that may have more than one client watching it.
More sophisticated streaming servers may also support delivering the stream at different bit rates, and even encoded with different codecs, for maximum device reach.
There are commercial (e.g. https://www.wowza.com) and open source streaming servers (e.g. https://gstreamer.freedesktop.org) you can look at to get more info on streaming servers and to see some examples.
Can anybody explain the basic concept of mixing in Kurento Media Server?
Kurento's list of features mentions mixing, so I would like to know what exactly Kurento Media Server mixes. For example:
Does it mix multiple streams generated by a user into one stream and broadcast that stream to the other receiving users? If it does, how do I use this concept?
Is Kurento able to receive multiple streams through one PeerConnection object per user, i.e., can a single WebRtcEndpoint receive or send multiple streams by mixing them into one stream?
Edit regarding the answer update
So I can use the mixing concept by using HubPort.
Now, does HubPort support different MediaTypes? For example, if a user is sharing their screen and streaming their audio at the same time, does the Composite element mix both streams into one and send that single stream to all the other users?
The concept of mixing refers to combining several media streams into one. This is easiest to understand with a conference room. Without mixing, every user has one stream going out, plus one stream coming in from each other participant. That leaves you with 1 + (n - 1) = n streams per participant, and n * n streams in total, where n is the number of participants.
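Here is that arithmetic worked out for a few room sizes (plain Python, nothing Kurento-specific):

```python
def streams_without_mixing(n: int) -> tuple[int, int]:
    """Return (streams per participant, total streams) for n participants without a mixer."""
    outgoing = 1
    incoming = n - 1                        # one stream from every other participant
    per_participant = outgoing + incoming   # = n
    return per_participant, n * per_participant  # total = n * n

for n in (2, 5, 10):
    print(n, streams_without_mixing(n))
# 2 (2, 4)
# 5 (5, 25)
# 10 (10, 100)
```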
Mixing all the streams in the media server allows you to save bandwidth, which is ideal in scenarios such as mobile devices connected through 3G. The mixer combines all the streams into one, so each user sends one stream and receives one stream containing the combined media of all the other participants (everyone except themselves). Having just two streams per user saves a lot of bandwidth.
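The "everyone except themselves" part is commonly called a mix-minus (or N-1) mix. Below is a minimal Python sketch of the technique for audio only. It is an illustration, not Kurento's implementation. It assumes already decoded, time-aligned frames of signed 16-bit PCM, one frame per participant. Summing everything once and then subtracting each participant's own signal avoids re-mixing n - 1 streams for every listener.

```python
def mix_minus(frames: dict[str, list[int]]) -> dict[str, list[int]]:
    """For each participant, mix everyone else's signed 16-bit PCM samples.

    `frames` maps a participant id to one frame of samples (all frames the
    same length). Each output frame excludes that participant's own audio.
    """
    length = len(next(iter(frames.values())))
    # Sum in Python ints (no overflow), clip only the final per-listener mix.
    total = [sum(f[i] for f in frames.values()) for i in range(length)]

    def clip(x: int) -> int:
        return max(-32768, min(32767, x))

    return {
        pid: [clip(total[i] - own[i]) for i in range(length)]
        for pid, own in frames.items()
    }

print(mix_minus({"alice": [100, 200], "bob": [10, 20], "carol": [1, 2]}))
# {'alice': [11, 22], 'bob': [101, 202], 'carol': [110, 220]}
```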
This, however, takes a toll on CPU consumption, since the server has to adapt the videos to the new resolution and combine them, which involves some processing.
On the other hand, the concept you are referring to is multistream, which is the ability to send several streams through one WebRTC connection. This doesn't save bandwidth, nor does it combine the streams into one, but it helps you reduce the number of endpoints in your deployment. This is on our roadmap, but I can't tell you when it will be available.
EDIT
Mixing can be achieved in the media server through the Composite media element. You can check this other SO answer for more info on how to use that media element.
I'm trying to send music over Bluetooth from one iOS device to another. I've been using this to build packets like in Ray Wenderlich's SNAP tutorial, but I've been having trouble reconstructing the packet information on the receiving phone. I have tried using https://github.com/abbood/iphoneAudioSyncer but I think it is too complicated for my needs (since I do not need synced playback). What is the simplest buffering approach that accounts for things like lost or out-of-order packets? I have read through a lot of Core Audio material, but it is very dense, so I would appreciate help from someone who has tackled this type of problem.
When you talk about lost or out-of-order packets, you're talking about Packet Loss Concealment (PLC), which is a very dense topic (if you think Core Audio is dense, wait till you dive into PLC).
In a nutshell, there are many ways to deal with packet loss, but the simplest (which I advise you to use) is to replace lost packets with silence. The same goes for out-of-order packets: if a packet arrives out of order, just discard it.
That being said, you are dealing with audio that is streamed to you (i.e. sent over the Bluetooth/Wi-Fi network), which means that almost 100% of the time you're receiving compressed audio, often Variable Bit Rate (VBR). If you simply try to substitute lost VBR packets with silence, you'll run into this problem. You'll either have to insert silence packets in the same compression format as the VBR audio you're dealing with, or convert your compressed VBR audio into uncompressed audio (linear PCM) and then insert zeros in place of the missing packets.
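Here is a minimal Python sketch of that simplest strategy. It assumes each packet carries a sequence number and exactly one fixed-size frame of signed linear PCM, so silence is all-zero bytes. The class and its names are hypothetical, and sequence-number wraparound is not handled.

```python
from typing import Optional

class SilenceFillingReceiver:
    """Discards late/out-of-order packets and fills gaps with silence.

    Assumes every packet holds exactly `frame_bytes` of signed linear PCM,
    for which zero bytes are silence (unsigned 8-bit PCM would differ).
    """

    def __init__(self, frame_bytes: int) -> None:
        self.frame_bytes = frame_bytes
        self.next_seq: Optional[int] = None

    def receive(self, seq: int, pcm: bytes) -> list[bytes]:
        """Return the frames to hand to the audio player, in order."""
        if self.next_seq is None:
            self.next_seq = seq             # first packet defines the starting point
        if seq < self.next_seq:
            return []                       # late or out of order: discard it
        missing = seq - self.next_seq       # packets that never arrived
        silence = bytes(self.frame_bytes)   # all-zero frame = silence
        self.next_seq = seq + 1
        return [silence] * missing + [pcm]

rx = SilenceFillingReceiver(frame_bytes=4)
print(rx.receive(0, b"AAAA"))  # [b'AAAA']
print(rx.receive(2, b"CCCC"))  # [b'\x00\x00\x00\x00', b'CCCC']  (packet 1 was lost)
print(rx.receive(1, b"BBBB"))  # []  (arrived too late, discarded)
```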
A potential client has asked me for an app that will stream a six-hour audio file. The user needs to be able to set the "playback head" to any position in the file. Presumably, this means that the app must not be forced to download the entire file before it begins playing back from an arbitrary position.
An added complication -- there are actually four files which need to be streamed and mixed simultaneously.
My questions are:
1) Is there an out-of-the-box technology that will allow random access to streaming audio on iOS? Can this be done with standard server technology and a single long file, or will it involve some fancy server tech?
2) Which iOS framework is best suited for this? Is there anything high-level that would allow me to easily mix these four audio files?
3) Can this be done entirely with standard browser technology on the client side? (i.e. HTML5)
Have a close look at the MP3 format. It is remarkably easy and efficient to parse, chop up into little bits, and reassemble into a custom stream.
Hence, rolling your own server-side code to grab what you want and send it to the client will not be as crazy or difficult as it may sound.
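The Python sketch below shows how little parsing that takes. It reads an MPEG-1 Layer III frame header and computes the frame length (144 * bitrate / sample rate + padding). It then turns a requested playback position into a byte offset in a constant-bit-rate file: it jumps to the estimated offset and scans forward to the next frame sync. It only handles MPEG-1 Layer III and assumes CBR. It ignores ID3 tags at the start of the file, VBR files (which need a seek table such as a Xing header), and free-format bitrates.

```python
from typing import Optional

# MPEG-1 Layer III bitrates (kbps) by 4-bit index; 0 = "free", 15 = invalid.
BITRATES_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0]
SAMPLE_RATES = [44100, 48000, 32000, 0]  # index 3 is reserved

def frame_length(header: bytes) -> Optional[int]:
    """Byte length of the MPEG-1 Layer III frame starting with `header`, or None."""
    if len(header) < 4 or header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        return None  # no 11-bit frame sync
    version = (header[1] >> 3) & 0x03
    layer = (header[1] >> 1) & 0x03
    if version != 0b11 or layer != 0b01:
        return None  # this sketch only handles MPEG-1 Layer III
    bitrate = BITRATES_KBPS[header[2] >> 4] * 1000
    sample_rate = SAMPLE_RATES[(header[2] >> 2) & 0x03]
    padding = (header[2] >> 1) & 0x01
    if bitrate == 0 or sample_rate == 0:
        return None
    return 144 * bitrate // sample_rate + padding

def seek_offset(data: bytes, seconds: float, bitrate_bps: int) -> Optional[int]:
    """Estimate the byte offset for `seconds` in a CBR file, then resync to a frame."""
    guess = int(seconds * bitrate_bps / 8)
    for pos in range(min(guess, len(data)), len(data) - 3):
        length = frame_length(data[pos:pos + 4])
        # Require a second valid header right after this frame to avoid false syncs.
        if length and frame_length(data[pos + length:pos + length + 4]):
            return pos
    return None
```

A server could use seek_offset to answer a "start at 3:12:05" request from the client and then stream frames from that position onward.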
MP3 is also widely supported by various clients. I strongly suspect any HTML5-capable browser will be able to play the stream you generate via a long-lived, bit-rate-regulated HTTP request.
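One way to read "bit-rate regulated": after a short head start, the server writes no faster than the audio's own bitrate, so a six-hour file is never pushed to the client all at once. Below is a minimal Python sketch. The 2-second head start, the 4,096-byte chunks, and the `write` callback are assumptions. A real server would read from the file incrementally instead of taking all the bytes up front.

```python
import time
from typing import Callable

def bytes_allowed(elapsed_s: float, bitrate_bps: int, head_start_s: float = 2.0) -> int:
    """How many bytes a paced sender may have written after `elapsed_s` seconds."""
    return int((elapsed_s + head_start_s) * bitrate_bps / 8)

def paced_send(data: bytes, write: Callable[[bytes], None], bitrate_bps: int,
               chunk: int = 4096) -> None:
    """Write `data` no faster than its bitrate, after the initial head start."""
    start = time.monotonic()
    sent = 0
    while sent < len(data):
        allowed = bytes_allowed(time.monotonic() - start, bitrate_bps)
        if sent >= allowed:
            time.sleep(0.05)  # ahead of schedule: let playback catch up
            continue
        n = min(chunk, allowed - sent, len(data) - sent)
        write(data[sent:sent + n])
        sent += n
```

With a 128 kbps stream, the client receives 2 seconds of audio (32,000 bytes) immediately and then 16,000 bytes per second.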