Is there any way to get buffer size in bytes? - youtube

I'm studying networking in college.
I want to know the buffer size of a YouTube live stream.
Using the player's statistics I can only get the buffer size in seconds, but I want it in bytes.
I tried multiplying the streaming rate (in bytes per second) by the buffer size (in seconds), but that doesn't give me an accurate buffer size.
Is there any way to get the accurate buffer size of a live stream in bytes?
Thanks

I haven't personally tried this yet. YouTube's player API has a deprecated function, getVideoBytesLoaded(), which returned the number of bytes loaded for the current video. It has since been replaced by getVideoLoadedFraction().
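If seconds are all the player reports, the byte figure can only be estimated, e.g. with arithmetic like this sketch (EstimateBufferedBytes is a hypothetical helper, and the bitrate is an assumption you'd have to supply yourself; adaptive streaming switches bitrates mid-stream, which is exactly why such an estimate drifts):
#include <cstdio>

// Hypothetical estimate: buffered bytes ~= buffered seconds * average bitrate.
// Adaptive streaming changes the bitrate mid-stream, so this is only approximate.
double EstimateBufferedBytes(double bufferedSeconds, double bitrateKbps) {
    return bufferedSeconds * bitrateKbps * 1000.0 / 8.0;  // kbit/s -> bytes/s
}

int main() {
    // e.g. 12 s buffered at a nominal 2500 kbit/s stream -> ~3.75 MB
    std::printf("%.0f bytes\n", EstimateBufferedBytes(12.0, 2500.0));
}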

Related

How to use TPCircularBuffer for Video?

We have a VoIP app for the iOS platform, where we use TPCircularBuffer for audio buffering, and its performance is very good.
So I was wondering whether it's possible to use TPCircularBuffer for video buffering as well. I have searched a lot but didn't find anything useful on using TPCircularBuffer for video. Is that even possible? If yes, can anyone shed some light on it? Any code sample would be highly appreciated.
I guess you could copy your video frame's pixels into a TPCircularBuffer, and you'd technically have a video ring buffer, but you've already lost the efficiency race at that point because you don't have time to copy that much data around. You need to keep a reference to your frames.
Or, if you really wanted to mash a solution into TPCircularBuffer, you could write the CMSampleBuffer pointers into the buffer (carefully respecting retain and release). But that seems heavy-handed, as you're really not gaining anything from TPCircularBuffer's magical memory-mapped wrapping because pointers are so small.
I would simply make my own CMSampleBufferRef ring buffer. You can grab a prebuilt circular buffer or do the clock arithmetic yourself:
CMSampleBufferRef ringBuffer[10] = {0}; // or some other number
i = (i + 1) % 10;                                    // advance the write index
if (ringBuffer[i]) CFRelease(ringBuffer[i]);         // release the frame being overwritten
ringBuffer[i] = (CMSampleBufferRef)CFRetain(frame);  // retain the incoming frame
Of course your real problem is not the ring buffer itself, but dealing with the fact that decompressed video is very high bandwidth: each 1080p RGBA frame is about 8 MB, or roughly 200 MB to store one second's worth at 24 fps, so you're going to have to get pretty creative if you need anything other than a microscopic video buffer.
Some suggestions:
the above numbers are for RGBA, so try working in YUV, where they drop to about 3 MB per frame and 75 MB/s (see the arithmetic sketch after this list)
try lower resolutions
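For reference, the arithmetic behind those figures (a standalone sketch: 1080p at 24 fps, 4 bytes per RGBA pixel, 12 bits per YUV 4:2:0 pixel):
#include <cstdio>

int main() {
    const long w = 1920, h = 1080, fps = 24;
    const long rgba   = w * h * 4;      // 4 bytes per RGBA pixel
    const long yuv420 = w * h * 3 / 2;  // 12 bits per pixel with 4:2:0 subsampling
    std::printf("RGBA:   %.1f MB/frame, %.0f MB/s\n", rgba / 1e6, rgba * fps / 1e6);
    std::printf("YUV420: %.1f MB/frame, %.0f MB/s\n", yuv420 / 1e6, yuv420 * fps / 1e6);
}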

FFmpeg set playing buffer to 0

I'm using FFmpeg to play RTMP streams inside an iOS application. I use av_read_frame to get the audio and video frames. I need the latency to be as small as possible at all times, but if there's a bottleneck or the download speed decreases, av_read_frame blocks. Of course, that is how it should work. The problem is that FFmpeg waits too long: as long as it needs to fill its buffers. I need to set those buffers to a value close to 0. Right now, I'm dropping buffered packets "manually" to bring the latency back to its initial value. The result is the desired one, but I wish FFmpeg wouldn't buffer in the first place...
Can anyone help me with this? Thanks in advance.
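For reference, a minimal sketch of the knobs libavformat exposes for this (the option names are standard libavformat options, but OpenLowLatency is a hypothetical helper, and how much these settings help depends on the protocol and the build):
extern "C" {
#include <libavformat/avformat.h>
}

AVFormatContext *OpenLowLatency(const char *url) {
    AVFormatContext *fmt = nullptr;
    AVDictionary *opts = nullptr;
    av_dict_set(&opts, "fflags", "nobuffer", 0);    // don't buffer packets for probing
    av_dict_set(&opts, "probesize", "32", 0);       // probe as little of the input as allowed
    av_dict_set(&opts, "analyzeduration", "0", 0);  // skip the stream-analysis delay
    if (avformat_open_input(&fmt, url, nullptr, &opts) < 0)
        fmt = nullptr;                              // real code would report the error
    av_dict_free(&opts);
    return fmt;
}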

Length (time) of a (non-VBR) MP3 file

I'm currently researching the MP3 format in order to build an MP3 decoder.
After some thinking, I figured the simplest way to calculate the length of the song would be to divide the size by the bitrate (taking into account the size of the ID3 tag, etc.) and convert the result to minutes. Using this method on a few songs, I got accurate times.
I always assumed the duration of a song is the length of the pure audio data, but with this method, frame headers are also counted as part of the song when calculating the time.
Also, I understand that the audio data in an MP3 file is compressed, so when it's decompressed it will of course be larger, and then the time calculation seems inaccurate.
Am I missing something here? It just doesn't make sense to me that the song's length is calculated from the compressed data rather than the uncompressed data, and that the frame headers, which are a DWORD each, are not ignored.
I always assumed the duration of a song is the length of the pure audio data, but with this method, frame headers are also counted as part of the song when calculating the time. Also, I understand that the audio data in an MP3 file is compressed, so when it's decompressed it will of course be larger, and then the time calculation seems inaccurate.
When a media stream, such as an MP3 file, is compressed with a constant bitrate, that bitrate reflects the compressed size of the data, not the uncompressed size. So your math is fine.
What will throw you off with this approach is metadata tags (e.g., ID3) -- those are part of the file size, but are not counted in the bitrate, since they aren't audio data. Luckily, they tend to be relatively small, so they won't affect your results much.
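As a worked illustration with made-up numbers (a hypothetical 4,000,000-byte file with a 100,000-byte ID3 tag at a constant 128 kbit/s):
#include <cstdio>

// Duration of a CBR MP3: compressed audio bytes divided by the compressed bitrate.
double Mp3DurationSeconds(long fileBytes, long tagBytes, long bitrateBitsPerSec) {
    return (fileBytes - tagBytes) * 8.0 / bitrateBitsPerSec;
}

int main() {
    // (4,000,000 - 100,000) bytes * 8 / 128,000 bit/s -> 243.75 s (~4:04)
    std::printf("%.2f s\n", Mp3DurationSeconds(4000000, 100000, 128000));
}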

How to change a DirectShow renderer's buffer size if its input pin doesn't support IAMBufferNegotiation?

I have a DirectShow application written in Delphi 6. I want to reduce the renderer's buffer size from its current 500 ms value to something smaller. The problem is, its input pin does not support IAMBufferNegotiation, which is odd since the renderer is the earpiece on my VoIP phone, and it would obviously need a smaller buffer size to avoid an unpleasant delay during phone calls.
I tried a loopback test in GraphEdit, connecting the VoIP phone's capture filter (microphone) to the renderer (earpiece). I know the buffer size is 500 ms because that's what GraphEdit shows in the renderer's properties. However, when I use the VoIP phone in a Skype call, the delay is much shorter, about 50-100 milliseconds, as I would expect.
So Skype knows how to change the renderer's default buffer size. How can I do the same trick?
The output pin is normally responsible for setting up the allocator, and IAMBufferNegotiation is typically available on the output pin. You only need to reduce the buffer size at the capture filter's output pin: it will then generate small buffers, which travel through the graph as small buffers and small chunks of data, so reducing buffer sizes at intermediate filters is not necessary.
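For what it's worth, a sketch of that suggestion in C++ (the same interfaces exist in Delphi's DirectShow imports; SuggestSmallBuffers is a hypothetical helper, the buffer sizing is an assumption you'd tune to your audio format, and the call must be made before the pins are connected):
#include <dshow.h>  // link against strmiids.lib for IID_IAMBufferNegotiation

// Suggest small allocator buffers on the capture filter's output pin.
HRESULT SuggestSmallBuffers(IPin *capturePin) {
    IAMBufferNegotiation *neg = nullptr;
    HRESULT hr = capturePin->QueryInterface(IID_IAMBufferNegotiation, (void **)&neg);
    if (FAILED(hr))
        return hr;            // this pin doesn't support negotiation either
    ALLOCATOR_PROPERTIES props = {};
    props.cBuffers = 4;       // a handful of buffers...
    props.cbBuffer = 1600;    // ...of ~50 ms each, assuming 16-bit mono 16 kHz audio
    hr = neg->SuggestAllocatorProperties(&props);
    neg->Release();
    return hr;
}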

What is the decoded output of a video codec?

Folks,
I am wondering if someone can explain to me what exactly is the output of video decoding. Let's say it is a H.264 stream in an MP4 container.
For displaying something on the screen, I guess the decoder can provide two different types of output:
Point - (x, y) coordinates of the location and the (R, G, B) color for the pixel
Rectangle - (x, y, w, h) units for the rectangle and the (R, G, B) color to display
There is also the issue of the time stamp.
Can you please enlighten me, or point me to the right link, on what is generated out of a decoder and how a video client can use this information to display something on screen?
I intend to download the VideoLAN source and examine it, but some explanation would be helpful.
Thank you in advance for your help.
Regards,
Peter
None of the above.
Usually the output will be a stream of bytes that contains just the color data. The X,Y location is implied by the dimensions of the video.
In other words, the first three bytes might encode the color value at (0, 0), the next three bytes the value at (0, 1), and so on. Some formats might use four-byte groups, or even a number of bits that doesn't add up to one byte -- for example, if you use 5 bits for each color component and you have three color components, that's 15 bits per pixel. This might be padded to 16 bits (exactly two bytes) for efficiency, since that will align data in a way that CPUs can better process it.
When you've processed exactly as many values as the video is wide, you've reached the end of that row. When you've processed exactly as many rows as the video is high, you've reached the end of that frame.
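As an illustration, here is how you'd index one pixel out of a packed 24-bit RGB frame (a sketch only; PixelAt is a hypothetical helper, and it assumes tightly packed rows with no padding, which real decoders don't always guarantee):
#include <cstdint>
#include <cstddef>

struct RGB { uint8_t r, g, b; };

// Fetch the RGB triple at (x, y) from a packed, top-to-bottom 24-bit frame.
RGB PixelAt(const uint8_t *frame, int width, int x, int y) {
    size_t offset = (static_cast<size_t>(y) * width + x) * 3;
    return { frame[offset], frame[offset + 1], frame[offset + 2] };
}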
As for the interpretation of those bytes, that depends on the color space used by the codec. Common color spaces are YUV, RGB, and HSL/HSV.
It depends strongly on the codec in use and what input format(s) it supports; the output format is usually restricted to the set of formats that are acceptable inputs.
Timestamp data is a bit more complex, since it can be encoded in the video stream itself or in the container. At a minimum, the stream needs a framerate; from that, the time of each frame can be determined by counting how many frames have already been decoded. Another approach, the one taken by AVI, is to include a byte offset for every Nth frame (or just the keyframes) at the end of the file to enable rapid seeking. (Otherwise, you would need to decode every frame up to the timestamp you're looking for in order to determine where in the file that frame is.)
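The framerate arithmetic is trivial (FrameTimeSeconds is a hypothetical helper):
#include <cstdint>

// Display time of a frame in a fixed-framerate stream.
double FrameTimeSeconds(int64_t frameIndex, double fps) {
    return frameIndex / fps;  // e.g. frame 120 at 24 fps displays at 5.0 s
}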
And if you're considering audio data too, note that with most codecs and containers, the audio and video streams are independent and know nothing about each other. During encoding, the software that writes both streams into the container performs a process called muxing. It writes out the data in chunks of N seconds each, alternating between streams. This allows whoever is reading the stream to get N seconds of video, then N seconds of audio, then another N seconds of video, and so on. (More than one audio stream might be included too: this technique is frequently used to mux together video plus English and Spanish audio tracks into a single file containing three streams.) In fact, even subtitles can be muxed in with the other streams.
cdhowie got most of it.
When it comes to timestamps, the MP4 container contains tables for each frame that tell the video client when to display it. You should look at the spec for MPEG-4 Part 14. You normally have to pay for it, I think, but it's definitely downloadable from places.
http://en.wikipedia.org/wiki/MPEG-4_Part_14
