I'm currently researching the MP3 format in order to build an MP3 decoder.
After some thinking I figured out that the simplest way to calculate the length of a song would be to divide the file size by the bitrate (taking into account the size of the ID3 tag, etc.) and convert the result to minutes. Using this method on a few songs I got accurate times.
I always assumed the length of the song is the length of the pure audio data, but with this method the frame headers are also counted as part of the song when calculating the time.
I also understand that the audio data in an MP3 file is compressed, so when it's decompressed it will of course be larger, which makes the time calculation seem inaccurate.
Am I missing something here? It just doesn't make sense to me that the song's length is calculated from the compressed data rather than the uncompressed data, and that the frame headers, which are a DWORD each, are not ignored.
When a media stream, such as an MP3 file, is compressed with a constant bitrate, that bitrate reflects the compressed size of the data, not the uncompressed size. So your math is fine.
What will throw you off with this approach is metadata tags (e.g., ID3) -- those are part of the file size, but are not counted in the bitrate, since they aren't audio data. Luckily, those tend to be relatively small, so they won't affect your results much.
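To make the arithmetic concrete, here is a minimal Swift sketch of that estimate for a CBR file. The specific numbers (a 4 MB file, a 2 KB ID3v2 tag, 128 kbps) are made up for illustration; a real decoder would read the tag size from the ID3v2 header and the bitrate from the first frame header.

    import Foundation

    // Rough duration estimate for a constant-bitrate MP3:
    // duration (seconds) = audio bytes * 8 / bitrate (bits per second)
    func estimatedDuration(fileSizeBytes: Int, id3TagBytes: Int, bitrateBitsPerSecond: Int) -> TimeInterval {
        let audioBytes = fileSizeBytes - id3TagBytes   // skip metadata, keep only the audio frames
        return Double(audioBytes) * 8.0 / Double(bitrateBitsPerSecond)
    }

    // Example with made-up numbers: a 4 MB file with a 2 KB ID3v2 tag at 128 kbps CBR.
    let seconds = estimatedDuration(fileSizeBytes: 4_000_000,
                                    id3TagBytes: 2_048,
                                    bitrateBitsPerSecond: 128_000)
    let total = Int(seconds)
    print("~\(total / 60)m \(total % 60)s")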
I'm studying networking in college.
I want to know the buffer size of a YouTube live stream.
Using the player statistics I can only get the buffer size in seconds, but I want to get the buffer size in bytes.
I tried multiplying the streaming rate (in bytes per second) by the buffer size (in seconds), but that doesn't give me an accurate buffer size.
Is there any way to get the accurate buffer size of a live stream in bytes?
Thanks
I haven't personally tried this yet, but YouTube had a (now deprecated) player API function, getVideoBytesLoaded(), which returned the number of bytes loaded for the current video. It has since been replaced by getVideoLoadedFraction().
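If a loaded fraction is all you can get, the best you can do is an estimate. A rough sketch of that arithmetic in Swift follows; the duration and average-bitrate figures are assumptions you would have to supply yourself, and for an adaptive live stream the result will only ever be approximate.

    // Rough estimate only: loaded bytes ≈ loaded fraction * total duration * average bitrate / 8.
    // An adaptive live stream switches bitrates, so this can never be exact.
    func estimatedBytesLoaded(loadedFraction: Double,
                              totalDurationSeconds: Double,
                              averageBitrateBitsPerSecond: Double) -> Double {
        return loadedFraction * totalDurationSeconds * averageBitrateBitsPerSecond / 8.0
    }

    // Example: 40% of a 10-minute stream at an assumed average of 2 Mbps ≈ 60 MB.
    print(estimatedBytesLoaded(loadedFraction: 0.4,
                               totalDurationSeconds: 600,
                               averageBitrateBitsPerSecond: 2_000_000))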
I have to extract all the frames from a video file and then save them to files.
I tried to use AVAssetImageGenerator, but it's very slow -- it takes 1-3 s per frame (for a sample 1280x720 MPEG-4 video), and that's without the saving-to-file step.
Is there any way to make it much faster?
OpenGL, GPU, (...)?
I would be very grateful if you could point me in the right direction.
AVAssetImageGenerator is a random-access (seeking) interface, and seeking takes time, so one optimisation could be to use an AVAssetReader, which will quickly and sequentially vend you frames (there's a rough sketch at the end of this answer). You can also choose to work in a YUV format, which will give you smaller frames and (I think) faster decoding.
However, those raw frames are enormous: 1280 px * 720 px * 4 bytes/pixel (if in RGBA) is about 3.6 MB each. You're going to need some pretty serious compression if you want to keep them all (MPEG-4 at 720p comes to mind :).
So what are you trying to achieve?
Are you sure you want to fill up your users' disks at a rate of 108 MB/s (at 30 fps) or 864 MB/s (at 240 fps)?
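Here's a rough sketch of the sequential approach with AVAssetReader, assuming a local AVAsset and skipping error handling; what you do with each decoded pixel buffer (compress it, push it to the GPU, write it out) is up to you:

    import AVFoundation

    // Sequentially decode every video frame of an asset with AVAssetReader.
    // `asset` is assumed to be a local AVAsset; error handling is omitted.
    func readAllFrames(from asset: AVAsset) throws {
        guard let track = asset.tracks(withMediaType: .video).first else { return }

        let reader = try AVAssetReader(asset: asset)
        // Ask for bi-planar 4:2:0 YUV output -- smaller than BGRA and cheap to decode.
        let settings: [String: Any] = [
            kCVPixelBufferPixelFormatTypeKey as String:
                kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange
        ]
        let output = AVAssetReaderTrackOutput(track: track, outputSettings: settings)
        reader.add(output)
        guard reader.startReading() else { return }

        while let sampleBuffer = output.copyNextSampleBuffer() {
            if let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) {
                // Process the frame here: encode it, draw it, save it, etc.
                _ = pixelBuffer
            }
        }
    }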
I've been looking at using AVAudioPlayer to play sounds. The problem here is the format -- I have a bunch of WAV samples in a buffer but no WAV header at all. AVAudioPlayer doesn't seem to allow you to just set the format; it has to try to interpret it from the NSData buffer. Having to copy over the entire sound before it can start playing is not going to be a good experience for my users.
If I have a buffer of WAV samples (not a WAV file in memory), how can I play it back with AVAudioPlayer without crippling my performance with a gigantic copy?
Instead of copying the entire (gigantic) sound, just copy a WAV file header in front of the first sample of the existing buffer. When initially getting your wave data into memory (file load, download, synthesis, etc.), make sure there are at least 44 bytes of padding in front, so there's room to copy the header in. If it's a snippet from a larger buffer, save the 44 bytes just before it, copy the WAV header in, play the sound, and then restore the original 44 bytes.
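Building that 44-byte header is mechanical. Here's a minimal Swift sketch of a canonical PCM WAV header (16-bit little-endian samples assumed; sample rate and channel count are parameters you'd fill in to match your data):

    import Foundation

    // Build a canonical 44-byte WAV (RIFF) header for 16-bit PCM data.
    // Prepend this to a raw sample buffer to make it readable as a WAV file.
    func wavHeader(dataByteCount: Int, sampleRate: UInt32, channels: UInt16) -> Data {
        let bitsPerSample: UInt16 = 16
        let blockAlign = channels * bitsPerSample / 8
        let byteRate = sampleRate * UInt32(blockAlign)

        var header = Data()
        func append<T: FixedWidthInteger>(_ value: T) {
            withUnsafeBytes(of: value.littleEndian) { header.append(contentsOf: $0) }
        }

        header.append(contentsOf: "RIFF".utf8)
        append(UInt32(36 + dataByteCount))        // chunk size: rest of the file
        header.append(contentsOf: "WAVE".utf8)
        header.append(contentsOf: "fmt ".utf8)
        append(UInt32(16))                        // fmt chunk size
        append(UInt16(1))                         // audio format: 1 = PCM
        append(channels)
        append(sampleRate)
        append(byteRate)
        append(blockAlign)
        append(bitsPerSample)
        header.append(contentsOf: "data".utf8)
        append(UInt32(dataByteCount))             // size of the sample data that follows
        return header
    }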
Folks,
I am wondering if someone can explain to me what exactly is the output of video decoding. Let's say it is a H.264 stream in an MP4 container.
For displaying something on the screen, I guess the decoder can provide two different types of output:
Point - (x, y) coordinate of the location and the (R, G, B) color for the pixel
Rectangle - (x, y, w, h) for the rectangle and the (R, G, B) color to display
There is also the issue of time stamp.
Can you please enlighten me or point me to the right link on what is generated by a decoder and how a video client can use this information to display something on screen?
I intend to download the VideoLAN source and examine it, but some explanation would be helpful.
Thank you in advance for your help.
Regards,
Peter
None of the above.
Usually the output will be a stream of bytes that contains just the color data. The X,Y location is implied by the dimensions of the video.
In other words, the first three bytes might encode the color value at (0, 0), the next three bytes the value at (0, 1), and so on. Some formats might use four-byte groups, or even a number of bits that doesn't add up to one byte -- for example, if you use 5 bits for each color component and you have three color components, that's 15 bits per pixel. This might be padded to 16 bits (exactly two bytes) for efficiency, since that will align data in a way that CPUs can better process it.
When you've processed exactly as many values as the video is wide, you've reached the end of that row. When you've processed exactly as many rows as the video is high, you've reached the end of that frame.
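As a small illustration of that layout (my own sketch, not tied to any particular decoder), indexing into a packed interleaved frame looks like this, assuming 3 bytes per pixel and no row padding; real decoders often pad each row to a stride, which you would use in place of width * bytesPerPixel:

    // Byte offset of pixel (x, y) in a packed, interleaved raw frame
    // (e.g. 24-bit RGB with no row padding). If the decoder reports a
    // per-row stride, use that instead of width * bytesPerPixel.
    func pixelOffset(x: Int, y: Int, width: Int, bytesPerPixel: Int) -> Int {
        return (y * width + x) * bytesPerPixel
    }

    // Example: in a 1280-wide RGB frame, pixel (3, 2) starts at byte (2 * 1280 + 3) * 3,
    // and its three color components are the three bytes from that offset.
    let offset = pixelOffset(x: 3, y: 2, width: 1280, bytesPerPixel: 3)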
As for the interpretation of those bytes, that depends on the color space used by the codec. Common color spaces are YUV, RGB, and HSL/HSV.
It depends strongly on the codec in use and what input format(s) it supports; the output format is usually restricted to the set of formats that are acceptable inputs.
Timestamp data is a bit more complex, since it can be encoded in the video stream itself, or in the container. At a minimum, the stream would need a framerate; from that, the time of each frame can be determined by counting how many frames have been decoded already. Other approaches, like the one taken by AVI, include a byte offset for every Nth frame (or just the keyframes) at the end of the file to enable rapid seeking. (Otherwise, you would need to decode every frame up to the timestamp you're looking for in order to determine where in the file that frame is.)
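For the simple constant-framerate case, the timestamp arithmetic is just this (a trivial sketch, assuming the rate never changes):

    // With a constant frame rate, frame N is displayed at N / fps seconds,
    // and the frame covering time t is floor(t * fps).
    func presentationTime(frameIndex: Int, framesPerSecond: Double) -> Double {
        return Double(frameIndex) / framesPerSecond
    }

    func frameIndex(atTime t: Double, framesPerSecond: Double) -> Int {
        return Int(t * framesPerSecond)
    }

    // Example: frame 90 at 30 fps is shown at 3.0 s; seeking to 3.0 s lands on frame 90.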
And if you're considering audio data too, note that with most codecs and containers, the audio and video streams are independent and know nothing about each other. During encoding, the software that writes both streams into the container format does a process called muxing. It will write out the data in chunks of N seconds each, alternating between streams. This allows whoever is reading the stream to get N seconds of video, then N seconds of audio, then another N seconds of video, and so on. (More than one audio stream might be included too -- this technique is frequently used to mux together video, and English and Spanish audio tracks into a single file that contains three streams.) In fact, even subtitles can be muxed in with the other streams.
cdhowie got most of it.
When it comes to timestamps, the MP4 container contains tables that tell the video client when to display each frame. You should look at the MPEG-4 Part 14 spec. You normally have to pay for it, I think, but it's definitely downloadable from places.
http://en.wikipedia.org/wiki/MPEG-4_Part_14
I'm using AVAssetExportSession to export some stuff at 640x480, and the files are kind of monstrous -- predictably monstrous, but still monstrous, given that we need to upload them from the phone over a 3G network. Is there any way to affect the size of the file other than to reduce the resolution? Ideally I'd like to try, e.g., compressing harder (even if that lowers quality), or cutting back to 15 frames/second, or something like that, but there don't seem to be any hooks to do it.
With AVAssetExportSession you can only use presets. If AVAssetExportPresetMediumQuality and AVAssetExportPresetLowQuality don't work for you, you're better off using AVAssetReader and AVAssetWriter. AVAssetWriter supports bitrate settings, and you could optionally skip frames while writing to get a lower frame rate.
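As a sketch of the writer side (the numbers here are placeholders to tune, not recommendations), the bitrate goes into the compression-properties dictionary of the video output settings:

    import AVFoundation

    // Example AVAssetWriterInput with an explicit average bitrate.
    // 640x480 at roughly 1 Mbps is only an illustrative target.
    let videoSettings: [String: Any] = [
        AVVideoCodecKey: AVVideoCodecType.h264,
        AVVideoWidthKey: 640,
        AVVideoHeightKey: 480,
        AVVideoCompressionPropertiesKey: [
            AVVideoAverageBitRateKey: 1_000_000,   // target average bits per second
            AVVideoMaxKeyFrameIntervalKey: 30      // at most one keyframe every 30 frames
        ]
    ]

    let writerInput = AVAssetWriterInput(mediaType: .video, outputSettings: videoSettings)
    // Pull samples with an AVAssetReader and append them to writerInput;
    // dropping every other sample is one simple way to halve the frame rate.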