How to decode a live555 rtsp stream (h.264) MediaSink data using iOS8's VideoToolbox?

OK, I know that this question is almost the same as get-rtsp-stream-from-live555-and-decode-with-avfoundation, but VideoToolbox has now become public in iOS 8, and although I know it can be done with this framework, I have no idea how to do it.
My goals are:
Connect with a WiFiCamera using rtsp protocol and receive stream data (Done with live555)
Decode the data and convert to UIImages to display on the screen (motionJPEG like)
And save the streamed data to a .mov file
I reached all of these goals using ffmpeg, but unfortunately I can't use it due to my company's policy.
I know that I can display on the screen using OpenGL too, but this time I have to convert to UIImages. I also tried the libraries below:
ffmpeg: can't use this time due to company's policy. (don't ask me why)
libVLC: display lags about 2 seconds, and I don't have access to the stream data to save it into a .mov file...
gstreamer: same as above
I believe that live555 + VideoToolbox will do the job, I just can't figure out how to make this happen...

I did it. VideoToolbox is still poorly documented and there isn't much information about video programming (without using ffmpeg) out there, so it took me more time than I expected.
For the stream coming from live555, I got the SPS and PPS info to create the CMVideoFormatDescription like this:
CMFormatDescriptionRef videoFormat = NULL;
const uint8_t *props[] = {[spsData bytes], [ppsData bytes]};
size_t sizes[] = {[spsData length], [ppsData length]};
// two parameter sets (SPS + PPS), 4-byte NAL unit length headers
OSStatus result = CMVideoFormatDescriptionCreateFromH264ParameterSets(NULL, 2, props, sizes, 4, &videoFormat);
Now, the difficult part (because I'm a noob at video programming): replace the NAL unit header with a 4-byte length code as described here
int headerEnd = 23; // where the real data starts in my stream
uint32_t hSize = (uint32_t)([rawData length] - headerEnd - 4);
uint32_t bigEndianSize = CFSwapInt32HostToBig(hSize); // AVCC expects a big-endian length prefix
NSMutableData *videoData = [NSMutableData dataWithBytes:&bigEndianSize length:sizeof(bigEndianSize)];
[videoData appendData:[rawData subdataWithRange:NSMakeRange(headerEnd + 4, [rawData length] - headerEnd - 4)]];
Now I was able to create a CMBlockBuffer successfully using this raw data and pass the buffer to VTDecompressionSessionDecodeFrame. From here it is easy to convert the resulting CVImageBufferRef to a UIImage... I used this Stack Overflow thread as reference.
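For anyone following along, a minimal sketch of that step might look like the code below (this is not my exact code; it assumes the videoFormat and videoData variables from the snippets above and omits all error handling):

#import <VideoToolbox/VideoToolbox.h>

// Decoded frames arrive here; convert imageBuffer to a UIImage if needed.
static void didDecompress(void *decompressionOutputRefCon, void *sourceFrameRefCon,
                          OSStatus status, VTDecodeInfoFlags infoFlags,
                          CVImageBufferRef imageBuffer, CMTime pts, CMTime duration)
{
}

VTDecompressionSessionRef session = NULL;
VTDecompressionOutputCallbackRecord callback = { didDecompress, NULL };
VTDecompressionSessionCreate(kCFAllocatorDefault, videoFormat, NULL, NULL, &callback, &session);

// Wrap the length-prefixed NAL unit in a CMBlockBuffer that points at videoData in place.
CMBlockBufferRef blockBuffer = NULL;
CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault,
                                   (void *)[videoData bytes],
                                   [videoData length],
                                   kCFAllocatorNull,   // don't free memory we don't own
                                   NULL, 0, [videoData length], 0,
                                   &blockBuffer);

CMSampleBufferRef sampleBuffer = NULL;
const size_t sampleSize = [videoData length];
CMSampleBufferCreate(kCFAllocatorDefault, blockBuffer, true, NULL, NULL,
                     videoFormat, 1, 0, NULL, 1, &sampleSize, &sampleBuffer);

VTDecompressionSessionDecodeFrame(session, sampleBuffer, 0, NULL, NULL);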
And finally, I saved the stream data converted to UIImages following the explanation described in How do I export UIImage array as a movie?
I just posted a little bit of my code because I believe this is the important part, or in other words, it is where I was having problems.
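For readers who want a concrete starting point for that last step, a minimal sketch of the UIImage-to-movie part could look like this (illustrative only: it assumes an already configured AVAssetWriter writer and AVAssetWriterInput input, an images array, and a pixelBufferFromCGImage: helper like the one in the linked answer):

// Append UIImages as video frames via a pixel buffer adaptor (illustrative sketch).
NSDictionary *attrs = @{ (id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32ARGB) };
AVAssetWriterInputPixelBufferAdaptor *adaptor =
    [AVAssetWriterInputPixelBufferAdaptor assetWriterInputPixelBufferAdaptorWithAssetWriterInput:input
                                                                     sourcePixelBufferAttributes:attrs];
[writer startWriting];
[writer startSessionAtSourceTime:kCMTimeZero];

CMTime frameDuration = CMTimeMake(1, 30); // assuming 30 fps
for (NSUInteger i = 0; i < images.count; i++) {
    CVPixelBufferRef pixelBuffer = [self pixelBufferFromCGImage:[images[i] CGImage]]; // hypothetical helper
    CMTime presentationTime = CMTimeMultiply(frameDuration, (int32_t)i);
    while (!input.readyForMoreMediaData) { /* in real code, use requestMediaDataWhenReadyOnQueue:usingBlock: */ }
    [adaptor appendPixelBuffer:pixelBuffer withPresentationTime:presentationTime];
    CVPixelBufferRelease(pixelBuffer);
}
[input markAsFinished];
[writer finishWritingWithCompletionHandler:^{ /* done */ }];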

Related

How to decode a H.264 frame on iOS by hardware decoding?

I have been using ffmpeg to decode every single frame that I receive from my IP cam. The brief code looks like this:
-(void)decodeFrame:(unsigned char *)frameData frameSize:(int)frameSize {
    AVFrame frame;
    AVPicture picture;
    AVPacket pkt;
    AVCodecContext *context; // must point to a codec context opened elsewhere (avcodec_open2)
    int got_picture = 0;

    av_init_packet(&pkt);
    pkt.data = frameData;
    pkt.size = frameSize; // was "pat.size" (typo)

    avcodec_get_frame_defaults(&frame);
    avpicture_alloc(&picture, PIX_FMT_RGB24, targetWidth, targetHeight);
    avcodec_decode_video2(context, &frame, &got_picture, &pkt); // pass the context pointer, not its address
}
The code works fine, but it's software decoding. I want to improve the decoding performance with hardware decoding. After a lot of research, I know it may be achieved with the AVFoundation framework.
The AVAssetReader class may help, but I can't figure out what comes next. Could anyone point out the following steps for me? Any help would be appreciated.
iOS does not provide any public access directly to the hardware decode engine, because hardware is always used to decode H.264 video on iOS.
Therefore, session 513 gives you all the information you need to allow frame-by-frame decoding on iOS. In short, per that session:
Generate individual network abstraction layer units (NALUs) from your H.264 elementary stream. There is much information on how this is done online. VCL NALUs (IDR and non-IDR) contain your video data and are to be fed into the decoder.
Re-package those NALUs according to the "AVCC" format, removing NALU start codes and replacing them with a 4-byte NALU length header.
Create a CMVideoFormatDescriptionRef from your SPS and PPS NALUs via CMVideoFormatDescriptionCreateFromH264ParameterSets()
Package NALU frames as CMSampleBuffers per session 513.
Create a VTDecompressionSessionRef, and feed VTDecompressionSessionDecodeFrame() with the sample buffers
Alternatively, use AVSampleBufferDisplayLayer, whose -enqueueSampleBuffer: method obviates the need to create your own decoder.
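To make the AVSampleBufferDisplayLayer alternative concrete, a minimal sketch might look like this (assuming this runs in a view controller and that you already have an AVCC-formatted CMSampleBufferRef built as in the steps above):

#import <AVFoundation/AVFoundation.h>

AVSampleBufferDisplayLayer *displayLayer = [[AVSampleBufferDisplayLayer alloc] init];
displayLayer.frame = self.view.bounds;
displayLayer.videoGravity = AVLayerVideoGravityResizeAspect;
[self.view.layer addSublayer:displayLayer];

// With no real timestamps, mark the sample for immediate display.
CFArrayRef attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, true);
CFMutableDictionaryRef dict = (CFMutableDictionaryRef)CFArrayGetValueAtIndex(attachments, 0);
CFDictionarySetValue(dict, kCMSampleAttachmentKey_DisplayImmediately, kCFBooleanTrue);

[displayLayer enqueueSampleBuffer:sampleBuffer];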
Edit:
This link provides a more detailed explanation of how to decode H.264 step by step: stackoverflow.com/a/29525001/3156169
Original answer:
I watched the session 513 "Direct Access to Video Encoding and Decoding" in WWDC 2014 yesterday, and got the answer of my own question.
The speaker says:
We have Video Toolbox (in iOS 8). Video Toolbox has been there on OS X for a while, but now it's finally populated with headers on iOS. This provides direct access to encoders and decoders.
So, there is no way to do hardware decoding frame by frame in iOS 7, but it can be done in iOS 8.
Has anyone figured out how to directly access video encoding and decoding frame by frame in iOS 8?

CMSampleBufferRef pool to write H.264 AVCC stream

I'm using AVAssetWriter/AVAssetWriterInput to write H.264 raw data to an MP4 file. As I'm receiving the data from a remote server, I use the following CoreMedia APIs to get a sample buffer (CMSampleBufferRef) containing the H.264 data in AVCC format, which is in turn appended to the MP4 file by sending an AVAssetWriterInput the message (BOOL)appendSampleBuffer:(CMSampleBufferRef)sampleBuffer:
CMBlockBufferCreateWithMemoryBlock to create a memory block
CMBlockBufferReplaceDataBytes to write the H.264 in AVCC format to the memory block
CMSampleBufferCreate to create a sample buffer with the memory block and a format descriptor containing the H.264 "extradata"
Everything works as expected; the only problem with this approach is that I'm periodically calling the above APIs, and what I would really like is to be able to reuse the allocated resources, in particular CMSampleBufferRef and CMBlockBufferRef. Basically, I would like to have a pool of CMSampleBuffers and be able to update their memory content and format descriptor as I receive new H.264 data from the remote server.
I know that AVAssetWriterInputPixelBufferAdaptor exists and gives access to a CVPixelBufferPool, but I can't use it in my case because, as far as I know, to properly instantiate a pixel buffer adaptor I need at minimum to be able to pass the video frame dimensions, which I wouldn't know until I parse the stream. Further, I don't know how to write the H.264 "extradata" with a CVPixelBuffer. So, I'm thinking that I need to stick with CMSampleBuffer. Unfortunately, it seems that the CoreMedia APIs don't offer the possibility to update the memory block or the format descriptor of a sample buffer once it has been created (as far as I can tell, I only have access to immutable references to those objects). Thus, the best I can do so far is to reuse the memory block CMBlockBufferRef, but I'm still recreating the sample buffer. My code is below. Hopefully someone here will have some ideas on how to implement a pool of CMSampleBuffers, or perhaps a more efficient way to write an H.264 AVCC stream to MP4?
- (CMSampleBufferRef)sampleBufferWithData:(NSData*)data formatDescriptor:(CMFormatDescriptionRef)formatDescription
{
    OSStatus result;
    CMSampleBufferRef sampleBuffer = NULL;

    // _blockBuffer is a CMBlockBufferRef instance variable
    if (!_blockBuffer)
    {
        size_t blockLength = MAX_LENGTH;
        result = CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault,
                                                    NULL,
                                                    blockLength,
                                                    kCFAllocatorDefault,
                                                    NULL,
                                                    0,
                                                    blockLength,
                                                    kCMBlockBufferAssureMemoryNowFlag,
                                                    &_blockBuffer);
        // check error
    }

    result = CMBlockBufferReplaceDataBytes([data bytes], _blockBuffer, 0, [data length]);
    // check error

    const size_t sampleSizes = [data length];
    CMSampleTimingInfo timing = [self sampleTimingInfo];

    result = CMSampleBufferCreate(kCFAllocatorDefault,
                                  _blockBuffer,
                                  YES,
                                  NULL,
                                  NULL,
                                  formatDescription,
                                  1,
                                  1,
                                  &timing,
                                  1,
                                  &sampleSizes,
                                  &sampleBuffer);
    // check error

    return sampleBuffer;
}
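(For reference, below is a minimal sketch of the kind of passthrough AVAssetWriter setup this feeds into. The names outputURL and frameData are illustrative, this is not the poster's actual code, and error handling is omitted.)

// Passthrough writer: nil outputSettings means the compressed samples are written as-is,
// and the format description (carrying the H.264 "extradata") is supplied as a source format hint.
NSError *error = nil;
AVAssetWriter *writer = [AVAssetWriter assetWriterWithURL:outputURL
                                                 fileType:AVFileTypeMPEG4
                                                    error:&error];
AVAssetWriterInput *input =
    [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeVideo
                                       outputSettings:nil
                                     sourceFormatHint:formatDescription];
input.expectsMediaDataInRealTime = YES;
[writer addInput:input];
[writer startWriting];
[writer startSessionAtSourceTime:kCMTimeZero];

// For each chunk of H.264 data received from the server:
CMSampleBufferRef sampleBuffer = [self sampleBufferWithData:frameData
                                           formatDescriptor:formatDescription];
if (input.readyForMoreMediaData) {
    [input appendSampleBuffer:sampleBuffer];
}
CFRelease(sampleBuffer);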
If you are receiving raw H.264 data, then there is not much to do and no need to deal with CoreMedia at all.
Buffer all VCL NAL units until you get the SPS/PPS NAL units. Create the extradata from them, then just append all buffered and new VCL NAL units to the file. In case you receive the NAL units in Annex B format, you need to convert them to AVCC format (basically replacing the start code with a length code).
You only need to work with 'CMSampleBuffer' if you want to encode uncompressed pictures or decode compressed pictures. As you are already working with a raw H.264 stream and just want to write it into an MP4 file, just do so. There is no need to touch CoreMedia at all here.
Regarding CoreMedia: you wrap your video information in a CMBlockBuffer. This buffer, together with a CMVideoFormatDescription (generated from SPS/PPS) plus the CMTime, makes up a CMSampleBuffer. And multiple CMSampleBuffers make up a 'CMSampleBufferPool'.
'CVPixelBuffer' and 'CVPixelBufferPool' are not involved. These are either the input or output of a 'VTCompressionSession' or 'VTDecompressionSession' when dealing with encoding/decoding H.264 video.
As said, in your case there is no need to touch any of the Core frameworks at all, as you are just creating a file.
An overview about Annex B and AVCC stream format can be found here: Possible Locations for Sequence/Picture Parameter Set(s) for H.264 Stream

Extract OpenGL raw RGB(A) texture data from png data stored in NSData using libpng on iOS

Unfortunately, there appears to be no built-in method on iOS to extract 32-bit RGBA data from a PNG file without losing the alpha channel reference. Therefore, some people have been using libpng to extract their OpenGL textures. However, all the examples require the PNG file to be loaded from a file. Assuming these textures are imported over a network connection, they would have to be saved to files from NSData and then read. What is the best way to extract raw PNG data into raw OpenGL RGBA texture data?
Ended up writing a category which solves this problem using the customization capabilities of libpng. Posted a gist here: https://gist.github.com/joshcodes/5681512
Hopefully this helps someone else who needs to know how this is done. The essential part is creating a read callback function
void user_read_data(png_structp png_ptr, png_bytep data, png_size_t length)
{
void *nsDataPtr = png_get_io_ptr(png_ptr);
ReadStream *readStream = (ReadStream*)nsDataPtr;
memcpy(data, readStream->source + readStream->index, length);
readStream->index += length;
}
and using
// init png reading
png_set_read_fn(png_ptr, &readStream, user_read_data);
as the custom read function.
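For completeness, the ReadStream type used above is just a small struct tracking the source bytes and a read cursor. A minimal sketch follows (field names are inferred from the callback, so the gist may differ slightly; pngData is a placeholder name):

typedef struct {
    const png_byte *source;   // pointer to the PNG bytes, e.g. [pngData bytes]
    png_size_t index;         // current read offset into source
} ReadStream;

// Set up before calling png_set_read_fn:
ReadStream readStream;
readStream.source = (const png_byte *)[pngData bytes];
readStream.index = 0;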

difference between how AVAssetReader and AudioFileReadPackets reads Audio

consider these two scenarios for reading/writing data from Audio files (for the purpose of sending over a network):
Scenario 1: Audio File Services:
Using AudioFileReadPackets from Audio File Services. This generates audio packets that you can easily send over the network. On the receiving side you use AudioFileStreamOpen and AudioFileStreamParseBytes to parse the data.
AudioFileStreamParseBytes then has two callback functions: AudioFileStream_PropertyListenerProc and AudioFileStream_PacketsProc. These guys are called when a new property is discovered in the stream and when packets are received from the stream, respectively. Once you receive the packets, you can feed it to an audio queue using Audio Queue Service which plays the file just fine.
Note: This method does NOT work with music files stored in the iPod library, which brings us to the 2nd scenario:
Scenario 2: AVAssetReader:
With AVAssetReader you can read from the iPod music library and send packets over the network. Typically you would load the packets directly on an Audio Queue similar to above. However, in this scenario you will have to create a thread to ensure that you block receiving packets when the queue is full, and unblock when queue buffers are available (see this example).
Question:
Is it possible to use AVAssetReader to send the packets over, only to have them read by AudioFileStreamParseBytes? (The motive being that AudioFileStreamParseBytes's callbacks will handle the threading/blocking business and save you that pain.) I tried doing it like so:
1. first read the audio file using AVAssetReader
//NSURL *assetURL = [NSURL URLWithString:@"ipod-library://item/item.m4a?id=1053020204400037178"];
AVURLAsset *songAsset = [AVURLAsset URLAssetWithURL:assetURL options:nil];
NSError * error = nil;
AVAssetReader* reader = [[AVAssetReader alloc] initWithAsset:songAsset error:&error];
AVAssetTrack* track = [songAsset.tracks objectAtIndex:0];
// Note: I don't supply an audio format description here, rather I pass on nil to keep the original
// file format. In another piece of code (see here: http://stackoverflow.com/questions/12264799/why-is-audio-coming-up-garbled-when-using-avassetreader-with-audio-queue?answertab=active#tab-top) I can extract the audio format from the track, let's say it's an AAC format.
AVAssetReaderTrackOutput* readerOutput = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:track
outputSettings:nil];
[reader addOutput:readerOutput];
[reader startReading];
2. set up the streamer
// Notice how I manually add the audio file type (for the file hint parameter)
// using the info from step one. If I leave it as 0, this call fails and returns
// the 'typ?' error: "The specified file type is not supported."
streamer->err = AudioFileStreamOpen((__bridge void*)streamer,
ASPropertyListenerProc, ASPacketsProc,
kAudioFileAAC_ADTSType, &(streamer->audioFileStream));
3. once I receive the data, I parse the bytes:
streamer->err = AudioFileStreamParseBytes(streamer->audioFileStream, inDataByteSize, inData, 0);
Problem: when I do it this way, I send the bytes and AudioFileStreamParseBytes does not fail. However, the callbacks AudioFileStream_PropertyListenerProc and AudioFileStream_PacketsProc are never called, which makes me think that the parser has failed to parse the bytes and extract any useful information from them. The documentation for AudioFileStreamParseBytes states: "You should provide at least more than a single packet's worth of audio file data, but it is better to provide a few packets to a few seconds of data at a time." I'm sending over 900 bytes, which is just below GKSession's data limit. I'm pretty sure 900 bytes is enough (when testing this under scenario 1, the total bytes were 417 each time and it worked fine).
Any ideas?
The short answer is that it simply doesn't make sense to have packets of audio data parsed by AudioFileStreamParseBytes. In the docs, AudioFileStreamParseBytes is a function dependent on the existence of an audio file (hence the parameter inAudioFileStream, which is defined as the ID of the parser to which you wish to pass data; the parser ID is returned by the AudioFileStreamOpen function).
So, lesson learned: don't try to pigeonhole iOS functions to fit your situation; it should be the other way around.
What I ended up doing was feeding the data directly to an Audio Queue, without going through all these unnecessary intermediary functions. A more in-depth way would be feeding the data to Audio Units, but my application didn't need that level of control.
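As a rough illustration of that "feed the data directly" approach, pulling raw bytes out of the AVAssetReader looks something like the sketch below (not my production code; it assumes the readerOutput set up in the question and omits error handling). Those bytes can then be copied into an AudioQueueBufferRef and enqueued with AudioQueueEnqueueBuffer:

CMSampleBufferRef sampleBuffer = [readerOutput copyNextSampleBuffer];
if (sampleBuffer) {
    CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    size_t length = CMBlockBufferGetDataLength(blockBuffer);

    NSMutableData *audioData = [NSMutableData dataWithLength:length];
    CMBlockBufferCopyDataBytes(blockBuffer, 0, length, [audioData mutableBytes]);

    // Copy [audioData bytes] into an AudioQueueBufferRef's mAudioData, set
    // mAudioDataByteSize, and enqueue it (with packet descriptions for a
    // compressed format such as AAC).

    CFRelease(sampleBuffer);
}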

iOS audio manipulation - play local .caf file backwards

I want to load a local .caf audio file and reverse the audio (play it backwards). I've gathered that I basically need to flip an array of buffer data, from posts like this.
However, I'm not sure how to access this buffer data from a given audio file. I have a little experience playing sounds back with AVAudioPlayer and ObjectAL (an Objective-C OpenAL library), but I don't know how to access something lower level like this buffer data array.
Could I please get an example of how I would go about getting access to that array?
Your problem reduces to the same problem described here, which was linked by P-i in the comment under your question. Kiran answered that question and re-posted his answer for you here. Kiran's answer is accurate, but you may need a few more details to be able to decide how to proceed because you're starting with a CAF file.
The simplest audio file format, linear pulse-code modulation (LPCM), is the easiest to read byte-for-byte or sample-for-sample. This means it's the easiest to reverse. Kiran's solution does just that.
The CAF format is a container/wrapper format, however. While your CAF file could contain a WAV file, it could also contain a compressed file format that cannot be manipulated in the same fashion.
You should consider first converting the CAF file to WAV, then reversing it as shown in the other solution. There are various libraries that will do this conversion for you, but a good place to start might be the AudioToolbox framework, which includes Audio Converter Services. Alternatively, if you can use the WAV file format from the start, you can avoid the need to convert at all.
You may need to know more if you find Kiran's sample code gives you an error (Core Audio is finicky). A great place to start is with the Core Audio 'Bible', "Learning Core Audio", written by Chris Adamson and Kevin Avila. That book builds your knowledge of digital sound using great samples. You should also check out anything written by Michael Tyson, who started the Amazing Audio Engine project on github, and wrote AudioBus.
I have worked on a sample app which records what the user says and plays it backwards. I used Core Audio to achieve this. Link to app code.
As each sample is 16 bits (2 bytes) in size (mono channel), you can load one sample at a time by copying it into a different buffer, starting at the end of the recording and reading backwards. When you get to the start of the data you have reversed the data, and playback will be reversed.
// set up output file
AudioFileID outputAudioFile;
AudioStreamBasicDescription myPCMFormat;
myPCMFormat.mSampleRate = 16000.00;
myPCMFormat.mFormatID = kAudioFormatLinearPCM;
myPCMFormat.mFormatFlags = kAudioFormatFlagsCanonical;
myPCMFormat.mChannelsPerFrame = 1;
myPCMFormat.mFramesPerPacket = 1;
myPCMFormat.mBitsPerChannel = 16;
myPCMFormat.mBytesPerPacket = 2;
myPCMFormat.mBytesPerFrame = 2;

AudioFileCreateWithURL((__bridge CFURLRef)self.flippedAudioUrl,
                       kAudioFileCAFType,
                       &myPCMFormat,
                       kAudioFileFlags_EraseFile,
                       &outputAudioFile);

// set up input file
AudioFileID inputAudioFile;
OSStatus theErr = noErr;
UInt64 fileDataSize = 0;
AudioStreamBasicDescription theFileFormat;
UInt32 thePropertySize = sizeof(theFileFormat);
theErr = AudioFileOpenURL((__bridge CFURLRef)self.recordedAudioUrl, kAudioFileReadPermission, 0, &inputAudioFile);
thePropertySize = sizeof(fileDataSize);
theErr = AudioFileGetProperty(inputAudioFile, kAudioFilePropertyAudioDataByteCount, &thePropertySize, &fileDataSize);
UInt32 dataSize = fileDataSize;
void* theData = malloc(dataSize);

// Read one 2-byte sample at a time, starting from the end of the input,
// and write it at the front of the output.
UInt32 readPoint = dataSize;
UInt32 writePoint = 0;
while (readPoint > 0)
{
    UInt32 bytesToRead = 2;
    readPoint -= bytesToRead;   // step back one sample before reading
    AudioFileReadBytes(inputAudioFile, false, readPoint, &bytesToRead, theData);
    AudioFileWriteBytes(outputAudioFile, false, writePoint, &bytesToRead, theData);
    writePoint += bytesToRead;
}
free(theData);
AudioFileClose(inputAudioFile);
AudioFileClose(outputAudioFile);
I think this sample code could help you.
Mixer Host Sample Code
It will load two CAF files from the bundle and play them. It contains a function called readAudioFilesIntoMemory, which loads a CAF file into a data array as you said.
The whole program is an example of Core Audio; I hope it can help you :)
Why not let CoreAudio's AudioConverter do it for you? See this post about "Getting PCM from MP3/AAC/ALAC File", and Apple's Core Audio Essentials
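For example, a minimal sketch of that route might look like this (illustrative only; it uses ExtAudioFile, which wraps Audio Converter Services, and sourceUrl plus the format values are assumptions). It decodes the file into 16-bit LPCM that you can then reverse as in the other answer:

ExtAudioFileRef extFile = NULL;
ExtAudioFileOpenURL((__bridge CFURLRef)sourceUrl, &extFile);

// Ask for 16-bit mono LPCM regardless of what the file actually contains.
AudioStreamBasicDescription clientFormat = {0};
clientFormat.mSampleRate = 44100.0;
clientFormat.mFormatID = kAudioFormatLinearPCM;
clientFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
clientFormat.mChannelsPerFrame = 1;
clientFormat.mFramesPerPacket = 1;
clientFormat.mBitsPerChannel = 16;
clientFormat.mBytesPerPacket = 2;
clientFormat.mBytesPerFrame = 2;
ExtAudioFileSetProperty(extFile, kExtAudioFileProperty_ClientDataFormat,
                        sizeof(clientFormat), &clientFormat);

// Pull decoded PCM in chunks and collect it; reverse the samples afterwards.
NSMutableData *pcm = [NSMutableData data];
char buffer[4096 * 2];
while (1) {
    AudioBufferList bufferList;
    bufferList.mNumberBuffers = 1;
    bufferList.mBuffers[0].mNumberChannels = 1;
    bufferList.mBuffers[0].mDataByteSize = sizeof(buffer);
    bufferList.mBuffers[0].mData = buffer;

    UInt32 frames = 4096;
    ExtAudioFileRead(extFile, &frames, &bufferList);
    if (frames == 0) break;
    [pcm appendBytes:buffer length:frames * clientFormat.mBytesPerFrame];
}
ExtAudioFileDispose(extFile);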
You can use the libsox framework for iPhone to apply audio effects easily.
It includes a sample project that shows how to do it.
libsox ios
