H.264 decoder not working properly? - iOS

I've gone over the code for this decoder for elementary h.264 bitstreams a hundred times, tweaking things along the way, with no luck. When I send the output CMSampleBuffers to an AVSampleBufferDisplayLayer, they don't appear, presumably because there's something wrong with how I'm decoding them.
I get no error messages anywhere: the AVSampleBufferDisplayLayer has no error and its status is 1 (i.e. .rendering), CMSampleBufferIsValid() returns true on the output CMSampleBuffers, and I hit no errors in my decoder either.
I'm stumped and my hope is that a more experienced developer can catch something that I'm missing.
I pass raw bytes in here (typealias FrameData = [UInt8]):
func interpretRawFrameData(_ frameData: inout FrameData) -> CMSampleBuffer? {
    let size = UInt32(frameData.count)
    var naluType = frameData[4] & 0x1F
    var frame: CMSampleBuffer?

    // The start indices for nested packets. Default to 0.
    var ppsStartIndex = 0
    var frameStartIndex = 0

    switch naluType {
    // SPS
    case 7:
        print("===== NALU type SPS")
        for i in 4..<40 {
            if frameData[i] == 0 && frameData[i+1] == 0 && frameData[i+2] == 0 && frameData[i+3] == 1 {
                ppsStartIndex = i
                spsSize = i - 4 // Does not include the size of the header
                sps = Array(frameData[4..<i])
                // Set naluType to the nested PPS packet's NALU type
                naluType = frameData[i + 4] & 0x1F
                break
            }
        }
        // If nested frame was found, fallthrough
        if ppsStartIndex != 0 { fallthrough }

    // PPS
    case 8:
        print("===== NALU type PPS")
        for i in ppsStartIndex+4..<ppsStartIndex+34 {
            if frameData[i] == 0 && frameData[i+1] == 0 && frameData[i+2] == 0 && frameData[i+3] == 1 {
                frameStartIndex = i
                ppsSize = i - spsSize - 8 // Does not include the size of the header. Subtract 8 to account for both the SPS and PPS headers
                pps = Array(frameData[ppsStartIndex+4..<i])
                // Set naluType to the nested packet's NALU type
                naluType = frameData[i+4] & 0x1F
                break
            }
        }
        // If nested frame was found, fallthrough
        if frameStartIndex != 0 { fallthrough }

    // IDR frame
    case 5:
        print("===== NALU type IDR frame")
        // Replace start code with size
        let adjustedSize = size - UInt32(ppsSize) - UInt32(spsSize) - 8
        var blockSize = CFSwapInt32HostToBig(adjustedSize)
        memcpy(&frameData[frameStartIndex], &blockSize, 4)
        if createFormatDescription() {
            frame = decodeFrameData(Array(frameData[frameStartIndex...]))
        }

    // B/P frame
    default:
        print("===== NALU type B/P frame")
        // Replace start code with size
        var blockSize = CFSwapInt32HostToBig(size)
        memcpy(&frameData[frameStartIndex], &blockSize, 4)
        frame = decodeFrameData(Array(frameData[frameStartIndex...]))
    }

    return frame
}
And this is where the actual decoding happens:
private func decodeFrameData(_ frameData: FrameData) -> CMSampleBuffer? {
    let bufferPointer = UnsafeMutablePointer<UInt8>(mutating: frameData)
    let size = frameData.count

    var blockBuffer: CMBlockBuffer?
    var status = CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault,
                                                    bufferPointer,
                                                    size,
                                                    kCFAllocatorNull,
                                                    nil, 0, frameData.count,
                                                    0, &blockBuffer)
    if status != kCMBlockBufferNoErr { return nil }

    var sampleBuffer: CMSampleBuffer?
    let sampleSizeArray = [size]
    status = CMSampleBufferCreateReady(kCFAllocatorDefault,
                                       blockBuffer,
                                       formatDesc,
                                       1, 0, &sampleTimingInfo,
                                       1, sampleSizeArray,
                                       &sampleBuffer)

    if let buffer = sampleBuffer, status == kCMBlockBufferNoErr {
        let attachments: CFArray? = CMSampleBufferGetSampleAttachmentsArray(buffer, true)
        if let attachmentArray = attachments {
            let dic = unsafeBitCast(CFArrayGetValueAtIndex(attachmentArray, 0), to: CFMutableDictionary.self)
            let key = Unmanaged.passUnretained(kCMSampleAttachmentKey_DisplayImmediately).toOpaque()
            let val = Unmanaged.passUnretained(kCFBooleanTrue).toOpaque()
            CFDictionarySetValue(dic, key, val)
        }
        print("===== Successfully created sample buffer")
        return buffer
    }
    return nil
}
Other things to note:
The formatDescription contains the correct information (mediaType = "vide", mediaSubType = "avc1", dimensions = "640x480")
The bitstream I'm decoding always groups the SPS, PPS, and IDR frame together and sends them as one big packet every 20 or so frames. All the other packets each carry a single B/P frame.
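For reference, a more general way to split such a combined packet is to scan for the 4-byte start codes instead of assuming fixed offsets. A rough sketch (not the code above):

func startCodeIndices(in data: FrameData) -> [Int] {
    // Collect the offsets of every 0x00 0x00 0x00 0x01 start code in the packet.
    var indices: [Int] = []
    var i = 0
    while i + 3 < data.count {
        if data[i] == 0, data[i + 1] == 0, data[i + 2] == 0, data[i + 3] == 1 {
            indices.append(i)
            i += 4
        } else {
            i += 1
        }
    }
    return indices
}
// Each NAL unit then runs from one start code (plus 4 bytes) to the next start code, or to the end of the packet.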
Thanks!

That code was pretty ugly, so I decided to touch it up a little bit. Turned out that did the trick; something in there must have been wrong.
Anyway, here's a working version. It sends the decoded and decompressed frame to its delegate. Ideally, whoever calls interpretRawFrameData would get a displayable frame back; I'll work on that, but in the meantime this works too.
https://github.com/philipshen/H264Decoder
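For anyone curious what the decompression step looks like, the core of it is a VTDecompressionSession fed with the AVCC-formatted sample buffers. This is only a rough sketch (the function name and callback body are illustrative, not the exact code in the repo):

import VideoToolbox

// Rough sketch: decode an AVCC-formatted CMSampleBuffer with a VTDecompressionSession.
// `formatDesc` is the CMVideoFormatDescription built from the SPS/PPS.
// (In real use the session would be created once and reused for every frame.)
func decode(_ sampleBuffer: CMSampleBuffer, formatDesc: CMVideoFormatDescription) {
    var callback = VTDecompressionOutputCallbackRecord(
        decompressionOutputCallback: { _, _, status, _, imageBuffer, presentationTime, _ in
            guard status == noErr, let imageBuffer = imageBuffer else { return }
            // Hand the decoded CVImageBuffer (and its presentation time) to a delegate here.
            _ = (imageBuffer, presentationTime)
        },
        decompressionOutputRefCon: nil)

    var session: VTDecompressionSession?
    let createStatus = VTDecompressionSessionCreate(kCFAllocatorDefault,
                                                    formatDesc,
                                                    nil, nil,
                                                    &callback,
                                                    &session)
    guard createStatus == noErr, let decompressionSession = session else { return }

    VTDecompressionSessionDecodeFrame(decompressionSession, sampleBuffer, [], nil, nil)
}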

Related

Thread 1: Fatal error: Index out of range when index is less than array count

I am getting the error "Thread 1: Fatal error: Index out of range" in
func ReaderConverterCallback(_ converter: AudioConverterRef,
                             _ packetCount: UnsafeMutablePointer<UInt32>,
                             _ ioData: UnsafeMutablePointer<AudioBufferList>,
                             _ outPacketDescriptions: UnsafeMutablePointer<UnsafeMutablePointer<AudioStreamPacketDescription>?>?,
                             _ context: UnsafeMutableRawPointer?) -> OSStatus {
    let reader = Unmanaged<Reader>.fromOpaque(context!).takeUnretainedValue()

    //
    // Make sure we have a valid source format so we know the data format of the parser's audio packets
    //
    guard let sourceFormat = reader.parser.dataFormat else {
        return ReaderMissingSourceFormatError
    }

    //
    // Check if we've reached the end of the packets. We have two scenarios:
    //   1. We've reached the end of the packet data and the file has been completely parsed
    //   2. We've reached the end of the data we currently have downloaded, but not the file
    //
    let packetIndex = Int(reader.currentPacket)
    let packets = reader.parser.packets
    let isEndOfData = packetIndex >= packets.count - 1
    if isEndOfData {
        if reader.parser.isParsingComplete {
            packetCount.pointee = 0
            return ReaderReachedEndOfDataError
        } else {
            return ReaderNotEnoughDataError
        }
    }

    //
    // Copy data over (note we're only processing a single packet of data at a time)
    //
    let packet = packets[packetIndex] // <-- Thread 1: Fatal error: Index out of range here
    var data = packet.0
    let dataCount = data.count
    ioData.pointee.mNumberBuffers = 1
    ioData.pointee.mBuffers.mData = UnsafeMutableRawPointer.allocate(byteCount: dataCount, alignment: 0)
    _ = data.withUnsafeMutableBytes { (bytes: UnsafeMutablePointer<UInt8>) in
        memcpy((ioData.pointee.mBuffers.mData?.assumingMemoryBound(to: UInt8.self))!, bytes, dataCount)
    }
    ioData.pointee.mBuffers.mDataByteSize = UInt32(dataCount)

    //
    // Handle packet descriptions for compressed formats (MP3, AAC, etc)
    //
    let sourceFormatDescription = sourceFormat.streamDescription.pointee
    if sourceFormatDescription.mFormatID != kAudioFormatLinearPCM {
        if outPacketDescriptions?.pointee == nil {
            outPacketDescriptions?.pointee = UnsafeMutablePointer<AudioStreamPacketDescription>.allocate(capacity: 1)
        }
        outPacketDescriptions?.pointee?.pointee.mDataByteSize = UInt32(dataCount)
        outPacketDescriptions?.pointee?.pointee.mStartOffset = 0
        outPacketDescriptions?.pointee?.pointee.mVariableFramesInPacket = 0
    }
    packetCount.pointee = 1
    reader.currentPacket = reader.currentPacket + 1

    return noErr
}
even though packetIndex is less than packets.count.
Note: please compare both questions before marking this as a duplicate. The referenced possible duplicate doesn't cover the case where the index is less than the array count.
I am using the https://github.com/syedhali/AudioStreamer/ library to play audio from a URL.
It looks like a multi-threading problem. According to the printed logs the index seems OK, but another thread may be changing the packets data, and that causes the crash. Please consider locking data-related operations shared between threads.
Additional analysis: according to the following lines, packets may be shared between threads.
let reader = Unmanaged<Reader>.fromOpaque(context!).takeUnretainedValue()
//......
let packets = reader.parser.packets
Suggestion: check whether something reached through Unmanaged<Reader> changes parser.packets from another thread, and add a locking strategy, as in the sketch below.
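One possible locking strategy (a sketch, not the library's actual code; it assumes packets is an array of (Data, AudioStreamPacketDescription?) tuples, as the callback suggests): serialize every read and write of the array through a single lock so the parsing thread and the converter callback never race on it.

import AudioToolbox
import Foundation

final class PacketStore {
    private let lock = NSLock()
    private var storage: [(Data, AudioStreamPacketDescription?)] = []

    var packets: [(Data, AudioStreamPacketDescription?)] {
        lock.lock(); defer { lock.unlock() }
        return storage // returns a copy, so it is safe to index after the lock is released
    }

    func append(_ packet: (Data, AudioStreamPacketDescription?)) {
        lock.lock(); defer { lock.unlock() }
        storage.append(packet)
    }
}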

Integers Larger than Int64

I'm attempting to get a user-input number and find the sum of all its digits. I'm having issues with larger numbers, however, as they won't fit in an Int64. Any idea what structures I could use to store the value? (I tried UInt64, but that didn't work very well with negatives; in any case I'd prefer something larger than UInt64. I'm having a hard time implementing a UInt128 from Is there a number type with bigger capacity than u_long/UInt64 in Swift?)
import Foundation

func getInteger() -> Int64 {
    var value: Int64 = 0
    while true {
        // we aren't doing anything with input, so we make it a constant
        let input = readLine()
        // ensure it's not nil
        if let unwrappedInput = input {
            if let unwrappedInt = Int64(unwrappedInput) {
                value = unwrappedInt
                break
            }
        } else {
            print("You entered a nil. Try again:")
        }
    }
    return value
}

print("Please enter an integer")

// Gets user input
var input = getInteger()

var arr = [Int]()
var sum = 0
var negative = false

// If input is less than 0, makes it positive
if input < 0 {
    input = (input * -1)
    negative = true
}

// Extract the digits one at a time
while input > 0 {
    if (input < 10) && (input >= 1) && (negative == true) {
        var remain = (-1) * (input % 10)
        arr.append(Int(remain))
        input = (input / 10)
    } else {
        var remain = (input % 10)
        arr.append(Int(remain))
        input = (input / 10)
    }
}

// Adds numbers in array to find sum of digits
var i: Int = 0
var size: Int = (arr.count - 1)
while i <= size {
    sum = sum + arr[i]
    i = (i + 1)
}

// Prints sum
print("\(sum)")
You can use a string to perform the operation you describe: loop through each character, convert it to an integer, and add it to the sum. Be careful to handle errors.
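A minimal sketch of that approach (the variable names are just for illustration):

let line = readLine() ?? ""
var digitSum = 0
for character in line {
    if let digit = Int(String(character)) { // skips "-" and any non-digit characters
        digitSum += digit
    }
}
print("Sum of digits: \(digitSum)")

Since the number is never parsed as a whole, its length is limited only by the string, so values far beyond Int64 work fine.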

What samples will the buffer contain when I use installTap(onBus:) for multichannel audio?

What samples will the buffer contain when I use installTap(onBus:) for multichannel audio?
If the channel count is greater than 1, will it contain only the left microphone, or samples from both the left and right microphones?
When I use the iPhone simulator, the format is pcmFormatFloat32, channelCount = 2, sampleRate = 44100.0, not interleaved.
I use this code:
let bus = 0
inputNode.installTap(onBus: bus, bufferSize: myTapOnBusBufferSize, format: theAudioFormat) {
    (buffer: AVAudioPCMBuffer!, time: AVAudioTime!) -> Void in
    self.onNewBuffer(buffer)
}

func onNewBuffer(_ inputBuffer: AVAudioPCMBuffer!) {
    var samplesAsDoubles: [Double] = []
    for i in 0 ..< Int(inputBuffer.frameLength) {
        let theSample = Double((inputBuffer.floatChannelData?.pointee[i])!)
        samplesAsDoubles.append(theSample)
    }
}
print("number of input busses = \(inputNode.numberOfInputs)")
which prints
number of input buses = 1
For each sample in the samplesAsDoubles array that I build inside the block, which channel is it from, and at what time was that sample recorded?
From the header comments for floatChannelData:
The returned pointer is to format.channelCount pointers to float. Each of these pointers
is to "frameLength" valid samples, which are spaced by "stride" samples.
If format.interleaved is false (as with the standard deinterleaved float format), then
the pointers will be to separate chunks of memory. "stride" is 1.
floatChannelData gives you a 2D float array. For a non-interleaved 2-channel buffer, you would access the individual samples like this:
let channelCount = Int(buffer.format.channelCount)
let frameCount = Int(buffer.frameLength)

if let channels = buffer.floatChannelData { // channels is a 2D float array
    for channelIndex in 0..<channelCount {
        let channel = channels[channelIndex] // 1D float array
        print(channelIndex == 0 ? "left" : "right")
        for frameIndex in 0..<frameCount {
            let sample = channel[frameIndex]
            print(" \(sample)")
        }
    }
}
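As for the timing part of the question: the AVAudioTime passed to the tap tells you where the buffer starts on the node's sample-time clock. A sketch, reusing the question's inputNode and theAudioFormat and assuming time.isSampleTimeValid is true:

inputNode.installTap(onBus: 0, bufferSize: 4096, format: theAudioFormat) { buffer, time in
    guard time.isSampleTimeValid else { return }
    let sampleRate = buffer.format.sampleRate
    // Frame i of this buffer was captured at (time.sampleTime + i) / sampleRate seconds
    // on the input node's sample-time clock.
    let bufferStartSeconds = Double(time.sampleTime) / sampleRate
    print("buffer starts at \(bufferStartSeconds) s")
}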

Size of the NSInputStream buffer

I'm trying to use NSInputStream to receive data over a TCP socket connection. On the server side I send the data size before sending the data itself. On the iOS client side I need to extract the first 4 bytes from the NSInputStream so I can check whether the data has been received completely, but I'm running into a problem:
...
case NSStreamEvent.HasBytesAvailable:
    if (aStream == inputstream) {
        while (inputstream.hasBytesAvailable) {
            var readBufferRef = UnsafeMutablePointer<UnsafeMutablePointer<UInt8>>()
            var readBufferLengthRef = 0
            let readBufferIsAvailable = inputstream.getBuffer(readBufferRef, length: &readBufferLengthRef)
            ...
        }
    }
    break
After data is received, readBufferLengthRef always equals 0.
How can that be?
And how can I get the size of the NSInputStream buffer?
UPD:
Code:
case NSStreamEvent.HasBytesAvailable:
    NSLog("HasBytesAvaible")
    var buffer = [UInt8](count: 1024, repeatedValue: 0)
    if (aStream == inputstream) {
        while (inputstream.hasBytesAvailable) {
            var readBufferRef: UnsafeMutablePointer<UInt8> = nil
            var readBufferLengthRef = 0
            let readBufferIsAvailable = inputstream.getBuffer(&readBufferRef, length: &readBufferLengthRef)
            // debugger: readBufferLengthRef = (int)0
        }
    }
    break
In your code, readBufferRef is defined as a "pointer to a pointer" but never allocated, and therefore it is the NULL pointer.
What you should do is pass the address of an UnsafeMutablePointer<UInt8> as an inout argument to the function (assuming Swift 2):
var readBufferRef: UnsafeMutablePointer<UInt8> = nil
var readBufferLengthRef = 0
let readBufferIsAvailable = inputStream.getBuffer(&readBufferRef, length: &readBufferLengthRef)
On return, readBufferRef is set to the read buffer of the stream (valid until the next read operation), and readBufferLengthRef contains
the number of available bytes.
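From there you can read your 4-byte size prefix straight out of that buffer. A sketch, assuming the server writes the length big-endian:

// Sketch: interpret the first 4 bytes of the returned buffer as the payload size.
if readBufferIsAvailable && readBufferLengthRef >= 4 {
    var bigEndianLength: UInt32 = 0
    memcpy(&bigEndianLength, readBufferRef, 4)
    let payloadSize = Int(CFSwapInt32BigToHost(bigEndianLength))
    print("expecting \(payloadSize) bytes of payload")
}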

ExtAudioFile into a float buffer produces zeros

I have this code in Swift 3, and my output is 0.0 zeros most of the time; occasionally I see very small numbers on the order of 1e-50.
The fileURL is a recording.caf with sound in it.
Does anyone know what's up?
func readBuff(_ fileURL: CFURL) {
    var fileRef: ExtAudioFileRef? = nil
    let openStatus = ExtAudioFileOpenURL(fileURL, &fileRef)
    guard openStatus == noErr else {
        print("Failed to open audio file '\(fileURL)' with error \(openStatus)")
        return
    }

    var audioFormat2 = AudioStreamBasicDescription()
    audioFormat2.mSampleRate = 44100 // GIVE YOUR SAMPLING RATE
    audioFormat2.mFormatID = kAudioFormatLinearPCM
    audioFormat2.mFormatFlags = kLinearPCMFormatFlagIsFloat
    audioFormat2.mBitsPerChannel = UInt32(MemoryLayout<Float32>.size) * 8
    audioFormat2.mChannelsPerFrame = 1 // Mono
    audioFormat2.mBytesPerFrame = audioFormat2.mChannelsPerFrame * UInt32(MemoryLayout<Float32>.size) // == sizeof(Float32)
    audioFormat2.mFramesPerPacket = 1
    audioFormat2.mBytesPerPacket = audioFormat2.mFramesPerPacket * audioFormat2.mBytesPerFrame // == sizeof(Float32)

    // apply audioFormat2 to the extended audio file
    ExtAudioFileSetProperty(fileRef!, kExtAudioFileProperty_ClientDataFormat, UInt32(MemoryLayout<AudioStreamBasicDescription>.size), &audioFormat2)

    let numSamples = 1024 // How many samples to read in at a time
    let sizePerPacket: UInt32 = audioFormat2.mBytesPerPacket // sizeof(Float32) = 4 bytes
    let packetsPerBuffer: UInt32 = UInt32(numSamples)
    let outputBufferSize: UInt32 = packetsPerBuffer * sizePerPacket // 4096

    // outputbuffer is the memory location where we have reserved space
    let outputbuffer = UnsafeMutablePointer<UInt8>.allocate(capacity: MemoryLayout<UInt8>.size * Int(outputBufferSize))

    var convertedData = AudioBufferList()
    convertedData.mNumberBuffers = 1 // set this for Mono
    convertedData.mBuffers.mNumberChannels = audioFormat2.mChannelsPerFrame // also = 1
    convertedData.mBuffers.mDataByteSize = outputBufferSize
    convertedData.mBuffers.mData = UnsafeMutableRawPointer(outputbuffer)

    var frameCount: UInt32 = UInt32(numSamples)
    while (frameCount > 0) {
        Utility.check(ExtAudioFileRead(fileRef!,
                                       &frameCount,
                                       &convertedData),
                      operation: "Couldn't read from input file")
        if frameCount == 0 {
            Swift.print("done reading from file")
            return
        }

        var arrayFloats: [Float] = []
        let ptr = convertedData.mBuffers.mData?.assumingMemoryBound(to: Float.self)
        var j = 0
        var floatDataArray: [Double] = [882000] // SPECIFY YOUR DATA LIMIT, MINE WAS 882000; SHOULD BE EQUAL TO OR MORE THAN DATA LIMIT

        if (frameCount > 0) {
            var audioBuffer: AudioBuffer = convertedData.mBuffers
            let floatArr = UnsafeBufferPointer(start: audioBuffer.mData?.assumingMemoryBound(to: Float.self), count: 882000)
            for i in 0...1024 {
                // floatDataArray[j] = Double(floatArr[i]) // put your data into float array
                // print("\(floatDataArray[j])")
                floatDataArray.append(Double(floatArr[i]))
                print(Float((ptr?[i])!))
                j += 1
            }
            // print(floatDataArray)
        }
    }
}
I'm reading from
guard let fileURL = CFURLCreateWithFileSystemPath(kCFAllocatorDefault, "./output.caf" as CFString!, .cfurlposixPathStyle, false) else {
    // unable to create file
    exit(-1)
}
steps after recording:
Swift.print("Recording, press <return> to stop:\n")
// wait for a key to be pressed
getchar()
// end recording
Swift.print("* recording done *\n")
recorder.running = false
// stop the Queue
Utility.check(AudioQueueStop(queue!, true),
              operation: "AudioQueueStop failed")
// a codec may update its magic cookie at the end of an encoding session
// so reapply it to the file now
Utility.applyEncoderCookie(fromQueue: queue!, toFile: recorder.recordFile!)
// cleanup
AudioQueueDispose(queue!, true)
AudioFileClose(recorder.recordFile!)
readBuff(fileURL)
You're setting up your ExtAudioFile and its client format, but you're not actually reading from it (with ExtAudioFileRead), so your "output" is actually uninitialised, and in your case, very small.
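For reference, a straightforward read loop looks roughly like this (a sketch, assuming the mono Float32 client format from the question has already been set with kExtAudioFileProperty_ClientDataFormat):

import AudioToolbox

func readAllSamples(from fileRef: ExtAudioFileRef) -> [Float] {
    let framesPerRead: UInt32 = 1024
    var chunk = [Float](repeating: 0, count: Int(framesPerRead))
    var allSamples: [Float] = []

    while true {
        var frameCount = framesPerRead
        let status = chunk.withUnsafeMutableBytes { rawBuffer -> OSStatus in
            // Point a single-buffer AudioBufferList at the chunk and read into it.
            var bufferList = AudioBufferList(
                mNumberBuffers: 1,
                mBuffers: AudioBuffer(mNumberChannels: 1,
                                      mDataByteSize: UInt32(rawBuffer.count),
                                      mData: rawBuffer.baseAddress))
            return ExtAudioFileRead(fileRef, &frameCount, &bufferList)
        }
        guard status == noErr, frameCount > 0 else { break }
        allSamples.append(contentsOf: chunk.prefix(Int(frameCount)))
    }
    return allSamples
}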
