I am trying to get video calling working with PJSIP, using the VialerSIPLib demo app.
Here is the scenario I am testing.
I call from phoneA to phoneB. Audio works for both incoming and outgoing calls, but while video works on phoneB (receiver side), I can't get video on phoneA (caller side). I am using the H264 codec for video calls. Here is my account configuration for video calls:
acc_cfg.vid_in_auto_show = PJ_TRUE;
acc_cfg.vid_out_auto_transmit = PJ_TRUE;
acc_cfg.vid_cap_dev = PJMEDIA_VID_DEFAULT_CAPTURE_DEV;
acc_cfg.vid_rend_dev = PJMEDIA_VID_DEFAULT_RENDER_DEV;
acc_cfg.reg_retry_interval = 300;
acc_cfg.reg_first_retry_interval = 30;
Here is how I get the video window from the call ID. A black window appears. Is there also a way to check whether the ci.media array contains a valid video stream?
- (void) displayWindowWithVoid: (UIView *) parent call:(VSLCall *)call {
    int vid_idx;
    pjsua_vid_win_id wid;
    vid_idx = pjsua_call_get_vid_stream_idx((int)call.callId);
    if (vid_idx >= 0) {
        pjsua_call_info ci;
        pjsua_call_get_info((int)call.callId, &ci);
        wid = ci.media[vid_idx].stream.vid.win_in;
        ci.setting.vid_cnt = 1; // note: this only modifies the local copy of the call info
        pjsua_vid_win_info wi;
        if (pjsua_vid_win_get_info(wid, &wi) == PJ_SUCCESS) {
            pjsua_vid_win_set_show(wid, PJ_TRUE);
            UIView *view = (__bridge UIView *)wi.hwnd.info.ios.window;
            [parent addSubview:view];
        }
    }
}
If anyone can recommend a client that supports video calling, I can test the behaviour and find out whether the issue is on the app side or the server side. Any help or suggestions will be highly appreciated.
I got it working. I'm posting this answer because it might help others and save them a lot of time.
You need to change your video codec formats:
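pjsua_codec_info videoCodecInfo[32];
unsigned count = PJ_ARRAY_SIZE(videoCodecInfo);
pjsua_vid_enum_codecs(videoCodecInfo, &count);
for (unsigned i = 0; i < count; i++) {
    pjmedia_vid_codec_param param;
    pjsua_vid_codec_get_param(&videoCodecInfo[i].codec_id, &param);
    param.ignore_fmtp = PJ_TRUE;
    param.enc_fmt.det.vid.size.w = 1280;
    param.enc_fmt.det.vid.size.h = 720;
    param.enc_fmt.det.vid.fps.num = 30;
    param.enc_fmt.det.vid.fps.denum = 1;
    param.dec_fmt.det.vid.size.w = 1280;
    param.dec_fmt.det.vid.size.h = 720;
    // get_param returns a copy, so write the modified parameters back
    pjsua_vid_codec_set_param(&videoCodecInfo[i].codec_id, &param);
}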
Matching exactly the format your server supports may help.
If you need any more help, please let me know.
I need to detect the number of channels and the audio format (interleaved or non-interleaved) from an AVAssetTrack. I tried the following code to detect the number of channels. As you can see, it contains two ways of getting the channel count. I want to know which one is more reliable and correct irrespective of the audio format, or whether neither is.
if let formatDescriptions = track.formatDescriptions as? [CMAudioFormatDescription],
    let audioFormatDesc = formatDescriptions.first,
    let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(audioFormatDesc) {
    //First way to detect number of channels
    numChannels = asbd.pointee.mChannelsPerFrame
    var aclSize:size_t = 0
    var currentChannelLayout:UnsafePointer<AudioChannelLayout>? = nil
    currentChannelLayout = CMAudioFormatDescriptionGetChannelLayout(audioFormatDesc, sizeOut: &aclSize)
    if let currentChannelLayout = currentChannelLayout, aclSize > 0 {
        let channelLayout = currentChannelLayout.pointee
        //second way of detecting number of channels
        numChannels = AudioChannelLayoutTag_GetNumberOfChannels(channelLayout.mChannelLayoutTag)
    }
}
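I also don't know how to get the audio format details (interleaved or non-interleaved). I'm looking for help with this.
Use the AudioStreamBasicDescription (mChannelsPerFrame). Every audio CMFormatDescription has one, whereas the AudioChannelLayout is optional; the documentation for CMAudioFormatDescriptionGetChannelLayout says: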
AudioChannelLayouts are optional; this API returns NULL if one doesn’t exist.
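On the interleaving question: the same AudioStreamBasicDescription carries it in mFormatFlags, as the kAudioFormatFlagIsNonInterleaved bit (defined as 1 << 5 in CoreAudioTypes.h). That bit only has meaning for linear PCM (mFormatID == kAudioFormatLinearPCM); for a compressed track such as AAC, interleaving is decided by the decoded output you ask for, for example via AVLinearPCMIsNonInterleaved in an AVAssetReaderTrackOutput's outputSettings. The sketch below is plain Swift so the logic can be checked off-device. The struct and function names are hypothetical and the flag is a local copy of the constant; on an Apple platform you would pass asbd.pointee.mChannelsPerFrame and asbd.pointee.mFormatFlags and use the real kAudioFormatFlagIsNonInterleaved.

```swift
// Local copy of kAudioFormatFlagIsNonInterleaved (1 << 5 in CoreAudioTypes.h).
// On Apple platforms, use the real constant instead.
let nonInterleavedFlag: UInt32 = 1 << 5

struct AudioLayoutSummary {
    let channels: Int
    /// nil when the format is not linear PCM, where the flag carries no meaning.
    let isInterleaved: Bool?
}

/// Hypothetical helper summarising channel count and interleaving from ASBD fields.
/// - channelsPerFrame: asbd.mChannelsPerFrame
/// - formatFlags: asbd.mFormatFlags
/// - isLinearPCM: asbd.mFormatID == kAudioFormatLinearPCM
func summarizeLayout(channelsPerFrame: UInt32,
                     formatFlags: UInt32,
                     isLinearPCM: Bool) -> AudioLayoutSummary {
    guard isLinearPCM else {
        return AudioLayoutSummary(channels: Int(channelsPerFrame), isInterleaved: nil)
    }
    let nonInterleaved = (formatFlags & nonInterleavedFlag) != 0
    return AudioLayoutSummary(channels: Int(channelsPerFrame), isInterleaved: !nonInterleaved)
}

// Two-channel float PCM with the non-interleaved bit set
// (kAudioFormatFlagIsFloat = 1 << 0, kAudioFormatFlagIsPacked = 1 << 3).
let summary = summarizeLayout(channelsPerFrame: 2,
                              formatFlags: (1 << 0) | (1 << 3) | nonInterleavedFlag,
                              isLinearPCM: true)
print(summary.channels, summary.isInterleaved as Any)
```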
I am working on an application that plays back video and lets the user scrub forwards and backwards through it. Scrubbing has to be smooth, so we always re-write the video with SDAVAssetExportSession using the video compression property AVVideoMaxKeyFrameIntervalKey:@1, so that every frame is a keyframe and reverse scrubbing is smooth. This works well and gives smooth playback. The application uses video from a variety of sources: it can be recorded on Android or iOS devices, or even downloaded from the web and added to the application, so we end up with quite different encodings, some of which are already suited for scrubbing (every frame is a keyframe). Is there a way to detect the keyframe interval of a video file so I can avoid needless video processing? I have been through much of AVFoundation's documentation and don't see an obvious way to get this information. Thanks for any help on this.
You can parse the file quickly, without decoding the images, by creating an AVAssetReaderTrackOutput with nil outputSettings. The frame sample buffers you get have an attachments array containing a dictionary with useful information, including whether the frame depends on other frames and whether other frames depend on it. I would interpret the former (a frame that does not depend on others) as a keyframe, although it gives me a low number (4% keyframes in one file?). Anyway, the code:
let asset = AVAsset(url: inputUrl)
let reader = try! AVAssetReader(asset: asset)
let videoTrack = asset.tracks(withMediaType: AVMediaTypeVideo)[0]
let trackReaderOutput = AVAssetReaderTrackOutput(track: videoTrack, outputSettings: nil)
reader.add(trackReaderOutput)
reader.startReading()
var numFrames = 0
var keyFrames = 0
while true {
    if let sampleBuffer = trackReaderOutput.copyNextSampleBuffer() {
        // NB: not every sample buffer corresponds to a frame!
        if CMSampleBufferGetNumSamples(sampleBuffer) > 0 {
            numFrames += 1
            if let attachmentArray = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, false) as? NSArray {
                let attachment = attachmentArray[0] as! NSDictionary
                // print("attach on frame \(frame): \(attachment)")
                if let depends = attachment[kCMSampleAttachmentKey_DependsOnOthers] as? NSNumber {
                    if !depends.boolValue {
                        keyFrames += 1
                    }
                }
            }
        }
    } else {
        break // no more samples
    }
}
print("\(keyFrames) on \(numFrames)")
N.B. This only works for local file assets.
P.S. You don't say how you're scrubbing or playing. An AVPlayerViewController and an AVPlayer?
Here is the Objective-C version of the same answer. After implementing and using it, videos that should consist entirely of keyframes come back as about 96% keyframes. I'm not sure why, so I use that number as the threshold even though I would like it to be more accurate. I also only look through the first 600 frames or to the end of the video (whichever comes first), since I don't need to read a whole 20-minute video to make this decision.
+ (BOOL)videoNeedsProcessingForSlomo:(NSURL*)fileUrl {
    BOOL needsProcessing = YES;
    AVAsset* anAsset = [AVAsset assetWithURL:fileUrl];
    NSError *error;
    AVAssetReader *assetReader = [AVAssetReader assetReaderWithAsset:anAsset error:&error];
    if (error) {
        DLog(@"Error:%@", error.localizedDescription);
        return YES;
    }
    AVAssetTrack *videoTrack = [anAsset tracksWithMediaType:AVMediaTypeVideo].firstObject;
    if (!videoTrack) {
        return YES;
    }
    AVAssetReaderTrackOutput *trackOutput = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:videoTrack outputSettings:nil];
    [assetReader addOutput:trackOutput];
    [assetReader startReading];
    float numFrames = 0;
    float keyFrames = 0;
    while (numFrames < 600) { // If the video is long - only parse through 20 seconds worth.
        CMSampleBufferRef sampleBuffer = [trackOutput copyNextSampleBuffer];
        if (sampleBuffer) {
            // NB: not every sample buffer corresponds to a frame!
            if (CMSampleBufferGetNumSamples(sampleBuffer) > 0) {
                numFrames += 1;
                NSArray *attachmentArray = (__bridge NSArray *)CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, false);
                if (attachmentArray) {
                    NSDictionary *attachment = attachmentArray[0];
                    NSNumber *depends = attachment[(__bridge NSString *)kCMSampleAttachmentKey_DependsOnOthers];
                    if (depends && !depends.boolValue) { // does not depend on other frames: a keyframe
                        keyFrames += 1;
                    }
                }
            }
            CFRelease(sampleBuffer); // copyNextSampleBuffer returns a +1 reference
        } else {
            break; // end of track
        }
    }
    [assetReader cancelReading];
    if (numFrames == 0) {
        return YES; // nothing could be read; assume processing is needed
    }
    needsProcessing = keyFrames / numFrames < 0.95f; // If more than 95% of the frames are keyframes - don't decompress.
    return needsProcessing;
}
Using kCMSampleAttachmentKey_DependsOnOthers gave me 0 key frames in some cases where ffprobe reported key frames.
To get the same number of key frames that ffprobe shows, I used:
if attachment[CMSampleBuffer.PerSampleAttachmentsDictionary.Key.notSync] == nil {
    keyFrames += 1
}
In the CoreMedia header it says:
/// Boolean (absence of this key implies Sync)
public static let notSync: CMSampleBuffer.PerSampleAttachmentsDictionary.Key
For the dependsOnOthers key it says:
/// `true` (e.g., non-I-frame), `false` (e.g. I-frame), or absent if
/// unknown
public static let dependsOnOthers: CMSampleBuffer.PerSampleAttachmentsDictionary.Key
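Going by those header comments, a sample is a sync (key) frame whenever notSync is absent or false, while dependsOnOthers is allowed to be missing when the information is unknown, which could explain the zero counts. The pure-Swift sketch below models each frame's notSync attachment as a Bool? (nil meaning the key is absent) and applies the 600-frame cap and 95% threshold from the Objective-C answer. The function names are hypothetical; on device the values would come from the attachments returned by CMSampleBufferGetSampleAttachmentsArray.

```swift
/// A sample is a sync (key) frame unless notSync is present and true,
/// following the CoreMedia header: absence of the key implies sync.
func isSyncSample(notSync: Bool?) -> Bool {
    return !(notSync ?? false)
}

/// Hypothetical helper: decides whether a video needs re-encoding for scrubbing.
/// - notSyncValues: per-frame notSync attachment (nil = key absent)
/// - maxFrames: only inspect this many frames (600 in the Objective-C answer)
/// - threshold: minimum key-frame ratio to skip processing (0.95 in that answer)
func needsProcessing(notSyncValues: [Bool?],
                     maxFrames: Int = 600,
                     threshold: Double = 0.95) -> Bool {
    let inspected = notSyncValues.prefix(maxFrames)
    // With no frames we cannot tell, so err on the side of processing.
    guard !inspected.isEmpty else { return true }
    let keyFrames = inspected.filter { isSyncSample(notSync: $0) }.count
    return Double(keyFrames) / Double(inspected.count) < threshold
}

// All-intra clip: every frame lacks notSync or has it set to false.
let allIntra: [Bool?] = [nil, false, nil, nil]
// Long-GOP clip: one key frame followed by dependent frames.
let longGOP: [Bool?] = [nil, true, true, true, true, true, true, true]
print(needsProcessing(notSyncValues: allIntra))  // false
print(needsProcessing(notSyncValues: longGOP))   // true
```

Note that the check in the answer above only tests for the key being nil; treating a present-but-false value as sync as well matches the header wording.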
I’m looking for a way to keep a seamless audio track while flipping between the front and back cameras. Many apps on the market can do this; one example is Snapchat.
Solutions should use AVCaptureSession and AVAssetWriter. They should explicitly not use AVMutableComposition, since there is currently a bug between AVMutableComposition and AVCaptureSession. Also, I can't afford post-processing time.
Currently when I change the video input the audio recording skips and becomes out of sync.
I’m including the code that could be relevant.
Flip Camera
-(void) updateCameraDirection:(CamDirection)vCameraDirection {
    if(session) {
        AVCaptureDeviceInput* currentInput;
        AVCaptureDeviceInput* newInput;
        BOOL videoMirrored = NO;
        switch (vCameraDirection) {
            case CamDirection_Front:
                currentInput = input_Back;
                newInput = input_Front;
                videoMirrored = NO;
                break;
            case CamDirection_Back:
                currentInput = input_Front;
                newInput = input_Back;
                videoMirrored = YES;
                break;
        }
        [session beginConfiguration];
        //disconnect old input
        [session removeInput:currentInput];
        //connect new input
        [session addInput:newInput];
        //get new data connection and config
        dataOutputVideoConnection = [dataOutputVideo connectionWithMediaType:AVMediaTypeVideo];
        dataOutputVideoConnection.videoOrientation = AVCaptureVideoOrientationPortrait;
        dataOutputVideoConnection.videoMirrored = videoMirrored;
        [session commitConfiguration];
    }
}
Sample Buffer
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection {
    //not active
    //start session if not started
    if(!startedSession) {
        startedSession = YES;
        [assetWriter startSessionAtSourceTime:CMSampleBufferGetPresentationTimeStamp(sampleBuffer)];
    }
    //Process sample buffers
    if (connection == dataOutputAudioConnection) {
        if([assetWriterInputAudio isReadyForMoreMediaData]) {
            BOOL success = [assetWriterInputAudio appendSampleBuffer:sampleBuffer];
        }
    } else if (connection == dataOutputVideoConnection) {
        if([assetWriterInputVideo isReadyForMoreMediaData]) {
            BOOL success = [assetWriterInputVideo appendSampleBuffer:sampleBuffer];
        }
    }
}
Perhaps adjust audio sample timeStamp?
Hey, I was facing the same issue and discovered that after switching cameras, the next frame was pushed far out of place. This seemed to shift every frame after it, causing the video and audio to go out of sync. My solution was to shift every misplaced frame to its correct position after switching cameras.
Sorry, my answer is in Swift 4.2.
You'll have to use AVAssetWriterInputPixelBufferAdaptor in order to append the sample buffers at a specific presentation timestamp.
previousPresentationTimeStamp is the presentation timestamp of the previous frame, and currentPresentationTimestamp is, as you guessed, the presentation timestamp of the current one. maxFrameDistance worked very well in testing, but you can change it to your liking.
let currentFramePosition = (Double(self.frameRate) * Double(currentPresentationTimestamp.value)) / Double(currentPresentationTimestamp.timescale)
let previousFramePosition = (Double(self.frameRate) * Double(previousPresentationTimeStamp.value)) / Double(previousPresentationTimeStamp.timescale)
var presentationTimeStamp = currentPresentationTimestamp
let maxFrameDistance = 1.1
let frameDistance = currentFramePosition - previousFramePosition
if frameDistance > maxFrameDistance {
    let expectedFramePosition = previousFramePosition + 1.0
    //print("[mwCamera]: Frame at incorrect position moving from \(currentFramePosition) to \(expectedFramePosition)")
    let newFramePosition = ((expectedFramePosition) * Double(currentPresentationTimestamp.timescale)) / Double(self.frameRate)
    let newPresentationTimeStamp = CMTime.init(value: CMTimeValue(newFramePosition), timescale: currentPresentationTimestamp.timescale)
    presentationTimeStamp = newPresentationTimeStamp
}
let success = assetWriterInputPixelBufferAdator.append(pixelBuffer, withPresentationTime: presentationTimeStamp)
if !success, let error = assetWriter.error {
    print(error) // handle or log the writer error
}
Also please note - This worked because I kept the frame rate consistent, so make sure that you have total control of the capture device's frame rate throughout this process.
I have a repo using this logic here
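The re-timing rule above can be tested without a capture session. This sketch restates it in plain Swift with a hypothetical Timestamp struct standing in for CMTime's value/timescale pair; frameRate and maxFrameDistance play the same roles as in the answer. One deliberate change: the corrected tick value is rounded rather than truncated, so floating-point error cannot pull it one tick early.

```swift
/// Minimal stand-in for CMTime: `value / timescale` seconds.
struct Timestamp: Equatable {
    let value: Int64
    let timescale: Int32
}

/// If `current` lands more than `maxFrameDistance` frames after `previous`,
/// move it to exactly one frame after `previous`; otherwise keep it.
func correctedTimestamp(previous: Timestamp,
                        current: Timestamp,
                        frameRate: Double,
                        maxFrameDistance: Double = 1.1) -> Timestamp {
    // Express both timestamps in units of frames.
    let currentFrame = frameRate * Double(current.value) / Double(current.timescale)
    let previousFrame = frameRate * Double(previous.value) / Double(previous.timescale)
    guard currentFrame - previousFrame > maxFrameDistance else { return current }
    let expectedFrame = previousFrame + 1.0
    // Convert back to ticks in the current timestamp's timescale (rounded, not truncated).
    let newValue = expectedFrame * Double(current.timescale) / frameRate
    return Timestamp(value: Int64(newValue.rounded()), timescale: current.timescale)
}

let previous = Timestamp(value: 6000, timescale: 600)  // frame 300 at 30 fps
let late = Timestamp(value: 6100, timescale: 600)      // frame 305: a jump after the switch
let fixed = correctedTimestamp(previous: previous, current: late, frameRate: 30)
print(fixed.value)  // 6020, i.e. frame 301
```

As the answer stresses, this only holds if the capture frame rate stays constant, because the frame position is derived from frameRate.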
I found an intermediate fix for the sync problem I ran into with Woody Jean-louis's solution, using his repo.
The results are similar to what Instagram does, but it seems to work a little better. Basically, I prevent assetWriterAudioInput from appending new samples while the cameras are switching. There is no way to know exactly when this happens, but I noticed that before and after the switch, captureOutput delivers video samples roughly every 0.02 seconds (at most 0.04 seconds).
Knowing this, I created self.lastVideoSampleDate, which is updated every time a video sample is appended to assetWriterInputPixelBufferAdator, and I only allow an audio sample to be appended to assetWriterAudioInput if less than 0.05 seconds have passed since that date.
if let assetWriterAudioInput = self.assetWriterAudioInput,
    output == self.audioOutput, assetWriterAudioInput.isReadyForMoreMediaData {
    let since = Date().timeIntervalSince(self.lastVideoSampleDate)
    if since < 0.05 {
        let success = assetWriterAudioInput.append(sampleBuffer)
        if !success, let error = assetWriter.error {
            print(error)
        }
    }
}
// ...and in the video branch:
let success = assetWriterInputPixelBufferAdator.append(pixelBuffer, withPresentationTime: presentationTimeStamp)
if !success, let error = assetWriter.error {
    print(error)
}
self.lastVideoSampleDate = Date()
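The gating rule can be pulled out into a small type so the 0.05 s window can be tested deterministically. In this sketch the current time is passed in instead of calling Date() inline; AudioGate and its methods are hypothetical names, and dropping audio before the first video sample arrives is an assumption the answer does not state.

```swift
import Foundation

/// Hypothetical helper mirroring the answer's lastVideoSampleDate check.
struct AudioGate {
    let window: TimeInterval
    var lastVideoSampleDate: Date?

    init(window: TimeInterval = 0.05) {
        self.window = window
        self.lastVideoSampleDate = nil
    }

    /// Call after each video sample has been appended to the writer.
    mutating func videoSampleAppended(at now: Date) {
        lastVideoSampleDate = now
    }

    /// Audio is appended only while video samples keep arriving; during a camera
    /// switch the gap grows past `window` and audio in that gap is dropped.
    /// Assumption: no audio is appended before the first video sample.
    func shouldAppendAudio(at now: Date) -> Bool {
        guard let last = lastVideoSampleDate else { return false }
        return now.timeIntervalSince(last) < window
    }
}
```

In captureOutput you would call shouldAppendAudio(at: Date()) before appending an audio buffer, and videoSampleAppended(at: Date()) after each successful video append.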
The most stable way to fix this problem is to pause recording while switching sources.
You can also fill the gap with blank video and silent audio frames.
This is what I implemented in my project.
So, create a Boolean that blocks appending new CMSampleBuffers while switching cameras/microphones, and reset it after a delay:
let idleTime = 1.0
self.recordingPaused = true
DispatchQueue.main.asyncAfter(deadline: .now() + idleTime) {
    self.recordingPaused = false
}
In the writeAllIdleFrames method, calculate how many frames you need to write:
func writeAllIdleFrames() {
    let framesPerSecond = 1.0 / self.videoConfig.fps             // duration of one video frame, in seconds
    let samplesPerSecond = 1024 / self.audioConfig.sampleRate    // duration of one 1024-sample audio buffer, in seconds
    let videoFramesCount = Int(ceil(self.switchInputDelay / framesPerSecond))
    let audioFramesCount = Int(ceil(self.switchInputDelay / samplesPerSecond))
    for index in 0..<max(videoFramesCount, audioFramesCount) {
        // create synthetic buffers
        recordingQueue.async {
            if index < videoFramesCount {
                let pts = self.nextVideoPTS()
                self.writeBlankVideo(pts: pts)
            }
            if index < audioFramesCount {
                let pts = self.nextAudioPTS()
                self.writeSilentAudio(pts: pts)
            }
        }
    }
}
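The two intermediate values in writeAllIdleFrames are durations rather than rates: 1 / fps is the length of one video frame, and 1024 / sampleRate is the length of one audio buffer, assuming 1024 samples per buffer (the usual AAC frame size). Here is a pure-Swift sketch of just the count calculation, with a hypothetical idleFrameCounts helper; it multiplies by the rate instead of dividing by a reciprocal, which keeps avoidable rounding error away from ceil.

```swift
import Foundation

/// Hypothetical helper: how many synthetic video frames and audio buffers
/// are needed to cover `gap` seconds (switchInputDelay in the answer).
func idleFrameCounts(gap: Double,
                     fps: Double,
                     sampleRate: Double,
                     samplesPerBuffer: Double = 1024) -> (video: Int, audio: Int) {
    // gap / (1 / fps) == gap * fps: number of video frames in the gap.
    let video = Int(ceil(gap * fps))
    // gap / (samplesPerBuffer / sampleRate): number of audio buffers in the gap.
    let audio = Int(ceil(gap * sampleRate / samplesPerBuffer))
    return (video, audio)
}

let counts = idleFrameCounts(gap: 1.0, fps: 30, sampleRate: 44_100)
print(counts.video, counts.audio)  // 30 44
```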
How to calculate next PTS?
func nextVideoPTS() -> CMTime {
    guard var pts = self.lastVideoRawPTS else { return CMTime.invalid }
    let framesPerSecond = 1.0 / self.videoConfig.fps
    let delta = CMTime(value: Int64(framesPerSecond * Double(pts.timescale)),
                       timescale: pts.timescale, flags: pts.flags, epoch: pts.epoch)
    pts = CMTimeAdd(pts, delta)
    return pts
}
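nextVideoPTS adds one frame duration converted to ticks of the timestamp's own timescale and truncated by Int64(...). When the timescale is not a multiple of the frame rate, each step is slightly short and the error accumulates: at 30 fps with a timescale of 1000, every step is 33 ticks instead of 33.33, so 30 synthetic frames cover 990 ms instead of 1000 ms. This sketch uses plain Int64 tick values as a hypothetical stand-in for CMTime and contrasts the stepping approach with computing each frame's timestamp from its index.

```swift
/// Same step as nextVideoPTS(): add one frame duration, converted to ticks
/// of `timescale` and truncated to an integer.
func nextPTSValue(_ value: Int64, timescale: Int32, fps: Double) -> Int64 {
    let frameDuration = 1.0 / fps
    return value + Int64(frameDuration * Double(timescale))
}

/// Drift-free alternative (hypothetical helper): derive the n-th synthetic
/// frame's tick value from the starting value and the frame index.
func ptsValue(start: Int64, frameIndex: Int, timescale: Int32, fps: Double) -> Int64 {
    return start + Int64((Double(frameIndex) * Double(timescale) / fps).rounded())
}

var stepped: Int64 = 0
for _ in 0..<30 {
    stepped = nextPTSValue(stepped, timescale: 1000, fps: 30)
}
print(stepped)                                                       // 990
print(ptsValue(start: 0, frameIndex: 30, timescale: 1000, fps: 30))  // 1000
```

With a timescale such as 600 that divides evenly by 30, both approaches agree, so the drift only shows up with timescales like 1000 or 44100.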
Tell me, if you also need code that creates blank/silent video/audio buffers :)
I want to build an Apple TV app that plays a list of short videos and plays music over them.
To achieve this I need to do two following things:
1) Mute the videos or remove the audio tracks from them
I have no idea if or how this is possible. I looked through the TVJS documentation for Player and MediaItem but found nothing.
2) Play two media items at the same time.
From this I already know that this is at least not possible with two players. I also tried to use the background audio of my TVML Template but this didn't work either.
Does anyone know how something like this would be possible?
Edit (some more information):
For testing stuff I used the code from this article
At the suggestion of Daniel Storm, I tried changing the load function in Presenter.js to each of the following:
load: function(event) {
  var self = this,
      ele = event.target,
      videoURL = ele.getAttribute("videoURL");
  if (videoURL) {
    var player = new Player();
    var playlist = new Playlist();
    var mediaItem = new MediaItem("video", videoURL);
    player.playlist = playlist;
    player.playlist.push(mediaItem);
    mediaItem.volume = 0.0;
    player.present();
  }
}
and
load: function(event) {
  var self = this,
      ele = event.target,
      videoURL = ele.getAttribute("videoURL");
  if (videoURL) {
    var player = new Player();
    var playlist = new Playlist();
    var mediaItem = new MediaItem("video", videoURL);
    player.playlist = playlist;
    player.playlist.push(mediaItem);
    player.volume = 0.0;
    player.present();
  }
}
but neither worked.
I'm implementing an AVAssetResourceLoaderDelegate, and I'm having trouble getting it to behave correctly. My goal is to intercept any requests made by the AVPlayer, make the request myself, write the data out to a file, and then respond to the AVPlayer with the file data.
The issue I'm seeing: I can intercept the first request, which is only asking for two bytes, and respond to it. After that, I'm not getting any more requests hitting my AVAssetResourceLoaderDelegate.
When I intercept the very first AVAssetResourceLoadingRequest from the AVPlayer it looks like this:
<AVAssetResourceLoadingRequest: 0x17ff9e40,
URL request = <NSMutableURLRequest: 0x17f445a0> { URL: fakeHttp://blah.com/blah/blah.mp3 },
request ID = 1,
content information request = <AVAssetResourceLoadingContentInformationRequest: 0x17ff9f30,
content type = "(null)",
content length = 0,
byte range access supported = NO,
disk caching permitted = NO>,
data request = <AVAssetResourceLoadingDataRequest: 0x17e0d220,
requested offset = 0,
requested length = 2,
current offset = 0>>
As you can see, this is only a request for the first two bytes of data. I'm taking the fakeHttp protocol in the URL, replacing it with just http, and making the request myself.
Then, here's how I'm responding to the request once I have some data:
- (BOOL)resourceLoader:(AVAssetResourceLoader *)resourceLoader shouldWaitForLoadingOfRequestedResource:(AVAssetResourceLoadingRequest *)loadingRequest {
    //Make the remote URL request here if needed, omitted
    CFStringRef contentType = UTTypeCreatePreferredIdentifierForTag(kUTTagClassMIMEType, (__bridge CFStringRef)([self.response MIMEType]), NULL);
    loadingRequest.contentInformationRequest.byteRangeAccessSupported = YES;
    loadingRequest.contentInformationRequest.contentType = CFBridgingRelease(contentType);
    loadingRequest.contentInformationRequest.contentLength = [self.response expectedContentLength];
    //Where responseData is the appropriate NSData to respond with
    [loadingRequest.dataRequest respondWithData:responseData];
    [loadingRequest finishLoading];
    return YES;
}
I've stepped through this and verified that everything in the contentInformationRequest is filled in correctly, and that the data I'm sending is NSData with the appropriate length (in this case, two bytes).
No more requests get sent to my delegate, and the player does not play (presumably because it only has two bytes of data, and hasn't requested any more).
Does anyone have experience with this to point me toward an area where I may be doing something wrong? I'm running iOS 7.
Edit: Here's what my completed request looks like after I call finishLoading:
<AVAssetResourceLoadingRequest: 0x16785680,
URL request = <NSMutableURLRequest: 0x166f4e90> { URL: fakeHttp://blah.com/blah/blah.mp3 },
request ID = 1,
content information request = <AVAssetResourceLoadingContentInformationRequest: 0x1788ee20,
content type = "public.mp3",
content length = 7695463,
byte range access supported = YES,
disk caching permitted = NO>,
data request = <AVAssetResourceLoadingDataRequest: 0x1788ee60,
requested offset = 0,
requested length = 2,
current offset = 2>>
- (BOOL)resourceLoader:(AVAssetResourceLoader *)resourceLoader shouldWaitForLoadingOfRequestedResource:(AVAssetResourceLoadingRequest *)loadingRequest
{
    loadingRequest.contentInformationRequest.contentType = @"public.aac-audio";
    loadingRequest.contentInformationRequest.contentLength = [self.fileData length];
    loadingRequest.contentInformationRequest.byteRangeAccessSupported = YES;
    NSData *requestedData = [self.fileData subdataWithRange:NSMakeRange((NSUInteger)loadingRequest.dataRequest.requestedOffset,
                                                                        (NSUInteger)loadingRequest.dataRequest.requestedLength)];
    [loadingRequest.dataRequest respondWithData:requestedData];
    [loadingRequest finishLoading];
    return YES;
}
This implementation works for me. The player always asks for the first two bytes and then for the whole file. If you don't get another callback, something was wrong with the first response you sent. I suspect the problem is that you are using a MIME content type instead of a UTI.
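The answer's subdataWithRange: sends the bytes from requestedOffset for requestedLength. A slightly more defensive variant starts at the data request's currentOffset (which advances as data is delivered) and clamps the end to the bytes actually available. This is a plain-Swift sketch with a hypothetical bytesToRespond helper; on device its arguments would come from loadingRequest.dataRequest, and it does not handle requestsAllDataToEndOfResource.

```swift
import Foundation

/// Hypothetical helper: the slice of `fileData` to pass to respond(with:).
/// Offsets are byte positions in the whole resource, as in
/// AVAssetResourceLoadingDataRequest. Returns nil when nothing is left to send.
func bytesToRespond(fileData: Data,
                    requestedOffset: Int64,
                    requestedLength: Int,
                    currentOffset: Int64) -> Data? {
    // Resume from wherever earlier respond(with:) calls left off.
    let start = Int(max(currentOffset, requestedOffset))
    // Never read past the requested range or the end of the data we have.
    let end = min(Int(requestedOffset) + requestedLength, fileData.count)
    guard start < end else { return nil }
    // Shift by startIndex in case fileData is itself a slice.
    let base = fileData.startIndex
    return fileData.subdata(in: (base + start)..<(base + end))
}
```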
Circling back to answer my own question in case anyone was curious.
The issue boiled down to threading. Though it's not explicitly documented anywhere, AVAssetResourceLoaderDelegate does some weird stuff with threads.
Essentially, my issue was that I was creating the AVPlayerItem and AVAssetResourceLoaderDelegate on the main thread, but responding to delegate calls on a background thread (since they were the result of network calls). Apparently, AVAssetResourceLoader just completely ignores responses coming in on a different thread than it was expecting.
I solved this by just doing everything, including AVPlayerItem creation, on the same thread.
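A related option, sketched here as an assumption rather than the author's fix: AVAssetResourceLoader.setDelegate(_:queue:) lets you choose the queue the delegate is called on, so you can create one serial queue, pass it there, and hop back onto it before calling respond(with:) and finishLoading() from a network callback. LoaderQueue is a hypothetical helper written against Dispatch only, so it runs outside AVFoundation.

```swift
import Foundation
import Dispatch

/// Hypothetical helper: owns the serial queue handed to
/// AVAssetResourceLoader.setDelegate(_:queue:) and funnels every
/// response back onto it.
final class LoaderQueue {
    let queue: DispatchQueue
    private let key = DispatchSpecificKey<Bool>()

    init(label: String) {
        queue = DispatchQueue(label: label)
        // Tag the queue so we can detect whether we are already running on it.
        queue.setSpecific(key: key, value: true)
    }

    /// True when the caller is currently executing on the loader queue.
    var isCurrent: Bool { DispatchQueue.getSpecific(key: key) == true }

    /// Runs `work` on the loader queue, e.g. from a URLSession completion handler.
    func deliver(_ work: @escaping () -> Void) {
        if isCurrent { work() } else { queue.async(execute: work) }
    }
}
```

On device the wiring would look like asset.resourceLoader.setDelegate(self, queue: loader.queue), and inside a network completion handler loader.deliver { loadingRequest.dataRequest?.respond(with: data); loadingRequest.finishLoading() }.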