THE SCENARIO I am working on an application that uses RTPlayer to play prerecorded video and audio from our server.
THE SUSPECTS RTPlayer has two useful properties for tracking the media time position in seconds: .initialPlaybackTime and .currentPosition. .initialPlaybackTime sets where in the media the player should start playing, and .currentPosition tells you where you left off, so you can resume at the same position in the media.
THE ISSUE The .initialPlaybackTime property is of type int64_t, and .currentPosition is of type float. When I "plug" the .currentPosition value into .initialPlaybackTime, there are always about 8-10 seconds ADDED to the player's position.
QUESTION How can I convert the .currentPosition float value to an int64_t and keep the same value?
The "8-10 seconds being added to the player's position" may have something to do with the underlying HTTP Live Streaming (HLS) technology.
If the media you are playing is streamed, it likely conforms to this technology and, if so, will be split into several smaller chunks of media at a variety of bitrates (in my experience these chunks are usually about 15 seconds long for video).
In that case, unless initialPlaybackTime is set to a value that coincides with the start time of one of those media segments, the player may simply jump to the beginning of the nearest segment (a common practice), or to the next segment if the requested time is near the end of the current one, so that it avoids loading a full segment's worth of media data without playing it.
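To illustrate, here is a sketch (JavaScript for brevity, purely illustrative) of how a player that only starts playback on segment boundaries can land several seconds away from the requested start time. The 10-second segment duration is an assumption; real streams declare theirs in the playlist.

```javascript
// Sketch: how snapping to HLS segment boundaries can shift playback start.
// SEGMENT_DURATION is an assumption; real streams declare it in the playlist.
const SEGMENT_DURATION = 10; // seconds per media segment

// A player that only starts on segment boundaries might round the
// requested start time down to the beginning of the containing segment.
function snappedStartTime(requestedSeconds) {
  return Math.floor(requestedSeconds / SEGMENT_DURATION) * SEGMENT_DURATION;
}

// Resuming at 38 s actually starts at 30 s: up to a full segment of drift.
console.log(snappedStartTime(38)); // 30
```

Depending on the player, the snap may instead go forward to the next segment boundary, which matches the "seconds added" symptom in the question.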
Related
I want to use appendBuffer and append only a piece of the media I have.
To cut the piece from the end, I use appendWindowEnd and it works.
To cut it from the beginning I have to set timestampOffset lower than appendWindowStart. I have seen shaka-player doing something similar.
var appendWindowStart = Math.max(0, currentPeriod.startTime - windowFudge);
var appendWindowEnd = followingPeriod ? followingPeriod.startTime : duration;
...
var timestampOffset = currentPeriod.startTime - mediaState.stream.presentationTimeOffset;
From my tests, it works when timestampOffset is:
- the same as appendWindowStart
- 1/10 of a second lower

It doesn't work when timestampOffset is lower than that: the segment doesn't get added. Does that have something to do with my media, or does the spec/implementation not allow it?
From MDN web docs:
The appendWindowStart property of the SourceBuffer interface controls the timestamp for the start of the append window, a timestamp range that can be used to filter what media data is appended to the SourceBuffer. Coded media frames with timestamps within this range will be appended, whereas those outside the range will be filtered out.
Just found this in the specification, so I am updating the question:
If presentation timestamp is less than appendWindowStart, then set the need random access point flag to true, drop the coded frame, and jump to the top of the loop to start processing the next coded frame.
Some implementations may choose to collect some of these coded frames with presentation timestamp less than appendWindowStart and use them to generate a splice at the first coded frame that has a presentation timestamp greater than or equal to appendWindowStart even if that frame is not a random access point. Supporting this requires multiple decoders or faster than real-time decoding so for now this behavior will not be a normative requirement.
If frame end timestamp is greater than appendWindowEnd, then set the need random access point flag to true, drop the coded frame, and jump to the top of the loop to start processing the next coded frame.
Some implementations may choose to collect coded frames with presentation timestamp less than appendWindowEnd and frame end timestamp greater than appendWindowEnd and use them to generate a splice across the portion of the collected coded frames within the append window at time of collection, and the beginning portion of later processed frames which only partially overlap the end of the collected coded frames. Supporting this requires multiple decoders or faster than real-time decoding so for now this behavior will not be a normative requirement. In conjunction with collecting coded frames that span appendWindowStart, implementations may thus support gapless audio splicing.
If the need random access point flag on track buffer equals true, then run the following steps:
If the coded frame is not a random access point, then drop the coded frame and jump to the top of the loop to start processing the next coded frame.
Set the need random access point flag on track buffer to false.
and
Random Access Point
A position in a media segment where decoding and continuous playback can begin without relying on any previous data in the segment. For video this tends to be the location of I-frames. In the case of audio, most audio frames can be treated as a random access point. Since video tracks tend to have a more sparse distribution of random access points, the location of these points are usually considered the random access points for multiplexed streams.
Does that mean that, for a video, I have to choose a timestampOffset which lands on an I-frame?
The use of timestampOffset doesn't require an I-frame. It just shifts the timestamp of each frame by that value. That shift calculation is performed before anything else (before appendWindowStart gets involved).
It's the use of appendWindowStart that is affected by where your I-frames are.
appendWindowStart and appendWindowEnd act as an AND over the data you're adding.
MSE doesn't reprocess your data; by setting appendWindowStart you're telling the source buffer that any data prior to that time is to be excluded.
Also, MSE works at the fundamental level of the GOP (group of pictures): from one I-frame to the next.
So let's imagine this group of images, made of 16-frame GOPs, each frame having a duration of 1 s.
.IPPPPPPPPPPPPPPP IPPPPPPPPPPPPPPP IPPPPPPPPPPPPPPP IPPPPPPPPPPPPPPP
Say now you set appendWindowStart to 10.
In the ideal world you would have:
. PPPPPPP IPPPPPPPPPPPPPPP IPPPPPPPPPPPPPPP IPPPPPPPPPPPPPPP
All frames whose start time precedes appendWindowStart have been dropped.
However, those remaining P-frames can't be decoded on their own, so per the spec MSE sets the "need random access point flag" to true, and the next frame added to the source buffer can then only be an I-frame,
and so you end up in your source buffer with:
. IPPPPPPPPPPPPPPP IPPPPPPPPPPPPPPP IPPPPPPPPPPPPPPP
Being able to add the frames between appendWindowStart and the next I-frame would be incredibly hard and time-expensive.
It would require decoding all frames before adding them to the source buffer, and storing them either as raw YUV data or, if hardware accelerated, as GPU-backed images.
A source buffer can contain over a minute of video at any given time. Imagine if it had to deal with decompressed data rather than compressed data.
And if we wanted to preserve the current memory constraints (around 100 MiB of data maximum per source buffer), you would have to recompress the content on the fly before adding it to the source buffer.
Not gonna happen.
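The frame-dropping rules quoted from the spec can be sketched like this (illustrative JavaScript, modelling coded frames as plain objects):

```javascript
// Sketch of the coded-frame filtering described in the spec excerpt above.
// Frames are {time, key}: times in seconds, 1 s per frame, 16-frame GOPs.
function filterFrames(frames, appendWindowStart) {
  let needRandomAccessPoint = false;
  const kept = [];
  for (const f of frames) {
    if (f.time < appendWindowStart) {      // before the window: drop it
      needRandomAccessPoint = true;
      continue;
    }
    if (needRandomAccessPoint && !f.key) continue; // wait for the next I-frame
    needRandomAccessPoint = false;
    kept.push(f);
  }
  return kept;
}

// Build four 16-frame GOPs: I P P ... P
const frames = [];
for (let t = 0; t < 64; t++) frames.push({ time: t, key: t % 16 === 0 });

// With appendWindowStart = 10, everything up to the I-frame at t = 16 is lost.
const kept = filterFrames(frames, 10);
console.log(kept[0].time); // 16
```

This matches the diagrams above: the P-frames between the window start and the next I-frame are dropped because the "need random access point" flag is set.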
The perfect example of what I am trying to achieve can be seen in the Flow ● Slow and Fast Motion app.
One can change the playback rate of the video by dragging points on the curve up or down. The video can also be saved in this state.
I am looking for a way to dynamically speed up or slow down a video, so that the playback rate can be changed while the video is being played.
Video explanation
WHAT I'VE TRIED
The rate property of AVPlayer. But it only works with a few values (0.50, 0.67, 0.80, 1.0, 1.25, 1.50, and 2.0), and one cannot save the video.
The scaleTimeRange(..) method of AVMutableComposition. But it doesn't work when you want to ramp the video for gradually decreasing slow/fast motion.
Displaying video frames on screen using CAEAGLLayer and CADisplayLink. But my many attempts at achieving slow/fast motion this way have been unsuccessful.
All this has taken me months, and I'm starting to doubt whether I'll be able to accomplish it at all.
Thus any suggestion would be immensely valuable.
In iOS, the MPNowPlayingInfoCenter object contains a nowPlayingInfo dictionary whose contents describe the item being played. It is advised that you start playback at the currentPlaybackRate and then set the speed. See this thread on the developer forums.
You might possibly end up with something like this (though this is JavaScript), where the playback rate of the video has been sped up by 4:
document.querySelector('video').playbackRate = 4.0;
document.querySelector('video').play();
video {
  width: 400px;
  height: auto;
}
<video controls preload="true" autoplay>
  <source src="http://www.rachelgallen.com/nature.mp4" type="video/mp4">
</video>
So I'm not sure I fully understand the use case you're going for, but I think
func setRate(_ rate: Float,
             time itemTime: CMTime,
             atHostTime hostClockTime: CMTime)
[Apple Documentation Source]
is something like what you're looking for. While this may not be exactly what you need, and I'm not sure where in the docs there is exactly what you're looking for, with the above method alone you could do the following to save videos at a variable rate:
Use the above method to play the video throughout at the desired rate for each second (assuming it's not too long, otherwise this will be computationally infeasible or timeout-worthy on some devices). Design UI to adjust this per-second rate.
Under the hood you can actually play the video at that speed "frame by frame" and capture the frames you want, in the right number to give you the rate you desire. Saving the right number of frames together (skipping or duplicating as needed to raise or lower the rate based on the "picker" UI), you've now accomplished what you desire.
To be clear, a video output at 60 FPS has 60 frames per second. You would literally "cut and paste" frames from the source video into the "destination" video based on whatever values you receive from your user's UI steppers (using the screenshotted example the question contains as my basis), and pick up that many frames. That is, if the user says seconds 2-10 of their 20-second video should be at 2x, only put in 30 frames for each of those seconds (if filmed at 60 FPS), taking alternating frames. The output will, at 60 FPS, appear to be 2x speed (since there are now 30 frames per second of original video, which is 0.5 seconds at 60 FPS). Similarly, any value can be factored in as:
(desired consistent FPS) = (source video FPS) = (destination video FPS)   (e.g. 60 or 90)
(rate) = the rate picked from the UI steppers/graph for each time interval   (e.g. 1x / 2x / 0.25x)
(# frames kept per second of source video) = (desired consistent FPS) / (rate)
(destination video frames) = (source video frames) / (per-interval rate), applied over each custom time interval
The exact mechanisms for this might actually be built into AVPlayer (I didn't find the details), but the above alone should be a good start to get you going in that direction.
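The frame-selection arithmetic above can be sketched as follows (illustrative JavaScript, not iOS code): at a constant output FPS, a 2x interval keeps every other source frame, while a 0.5x interval duplicates frames.

```javascript
// Sketch of the "cut and paste frames" arithmetic described above.
// Given a source filmed at `fps` and a per-interval rate, pick which source
// frame indices land in one second of the destination video.
function framesForOneOutputSecond(fps, rate, startFrame) {
  const indices = [];
  for (let i = 0; i < fps; i++) {
    // At 2x we step 2 source frames per output frame; at 0.5x the step is
    // 0.5, so Math.floor repeats each source index (duplicated frames).
    indices.push(startFrame + Math.floor(i * rate));
  }
  return indices;
}

const atDouble = framesForOneOutputSecond(60, 2.0, 0);
console.log(atDouble.length); // 60 output frames
console.log(atDouble[59]);    // 118: two source seconds consumed per output second
```

The same function covers slow motion: at rate 0.25 each source frame appears four times, stretching one source second over four output seconds.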
In my AudioInputRenderCallback I'm looking to capture an accurate timestamp of certain audio events. To test my code, I'm inputting a click track at 120 BPM, i.e. every 500 milliseconds (the click is accurate; I checked and double-checked). I first get the decibel level of every sample and check whether it's over a threshold; this works as expected. I then take the hostTime from the AudioTimeStamp and convert it to milliseconds. The first click gets assigned to that static timestamp, and the second time through I calculate the interval and reassign the static one. I expected to see a 500 ms interval. To be able to detect the click correctly I have to be within 5 milliseconds. Instead, the numbers bounce back and forth between 510 and 489. I understand it's not an RTOS, but can iOS be this accurate? Are there any issues with using the mach_absolute_time member of the AudioTimeStamp?
Audio Units are buffer-based. The minimum length of an iOS Audio Unit buffer seems to be around 6 ms. So if you use the timestamps of the buffer callbacks, your time resolution or time-sampling jitter will be about ±6 ms.
If you look at the actual raw PCM samples inside the Audio Unit buffer and pattern-match the "attack" transient (by threshold or autocorrelation, etc.), you might be able to get sub-millisecond resolution.
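As a rough illustration of that idea, here is a sketch (JavaScript, not an Audio Unit callback; the function and parameter names are illustrative) that refines the event time by offsetting the buffer's timestamp by the index of the first sample over the threshold:

```javascript
// Sketch: sub-buffer timing by locating the first sample over threshold.
// bufferHostTimeMs is the callback's timestamp for the buffer's first sample;
// these names are illustrative, not a real Audio Unit API.
function onsetTimeMs(samples, bufferHostTimeMs, sampleRate, threshold) {
  for (let i = 0; i < samples.length; i++) {
    if (Math.abs(samples[i]) >= threshold) {
      // Offset the buffer timestamp by the sample's position in the buffer.
      return bufferHostTimeMs + (i / sampleRate) * 1000;
    }
  }
  return null; // no onset in this buffer
}

// A 512-sample buffer at 44.1 kHz with a click at sample 256:
const buf = new Float32Array(512);
buf[256] = 0.9;
console.log(onsetTimeMs(buf, 1000, 44100, 0.5)); // ≈ 1005.8 ms
```

The onset time now depends on the sample position rather than the buffer boundary, which is what brings the jitter below a millisecond.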
I am trying to synchronize several CABasicAnimations with AVAudioPlayer. The issue I have is that CABasicAnimation uses CACurrentMediaTime() as a reference point when scheduling animations, while AVAudioPlayer uses deviceCurrentTime. Also, the animations use CFTimeInterval, while sound uses NSTimeInterval (I'm not sure whether they're "toll-free bridged" like other CF and NS types). I'm finding that the reference points are different as well.
Is there a way to ensure that the sounds and animations use the same reference point?
I don't know the "official" answer, but they are both double precision floating point numbers that measure a number of seconds from some reference time.
From the docs, it sounds like deviceCurrentTime is linked to the current audio session:
The time value, in seconds, of the audio output device. (read-only)

@property(readonly) NSTimeInterval deviceCurrentTime

Discussion
The value of this property increases monotonically while an audio player is playing or paused.
If more than one audio player is connected to the audio output device, device time continues incrementing as long as at least one of the players is playing or paused.
If the audio output device has no connected audio players that are either playing or paused, device time reverts to 0.
You should be able to start an audio output session, call CACurrentMediaTime() then get the deviceCurrentTime of your audio session in 2 sequential statements, then calculate an offset constant to convert between them. That offset would be accurate within a few nanoseconds.
The offset would only be valid while the audio output session is active. You'd have to recalculate it each time you remove all audio players from the audio session.
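The two-statement sampling trick can be sketched as follows (JavaScript stand-ins; mediaTime and deviceTime here are hypothetical substitutes for CACurrentMediaTime() and deviceCurrentTime, with an arbitrary 42-second offset baked in for demonstration):

```javascript
// Sketch of the offset trick: sample both clocks back to back, keep the
// difference, then convert one time base into the other. The two functions
// below are stand-ins, not the real Core Animation / AVAudioPlayer APIs.
function mediaTime()  { return Date.now() / 1000; }       // stand-in for CACurrentMediaTime()
function deviceTime() { return Date.now() / 1000 - 42; }  // stand-in for deviceCurrentTime

// Sample in two sequential statements; the error is only the time
// elapsed between the two calls.
const offset = mediaTime() - deviceTime();

// Convert an animation start time into the audio player's time base.
function toDeviceTime(mediaT) { return mediaT - offset; }

console.log(Math.round(offset)); // 42 in this stand-in
```

As noted above, the offset is only valid while the audio session keeps its clock running, so it must be recomputed whenever device time resets.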
I think the official answer just changed, though currently under NDA.
See "What's New in Camera Capture", in particular the last few slides about the CMSync* functions.
https://developer.apple.com/videos/wwdc/2012/?id=520
It seems to me that Core Audio adds sound waves together when mixing into a single channel. My program will make synthesised sounds. I know the amplitude of each of the sounds. When I play them together, should I add them together and scale the resultant wave to keep it within range? I can do it like this:
MaxAmplitude = max1 + max2 + max3                     // maximum possible amplitude of the summed waves
if MaxAmplitude > 1 then                              // over range
    Output = (wave1 + wave2 + wave3) / MaxAmplitude   // scale back into range
else
    Output = wave1 + wave2 + wave3                    // normal addition
end if
Can I do it this way? Should I pre-analyse the sound waves to find the actual maximum amplitude (because the peaks may not coincide on the timeline) and use that?
What I want is a method to play several synthesised sounds together without drastically reducing the volume throughout, and which sounds seamless. If I play a chord with several synthesised instruments, I don't want single notes to be forced to be practically silent.
Thank you.
Changing the scale suddenly on a single-sample basis, which is what your "if" statement does, can sound very bad, similar to clipping.
You can look into adaptive AGC (automatic gain control), which changes the scale factor more slowly, but it could still clip or produce sudden volume changes during fast transients.
If you use lookahead with the AGC algorithm to prevent sudden transients from clipping, then your latency gets worse.
If you do use AGC, then isolated notes may sound as though they were played much more loudly than when played in a chord, which may not correctly represent a musical composition's intent (although this type of compression is common in annoying TV and radio commercials).
Scaling down the mixer output volume so that the notes never clip, and never have their volume reduced except where the composition indicates, will result in a mix with greatly reduced volume when there are many channels (which is why properly reproduced classical music on the radio is often too quiet to draw enough listeners to make enough money).
It's all a trade-off.
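A minimal sketch of the slowly adapting gain idea (illustrative JavaScript, not production AGC; the recovery constant is arbitrary):

```javascript
// Minimal sketch of a slowly adapting gain, the AGC idea described above,
// instead of rescaling on a single-sample basis.
function agc(samples, recovery = 0.99) {
  let gain = 1.0;
  const out = new Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const peak = Math.abs(samples[i] * gain);
    if (peak > 1.0) {
      gain = 1.0 / Math.abs(samples[i]);       // pull down immediately to avoid clipping
    } else {
      gain = gain * recovery + (1 - recovery); // recover slowly toward unity
    }
    out[i] = samples[i] * gain;
  }
  return out;
}

const mixed = [0.5, 1.5, 0.5]; // sum of three waves, momentarily over range
const out = agc(mixed);
```

The gain drops instantly on an over-range sample but climbs back only gradually, trading the sudden-scale artifact for a slower, less audible volume change.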
I don't see this as a problem. If you know the max amplitude of all your waves (for all time), it should work. Just be sure not to change the amplitude on a per-sample basis, but to decide it for every "note-on". It is a very simple algorithm but could suit your needs.
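That per-"note-on" decision can be sketched as follows (illustrative JavaScript; the note amplitudes are assumed known in advance, as the question states):

```javascript
// Sketch of deciding the scale once per "note-on" rather than per sample.
// Each active note has a known peak amplitude; the mixer gain is recomputed
// only when the set of sounding notes changes.
function noteOnGain(activeNoteAmplitudes) {
  const maxAmplitude = activeNoteAmplitudes.reduce((a, b) => a + b, 0);
  return maxAmplitude > 1 ? 1 / maxAmplitude : 1;
}

// A three-note chord with peak amplitudes summing to 1.5:
const chordGain = noteOnGain([0.5, 0.5, 0.5]);
console.log(chordGain); // ≈ 0.667, applied to the whole chord, not per sample
```

Because the gain only changes at note boundaries, there are no mid-note scale jumps, which is the artifact the other answer warns about.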