I am using VTCompressionSessionEncodeFrameWithOutputHandler to compress pixel buffers from the camera into a raw H.264 stream. I pass kVTEncodeFrameOptionKey_ForceKeyFrame with every frame to make sure that no output of VTCompressionSessionEncodeFrame depends on other frames. I also set kVTCompressionPropertyKey_AllowFrameReordering = false and kVTCompressionPropertyKey_RealTime = true during session initialization, and VTCompressionSessionCompleteFrames is called after each VTCompressionSessionEncodeFrame call.
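For reference, the session configuration described here looks roughly like this in Swift (a sketch only; session creation and the output handler are omitted, and kVTEncodeFrameOptionKey_ForceKeyFrame is a per-frame option passed at encode time rather than a session property):

import VideoToolbox

// Sketch of the session setup described above.
func configure(_ session: VTCompressionSession) {
    // No frame reordering, so no output depends on a later frame.
    VTSessionSetProperty(session,
                         key: kVTCompressionPropertyKey_AllowFrameReordering,
                         value: kCFBooleanFalse)
    // Real-time mode for live capture.
    VTSessionSetProperty(session,
                         key: kVTCompressionPropertyKey_RealTime,
                         value: kCFBooleanTrue)
}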
I also collect the samples produced by VTCompressionSessionEncodeFrame and periodically save them as an MP4 file (using the Bento4 library).
But the final track is always 1-2 seconds shorter than the samples fed to VTCompressionSessionEncodeFrame. After several attempts to resolve this, I became convinced that VTCompressionSessionEncodeFrame outputs frames that depend on later frames to be decoded properly, so those frames are lost, since they cannot be used to produce the "final chunks" of the track.
So the question: how can one force VTCompressionSessionEncodeFrame to produce totally independent data chunks?
It turns out this was... an FPS issue! NAL units do not carry timing information themselves (aside from the PTS, which in my case is bound to the capture FPS), so it is quite important that they are produced at exactly the rate the movie's FPS expects. Nothing was lost; the saved frames were simply played back faster (which, in fact, was not easy to spot).
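In other words, the PTS and duration handed to the encoder (and later written into the MP4 track) have to advance at the real capture rate. A minimal sketch of what that can look like in Swift, assuming a 30 fps capture (the rate and function names are assumptions; the WithOutputHandler variant takes the same timing parameters):

import CoreMedia
import VideoToolbox

let fps: Int32 = 30   // assumed capture rate

func encode(_ pixelBuffer: CVPixelBuffer, frameIndex: Int64, session: VTCompressionSession) {
    // Timing derived from the frame index so that PTS advances at exactly 1/fps.
    let pts = CMTime(value: frameIndex, timescale: fps)
    let duration = CMTime(value: 1, timescale: fps)
    let frameProperties = [kVTEncodeFrameOptionKey_ForceKeyFrame: true] as CFDictionary
    VTCompressionSessionEncodeFrame(session,
                                    imageBuffer: pixelBuffer,
                                    presentationTimeStamp: pts,
                                    duration: duration,
                                    frameProperties: frameProperties,
                                    sourceFrameRefcon: nil,
                                    infoFlagsOut: nil)
}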
I'm trying to demodulate a signal using GNU Radio Companion. The signal is FSK (Frequency-shift keying), with mark and space frequencies at 1200 and 2200 Hz, respectively.
The data in the signal is text data generated by a device called GeoStamp Audio. The device generates audio from GPS data fed into it in real time, and it can also decode that audio. I have the decoded text version of the audio for reference.
I have set up a flow graph in GNU Radio (see below), and it runs without error, but with all the variations I've tried, I still can't get the data.
The output of the flow graph should be binary (1s and 0s) that I can later convert to normal text, right?
Is it correct to feed in a wav audio file the way I am?
How can I recover the data from the demodulated signal -- am I missing something in my flow graph?
This is an FFT plot of the wav audio file before demodulation:
This is the result of the scope sink after demodulation (maybe looks promising?):
UPDATE (August 2, 2016): I'm still working on this problem (occasionally), and unfortunately still cannot retrieve the data. The result is a promising-looking string of 1's and 0's, but nothing intelligible.
If anyone has suggestions for figuring out the settings on the Polyphase Clock Sync or Clock Recovery MM blocks, or the gain on the Quad Demod block, I would greatly appreciate it.
Here is one version of an updated flow graph based on Marcus's answer (also trying other versions with polyphase clock recovery):
However, I'm still unable to recover data that makes any sense. The result is a long string of 1's and 0's, but not the right ones. I've tried tweaking nearly all the settings in all the blocks. I thought maybe the clock recovery was off, but I've tried a wide range of values with no improvement.
So, at first sight, my approach here would look something like:
What happens here is that we take the input, shift it in frequency domain so that mark and space are at +-500 Hz, and then use quadrature demod.
"Logically", we can then just make a "sign decision". I'll share the configuration of the Xlating FIR here:
Notice that the signal is first shifted so that the center frequency (middle between 2200 and 1200 Hz) ends up at 0Hz, and then filtered by a low pass (gain = 1.0, Stopband starts at 1 kHz, Passband ends at 1 kHz - 400 Hz = 600 Hz). At this point, the actual bandwidth that's still present in the signal is much lower than the sample rate, so you might also just downsample without losses (set decimation to something higher, e.g. 16), but for the sake of analysis, we won't do that.
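To make the numbers in that description explicit (they are just the figures from the question): the center frequency is $f_c = \frac{1200 + 2200}{2} = 1700\ \text{Hz}$, so the xlating filter shifts by $-1700\ \text{Hz}$, putting mark at $1200 - 1700 = -500\ \text{Hz}$ and space at $2200 - 1700 = +500\ \text{Hz}$. After the quadrature demod, the sign of the output then separates mark from space - that is the "sign decision" from above.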
The time sink should now show better values. Have a look at the edges; they are probably not extremely steep. For clock sync I'd hence recommend just trying the polyphase clock recovery instead of Mueller & Müller; choosing just about any "somewhat round" pulse shape could work.
For fun and giggles, I clicked together a quick demo demod (GRC here):
which shows:
I'd like to be able to determine at what byte positions a segment of a compressed MP3 file (held in an NSData) begins and ends.
For example, if I am playing an MP3 file using AVPlayer (or any player) that is 1 minute long and 1,000,000 bytes, I'd like to know approximately how many bytes into the file the 30-second mark falls, and then the 40-second mark.
Note that due to the mp3 file being compressed I can't just divide the bytes in half to determine the 30 second mark.
If this can't be done with Swift/Objective-C, do you know if this determination can be done with any programming language? Thanks!
It turns out I had a different problem to solve. I was trying to approximate the byte position of a specific time, say, the 4:29 point of a 32:45 long podcast episode, within a few seconds of accuracy.
I used a function along these lines to calculate the approximate byte position:
startTimeBytesPosition = (startTimeInSeconds / episodeDuration) * episodeFileSize
That function worked like a charm for some episodes, but for others the resulting start time would be off by about 30-40 seconds.
It turns out this inaccuracy was happening because some MP3s contain metadata at the very beginning, and image files stored within that metadata can be 500+ KB, so my calculation of time based on byte position for any episode with an image file would be off by about 500 KB (which translated into about 30-40 seconds in this case).
To resolve this, I first determine the size in bytes of the metadata in the MP3 file, and then use that to offset the approximation function:
startTimeBytesPosition = metadataBytesOffset + (startTimeInSeconds / episodeDuration) * episodeFileSize
So far this code seems to be doing a good job of approximating time based on byte position accurately within a few seconds.
I should note that this assumes that the metadata for the image will always appear at the beginning of the mp3 file, and I don't know if that will always be the case.
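For what it's worth, here is a minimal Swift sketch of how metadataBytesOffset could be derived by reading the ID3v2 tag header at the start of the file. It assumes the usual 10-byte ID3v2 header and ignores the optional footer; files without an ID3v2 tag get an offset of 0 (the function and variable names are made up):

import Foundation

func id3v2TagSize(of fileData: Data) -> Int {
    let header = [UInt8](fileData.prefix(10))
    guard header.count == 10,
          header[0] == 0x49, header[1] == 0x44, header[2] == 0x33   // "ID3"
    else { return 0 }
    // The tag size is stored in bytes 6-9 as four 7-bit "synchsafe" bytes.
    let size = (Int(header[6]) << 21) | (Int(header[7]) << 14)
             | (Int(header[8]) << 7)  |  Int(header[9])
    return 10 + size   // 10-byte header plus tag body
}

// usage: let metadataBytesOffset = id3v2TagSize(of: episodeData)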
I'm trying to develop an iPhone app that will use the camera to record only the last few minutes/seconds.
For example, you record a movie for 5 minutes, click "save", and only the last 30 s are saved. I don't want to actually record five minutes and then chop off the last 30 s (this won't work for me). This idea is called "loop recording".
This results in an endless video recording, but you keep only the last part.
The Precorder app does what I want to do. (I want to use this feature in another context.)
I think this could easily be simulated with a circular buffer.
I started a project with AVFoundation. It would be awesome if I could somehow redirect video data to a circular buffer (which I will implement). I found information only on how to write it to a file.
I know I can chop the video into intervals and save them, but saving one and restarting the camera to record the next takes time, and it is possible to lose some important moments in the movie.
Any clues how to redirect data from camera would be appreciated.
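To make the circular-buffer idea concrete, here is a minimal sketch in Swift that keeps only the encoded samples from the trailing window (the class and names are illustrative, not a complete recorder; as the answer below points out, in practice you would trim at a keyframe boundary rather than strictly by time):

import AVFoundation

final class RollingSampleBuffer {
    private var samples: [CMSampleBuffer] = []
    private let window: CMTime

    init(seconds: Double) {
        window = CMTime(seconds: seconds, preferredTimescale: 600)
    }

    // Call this from the capture/encode callback for every new sample.
    func append(_ sample: CMSampleBuffer) {
        samples.append(sample)
        let newest = CMSampleBufferGetPresentationTimeStamp(sample)
        // Drop everything older than the trailing window.
        samples.removeAll { CMTimeSubtract(newest, CMSampleBufferGetPresentationTimeStamp($0)) > window }
    }

    // Roughly the last `seconds` worth of samples, ready to hand to an AVAssetWriter on "save".
    func snapshot() -> [CMSampleBuffer] { samples }
}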
Important! As of iOS 8 you can use VTCompressionSession and have direct access to the NAL units instead of having to dig through the container.
Well luckily you can do this and I'll tell you how, but you're going to have to get your hands dirty with either the MP4 or MOV container. A helpful resource for this (though, more MOV-specific) is Apple's Quicktime File Format Introduction manual
http://developer.apple.com/library/mac/#documentation/QuickTime/QTFF/QTFFPreface/qtffPreface.html#//apple_ref/doc/uid/TP40000939-CH202-TPXREF101
First things first: you're not going to be able to start your saved movie from an arbitrary point 30 seconds before the end of the recording; you'll have to use some I-frame at approximately 30 seconds. Depending on what your keyframe interval is, it may be several seconds before or after that 30-second mark. You could use all I-frames and start from an arbitrary point, but then you'll probably want to re-encode the video afterward because it will be quite large.
So, knowing that, let's move on.
The first step: when you set up your AVAssetWriter, you will want to set its AVAssetWriterInput's expectsMediaDataInRealTime property to YES.
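In Swift, that step looks roughly like this (the output settings shown are placeholders):

import AVFoundation

func addRealTimeVideoInput(to writer: AVAssetWriter) -> AVAssetWriterInput {
    let outputSettings: [String: Any] = [
        AVVideoCodecKey: AVVideoCodecType.h264,
        AVVideoWidthKey: 1280,
        AVVideoHeightKey: 720
    ]
    let input = AVAssetWriterInput(mediaType: .video, outputSettings: outputSettings)
    input.expectsMediaDataInRealTime = true   // the property mentioned above
    if writer.canAdd(input) {
        writer.add(input)
    }
    return input
}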
In the captureOutput callback you'll be able to do an fread from the file you are writing to. The first fread will get you a little bit of MP4/MOV (whatever format you're using) header (i.e. 'ftyp' atom, 'wide' atom, and the beginning of the 'mdat' atom). You want what's inside the 'mdat' section. So the offset you'll start saving data from will be 36 or so.
Each read will get you 0 or more AVC NAL Units. You can find a listing of NAL unit types from ISO/IEC 14496-10 Table 7-1. They will be in a slightly different format than specified in Annex B, but it's fine. Additionally, there will only be IDR slices and non-IDR slices in the MP4/MOV file. IDR will be the I-Frame you're looking to hang onto.
The NAL unit format in the MP4/MOV container is as follows:
4 bytes - Size
[Size] bytes - NALU Data
data[0] & 0x1F - NALU Type
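A Swift sketch of walking that layout, assuming you already have the bytes you read from inside the 'mdat' atom (the function name is made up):

import Foundation

func parseNALUnits(in mdat: Data) -> [(type: UInt8, payload: Data)] {
    let data = Data(mdat)                 // re-base indices at 0 in case a slice was passed
    var units: [(type: UInt8, payload: Data)] = []
    var offset = 0
    while offset + 4 <= data.count {
        // 4-byte big-endian length prefix
        let size = data[offset..<offset + 4].reduce(0) { ($0 << 8) | Int($1) }
        let start = offset + 4
        guard size > 0, start + size <= data.count else { break }
        let payload = data.subdata(in: start..<start + size)
        let type = payload[0] & 0x1F      // NALU type: 5 = IDR slice, 1 = non-IDR slice
        units.append((type: type, payload: payload))
        offset = start + size
    }
    return units
}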
So now you have the data you're looking for. When you go to save this file, you'll have to update the MP4/MOV container with the correct length and sample count, update the 'stsz' atom with the correct sizes for each sample, and update things like the media headers and track headers with the correct duration of the movie, and so on. What I would probably recommend is creating a sample container on first run that you can more or less just overwrite/augment with the appropriate data for that particular movie. You'll want to do this because the encoders on the various iDevices don't all have the same settings and the 'avcC' atom contains encoder information.
You don't really need to know much about the AVC stream in this case, so you'll probably want to concentrate your experimenting around updating the container format you choose correctly. Good luck.
I'm trying to simultaneously read and write H.264 mov file written by AVAssetWriter. I managed to extract individual NAL units, pack them into ffmpeg's AVPackets and write them into another video format using ffmpeg. It works and the resulting file plays well except the playback speed is not right. How do I calculate the correct PTS/DTS values from raw H.264 data? Or maybe there exists some other way to get them?
Here's what I've tried:
Limit the capture min/max frame rate to 30 and assume that the output file will be 30 fps. In fact, its fps is always lower than the values I set, and I also think the fps is not constant from packet to packet.
Remember each written sample's presentation timestamp, assume that samples map one-to-one to NALUs, and apply the saved timestamp to the output packet. This doesn't work.
Setting PTS to 0 or AV_NOPTS_VALUE. Doesn't work.
From googling about it I understand that raw H.264 data usually doesn't contain any timing info. It can sometimes have some timing info inside SEI, but the files that I use don't have it. On the other hand, there are some applications that do exactly what I'm trying to do, so I suppose it is possible somehow.
You will either have to generate them yourself, or access the atoms containing timing information in the MP4/MOV container to generate PTS/DTS information. FFmpeg's mov.c in libavformat might help.
Each sample/frame you write with AVAssetWriter will map one to one with the VCL NALs. If all you are doing is converting then have FFmpeg do all the heavy lifting. It will properly maintain the timing information when going from one container format to another.
The bitstream generated by AVAssetWriter does not contain SEI data. It only contains SPS/PPS/I/P frames. The SPS also does not contain VUI or HRD parameters.
-- Edit --
Also, keep in mind that if you are saving PTS information from the CMSampleBufferRefs, then the time base may be different from that of the target container. For instance, AVFoundation's time base is nanoseconds, while an FLV file's is milliseconds.
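For example, a saved capture timestamp can be rescaled into a millisecond time base (what an FLV muxer expects) with CMTimeConvertScale; a sketch:

import CoreMedia

func containerTimestamp(from capturePTS: CMTime, timescale: CMTimeScale = 1000) -> Int64 {
    // Rescale the capture-side CMTime into the target container's time base
    // (1000 = milliseconds; substitute the muxer's actual time base).
    let rescaled = CMTimeConvertScale(capturePTS, timescale: timescale, method: .roundHalfAwayFromZero)
    return rescaled.value
}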
I'm working in iOS and have a simple OpenAL project running.
The difference from most OpenAL projects I've seen is that I'm not loading a sound file. Instead, I load an array of raw data into alBufferData. Using a couple of equations, I can generate data that produces white noise, sine, and pulse waves, and all is working well.
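For context, filling one of those buffers with a generated sine wave looks roughly like this (Swift; the 440 Hz tone, 44.1 kHz rate, and mono 16-bit format are arbitrary choices for the sketch):

import Foundation
import OpenAL

let sampleRate: ALsizei = 44100
let toneHz = 440.0
let sampleCount = Int(sampleRate)                 // one second of audio

var samples = [Int16](repeating: 0, count: sampleCount)
for i in 0..<sampleCount {
    let t = Double(i) / Double(sampleRate)
    samples[i] = Int16(sin(2.0 * Double.pi * toneHz * t) * 0.8 * Double(Int16.max))
}

var buffer: ALuint = 0
alGenBuffers(1, &buffer)
samples.withUnsafeBytes { raw in
    alBufferData(buffer, AL_FORMAT_MONO16, raw.baseAddress, ALsizei(raw.count), sampleRate)
}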
My problem is that I need a way to modify this data whilst the sound is playing in real-time.
Is there a way to modify this data without having to create a new buffer? (I tried creating a new buffer with new data and then using it instead, but it's nowhere near quick enough.)
Any help or suggestions of other ways to accomplish this would be much appreciated.
Thanks
I haven't done it on iOS, but with OpenAL on the PC what you would do is chain a few buffers together. Each buffer holds a small time period's worth of data. Periodically, check whether the playing buffer is done, and if so, add it to a free list for reuse. When you want to change the sound, write the new waveform into a free buffer and add it to the chain. You select the buffer size to balance latency and required update rate - smaller buffers allow faster response to changes, but need to be generated more often.
This page suggests that a half second update rate is doable. Whether you can go faster depends on the complexity of your calculations as well as on the overhead of the OS.
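On iOS the same chaining is done with the standard OpenAL queueing calls (alSourceQueueBuffers / alSourceUnqueueBuffers). A sketch in Swift, where makeSamples() stands in for your noise/sine/pulse generator:

import OpenAL

func refillProcessedBuffers(on source: ALuint, makeSamples: () -> [Int16]) {
    // Reclaim any buffers the source has finished playing.
    var processed: ALint = 0
    alGetSourcei(source, AL_BUFFERS_PROCESSED, &processed)

    while processed > 0 {
        var buffer: ALuint = 0
        alSourceUnqueueBuffers(source, 1, &buffer)

        // Refill the reclaimed buffer with freshly generated samples and queue it again.
        let samples = makeSamples()
        samples.withUnsafeBytes { raw in
            alBufferData(buffer, AL_FORMAT_MONO16, raw.baseAddress, ALsizei(raw.count), 44100)
        }
        alSourceQueueBuffers(source, 1, &buffer)
        processed -= 1
    }

    // If the source ran dry while we were refilling, restart it.
    var state: ALint = 0
    alGetSourcei(source, AL_SOURCE_STATE, &state)
    if state != AL_PLAYING {
        alSourcePlay(source)
    }
}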
Changing the data during playback is not supported in OpenAL.
However, you can still try it and see if you get acceptable results (though you'll be racing against the OpenAL playback mechanism, and any lag-outs in your app could throw it off, so do this at your own risk).
There's an Apple extension version of alBufferData that tells OpenAL to use the data you give it directly, rather than making its own local copy. You set it up like so:
// Function pointer type for Apple's alBufferDataStatic extension, which makes
// OpenAL use your buffer memory directly instead of copying it.
typedef ALvoid AL_APIENTRY (*alBufferDataStaticProcPtr) (const ALint bid,
                                                         ALenum format,
                                                         const ALvoid* data,
                                                         ALsizei size,
                                                         ALsizei freq);

static alBufferDataStaticProcPtr alBufferDataStatic = NULL;

// Look up the extension at runtime.
alBufferDataStatic = (alBufferDataStaticProcPtr) alcGetProcAddress(NULL, (const ALCchar*) "alBufferDataStatic");
Call alBufferDataStatic() like you would call alBufferData():
alBufferDataStatic(bufferId, format, data, size, frequency);
Since OpenAL is now using your sound data buffer rather than its own, you could conceivably modify that data, and OpenAL won't be the wiser (provided you're not modifying things too close to where it's currently playing from in the buffer).
However, this approach is risky, since it depends on timing you're not fully in control of. To be 100% safe you'll need to use Audio Units.