How to calculate correct PTS value for frame before encoding in FFmpeg C API?
For encoding I'm using function avcodec_encode_video2 and then writing it by av_interleaved_write_frame.
I found some formulas, but none of them work.
The doxygen example uses:
frame->pts = 0;
for (;;) {
// encode & write frame
// ...
frame->pts += av_rescale_q(1, video_st->codec->time_base, video_st->time_base);
}
This blog says the formula must be like this:
(1 / FPS) * sample rate * frame number
Some use only the frame number to set the PTS:
frame->pts = videoCodecCtx->frame_number;
Or an alternative way:
int64_t now = av_gettime();
frame->pts = av_rescale_q(now, (AVRational){1, 1000000}, videoCodecCtx->time_base);
And the last one:
// 40 * 90 means 40 ms and 90 because of the 90kHz by the standard for PTS-values.
frame->pts = encodedFrames * 40 * 90;
Which one is correct? I think an answer to this question will be helpful not only to me.
It's better to think about PTS more abstractly before trying code.
What you're doing is meshing three "time sets" together. The first is the time we're used to, based on 1000 ms per second, 60 seconds per minute, and so on. The second is the codec time for the particular codec you are using. Each codec has a certain way it wants to represent time, usually in a 1/number format, meaning that for every second there are "number" ticks. The third format works like the second, except that it is the time base of the container you are using.
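For instance (illustrative numbers only, not from the question), the same 2.5 seconds of media reads very differently in each base:
// 2.5 s expressed in three time bases:
//   wall clock:    2500 ticks of {1, 1000}   (milliseconds)
//   codec time:      75 ticks of {1, 30}     (one tick per frame at 30 fps)
//   stream time: 225000 ticks of {1, 90000}  (a common 90 kHz container base)
int64_t ms = 2500;
int64_t codec_ticks  = av_rescale_q(ms, (AVRational){1, 1000}, (AVRational){1, 30});     // 75
int64_t stream_ticks = av_rescale_q(ms, (AVRational){1, 1000}, (AVRational){1, 90000});  // 225000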
Some people prefer to start with actual time, others frame count, neither is "wrong".
Starting with a frame count, you need to first convert it based on your frame rate. Note that all conversions I mention use av_rescale_q(...). The purpose of this conversion is to turn a counter into time, so you rescale with your frame rate (the video stream time base, usually). Then you have to convert that into the time_base of your video codec before encoding.
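A minimal sketch of that frame-count path (assuming a constant integer frame rate fps and an initialized encoder; both names are mine, not from the question):
// One frame lasts 1/fps seconds; rescale that into codec ticks.
frame->pts = av_rescale_q(frame_count,
                          (AVRational){1, fps},         // frame-counter time base
                          video_st->codec->time_base);  // codec time base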
Similarly, with a real time, your first conversion needs to be from current_time - start_time scaled to your video codec time.
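For instance (a sketch, assuming start_time was captured with av_gettime() when encoding began):
// av_gettime() reports microseconds, i.e. a {1, 1000000} time base.
int64_t elapsed = av_gettime() - start_time;
frame->pts = av_rescale_q(elapsed, (AVRational){1, 1000000}, video_st->codec->time_base);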
Anyone using only a frame counter is probably using a codec with a time_base equal to their frame rate. Most codecs do not work like this, so that hack is not portable. Example:
frame->pts = videoCodecCtx->frame_number; // BAD
Additionally, anyone using hardcoded numbers in av_rescale_q is leveraging the fact that they know what their time_base is, and this should be avoided: the code isn't portable to other video formats. Instead, use video_st->time_base, video_st->codec->time_base, and output_ctx->time_base to figure things out.
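For example, a portable way to go from codec time to container time is to rescale the encoded packet's timestamps just before muxing, using only the time bases FFmpeg already knows (a sketch, assuming pkt came out of avcodec_encode_video2):
// Rescale pts/dts/duration from the codec time base to the stream time base.
av_packet_rescale_ts(&pkt, video_st->codec->time_base, video_st->time_base);
pkt.stream_index = video_st->index;
av_interleaved_write_frame(output_ctx, &pkt);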
I hope understanding it from a higher level will help you see which of those are "correct" and which are "bad practice". There is no single answer, but maybe now you can decide which approach is best for you.
Time is measured not in seconds or milliseconds or any standard unit. Instead, it is measured by the AVCodecContext's time_base.
So if you set codecContext->time_base to 1/1, it means you are measuring in seconds:
cctx->time_base = (AVRational){1, 1};
Assume you want to encode at a steady 30 fps. Then the time at which a frame should be presented is framenumber * (1.0 / fps).
But once again, the PTS is also not measured in seconds or any standard unit. It's measured by avStream's time_base.
In the question, the author mentions 90k as the standard resolution for PTS values. But you will see that this is not always true: the exact resolution is saved in the AVStream. You can read it back like this:
if ((err = avformat_write_header(ofctx, NULL)) < 0) {
std::cout << "Failed to write header" << err << std::endl;
return -1;
}
av_dump_format(ofctx, 0, "test.webm", 1);
std::cout << stream->time_base.den << " " << stream->time_base.num << std::endl;
The value of stream->time_base is only populated after calling avformat_write_header.
Therefore, the right formula for calculating PTS is:
//The following assumes that codecContext->time_base = (AVRational){1, 1};
//frameduration is how many frame intervals the frame spans (normally 1).
videoFrame->pts = frameduration * (frameCounter++) * stream->time_base.den / (stream->time_base.num * fps);
So really there are 3 components in the formula,
fps
codecContext->time_base
stream->time_base
so pts = frameCounter * codecContext->time_base / (stream->time_base * fps)
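The same arithmetic can be left to av_rescale_q (a sketch, assuming a constant integer fps as above):
// frameCounter is in units of "frames"; {1, fps} turns frames into seconds,
// and stream->time_base turns seconds into container ticks.
videoFrame->pts = av_rescale_q(frameCounter++, (AVRational){1, fps}, stream->time_base);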
I have detailed my discovery here
There's also the option of setting it like frame->pts = av_frame_get_best_effort_timestamp(frame), but since best_effort_timestamp is filled in by the decoder, that only applies to frames you decoded first, and I'm not sure it is the correct approach either.
Related
I am trying to program a simple Babymonitor for Windows (personal use).
The babymonitor should just detect the dB level of the microphone and triggers at a certain volume.
After some research, I found the Bass.dll library and came across its function BASS_ChannelGetLevel, which is great but seems to have limitations and doesn't fit my needs (the peak equals a DWORD value).
In the examples I found a livespec example which is "almost" what I need. The example uses BASS_ChannelGetData, but I don't quite know how to handle the returned array...
I want to keep it as simple as possible: Detect the volume from the microphone as dB or any other value (e.g. value 0-MAXINT).
How can this be done with the Bass.dll library?
BASS_ChannelGetLevel returns a value that is capped at 0 dB (the return value is 32768 in this case). If you adjust your source level (lower the microphone level in the sound card settings), then it will work just fine.
Another way, if you want an uncapped value, is to use the BASS_ChannelGetLevelEx function instead: it returns floating-point levels, where 1 is the maximum (0 dB) value corresponding to BASS_ChannelGetLevel's 32767, but it can exceed 1 to detect sound levels above 0 dB, which may be what you need.
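At the C API level that looks roughly like this (a sketch; the 20 ms RMS window is an arbitrary choice):
// BASS_ChannelGetLevelEx fills floats where 1.0 corresponds to 0 dB full scale.
float level;
if (BASS_ChannelGetLevelEx(streamHandle, &level, 0.02f, BASS_LEVEL_MONO | BASS_LEVEL_RMS))
{
    double db = 20.0 * log10(level); // negative below full scale, above 0 for louder
}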
I also suggest monitoring the sound level for a while: trigger only if a certain level persists for at least 2-3 seconds (this way you will exclude false alarms).
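A sketch of that debouncing idea (in C for illustration; read_current_db() and raise_alarm() are hypothetical helpers, and the threshold and poll interval are placeholders to tune):
const double thresholdDb = -30.0;      // placeholder trigger level
int consecutive = 0;
for (;;) {
    double db = read_current_db();     // e.g. via BASS_ChannelGetLevelEx above
    consecutive = (db > thresholdDb) ? consecutive + 1 : 0;
    if (consecutive >= 100) {          // 100 polls * 20 ms = about 2 seconds
        raise_alarm();
        consecutive = 0;
    }
    Sleep(20);                         // Windows: poll every 20 ms
}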
Here is how you obtain the dB level given an input stream handle (streamHandle):
// Note: the return value packs the left/right peak levels into its low/high
// words, so this treats the packed pair as one rough full-scale peak.
var peak = (double)Bass.BASS_ChannelGetLevel(streamHandle);
var decibels = 20 * Math.Log10(peak / Int32.MaxValue);
Alternatively, you can use the following to get the RMS (average) level. To get the RMS value, you have to pass a sample length into BASS_ChannelGetLevel. I'm using 20 milliseconds here, but you can play with the value to see what works best for your needs.
var decibels = 0d; // double, since Math.Log10 returns a double
var channelCount = 2; //Assuming two channels
var sampleLengthMS = 20f;
var rmsLevels = new float[channelCount];
var rmsObtained = Bass.BASS_ChannelGetLevel(streamHandle, rmsLevels, sampleLengthMS / 1000f, BASSLevel.BASS_LEVEL_RMS);
if (rmsObtained)
decibels = 20*Math.Log10(rmsLevels[0]); //using first channel (index 0) but you can get both if needed.
else
Console.WriteLine(Bass.BASS_ErrorGetCode());
Hope this helps.
I am writing software in C# that measures in/out octets (bytes) via SNMP. I need to know how many bytes pass in 1000 secs.
According to my research, the counter sometimes gets reset or wraps around, because some results come out negative.
.1.3.6.1.2.1.2.2.1.10 for the input stream on interface .139
Over 1024 secs it gives a result of -2.1 MBytes.
How can I get an accurate measurement of traffic (in or out)?
EDIT: This is the code I use for the calculation. It reads the values every second and accumulates the result.
private void timer1_Tick(object sender, EventArgs e)
{
Cursor.Current = Cursors.WaitCursor;
SnmpObject objSnmpObject, objSnmpIfSpeed;
objSnmpObject = (SnmpObject)objSnmpManager.Get(".1.3.6.1.2.1.2.2.1.16.139");
objSnmpIfSpeed = (SnmpObject)objSnmpManager.Get(".1.3.6.1.2.1.2.2.1.5.139");
if (GetResult() == 0)
{
float value = Int64.Parse(objSnmpObject.Value);
float ifSpeed = Int64.Parse(objSnmpIfSpeed.Value);
float Bytes = (value * 8 * 100 / ifSpeed);
// float megaBytes = Bytes / 1024;
sum += Bytes;
tb_calc.Text = (sum.ToString() + " Bytes");
}
_gv_timeSec++;
lb_timer.Text = _gv_timeSec.ToString();
Cursor.Current = Cursors.Default;
}
1.3.6.1.2.1.2.2.1.10 is the OID for IF-MIB::ifInOctets, which is described by the MIB as a Counter32, with an upper limit of 2^32-1 (4294967295 decimal):
"The total number of octets received on the interface,
including framing characters.
Discontinuities in the value of this counter can occur at
re-initialization of the management system, and at other times as
indicated by the value of ifCounterDiscontinuityTime."
Quoting this SO answer:
a Counter32 has no defined initial value, so a single reading of
Counter32 has no information content. This is why you have to take two
(or more) readings to make sense of it. An example of this would be
the number of packets received on an ethernet interface. If you take a
reading and get back 4 million packets, you haven't learned anything:
the wire could have been pulled out of the interface for the past
year, or it could be passing millions of packets per second. You have
to take multiple readings to know anything.
I'd recommend ifHCInOctets (.1.3.6.1.2.1.31.1.1.1.6) and ifHCOutOctets (.1.3.6.1.2.1.31.1.1.1.10), which are the 64-bit versions of the OIDs mentioned by @k1eran.
Those counters don't wrap as quickly when dealing with higher speeds.
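The rate calculation itself is just the difference of two readings divided by the polling interval; if you stay on the 32-bit OIDs you also need a guard for a single counter wrap. A sketch (in C for illustration; the C# version is the same arithmetic):
#include <stdint.h>

/* Octets per second from two Counter32 readings taken dt seconds apart,
 * tolerating one wrap of the 32-bit counter between samples. */
double octets_per_second(uint32_t previous, uint32_t current, double dt)
{
    uint64_t delta = (current >= previous)
        ? (uint64_t)current - previous
        : ((uint64_t)current + (1ULL << 32)) - previous; /* counter wrapped */
    return (double)delta / dt;
}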
I have audio files with different durations. They have common content and unique content. E.g., two files, 70 seconds each: the last 10 seconds of the first file are the same as the first 10 seconds of the second file. How can I find the exact position of the common content (e.g., 60.0 s into the first file)?
It sounds a little bit messy; I hope the following image can help: https://drive.google.com/file/d/0BzBE2Kfw8uQoUWNTN1RXOEtLVEk/view?usp=sharing
So, I'm looking for the red mark: common content starts at 60.0 sec into the first file.
The problem is that I have files with different durations. Sometimes both are 70 seconds long; sometimes one file is 70 seconds and the other is 80 seconds, etc. Most likely they have 60.0 seconds of unique content, but I'm not sure (it could be 59.9 seconds of unique content, etc.).
Thus, I assume I need to take a short snippet from the first 10 seconds of the second file and find it in the first file:
For example, output: 2.5 sec into the second file = 62.5 sec into the first file; that works for me as well.
THE MAIN GOAL IS TO PLAY FILE AFTER FILE GAPLESSLY. If I get the values, I'll be able to do this. Sometimes the values can be 2.5 = 63.7, which is why I need an exact match.
Can anybody help with the code or at least some information of how to compare two snippets of audio content? Thanks in advance!
Wow, that is quite a problem to solve. And I must confess that I've not done anything exactly like this, nor do I have any code-based suggestions.
All I will say is that if I were looking to solve this problem, I would try to save the audio file in some kind of uncompressed format with a fixed data rate (as in a known number of bytes per second).
Then you could take a section of one file and byte-match it against the other, so you would know how many bytes into the file that snippet occurs. Then, knowing the bytes per ms (a sort of frame size), you could work out the exact time position.
It's a bit harebrained, but I've used that technique with images before, and at least audio is linear!
Here is an approximate example of how I would go about doing the comparison of a sample within a sound file.
- (int)positionOf:(NSData*)sample inData:(NSData*)soundfile {
// the block size has to be big enough to find something genuinely unique but small enough to ensure it is still fast.
int blockSize = 128;
int position = 0;
int returnPosition = INT32_MAX;
// check to see if the block size exceeds the sample or data file size
if (soundfile.length < blockSize || sample.length < blockSize) {
return returnPosition;
}
// create a byte array of the sample, ready to use to compare with the shifting buffer
char* sampleByteArray = malloc(sample.length);
memcpy(sampleByteArray, sample.bytes, sample.length);
// now loop through the sound file, shifting the window along.
while (position < (soundfile.length - blockSize)) {
char* window = malloc(blockSize);
memcpy(window, soundfile.bytes + position, blockSize);
// check to see if this is a match
if(!memcmp(sampleByteArray, window, blockSize)) {
// these are the same, now to check if the whole sample is the same
if ((position + sample.length) > soundfile.length) {
// the sample won't fit in the remaining soundfile, so it can't be this!
free(window);
break;
}
if(!memcmp(sampleByteArray, soundfile.bytes + position, sample.length)) {
// this is an entire match, position marks the start in bytes of the sample.
free(window);
returnPosition = position;
break;
}
}
free(window);
position++;
}
free(sampleByteArray);
return returnPosition;
}
It compiles; I didn't have time to set up the scenario to check your exact case, but I'm quite confident it may help.
I have a panel data set for which I would like to calculate moving averages across years.
Each year is a variable, with one observation per state, and I would like to create a new variable for the average of every three-year period.
For example:
P1947=rmean(v1943 v1944 v1945), P1947=rmean(v1944 v1945 v1946)
I figured I should use a foreach loop with the egen command, but I'm not sure how I should refer to the different variables within the loop.
I'd appreciate any guidance!
This data structure is quite unfit for purpose. Assuming an identifier id, you need to reshape, e.g.
reshape long v, i(id) j(year)
tsset id year
Then a moving average is easy. Use tssmooth or just generate, e.g.
gen mave = (L.v + v + F.v)/3
or (better)
gen mave = 0.25 * L.v + 0.5 * v + 0.25 * F.v
More on why your data structure is quite unfit: not only would calculating a moving average need a loop (not necessarily involving egen), but you would also be creating several new extra variables. Using those in any subsequent analysis would be somewhere between awkward and impossible.
EDIT I'll give a sample loop, while not moving from my stance that it is poor technique. I don't see a reason behind your naming convention whereby P1947 is a mean for 1943-1945; I assume that's just a typo. Let's suppose that we have data for 1913-2012. For means of 3 years, we lose one year at each end.
forval j = 1914/2011 {
local i = `j' - 1
local k = `j' + 1
gen P`j' = (v`i' + v`j' + v`k') / 3
}
That could be written more concisely, at the expense of a flurry of macros within macros. Using unequal weights is easy, as above. The only reason to use egen is that it doesn't give up if there are missings, which the above will do.
FURTHER EDIT
As a matter of completeness, note that it is easy to handle missings without resorting to egen.
The numerator
(v`i' + v`j' + v`k')
generalises to
(cond(missing(v`i'), 0, v`i') + cond(missing(v`j'), 0, v`j') + cond(missing(v`k'), 0, v`k'))
and the denominator
3
generalises to
!missing(v`i') + !missing(v`j') + !missing(v`k')
If all values are missing, this reduces to 0/0, or missing. Otherwise, if any value is missing, we add 0 to the numerator and 0 to the denominator, which is the same as ignoring it. Naturally the code is tolerable as above for averages of 3 years, but either for that case or for averaging over more years, we would replace the lines above by a loop, which is what egen does.
There is a user-written program that can do that very easily for you. It is called mvsumm and can be found through findit mvsumm:
xtset id time
mvsumm observations, stat(mean) win(t) gen(new_variable) end
I have this line, which shows the minutes and seconds. But I have to add milliseconds as well for greater accuracy. How do I add that to this line, or is there an easier way to get the desired result?
#duration = [cd.ExactDuration/60000000, cd.ExactDuration/1000000 % 60].map{|t| t.to_s.rjust(2, '0') }.join(':')
The ExactDuration value is stored in microseconds. So the first part converts microseconds to minutes; the second part, microseconds to seconds. Now I need to add milliseconds.
cd.ExactDuration/1000 % 1000 should do the trick.
Of course you may also want to tweak the formatting, since milliseconds are a datum you don't want to right-justify in a 2-wide field ;-). I'd suggest sprintf for string formatting, though I realize its use is not really intuitive unless you come from a C background.
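For what it's worth, here is the whole conversion sketched in C (the arithmetic is language-independent; the sprintf-style formatting is the point):
#include <stdio.h>
#include <stdint.h>

/* duration_us is the ExactDuration value, in microseconds. */
void format_duration(int64_t duration_us, char *buf, size_t bufsize)
{
    snprintf(buf, bufsize, "%02lld:%02lld.%03lld",
             (long long)(duration_us / 60000000),     /* minutes          */
             (long long)(duration_us / 1000000 % 60), /* seconds, 0-59    */
             (long long)(duration_us / 1000 % 1000)); /* millisec, 0-999  */
}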