Per macroblock encoding in libx264

I know that in x264, encoding proceeds in units of macroblocks. However, is it possible to set parameters for each macroblock individually? For example, what if I want the QP of some specific area to be smaller than the rest of the frame? Is that possible? If I need to modify functions and APIs in libx264, where should I begin?

If the only thing you want to change per macroblock is QP, then yes, it is possible. And no, you don't need to change the libx264 API for this. For things like ROI (Region of Interest) there is the quant_offsets field in the x264_image_properties_t struct (the prop field of the x264_picture_t you pass as pic_in to x264_encoder_encode). You can read more about how to use it in the comments in x264.h.
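
For illustration, a minimal sketch (assuming an initialized encoder, its x264_param_t param, and a filled pic_in/pic_out as in a typical encode loop; the ROI bounds mb_x0/mb_x1/mb_y0/mb_y1 are hypothetical, in macroblock units). quant_offsets holds one float per 16x16 macroblock, which x264 adds to the frame's QP, so negative values raise quality in that area:

#include <stdlib.h>
#include <x264.h>

int mb_width  = (param.i_width  + 15) / 16;            // macroblocks per row
int mb_height = (param.i_height + 15) / 16;            // macroblock rows

float *offsets = (float *)calloc(mb_width * mb_height, sizeof(float));
for (int y = mb_y0; y < mb_y1; y++)                    // hypothetical ROI bounds, in MB units
    for (int x = mb_x0; x < mb_x1; x++)
        offsets[y * mb_width + x] = -5.0f;             // lower QP inside the ROI => higher quality

pic_in.prop.quant_offsets = offsets;                   // per-macroblock QP offsets for this frame
pic_in.prop.quant_offsets_free = free;                 // x264 frees the array when it is done with it

x264_encoder_encode(encoder, &nals, &num_nals, &pic_in, &pic_out);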

Related

AudioConverterRef with different number of channels

I have an audio format with 4 channels that I want to convert into a 2-channel format. kAudioConverterChannelMap can only discard the extra inputs:
When In > Out, the first Out inputs are routed to the first Out outputs, and the remaining inputs are discarded.
Is it possible to specify a channel map with kAudioConverterChannelMap that merges channels 1 + 2 into the first output channel and channels 3 + 4 into the second?
If not, what should I use for this conversion?
AFAIK Apple does not provide an API that reduces the channel count of an audio stream, apart from discarding extra channels.
Why? I guess because how you do this is such a personal choice that, if they thought about it at all, they may have concluded that no one approach would make everyone happy, and so it was best to choose the approach that would make everyone equally angry: discard.
So what's the big deal? I think it's mainly to do with the "meaning" of the individual channels. Consider the "obviously simple" case of converting stereo to mono - it's surprisingly nuanced, depending on how the stereo audio was recorded. And that's in an ideal case where your 2-channel audio is actually declared to be "stereo" (e.g. via an AudioChannelLayoutTag like kAudioChannelLayoutTag_Stereo).
So even "tagged" multi-channel audio is terribly ambiguous, so the only person who knows how to interpret and reduce the channel count of your 4 channel audio is you.
That said, would it have killed them to provide a function that summed even channels with even and odd with odd using some Accelerate/vDSP array functions, like vDSP_vadd? Because that's probably what you want.
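
In case it helps, a minimal sketch of that summing approach (assuming de-interleaved float buffers; the function and buffer names are hypothetical):

#include <Accelerate/Accelerate.h>

// Mix 4 de-interleaved float channels down to 2:
// outL = (ch1 + ch2) / 2, outR = (ch3 + ch4) / 2. n is the number of frames.
void mixFourToTwo(const float *ch1, const float *ch2,
                  const float *ch3, const float *ch4,
                  float *outL, float *outR, vDSP_Length n)
{
    const float half = 0.5f;
    vDSP_vadd(ch1, 1, ch2, 1, outL, 1, n);   // outL = ch1 + ch2
    vDSP_vadd(ch3, 1, ch4, 1, outR, 1, n);   // outR = ch3 + ch4
    vDSP_vsmul(outL, 1, &half, outL, 1, n);  // scale by 0.5 to avoid clipping
    vDSP_vsmul(outR, 1, &half, outR, 1, n);
}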

Which method in lua could I use to most effectively get the hertz frequency of a certain position in a .wav file?

So I want to be able to convert a .wav file to a JSON table using Lua, which would probably contain entries like {time="0:39.34",hz=440}. I already have all my JSON libraries; I just need a method to convert a .wav file into something I can turn into JSON. If there's already a library that can do this, then I need the source code of the library so I can include it in a single-file program.
At each point in a WAV file you'll have a full spectrum, not just "the hertz frequency". You'll have to perform a Fourier transform on the data and, from the many peaks in the spectrum, select the one you're interested in, be it the fundamental, the dominant, etc.
There are libraries for the Fast Fourier Transform out there, like LuaFFT, but you'd better get a clearer picture of what you really need from the WAV. If you're just trying to read a DTMF signal, you don't really need full-scale spectrum analysis.
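
For the peak-picking step, here is a small sketch of the core logic in C++ (the same arithmetic applies to whatever FFT library you call from Lua, such as LuaFFT): the frequency represented by FFT bin k is k * sampleRate / fftSize.

#include <cstddef>

// Given the magnitudes of the first `bins` FFT bins (bins <= fftSize/2),
// return the dominant frequency in Hz.
double dominantHz(const double *mag, std::size_t bins,
                  double sampleRate, std::size_t fftSize)
{
    std::size_t best = 1;                  // skip bin 0, which is the DC offset
    for (std::size_t k = 2; k < bins; k++)
        if (mag[k] > mag[best])
            best = k;
    return best * sampleRate / fftSize;    // bin index -> frequency in Hz
}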

Get some information from HEVC reference software

I am new to HEVC and am currently working through the reference software (looking at intra prediction right now).
I need to get information as below after encoding.
the CU structure for a given CTU
for each CU during the calculations, its information (e.g. QP value, selected mode for luma, selected mode for chroma, whether the CU is in the final CU structure of the CTU-split decision, etc.)
I know CTU decisions are made when m_pcCuEncoder->compressCtu( pCtu ) is called in TEncSlice.cpp. But where exactly can I get this specific information? Can someone help me with this?
p.s. I am learning C++ too (I have a Java background).
EDIT: This post is a solution for the encoder side. However, the decoder side solution is far less complex.
Getting CTU information (partitioning etc.) is a bit tricky at the encoder if you are new to the code, but I will try to help you with it.
Everything that I am going to tell you is based on the JEM code and not HM, but I am pretty sure you can apply it to HM too.
As you might have noticed, there are two completely separate phases for compression/encoding of each CTU:
The RDO phase: first there is the Rate-Distortion Optimization loop to "make the decisions". In this phase, literally all possible combinations of the parameters are tested (e.g. different partitionings, intra modes, filters, etc.). At the end of this phase, the RDO determines the best combination and passes it to the second phase.
The encoding phase: Here the encoder does the actual final encoding step. This includes writing all the bins into the bitstream, based on the parameters determined during the RDO phase.
In the CTU level, these two phases are performed by the m_pcCuEncoder->compressCtu( pCtu ) and the m_pcCuEncoder->encodeCtu( pCtu ) functions, respectively, both in the compressSlice() function of the TEncSlice.cpp file.
Given the above information, you should look for your information in the second phase, not the first (you may already know these things, but I suspected that you might be looking at the first phase).
So here is my suggestion for getting your information. It's not the best way to do it, but it is the easiest to explain here.
You first go to this point in your HM code:
compressGOP() -> encodeSlice() -> encodeCtu() -> xEncodeCU()
Then you find the line where the prediction mode (intra/inter) is encoded:
m_pcEntropyCoder->encodePredMode()
At this point, you have access to the pcCU object, which contains all the final decisions made during the first phase, including the information you are looking for. At this point of the code, you are dealing with a single CU and not the entire CTU. But if you want your information for the entire CTU, you may go back to
compressGOP() -> encodeSlice() -> encodeCtu()
and find the line where the xEncodeCU() function is called for the first time. There, you will have access to the pCtu object.
Reminder: each TComDataCU object (pcCU if you are in the CU level, or pCtu if you are in the CTU level) of size WxH is split to NumPartition=(W/4)x(H/4) partitions of size 4x4. Each partition is accessible by an index (uiAbsPartIdx) which indicates its Z-scan order. For example, the uiAbsPartIdx for the partition at <x=8,y=0> is 4.
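
If you need to map such an index back to a position, HM keeps global lookup tables for this in TComRom (a small illustration; the table names are from HM, and the surrounding code is assumed):

UInt raster = g_auiZscanToRaster[uiAbsPartIdx]; // Z-scan order -> raster order
UInt xOff   = g_auiRasterToPelX[raster];        // horizontal pixel offset within the CTU
UInt yOff   = g_auiRasterToPelY[raster];        // e.g. uiAbsPartIdx=4 maps to <8,0>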
Now, you do the following steps:
Get the number of partitions (NumPartition) within your pCtu by calling pCtu->getTotalNumPart().
Loop over all NumPartition partitions and call the functions pCtu->getWidth(idx), pCtu->getHeight(idx), pCtu->getCUPelX(idx) and pCtu->getCUPelY(idx), where idx is your loop iterator. These functions return the following information for the CU containing the 4x4 partition at idx: width, height, positionX, positionY. [both positions are relative to pixel <0,0> of the frame]
The above information is enough for deriving the CTU partitioning of the current pCtu! So the last step is to write a piece of code to do that (see the sketch below).
This was an example of how to extract CTU partitioning information during the second phase (i.e. the encoding phase). You may call other getters in the same way for the rest of the information in your second question. For example, to get the selected luma intra mode, you may call pCtu->getIntraDir(CHANNEL_TYPE_LUMA, idx) instead of the getWidth()/getHeight() functions, or pCtu->getQP(CHANNEL_TYPE_LUMA, idx) to get the QP value.
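
Putting the steps together, a rough sketch of such a piece of code (based on the accessors named above; exact signatures vary between HM and JEM versions, so treat this as illustrative):

// Inside encodeCtu(), where the fully decided pCtu is available:
UInt numPart = pCtu->getTotalNumPart();
for (UInt idx = 0; idx < numPart; idx++)
{
    printf("CU at (%d,%d) size %dx%d lumaMode=%d QP=%d\n",
           pCtu->getCUPelX(idx), pCtu->getCUPelY(idx),  // position relative to pixel <0,0> of the frame
           pCtu->getWidth(idx), pCtu->getHeight(idx),   // size of the containing CU
           pCtu->getIntraDir(CHANNEL_TYPE_LUMA, idx),   // selected luma intra mode
           pCtu->getQP(CHANNEL_TYPE_LUMA, idx));        // QP value of the containing CU
}
// Note: all 4x4 partitions inside the same CU report identical values,
// so de-duplicate consecutive entries if you want one line per CU.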
You can always find a list of functions that provide useful information at the pCtu level, in the TComDataCU class (TComDataCU.cpp).
I hope this helps you. If not, let me know!
Good luck,

Scan video for text string?

My goal is to find the title screen from a movie trailer. I need a service where I can search a video for a string, then return the frame with that string. Pretty obscure, does anything like this exist?
e.g. for this movie, I'd scan for "Sausage Party" and retrieve this frame:
Edit: I found the CloudSight API, which would actually work, except the cost is prohibitive: $0.04 per call, assuming I need to split the video into 1 s intervals and scan every image (at least 60 calls per video).
No exact service that I can find, but you could attempt to do this yourself...
# extract one frame per second from the video
ffmpeg -i sausage_party.mp4 -r 1 %04d.png
# OCR every frame in parallel; e.g. 0135.png produces 0135.txt
/usr/local/bin/parallel --no-notice -j 8 \
  /usr/local/bin/tesseract -psm 6 -l eng {} {.} \
  ::: *.png
This extracts one frame per second from the video file, then uses tesseract to extract the text via OCR into files with the same base name as the image frame (e.g. 0135.txt). However, your results are going to vary massively depending on the font used and the quality of the video file.
You'd probably find it cheaper/easier to use something like Amazon Mechanical Turk, especially since the OCR is going to have a hard time doing this automatically.
Another option could be implementing this service yourself using the Scene Text Detection and Recognition module in OpenCV (docs.opencv.org/3.0-beta/modules/text/doc/text.html). You can take a look at this video to get an idea of how such a system would operate. As pointed out above, the accuracy will depend on the font used in the movie titles, the quality of the video files, and the OCR.
OpenCV relies on Tesseract as the underlying OCR but, alternatively, you could use the text detection and localization functions (docs.opencv.org/3.0-beta/modules/text/doc/erfilter.html) in OpenCV to find text areas in the image and then employ a different OCR to perform the recognition. The text detection and localization stage can be done very quickly, so achieving real-time performance would mostly be a matter of picking a fast OCR.
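
If you want to try that route, here is a rough C++ sketch of the detection stage, adapted from the OpenCV 3 text-module sample (the two classifier XML files ship with the opencv_contrib text-module samples; treat the parameters as starting points, not tuned values):

#include <vector>
#include <opencv2/opencv.hpp>
#include <opencv2/text.hpp>
using namespace cv;
using namespace cv::text;

int main()
{
    Mat frame = imread("0135.png");           // one of the frames extracted earlier

    // Extremal-region filters, stages 1 and 2 (Neumann & Matas)
    Ptr<ERFilter> er1 = createERFilterNM1(
        loadClassifierNM1("trained_classifierNM1.xml"),
        16, 0.00015f, 0.13f, 0.2f, true, 0.1f);
    Ptr<ERFilter> er2 = createERFilterNM2(
        loadClassifierNM2("trained_classifierNM2.xml"), 0.5f);

    std::vector<Mat> channels;
    computeNMChannels(frame, channels);       // color channels the detector runs on

    std::vector<std::vector<ERStat> > regions(channels.size());
    for (size_t c = 0; c < channels.size(); c++)
    {
        er1->run(channels[c], regions[c]);
        er2->run(channels[c], regions[c]);
    }

    // Group character candidates into text lines; boxes are the detected text regions
    std::vector<std::vector<Vec2i> > groups;
    std::vector<Rect> boxes;
    erGrouping(frame, channels, regions, groups, boxes, ERGROUPING_ORIENTATION_HORIZ);

    // Each box can now be cropped from the frame and passed to the OCR of your choice.
    return 0;
}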

Set sampling algo

I'm trying to create a thumbnail from a JPEG image using PythonMagick:
img.quality(90)
# TODO set the sampling algo e.g. bilinear
img.sample(Geometry(scaledWid, scaledHei))
img.crop(Geometry(THUMBNAIL_WID-1, THUMBNAIL_HEI-1,cropLeft, cropTop))
img.write(destFilePath)
How do I set the sampling algo to be used? I believe right now it is using nearest neighbor, which looks kind of ugly.
The resizing filter is set via a property on the image, which you set before calling .resize() or, I imagine, .sample().
The property is .filterType() and is set from one of the enumerated values on PythonMagick.FilterTypes - which is the same list as the Magick++ FilterTypes.
(NB. There is never any documentation for PythonMagick, but as it's essentially just a wrapper for Magick++, just use the API documentation for that.)
So try (untested, I don't use Python):
img.quality(90)
img.filterType(PythonMagick.FilterTypes.SincFilter)
img.sample(Geometry(scaledWid, scaledHei))
img.crop(Geometry(THUMBNAIL_WID-1, THUMBNAIL_HEI-1,cropLeft, cropTop))
img.write(destFilePath)
Note the default filter is usually LanczosFilter, or at least it is supposed to be, which is pretty good.
