How to convert an FFMPEG AVFrame in YUVJ420P to AVFoundation CVPixelBufferRef? - ios

I have an FFMPEG AVFrame in YUVJ420P and I want to convert it to a CVPixelBufferRef with CVPixelBufferCreateWithBytes. The reason I want to do this is to use AVFoundation to show/encode the frames.
I selected kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange and tried converting to it, since the AVFrame has the data in three planes: Y480 Cb240 Cr240. According to what I've researched, this matches the selected kCVPixelFormatType. Because the format is biplanar, I need to convert the frame into a buffer that contains Y480 and CbCr480 interleaved.
I tried to create a buffer with 2 planes:
frame->data[0] on the first plane,
frame->data[1] and frame->data[2] interleaved on the second plane.
However, I'm getting return error -6661 (kCVReturnInvalidArgument) from CVPixelBufferCreateWithBytes:
"Invalid function parameter. For example, out of range or the wrong type."
I don't have expertise on image processing at all, so any pointers to documentation that can get me started in the right approach to this problem are appreciated. My C skills aren't top of the line either so maybe I'm making a basic mistake here.
uint8_t **buffer = malloc(2*sizeof(int *));
buffer[0] = frame->data[0];
buffer[1] = malloc(frame->linesize[0]*sizeof(int));
for(int i = 0; i<frame->linesize[0]; i++){
    if(i%2){
        buffer[1][i]=frame->data[1][i/2];
    }else{
        buffer[1][i]=frame->data[2][i/2];
    }
}
int ret = CVPixelBufferCreateWithBytes(NULL, frame->width, frame->height, kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange, buffer, frame->linesize[0], NULL, 0, NULL, cvPixelBufferSample);
The frame is the AVFrame with the rawData from FFMPEG Decoding.

My C skills aren't top of the line either so maybe I'm making a basic mistake here.
You're making several:
You should be using CVPixelBufferCreateWithPlanarBytes(). I do not know if CVPixelBufferCreateWithBytes() can be used to create a planar video frame; if so, it will require a pointer to a "plane descriptor block" (I can't seem to find the struct in the docs).
frame->linesize[0] is the bytes per row, not the size of the whole image. The docs are unclear, but the usage is fairly unambiguous.
frame->linesize[0] refers to the Y plane; you care about the UV planes.
Where is sizeof(int) from?
You're passing in cvPixelBufferSample; you might mean &cvPixelBufferSample.
You're not passing in a release callback. The documentation does not say that you can pass NULL.
Try something like this:
// The chroma planes of a 4:2:0 frame have half the height of the luma plane.
size_t chromaHeight = frame->height / 2;
size_t srcPlaneSize = frame->linesize[1] * chromaHeight;
size_t dstPlaneSize = srcPlaneSize * 2;
uint8_t *dstPlane = malloc(dstPlaneSize);
void *planeBaseAddress[2] = { frame->data[0], dstPlane };
// This loop is very naive and assumes that the line sizes are the same.
// It also copies padding bytes.
assert(frame->linesize[1] == frame->linesize[2]);
for (size_t i = 0; i < srcPlaneSize; i++) {
    // The biplanar 420YpCbCr8 formats store Cb first, then Cr.
    // In FFmpeg's planar YUV layouts data[1] is Cb and data[2] is Cr.
    dstPlane[2*i    ] = frame->data[1][i];
    dstPlane[2*i + 1] = frame->data[2][i];
}
// This assumes the width and height are even (it's 420 after all).
assert(frame->width % 2 == 0 && frame->height % 2 == 0);
size_t planeWidth[2] = {frame->width, frame->width/2};
size_t planeHeight[2] = {frame->height, frame->height/2};
// I'm not sure where you'd get this.
size_t planeBytesPerRow[2] = {frame->linesize[0], frame->linesize[1]*2};
int ret = CVPixelBufferCreateWithPlanarBytes(
    NULL,
    frame->width,
    frame->height,
    // YUVJ420P is full range, so the ...BiPlanarFullRange variant
    // may be the closer match; check how the colours come out.
    kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange,
    NULL,
    0,
    2,
    planeBaseAddress,
    planeWidth,
    planeHeight,
    planeBytesPerRow,
    YOUR_RELEASE_CALLBACK,
    YOUR_RELEASE_CALLBACK_CONTEXT,
    NULL,
    &cvPixelBufferSample);
Memory management is left as an exercise to the reader, but for test code you might get away with passing in NULL instead of a release callback.
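If you want to avoid copying padding bytes and drop the assumption that both chroma line sizes match, the interleave can be done row by row. The following is only a sketch under assumptions: interleave_cbcr and release_planes are hypothetical names, the function takes plain pointers and strides rather than an AVFrame so it can be tried in isolation, and the parameter list of release_planes mirrors what I understand CVPixelBufferReleasePlanarBytesCallback to look like (check the CoreVideo header before relying on it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Interleave separate Cb and Cr planes (frame->data[1] / frame->data[2])
 * into a single CbCr plane, one row at a time, skipping source padding.
 * Returns a malloc'd buffer of chroma_h rows of 2 * chroma_w bytes each,
 * or NULL if the allocation fails.
 */
static uint8_t *interleave_cbcr(const uint8_t *cb, int cb_stride,
                                const uint8_t *cr, int cr_stride,
                                int chroma_w, int chroma_h)
{
    assert(cb != NULL && cr != NULL);
    assert(chroma_w > 0 && chroma_h > 0);
    assert(cb_stride >= chroma_w && cr_stride >= chroma_w);

    size_t dst_stride = (size_t)chroma_w * 2;
    uint8_t *dst = malloc(dst_stride * (size_t)chroma_h);
    if (dst == NULL)
        return NULL;

    for (int y = 0; y < chroma_h; y++) {
        const uint8_t *cb_row = cb + (size_t)y * (size_t)cb_stride;
        const uint8_t *cr_row = cr + (size_t)y * (size_t)cr_stride;
        uint8_t *dst_row = dst + (size_t)y * dst_stride;
        for (int x = 0; x < chroma_w; x++) {
            dst_row[2 * x]     = cb_row[x];  /* Cb first */
            dst_row[2 * x + 1] = cr_row[x];  /* then Cr */
        }
    }
    return dst;
}

/*
 * Hypothetical release callback: frees only the interleaved chroma plane
 * (plane 1), because plane 0 still belongs to the decoder's frame.
 */
static void release_planes(void *releaseRefCon, const void *dataPtr,
                           size_t dataSize, size_t numberOfPlanes,
                           const void *planeAddresses[])
{
    (void)releaseRefCon;
    (void)dataPtr;
    (void)dataSize;
    if (numberOfPlanes > 1)
        free((void *)planeAddresses[1]);
}
```

With this layout the second plane has no padding, so its bytes per row would be 2 * chroma_w rather than frame->linesize[1]*2. Note that plane 0 still points into the AVFrame, so the frame has to outlive the pixel buffer (or the luma plane has to be copied as well).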

Related

How do I interpret an AudioBuffer and get the power?

I am trying to make a volume-meter for my app, which will show while recording a video. I have found a lot of support for such meters for iOS, but mostly for AVAudioPlayer, which is no option for me. I am using AVCaptureSession to record, and will then end up with the delegate method shown below:
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    CMFormatDescriptionRef formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer);
    CFRetain(sampleBuffer);
    CFRetain(formatDescription);
    if(connection == audioConnection)
    {
        CMBlockBufferRef blockBuffer;
        AudioBufferList audioBufferList;
        CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(sampleBuffer,
            NULL, &audioBufferList, sizeof(AudioBufferList), NULL, NULL,
            kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
            &blockBuffer);
        SInt16 *data = audioBufferList.mBuffers[0].mData;
    }
    //Releases etc..
}
(Only showing relevant code)
From what I understand, I receive a 'sample buffer' containing either audio or video. Once I've verified that the connection is indeed audio, I 'extract' the audioBufferList from the buffer, and I am left with a list of one (or more?) audioBuffers. The actual data is, as I understand it, represented as SInt16, or '16-bit signed integer', which as far as I understand has a range from -32,768 to 32,767. However, if I simply print out the received values, I get A LOT of bouncing numbers. In "silence" I get values bouncing rapidly between -200 and 200, and when there's noise I get values from -4,000 to 13,000, completely out of order.
From my reading, the value 0 represents silence. However, I do not understand the difference between negative and positive values, and I do not know whether they can reach all the way up/down to +-32,768.
I believe I need a percentage of how 'loud' it is, but have been unable to find anything.
I have read a couple of tutorials and references on the matter, but nothing makes sense to me. I followed one guide by doing this (appended to the code above, inside the if):
float accumulator = 0;
for(int i = 0; i < audioBufferList.mBuffers[0].mDataByteSize; i++)
    accumulator += data[i] * data[i];
float power = accumulator / audioBufferList.mBuffers[0].mDataByteSize;
float decibels = log10f(power);
NSLog(@"%f", decibels);
Apparently, this code was supposed to produce values from -1 to +1, but that did not happen. I am now getting values around 6.194681 in silence, and 7.773492 for some noise. This feels like the correct 'range', but in the 'wrong place'. I can't simply subtract 7 from the number and assume I'm between -1 and +1. There should be some logic and science behind how this should work, but I do not know enough about how digital audio works.
Does anyone know the logic behind this? Is 0 always silence while -32,768 and 32,767 are loud noises? Can I then simply multiply all negative values by -1 to always get positive values, and then find out what percentage they are at (between 0 and 32767)? Somehow, I don't believe this will work, as I guess there is a reason for the negative values. I'm not completely sure what to try.
The code in your question is wrong in several ways. It tries to copy the code from the article below, but the conversion from the article's float-based code to 16-bit integer math is not handled properly. You're also looping over the wrong number of values (the upper bound for i is a byte count, not a sample count), so you end up pulling in garbage data. So this is all kinds of wrong.
https://www.mikeash.com/pyblog/friday-qa-2012-10-12-obtaining-and-interpreting-audio-data.html
The code in the article is correct. Here's what it is, expanded a bit. This is only looking at the first buffer in a 32-bit float buffer list.
float accumulator = 0;
AudioBuffer buffer = bufferList->mBuffers[0];
float * data = (float *)buffer.mData;
UInt32 numSamples = buffer.mDataByteSize / sizeof(float);
for (UInt32 i = 0; i < numSamples; i++) {
    accumulator += data[i] * data[i];
}
float power = accumulator / (float)numSamples;
float decibels = 10 * log10f(power);
As the article says, the result is in decibels relative to a 0 dB reference, i.e. 0.0 is the maximum value. This is the same thing that AVAudioPlayer's averagePowerForChannel returns, for example.
To use this in your 16-bit integer context, you'd need to a) loop appropriately through each 16-bit sample, b) convert the data[i] value from a 16-bit integer to a floating point value in the [-1.0, 1.0] range before squaring and adding to the accumulator.

Update directX texture

How can I solve the following task: an app needs to
use dozens of DX9 textures (render them with D3D)
and
update some of them (in whole or in part).
I.e. sometimes (once per frame/second/minute) I need to write bytes (void *) in different formats (argb, bgra, rgb, 888, 565) to some sub-rect of an existing texture.
In OpenGL the solution is very simple: glTexImage2D. But here the unfamiliar platform features have completely confused me.
I'm interested in a solution for both DX9 and DX11.
To update a texture, make sure the texture is created in D3DPOOL_MANAGED memory pool.
D3DXCreateTexture( device, size.x, size.y, numMipMaps,usage, textureFormat, D3DPOOL_MANAGED, &texture );
Then call LockRect to update the data
RECT rect = {left, top, right, bottom}; // the sub-rectangle you want to lock
D3DLOCKED_RECT lockedRect = {0}; // "out" parameter from LockRect function below
texture->LockRect(0, &lockedRect, &rect, 0);
// copy the memory into lockedRect.pBits
// make sure you increment each row by "Pitch"
unsigned char* bits = ( unsigned char* )lockedRect.pBits;
int numRows = bottom - top; // number of rows in the locked rectangle
for( int row = 0; row < numRows; row++ )
{
    // copy one row of data into "bits", e.g. memcpy( bits, srcData, size )
    ...
    // move to the next row
    bits += lockedRect.Pitch;
}
// unlock when done
texture->UnlockRect(0);
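The row loop above, plus one of the format conversions mentioned in the question, can be sketched in plain C without any Direct3D types. Both copy_rows and rgb565_to_argb8888 are hypothetical helper names; the destination pointer and pitch stand in for lockedRect.pBits and lockedRect.Pitch, and the conversion assumes the usual 5-6-5 bit layout with red in the high bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy num_rows rows of row_bytes bytes each between two buffers whose
 * rows may be padded differently (dst_pitch / src_pitch are in bytes).
 */
static void copy_rows(uint8_t *dst, size_t dst_pitch,
                      const uint8_t *src, size_t src_pitch,
                      size_t row_bytes, size_t num_rows)
{
    assert(dst != NULL && src != NULL);
    assert(row_bytes <= dst_pitch && row_bytes <= src_pitch);

    for (size_t row = 0; row < num_rows; row++) {
        memcpy(dst, src, row_bytes);
        dst += dst_pitch;  /* advance by the pitch, not by row_bytes */
        src += src_pitch;
    }
}

/*
 * Expand one RGB565 pixel to 32-bit ARGB with opaque alpha.
 * The top bits of each channel are replicated into the low bits so that
 * full intensity maps to 255.
 */
static uint32_t rgb565_to_argb8888(uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1Fu;
    uint32_t g = (p >> 5) & 0x3Fu;
    uint32_t b = p & 0x1Fu;

    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);

    return 0xFF000000u | (r << 16) | (g << 8) | b;
}
```

When the source format already matches the texture format, copy_rows alone is enough; otherwise each row would be converted pixel by pixel while still stepping the destination by the pitch.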

View GPU Memory / View Texture2D memory space for debugging

I've got a question about a PixelShader I am trying to implement, and what I currently do (this is just for debugging, and trying to figure stuff out):
int3 loc;
loc.x = (int)(In.TextureUV.x * resolution_XY.x);
loc.y = (int)(In.TextureUV.x * resolution_XY.x);
loc.z = 0;
float4 r = g_txDiffuse.Load(loc);
return float4(r.x, r.y, r.z, 1);
The point is, this is always 0,0,0,1
The texture buffer is created:
D3D11_TEXTURE2D_DESC tDesc;
tDesc.Height = 480;
tDesc.Width = 640;
tDesc.Usage = D3D11_USAGE_DYNAMIC;
tDesc.MipLevels = 1;
tDesc.ArraySize = 1;
tDesc.SampleDesc.Count = 1;
tDesc.SampleDesc.Quality = 0;
tDesc.Format = DXGI_FORMAT_R8_UINT;
tDesc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
tDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
tDesc.MiscFlags = 0;
V_RETURN(pd3dDevice->CreateTexture2D(&tDesc, NULL, &g_pCurrentImage));
I upload the texture (which should be a live display at the end) via:
D3D11_MAPPED_SUBRESOURCE resource;
pd3dImmediateContext->Map(g_pCurrentImage, 0, D3D11_MAP_WRITE_DISCARD, 0, &resource);
memcpy( resource.pData, g_Images.GetData(), g_Images.GetDataSize() );
pd3dImmediateContext->Unmap( g_pCurrentImage, 0 );
I've checked the resource.pData, the data in there is a valid 8bit monochrome image. I made sure the data coming from the camera is 8bit monochrome 640x480.
There's a few things I don't fully understand:
if I run the Map / memcpy / Unmap routine in every frame, the driver will ultimately crash, the system will be unresponsive. Is there a different way to update a complete texture every frame which should be done?
the texture I uploaded is 8bit, why is the Texture2D.load() a float4 return? Do I have to use a different method to access the texture data? I tried to .sample it, but that didn't work either. Would I have to use a int buffer or something instead?
is there a way to debug the GPU memory, to check if the memcpy worked in the first place?
The Map, memcpy, Unmap sequence really ought not to crash unless you are trying to copy too much data into the texture. It would be interesting to know what "GetDataSize()" returns. Does it equal 307,200 (640 x 480 bytes)? If it's more than that, then there lies your problem.
Texture2D returns a float4 because that's what you've asked for. You could instead write float r = g_txDiffuse.Load( ... ). The 8 bits get extended to a normalised float as part of the load process. Are you sure, by the way, that your calculation of "loc" is correct? As you have it now, loc.x and loc.y will always be the same.
You can debug what's going on with DirectX using PIX. It's a great tool and I highly recommend you familiarise yourself with it.
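To make the "loc" remark concrete, here is a small CPU-side sketch in C of the mapping the shader presumably intends: x from the U coordinate and the width, y from the V coordinate and the height. The names texel_coord, uv_to_texel and unorm8_to_float are hypothetical, and the normalisation shown is the UNORM-style conversion described above; an integer format such as R8_UINT may behave differently, so treat that part as an assumption.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int x;
    int y;
} texel_coord;

/* Map texture coordinates in [0, 1] to integer texel coordinates. */
static texel_coord uv_to_texel(float u, float v, int width, int height)
{
    assert(width > 0 && height > 0);

    texel_coord t;
    t.x = (int)(u * (float)width);   /* x comes from u and the width */
    t.y = (int)(v * (float)height);  /* y comes from v and the height */

    /* clamp so that u == 1.0 or v == 1.0 stays inside the texture */
    if (t.x < 0) t.x = 0;
    if (t.x > width - 1) t.x = width - 1;
    if (t.y < 0) t.y = 0;
    if (t.y > height - 1) t.y = height - 1;
    return t;
}

/* UNORM-style conversion of an 8-bit value to a float in [0, 1]. */
static float unorm8_to_float(uint8_t value)
{
    return (float)value / 255.0f;
}
```

For the 640x480 texture in the question, uv_to_texel(0.5f, 0.25f, 640, 480) gives texel (320, 120), whereas using u for both components would give (320, 320).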

Using cvGet2D OpenCV function

I'm trying to get information from an image using the function cvGet2D in OpenCV.
I created an array of 10 IplImage pointers:
IplImage *imageArray[10];
and I'm saving 10 images from my webcam:
imageArray[numPicture] = cvQueryFrame(capture);
when I call the function:
info = cvGet2D(imageArray[0], 250, 100);
where info:
CvScalar info;
I got the error:
OpenCV Error: Bad argument (unrecognized or unsupported array type) in cvPtr2D, file /build/buildd/opencv-2.1.0/src/cxcore/cxarray.cpp, line 1824
terminate called after throwing an instance of 'cv::Exception'
what(): /build/buildd/opencv-2.1.0/src/cxcore/cxarray.cpp:1824: error: (-5) unrecognized or unsupported array type in function cvPtr2D
If I use the function cvLoadImage to initialize an IplImage pointer and then I pass it to the cvGet2D function, the code works properly:
IplImage* imagen = cvLoadImage("test0.jpg");
info = cvGet2D(imagen, 250, 100);
however, I want to use the information already stored in my array.
Do you know how I can solve it?
Even though this is a very late response, I guess someone might still be searching for a solution with cvGet2D. Here it is.
For cvGet2D, we need to pass the arguments in the order of Y first and then X.
Example:
CvScalar s = cvGet2D(img, Y, X);
It's not mentioned anywhere in the documentation; you find it only inside core.h / core_c.h. Go to the declaration of cvGet2D(), and above the function prototypes there are a few comments that explain this.
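The row-first order follows from how the pixel data is laid out in memory: rows are stored one after another, so the row index (Y) selects the row start and the column index (X) moves within it. A sketch of that address calculation in plain C, using a hypothetical image_view struct that stands in for the relevant IplImage fields (widthStep, nChannels, imageData):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the IplImage fields used here. */
typedef struct {
    int width;
    int height;
    int channels;
    int width_step;       /* bytes per row, including padding */
    const uint8_t *data;  /* first byte of row 0 */
} image_view;

/* Read one channel of the pixel at (row, col): row (Y) first, then col (X). */
static uint8_t pixel_at(const image_view *img, int row, int col, int channel)
{
    assert(img != NULL && img->data != NULL);
    assert(row >= 0 && row < img->height);
    assert(col >= 0 && col < img->width);
    assert(channel >= 0 && channel < img->channels);

    size_t offset = (size_t)row * (size_t)img->width_step
                  + (size_t)col * (size_t)img->channels
                  + (size_t)channel;
    return img->data[offset];
}
```

Swapping row and col in such a call either reads the wrong pixel or, for non-square images, trips the bounds checks.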
Yeah, the message is correct.
If you want to store a pixel value, you need to do something like this.
int value = 0;
value = ((uchar *)(img->imageData + i*img->widthStep))[j*img->nChannels +0];
cout << "pixel value for Blue Channel and (i,j) coordinates: " << value << endl;
In summary, to print or store the data you must copy it into an integer variable (a pixel value varies between 0 and 255). But if you only want to test a pixel value (in an if clause or something similar), you can access the pixel value directly without the intermediate integer.
I think that's a little bit weird when you start, but after you have worked with it two or three times you will manage without difficulty.
Sorry, cvGet2D is not the best way to obtain a pixel value. I know it's the shortest and clearest way, because you obtain the pixel value in a single line of code knowing only the coordinates.
I suggest this option instead. When you see this code you will think that it is very complicated, but it is more efficient.
#include <stdio.h>
#include <opencv/cv.h>
#include <opencv/highgui.h>

int main()
{
    // Acquire the image (I'm reading it from a file);
    IplImage* img = cvLoadImage("image.bmp",1);
    if (!img) return 1; // the file could not be loaded
    int i,j,k;
    // Variables to store image properties
    int height,width,step,channels;
    uchar *data;
    // Variables to store the number of white pixels and a flag
    int WhiteCount,bWhite;
    // Acquire image info
    height = img->height;
    width = img->width;
    step = img->widthStep;
    channels = img->nChannels;
    data = (uchar *)img->imageData;
    // Begin
    WhiteCount = 0;
    for(i=0;i<height;i++)
    {
        for(j=0;j<width;j++)
        { // Go through each channel of the pixel to see if it's equal to 255
            bWhite = 0;
            for(k=0;k<channels;k++)
            { // This checks if the pixel's kth channel is 255 - it can be faster.
                if (data[i*step+j*channels+k]==255) bWhite = 1;
                else
                {
                    bWhite = 0;
                    break;
                }
            }
            if(bWhite == 1) WhiteCount++;
        }
    }
    printf("Percentage: %f%%\n",100.0*WhiteCount/(height*width));
    cvReleaseImage(&img);
    return 0;
}
This code counts the white pixels and gives you the percentage of white pixels in the image.

SDL_Surface to IDirect3DTexture

Can anybody help with converting an SDL_Surface object, a texture loaded from a file, into an IDirect3DTexture9 object?
I honestly don't know why you would ever want to do this. It sounds like a truly horrible idea for a variety of reasons, so please tell us why you want to do this so we can convince you not to ;).
In the meanwhile, a quick overview of how you'd go about it:
IDirect3DTexture9* pTex = NULL;
HRESULT hr = S_OK;
hr = m_d3dDevice->CreateTexture(
    surface->w,
    surface->h,
    1,
    usage,
    format,
    D3DPOOL_MANAGED,
    &pTex,
    NULL);
This creates the actual texture with the size and format of the SDL_Surface. You'll have to fill in the usage on your own, depending on how you want to use it (see D3DUSAGE). You'll also have to figure out the format on your own - you can't directly map a SDL_PixelFormat to a D3DFORMAT. This won't be easy, unless you know exactly what pixel format your SDL_Surface is.
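For the common cases the format can be guessed from the surface's bit depth and channel masks (the BitsPerPixel, Rmask, Gmask, Bmask and Amask fields of SDL_PixelFormat). The following is only a sketch: tex_format is a hypothetical local enum standing in for the corresponding D3DFMT_* constants so that the snippet compiles without the Direct3D headers, and anything it does not recognise would still need a manual conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for D3DFMT_A8R8G8B8, D3DFMT_X8R8G8B8, D3DFMT_R5G6B5. */
typedef enum {
    FMT_UNKNOWN,
    FMT_A8R8G8B8,
    FMT_X8R8G8B8,
    FMT_R5G6B5
} tex_format;

/* Pick a texture format from a surface's bit depth and channel masks. */
static tex_format format_from_masks(int bits_per_pixel,
                                    uint32_t rmask, uint32_t gmask,
                                    uint32_t bmask, uint32_t amask)
{
    assert(bits_per_pixel > 0);

    if (bits_per_pixel == 32 &&
        rmask == 0x00FF0000u && gmask == 0x0000FF00u && bmask == 0x000000FFu) {
        if (amask == 0xFF000000u)
            return FMT_A8R8G8B8;  /* alpha in the top byte */
        if (amask == 0)
            return FMT_X8R8G8B8;  /* top byte unused */
    }
    if (bits_per_pixel == 16 &&
        rmask == 0xF800u && gmask == 0x07E0u && bmask == 0x001Fu && amask == 0)
        return FMT_R5G6B5;

    return FMT_UNKNOWN;  /* convert the surface to a known layout first */
}
```

If the result is FMT_UNKNOWN, one option is to convert the surface to a known 32-bit layout on the SDL side before copying it.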
Now, you need to write the data into the texture. You can't use straight memcpy here, since the SDL_Surface and the actual texture may have different strides. Here's some untested code that may do this for you:
HRESULT hr;
D3DLOCKED_RECT lockedRect;
// lock the texture, so that we can write into it
// Note: if you used D3DUSAGE_DYNAMIC above, you should
// use D3DLOCK_DISCARD as the flags parameter instead of 0.
hr = pTex->LockRect(0, &lockedRect, NULL, 0);
if(SUCCEEDED(hr))
{
    // use char pointers here for byte indexing
    // (if SDL_MUSTLOCK(surface) is true, lock the surface with
    // SDL_LockSurface() before reading surface->pixels)
    char* src = (char*) surface->pixels;
    char* dst = (char*) lockedRect.pBits;
    size_t numRows = surface->h;
    size_t rowSize = surface->w * surface->format->BytesPerPixel;
    // for each row...
    while(numRows--)
    {
        // copy the row
        memcpy(dst, src, rowSize);
        // use the given pitch parameters to advance to the next
        // row (since these may not equal rowSize)
        src += surface->pitch;
        dst += lockedRect.Pitch;
    }
    // don't forget this, or D3D won't like you ;)
    hr = pTex->UnlockRect(0);
}

Resources