I'm now using VideoToolbox for H.264 encoding.
I found some sample code, and it works fine:
#define VTB_HEIGHT 480
#define VTB_WIDTH 640
int bitRate = VTB_WIDTH * VTB_HEIGHT * 3 * 4 * 8;
CFNumberRef bitRateRef = CFNumberCreate(kCFAllocatorDefault,
kCFNumberSInt32Type,
&bitRate);
VTSessionSetProperty(encodingSession,
kVTCompressionPropertyKey_AverageBitRate,
bitRateRef);
CFRelease(bitRateRef);
int bitRateLimit = bitRate / 8;
CFNumberRef bitRateLimitRef = CFNumberCreate(kCFAllocatorDefault,
kCFNumberSInt32Type,
&bitRateLimit);
VTSessionSetProperty(encodingSession,
kVTCompressionPropertyKey_DataRateLimits,
bitRateLimitRef);
CFRelease(bitRateLimitRef);
But I don't understand these two lines:
int bitRate = VTB_WIDTH * VTB_HEIGHT * 3 * 4 * 8;
int bitRateLimit = bitRate / 8;
What is the right way to set these values?
I hope someone can explain.
Thanks for your time!
The documentation for kVTCompressionPropertyKey_DataRateLimits says:
Each hard limit is described by a data size in bytes and a
duration in seconds...
So you need to set this property with two values (a data size in bytes and a duration in seconds):
int bitRate = VTB_WIDTH * VTB_HEIGHT * 3 * 4 * 8;
int bitRateLimit = bitRate / 8;
// i.e. the limit is expressed in bytes per second
CFNumberRef byteNum = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt32Type, &bitRateLimit);
int second = 1;
CFNumberRef secNum = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt32Type, &second);
// put both values into an array: {bytes, seconds}
const void* numbers[2] = {byteNum, secNum};
CFArrayRef dataRateLimits = CFArrayCreate(NULL, numbers, 2, &kCFTypeArrayCallBacks);
// then set the property with the array (not with a single number)
OSStatus status = VTSessionSetProperty(compressionSession, kVTCompressionPropertyKey_DataRateLimits, dataRateLimits);
// release our own references once the property has been set
CFRelease(byteNum);
CFRelease(secNum);
CFRelease(dataRateLimits);
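To make the units concrete, here is a minimal sketch of the arithmetic behind the two lines asked about. It is plain C++ with no VideoToolbox calls; `RateSettings` and `rateSettingsFor` are hypothetical names used only for illustration, and the `3 * 4 * 8` factor is copied from the sample code as-is (the sample does not say how it was chosen). kVTCompressionPropertyKey_AverageBitRate is given in bits per second, while each DataRateLimits entry pairs a byte count with a duration in seconds, which is why the sample divides by 8.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the three numbers that end up in the two properties.
struct RateSettings {
    std::int32_t averageBitsPerSecond; // value for ..._AverageBitRate (bits/s)
    std::int32_t limitBytes;           // first element of the DataRateLimits pair
    std::int32_t limitSeconds;         // second element of the DataRateLimits pair
};

// Reproduces the sample's arithmetic. The 3 * 4 * 8 factor is taken verbatim
// from the sample code; it is an assumption of that sample, not a required value.
inline RateSettings rateSettingsFor(int width, int height)
{
    assert(width > 0 && height > 0);
    RateSettings r;
    r.averageBitsPerSecond = width * height * 3 * 4 * 8;
    r.limitBytes = r.averageBitsPerSecond / 8; // bits -> bytes
    r.limitSeconds = 1;                        // limit applies to a 1-second window
    return r;
}
```

For 640x480 this gives an average bit rate of 29,491,200 bits per second and a hard limit of 3,686,400 bytes per 1-second window.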
I want to convert a pixelBuffer from BGRA to YUV (420v).
With my conversion function, most of the videos in my phone's photo album convert correctly,
except for one video from a colleague: after conversion its pixels are garbled,
even though the video itself plays normally. Its metadata:
Video
ID : 1
Format : AVC
Format/Info : Advanced Video Codec
Format profile : Main#L3.1
Format settings : CABAC / 1 Ref Frames
Format settings, CABAC : Yes
Format settings, Reference frames : 1 frame
Format settings, GOP : M=1, N=15
Codec ID : avc1
Codec ID/Info : Advanced Video Coding
Duration : 6 s 623 ms
Source duration : 6 s 997 ms
Bit rate : 4 662 kb/s
Width : 884 pixels
Clean aperture width : 884 pixels
Height : 492 pixels
Clean aperture height : 492 pixels
Display aspect ratio : 16:9
Original display aspect ratio : 16:9
Frame rate mode : Variable
Frame rate : 57.742 FPS
Minimum frame rate : 20.000 FPS
Maximum frame rate : 100.000 FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.186
Stream size : 3.67 MiB (94%)
Source stream size : 3.79 MiB (97%)
Title : Core Media Video
Encoded date : UTC 2021-10-29 09:54:03
Tagged date : UTC 2021-10-29 09:54:03
Color range : Limited
Color primaries : Display P3
Transfer characteristics : BT.709
Matrix coefficients : BT.709
Codec configuration box : avcC
This is my function; I do not know what is wrong with it.
CFDictionaryRef CreateCFDictionary(CFTypeRef* keys, CFTypeRef* values, size_t size) {
return CFDictionaryCreate(kCFAllocatorDefault,
keys,
values,
size,
&kCFTypeDictionaryKeyCallBacks,
&kCFTypeDictionaryValueCallBacks);
}
static void bt709_rgb2yuv8bit_TV(uint8_t R, uint8_t G, uint8_t B, uint8_t &Y, uint8_t &U, uint8_t &V)
{
Y = 0.183 * R + 0.614 * G + 0.062 * B + 16;
U = -0.101 * R - 0.339 * G + 0.439 * B + 128;
V = 0.439 * R - 0.399 * G - 0.040 * B + 128;
}
CVPixelBufferRef RGB2YCbCr8Bit(CVPixelBufferRef pixelBuffer)
{
CVPixelBufferLockBaseAddress(pixelBuffer, 0);
uint8_t *baseAddress = (uint8_t *)CVPixelBufferGetBaseAddress(pixelBuffer);
int w = (int) CVPixelBufferGetWidth(pixelBuffer);
int h = (int) CVPixelBufferGetHeight(pixelBuffer);
// int stride = (int) CVPixelBufferGetBytesPerRow(pixelBuffer) / 4;
OSType pixelFormat = kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange;
CVPixelBufferRef pixelBufferCopy = NULL;
const size_t attributes_size = 1;
CFTypeRef keys[attributes_size] = {
kCVPixelBufferIOSurfacePropertiesKey,
};
CFDictionaryRef io_surface_value = CreateCFDictionary(nullptr, nullptr, 0);
CFTypeRef values[attributes_size] = {io_surface_value};
CFDictionaryRef attributes = CreateCFDictionary(keys, values, attributes_size);
CVReturn status = CVPixelBufferCreate(kCFAllocatorDefault,
w,
h,
pixelFormat,
attributes,
&pixelBufferCopy);
if (status != kCVReturnSuccess) {
std::cout << "YUVBufferCopyWithPixelBuffer :: failed" << std::endl;
return nullptr;
}
if (attributes) {
CFRelease(attributes);
attributes = nullptr;
}
CVPixelBufferLockBaseAddress(pixelBufferCopy, 0);
size_t y_stride = CVPixelBufferGetBytesPerRowOfPlane(pixelBufferCopy, 0);
size_t uv_stride = CVPixelBufferGetBytesPerRowOfPlane(pixelBufferCopy, 1);
int plane_h1 = (int) CVPixelBufferGetHeightOfPlane(pixelBufferCopy, 0);
int plane_h2 = (int) CVPixelBufferGetHeightOfPlane(pixelBufferCopy, 1);
uint8_t *y = (uint8_t *) CVPixelBufferGetBaseAddressOfPlane(pixelBufferCopy, 0);
memset(y, 0x80, plane_h1 * y_stride);
uint8_t *uv = (uint8_t *) CVPixelBufferGetBaseAddressOfPlane(pixelBufferCopy, 1);
memset(uv, 0x80, plane_h2 * uv_stride);
int y_bufferSize = w * h;
int uv_bufferSize = w * h / 4;
uint8_t *y_planeData = (uint8_t *) malloc(y_bufferSize * sizeof(uint8_t));
uint8_t *u_planeData = (uint8_t *) malloc(uv_bufferSize * sizeof(uint8_t));
uint8_t *v_planeData = (uint8_t *) malloc(uv_bufferSize * sizeof(uint8_t));
int u_offset = 0;
int v_offset = 0;
uint8_t R, G, B;
uint8_t Y, U, V;
for (int i = 0; i < h; i ++) {
for (int j = 0; j < w; j ++) {
int offset = i * w + j;
B = baseAddress[offset * 4];
G = baseAddress[offset * 4 + 1];
R = baseAddress[offset * 4 + 2];
bt709_rgb2yuv8bit_TV(R, G, B, Y, U, V);
y_planeData[offset] = Y;
// Alternate rows: take U from the even columns of even rows, and V from the even columns of odd rows
if (j % 2 == 0) {
(i % 2 == 0) ? u_planeData[u_offset++] = U : v_planeData[v_offset++] = V;
}
}
}
for (int i = 0; i < plane_h1; i ++) {
memcpy(y + i * y_stride, y_planeData + i * w, w);
if (i < plane_h2) {
for (int j = 0 ; j < w ; j+=2) {
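// NV12 and NV21 both belong to the YUV420SP family. They also store the Y samples first, but instead of then storing all U or all V samples, the U and V samples are stored interleaved.
// NV12 is the layout available on iOS: the Y samples come first, followed by interleaved UV samples.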
memcpy(uv + i * y_stride + j, u_planeData + i * w/2 + j/2, 1);
memcpy(uv + i * y_stride + j + 1, v_planeData + i * w/2 + j/2, 1);
}
}
}
free(y_planeData);
free(u_planeData);
free(v_planeData);
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
CVPixelBufferUnlockBaseAddress(pixelBufferCopy, 0);
return pixelBufferCopy;
}
The BGRA pixelBuffer looks normal.
The converted YUV pixelBuffer is garbled.
The video metadata contains the line "Color space : YUV", so it looks like this video isn't BGRA.
When you compute the source pixel offset, you must use the stride (the length of an image row in bytes) instead of the width, because the distance between rows may be larger than width * pixel_size_in_bytes. I recommend checking this case with images of odd width.
int offset = i * stride + j;
You already have it, commented out, at the beginning of the function:
int stride = (int) CVPixelBufferGetBytesPerRow(pixelBuffer) / 4;
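As a rough illustration of stride-aware indexing, here is a minimal sketch in plain C++ that works on raw buffers rather than CVPixelBufferRef. The function name `bgraToNV12` and its parameters are hypothetical; the coefficients are the ones from the question. Two simplifications are assumptions of this sketch: chroma is taken from the top-left pixel of each 2x2 block, and results are rounded and clamped rather than truncated. Note that the destination planes are indexed with their own strides as well, not with the source width.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Rounds and clamps a floating-point sample into the 8-bit range.
static std::uint8_t toByte(double v)
{
    return static_cast<std::uint8_t>(std::min(255L, std::max(0L, std::lround(v))));
}

// Converts BGRA rows to NV12 (Y plane followed by an interleaved CbCr plane).
// All strides are in bytes and may be larger than the visible row length.
void bgraToNV12(const std::uint8_t* bgra, std::size_t bgraStride,
                int width, int height,
                std::uint8_t* yPlane, std::size_t yStride,
                std::uint8_t* uvPlane, std::size_t uvStride)
{
    assert(bgraStride >= static_cast<std::size_t>(width) * 4);
    assert(yStride >= static_cast<std::size_t>(width));
    assert(uvStride >= static_cast<std::size_t>((width + 1) / 2) * 2);

    for (int i = 0; i < height; ++i) {
        const std::uint8_t* srcRow = bgra + i * bgraStride; // stride, not width * 4
        std::uint8_t* yRow = yPlane + i * yStride;
        std::uint8_t* uvRow = uvPlane + (i / 2) * uvStride; // UV plane has its own stride
        for (int j = 0; j < width; ++j) {
            const double B = srcRow[j * 4];
            const double G = srcRow[j * 4 + 1];
            const double R = srcRow[j * 4 + 2];
            yRow[j] = toByte(0.183 * R + 0.614 * G + 0.062 * B + 16);
            if (i % 2 == 0 && j % 2 == 0) {
                // For even j, the byte offset of chroma pair j / 2 is exactly j.
                uvRow[j]     = toByte(-0.101 * R - 0.339 * G + 0.439 * B + 128); // Cb
                uvRow[j + 1] = toByte( 0.439 * R - 0.399 * G - 0.040 * B + 128); // Cr
            }
        }
    }
}
```

With a real pixel buffer, the strides would come from CVPixelBufferGetBytesPerRow for the source and CVPixelBufferGetBytesPerRowOfPlane for each destination plane.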
It is better to use built-in functions for converting images. Here is an example from one of my projects:
vImage_CGImageFormat out_cg_format = CreateVImage_CGImageFormat( target_pixel_format );
CGColorSpaceRef color_space = CGColorSpaceCreateDeviceRGB();
vImageCVImageFormatRef in_cv_format = vImageCVImageFormat_Create(
MSPixFmt_to_CVPixelFormatType(source_pixel_format),
kvImage_ARGBToYpCbCrMatrix_ITU_R_601_4,
kCVImageBufferChromaLocation_Center,
color_space,
0 );
CGColorSpaceRelease(color_space);
CGColorSpaceRelease(out_cg_format.colorSpace);
vImage_Error err = kvImageNoError;
vImageConverterRef converter = vImageConverter_CreateForCVToCGImageFormat(in_cv_format, &out_cg_format, NULL, kvImagePrintDiagnosticsToConsole, &err);
vImage_Buffer src_planes[4] = {{0}};
vImage_Buffer dst_planes[4] = {{0}};
unsigned long source_plane_count = vImageConverter_GetNumberOfSourceBuffers(converter);
for( unsigned int i = 0; i < source_plane_count; i++ )
{
src_planes[i] = (vImage_Buffer){planes_in[i], pic_size.height, pic_size.width, strides_in[i]};
}
unsigned long target_plane_count = vImageConverter_GetNumberOfDestinationBuffers(converter);
for( unsigned int i = 0; i < target_plane_count; i++ )
{
dst_planes[i] = (vImage_Buffer){planes_out[i], pic_size.height, pic_size.width, strides_out[i]};
}
err = vImageConvert_AnyToAny(converter, src_planes, dst_planes, NULL, kvImagePrintDiagnosticsToConsole);
I am implementing a nearest-neighbor kernel function to resize an input image, but the result is wrong and I cannot work out why.
I use OpenCV to read the input image:
cv::Mat image = cv::imread("/home/tumh/test.jpg");
unsigned char* data = image.data;
int outH, outW;
float *out_data_host = test(data, image.rows, image.cols, outH, outW);
cv::Mat out_image(outH, outW, CV_32FC3);
memcpy(out_image.data, out_data_host, outH * outW * 3 * sizeof(float));
float* test(unsigned char* in_data_host, const int &inH, const int &inW, int &outH, int &outW) {
// get the output size
int im_size_min = std::min(inW, inH);
int im_size_max = std::max(inW, inH);
float scale_factor = static_cast<float>(640) / im_size_min;
float im_scale_x = std::floor(inW * scale_factor / 64) * 64 / inW;
float im_scale_y = std::floor(inH * scale_factor / 64) * 64 / inH;
outW = inW * im_scale_x;
outH = inH * im_scale_y;
int channel = 3;
unsigned char* in_data_dev;
CUDA_CHECK(cudaMalloc(&in_data_dev, sizeof(unsigned char) * channel * inH * inW));
CUDA_CHECK(cudaMemcpy(in_data_dev, in_data_host, 1 * sizeof(unsigned char) * channel * inH * inW, cudaMemcpyHostToDevice));
// image pre process
const float2 scale = make_float2( im_scale_x, im_scale_y);
float * out_buffer = NULL;
CUDA_CHECK(cudaMalloc(&out_buffer, sizeof(float) * channel * outH * outW));
float *out_data_host = new float[sizeof(float) * channel * outH * outW];
const dim3 threads(32, 32);
const dim3 block(iDivUp(outW, threads.x), iDivUp(outW, threads.y));
gpuPreImageNet<<<block, threads>>>(scale, in_data_dev, inW, out_buffer, outW, outH);
CUDA_CHECK(cudaFree(in_data_dev));
CUDA_CHECK(cudaMemcpy(out_data_host, out_buffer, sizeof(float) * channel * outH * outW, cudaMemcpyDeviceToHost));
CUDA_CHECK(cudaFree(out_buffer));
return out_data_host;
}
Here is the resize kernel function
__global__ void gpuPreImageNet( float2 scale, unsigned char* input, int iWidth, float* output, int oWidth, int oHeight )
{
const int x = blockIdx.x * blockDim.x + threadIdx.x;
const int y = blockIdx.y * blockDim.y + threadIdx.y;
const int n = oWidth * oHeight;
int channel = 3;
if( x >= oWidth || y >= oHeight )
return;
const int dx = ((float)x * scale.x);
const int dy = ((float)y * scale.y);
const unsigned char* px = input + dy * iWidth * channel + dx * channel ;
const float3 bgr = make_float3(*(px + 0), *(px + 1), *(px + 2));
output[channel * y * oWidth + channel * x + 0] = bgr.x;
output[channel * y * oWidth + channel * x + 1] = bgr.y;
output[channel * y * oWidth + channel * x + 2] = bgr.z;
}
Most of the implementation is from https://github.com/soulsheng/ResizeNN/blob/master/resizeCUDA/resizeNN.cu
Any idea?
You may be observing an uninitialized-memory problem.
As I understand your code, the out_data_host allocation is too big:
new float[sizeof(float) * channel * outH * outW];
should be
new float[channel * outH * outW]
Also, out_buffer is uninitialized; add a cudaMemset after the cudaMalloc line.
To simplify your code: since you already use OpenCV to load the image, why not use OpenCV to resize it as well?
cv::resize // Host side method is probably better since you'll have less data copied through PCI-Express
// or
cv::cuda::resize
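The allocation point above comes down to element counts versus byte counts. A minimal sketch, assuming an interleaved float image; `BufferSizes` and `outputBufferSizes` are hypothetical names for illustration only. `new float[n]` (like `std::vector<float>(n)`) takes a number of elements, whereas cudaMalloc, cudaMemcpy and memcpy take a number of bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: the two different sizes needed for one output image.
struct BufferSizes {
    std::size_t elements; // pass to new float[...] or std::vector<float>
    std::size_t bytes;    // pass to cudaMalloc / cudaMemcpy / memcpy
};

inline BufferSizes outputBufferSizes(int channels, int outH, int outW)
{
    assert(channels > 0 && outH > 0 && outW > 0);
    BufferSizes s;
    s.elements = static_cast<std::size_t>(channels) * outH * outW;
    s.bytes = s.elements * sizeof(float); // sizeof(float) belongs only here
    return s;
}
```

Keeping the two values in separately named fields makes it harder to multiply by sizeof(float) twice, which is what the original `new float[sizeof(float) * ...]` does.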
It took me around two days to figure out a solution to this problem. Basically, I was building a GPU-based image preprocessing pipeline for my project. Here's the custom CUDA kernel.
For grayscale image resizing, change channel from 3 to 1 and it should work.
__global__ void resize_kernel( real* pIn, real* pOut, int widthIn, int heightIn, int widthOut, int heightOut)
{
int i = blockDim.y * blockIdx.y + threadIdx.y;
int j = blockDim.x * blockIdx.x + threadIdx.x;
int channel = 3;
if( i < heightOut && j < widthOut )
{
int iIn = i * heightIn / heightOut;
int jIn = j * widthIn / widthOut;
for(int c = 0; c < channel; c++)
pOut[ (i*widthOut + j)*channel + c ] = pIn[ (iIn*widthIn + jIn)*channel + c ];
}
}
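When debugging a kernel like the one above, it can help to compare its output against a CPU reference that uses the same index mapping. This is a minimal sketch in plain C++, not part of the original answer; `resizeNearestCPU` is a hypothetical name, and it assumes an interleaved float image (for example BGR with 3 channels).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Nearest-neighbour resize of an interleaved image on the CPU.
// Uses the same mapping as the kernel: inputIndex = outputIndex * sizeIn / sizeOut.
std::vector<float> resizeNearestCPU(const std::vector<float>& in,
                                    int widthIn, int heightIn,
                                    int widthOut, int heightOut,
                                    int channels)
{
    assert(in.size() == static_cast<std::size_t>(widthIn) * heightIn * channels);
    std::vector<float> out(static_cast<std::size_t>(widthOut) * heightOut * channels);

    for (int i = 0; i < heightOut; ++i) {
        const int iIn = i * heightIn / heightOut;     // source row
        for (int j = 0; j < widthOut; ++j) {
            const int jIn = j * widthIn / widthOut;   // source column
            for (int c = 0; c < channels; ++c) {
                out[(static_cast<std::size_t>(i) * widthOut + j) * channels + c] =
                    in[(static_cast<std::size_t>(iIn) * widthIn + jIn) * channels + c];
            }
        }
    }
    return out;
}
```

Copying the GPU result back to the host and comparing it element by element with this reference shows quickly whether a problem lies in the index mapping or in the surrounding allocation and copy code.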
I'm trying to do real-time pitch shifting of microphone input using Superpowered. I looked at the example, which works on a file, and tried to adapt it. I managed to change the sound, but it comes out very distorted, with interference. What am I doing wrong? Where can I find more information on Superpowered and time stretching?
static bool audioProcessing(void *clientdata,
float **buffers,
unsigned int inputChannels,
unsigned int outputChannels,
unsigned int numberOfSamples,
unsigned int samplerate,
uint64_t hostTime) {
__unsafe_unretained Superpowered *self = (__bridge Superpowered *)clientdata;
float tempBuffer[numberOfSamples * 2 + 16];
SuperpoweredInterleave(buffers[0], buffers[1], tempBuffer, numberOfSamples);
float *outputBuffer = tempBuffer;
SuperpoweredAudiobufferlistElement inputBuffer;
inputBuffer.samplePosition = 0;
inputBuffer.startSample = 0;
inputBuffer.samplesUsed = 0;
inputBuffer.endSample = self->timeStretcher->numberOfInputSamplesNeeded;
inputBuffer.buffers[0] = SuperpoweredAudiobufferPool::getBuffer(self->timeStretcher->numberOfInputSamplesNeeded * 8 + 64);
inputBuffer.buffers[1] = inputBuffer.buffers[2] = inputBuffer.buffers[3] = NULL;
memcpy((float *)inputBuffer.buffers[0], outputBuffer, numberOfSamples * 2 + 16);
self->timeStretcher->process(&inputBuffer, self->outputBuffers);
// Do we have some output?
if (self->outputBuffers->makeSlice(0, self->outputBuffers->sampleLength)) {
while (true) { // Iterate on every output slice.
// Get pointer to the output samples.
int sampleCount = 0;
float *timeStretchedAudio = (float *)self->outputBuffers->nextSliceItem(&sampleCount);
if (!timeStretchedAudio) break;
SuperpoweredDeInterleave(timeStretchedAudio, buffers[0], buffers[1], numberOfSamples);
};
// Clear the output buffer list.
self->outputBuffers->clear();
};
return true;
}
I did the following:
static bool audioProcessing(void *clientdata,
float **buffers,
unsigned int inputChannels,
unsigned int outputChannels,
unsigned int numberOfSamples,
unsigned int samplerate,
uint64_t hostTime) {
__unsafe_unretained Superpowered *self = (__bridge Superpowered *)clientdata;
SuperpoweredAudiobufferlistElement inputBuffer;
inputBuffer.startSample = 0;
inputBuffer.samplesUsed = 0;
inputBuffer.endSample = numberOfSamples;
inputBuffer.buffers[0] = SuperpoweredAudiobufferPool::getBuffer((unsigned int) (numberOfSamples * 8 + 64));
inputBuffer.buffers[1] = inputBuffer.buffers[2] = inputBuffer.buffers[3] = NULL;
SuperpoweredInterleave(buffers[0], buffers[1], (float *)inputBuffer.buffers[0], numberOfSamples);
self->timeStretcher->process(&inputBuffer, self->outputBuffers);
// Do we have some output?
if (self->outputBuffers->makeSlice(0, self->outputBuffers->sampleLength)) {
while (true) { // Iterate on every output slice.
// Get pointer to the output samples.
int numSamples = 0;
float *timeStretchedAudio = (float *)self->outputBuffers->nextSliceItem(&numSamples);
if (!timeStretchedAudio || *timeStretchedAudio == 0) {
break;
}
SuperpoweredDeInterleave(timeStretchedAudio, buffers[0], buffers[1], numSamples);
}
// Clear the output buffer list.
self->outputBuffers->clear();
}
return true;
}
This might not work correctly if the speed is changed as well, but I only wanted live pitch shifting. People can speak slower or faster themselves.
I am developing an app that plays a sine wave.
Using AudioQueueNewOutput to output mono sound works, but I have no idea how to produce stereo output.
I know that setting mChannelsPerFrame = 2 produces a wave in both the left and right channels.
I also want to know the order in which bytes are sent to the left and right channels. Does the first byte go to the left channel and the second byte to the right channel?
Code:
_audioFormat = new AudioStreamBasicDescription();
_audioFormat->mSampleRate = SAMPLE_RATE; // 44100
_audioFormat->mFormatID = kAudioFormatLinearPCM;
_audioFormat->mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
_audioFormat->mFramesPerPacket = 1;
_audioFormat->mChannelsPerFrame = NUM_CHANNELS; // 1
_audioFormat->mBitsPerChannel = BITS_PER_CHANNEL; // 16
_audioFormat->mBytesPerPacket = BYTES_PER_FRAME; // 2
_audioFormat->mBytesPerFrame = BYTES_PER_FRAME; // 2
and
_sineTableLength = _audioFormat.mSampleRate / SAMPLE_LIMIT_FACTOR; // 44100/100 = 441
_sineTable = new SInt16[_sineTableLength];
for(int i = 0; i < _sineTableLength; i++)
{
// Transfer values between -1.0 and 1.0 to integer values between -sample max and sample max
_sineTable[i] = (SInt16)(sin(i * 2 * M_PI / _sineTableLength) * 32767);
}
and
AudioQueueNewOutput (&_audioFormat,
playbackCallback,
(__bridge void *)(self),
nil,
nil,
0,
&_queueObject);
static void playbackCallback (void* inUserData,
AudioQueueRef inAudioQueue,
AudioQueueBufferRef bufferReference){
SInt16* sample = (SInt16*)bufferReference->mAudioData;
// bufferSize 1024
for(int i = 0; i < bufferSize; i += _audioFormat.mBytesPerFrame, sample++)
{
// set value for *sample
// 9ms sin wave and 4.5ms 0
...
}
...
AudioQueueEnqueueBuffer(...)
}
Several days later, I found the answer.
First: set up the AudioStreamBasicDescription for stereo (mChannelsPerFrame = 2).
Then: change bufferSize from 1024 to 2048.
And: change every SInt16 in SInt16* sample = (SInt16*)bufferReference->mAudioData; to SInt32, because doubling the channels doubles the bits per frame.
Last: within each sample, each 16-bit half holds the data for the left or the right channel; just fill it with whatever you want.
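To illustrate the byte order asked about, here is a minimal sketch of filling an interleaved 16-bit stereo buffer in plain C++, without any AudioQueue calls. `makeStereoSine` and `kBytesPerFrame` are hypothetical names. The sketch assumes packed, interleaved linear PCM with two channels, where each frame is 4 bytes: one 16-bit left sample followed by one 16-bit right sample. So the channels alternate per 16-bit sample, not per byte.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Two channels of 16-bit samples: 4 bytes per frame.
constexpr std::size_t kBytesPerFrame = 2 * sizeof(std::int16_t);

// Builds `frames` frames of interleaved stereo audio:
// samples[2 * n] is the left sample and samples[2 * n + 1] the right sample of frame n.
std::vector<std::int16_t> makeStereoSine(std::size_t frames, double sampleRate,
                                         double leftHz, double rightHz)
{
    assert(sampleRate > 0.0);
    const double kPi = 3.14159265358979323846;
    std::vector<std::int16_t> samples(frames * 2);
    for (std::size_t n = 0; n < frames; ++n) {
        const double t = static_cast<double>(n) / sampleRate;
        const double left  = std::sin(2.0 * kPi * leftHz * t);
        const double right = std::sin(2.0 * kPi * rightHz * t);
        samples[2 * n]     = static_cast<std::int16_t>(std::lround(left * 32767.0));
        samples[2 * n + 1] = static_cast<std::int16_t>(std::lround(right * 32767.0));
    }
    return samples;
}
```

Treating the buffer as 16-bit samples with two per frame is equivalent to the SInt32-per-frame view described above, but it keeps the left and right values separately addressable.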
An intermediate step of my current project requires conversion of opencv's cv::Mat to MTLTexture, the texture container of Metal. I need to store the Floats in the Mat as Floats in the texture; my project cannot quite afford the loss of precision.
This is my attempt at such a conversion.
- (id<MTLTexture>)texForMat:(cv::Mat)image context:(MBEContext *)context
{
id<MTLTexture> texture;
int width = image.cols;
int height = image.rows;
Float32 *rawData = (Float32 *)calloc(height * width * 4,sizeof(float));
int bytesPerPixel = 4;
int bytesPerRow = bytesPerPixel * width;
float r, g, b,a;
for(int i = 0; i < height; i++)
{
Float32* imageData = (Float32*)(image.data + image.step * i);
for(int j = 0; j < width; j++)
{
r = (Float32)(imageData[4 * j]);
g = (Float32)(imageData[4 * j + 1]);
b = (Float32)(imageData[4 * j + 2]);
a = (Float32)(imageData[4 * j + 3]);
rawData[image.step * (i) + (4 * j)] = r;
rawData[image.step * (i) + (4 * j + 1)] = g;
rawData[image.step * (i) + (4 * j + 2)] = b;
rawData[image.step * (i) + (4 * j + 3)] = a;
}
}
MTLTextureDescriptor *textureDescriptor = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatRGBA16Float
width:width
height:height
mipmapped:NO];
texture = [context.device newTextureWithDescriptor:textureDescriptor];
MTLRegion region = MTLRegionMake2D(0, 0, width, height);
[texture replaceRegion:region mipmapLevel:0 withBytes:rawData bytesPerRow:bytesPerRow];
free(rawData);
return texture;
}
But it doesn't seem to be working. It reads zeroes every time from the Mat, and throws up EXC_BAD_ACCESS. I need the MTLTexture in MTLPixelFormatRGBA16Float to keep the precision.
Thanks for considering this issue.
One problem here is you’re loading up rawData with Float32s but your texture is RGBA16Float, so the data will be corrupted (16Float is half the size of Float32). This shouldn’t cause your crash, but it’s an issue you’ll have to deal with.
Also as “chappjc” noted you’re using ‘image.step’ when writing your data out, but that buffer should be contiguous and not ever have a step that’s not just (width * bytesPerPixel).
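The point about the step can be sketched in plain C++, without OpenCV or Metal. `packRowsTight` and `tightBytesPerRow` are hypothetical names. The sketch assumes a source whose rows are a given number of bytes apart (cv::Mat::step is a byte count, so using it as an index into a float array probably walks past the end of rawData) and copies them into a tightly packed float buffer. It keeps the data as 32-bit floats; converting to half floats for an RGBA16Float texture would be a separate step that is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Bytes per row of a tightly packed float image (16 * width for RGBA, not 4 * width).
inline std::size_t tightBytesPerRow(int width, int channels)
{
    return static_cast<std::size_t>(width) * channels * sizeof(float);
}

// Copies `height` rows of float pixels from a source whose rows start
// `srcStepBytes` apart into a contiguous buffer with no padding between rows.
std::vector<float> packRowsTight(const std::uint8_t* src, std::size_t srcStepBytes,
                                 int width, int height, int channels)
{
    const std::size_t rowFloats = static_cast<std::size_t>(width) * channels;
    const std::size_t rowBytes = tightBytesPerRow(width, channels);
    assert(srcStepBytes >= rowBytes);

    std::vector<float> packed(rowFloats * height);
    for (int i = 0; i < height; ++i) {
        // Source rows advance by the byte step; destination rows by the float count.
        std::memcpy(packed.data() + i * rowFloats, src + i * srcStepBytes, rowBytes);
    }
    return packed;
}
```

The value returned by `tightBytesPerRow` is what the bytesPerRow argument would need to describe for a packed 32-bit float buffer like this one.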