I got the Flutter camera working with a preview and all that, but the quality of the recorded video is way too bad: I get about 1 MB for 10 seconds. The resolution is fine (1080 x 1440), but I think the bitrate is far too low somehow.
I looked into the CameraPlugin:
https://github.com/flutter/plugins/blob/master/packages/camera/android/src/main/java/io/flutter/plugins/camera/CameraPlugin.java
mediaRecorder.setVideoEncoder(MediaRecorder.VideoEncoder.H264);   // H.264 video codec
mediaRecorder.setVideoEncodingBitRate(1024 * 1000);                // ~1 Mbit/s video bitrate
mediaRecorder.setAudioSamplingRate(16000);                         // 16 kHz audio sampling rate
Is this the normal configuration and does it work for you guys?
I have a OnePlus 2, and the stock camera app records much better-looking video.
I'm not familiar with Flutter's MethodChannel yet, so I can't create my own CustomCameraPlugin and change the important values myself.
Maybe there is a whole different approach.
Let me know
Greetings Markus
Changing the value passed to mediaRecorder.setVideoEncodingBitRate from 1024 * 1000 to 3000000 (3 Mbit/s) worked for me and gave noticeably better quality.
https://github.com/flutter/plugins/blob/master/packages/camera/android/src/main/java/io/flutter/plugins/camera/CameraPlugin.java
I realized that when I use OpenCV to grab video from an RTSP URL (cv2.VideoCapture('rtsp://...')), I am actually getting every frame of the stream, not the current real-time frame.
Example: if a video is 30 fps and 10 seconds long, and I read the first frame and then wait 1 second before reading the next one, I get frame number 2 and not the real-time frame (which should be frame number 30 or 31).
I am worried about this because if my code takes a little longer to do the video processing (deep learning convolutions), the results will fall further and further behind instead of staying in real time.
Any ideas how I can always get the current frame when capturing from RTSP?
Thanks!
This is not about your code. Many IP cameras give you encoded output (H.265/H.264).
When you use VideoCapture(), the camera's output is decoded on the CPU. A delay like the one you mention, somewhere between 1 and 2 seconds, is normal.
What can be done to make it faster:
If you have GPU hardware, you can decode the data on it. This gives really good results (in my experience with recent NVIDIA GPUs you get roughly 25 milliseconds of delay). To achieve that in your code you need the following (see the sketch after this list):
CUDA installation
CUDA enabled OpenCV installation
VideoReader class of OpenCV
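If those pieces are in place, the GPU decoding path looks roughly like the sketch below (C++; cudacodec comes from the opencv_contrib modules and needs a CUDA-enabled build, and the RTSP URL is just a placeholder):

#include <string>
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudacodec.hpp>

int main() {
    // Placeholder RTSP URL; replace with your camera's address.
    const std::string url = "rtsp://user:pass@192.168.1.10/stream";

    // Decoding happens on the GPU (NVDEC), so frames arrive as GpuMat.
    cv::Ptr<cv::cudacodec::VideoReader> reader = cv::cudacodec::createVideoReader(url);

    cv::cuda::GpuMat gpuFrame;
    while (reader->nextFrame(gpuFrame)) {
        // gpuFrame stays in GPU memory; download() it only if you need CPU-side processing.
        // ... run your deep learning inference here ...
    }
    return 0;
}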
Alternatively, you can use VideoCapture() with the FFmpeg backend flag. FFmpeg has well-optimized decoders, so this is probably the fastest output you can get on the CPU alone, but it will not reduce the delay by much.
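For example, opening the stream with the FFmpeg backend explicitly looks like this (C++ shown; the Python equivalent flag is cv2.CAP_FFMPEG, and the URL is again a placeholder):

#include <opencv2/videoio.hpp>

int main() {
    cv::VideoCapture cap("rtsp://user:pass@192.168.1.10/stream", cv::CAP_FFMPEG);
    cv::Mat frame;
    while (cap.read(frame)) {
        // ... process the frame ...
    }
    return 0;
}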
I have been working on a hardware-accelerated H264 encoder implementation using VideoToolbox's VTCompressionSession for a while now, and a consistent problem has been the unreliable bitrate coming out of it. I have read many forum posts and looked through existing code and tried to follow suit, but the bitrate out of my encoder is almost always somewhere between 5% and 50% off the configured value, and on occasion I've seen huge errors, even 400% overshoot, where a single frame is twice the size of the given average bitrate.
My session is set up as follows (a condensed sketch of applying this configuration follows the list):
kVTCompressionPropertyKey_AverageBitRate = desired bitrate
kVTCompressionPropertyKey_DataRateLimits = [desired bitrate / 8, 1]; accounting for bits vs bytes
kVTCompressionPropertyKey_ExpectedFrameRate = framerate (30, 15, 5, or 1 fps)
kVTCompressionPropertyKey_MaxKeyFrameInterval = 1500
kVTCompressionPropertyKey_MaxKeyFrameIntervalDuration = 1500 / framerate
kVTCompressionPropertyKey_AllowFrameReordering = NO
kVTCompressionPropertyKey_ProfileLevel = kVTProfileLevel_H264_Main_AutoLevel
kVTCompressionPropertyKey_RealTime = YES
kVTCompressionPropertyKey_H264EntropyMode = kVTH264EntropyMode_CABAC
kVTCompressionPropertyKey_BaseLayerFrameRate = framerate / 2
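In code, applying that configuration as a single dictionary looks roughly like the sketch below (condensed to a subset of the keys; 'session', 'bitrate' and 'fps' stand in for my real variables):

#include <VideoToolbox/VideoToolbox.h>

// Condensed sketch: build a property dictionary (subset of the keys listed above)
// and apply it in one call. Error checking omitted.
static OSStatus ApplyEncoderConfig(VTCompressionSessionRef session, int32_t bitrate, int32_t fps) {
    CFNumberRef avgBitrate = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt32Type, &bitrate);
    CFNumberRef frameRate  = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt32Type, &fps);

    // Data-rate limit: [bytes per second, window in seconds], accounting for bits vs bytes.
    int64_t bytesPerSecond = bitrate / 8;
    double  windowSeconds  = 1.0;
    CFNumberRef bytesRef   = CFNumberCreate(kCFAllocatorDefault, kCFNumberSInt64Type, &bytesPerSecond);
    CFNumberRef windowRef  = CFNumberCreate(kCFAllocatorDefault, kCFNumberDoubleType, &windowSeconds);
    const void *limitValues[] = { bytesRef, windowRef };
    CFArrayRef dataRateLimits = CFArrayCreate(kCFAllocatorDefault, limitValues, 2, &kCFTypeArrayCallBacks);

    const void *keys[] = {
        kVTCompressionPropertyKey_AverageBitRate,
        kVTCompressionPropertyKey_DataRateLimits,
        kVTCompressionPropertyKey_ExpectedFrameRate,
        kVTCompressionPropertyKey_AllowFrameReordering,
        kVTCompressionPropertyKey_ProfileLevel,
        kVTCompressionPropertyKey_RealTime,
    };
    const void *values[] = {
        avgBitrate,
        dataRateLimits,
        frameRate,
        kCFBooleanFalse,
        kVTProfileLevel_H264_Main_AutoLevel,
        kCFBooleanTrue,
    };
    CFDictionaryRef props = CFDictionaryCreate(kCFAllocatorDefault, keys, values, 6,
                                               &kCFTypeDictionaryKeyCallBacks,
                                               &kCFTypeDictionaryValueCallBacks);

    OSStatus status = VTSessionSetProperties(session, props);

    CFRelease(props);
    CFRelease(dataRateLimits);
    CFRelease(bytesRef);
    CFRelease(windowRef);
    CFRelease(avgBitrate);
    CFRelease(frameRate);
    return status;
}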
And I adjust the average bitrate and datarate values throughout the session to try and compensate for the volatility (if it's too high, I reduce them a bit, if too low, I increase them, with restrictions on how high and low to go).
I create the session, apply the above configuration with VTSessionSetProperties as described, and feed frames into it like this:
VTCompressionSessionEncodeFrame(compressionSessionRef,
                                static_cast<CVImageBufferRef>(pixelBuffer),
                                CMTimeMake(capturetime, 1000),  // presentation timestamp, millisecond timescale
                                kCMTimeInvalid,                 // no explicit frame duration
                                frameProperties,
                                frameDetailsStruct,             // sourceFrameRefCon
                                &encodeInfoFlags);
So I'm supplying timing information as the API says to do.
Then I add up the output size of each frame and divide by the length of a periodic measurement window to determine the outgoing bitrate and its error from the desired value. This is where I see the significant volatility.
I'm looking for any help in getting the bitrate under control, as I'm not sure what to do at this point. Thank you!
I think you should check the frame timestamp you pass to VTCompressionSessionEncodeFrame; it seems to affect the bitrate. If you change the frame rate, change the frame timestamps accordingly.
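Something along these lines keeps the timestamps consistent with the actual frame rate (just a sketch; the millisecond timescale matches the CMTimeMake(capturetime, 1000) call in the question, and everything besides the VideoToolbox calls is a placeholder):

#include <VideoToolbox/VideoToolbox.h>

// Derive the presentation timestamp from a running frame index and the *current*
// frame rate, so the rate controller sees timing that matches reality.
static void EncodeWithConsistentTimestamps(VTCompressionSessionRef session,
                                           CVImageBufferRef pixelBuffer,
                                           int64_t frameIndex, int32_t fps) {
    CMTime pts      = CMTimeMake(frameIndex * 1000 / fps, 1000);  // ms timescale
    CMTime duration = CMTimeMake(1000 / fps, 1000);               // per-frame duration
    VTEncodeInfoFlags flags = 0;
    VTCompressionSessionEncodeFrame(session, pixelBuffer, pts, duration,
                                    NULL /*frameProperties*/, NULL /*sourceFrameRefCon*/, &flags);
}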
After spending over 10 hours compiling Tesseract against libc++ so it works with OpenCV, I'm having trouble getting any meaningful results. I'm trying to use it for digit recognition; the image data I'm passing in is a small square (50x50) image containing either one digit or no digit at all.
I've tried using both the eng and equ tessdata (from Google Code); the results are different, but neither reliably recognizes the digits. Using the eng data I get '4\n\n' or '\n\n' as a result most of the time (even when there's no digit in the image), with confidence anywhere from 1 to 99.
Using equ data I get '\n\n' with confidence 0-4.
I also tried binarizing the image, and the results are more or less the same; I don't think it's needed anyway, since the images are already filtered pretty well.
I'm assuming something is wrong, since these images are quite easy to recognize compared to even the simplest of the example images.
Here's the code:
Initialization:
_tess = new TessBaseAPI();
_tess->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], "eng");
_tess->SetVariable("tessedit_char_whitelist", "0123456789");
_tess->SetVariable("classify_bln_numeric_mode", "1");
Recognition:
char *text = _tess->TesseractRect(imageData, (int)bytes_per_pixel, (int)bytes_per_line, 0, 0, (int)imageSize.width, (int)imageSize.height);
I'm getting no errors. TESSDATA_PREFIX is set properly and I've tried different methods for recognition. imageData looks ok when inspected.
Here are some sample images:
http://imgur.com/a/Kg8ar
Should this work with the regular training data?
Any help is appreciated; it's my first time trying Tesseract out and I could have missed something.
EDIT:
I've found this:
_tess->SetPageSegMode(PSM_SINGLE_CHAR);
I assume it should be used in this situation; I tried it, but got the same results.
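For reference, the complete sequence I would expect to need with the single-character mode looks roughly like this (a sketch only; using SetImage/GetUTF8Text instead of TesseractRect is an assumption on my part, the rest mirrors the snippets above):

#include <tesseract/baseapi.h>

// Recognize a single digit (or nothing) in a small raw image buffer.
// Caller frees the returned string with delete[].
char *RecognizeDigit(const unsigned char *imageData, int width, int height,
                     int bytesPerPixel, int bytesPerLine, const char *dataPath) {
    tesseract::TessBaseAPI tess;
    tess.Init(dataPath, "eng");
    tess.SetVariable("tessedit_char_whitelist", "0123456789");
    tess.SetVariable("classify_bln_numeric_mode", "1");
    tess.SetPageSegMode(tesseract::PSM_SINGLE_CHAR);  // expect exactly one character

    tess.SetImage(imageData, width, height, bytesPerPixel, bytesPerLine);
    char *text = tess.GetUTF8Text();
    tess.End();
    return text;
}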
I think Tesseract is a bit of an overkill for this task. You would be better off with a simple neural network trained explicitly on your images. At my company we recently tried to use Tesseract on iOS for an OCR task (scanning utility bills with the camera), but it was too slow and inaccurate for our purposes (scanning took more than 30 seconds on an iPhone 4, at a tremendously low FPS). In the end I trained a neural network specifically for our target font, and that solution not only beat Tesseract (it could scan flawlessly even on an iPhone 3GS) but also a commercial ABBYY OCR engine we had been given a sample of.
This course's material would be a good start in machine learning.
This one keeps me awake:
I have an OS X audio application which has to react if the user changes the current sample rate of the device.
To do this I register a callback for both in- and output devices on 'kAudioDevicePropertyNominalSampleRate'.
So if one of the devices' sample rates gets changed, I get the callback and set the new sample rate on both devices with 'AudioObjectSetPropertyData', using 'kAudioDevicePropertyNominalSampleRate' as the selector.
The next steps were mentioned on the Apple mailing list and I followed them (a rough code sketch follows the list):
stop the input AudioUnit and the AUGraph which consists of a mixer and the output AudioUnit
uninitialize them both.
check for the node count, step over them and use AUGraphDisconnectNodeInput to disconnect the mixer from the output
now set the new sample rate on the output scope of the input unit
and on the in- and output scope on the mixer unit
reconnect the mixer node to the output unit
update the graph
init input and graph
start input and graph
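Roughly, in code, the sequence looks like this (a sketch with error checking omitted; the unit, node and variable names are placeholders for my real ones):

#include <AudioToolbox/AudioToolbox.h>
#include <AudioUnit/AudioUnit.h>

static void SwitchGraphSampleRate(AUGraph graph,
                                  AudioUnit inputUnit, AudioUnit mixerUnit,
                                  AUNode mixerNode, AUNode outputNode,
                                  Float64 newSampleRate) {
    // 1-2: stop and uninitialize the input unit and the graph
    AudioOutputUnitStop(inputUnit);
    AUGraphStop(graph);
    AudioUnitUninitialize(inputUnit);
    AUGraphUninitialize(graph);

    // 3: disconnect the mixer from the output unit (node iteration omitted here)
    AUGraphDisconnectNodeInput(graph, outputNode, 0);

    // 4-5: new sample rate on the input unit's output scope (bus 1 carries captured audio)
    //      and on both scopes of the mixer
    AudioUnitSetProperty(inputUnit, kAudioUnitProperty_SampleRate, kAudioUnitScope_Output, 1,
                         &newSampleRate, sizeof(newSampleRate));
    AudioUnitSetProperty(mixerUnit, kAudioUnitProperty_SampleRate, kAudioUnitScope_Input, 0,
                         &newSampleRate, sizeof(newSampleRate));
    AudioUnitSetProperty(mixerUnit, kAudioUnitProperty_SampleRate, kAudioUnitScope_Output, 0,
                         &newSampleRate, sizeof(newSampleRate));

    // 6-7: reconnect mixer -> output and update the graph
    AUGraphConnectNodeInput(graph, mixerNode, 0, outputNode, 0);
    Boolean updated = false;
    AUGraphUpdate(graph, &updated);

    // 8-9: re-initialize and restart input and graph
    AudioUnitInitialize(inputUnit);
    AUGraphInitialize(graph);
    AudioOutputUnitStart(inputUnit);
    AUGraphStart(graph);
}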
Render and Output callbacks start again but now the audio is distorted. I believe it's the input render callback which is responsible for the signal but I'm not sure.
What did I forget?
The sample rate doesn't affect the buffer size, as far as I know.
If I start my application at the other sample rate, everything is OK; it's the change itself that leads to the distorted signal.
I looked at the stream format (kAudioUnitProperty_StreamFormat) before and after; everything stays the same except the sample rate, which of course changes to the new value.
As I said, I think it's the input render callback that needs to be changed. Do I have to notify the callback that more samples are needed? I checked the callbacks and buffer sizes at 44.1 kHz and 48 kHz and nothing was different.
I wrote a small test application so if you want me to provide code, I can show you.
Edit: I recorded the distorted audio (a sine wave) and looked at it in Audacity.
What I found was that after every 495 samples the audio drops for another 17 samples.
I think you see where this is going: 495 samples + 17 samples = 512 samples, which is the buffer size of my devices.
But I still don't know what I can do with this finding.
I checked my input and output render procs and their access to the ring buffer (I'm using the fixed version of CARingBuffer).
Both store and fetch 512 frames, so nothing is missing there...
Got it!
After disconnecting the graph, it turns out to be necessary to tell both devices the new sample rate.
I had already done this earlier, when the property callback fired, but it seems it has to be done again at this later point.
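For anyone else running into this, pushing the new nominal sample rate to a device looks roughly like the sketch below (deviceID and newSampleRate are placeholders for the values from my callback):

#include <CoreAudio/CoreAudio.h>

// Set the device's nominal sample rate via the HAL.
static OSStatus SetDeviceNominalSampleRate(AudioObjectID deviceID, Float64 newSampleRate) {
    AudioObjectPropertyAddress address = {
        kAudioDevicePropertyNominalSampleRate,
        kAudioObjectPropertyScopeGlobal,
        kAudioObjectPropertyElementMaster
    };
    return AudioObjectSetPropertyData(deviceID, &address, 0, NULL,
                                      sizeof(newSampleRate), &newSampleRate);
}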
Please bear with me, I know that what I'm doing can sound strange, but I can guarantee there's a very good reason for that.
I took a movie with my camera, as AVI. I imported the movie into iMovie and then extracted the individual frames as PNG. Then I repacked these frames into a MOV using the following code:
movie, error = QTMovie.alloc().initToWritableFile_error_(out_path, None)
mt = QTMakeTime(v, scale)                      # duration of each frame: v units at the given timescale
attrib = {QTAddImageCodecType: "jpeg"}
for path in png_paths:
    image = NSImage.alloc().initWithContentsOfFile_(path)
    movie.addImage_forDuration_withAttributes_(image, mt, attrib)
movie.updateMovieFile()
The resulting mov works, but it looks like the frames are "nervous" and shaky when compared to the original avi, which appears smoother. The size of the two files is approximately the same, and both the export and repacking occurred at 30 fps. The pics also appear to be aligned, so it's not due to accidental shift of the frames.
My question is: knowing the file formats and the process I performed, what is the probable cause of this result? How can I fix it?
One textbook reason for "shaky" images is field-mode issues. Any chance you are working with interlaced material and got the field order messed up? That would cause exactly the results you describe...
As for how you might fix this with the API you're using (QTKit?), I'm at a loss, though, due to my lack of experience with it.