I’m using Tesseract to recognise text in some video streams. I need help improving the results, and I'm also open to looking at other image recognition libraries. The streams have different elements in them, so recognition has to be designed differently for each one.
The current streams come from Twitch. One of the issues is that Twitch sometimes delivers the stream at different qualities: I get 720p, 480p and 360p. What I need to extract is the winning team and the score.
The main issue at the moment is that Tesseract cannot recognise the characters or font in an image from a 360p stream. Here is a sample image...
http://tinypic.com/view.php?pic=9hm4vp&s=8#.Vi95xxDhCSM
And here are some more 360p images in a drive...
http://1drv.ms/1M1O75J
So yeah, that's my issue: mainly how to recognise the text well on a 360p image :) I have no idea of the best method or libraries, so any help would be great :)
I have a rtsp stream from a pretty good camera (my mobile phone).
I am getting the stream using opencv:
cv2.VideoCapture(get_camera_stream_url(camera))
However, the image quality I get is way below what my mobile phone camera can deliver. I understand that the RTSP protocol may lower the resolution, but still, the image quality is not good enough for OCR.
Although I have a VIDEO stream, the object I am recording is a static one. So all frames from the video should be more or less the same, except for noise or lighting issues.
I was wondering if it is possible to take a 10-second video with several frames and combine them into a SINGLE frame with better sharpness and less noise.
Is it viable? How?
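One simple way to do this for a static scene is temporal averaging: random sensor noise tends to cancel out when many frames are averaged. The sketch below uses only NumPy and assumes the frames are already aligned (no camera shake) and all have the same shape; `average_frames` is a hypothetical helper name. Note that averaging reduces noise but does not recover detail that the stream's compression has already removed.

```python
import numpy as np


def average_frames(frames):
    """Average equally sized, aligned frames of a static scene into one frame."""
    # Accumulate in float so uint8 pixel values do not overflow.
    stack = np.stack([f.astype(np.float64) for f in frames])
    mean = stack.mean(axis=0)
    # Round and clip back to the usual 8-bit image range.
    return np.clip(np.rint(mean), 0, 255).astype(np.uint8)
```

The frames could come from repeated `read()` calls on the `cv2.VideoCapture` object above. If the phone moves slightly between frames, they would need to be registered to each other first, otherwise the average comes out blurred.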
Does anyone know how to reproduce the new scanning feature in Notes in iOS 11?
Is AVFoundation used for the camera?
How is the camera detecting the shape of the paper/document/card?
How do they place the overlay over in real time?
How does the camera know when to take the photo?
What's that animated overlay and how can we achieve this?
Does anyone know how to reproduce this?
Not exactly :P
Is AVFoundation used for the camera? Yes
How is the camera detecting the shape of the paper/document/card?
They are using the Vision Framework to do rectangle detection.
It's stated in this WWDC session by one of the demonstrators
How do they place the overlay over in real time?
You should check out the above video for this, as he talks about doing something similar in one of the demos.
How does the camera know when to take the photo?
I'm not familiar with this app but it's surely triggered in the capture session, no?
What's that animated overlay and how can we achieve this?
Not sure about this but I'd imagine it's some kind of CALayer with animation
Is Tesseract framework used for the image afterwards?
Isn't Tesseract OCR for text?
If you're looking for handwriting recognition, you might want to look for an MNIST model.
Use Apple’s rectangle detection SDK, which provides an easy-to-use API that can identify rectangles in still images or video sequences in near-realtime. The algorithm works very well in simple scenes with a single prominent rectangle in a clean background, but is less accurate in more complicated scenes, such as capturing small receipts or business cards in cluttered backgrounds, which are essential use-cases for our scanning feature.
An image processor that identifies notable features (such as faces and barcodes) in a still image or video.
https://developer.apple.com/documentation/coreimage/cidetector
For a project I'm working on, I'm trying to stream video to an iPhone through its headphone jack. My estimated bitrate is about 200kbps (if I'm wrong about this, please ignore that).
I'd like to squeeze as much performance out of this bitrate as possible and sound is not important for me, only video. My understanding is that to stream a real-time video I will need to encode it with some codec on-the-fly and send compressed frames to the iPhone for it to decode and render. Based on my research, it seems that H.265 is one of the most space-efficient codecs available, so I'm considering using that.
Assuming my basic understanding of live streaming is correct, how would I estimate the FPS I could achieve for a given resolution using the H.265 codec?
The best solution I can think of is to take a video file, encode it with H.265 and trim it to 1 minute of length to see how large the file is. The issue I see with this approach is that I think my calculations would include some overhead from the video container format (AVI, MKV, etc) and from the audio channels that I don't care about.
I'm trying to stream video to an iPhone through its headphone jack.
Good luck with that. Headphone jack is audio only.
My estimated bitrate is about 200kbps
At what resolution? 320x240?
I'd like to squeeze as much performance out of this bitrate as possible and sound is not important for me, only video.
Then, drop the sound streams altogether. Really though, 200kbit isn't enough for video of any reasonable size or quality.
Assuming my basic understanding of live streaming is correct, how would I estimate the FPS I could achieve for a given resolution using the H.265 codec?
Nobody knows, because you've told us almost nothing about what's in this video. The bandwidth required for the video is a product of many factors, such as:
Resolution
Desired Quality
Color Space
Visual complexity of the scene
Movement and scene changes
Tweaks and encoding parameters (fast start? low latency?)
You're going to have to decide what sort of quality you're willing to accept, and decide subjectively what the balance between that quality and frame rate is. (Remember too that if there isn't much going on, you basically get frames for free since they take very little bandwidth. Experiment.)
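To get a feel for the numbers, you can work out how many bits each pixel gets at a given resolution and frame rate. The sketch below is plain arithmetic. The `target_bpp` value you plug in is an assumption you have to pick yourself from test encodes, since the bits per pixel a codec needs for acceptable quality depends on all the factors listed above.

```python
def bits_per_pixel(bitrate_bps, width, height, fps):
    """Average number of encoded bits available per pixel per frame."""
    return bitrate_bps / (width * height * fps)


def max_fps(bitrate_bps, width, height, target_bpp):
    """Frame rate that fits the bitrate if each pixel needs target_bpp bits."""
    return bitrate_bps / (width * height * target_bpp)


# 200 kbit/s at 320x240 and 15 fps leaves roughly 0.17 bits per pixel.
print(round(bits_per_pixel(200_000, 320, 240, 15), 3))
# If 0.1 bits per pixel turned out to be acceptable, about 26 fps would fit.
print(round(max_fps(200_000, 320, 240, 0.1), 1))
```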
The best solution I can think of is to take a video file, encode it with H.265 and trim it to 1 minute of length to see how large the file is.
Take many videos, typical of what you'll be dealing with, and figure it out from there.
The issue I see with this approach is that I think my calculations would include some overhead from the video container format (AVI, MKV, etc) and from the audio channels that I don't care about.
Your video stream won't have a container at all? Not even TS? You can use FFmpeg to dump the raw stream data for you.
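As a sketch of that measurement: the helper below only builds an FFmpeg command line that drops audio (`-an`), encodes with `libx265` and writes a raw HEVC elementary stream (`-f hevc`), and a second helper turns the resulting file size into an average bitrate. It assumes your FFmpeg build includes libx265. The function names and file names are made up for the example.

```python
def ffmpeg_raw_hevc_args(src, dst, seconds=60):
    """Command line for a video-only, container-free H.265 test encode."""
    return [
        "ffmpeg", "-i", src,
        "-t", str(seconds),   # keep only the first `seconds` of output
        "-an",                # no audio streams
        "-c:v", "libx265",    # H.265 encoder (must be compiled into FFmpeg)
        "-f", "hevc",         # raw HEVC bitstream, no container overhead
        dst,
    ]


def average_bitrate_bps(size_bytes, seconds):
    """Average bitrate implied by a file of size_bytes lasting `seconds`."""
    return size_bytes * 8 / seconds
```

You could pass the argument list to `subprocess.run`, read the output size with `os.path.getsize`, and repeat over several typical clips. For example, a 1,500,000-byte file for 60 seconds works out to 200,000 bit/s.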
As many of you know, Tesseract does character recognition in still photos or images. I'm using Xcode for my iOS app and I've run into this problem: how can I use Tesseract to scan the camera live preview? An app that does this is the Word Lens app, which does frame-by-frame live recognition and translation of the text being previewed by the camera. I'm trying to do this live character recognition without the translation part. What is the best approach? How can I do a real-time scan of the camera preview frame by frame using Tesseract OCR? Thanks.
I have tested it and the performance is too low. The camera outputs eight pictures per second, but the OCR needs about 2 seconds to process one.
See: A (quasi-) real-time video processing on iOS
tesseract-ios
and How can I make tesseract on iOS faster
Maybe we need to use OpenCV.
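When recognition takes about 2 seconds and the camera delivers 8 frames per second, one common workaround is to run OCR on a background thread and only ever process the most recent frame, dropping the ones that arrived in the meantime. The idea is language-agnostic; below is a small sketch in Python rather than iOS code. `LatestFrameSlot`, `ocr_worker` and the `recognize` callback are hypothetical names, and `recognize` stands in for whatever OCR call you use.

```python
import threading


class LatestFrameSlot:
    """Holds only the newest frame; older unprocessed frames are dropped."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self._closed = False

    def put(self, frame):
        with self._cond:
            self._frame = frame  # overwrite any frame not yet consumed
            self._cond.notify()

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def get(self):
        """Block until a frame is available; return None once closed and empty."""
        with self._cond:
            while self._frame is None and not self._closed:
                self._cond.wait()
            frame, self._frame = self._frame, None
            return frame


def ocr_worker(slot, recognize, results):
    """Run the slow recognize() call on whatever frame is newest."""
    while True:
        frame = slot.get()
        if frame is None:
            break
        results.append(recognize(frame))
```

The camera callback would call `put()` for every frame, while a single thread runs `ocr_worker`. The results lag behind the preview, but the preview itself never stalls.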
Alternatively, you can use another free product that does OCR in the camera preview: ABBYY Real-Time Recognition OCR.
Disclaimer: I work for ABBYY.
I am trying to do some image tracking by capturing images from a webcam and comparing them with a reference image. The problem I face is that two images of the exact same spot differ in their bitmaps. I am using OpenCV. I need to know a way to capture images so that this kind of jitter is avoided.
Thanks in advance.
Well, I would say that you can't.
Two images will never be the same, due to illumination changes, and thousands of other effects (including electronic noise).
What you want to do is find a way to even out those differences before comparing, for example by applying some kind of Gaussian filter.
http://mmlab.disi.unitn.it/wiki/index.php/Mixture_of_Gaussians_using_OpenCV
There are also some good links in this post:
Natural feature tracking with openCV- evaluating the options