Does anyone know a source for the Google bounding box dataset image files? The annotations are available on the site, along with the details needed to find the YouTube videos and extract the appropriate frames, but I'm not too keen to download 240k videos...
I'm starting to use Tesseract; my goal is to be able to read basketball game images and pick out the scores and the game time. I have attached a sample image. Tesseract was able to pick up some of it; this is what it detected for the attached image:
‘\ semis
1ST 50.9
For some images, it's not able to pick up anything. I went through the training section, but it talks about fonts and such, which I don't know anything about. Different games may have a slightly different scorebox. Does anyone know how I can train Tesseract to pick up the box score data from the screen, or is there a different solution out there?
thanks
[attached image: testimage]
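One approach that often works better than retraining Tesseract on new fonts is to crop the scorebox, enlarge and binarize it, and restrict recognition to the characters a scoreboard can contain. Below is a rough Python sketch using OpenCV and pytesseract; the crop coordinates and the character whitelist are assumptions you would adapt to your broadcast overlay.

    import cv2
    import pytesseract

    # Load the broadcast frame and crop the scorebox region.
    # The crop coordinates below are placeholders for your overlay's position.
    frame = cv2.imread("testimage.png")
    scorebox = frame[20:80, 40:400]

    # Upscale and binarize: Tesseract handles large, high-contrast,
    # dark-on-light text far better than a raw TV graphic.
    gray = cv2.cvtColor(scorebox, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
    # Use THRESH_BINARY_INV instead if the scorebox text is light on dark.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Treat the crop as a single text line and whitelist scoreboard characters.
    config = "--psm 7 -c tessedit_char_whitelist=0123456789:.STNDRH"
    print(pytesseract.image_to_string(binary, config=config))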
I want to be able to create a panoramic photo app or something that will be able to stitch multiple photos together (much like Google Photo Sphere), but before I start I want to get a bit more information on how it is done.
Is it done using the UIImagePickerController framework?
Are there any other useful APIs or anything else out there I can use?
Can somebody give me a brief overview of how this works?
There is no native API with a stitching algorithm. You should dig into the third-party OpenCV library and check its Stitcher documentation.
The basic steps of a stitching algorithm (a sketch of these steps follows the list):
Detect keypoints in each input image (e.g. Harris corners) and extract invariant descriptors around them (e.g. SIFT)
Match the descriptors between images
Using RANSAC, estimate the homography matrix and apply the transformation
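For illustration, here is a minimal Python/OpenCV sketch of those three steps for two overlapping photos (file names are placeholders, and the final overlay is deliberately naive):

    import cv2
    import numpy as np

    left = cv2.imread("left.jpg")
    right = cv2.imread("right.jpg")

    # 1. Detect keypoints and extract invariant descriptors (SIFT here).
    sift = cv2.SIFT_create()
    kp_l, des_l = sift.detectAndCompute(left, None)
    kp_r, des_r = sift.detectAndCompute(right, None)

    # 2. Match descriptors between the images and keep the good matches.
    matches = cv2.BFMatcher().knnMatch(des_r, des_l, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    # 3. Estimate the homography (right -> left frame) with RANSAC
    #    and apply the transformation.
    src = np.float32([kp_r[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_l[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    h, w = left.shape[:2]
    panorama = cv2.warpPerspective(right, H, (w + right.shape[1], h))
    panorama[0:h, 0:w] = left  # naive overlay; real stitchers blend the seam
    cv2.imwrite("panorama.jpg", panorama)

In practice you would use OpenCV's Stitcher class (cv2.Stitcher_create in Python, cv::Stitcher in the C++ API you would call from Objective-C++), which runs this whole pipeline and adds exposure compensation and seam blending.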
Given a scanned image containing graphics and text, how can I extract only the images from that picture? Can you mention any image processing algorithms?
You could do connected component analysis, filtering out everything that does not look like a character bounding box. An example paper is "Robust Text Detection from Binarized Document Images" (https://www.researchgate.net/publication/220929376_Robust_Text_Detection_from_Binarized_Document_Images), but there are a lot of approaches. Whether you can get away with something simple depends on your exact needs.
There is a lot more complex stuff available, too. One example: Fast and robust text detection in images and video frames (http://ucassdl.cn/publication/ivc.pdf).
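As a rough illustration of the connected-component approach, here is a Python/OpenCV sketch that masks out character-sized components so that what remains is mostly the graphics; the size thresholds are made-up values you would tune for your scan resolution.

    import cv2
    import numpy as np

    # Load the scanned page and binarize it (content becomes white on black).
    page = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(page, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Connected component analysis with bounding-box statistics.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)

    graphics_mask = np.zeros_like(binary)
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        # Heuristic: character components are small; large components are
        # more likely parts of figures or graphs. The 50 px limit is a guess.
        looks_like_text = w < 50 and h < 50
        if not looks_like_text:
            graphics_mask[labels == i] = 255

    # Keep only the non-text (graphics) pixels of the original page.
    graphics_only = cv2.bitwise_and(page, page, mask=graphics_mask)
    cv2.imwrite("graphics_only.png", graphics_only)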
I have photos/scans of documents in a few tens of known formats. Every document contains some known attributes (date/time, names, and a list of items).
Can you please suggest which apps/libs to start with (in (Objective-)C/C++)? Can OpenCV do that? What about OCR? Layout recognition for OCR?
Thanks!
P.S. Please suggest how to rephrase my post
P.P.S. I have found some promising tools (with examples for iOS): https://code.google.com/p/tesseract-ocr/ and https://github.com/robmathews/OCR-iOS-Example
To detect where the text is on the page, I would recommend using OpenCV, then sending the regions of text to Tesseract.
Find text (sketch after the list):
Erode Image
Find Contours
Get bounding boxes of contours
Those bounding boxes should contain either text or a logo/picture.
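A minimal Python sketch of that pipeline is below (the kernel size and the size filter are assumptions to tune per document format; the same OpenCV calls exist in the C++ API for iOS). Note that on a scan with dark text on white paper, "eroding" the text into blobs amounts to dilating the inverted binary image.

    import cv2
    import pytesseract

    # Load the document scan and binarize it (text becomes white on black).
    page = cv2.imread("document.jpg", cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(page, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Merge characters on a line into solid blobs with a wide rectangular kernel.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
    blobs = cv2.dilate(binary, kernel, iterations=1)

    # Find contours of the blobs and take their bounding boxes.
    contours, _ = cv2.findContours(blobs, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        if w < 20 or h < 10:  # skip tiny specks (thresholds are guesses)
            continue
        # Crop the region from the original page and send it to Tesseract.
        region = page[y:y + h, x:x + w]
        text = pytesseract.image_to_string(region, config="--psm 7")
        print((x, y, w, h), text.strip())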
Imagine a huge 360° panoramic shot from a security camera. I need to find people in this image. Suppose I already have a good classifier that can tell whether a picture is a picture of a single human being.
What I don't understand is how to apply this classifier to the panoramic shot, which can contain multiple people. Should I apply it to all possible regions of the image? Or is there some way to search for "interesting points" and feed only the regions around those points to my classifier?
Which keywords should I google / which algorithms should I read about to find ways of searching for regions of an image that may (or may not) contain information for the subsequent classification?
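For context, the brute-force version of "all possible regions" is a sliding window over an image pyramid: evaluate the classifier on every fixed-size window at several scales and then merge overlapping hits (non-maximum suppression). A rough Python sketch, where classify(patch) is a hypothetical stand-in for the existing person classifier:

    import cv2

    def sliding_window_detections(image, classify, win=(64, 128), step=32, scale=1.5):
        """Run a fixed-size classifier over every window of an image pyramid.

        classify(patch) is assumed to return a confidence score in [0, 1];
        it stands in for whatever single-person classifier is already available.
        """
        detections = []
        factor = 1.0
        level = image.copy()
        while level.shape[0] >= win[1] and level.shape[1] >= win[0]:
            for y in range(0, level.shape[0] - win[1] + 1, step):
                for x in range(0, level.shape[1] - win[0] + 1, step):
                    patch = level[y:y + win[1], x:x + win[0]]
                    score = classify(patch)
                    if score > 0.5:  # arbitrary confidence threshold
                        # Map the window back to full-resolution coordinates.
                        detections.append((int(x * factor), int(y * factor),
                                           int(win[0] * factor),
                                           int(win[1] * factor), score))
            # Shrink the image so larger people fit the fixed-size window.
            factor *= scale
            level = cv2.resize(level, None, fx=1 / scale, fy=1 / scale)
        return detections

Region-proposal methods (the "interesting points" idea in the question) replace this exhaustive scan with a much smaller set of candidate windows fed to the same classifier.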