KLT Pyramid BoofCV - boofcv

I am working on an Android application that will use the KLT tracking algorithm. I have downloaded the Android sample provided on BoofCV's website and read through the code. However, I need the tracker to run in the background on a separate thread, without the camera preview, while some kind of user interface is shown in the foreground.
Your help is highly appreciated.

You can make the camera preview as small as 2x2 pixels, which makes it effectively invisible while you still receive image frames in onPreviewFrame(). That is how it is done in a BoofCV example application I found.

Related

How to detect text in a photo

I am researching the best way to detect text in a photo using open source libraries.
I think the standard way is as follows (note: steps 1 - 4 all use OpenCV):
1) detect outline of document
2) transform document so it's flat and cropped, using said outline
3) Make the background of document white, using a filter
4) Feed resulting image to Tesseract
Is this the optimum process, or is there a better way, or better tools?
Also, what happens if the photo doesn't have a document outline (in which case steps 1 and 2 may be redundant)?
Is there any way to automatically detect document orientation (i.e. portrait / landscape)?
I think your process is fine. I've used a similar process for an Android project.
I think the only way to find out whether a document is portrait or landscape is to reason about the side lengths of the bounding box of your outline.
I don't think there's a fully automatic way to detect the outline, but you could look for the outermost contour that can be approximated by a 4-segment polyline (all doable in OpenCV). To get this you'll have to work with the contour hierarchy and contour approximation (see cv2.approxPolyDP).
This is how I would go for automatic outline detection. As I said, the rest of your algorithm seems just fine to me.
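The side-length reasoning mentioned above can be sketched roughly as follows. This is only an illustration: the function names `side_lengths` and `orientation` are made up for this example, and it assumes the four corners of the outline are already known and ordered top-left, top-right, bottom-right, bottom-left. It only tells you the shape of the page, not which way the text runs.

```python
import math


def side_lengths(corners):
    """Return (width, height) of a quadrilateral.

    Assumes corners are (x, y) tuples ordered
    top-left, top-right, bottom-right, bottom-left.
    """
    tl, tr, br, bl = corners
    # Average opposite sides so mild perspective distortion matters less.
    width = (math.dist(tl, tr) + math.dist(bl, br)) / 2.0
    height = (math.dist(tl, bl) + math.dist(tr, br)) / 2.0
    return width, height


def orientation(corners):
    """Classify the outline as 'landscape' or 'portrait' by its side lengths."""
    width, height = side_lengths(corners)
    return "landscape" if width > height else "portrait"


# Hypothetical outline of a page that is taller than it is wide.
print(orientation([(0, 0), (200, 0), (200, 300), (0, 300)]))
```

In practice the corners would come from the detected outline, for example the 4-point polygon produced by the contour approximation step.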
PS. I'll leave my Android project GitHub link. I don't know if it will be useful to you, but there I specify the outline by dragging some handles, then transform the image and feed it to Tesseract, using Java and OpenCV. Yes, it's a very bad idea to do that on the main thread of an Android app, and yes, the app is not finished. I just wanted to experiment with OCR, so I didn't care much about performance and usability, since it was meant for study rather than real use.
Look up the stroke width transform (SWT).
What this does is detect edges which have more or less the same width with respect to their opposite edge. So things like drainpipes (which can be eliminated at a later pass) but also the majority of text. Whilst conceptually it's similar to a distance transform, the published method uses rather ad hoc normal projection methods and Canny edge detection.

Is it possible to use Vuforia without a camera?

Is it possible to use Vuforia without a camera for image tracking?
Basically I would like a function I could call with an image as an input parameter and get the coordinates of an image target as the result. Does that exist?
It is unfortunately not possible. I've been looking for such an option myself several times while working on a Moodstocks (image recognition SDK) / Vuforia mashup (see these 2 blog posts if you are interested in it), but the Vuforia SDK prevents the use of any other source than the camera.
I guess the main reason for this is that camera management is handled entirely inside the Vuforia SDK, probably to make it easier to use, since managing the camera ourselves is at best a boring task (lines and lines of code to repeat in each project...), at worst a huge pain in the ass (especially on Android, where some devices don't behave as expected).
By the way, it looks to me like the Vuforia SDK is not the best solution for your use case: it is mainly an augmented-reality SDK, focussed on real-time tracking, which implies working with a camera stream... so using it to do "simple" image recognition looks like real overkill!

Unity3D on iOS, inspecting the device camera image in Obj-C

I have a Unity/iOS app that captures the user's photo and displays it in the 3D environment. Now I'd like to leverage CIFaceFeature to find eye positions, which requires accessing the native (Objective-C) layer. My flow looks like:
Unity -> WebCamTexture (encode and send image to native -- this is SLOW)
Obj-C -> CIFaceFeature (find eye coords)
Unity -> Display eye positions
I've got a working prototype, but it's slow because I'm capturing the image in Unity (WebCamTexture) and then sending it to Obj-C to do the FaceFeature detection. It seems like there should be a way to simply ask my Obj-C class to "inspect the active camera". This would have to be much, much faster than encoding and passing an image.
So my question, in a nutshell:
Can I query in Obj-C 'is there a camera currently capturing?'
If so, how do I 'snapshot' the image from that currently running session?
Thanks!
You can access the camera's preview capture stream by changing CameraCapture.mm in Unity.
I suggest that you have a look at an existing plugin called Camera Capture for an example of how additional camera I/O functionality can be added to the capture session / "capture pipeline".
To set you off in the right direction, have a look at the function initCapture in CameraCapture.mm:
- (bool)initCapture:(AVCaptureDevice*)device width:(int)w height:(int)h fps:(float)fps
Here you will be able to add to the capture session.
And then you should have a look at the code sample provided by Apple on Facial Recognition :
https://developer.apple.com/library/ios/samplecode/SquareCam/Introduction/Intro.html
Cheers
Unity 3D allows execution of native code. In the scripting reference, look for native plugins. In this way you can display a native iOS view (with the camera view, possibly hidden depending on your requirements) and run Objective C code. Then return the results of eye detection to Unity if you need it in a 3D view.

How to make a screenshot of OpenGl ES on top of the live preview camera in iOS (Augmented Reality app)?

I am a complete beginner in Objective-C and iOS programming. I spent a month working out how to show a 3D model using OpenGL ES (version 1.1) on top of the live camera preview using AVFoundation. I am building a kind of augmented reality application for iPad. I process the input frames and show a 3D object overlaid on the camera preview in real time. That part went fine because there are so many sites and tutorials about these things (thanks to this website as well).
Now I want to capture the whole screen (the model with the camera preview as the background) as an image and show it on the next screen. I found a really good demonstration here, http://cocoacoderblog.com/2011/03/30/screenshots-a-legal-way-to-get-screenshots/. It does everything I want to do. But, as I said, I am a beginner and can't follow the whole project without a detailed explanation, so I have been stuck for a while because I don't know how to implement this.
Does anybody know of a good tutorial or any other source on this topic, or have a suggestion for what I should learn in order to do this screen capture? It would help me a lot in moving on.
Thank you in advance.
I'm currently attempting to solve this same problem to allow a user to take a screenshot of an Augmented Reality app. (We use Qualcomm's AR SDK plugged into Unity 3D to make our AR apps, which saved me from ever having to learn how to programmatically render OpenGL models)
For my solution I am first looking at implementing the second answer found here: How to take a screenshot programmatically
Barring that I will have to re-engineer the "Combined Screenshots" method found in CocoaCoder's Screenshots app.
I'll check back in when I figure out which one works better.
Here are 3 very helpful links to capture screenshot:
OpenGL ES View Snapshot
How to capture video frames from the camera as images using AV Foundation
How do I take a screenshot of my app that contains both UIKit and Camera elements
Enjoy

Extracting slides from video lectures using OpenCV

I would like to extract out all the slides from a video lecture, using OpenCV. Here is an example of a lecture: http://www.youtube.com/watch?v=-hxOpz9c0bY.
What approaches would you recommend? So far, I've tried:
Comparing the change in grayscale intensity from frame to frame. This can have problems when an object in the foreground moves around. For example, in this lecture, there's a hand that moves around: http://www.youtube.com/watch?v=mNzu42FrlHo#t=07m00s.
Using SURF features and doing comparisons frame by frame. This approach seems kind of slow.
Does anyone have other ideas?
Most of this work has most likely already been done by the video encoder. You just need to extract the key-frames and check how well the frames between them are compressed.
It should also be fairly easy to distinguish still images. You can save a lot of time by examining just the key-frames. Slides are likely to have high contrast, solid shapes and a solid background. A lecture hall has blurry shapes and low contrast.
What you need is a scene change detection. After that, you'll have to classify scenes as "lecture hall" or "presentation". As for the problem with hands - you could use background subtraction with an adaptive background (just make sure you mask the foreground... you don't want the foreground to become a part of the background).
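A minimal sketch of the adaptive background idea above, in plain Python on grayscale frames stored as lists of rows. The function names and the `alpha`, `threshold` and `min_fraction` values are assumptions made for illustration, not tuned settings; a real implementation would run on video frames with an image library.

```python
def update_background(background, frame, alpha=0.05, threshold=30):
    """Update a running-average background model in place.

    background: list of rows of floats (the current model).
    frame: list of rows of grayscale values with the same shape.
    Returns a mask (list of rows of bools) marking foreground pixels.
    """
    mask = []
    for bg_row, frame_row in zip(background, frame):
        mask_row = []
        for x, pixel in enumerate(frame_row):
            is_foreground = abs(pixel - bg_row[x]) > threshold
            mask_row.append(is_foreground)
            if not is_foreground:
                # Only non-foreground pixels adapt, so a moving hand
                # does not become part of the background.
                bg_row[x] = (1 - alpha) * bg_row[x] + alpha * pixel
        mask.append(mask_row)
    return mask


def foreground_fraction(mask):
    """Fraction of pixels flagged as foreground."""
    total = sum(len(row) for row in mask)
    return sum(1 for row in mask for flag in row if flag) / total


def is_scene_change(mask, min_fraction=0.5):
    """Treat a frame as a scene change when most pixels differ from the model."""
    return foreground_fraction(mask) >= min_fraction
```

A small foreground fraction would then correspond to something like a hand moving over the slide, while a large one suggests a new slide or a cut to the lecture hall, at which point the background model would be reset to the new frame.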
You could try edge detection and look for a rectangular object, i.e. the slide (above a certain area threshold). You could further reduce false positives by looking for some text within the rectangle.
There are several reasons to extract slides/frames from a video presentation, especially in the case of education or conference related videos. It allows you to access the study notes without watching the whole video.
I have faced this issue several times, so I decided to create a solution for it myself using Python. I have made the code open source; you can set up the tool and run it in a few simple steps.
Refer to this for a YouTube video tutorial. Steps to use the tool:
1) Clone this project: video2pdfslides
2) Set up your environment by running "pip install -r requirements.txt"
3) Copy your video path
4) Run "python video2pdfslides.py <video_path>"
5) Boom! The PDF slides will be available in the output folder. Make notes and enjoy!
