I'm trying to detect the text in a scanned page and get its coordinates.
See the attached image for an example of a scanned page.
I need the vertical coordinates for splitting the page away from the useless parts, and then the coordinates of the text itself.
What kind of tools could I use to do the splitting and to detect the text coordinates?
Take a look at the Stroke Width Transform.
See also this SO answer.
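Stroke Width Transform is a general-purpose technique. As an alternative, if you happen to be on an Apple platform, Vision's VNDetectTextRectanglesRequest solves the same problem out of the box. A minimal sketch (the helper name detectTextRects is mine):

import Vision
import CoreGraphics

// Returns the bounding boxes of detected text regions, in pixel
// coordinates of the given scanned-page image.
func detectTextRects(in image: CGImage) throws -> [CGRect] {
    let request = VNDetectTextRectanglesRequest()
    request.reportCharacterBoxes = false   // whole words/lines are enough here
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
    let observations = request.results as? [VNTextObservation] ?? []
    // Vision boxes are normalized with a bottom-left origin; convert to pixels.
    return observations.map {
        VNImageRectForNormalizedRect($0.boundingBox, image.width, image.height)
    }
}

The minY/maxY of the returned rects then give you the vertical coordinates to split the page on.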
I've read Apple's official sample RealtimeNumberReader. It uses AVCaptureSession together with layerRectConverted(fromMetadataOutputRect:), a method specific to AVCaptureVideoPreviewLayer, to convert bounding-box coordinates to screen coordinates.
let rect = layer.layerRectConverted(fromMetadataOutputRect: box.applying(self.visionToAVFTransform))
Now I want to recognize text in an ARFrame's capturedImage and then display the bounding box on screen. Is that possible?
I know how to recognize text in a single image from the official tutorial; my problem is how to convert the normalized bounding-box coordinates to viewport coordinates.
Please help, and thank you very much!
Based on Banane42's answer, I worked out the theory behind ARKit and VNRecognizeTextRequest.
An ARFrame's capturedImage is wider than what the scene view actually shows. Check the picture below: I made a small app with an image view that displays the whole captured image, and the background image is the scene view's visible area.
The coordinate system of the scene view (and of the image) originates at the top-left corner, with the x-axis pointing right and the y-axis pointing down. But the boundingBox that a VNRequest returns originates at the bottom-left corner, with the x-axis pointing right and the y-axis pointing up.
If you use request.regionOfInterest, the ROI must be given in normalized coordinates with respect to the whole image, and the returned boundingBox is then in normalized coordinates with respect to the ROI box.
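Putting that together, here is a minimal sketch of the conversion for the simplest case: no regionOfInterest and a portrait interface orientation. The helper name viewRect(for:frame:viewportSize:) is mine, and the Y-flip may need adjusting for other orientations.

import ARKit
import Vision

func viewRect(for observation: VNRecognizedTextObservation,
              frame: ARFrame,
              viewportSize: CGSize) -> CGRect {
    // Vision: normalized coordinates, origin at the bottom-left.
    // Flip to the top-left origin that image coordinates use.
    var rect = observation.boundingBox
    rect.origin.y = 1 - rect.origin.y - rect.height
    // displayTransform maps normalized image coordinates to normalized
    // viewport coordinates, accounting for the cropped, rotated portion
    // of capturedImage that is actually visible on screen.
    let transform = frame.displayTransform(for: .portrait, viewportSize: viewportSize)
    rect = rect.applying(transform)
    // Scale from normalized viewport coordinates up to points.
    return rect.applying(CGAffineTransform(scaleX: viewportSize.width, y: viewportSize.height))
}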
Finally, I got my app working properly. All of this is very fiddly, so be careful!
Try looking at this Git repo. Having messed with it myself, I can say it is not the most performant, but it should give you a start.
I have an image with various asymmetric regions. Is it possible to place a button above each region?
The image will be something similar to this: https://cdn.dribbble.com/users/1557638/screenshots/4367307/proactive_d.png
After way too much research, I decided to go with the following solution for an image with multiple asymmetric clickable regions (a sketch follows the list):
Duplicated the image file, colored each region with a distinct flat color, and stored it as an asset (the color map).
Placed a simple tap gesture recognizer over the displayed image. When tapped, I get the coordinates relative to the displayed image.
Got the color of the color-map image at the tapped coordinates.
Compared it to a predefined enum of regions.
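A minimal sketch of the last three steps, assuming the color map decodes to 8-bit RGBA and has the same pixel size as the displayed image; the Region cases and their colors are made up:

import UIKit

enum Region {
    case kitchen, bedroom, none   // hypothetical regions
    init(r: UInt8, g: UInt8, b: UInt8) {
        switch (r, g, b) {
        case (255, 0, 0): self = .kitchen   // red in the color map
        case (0, 255, 0): self = .bedroom   // green in the color map
        default:          self = .none
        }
    }
}

func region(of mapImage: UIImage, at point: CGPoint) -> Region {
    guard let cgImage = mapImage.cgImage,
          let data = cgImage.dataProvider?.data,
          let bytes = CFDataGetBytePtr(data) else { return .none }
    let x = Int(point.x), y = Int(point.y)
    guard x >= 0, x < cgImage.width, y >= 0, y < cgImage.height else { return .none }
    // Assumes 8-bit RGBA; a robust version should inspect cgImage.bitmapInfo.
    let offset = y * cgImage.bytesPerRow + x * (cgImage.bitsPerPixel / 8)
    return Region(r: bytes[offset], g: bytes[offset + 1], b: bytes[offset + 2])
}

The tap handler just converts the touch point from view coordinates to image pixel coordinates (multiplying by the image-to-view scale factor) before calling region(of:at:).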
We are working with MKOverlayView; below is the expected functionality:
An image has to be overlaid on the map and tilted by a certain angle (bearing).
Issue: when the map is zoomed to the maximum level, one of the corners of the overlaid image gets truncated, but the complete image comes back when you zoom out a little.
Please find the attached screenshot for reference.
I'm also seeing overlay text clipped, and at any zoom level. What I noticed is that it clips at certain invisible vertical lines, which look like the boundaries of the actual map tiles.
What still works is the other overlays I have on the map; they don't get chopped.
This started to happen with iOS 10.
In the image, the building outlines (colored) are overlays that don't get clipped, but the text overlays (drawn using drawInRect) do get chopped. The texts are "Very Long text1 to see if it truncates", with text1 changed to text2 and so forth.
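The tile boundaries are the key clue: an overlay's draw callback (drawMapRect:zoomScale:inContext: on MKOverlayView, draw(_:zoomScale:in:) on MKOverlayRenderer) is called once per visible map tile, and everything you draw is clipped to that tile's rect. So a label has to be drawn by every tile its bounding rect intersects, not only the tile that contains its origin. A sketch under that assumption, with a hypothetical TextOverlay whose boundingMapRect covers the full rendered string:

import MapKit

class TextOverlay: NSObject, MKOverlay {
    let text: String
    let coordinate: CLLocationCoordinate2D
    let boundingMapRect: MKMapRect   // must enclose the whole rendered string
    init(text: String, coordinate: CLLocationCoordinate2D, boundingMapRect: MKMapRect) {
        self.text = text
        self.coordinate = coordinate
        self.boundingMapRect = boundingMapRect
    }
}

class TextOverlayRenderer: MKOverlayRenderer {
    override func draw(_ mapRect: MKMapRect, zoomScale: MKZoomScale, in context: CGContext) {
        // Called once per tile. Draw the full label in every tile it touches;
        // the context clips each pass to its own tile and the pieces line up.
        guard let overlay = overlay as? TextOverlay,
              mapRect.intersects(overlay.boundingMapRect) else { return }
        let drawRect = rect(for: overlay.boundingMapRect)
        UIGraphicsPushContext(context)
        let attributes: [NSAttributedString.Key: Any] = [
            .font: UIFont.systemFont(ofSize: 12 / zoomScale)   // constant on-screen size
        ]
        (overlay.text as NSString).draw(in: drawRect, withAttributes: attributes)
        UIGraphicsPopContext()
    }
}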
So here is a question that is sure to stump some people.
Here is my scenario: I want a user to take a picture of something, in this case just a black rectangle with white circles on it. I don't care about the size of the circles, but I want to know how many there are and where they are located relative to the photo. Then the user will enter the width and height of the area they just photographed, and I will be able to tell how far the circles are from each other.
Does anyone have any clue how I could do this?
I don't think you will get a straightforward answer, but below is my approach (a sketch follows the list):
Take the image and get its pixel data using CGBitmapContext (reference).
Search that data for white pixels (color value > 240/255).
Find the center of each white circle using some algorithm (reference).
Store those centers in an array, and later, when the user supplies the width, return the distances scaled relative to it.
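A rough sketch of those steps, using the 240 threshold above and a simple flood fill to group white pixels into blobs. The function name circleCenters is mine; it returns centers normalized to 0...1 so they can be scaled by the user-entered width and height:

import UIKit

func circleCenters(in image: UIImage) -> [CGPoint] {
    guard let cgImage = image.cgImage else { return [] }
    let width = cgImage.width, height = cgImage.height
    // Render into an 8-bit grayscale bitmap so each pixel is one byte.
    var pixels = [UInt8](repeating: 0, count: width * height)
    guard let context = CGContext(data: &pixels, width: width, height: height,
                                  bitsPerComponent: 8, bytesPerRow: width,
                                  space: CGColorSpaceCreateDeviceGray(),
                                  bitmapInfo: CGImageAlphaInfo.none.rawValue)
    else { return [] }
    context.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))

    var visited = [Bool](repeating: false, count: width * height)
    var centers: [CGPoint] = []
    for start in 0..<pixels.count where pixels[start] > 240 && !visited[start] {
        // Flood-fill one blob of white pixels, accumulating coordinates.
        var stack = [start]
        visited[start] = true
        var sumX = 0, sumY = 0, count = 0
        while let index = stack.popLast() {
            let x = index % width, y = index / width
            sumX += x; sumY += y; count += 1
            for (dx, dy) in [(1, 0), (-1, 0), (0, 1), (0, -1)] {
                let nx = x + dx, ny = y + dy
                guard nx >= 0, nx < width, ny >= 0, ny < height else { continue }
                let neighbor = ny * width + nx
                if !visited[neighbor] && pixels[neighbor] > 240 {
                    visited[neighbor] = true
                    stack.append(neighbor)
                }
            }
        }
        // The blob's centroid, normalized to 0...1 in both axes.
        centers.append(CGPoint(x: CGFloat(sumX) / CGFloat(count) / CGFloat(width),
                               y: CGFloat(sumY) / CGFloat(count) / CGFloat(height)))
    }
    return centers
}

Distances between circles then come from multiplying each normalized center by the entered width and height and taking the ordinary Euclidean distance.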
I'm showing an image using cv::imshow("binary1", binary1); and I want to put a marker on it to check pixel locations. How can I put a marker on the image at a particular row and column?
It's difficult to understand exactly what you want to do, but I wrote some code a while back that displays the RGB color of a pixel, along with its coordinates, in the title of the window. Move the mouse pointer over the image and you'll see it change.
It uses a Qt window, though. You can check cvImage.