I'm a bachelor student and currently working on a final project in Optical Braille Recognition using a real-time camera.
I've successfully converted the image to HSV and extracted only the value channel to reduce the effect of ambient light, then performed binary thresholding, Canny edge detection, erosion, and dilation to isolate the Braille dots in the camera feed.
What I would like to ask is how to perform segmentation when the distance between the dots keeps changing as the camera moves nearer to or further from the Braille writing.
Any assistance would be appreciated.
Thank you
To do this, you would detect some sort of relative pair of coordinates that lets you estimate the "scale" of the braille writing in your image. This can be an identifying pair of points at either end of the writing, or even just some characteristic dots. With the scale known, you can transform the image to a uniform size, regardless of how far away the camera is.
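A minimal sketch of that idea in Python with OpenCV, assuming you have already detected two reference points (for example the outermost dots of a line) and know how far apart they should be at your working scale; the function name and parameters are illustrative:

    import cv2
    import numpy as np

    def normalize_scale(image, ref_a, ref_b, expected_distance_px):
        """Rescale the image so the two reference points end up a
        known distance apart, making the dot spacing uniform."""
        actual = np.linalg.norm(np.array(ref_a, float) - np.array(ref_b, float))
        if actual == 0:
            return image
        scale = expected_distance_px / actual
        return cv2.resize(image, None, fx=scale, fy=scale,
                          interpolation=cv2.INTER_LINEAR)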
There is no simple, general solution to your problem. If a human cannot immediately tell how the Braille characters are spaced, a simple algorithm will not manage it either.
Your best bet is to read literature on Braille text, talk with your prof, and have a blind person explain to you how they read Braille.
Other than that, you would have to find the baselines of the Braille text lines and see how they differ, then run a cvPerspectiveTransform to straighten out the image, so you can segment the dots without having to account for perspective.
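cvPerspectiveTransform is the old C API; with the Python bindings the equivalent is cv2.getPerspectiveTransform followed by cv2.warpPerspective. A hedged sketch, assuming you have already located four corner points of the text block (all coordinates below are made up):

    import cv2
    import numpy as np

    image = cv2.imread("page.jpg")

    # Hypothetical corners of the detected Braille block, ordered
    # top-left, top-right, bottom-right, bottom-left.
    src = np.float32([[120, 80], [520, 95], [530, 410], [110, 400]])
    # Target rectangle of the desired, perspective-free size.
    dst = np.float32([[0, 0], [400, 0], [400, 320], [0, 320]])

    M = cv2.getPerspectiveTransform(src, dst)
    straightened = cv2.warpPerspective(image, M, (400, 320))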
This challenge is very similar to the issues I've encountered in my barcode system. My answer is a generalized description of the method I use.
I'd start by dividing the image into a grid whose cells are large enough that a single character fits within one cell. That way, any character is guaranteed to fall within a 2x2 group of grid cells, no matter how the grid overlays the image.
Convert the image into a list of dots, detecting each dot locally from a small neighbourhood of pixels.
Assign each dot a grid cell number. This should be something cheap, like the x/y location divided by a 32-pixel cell size: cell = (y/32)*(width/32) + (x/32).
Keep a count of dots per grid cell, and once all the dots are identified, sort the dot table by grid number and build an index recording each cell's offset and element count in the table.
If the resolution varies, sample some cells with lots of dots to determine the spacing between dot pairs.
Look through the cells row by row, but examine each cell as part of a 2x2 cell group. This way, any dot in the cell being tested is guaranteed to be matched to its paired dot (if one exists). Thanks to the grid, dots only need to be matched against dots local to them, so while the image may contain thousands of dots, each individual dot only has to be tested against 1-10 candidates.
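A minimal sketch of that grid bookkeeping in Python, assuming a 32-pixel cell size and dots given as (x, y) tuples; image-border clamping is omitted for brevity:

    from collections import defaultdict

    CELL = 32  # grid cell size in pixels, as described above

    def build_grid(dots, width):
        """Bin each (x, y) dot into a numbered grid cell."""
        cols = width // CELL
        grid = defaultdict(list)
        for (x, y) in dots:
            grid[(y // CELL) * cols + (x // CELL)].append((x, y))
        return grid, cols

    def local_candidates(grid, cols, x, y):
        """Dots in the 2x2 cell group anchored at (x, y)'s cell;
        only these need to be tested as potential pair partners."""
        cx, cy = x // CELL, y // CELL
        out = []
        for dy in (0, 1):
            for dx in (0, 1):
                out.extend(grid.get((cy + dy) * cols + (cx + dx), []))
        return out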
Pairing dots will create duplicates, which can either be prevented while matching or purged later.
At this point you would need to match the dots to Braille. Horizontal pairs-of-pairs and vertical pairs-of-pairs should let you start lining up the Braille text.
Once the lines are found, the dot table would be rotated into the determined text alignment. The pairs would be brought into alignment, and then, from each pair's position, unmatched dots could be added by matching the pair's grid location against unpaired dots in the dot table.
I want multiple users to take photographs of a paper that has four dots at its four edges. While the user is taking the photograph, to make sure the paper is flat and not crumpled, I want an overlaid frame to turn green if the camera detects the four dots at their known relative distances, signifying that the paper is positioned correctly for the photograph.
I have zero knowledge of image processing. Let me know if this is even possible, considering the camera applications on both iOS and Android.
First, your question is not phrased very clearly, but I will try to give you an answer. You can use the language of your choice. I am assuming that only the paper is in the frame and that the background is uniform.
1) Threshold your image to detect the paper only.
2) Once you have a binary image of the paper you can extract its contour information. (In OpenCV use contours, hierarchy = cv2.findContours(thresholdimage, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE).)
3) Get the corner points of the contour. (In OpenCV you can use cv2.approxPolyDP to get a list of the points.)
From here you have multiple options: if you want to check whether the alignment of the paper is correct, use the positions of the points, or you can compute the distances between them.
To check the alignment of your rectangle you can compute the angle (arctangent) between adjacent edges, which should be 90° in the ideal case, and use the deviation as a measure.
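A rough sketch of steps 1-3 plus the angle check in Python with OpenCV; this checks the paper outline itself (checking the four printed dots instead would follow the same pattern with a dot detector), and the thresholds and 10-degree tolerance are guesses to tune:

    import cv2
    import numpy as np

    def paper_corners(frame):
        """Threshold the frame, find the outer paper contour, and
        return its four corner points (or None if not found)."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        _, thresh = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        # OpenCV 4.x signature; OpenCV 3.x returns an extra value first.
        contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return None
        paper = max(contours, key=cv2.contourArea)
        approx = cv2.approxPolyDP(paper, 0.02 * cv2.arcLength(paper, True), True)
        return approx.reshape(-1, 2) if len(approx) == 4 else None

    def is_aligned(corners, tol_deg=10):
        """Check that every corner angle is close to 90 degrees."""
        for i in range(4):
            a, b, c = corners[i - 1], corners[i], corners[(i + 1) % 4]
            v1, v2 = a - b, c - b
            cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
            angle = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
            if abs(angle - 90) > tol_deg:
                return False
        return True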
Thank You.
I have an input image as follows and wish to segment the parts into regions. I also want each segmented part to contain not just the pixels of the solid color but also the edge anti-aliasing between that region and the next.
Does there exist any filter or method to segment the image in this way? The important part is that the end result segmented part must contain the edge anti-aliasing between it and the next regions. A correct solution is shown in yellow.
In these two images I zoomed the pixels to be large so the edge anti-aliasing between region edges can be seen clearly.
An example output that I want for the yellow region is shown.
For a definition of "edge anti-aliasing" see https://markpospesel.wordpress.com/2012/03/30/efficient-edge-antialiasing/
I'm not sure what exactly you want. For example, would some pixels belong to two segments? If that is the case, then I'm relatively sure you have to do something on your own. Otherwise, the following might work:
Opening and Closing
Opening and closing are two morphological operations that will smooth region borders.
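For example, a minimal OpenCV sketch (the kernel size is an assumption):

    import cv2
    import numpy as np

    img = cv2.imread("regions.png", cv2.IMREAD_GRAYSCALE)
    kernel = np.ones((5, 5), np.uint8)  # structuring element; size is a guess

    # Opening: erosion then dilation, removes small bright specks.
    opened = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)
    # Closing: dilation then erosion, fills small dark holes.
    closed = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)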
Clustering
There are many clustering algorithms. They are what you want for non-semantic segmentation (for semantic segmentation, you might want to read my literature survey). One example is
P. F. Felzenszwalb and D. P. Huttenlocher, "Efficient Graph-Based Image Segmentation."
I would simply give those algorithms a try and see if one directly works.
Other clustering algorithms (a minimal k-means sketch follows this list):
K-means
DBSCAN
CLARANS
AGNES
DIANA
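As a starting point, here is a minimal k-means sketch using cv2.kmeans (K and the file name are assumptions). Note that plain k-means assigns every pixel to exactly one cluster, so the anti-aliased mixed pixels you care about would still need a soft-assignment step on top:

    import cv2
    import numpy as np

    img = cv2.imread("regions.png")
    pixels = img.reshape(-1, 3).astype(np.float32)

    K = 5  # assumed number of regions; tune for your image
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(pixels, K, None, criteria, 10,
                                    cv2.KMEANS_RANDOM_CENTERS)

    # Paint every pixel with its cluster center to visualize the segments.
    segmented = centers[labels.flatten()].astype(np.uint8).reshape(img.shape)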
I'm doing a DIP project. I want to count the total number of words in each paper using Image Processing.
The original image is:
I did some pre-processing and produced the image below:
My idea to count the total number of words in each paper is to detect the digits inside blobs.
So please guide me: how can I count the words in this image? What's your idea?
Thanks.
Using the digits inside blobs/circles is a good problem definition. I would recommend running a circle Hough transform, looking only for circles of a certain radius, and then counting the number of circles detected. You'll have to figure out what your radius is in pixels, but this might be a good starting point. Good luck.
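A minimal sketch with cv2.HoughCircles; the blur, parameters, and radius bounds are placeholders to tune against your image:

    import cv2

    img = cv2.imread("paper.png", cv2.IMREAD_GRAYSCALE)
    img = cv2.medianBlur(img, 5)  # the circle Hough is sensitive to noise

    # minRadius/maxRadius are guesses: measure your blob radius in pixels.
    circles = cv2.HoughCircles(img, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                               param1=100, param2=30,
                               minRadius=8, maxRadius=20)

    count = 0 if circles is None else circles.shape[1]
    print("word count estimate:", count)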
If all pages are somewhat cleanly separated with one definition per line, you could take a very simple approach of counting the filled lines. First detect the list on the page to ignore irrelevant markings (green box) - does not have to exactly detect the edge so long as the bounds are no bigger than the list.
Then look for horizontal lines of pixels with no marking on them, or no dark value greater than X darkness. This is illustrated below with the pink horizontal lines. Lastly count the filled lines (any discrete section of horizontal lines which is not empty) and you have your number of definitions.
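A minimal sketch of that counting idea, assuming the list region (the green box) has already been cropped and that a fixed darkness threshold is acceptable:

    import cv2
    import numpy as np

    img = cv2.imread("list_region.png", cv2.IMREAD_GRAYSCALE)
    binary = img < 128  # True wherever a pixel is "dark enough"

    row_has_ink = binary.any(axis=1)  # one flag per horizontal pixel line

    # Each discrete run of filled rows is one definition; count the
    # False -> True transitions (plus one if the first row has ink).
    runs = np.count_nonzero(row_has_ink[1:] & ~row_has_ink[:-1])
    runs += 1 if row_has_ink[0] else 0
    print("definitions:", runs)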
First, I've been learning image processing, neural networks, etc. by myself for just a couple of weeks, so I'm really new and really far from being a pro. And sorry for my bad English.
There's an image/photo of my drawing. I want to get the coordinates of each object/shape (black dot) and the number next to it; the number indicates the sequence number of the dot.
How do I get this? How do I detect the dots: shape recognition for the dots, handwriting recognition for the numbers, then segmentation to get the positions? Or should I use template matching? But every dot has a slightly different shape because it's hand-drawn. Should I use a neural network? In an NN, the input neurons usually each hold one pixel to recognize a character, right? Can I feed a picture of a character or a drawn dot into each neuron to recognize my whole picture?
I'm very new, so I really need your advice; correct me if I'm wrong! Please tell me what I must learn, what I must do, and what I must use.
Thank you very much. :'D
This is a difficult problem which can't be solved by a quick solution.
Here is how I would approach it:
Get a better picture. Your image is very noisy and is taken in low light with high ISO. Use a better camera and better lighting conditions so you can get the background to be as white as possible and the dots as black as possible. Try to maximize the contrast.
Threshold the image so that all the background is white and the dots and numbers are black. Maybe you could apply some erosion and/or dilation to help connect the dark edges together.
Detect the rectangle somehow and set your work area to be inside it (crop the rest of the image so that you are left with only the area inside the rectangle). You could do this by detecting the contours in the image; the contour with the largest area is the rectangle (because it's the largest object in the image). Of course, this is not the only way. See this: OpenCV find contours
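A minimal sketch of that cropping step in Python with OpenCV, assuming the rectangle really is the largest contour (the Otsu threshold is a guess):

    import cv2

    img = cv2.imread("drawing.jpg", cv2.IMREAD_GRAYSCALE)
    _, thresh = cv2.threshold(img, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    rect = max(contours, key=cv2.contourArea)  # largest object = rectangle
    x, y, w, h = cv2.boundingRect(rect)
    work_area = img[y:y + h, x:x + w]  # crop to inside the rectangle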
Once you are left with only the dots, circles and numbers, you need to find a way to detect them and discriminate between them. You could again find all contours (or maybe you've found them all in the previous step). You need to figure out a way to tell whether a certain contour is a circle, a filled circle (dot) or a number. This is a problem in its own right. Maybe you could count the white/black pixels in the contour's bounding box: dots have more black pixels than circles and numbers. You also need to do something about numbers that connect with dots (like the number 5 in your image).
Once you know what is a dot, circle or number you could use an OCR library (Tesseract or any other OCR lib) to try and recognize the numbers. You could also use a neural network library (maybe trained with the MNIST dataset) to recognize the digits. A good one would be a convolutional neural network similar to LeNet-5.
As you can see, this is a problem that requires many different steps to solve, and many different components are involved. The steps I suggested might not be the best, but with some work I think it can be solved.
I have 55 000 image files (in both JPG and TIFF format) which are pictures from a book.
The structure of each page is this:
some text
--- (horizontal line) ---
a number
some text
--- (horizontal line) ---
another number
some text
There can be from zero to 4 horizontal lines on any given page.
I need to find what the number is, just below the horizontal line.
BUT, numbers strictly follow each other, starting at one on page one, so in order to find the number, I don't need to read it: I could just detect the presence of horizontal lines, which should be both easier and safer than trying to OCR the page to detect the numbers.
The algorithm would be, basically:
for each image
count horizontal lines
print image name, number of horizontal lines
next image
The question is: what would be the best image library/language to do the "count horizontal lines" part?
Probably the easiest way to detect your lines is using the Hough transform in OpenCV (which has wrappers for many languages).
The OpenCV Hough transform will detect all lines in the image and return their angles and start/stop coordinates. You should keep only the ones whose angles are close to horizontal and that are of adequate length.
O'Reilly's Learning OpenCV explains in detail the function's input and output (p.156).
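For example, a hedged sketch with the probabilistic variant cv2.HoughLinesP (thresholds and lengths are assumptions; it can also return several segments for one physical line, so you may need to merge detections with similar y-coordinates):

    import cv2
    import numpy as np

    img = cv2.imread("page.jpg", cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(img, 50, 150)

    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=100,
                            minLineLength=img.shape[1] // 2, maxLineGap=10)

    count = 0
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            angle = abs(np.degrees(np.arctan2(y2 - y1, x2 - x1)))
            if angle < 2 or angle > 178:  # keep near-horizontal lines only
                count += 1
    print("horizontal lines:", count)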
If you have good contrast, try running connected components and analyzing the result. It can be an alternative to finding lines through Hough, and it covers the cases where your structural elements are a bit curved or where a line algorithm picks up lines you don't want it to pick up.
Connected components is a super-fast, two-raster-scan algorithm that will give you a mask with all your connected elements marked with different labels and accounted for. You can discard anything short (in terms of aspect ratio). Overall, this can be more general and faster, but probably a bit more involved, than running a Hough transform. The Hough transform, on the other hand, is more tolerant of contrast artifacts and even accidental gaps in the lines.
OpenCV has the function findContours(), which finds the components for you.
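OpenCV also has cv2.connectedComponentsWithStats, which returns per-component bounding boxes directly; a minimal sketch of the aspect-ratio filter described above (the ratio threshold is an assumption):

    import cv2

    img = cv2.imread("page.jpg", cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(img, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)

    lines = 0
    for i in range(1, n):  # label 0 is the background
        w = stats[i, cv2.CC_STAT_WIDTH]
        h = stats[i, cv2.CC_STAT_HEIGHT]
        if h > 0 and w / h > 20:  # long and flat: likely a horizontal rule
            lines += 1
    print("horizontal lines:", lines)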
You might want to try John Resig's "OCR and Neural Nets in JavaScript".