Removing a tick mark made using Pen in a scanned pdf - image-processing

I have a PDF file with multiple-choice questions in which the answers are marked with a pen. I would like to remove only the "tick" marks without otherwise altering the sheet much, and to automate this for the entire PDF. My input file is a PDF; for the sake of illustration I have given an image below.
Note: I use GNU/Linux.

The main difference that I can see between the checkmark and the rest of the image is the thickness of the black line. Thus we can replace the thicker black line with a sample of the background of your image.
Here is a way to do this using Python/OpenCV/Numpy:
import cv2
import numpy as np
#load the image
img = cv2.imread("pen_mark.png")
#Threshold the image to a black and white mask (per colour channel) with the text as white and the background as black.
black_and_white = (img < 128).astype(np.uint8) * 255
#erode the thin lines, leaving only the thicker pen-drawn checkmark.
checkmark = cv2.morphologyEx(black_and_white, cv2.MORPH_ERODE, np.ones((5,5), np.uint8))
#dilate the remaining checkmark back to slightly larger than its original size
checkmark = cv2.morphologyEx(checkmark, cv2.MORPH_DILATE, np.ones((7,7), np.uint8))
#switch the left and right side of the original image (roll along the column axis) in order to get a sample of the
#original speckled background overlaying the position of the checkmark.
rolled_img = np.roll(img, shift=img.shape[1]//2, axis=1)
#replace the pixels of the original checkmark with the background
img[checkmark.nonzero()] = rolled_img[checkmark.nonzero()]
#save the new image
cv2.imwrite("result.png", img)
Note that this probably won't be a reliable solution for all of your PDFs. I assume the thickness of the pen mark will change. There will probably also be cases where the pen mark overlaps other lines and you'll end up with blank spots. Another problem is the difficulty in setting the background so that it blends nicely with its surroundings.
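To see why erosion separates the pen mark from the print, here is a minimal pure-Python sketch on a toy binary grid. The `erode` helper and the 7x9 "page" are made up for illustration, and pixels at the grid border are simply dropped, which is a simplification of what OpenCV does at image borders.

```python
def erode(grid, k):
    """Binary erosion with a k x k square window (k odd).

    A pixel survives only if every pixel in its window is set.
    Border pixels are always cleared (a simplification).
    """
    h, w, r = len(grid), len(grid[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            if all(grid[y + dy][x + dx]
                   for dy in range(-r, r + 1)
                   for dx in range(-r, r + 1)):
                out[y][x] = 1
    return out

# Hypothetical page: a 1-pixel-wide "printed" stroke and a 3x3 "pen" blob.
page = [[0] * 9 for _ in range(7)]
for x in range(9):
    page[1][x] = 1            # thin printed stroke on row 1
for y in range(3, 6):
    for x in range(4, 7):
        page[y][x] = 1        # thick pen stroke

survivors = erode(page, 3)
```

The thin stroke vanishes completely, while the centre of the thick blob survives; dilating `survivors` afterwards would grow the blob back into a mask for the pen mark. The window size has to sit between the print thickness and the pen thickness, which is why a single fixed kernel will not suit every scan.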

Related

Remove color cast using libvips

I have sRGB images with color casts. To remove them manually I usually use Photoshop's Levels adjustment. Photoshop also has automatic tools for this: Auto Contrast or, even better, Auto Tone, which also takes shadows, midtones and highlights into account.
When I remove the cast manually, I adjust each of the RGB channels individually so that the darkest pixels are set to pure black and the lightest to pure white, and then redistribute all other values (spreading the histogram). This is a simple approach but gives good results for my images.
In my node.js app I'm using sharp for image processing, which uses libvips as its processing engine. I tried to remove the cast with .normalize(), but this command works on all channels together and not on each of the RGB channels individually, so it doesn't work for me.
I also asked this question on the sharp project page. I tested lovell's suggestion to try hist_local, but the results are not usable for me.
Now I would like to find out how this could be done using native libvips. I've played around with the nip2 GUI and different commands but could not figure out how it could be achieved:
Histogram > Equalise Histogram > Global => Picture looks oversaturated
Image > Levels > Scale to 0 - 255 => Channels are not all spreading from 0 - 255 (I don't understand exactly what this command does.)
Thanks for every hint!
Addition
Here is an example with pictures from Photoshop to show what I want.
The source image is a picture of a frame from a film negative.
Image before processing
Step 1: invert the image
Image after inversion
Step 2: use Auto Tone in Photoshop (it works the same way as my description above of manually removing the color cast)
Image after Auto Tone
This last picture is ok for me.
nip2 has a menu item for this.
Load your image and mark a region on it containing the area you'd like to be neutral. It can be any lightness, it doesn't need to be white.
Use File / Open to get the file dialog and you should see the image loaded in your workspace as a thumbnail.
Doubleclick on the thumbnail to open an image view window.
In the view window, zoom and pan to the right spot. The user guide (press F1) has a section on image navigation.
Hold down CTRL and click and drag down and right to mark a rectangular region.
Back in the main window, click Toolkits / Tasks / Capture / White balance. You should see something like:
You can drag and resize your region to change the neutral point. Use the colour picker to set what white means. You can make other whites with (for example) Colour / New / Colour from CCT and link them together.
Click Colour / New / Colour from CCT to make a colour picker from CCT (correlated colour temperature) -- the temperature in Kelvin of that white.
Set it to something interesting, like 4800 for warm white.
Click on the formula for A5.white to edit it, and enter the cell of your CCT widget (A7 in this case).
Now you can drag the region to adjust the pixels to set the neutral from, and drag the CCT slider to set the temperature.
It can be annoying to find things in the toolkit menu. There's a thing for searching toolkits: in the main window, click View / Toolkit browser. You can enter something like "white" and it'll show related toolkit entries.
Here's another answer, but using pyvips and responding to the previous comments. I didn't want to delete the first answer as it still seemed useful.
This version finds the image histogram, searches for thresholds which will select 0.5% and 99.5% of pixels in each image band, then rescales the image so that those pixel values become 0 and 255.
import sys
import pyvips
# trim off this percentage of pixels from the top and bottom
trim_percent = 0.5
def percent(hist, percentage):
    """From a histogram, find the threshold below which lie
    percentage percent of pixels."""
    # normalised cumulative histogram
    norm = hist.hist_cum().hist_norm()
    # column and row profile over percentage
    c, r = (norm > norm.width * percentage / 100).profile()
    return r.avg()
image = pyvips.Image.new_from_file(sys.argv[1])
# photographic negative
image = image.invert()
# find image histogram, split to set of separate bands
bands = image.hist_find().bandsplit()
# for each band, the low and high thresholds
low = [percent(band, trim_percent) for band in bands]
high = [percent(band, 100 - trim_percent) for band in bands]
# rescale image
scale = [255.0 / (h - l) for h, l in zip(high, low)]
image = (image - low) * scale
image.write_to_file(sys.argv[2])
It seems to give roughly similar results to the PS button. If I run:
$ ./autolevel.py ~/pics/before.jpg x.jpg
I see:
In the meantime I've found the Simplest Color Balance algorithm, which describes exactly this problem with color casts; C source code is available for it as well.
It is exactly the same solution as the one John describes in his second answer, but as a small piece of C code.
I'm now trying to use it as C/C++ addon with N-API under node.js.
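For readers who just want the idea without libvips, here is a rough pure-Python sketch of the per-channel clip-and-stretch described above. The function names and the tiny pixel list are made up for illustration, and a real implementation would work from a histogram rather than sorting every pixel.

```python
def stretch_channel(values, trim_percent=0.5):
    """Clip the darkest and lightest trim_percent of values, then rescale to 0..255."""
    ordered = sorted(values)
    n = len(ordered)
    cut = int(n * trim_percent / 100)       # number of pixels to ignore at each end
    low, high = ordered[cut], ordered[n - 1 - cut]
    if high == low:                          # flat channel: nothing to stretch
        return list(values)
    out = []
    for v in values:
        scaled = (v - low) * 255.0 / (high - low)
        out.append(int(round(min(255.0, max(0.0, scaled)))))
    return out

def balance(pixels, trim_percent=0.5):
    """Stretch each channel of a list of (r, g, b) tuples independently."""
    channels = [stretch_channel(c, trim_percent) for c in zip(*pixels)]
    return list(zip(*channels))

# Hypothetical image with a cast: the three channels have different black and white points.
pixels = [(50, 100, 20), (150, 200, 120), (100, 150, 70)]
result = balance(pixels, trim_percent=0)
```

Because every channel is stretched on its own, the darkest pixel ends up pure black and the lightest pure white regardless of the cast, which is the behaviour .normalize() was missing.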

Parameter to isolate frames with colored lines

I'm writing code that should detect frames in a video that have colored lines. I'm new to OpenCV and would like to know if I should evaluate saturation, entropy, RGB intensity, etc. The lines, as shown in the pictures, come in every color and density, even black and white, but they are all the same color within a given frame. Any advice?
Regular frame:
Example 1:
Example 2:
You can use something like this to get the mean Saturation and see that it is lower for your greyscale image and higher for your colour ones:
#!/usr/bin/env python3
import cv2
# Open image as 3-channel BGR (an alpha channel would make the HSV conversion fail)
im = cv2.imread('a.png', cv2.IMREAD_COLOR)
# Convert to HSV
hsv = cv2.cvtColor(im, cv2.COLOR_BGR2HSV)
# Get mean Saturation - I use index "1" because Hue is index "0" and Value is index "2"
meanSat = hsv[...,1].mean()
Results
first image (greyish): meanSat = 78
second image (bluish): meanSat = 162
third image (reddish): meanSat = 151
If it is time-critical, I guess you could just calculate for a small extracted patch since the red/blue lines are all over the image anyway.
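As a rough sketch of turning that measurement into a yes/no decision on a small patch, here is a version using only the standard library. Note that colorsys reports saturation on a 0..1 scale rather than OpenCV's 0..255, and the 0.45 threshold is just an assumption placed between the values measured above (about 78/255 for the grey frame and 151/255 or more for the coloured ones); you would need to tune it on your own video.

```python
import colorsys

def mean_saturation(pixels):
    """Mean HSV saturation (0..1) of an iterable of (r, g, b) tuples in 0..255."""
    sats = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] for r, g, b in pixels]
    return sum(sats) / len(sats)

def has_colored_lines(pixels, threshold=0.45):
    """Hypothetical classifier: flag a patch whose mean saturation exceeds the threshold."""
    return mean_saturation(pixels) > threshold

# Made-up patches standing in for a small crop of each frame.
grey_patch = [(120, 120, 120)] * 4
red_patch = [(200, 30, 30)] * 4
```

Here `has_colored_lines(grey_patch)` is false and `has_colored_lines(red_patch)` is true. With OpenCV you would pass a slice such as `hsv[y0:y1, x0:x1, 1]` to `.mean()` instead of looping in Python.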

How to add transparent border OpenCV copymakeborder

I have 26 PNG files, each with an image of a letter of the alphabet. They've all been fully cropped to the letter shape, with the result that when I insert them into an image, letters with tails all 'sit on the line'.
Each letter is in black, with a transparent background. Each PNG has different dimensions, because of the differing letter shapes.
I thought I'd remediate this by adding a transparent border of a different size depending on the source file, to create a common datum for all the letters, so that 'a' for example would have some transparent space at the bottom.
I've coded up the calculation for each letter, but I have two issues:
1) Even before applying the operation, I can't seem to read the file in and write it to a new unchanged file in OpenCV. The transparency in the image is replaced with black.
2) While I can add a colour border, I can't seem to add a transparent border.
Original Image:
Read in, and written out:
Apparently with a blue border, but maximum transparency:
I have a feeling that if I can sort out the first problem, the second might fall in line. Here is my code:
img = cv2.imread(file)
img_with_border = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=[-255,0,0,255])
#img_with_border = img
cv2.imwrite(newfile, img_with_border, [cv2.IMWRITE_JPEG_QUALITY, 100])
I'd appreciate some help on what is going on here with transparency. Is OpenCV the right tool to use?
Thanks,
Jeff.
To load a PNG image with 4 channels in OpenCV, use im = cv2.imread(file, cv2.IMREAD_UNCHANGED). You will obtain a BGRA image.
To change the alpha value, you have to change the fourth channel of the image. This means that to create your transparent border you have to pass a value (B, G, R, 0) and not [-255, 0, 0, 255]. (What is that -255 by the way ?). B, G and R can be 0, it doesn't matter.
Also, make sure you write to a PNG image to keep the transparency. You seem to be writing your result as JPEG.
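To illustrate what the padded result should look like, here is a small sketch that does the padding with NumPy's np.pad instead of cv2.copyMakeBorder (a swapped-in technique, chosen so the example runs without any image files). The helper name and the tiny test "letter" are made up; in practice the array would come from cv2.imread(file, cv2.IMREAD_UNCHANGED) and be written back out as a PNG.

```python
import numpy as np

def add_transparent_border(bgra, top, bottom, left, right):
    """Pad an H x W x 4 BGRA array with fully transparent pixels (all channels 0)."""
    if bgra.ndim != 3 or bgra.shape[2] != 4:
        raise ValueError("expected a 4-channel BGRA image")
    # Pad rows and columns only; leave the channel axis untouched.
    return np.pad(bgra, ((top, bottom), (left, right), (0, 0)),
                  mode="constant", constant_values=0)

# Hypothetical 3x2 opaque black "letter": B = G = R = 0, alpha = 255.
letter = np.zeros((3, 2, 4), dtype=np.uint8)
letter[..., 3] = 255

padded = add_transparent_border(letter, top=0, bottom=2, left=1, right=1)
```

The added pixels have alpha 0, so they are invisible, while the original letter pixels keep alpha 255. Passing a 4-element border value with alpha 0 to copyMakeBorder on a BGRA image should give the same array.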

How to remove watermark from TIFFs to improve OCR

I have a bunch of uncompressed bitonal TIF document images. All of them have a watermark in the middle. When I run them through OCR, the text that overlaps with the watermark does not get recognized. I am trying to see if I can apply some type of cleanup to remove those watermarks to be able to recognize the missing text.
Again, the images are black and white, but when you look at the watermark it appears grey since it has a pattern of black and white pixels that makes the letters in the watermark less "dense" than regular text. At the same time, the watermark letters are very big, much bigger than the regular text.
An example of a somewhat similar image is this (except this one is color and the watermark characters in my case are a lot thicker and bigger; my watermarks are also a lot shorter: only 3 to 4 letters long)
It seems that there might be some sort of clean-up filter that would be similar to removing large black borders from an image, except borders are usually "denser" than a watermark so they appear "more black".
I have 3 tools at my disposal: GIMP, ImageMagick and IrfanView. Can you recommend any specific features of any subset of these tools that might help me?
Playing with contrast etc did not help, but I found a different way. As stated above, the regular text is a lot "denser" than the watermark text meaning that a regular black pixel has more surrounding black pixels than a watermark black pixel. So I devised a simple window-based filtering and thresholding algorithm.
Here's how I did it in Matlab, using a 5X5 window:
im=imread('imageWithWmark.tif');
imInv = ~im;
nr=size(imInv,1);
nc=size(imInv,2);
d = 2; % for 5X5 window
counts = zeros(nr,nc);
for rr = d+1 : nr-d-1
    for cc = d+1 : nc-d-1
        counts(rr,cc) = nnz(imInv(rr-d:rr+d,cc-d:cc+d));
    end
end
thresh=10; % 10 out of 25 -- the larger the thresh the thinner the resulting letters are
imThresh = (counts>=thresh) & imInv;
imwrite(~imThresh,sprintf('Thresh_%d.tif',thresh),'Compression','none','Resolution',300);
Of course, the size of the window, the threshold and other parameters depend on the parameters of the regular text on the page (letter bigger/smaller, thicker/thinner etc) but even this initial version worked pretty well
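For anyone without Matlab, the same window-count idea can be sketched in plain Python. This is only an illustration on nested lists of 0/1 ink values (1 = black), with made-up test grids; the function name is hypothetical and a real version would use an image library and a faster box filter.

```python
def density_filter(ink, d=2, thresh=10):
    """Keep an ink pixel only if its (2d+1) x (2d+1) window holds at least thresh ink pixels.

    Pixels closer than d to the border are cleared, as in the loop above.
    """
    h, w = len(ink), len(ink[0])
    out = [[0] * w for _ in range(h)]
    for r in range(d, h - d):
        for c in range(d, w - d):
            if ink[r][c]:
                count = sum(ink[rr][cc]
                            for rr in range(r - d, r + d + 1)
                            for cc in range(c - d, c + d + 1))
                if count >= thresh:
                    out[r][c] = 1
    return out

# Dense "regular text": a solid 5x5 block inside a 9x9 grid.
dense = [[1 if 2 <= r <= 6 and 2 <= c <= 6 else 0 for c in range(9)] for r in range(9)]
# Sparse "watermark": one ink pixel every third row and column.
sparse = [[1 if r % 3 == 0 and c % 3 == 0 else 0 for c in range(9)] for r in range(9)]

dense_out = density_filter(dense)
sparse_out = density_filter(sparse)
```

The centre of the dense block is kept because its window is full of ink, while every pixel of the sparse pattern is dropped because each window contains only a single ink pixel.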

Is there an easy way to cut a slice from an image using Gimp?

Wondering if there is an easy way to remove a rectangular slice across the entire width of an image using Gimp, and have the resulting hole closed up automatically. I hope that makes sense. If I select a slice across an image and do "cut", it leaves a blank "hole" there. I want the new top and bottom of the image to join and fill that hole, reducing the image height by the amount sliced out.
Any easy way to do this?
Here is a method that is quick and often does what you want:
Cut out the middle, leaving a transparent "hole".
Click anywhere to remove the selection (so the hole is not selected).
Click Image > Zealous crop.
This is going to remove the middle part. However, if you also have transparency in other parts of the image (like around the edges) it's going to remove that transparency too.
I believe you're asking to do something like cut out the middle of a page, leaving the header and footer and have the blank space removed with the cut action, effectively joining the header and footer together.
To my knowledge, I don't believe so. Even if you cut, or delete, that space is still part of the image even without content.
But, you would be able to highlight the top or bottom (or left or right) of the remaining space and drag it to align with the other side. It's not ideal for repetitive tasks, but should get you through if you only have to do it a few times.
Install Python and the Python Imaging Library (Pillow). Back in GIMP, select and cut the full-width areas you don't want so that they become transparent, and export the image to test.png. Then use this Python code (it works only if complete rows are transparent; it will not work properly if there are 100%-transparent pixels anywhere other than on a full-width row):
from PIL import Image
# make sure every pixel is 4 bytes: R, G, B, A
i = Image.open("test.png").convert("RGBA")
b = i.tobytes()
# keep only the pixels whose alpha byte is non-zero
b2 = b''.join(b[n:n+4] for n in range(0, len(b), 4) if b[n+3])
# the remaining pixels form complete rows of the original width
newHeight = len(b2) // i.width // 4
i2 = Image.frombytes('RGBA', (i.width, newHeight), b2)
i2.save("test.png")
Then re-load test.png and verify that the areas you cut have gone.
In GIMP 2.8.1 you can easily create a new image from a selection. Select a rectangle, copy it (Ctrl-C) and paste it as a new image:
Edit -> Paste as -> New image (or Ctrl-Shift-V).
