Check Digit with Tesseract OCR not confident enough

I am trying to recognize this line with higher confidence. The line is protected by a check digit algorithm, so every character has to be read correctly (effectively 100% confidence) for the check digit to pass and the data to be extracted.
I tried both OcrLanguage.Financial and the default OCR language, and I also tried scaling the image to 300-600 DPI.
What image preprocessing can I do to extract the line accurately?
var Ocr = new IronTesseract();
Ocr.Language = OcrLanguage.Financial;
using (var Input = new OcrInput())
{
    Input.AddImage(@"Img/img.tiff");
    Input.TargetDPI = 600;
    Input.SaveAsImages();
    //Input.DeNoise();
    //Input.Deskew();
    IronOcr.OcrResult Result = Ocr.Read(Input);
    Console.WriteLine(Result.Text);
    Console.ReadLine();
}
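The post does not say which check digit scheme the line uses. As a sketch only, assuming an ICAO 9303 style machine-readable line (weights 7, 3, 1, result modulo 10), the check digit can also be used after OCR to test a few look-alike substitutions instead of relying on preprocessing alone. The `CONFUSIONS` table and all function names here are hypothetical.

```python
from itertools import product

# Assumption: ICAO 9303 (MRZ) check digit; the post does not name its scheme.
WEIGHTS = (7, 3, 1)

# Hypothetical table of characters that OCR engines commonly confuse.
CONFUSIONS = {"O": "0", "0": "O", "I": "1", "1": "I",
              "S": "5", "5": "S", "B": "8", "8": "B"}


def char_value(ch):
    """Map an MRZ character to its numeric value (0-9, A=10..Z=35, '<'=0)."""
    if ch in "0123456789":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"unexpected character: {ch!r}")


def check_digit(field):
    """Weighted sum of the character values, modulo 10."""
    return sum(char_value(c) * WEIGHTS[i % 3] for i, c in enumerate(field)) % 10


def passes(field, digit_char):
    """True if `digit_char` is the correct check digit for `field`."""
    return digit_char in "0123456789" and check_digit(field) == int(digit_char)


def repair_candidates(field, digit_char):
    """Every look-alike variant of `field` whose check digit matches."""
    options = [(c, CONFUSIONS[c]) if c in CONFUSIONS else (c,) for c in field]
    variants = ("".join(p) for p in product(*options))
    return [v for v in variants if passes(v, digit_char)]
```

More than one variant can pass the same check digit, so a match narrows the candidates but is not proof that the reading is correct.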

Related

How to rotate a non-squared image in frequency domain

I want to rotate an image in the frequency domain. Inspired by the answers to "Image rotation and scaling the frequency domain?", I managed to rotate square images (see the following Python script using OpenCV):
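# Imports assumed here; they are not shown in the original post
import cv2
import numpy as np
from numpy.fft import fftshift, ifftshift
M = cv2.imread("lenna.png")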
M=np.float32(M)
hanning=cv2.createHanningWindow((M.shape[1],M.shape[0]),cv2.CV_32F)
M=hanning*M
sM = fftshift(M)
rotation_center=(M.shape[1]/2,M.shape[0]/2)
rot_matrix=cv2.getRotationMatrix2D(rotation_center,angle,1.0)
FsM = fftshift(cv2.dft(sM,flags = cv2.DFT_COMPLEX_OUTPUT))
rFsM=cv2.warpAffine(FsM,rot_matrix,(FsM.shape[1],FsM.shape[0]),flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_CONSTANT)
IrFsM = ifftshift(cv2.idft(ifftshift(rFsM),flags=cv2.DFT_REAL_OUTPUT))
This works fine with square images. (Better results could be achieved by padding the image.)
However, when using only a non-square portion of the image, the rotation in the frequency domain shows a kind of shearing effect.
Any idea how to achieve this? Obviously I could pad the image to make it square, but the final purpose of all this is to rotate FFTs as fast as possible for an iterative image registration algorithm, and padding would slightly slow the algorithm down.
Following the suggestion of @CrisLuengo, I found the affine transform needed to avoid padding the image. Obviously it will depend on the image size and the application, but in my case avoiding the padding is very attractive.
The modified script now looks like:
#rot_matrix=cv2.getRotationMatrix2D(rotation_center,angle,1.0)
kx=1.0
ky=1.0
if(M.shape[0]>M.shape[1]):
    kx= float(M.shape[0]) / M.shape[1]
else:
    ky=float(M.shape[1])/M.shape[0]
affine_transform = np.zeros([2, 3])
affine_transform[0, 0] = np.cos(angle)
affine_transform[0, 1] = np.sin(angle)*ky/kx
affine_transform[0, 2] = (1-np.cos(angle))*rotation_center[0]-ky/kx*np.sin(angle)*rotation_center[1]
affine_transform[1, 0] = -np.sin(angle)*kx/ky
affine_transform[1, 1] = np.cos(angle)
affine_transform[1, 2] = kx/ky*np.sin(angle)*rotation_center[0]+(1-np.cos(angle))*rotation_center[1]
FsM = fftshift(cv2.dft(sM,flags = cv2.DFT_COMPLEX_OUTPUT))
rFsM=cv2.warpAffine(FsM,affine_transform, (FsM.shape[1],FsM.shape[0]),flags=cv2.INTER_LINEAR, borderMode=cv2.BORDER_CONSTANT)
IrFsM = ifftshift(cv2.idft(ifftshift(rFsM),flags=cv2.DFT_REAL_OUTPUT))
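One way to read the matrix above (my interpretation, not something the post states): the frequency bins of a rows x cols image are 1/cols apart horizontally and 1/rows apart vertically, so a rotation has to be applied in physical frequency units and then converted back to bin indices. The NumPy sketch below builds that transform as "scale, rotate, unscale" and keeps the spectrum center fixed. The function name is hypothetical and the angle is assumed to be in radians.

```python
import numpy as np


def freq_rotation_affine(shape, angle_rad):
    """2x3 affine matrix that rotates a centered (fftshift-ed) spectrum.

    shape is (rows, cols) of the image; angle_rad is in radians.
    """
    rows, cols = shape
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    # Bin index -> cycles per pixel: bins are 1/cols (x) and 1/rows (y) apart.
    S = np.diag([1.0 / cols, 1.0 / rows])
    # Rotation with the same sign convention as cv2.getRotationMatrix2D.
    R = np.array([[c, s], [-s, c]])
    # Rotate in physical frequency units, then go back to bin indices.
    A = np.linalg.inv(S) @ R @ S
    # Choose the translation so that the spectrum center stays where it is.
    center = np.array([cols / 2.0, rows / 2.0])
    t = center - A @ center
    return np.hstack([A, t[:, None]])
```

For a square image S is a multiple of the identity, so the result reduces to the ordinary rotation matrix, which is consistent with the square case working without any correction.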

I wanted to detect objects in an HSV image, but I keep getting an error: Expected Ptr<cv::UMat> for argument '%s'

I was trying to create a trackbar window and get the HSV values of the image by adjusting the trackbars. I created a mask and then adjusted the trackbars to detect an object in the HSV image.
def nothing(x):
    pass
cv.namedWindow("Tracking")
cv.createTrackbar("LH","Tracking",0,255,nothing)
cv.createTrackbar("LS","Tracking",0,255,nothing)
cv.createTrackbar("LV","Tracking",0,255,nothing)
cv.createTrackbar("UH","Tracking",255,255,nothing)
cv.createTrackbar("US","Tracking",255,255,nothing)
cv.createTrackbar("UV","Tracking",255,255,nothing)
while True:
    frame = cv.imread("C:/Users/acer/Desktop/insects/New folder/ins.jpg")
    hsv = cv.cvtColor(frame,cv.COLOR_BGR2HSV)
    l_h = cv.getTrackbarPos("LH","Tracking")
    l_s = cv.getTrackbarPos("LS","Tracking")
    l_v = cv.getTrackbarPos("LV","Tracking")
    u_h = cv.getTrackbarPos("UH","Tracking")
    u_s = cv.getTrackbarPos("US","Tracking")
    u_v = cv.getTrackbarPos("UV","Tracking")
    l_b = np.array([l_h,l_s,l_v])
    u_b = np.array([u_h,u_s,u_v])
    mask = (hsv,l_b,u_b)
    res = cv.bitwise_and(frame,frame,mask=mask)
    cv.imshow("frame",frame)
    cv.imshow("mask",mask)
    cv.imshow("res",res)
    key = cv.waitKey(1)
    if key == 27:
        break
cv.destroyAllWindows()
There are a few issues with your code:
1) You have no import statements. You need at least:
import cv2 as cv
import numpy as np
2) Your indentation is incorrect. Your function nothing() should not be indented.
3) You omitted the call to inRange(); you need:
mask = cv.inRange(hsv,l_b,u_b)
4) You have given Hue the range 0..255, but for uint8 images OpenCV stores Hue as degrees divided by 2, so its range is 0..179 and the full 360 degrees fits below the uint8 limit of 255.
By the way, it is fairly poor practice to do "loop invariant" stuff inside a loop - I mean the part where you hit the disk every millisecond and re-read the image, re-decode the JPEG and convert it to HSV. All that can be done outside the loop, then inside it, just do a quick memory copy of the HSV image.
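To make point 3 concrete, here is a plain NumPy illustration of what cv.inRange computes for a three-channel image: a single-channel uint8 mask that is 255 where every channel lies inside the inclusive bounds and 0 elsewhere. It is only a sketch for understanding (the function name `in_range` is mine); in real code cv.inRange is the call to use.

```python
import numpy as np


def in_range(hsv, lower, upper):
    """NumPy illustration of the mask cv.inRange produces.

    hsv is an (H, W, 3) array; lower and upper are length-3 bounds.
    Both bounds are inclusive. For uint8 HSV images in OpenCV the
    useful upper bound for Hue is 179, not 255.
    """
    hsv = np.asarray(hsv)
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    # A pixel is kept only if all three channels are inside the bounds.
    inside = np.all((hsv >= lower) & (hsv <= upper), axis=-1)
    return np.where(inside, 255, 0).astype(np.uint8)
```

By contrast, `mask = (hsv,l_b,u_b)` only builds a Python tuple of three arrays, which is most likely why OpenCV complains about the argument type when that tuple is passed as a mask.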

Images lose quality after saving as GIF

I'm developing an iOS app which allows users to take a sequence of photos. Afterwards the photos are put into an animation and exported as MP4 and GIF.
While the MP4 preserves the source quality, the color gradations are clearly visible in the GIF.
Here is the visual comparison:
GIF:
MP4:
The code I use for exporting as GIF:
var dictFile = new NSMutableDictionary();
var gifDictionaryFile = new NSMutableDictionary();
gifDictionaryFile.Add(ImageIO.CGImageProperties.GIFLoopCount, NSNumber.FromFloat(0));
dictFile.Add(ImageIO.CGImageProperties.GIFDictionary, gifDictionaryFile);
var dictFrame = new NSMutableDictionary();
var gifDictionaryFrame = new NSMutableDictionary();
gifDictionaryFrame.Add(ImageIO.CGImageProperties.GIFDelayTime, NSNumber.FromFloat(0f));
dictFrame.Add(ImageIO.CGImageProperties.GIFDictionary, gifDictionaryFrame);
InvokeOnMainThread(() =>
{
    var imageDestination = CGImageDestination.Create(fileURL, MobileCoreServices.UTType.GIF, _images.Length);
    imageDestination.SetProperties(dictFile);
    for (int i = 0; i < this._images.Length; i++)
    {
        imageDestination.AddImage(this._images[i].CGImage, dictFrame);
    }
    imageDestination.Close();
});
The code I use for exporting as MP4:
var videoSettings = new NSMutableDictionary();
videoSettings.Add(AVVideo.CodecKey, AVVideo.CodecH264);
videoSettings.Add(AVVideo.WidthKey, NSNumber.FromNFloat(images[0].Size.Width));
videoSettings.Add(AVVideo.HeightKey, NSNumber.FromNFloat(images[0].Size.Height));
var videoWriter = new AVAssetWriter(fileURL, AVFileType.Mpeg4, out nsError);
var writerInput = new AVAssetWriterInput(AVMediaType.Video, new AVVideoSettingsCompressed(videoSettings));
var sourcePixelBufferAttributes = new NSMutableDictionary();
sourcePixelBufferAttributes.Add(CVPixelBuffer.PixelFormatTypeKey, NSNumber.FromInt32((int)CVPixelFormatType.CV32ARGB));
var pixelBufferAdaptor = new AVAssetWriterInputPixelBufferAdaptor(writerInput, sourcePixelBufferAttributes);
videoWriter.AddInput(writerInput);
if (videoWriter.StartWriting())
{
    videoWriter.StartSessionAtSourceTime(CMTime.Zero);
    for (int i = 0; i < images.Length; i++)
    {
        while (true)
        {
            if (writerInput.ReadyForMoreMediaData)
            {
                var frameTime = new CMTime(1, 10);
                var lastTime = new CMTime(1 * i, 10);
                var presentTime = CMTime.Add(lastTime, frameTime);
                var pixelBufferImage = PixelBufferFromCGImage(images[i].CGImage, pixelBufferAdaptor);
                Console.WriteLine(pixelBufferAdaptor.AppendPixelBufferWithPresentationTime(pixelBufferImage, presentTime));
                break;
            }
        }
    }
    writerInput.MarkAsFinished();
    await videoWriter.FinishWritingAsync();
I would appreciate your help!
Kind regards,
Andre
This is just a summary of my comments...
I do not code on your platform, so I can only provide a generic answer (with insights from my own experience coding a GIF encoder/decoder).
The GIF image format supports up to 8 bits per pixel, which gives at most 256 colors per frame with naive encoding. Cheap encoders just truncate the input image to 256 or fewer colors, which usually leads to ugly pixelated results. To improve the color quality of a GIF, there are 3 approaches I know of:
1. Multiple frames covering the screen, each with its own palette
Simply divide the image into overlays, each with its own palette. This is slow in terms of decoding, as you need to process more frames per single image (which can cause sync errors with some viewers) and you need to process all frame-related chunks multiple times per single image. The encoding itself is fast, as you just separate the image into multiple frames based either on colors or on region/position. Here is a (region/position based) example:
The sample image is taken from here: Wiki
GIF supports transparency, so the sub-frames can overlap ... This approach physically increases the number of colors available per pixel to N*256 (or N*255 for transparent frames), where N is the number of frames or palettes used per single image.
2. Dithering
Dithering is a technique that approximates the color of an area as closely as possible while using only the specified colors (from the palette). It is fast and easy to implement, but the result is somewhat noisy. For more info, see some related answers of mine:
Converting BMP image to set of instructions for a plotter?
c# image dithering routine that accepts an amount of dithering?
3. Better color quantization method
Cheap encoders just truncate the colors to a predefined palette. Much better results are obtained by clustering the used colors based on a histogram. For example, see:
Effective gif/image color quantization?
The result is usually much better than dithering, but the encoding time is huge in comparison to dithering...
#1 and #3 can be used together to enhance the quality even more ...
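As an illustration of the dithering approach, here is a minimal sketch of Floyd-Steinberg error diffusion, one widely used dithering scheme (the answer does not say which scheme it has in mind, so this choice is an assumption). Each pixel is mapped to the nearest palette color and the rounding error is pushed onto the neighbors that have not been processed yet. Function names are hypothetical and the pure Python loop is written for clarity, not speed.

```python
import numpy as np


def nearest_palette_index(color, palette):
    """Index of the palette entry closest to `color` (squared RGB distance)."""
    return int(np.argmin(np.sum((palette - color) ** 2, axis=1)))


def floyd_steinberg(image, palette):
    """Dither an (H, W, 3) image to a palette of shape (N, 3), N <= 256.

    Returns an (H, W) uint8 array of palette indices.
    """
    work = np.array(image, dtype=np.float64)      # working copy that accumulates error
    palette = np.asarray(palette, dtype=np.float64)
    h, w, _ = work.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            old = work[y, x].copy()
            idx = nearest_palette_index(old, palette)
            out[y, x] = idx
            err = old - palette[idx]
            # Distribute the error to right and lower neighbors (7, 3, 5, 1 over 16).
            if x + 1 < w:
                work[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    work[y + 1, x - 1] += err * 3 / 16
                work[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    work[y + 1, x + 1] += err * 1 / 16
    return out
```

On a flat mid-gray image with a black and white palette this produces a mix of roughly half black and half white pixels, which is the noisy look the answer describes.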
If you do not have access to the encoding code or pipeline, you can still transform the image itself before encoding: do the quantization and palette computation yourself and pass the result directly to the GIF encoder, which should be possible (if the GIF encoder you are using is at least a bit sophisticated ...)
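A rough sketch of doing the palette computation yourself before encoding. It uses the simple "popularity" method (take the most populated bins of a coarse color histogram), which is only one histogram-based option and probably cruder than the clustering the linked answer describes. The function names, the `bits` parameter and its default are assumptions for illustration.

```python
import numpy as np


def popularity_palette(image, n_colors=256, bits=4):
    """Build a palette from the most populated bins of a coarse RGB histogram.

    image is (H, W, 3) uint8; each channel is reduced to `bits` bits to form bins.
    Returns up to n_colors rows of uint8 RGB, most frequent bin first.
    """
    pixels = np.asarray(image, dtype=np.uint8).reshape(-1, 3)
    bins = (pixels >> (8 - bits)).astype(np.int64)
    # Pack the three coarse channel values into one integer key per pixel.
    keys = (bins[:, 0] << (2 * bits)) | (bins[:, 1] << bits) | bins[:, 2]
    uniq, inverse, counts = np.unique(keys, return_inverse=True, return_counts=True)
    # Use the mean color of the pixels in each occupied bin as its representative.
    sums = np.zeros((len(uniq), 3))
    np.add.at(sums, inverse, pixels)
    means = sums / counts[:, None]
    order = np.argsort(counts)[::-1][:n_colors]
    return np.rint(means[order]).astype(np.uint8)


def map_to_palette(image, palette):
    """Replace each pixel by the index of its nearest palette color."""
    image = np.asarray(image)
    pixels = image.reshape(-1, 3).astype(np.float64)
    pal = np.asarray(palette, dtype=np.float64)
    # Brute-force distance table: fine for a sketch, memory hungry for large images.
    dist = ((pixels[:, None, :] - pal[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1).reshape(image.shape[:2])
```

The resulting palette and index image are the two things an indexed-color encoder needs, so they can be handed over as is if the encoder accepts indexed input.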

How to exclude special characters in tesseract?

I am using Tesseract with mcr.traineddata to read MICR numbers from a cheque.
This is the part of the cheque that I want to read.
Below is the part of the text that has been detected from the image.
My question is: how do I exclude the special characters from the image? Would training Tesseract for the special characters be an option?
Apart from the special characters, the rest of the numbers are detected.
My code:
let tesseract = G8Tesseract()
tesseract.language = "mcr"
tesseract.engineMode = .TesseractOnly
tesseract.pageSegmentationMode = .Auto
tesseract.maximumRecognitionTime = 60.0
imageView.image = imageView.image?.g8_grayScale()
imageView.image = imageView.image?.g8_blackAndWhite()
tesseract.image = imageView.image
tesseract.recognize()
I created a new traineddata file (my.traineddata) and trained the special characters to be recognized as 'X'. The more images you use for training, the more accurate the traineddata file becomes. The recognized text can then be manipulated accordingly.
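The "manipulate the recognized text" step could look something like the following sketch, which treats the 'X' placeholder and whitespace as separators and keeps only the digit groups. The function name and the sample string in the usage note are made up for illustration; real MICR output will differ.

```python
import re


def split_micr_fields(recognized, placeholder="X"):
    """Split OCR text into digit groups.

    `placeholder` is the character the custom traineddata emits for the
    MICR special symbols; it and any whitespace act as field separators.
    """
    parts = re.split(rf"[{re.escape(placeholder)}\s]+", recognized)
    # Keep only parts that contain digits, and drop any stray non-digits.
    return [re.sub(r"\D", "", p) for p in parts if re.search(r"\d", p)]
```

For example, `split_micr_fields("X123456X 987654321X 0012")` gives three digit groups, one per field between the placeholders.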

opencv sliding window

Is there any built-in library function for sliding a window (of custom size) over an image in OpenCV 2.x?
I tried to write the algorithm myself, but I found it very painful and probably error-prone.
I need to slide over an image and create a histogram as the input to an SVM.
There is one for the HOG descriptor, which calculates HOG features, but I have my own feature set, so I just need an algorithm that lets me slide over an image.
You can define a Region of Interest (ROI) on a cv::Mat object, which gives you a new Mat object referring to the sub-window. This does not copy the underlying data; it merely creates a new header with the appropriate metadata.
cv::Mat::operator()
See also this other question:
OpenCV C++, getting Region Of Interest (ROI) using cv::Mat
Basic code can look like this. I hope the code is described well enough.
This is a single-scale sliding window of 60x60 with a step of 30.
The result of this simple example is the ROI.
You can visit this basic tutorial: Tutorial Here.
// Parameters of your sliding window
int windows_n_rows = 60;
int windows_n_cols = 60;
// Step of each window
int StepSlide = 30;
for (int row = 0; row <= LoadedImage.rows - windows_n_rows; row += StepSlide)
{
    for (int col = 0; col <= LoadedImage.cols - windows_n_cols; col += StepSlide)
    {
        // Rect takes (x, y, width, height), so the column count is the width
        Rect windows(col, row, windows_n_cols, windows_n_rows);
        // Roi is a view onto LoadedImage; no pixel data is copied
        Mat Roi = LoadedImage(windows);
    }
}
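For readers using the Python bindings, where images are NumPy arrays, the same loop can be sketched as a generator. Basic slicing returns a view rather than a copy, which plays the same role as the Mat ROI above. The function name and parameter names are mine, with defaults mirroring the 60x60 window and step of 30.

```python
import numpy as np


def sliding_windows(image, win_rows=60, win_cols=60, step=30):
    """Yield (row, col, window) for every window that fits fully inside image.

    `window` is a NumPy view onto `image`, so no pixel data is copied.
    """
    rows, cols = image.shape[:2]
    for row in range(0, rows - win_rows + 1, step):
        for col in range(0, cols - win_cols + 1, step):
            yield row, col, image[row:row + win_rows, col:col + win_cols]
```

Each yielded window can be passed straight to a feature function (for example a histogram) whose output becomes one SVM input vector.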
