How to describe an algorithm in C (image processing)

I'm learning image processing. My problem is segmentation in RGB vector space: how do I implement the Euclidean distance measure (formula 6.7-1, chapter 6 of Digital Image Processing by Gonzalez) to segment RGB images in C? Thanks.

Minimum requirements to solve this problem:
1) learn C (to at least a medium-to-advanced level)
presuming that you're not going to decode JPEGs or whatever image format you have from scratch:
2) learn how to use libraries in C
3) find a library that lets you read and write the image file format at hand
4) implement the algorithm and apply it to the image data (see the sketch below)
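The test in formula 6.7-1 is just the Euclidean norm of the RGB difference: a pixel z belongs to the region of interest when D(z, a) = ||z - a|| <= D0, where a is the mean colour of that region and D0 is a threshold you choose. Here is a minimal sketch of step 4, shown in NumPy for brevity (the colour a and threshold D0 are made-up values; in C the same test becomes a triple loop over rows, columns, and the three squared channel differences):

import numpy as np

def segment_rgb(img, a, D0):
    """Mask of pixels whose RGB Euclidean distance to colour a
    is at most D0 (Gonzalez, formula 6.7-1)."""
    diff = img.astype(np.float64) - np.asarray(a, dtype=np.float64)
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist <= D0

# img: an H x W x 3 array read with whatever library step 3 provides;
# a = (120, 80, 60) and D0 = 50.0 would be illustrative values only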

Related

Research Paper Implementation (Image Processing)

I'm trying to implement this paper on my own, but there are some parts I don't fully understand.
UIumbra has three channels, since it's the result of a multiplication between I and n, where I is the original (color) image.
Q. Step 4 requires a binarized image B1 from UIumbra. It uses an integral-image technique for binarization, which is equivalent to OpenCV's adaptiveThreshold(). Unfortunately, adaptiveThreshold() takes a grayscale image. Is there a method to convert UIumbra to grayscale, or does cv2.cvtColor(UI, cv2.COLOR_BGR2GRAY) suffice?
Q. LBWF is a binary version of LWF. LWF takes and returns a grayscale image. How do you make a binary version? (e.g. binarize the input?)
The paper doesn't explain these details, so I'm having trouble.
(I did send an email to the author and am waiting for an answer. Meanwhile, I want to hear your thoughts.)
Any help or idea is appreciated.
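For the mechanics of the step described above, here is a sketch (assuming UI holds the three-channel UIumbra image; whether a plain BGR2GRAY conversion matches the paper's intent is exactly the open question, and the block size 15 and constant 5 are placeholders):

import cv2

# UI: the three-channel UIumbra image from the earlier steps
gray = cv2.cvtColor(UI, cv2.COLOR_BGR2GRAY)
# mean adaptive threshold: each pixel is compared against the mean of its
# blockSize x blockSize neighbourhood minus C, computed with a box filter
B1 = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                           cv2.THRESH_BINARY, 15, 5)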

Image format in segmentation via neural networks

I am doing segmentation via deep learning in PyTorch. My dataset consists of ultrasound images in .raw/.mhd format.
I want to feed the dataset into the network via a data loader.
I face a few important questions:
Does changing the format of the dataset to either .png or .jpg make the segmentation inaccurate? (I think I lose some information this way!)
Which format is less lossy?
How should I make a numpy array if I don't convert the original image format, i.e., .raw/.mhd?
How should I load this dataset?
Knowing nothing about raw and mhd formats, I can give partial answers.
Firstly, jpg is lossy and png is not, so you're surely losing information with jpg. png is lossless for "normal" images: 1, 3 or 4 channels, with 8-bit precision in each (16 bits may also be supported, but don't quote me on that). I know nothing about ultrasound images, but if they use higher precision than that, even png will be lossy.
Secondly, I don't know what mhd is or what raw means in the context of ultrasound images. That being said, a simple Google search reveals a package for reading the former into numpy.
Finally, to load the dataset, you can use the ImageFolder class from torchvision. You need to write a custom function which loads an image given its path (for instance using the package mentioned above) and pass it to the loader keyword argument, as sketched below.
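A minimal sketch of that, assuming SimpleITK as the reading package (the root folder and extension check are placeholders, and note that ImageFolder expects one subfolder per class):

import SimpleITK as sitk
from torchvision.datasets import ImageFolder

def mhd_loader(path):
    # reads the .mhd header plus its companion .raw data into a numpy array
    return sitk.GetArrayFromImage(sitk.ReadImage(path))

dataset = ImageFolder(
    "data/",                                     # placeholder root folder
    loader=mhd_loader,
    is_valid_file=lambda p: p.endswith(".mhd"),  # skip the paired .raw files
)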

Improve image quality

I need to improve image quality, from low quality to high HD quality. I am using OpenCV libraries. I experimented a lot with GaussianBlur(), Laplacian(), transformation functions, filter functions etc., but all I could achieve was converting the image to HD resolution while keeping the same quality. Is it possible to do this? Do I need to implement my own algorithm, or is there a way it's already done? I will really appreciate any kind of help. Thanks in advance.
I used this link for my reference. It has other interesting filters that you can play with.
If you are using C++:
void cv::detailEnhance(InputArray src, OutputArray dst, float sigma_s = 10, float sigma_r = 0.15f)
If you are using python:
dst = cv2.detailEnhance(src, sigma_s=10, sigma_r=0.15)
The parameter sigma_s determines how big the neighbourhood of pixels must be to perform filtering.
The parameter sigma_r determines how the different colours within the neighbourhood of pixels will be averaged with each other. Its range is 0 to 1. A smaller value means similar colors will be averaged out while different colors remain as they are.
Since you are looking for sharpness in the image, I would suggest you keep the filtering neighbourhood (sigma_s) as small as possible.
Here is the result I obtained for a sample image:
1. Original image:
2. Sharpened image for lower sigma_r value:
3. Sharpened image for higher sigma_r value:
Check the above mentioned link for more information.
How about applying Super Resolution in OpenCV? A reference article with more details can be found here: https://learnopencv.com/super-resolution-in-opencv/
So basically you will need to have the Python dependency opencv-contrib-python installed, together with a working version of opencv-python.
There are different techniques for super resolution in OpenCV you can choose from, including EDSR, ESPCN, FSRCNN, and LapSRN. Code examples in both Python and C++ are included in the tutorial article for easy reference.
A correction is needed to the call above:
dst = cv2.detailEnhance(src, sigma_s=10, sigma_r=0.15)
Passing a kernel argument will give an error.
+1 to kris stern's answer.
If you are looking for a practical implementation of super resolution using a pretrained model in OpenCV, have a look at the notebook below, and also the video describing the details.
https://github.com/pankajr141/experiments/blob/master/Reasoning/ComputerVision/super_resolution_enhancing_image_quality_using_pretrained_models.ipynb
https://www.youtube.com/watch?v=JrWIYWO4bac&list=UUplf_LWNn0a9ubnKCZ-95YQ&index=4
Below is sample code using OpenCV:
import cv2

# set up the super-resolution engine (requires opencv-contrib-python)
model_pretrained = cv2.dnn_superres.DnnSuperResImpl_create()
# load the pretrained model file, then tell the engine which
# architecture and scale it is, e.g. ("edsr", 4) for an EDSR x4 model
model_pretrained.readModel(filemodel_filepath)
model_pretrained.setModel(modelname, scale)
# prediction, i.e. upscaling
img_upscaled = model_pretrained.upsample(img_small)

How to improve text recognition using Tesseract OCR?

I have implemented Tesseract OCR for text recognition in iOS. I preprocess the input image and pass it to the Tesseract method, but it gives poor recognition results.
Steps:
1.Erode function
2.Dilate function
3.Bitwise_not function
cv::Mat MCRregion;
cv::dilate(MCRregion, MCRregion, 24);
cv::erode(MCRregion, MCRregion, 24);
cv::bitwise_not(MCRregion, MCRregion);
UIImage *croppedMCRregion = [self UIImageFromCVMat:MCRregion];
Tesseract* tesseract = [[Tesseract alloc] initWithDataPath:@"tessdata" language:@"eng"];
[tesseract setVariableValue:@"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.>,'`;-:</" forKey:@"tessedit_char_whitelist"];
[tesseract setImage:[self UIImageFromCVMat:MCRregion]];
// [tesseract setImage:image];
[tesseract recognize];
NSLog(@"%@", [tesseract recognizedText]);
Input Image:
Image Link
1. How can I improve the text recognition rate using Tesseract?
2. Are any other preprocessing steps applied in Tesseract?
3. Is text dewarping done in Tesseract OCR?
Tesseract is a highly configurable piece of software -- though its configurations are poorly documented (unless you want to dig deep into the 150K lines of code). A good comprehensive list is available here: http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version.
Also look at
https://code.google.com/p/tesseract-ocr/wiki/ControlParams and https://code.google.com/p/tesseract-ocr/wiki/ImproveQuality
You can improve the quality tremendously if you feed more info about the data you're OCR'ing.
e.g. in case the images are all National IDs or Passports which follow certain standard MRZ formats, you can configure tesseract to use that info.
For the image you attached (an MRZ), I got the following result,
IDFRADOUEL<<<<<<<<<<<<<<<<<<<<9320
05O693202O438CHRISTIANE<<N1Z90620<3
by using the following config
# disable dict, freq tables etc which would distract OCR'ing an MRZ
load_system_dawg F
load_freq_dawg F
load_unambig_dawg F
load_punc_dawg F
load_number_dawg F
load_fixed_length_dawgs F
load_bigram_dawg F
wordrec_enable_assoc F
# mrz allows only these chars
tessedit_char_whitelist 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ<
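To apply these settings, save them in a config file and pass its path as a trailing argument on the tesseract command line, or set each variable programmatically, as the setVariableValue call in the question already does for the whitelist.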
Also make sure your installation is trained for the fonts in use, to get more accurate results. In your case it seems to be the OCR-B font.
It is not necessary to go through the tedious task of retraining Tesseract. Yes, you will get much better results but in some cases you can get pretty far with the ENG training set.
You can improve your results by paying attention to the following things:
Use a binary image as input and make sure you have black text on a white background
By default Tesseract will try to make words out of things that have no spacing. Try to segment each character separately and place them in a new image with lots of spacing. Especially if you have combinations of letters and numbers, Tesseract will "correct" these to match the surrounding characters.
Try to segment different parts of your image with a whitelist for the characters you know should be in there. If you're only looking for digits in the first part, then use a separate instance of Tesseract to detect those numbers with a digits-only whitelist (see the sketch after this list).
If you use the same object multiple times without resetting it Tesseract seems to have a memory. That means that you can get a different result each time you perform OCR. You can reset Tesseract to counter this or just create a new object.
Last but not least, use the resultIterator to go through the boxes that Tesseract can give as a result. You can check the size and confidence of each character and filter accordingly.
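To illustrate the separate-instance-plus-whitelist idea, here is a sketch assuming the Python wrapper pytesseract (digit_region is a hypothetical cropped image of the digits-only part):

import pytesseract

# digits-only pass over a region believed to contain only numbers;
# --psm 7 treats the crop as a single line of text
digits = pytesseract.image_to_string(
    digit_region,
    config="--psm 7 -c tessedit_char_whitelist=0123456789",
)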
Based on my experience:
1. How can I improve the text recognition rate using Tesseract?
Firstly, preprocessing: ensure that the input image is a binary image with a good threshold. OpenCV has a good set of functions to apply threshold algorithms, such as the Otsu algorithm, as well as contour detection to help with warping and rotation (see the sketch at the end of this list).
You can also use contour detection in OpenCV to distinguish between lines of text.
Some filtering will also remove noise, which often confuses Tesseract and increases processing time.
Set up proper configurations for tesseract (e.g. eng.config). Full list of configs here (http://www.sk-spell.sk.cx/tesseract-ocr-parameters-in-302-version). Some examples include blacklists, whitelists, chopping, etc...
Use proper flags. E.g. -psm 6 if you are doing blocks of text rather than lines
Having trained my own language data... I would say do so only if you have lots of time and resources. Or if your font is very peculiar (e.g. dot matrix).
More recent versions of Tesseract (closer to 3.0) allow for multiple language files to be used on the same pass (-l one+two). This means you can have one specially trained for text and another for numbers. In our case, it seemed to work well.
Postprocessing of tesseract results was particularly important for us too. String replacements of typical mis-recognitions and what not.
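As a minimal illustration of the binarization step mentioned above (assuming OpenCV in Python; the file name is a placeholder):

import cv2

img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)
img = cv2.medianBlur(img, 3)  # light denoising before thresholding
# Otsu's method picks the global threshold automatically; for a
# dark-on-light source this yields black text on a white background
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)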
2. Are any other preprocessing steps applied in Tesseract?
Tesseract uses the Leptonica library for preprocessing.
3. Is text dewarping done in Tesseract OCR?
I am inclined to think yes, considering that warping functions are part of Leptonica.

OpenCV Multilevel B-Spline Approximation

Hi (sorry for my English). I'm working on a university project in which I need to use the MBA (Multilevel B-Spline Approximation) algorithm to obtain control points from an image for use in other operations.
I've been reading a lot of papers about this algorithm, and I think I understand it, but I can't write it.
The idea is: read an image, process the image (OpenCV), then get control points of the image and use those points.
So the problem here is:
The algorithm uses a set of points {(x, y, z)}; this set of points is approximated by a surface generated from the control points obtained by MBA. The set {(x, y, z)} represents the data we need to approximate (the image).
So, the image is in cv::Mat format; how can I transform this format into an ordinary array so I can simply access and manipulate the data?
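On the cv::Mat question: in Python, OpenCV already returns the image as a NumPy array, so building the scattered-data set {(x, y, z)} takes a few lines (the file name is a placeholder; in C++ the equivalent is indexing with Mat::at<uchar>(y, x) or walking the Mat::data pointer):

import cv2
import numpy as np

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)  # placeholder image
ys, xs = np.indices(img.shape)
# one (x, y, z) triple per pixel, z being the intensity to approximate
points = np.column_stack([xs.ravel(), ys.ravel(), img.ravel()])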
Here are some references with explanations of the method:
(Paper) REGULARIZED MULTILEVEL B-SPLINE REGISTRATION
(Paper)Scattered Data Interpolation with Multilevel B-splines
(Matlab)MBA
If someone can help, maybe with a guideline, an idea or anything, it will be appreciated.
Thanks in advance.
EDIT: I finally wrote the algorithm in C++ using Armadillo and OpenCV.
I'm using Armadillo, a C++ linear algebra library, to work with matrices for the algorithm.
