I have the following problem. Starting from an image (for example, a picture of my food pantry), I need to:
1. split the image into sub-images, one for each object it contains;
2. after isolating each object (and getting its image), preprocess that image with the split() function to separate its channels;
3. detect and read the text in each image (for example "peanuts" on a pack of peanuts) using the Tesseract libraries;
4. detect symbols in each image (for example the Coca-Cola logo on a Coca-Cola bottle) using a SURF library.
Having explained my goal, the question is:
- how can I perform step 1?
I'm currently an MS student in Medical Physics and I have a great need to be able to overlay an isodose distribution from an RTDOSE file onto a CT image from a .dcm file set.
I've managed to extract the image and dose pixel arrays myself using pydicom and dicom_numpy, but the two arrays are not the same size! So if I simply overlay them, the dose will not be in the position that the Elekta Gamma Plan software exported it in.
I've played around with dicompyler and 3DSlicer, and they are obviously able to do this even though the arrays are not the same size. However, I don't think I can export the numerical data from these programs; I can only scroll through and view it as an image. How can I overlay the RTDOSE on a CT image?
Thank you
For what you want, it sounds like you should use SimpleITK (or an equivalent; my experience is with SITK) to do the DICOM handling, not pydicom.
DICOM has a complete built-in system for specifying the 3D position of all pixel data in patient coordinates. It uses a set of attributes in the Image Plane Module of each DICOM file. See here for a good overview.
The SimpleITK library understands the Image Plane tags and by default uses them to locate any image in patient coordinates, irrespective of things such as the specific pixel spacing or slice thickness.
So, in your case, if you use SITK to open your studies, you should be able to overlay them correctly "out of the box", because SITK does all the work of parsing the Image Plane Module tags and locating the data in patient coordinates, just as 3DSlicer does.
Pydicom, in contrast, exposes those tags but doesn't itself use them to position anything. For images, it only gives you the raw pixel arrays.
Note that I use both pydicom and SITK. This isn't a criticism of pydicom, but a question of the right tool for the job. In fact, for many (most?) things I use pydicom, but for any true 3D work, SITK is the easier toolkit to use.
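To make concrete what the Image Plane tags encode, here is a minimal Python sketch of the index-to-patient mapping for a single slice. It assumes both grids are axis-aligned (ImageOrientationPatient equal to [1, 0, 0, 0, 1, 0]); the function names and the geometry values are made up for illustration, and a toolkit such as SimpleITK would also handle oblique orientations, the slice direction and interpolation for you.

```python
def index_to_patient(col, row, position, spacing):
    """Map a (col, row) pixel index to patient (x, y) coordinates in mm.

    position: x, y of ImagePositionPatient (centre of the first pixel).
    spacing:  PixelSpacing as stored in DICOM: (row spacing, column spacing).
    Assumes an axis-aligned slice (ImageOrientationPatient = [1,0,0,0,1,0]).
    """
    row_spacing, col_spacing = spacing
    x = position[0] + col * col_spacing
    y = position[1] + row * row_spacing
    return x, y


def patient_to_index(x, y, position, spacing):
    """Inverse mapping: patient (x, y) in mm to a fractional (col, row) index."""
    row_spacing, col_spacing = spacing
    col = (x - position[0]) / col_spacing
    row = (y - position[1]) / row_spacing
    return col, row


def dose_index_to_ct_index(col, row, dose_geom, ct_geom):
    """Find where a dose-grid pixel lands on the CT pixel grid."""
    x, y = index_to_patient(col, row, *dose_geom)
    return patient_to_index(x, y, *ct_geom)


# Hypothetical geometries: ((origin x, origin y), (row spacing, col spacing))
ct_geom = ((-250.0, -250.0), (0.5, 0.5))    # fine CT grid
dose_geom = ((-100.0, -80.0), (2.0, 2.0))   # coarser, smaller dose grid

print(dose_index_to_ct_index(0, 0, dose_geom, ct_geom))   # (300.0, 340.0)
print(dose_index_to_ct_index(10, 5, dose_geom, ct_geom))  # (340.0, 360.0)
```

The two arrays having different shapes is therefore expected: each one only makes sense together with its own origin and spacing, and the overlay has to go through patient coordinates.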
I'm working on a college project that involves OCRing a certain digit code (with a few other characters as separators, mainly '.', '/', etc.).
That digit code (printed on products, for example) is usually in a "digital" font (e.g. a 7-segment-like font, a pixelated font, etc.).
So I am trying to train Tesseract on several digital fonts I've found online, similar to those used for these codes.
The problem is that Tesseract recognizes the TIFF files I provide as blank pages.
Things I've tried:
1. Creating a .box file using JTesseract & qt-box (and adjusting the boxes manually): in this case, the box and TIFF files are read by Tesseract and I get the output "1 Page", but no characters are recognized and the tr file is blank.
2. Creating a .box file with Tesseract's makebox: in this case no boxes are created at all.
PS: I manage to train it just fine using more traditional fonts (Arial, for example).
Any ideas?
I'm attaching an image of one such font.
Thank you!
I managed to work around most of the issues. Posting it in case it could help anyone else:
I took 2 steps to get Tesseract to identify my text:
1. Image processing on the training images: I applied some image processing methods (mainly dilate, erode and some blur) to "connect" the pixels in the text that were segmented or separated from one another. It's VERY IMPORTANT to apply exactly the same steps to the images that will be fed to the OCR.
2. Setting the DPI: I noticed that simply saving images as TIFF/PNG via code doesn't save the DPI setting in the header for some reason (Tesseract identified them as 0 DPI). I assume there's a way to do that in code, but I didn't have time, so I just opened the files in Photoshop and saved them from there.
I'm not entirely sure whether it was step 1, step 2 or both that solved my issue, but most characters were eventually identified.
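For anyone wondering what the dilate/erode step does to a segmented font, here is a small pure-Python sketch of a morphological "closing" (dilate, then erode) on a binary image stored as nested lists, with 1 meaning ink. It is only an illustration of the idea, not the exact processing used above; the 3x3 neighbourhood and the sample stroke are my own assumptions, and in practice an image library (for example OpenCV's cv2.dilate and cv2.erode) would be used instead.

```python
def dilate(img):
    """3x3 binary dilation: a pixel becomes 1 if it or any neighbour is 1."""
    h, w = len(img), len(img[0])
    return [
        [
            int(any(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ))
            for x in range(w)
        ]
        for y in range(h)
    ]


def erode(img):
    """3x3 binary erosion: a pixel stays 1 only if it and all of its
    in-bounds neighbours are 1."""
    h, w = len(img), len(img[0])
    return [
        [
            int(all(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ))
            for x in range(w)
        ]
        for y in range(h)
    ]


def close_gaps(img):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(img))


# A horizontal stroke broken by a one-pixel gap, as in a 7-segment style font.
stroke = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]

for row in close_gaps(stroke):
    print("".join("#" if v else "." for v in row))
```

The gap in the middle row is filled while the stroke keeps its original length and thickness. On the DPI point, Pillow's Image.save accepts a dpi=(x, y) argument for TIFF and PNG, which may be a way to avoid the Photoshop round trip.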
We make images like the following in Excel. The raw image is imported and positioned in the generally correct area within the annotations, which are themselves images linked to ranges, the contents of which differ depending on selections made by the user.
The absolute position and dimensions of each annotation must be adjusted manually for every image. The number of sample names can vary (up to 12 lanes of samples). The size ladder on the left can also vary depending on the size of the protein being analyzed.
After everything is correctly sized and aligned, the range containing the raw image + annotations is copied and saved as a jpg (which is then imported into an Access database).
Though we've automated some parts of this with VBA, the process of tweaking every image (widths of columns, text size, position of size ladder, etc.) can get very tedious. Surely there is some software out there that will make this process more efficient. It takes one of our staff members hours to adjust and finalize about 10-20 of these images.
Any recommendations are welcomed.
This procedure is called electrophoresis. Samples (in this case proteins) are loaded into a polyacrylamide gel (each sample in its own "lane") and pulled through the gel with electricity. This process separates all of the proteins in each lane by size and charge.
The "ladder" is a solution of various proteins of known size. It's used to determine the sizes of the proteins in the other lanes.
The image on the left contains the range of sizes in the ladder (in this case 10, 15,...150, 200). Each "step" in the ladder image is aligned with the black bands that appear in the ladder lane in the experiment (the actual ladder lane that contains the black bands is not present in this case...it's cropped post-alignment to improve the overall look of the image).
The images on the right are protein names and point to the location on the gel where that particular protein should appear. The protein Actin, for example, is supposed to come out at around 42 kilodaltons. The fact that there is a prominent black band in that location is good supporting evidence that this sample contains Actin protein.
Many gels will also describe the sample source at the top or the bottom. So, for example, if the sample in lane 1 was derived from mouse liver cells, lane 1 would be annotated as "mouse liver."
The raw image is captured in the lab and is saved as a jpg. This jpg is then manually copied directly into an Excel sheet, where extraneous parts of the image are cropped. The cropped image is then moved to within the area of the worksheet that contains the annotations (ladder, protein names, sample names). These annotations are themselves images (linked to other ranges in the workbook that change with every experiment: protein names, sample names and ladder type can be different for every experiment). These annotation images require fine positioning in each case (as described previously) to align with the lanes and with the protein sizes. Once everything is aligned, it is saved as a jpg and moved into Access.
My question is...Is there software already out there designed specifically for tasks like these? Just as Excel is not a database program, it is also not an image annotation program. I want to know if there is an application out there, ready to go, that is specifically designed to annotate images with elements that can vary from image to image.
Of course, there will still be a need for manually moving elements around the image to get everything aligned (I'm not looking for a miracle here). I'm thinking that there has to be something better than Excel for this.
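One part of the alignment described above can be computed rather than dragged by hand: once the ladder bands have been located in the image, the vertical position for a protein label follows from its expected size. The Python sketch below assumes that migration distance is roughly linear in the logarithm of molecular weight between neighbouring ladder bands, which is the usual approximation for this kind of gel. The ladder sizes between 15 and 150 and all pixel positions are hypothetical values for illustration.

```python
import math


def band_position(size_kda, ladder):
    """Estimate the y pixel at which a protein of size_kda should appear.

    ladder: list of (size in kDa, y pixel of that ladder band) pairs.
    Interpolates linearly in log(size) between the two neighbouring bands.
    """
    points = sorted(ladder)
    for (s0, y0), (s1, y1) in zip(points, points[1:]):
        if s0 <= size_kda <= s1:
            t = (math.log(size_kda) - math.log(s0)) / (math.log(s1) - math.log(s0))
            return y0 + t * (y1 - y0)
    raise ValueError("size is outside the range covered by the ladder")


# Hypothetical ladder: (kDa, y pixel). Larger proteins travel less far,
# so they sit closer to the top of the image (smaller y).
ladder = [
    (10, 580), (15, 520), (25, 430), (37, 350), (50, 290),
    (75, 210), (100, 160), (150, 100), (200, 60),
]

# Where should the label for Actin (about 42 kDa) point?
print(round(band_position(42, ladder)))
```

With positions computed like this, a script could place the ladder and protein labels automatically and leave only small manual corrections.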
I'm trying to read the NIRPP number (social security number) from a French vital card using Tesseract's OCR (I'm using TesseractOCRiOS 4.0.0). So here is what I'm doing :
First, I request a picture of the whole card :
Then, using a custom cropper, I ask the user to zoom specifically on the card number:
And then I catch this image (1291x202px) and using Tesseract I try to read the number:
import TesseractOCR

// pickedImage is the cropped UIImage of the card number
let tesseract = G8Tesseract(language: "eng")
tesseract?.image = pickedImage
tesseract?.recognize()
print(tesseract?.recognizedText ?? "")
But I'm getting pretty bad results: Tesseract finds the right number only about 30% of the time, and even then I sometimes need to trim extra characters (letters, dots, dashes...).
So is there a way to improve these results?
Thanks for your help.
To improve your results:
1. Zoom your image to an appropriate level. The right amount of zoom can improve accuracy a lot.
2. Configure Tesseract so that only digits are whitelisted. I am assuming here that what you are trying to extract contains only digits. If you whitelist only digits, you improve your chances of recognizing 0 as 0 and not as the letter O.
3. If your extracted text matches a regex, you should configure Tesseract to use that regex as well.
4. Preprocess your image to remove any background colors, and apply morphological operations such as erode to increase the space between your characters/digits. If they are too close together, Tesseract will have a hard time recognizing them correctly. Most image processing libraries come with these operations built in.
5. Use TIFF as the image format.
Once you have the right preprocessing pipeline and configuration for Tesseract, you will usually get very good and consistent results.
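The digit-only and regex suggestions above can also be applied as a check on whatever the OCR returns. The Python sketch below strips everything that is not a digit and then tests whether the result looks like a NIR. It assumes the common layout of 13 digits followed by a 2-digit key equal to 97 minus the number modulo 97, and it deliberately ignores the Corsican department codes 2A and 2B; the function names are made up for illustration. A failed check is a cue to retry the capture rather than accept a misread.

```python
import re


def clean_digits(raw):
    """Keep only ASCII digits, dropping spaces, dots, dashes and stray letters."""
    return re.sub(r"[^0-9]", "", raw)


def looks_like_nir(digits):
    """Return True if digits is 15 digits long and its last two digits match
    the control key computed from the first 13 (key = 97 - number % 97)."""
    if not re.fullmatch(r"[0-9]{15}", digits):
        return False
    body, key = int(digits[:13]), int(digits[13:])
    return key == 97 - body % 97


# Simulated OCR output with separators and a stray letter (made-up number).
ocr_text = "1 85 05-78.006 084 O"
digits = clean_digits(ocr_text)
print(digits)                  # 1850578006084
print(looks_like_nir(digits))  # False: the two key digits were not read
```

On the Tesseract side, the whitelist itself is set through the tessedit_char_whitelist variable.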
There are a couple of things you need to do:
1. Convert the image to black and white or grayscale. You can use the built-in graphics frameworks or a third-party library such as OpenCV or GPUImage for this.
2. Then apply text detection using the Vision framework. From the detection results you can crop the text regions using the coordinates Vision returns.
3. Pass these cropped images (the detected text) to TesseractOCRiOS.
I hope this works for your use case.
Thanks
I had a similar issue. I discovered that Tesseract recognizes text reliably only when the image it is given is cropped to the region of interest.
I solved the problem using Apple's Vision framework. Its VNDetectTextRectanglesRequest returns the CGRect of each piece of detected text in the image. You can then crop the image to the regions where text is present and send those to Tesseract for recognition.
Ray Smith says:
Since HP had independently-developed page layout analysis technology that was used in products, (and therefore not released for open-source) Tesseract never needed its own page layout analysis. Tesseract therefore assumes that its input is a binary image with optional polygonal text regions defined.
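One detail to watch when cropping to the detected text: Vision reports bounding boxes in normalized coordinates (0 to 1) with the origin at the lower-left corner, while most cropping code expects pixels with the origin at the top-left. The arithmetic is sketched below in Python for brevity; the function name and the sample values are made up, and the same few lines translate directly to Swift.

```python
def vision_box_to_pixels(box, image_width, image_height):
    """Convert a normalized, lower-left-origin box to a pixel crop rectangle.

    box: (x, y, width, height), each in the range 0..1, origin at lower-left.
    Returns (left, top, right, bottom) in pixels, origin at top-left,
    clamped to the image bounds.
    """
    x, y, w, h = box
    left = round(x * image_width)
    right = round((x + w) * image_width)
    top = round((1.0 - y - h) * image_height)   # flip the vertical axis
    bottom = round((1.0 - y) * image_height)
    left, right = max(0, left), min(image_width, right)
    top, bottom = max(0, top), min(image_height, bottom)
    return left, top, right, bottom


# Hypothetical detection on a 1000x400 image.
print(vision_box_to_pixels((0.1, 0.2, 0.5, 0.25), 1000, 400))
# (100, 220, 600, 320)
```

Forgetting the vertical flip crops the mirror-image region, which usually contains no text at all.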
I have a 92 page catalogue (one image per page, multiple products per image) and no product codes on the image for each product.
Does anyone know of a Photoshop action that allows entry of a stock code (<15 chars of text) and creates a filled, outlined box with the text inside? It will be awful to have to do them by hand; there are hundreds and hundreds of products.
If all images have the same code, you could record the action which basically creates a macro of your activity for a single image. (Look for the record/playback buttons on the actions pane.)
If all the images have different codes, you might be better off writing (or commissioning) a small script to process the images, such as a PHP script with GD or C# and a graphics object; in both cases reading from a file so it applies the correct code to the image. However this method wouldn't give you an Adobe Photoshop document at the end of the day with an editable text box; it would be a flattened image (such as a TIFF) with the product code already rendered as part of the image.