I'm trying to train an image segmentation model on the CamVid dataset. I noticed that a couple of papers (ESPNet and SegNet) both used a condensed version of the CamVid dataset (12 classes instead of 32).
I can't find the labels for the 12-class version of the dataset. The SegNet tutorial contains masks where black is the void class and everything else is one color.
Is there a dataset readily available with the 12 class masks? Or is there a breakdown of how the 32 classes were converted to 12? Thanks!
https://github.com/lih627/CamVid
It contains a breakdown of the original 32 -> 12 class mapping.
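As a rough illustration, once you have the color-to-class table from that repo, the remapping itself is a few lines of NumPy. This is a minimal sketch, not the repo's own code; the two entries shown are placeholders for the full 32-entry table:

import numpy as np

# Placeholder mapping from 32-class RGB colors to 12-class ids;
# the full table is in the linked repo.
COLOR_TO_ID = {
    (128, 128, 128): 0,  # Sky
    (128, 0, 0): 1,      # Building
    # ... remaining colors
}

def remap_mask(rgb_mask, void_id=11):
    """Convert a 32-class RGB mask (H, W, 3) into a 12-class id mask (H, W)."""
    out = np.full(rgb_mask.shape[:2], void_id, dtype=np.uint8)
    for color, class_id in COLOR_TO_ID.items():
        out[np.all(rgb_mask == color, axis=-1)] = class_id
    return out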
For automating tests of a legacy Windows application, I need to compare screenshots of charts.
Pixel comparison works fine as long as the Windows session opens with the same resolution, DPI, color depth, font size, font family, etc.; otherwise the screenshot taken during the test may differ slightly from the one recorded during development of the test.
Therefore, I am looking for a method that tolerates slight variations and produces a score rather than a boolean.
I started by scaling the retrieved screenshot to match the size of the recorded one; of course, pixel comparison then fails.
Then I tried using SSIM to get a similarity score (using https://github.com/rhys-e/structural-similarity). It definitely does not work for my case -- see the simplified experiment below.
Any suggestions?
Thanks in advance,
Adrian.
SSIM experiments
This is the reference picture:
This one contains a black line slightly above its position in the reference --> getting 0.9447093986742424
This one is completely different --> getting 0.9516260505445076
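For reference, a minimal sketch of reproducing this kind of SSIM score in Python with scikit-image (the file names are placeholders; both images are assumed to be the same size and loaded as grayscale):

import cv2
from skimage.metrics import structural_similarity as ssim

ref = cv2.imread('reference.png', cv2.IMREAD_GRAYSCALE)
test = cv2.imread('candidate.png', cv2.IMREAD_GRAYSCALE)

# SSIM ranges over [-1, 1]; 1.0 means the images are identical.
score = ssim(ref, test)
print(score)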
I am new to Object Detection with Yolo and I have questions regarding the labeling (for custom objects):
Is there any guideline or tips on how to label images to have high accuracy at the end? Anything I have to take care of?
For example, what if I have the same object twice, right next to each other, like in the following picture:
https://www.manchestereveningnews.co.uk/news/greater-manchester-news/greater-manchester-28-new-buses-17777605.amp
How would you label the black bus? Just the visible black part, or would you assume the whole bus and thus create a box that includes part of the blue bus as well?
Update:
Below are two examples of images labeled in the COCO dataset that show complex cases. You can explore the dataset further to see how they handled different cases.
Another resource: http://vision.stanford.edu/pdf/bbox_submission.pdf.
Image 1:
Image 2:
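If you want to inspect such cases programmatically rather than by browsing, here is a rough sketch using pycocotools (the annotation-file path is an assumption; you would download it from the COCO site):

from pycocotools.coco import COCO

coco = COCO('instances_val2017.json')

# Find all images that contain buses.
cat_ids = coco.getCatIds(catNms=['bus'])
img_ids = coco.getImgIds(catIds=cat_ids)

# Print the bus bounding boxes annotated for the first such image.
ann_ids = coco.getAnnIds(imgIds=img_ids[0], catIds=cat_ids)
for ann in coco.loadAnns(ann_ids):
    print(ann['bbox'])  # [x, y, width, height] in pixels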
The links below may help.
PASCAL Visual Object Classes Challenge 2007 (VOC2007) annotation guidelines on what and how to label:
http://host.robots.ox.ac.uk/pascal/VOC/voc2007/guidelines.html
A quote from the link below on labeling best practices:
"For occluded objects, label them entirely. If an object is out of view due to another object being in front of it, label the object out of view as if you could see its entirety. Your model will begin to understand the true bounds of objects this way."
https://blog.roboflow.ai/getting-started-with-labelimg-for-labeling-object-detection-data/
The article below also states that occluded objects should be labeled entirely.
https://towardsdatascience.com/how-to-train-a-custom-object-detection-model-with-yolo-v5-917e9ce13208
You can also create labels by color. For example, if there are buses in different colors like black, red, and blue, you can label them with names like black_bus, red_bus, blue_bus, and default_bus. But accuracy depends on the number of training images; you would need thousands of images of each colored bus to get better accuracy.
You can label the example image like this:
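For reference, in the standard darknet/YOLO label format each object is one line of class_id x_center y_center width height, with coordinates normalized to [0, 1]. A made-up example for the two buses, assuming the class list above in order (black_bus = 0, blue_bus = 2); the numbers are illustrative only:

0 0.42 0.55 0.30 0.40
2 0.68 0.52 0.35 0.45

Note that the first box covers the black bus's full extent, including the part hidden behind the blue bus, in line with the occlusion guideline quoted earlier.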
I have the following issue:
1. starting from an image (for example, a picture of my food pantry), I have to split it into sub-images according to the different objects contained in it;
2. after having isolated each object (and obtained its image), preprocess the image with the split() function, dividing the channels;
3. detect the text contained in each image (for example "peanuts" on the peanuts pack) and read it (using the tesseract libs);
4. detect symbols in each image (for example the coca-cola logo on a coca-cola bottle) using a SURF library...
Having explained my purpose, the question is:
how can I perform activity 1?
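For activity 1, a minimal sketch of one common approach: thresholding plus contour detection with OpenCV. The file name, the Otsu binarization, and the minimum-area filter are assumptions you would tune for your own photos:

import cv2

img = cv2.imread('pantry.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Otsu's threshold to separate objects from the background.
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Take the outer contour of each object and crop its bounding box
# (OpenCV 4.x return signature).
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
crops = []
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w * h > 1000:  # skip tiny specks; tune for your images
        crops.append(img[y:y + h, x:x + w])

Each element of crops is then a sub-image you can feed into activities 2-4.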
I asked a similar question before; I tried using watershed to segment the connected characters, but it did not work well. A week ago I found the same question on Stack Overflow via a Google search: Segmentation for connected characters.
In the answers there, the author mmgp provides a solution that uses a morphological method and a closing operation, but I do not understand all of it.
So far I have just thinned the image using hit-and-miss morphology.
The original image:
The thinned image, and an enlarged version of it:
With 4-connectivity the digit 9 can be split off as an individual character, but the 44 is still connected.
I have some questions about Segmentation for connected characters:
1. Why does the original image need to be resized to 200 pixels before thinning? Why not thin the original image directly?
2. How do I extract the branch points and apply a morphological closing to the thinned image?
I only know that closing is a combined erosion-and-dilation operation. The closing uses a vertical line of height 2*h+1 (is that the structuring element's height?); I don't know how to set it, or how to construct the structuring element (3x3 or something else?).
Finally, they get the following image:
I need some help; can someone tell me how to apply the closing operation to get the image above?
Thanks.
I have solved this problem using foreground features and background features.
Papers with details about this algorithm:
A genetic framework using contextual knowledge for segmentation and recognition of handwritten numeral strings
Segmentation of Handwritten Numeral Strings in Farsi and English Languages
The following images are my captures:
foreground region and foreground skeleton
background region and background skeleton
the skeleton image for the 44
Based on the above feature points, we can construct a segmentation path to split the 449 digits.
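For anyone trying to reproduce the feature extraction, a rough sketch of computing the foreground and background skeletons with OpenCV's contrib thinning (requires opencv-contrib-python; the file name is a placeholder, and the segmentation-path construction from the papers is not shown):

import cv2

# Binarize so the digits are white (255) on a black (0) background.
img = cv2.imread('449.png', cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Foreground skeleton: thin the digits themselves.
fg_skeleton = cv2.ximgproc.thinning(binary)

# Background skeleton: thin the space around the digits.
bg_skeleton = cv2.ximgproc.thinning(cv2.bitwise_not(binary))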
Use the method below for the closing operation (img is the thinned binary image; note that cv2.getStructuringElement takes its size as (width, height), so a vertical line of height 2*h+1 is (1, 2*h+1)):

import cv2

h = 5  # height of the gap to bridge; tune for your image
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 2 * h + 1))
closed_img = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)
I'm scanning a lot of A3 documents using a standard Brother A3 multifunction device and then using FineReader Pro to OCR the images.
However, I'm getting a lot of errors in the recognized characters, and lots of strange non-alphanumeric characters.
Can someone give me any tips for programmatically improving the OCR accuracy, either by pre-processing the scanned images or by post-processing the recognized text?
Edit: here is a sample PDF. It includes some sample images from which I get the poorest results.
Do you have a sample image you can post somewhere? Then we can quickly tell you what is causing most of your problems. FineReader is one of the better OCR engines out there, so there are definitely reasons why you are getting poor results.
It could be related to poor contrast and threshold settings, image skew, dirty rollers in the scanner, complex or coloured backgrounds, dithered backgrounds, font sizes that are too small, scanning DPI that is too low, etc.
After seeing the attached image, there are a few issues:
1. There are lots of dirty specks on the background page. FineReader seems to do a reasonable job with these on your images.
2. There is some slight skew, but that is not causing any problems.
3. FineReader is getting confused by the bold, tall Arial-type font used for the column headers.
4. A big problem seems to be the bottom region of the pages, where the contrast is poor and the image is fuzzy. This seems to be a problem with the scanner but could be due to printing problems.
The printing is quite poor, and I am guessing it is a scan from a newspaper. Most of your errors are due to scanning issues, so it would be hard to improve the results programmatically.
First, I would try scanning the image in grayscale at a slightly higher resolution and see if that helps. FineReader works well with grayscale images. If you must have a black-and-white image, see if the scanner driver includes a dynamic thresholding setting and turn it on.
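If the driver doesn't offer that, a rough sketch of a similar pre-processing pass you could do yourself with OpenCV before handing the page to FineReader; the denoising strength, block size, and constant are assumptions to tune per scan:

import cv2

page = cv2.imread('scan.png', cv2.IMREAD_GRAYSCALE)

# Light denoising helps with the dirty specks before thresholding.
page = cv2.fastNlMeansDenoising(page, h=10)

# A locally varying threshold, similar in spirit to the scanner's
# dynamic thresholding; tune the block size (31) and constant (15).
binary = cv2.adaptiveThreshold(page, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY, 31, 15)
cv2.imwrite('scan_preprocessed.png', binary)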
Your images would not be an easy task for any OCR engine. You will get better results if you can improve the scanning. Page 3 has a lot of noise in the bottom right corner.
What version of FineReader are you using? FR10 would probably give better results than previous versions.