I am new to Object Detection with Yolo and I have questions regarding the labeling (for custom objects):
Is there any guideline or tips on how to label images to have high accuracy at the end? Anything I have to take care of?
For example what if I have one object twice next to each other like in the following picture:
https://www.manchestereveningnews.co.uk/news/greater-manchester-news/greater-manchester-28-new-buses-17777605.amp
How would you label the black bus? Just the black part or would you assume the whole bus and thus create a box that would include the blue bus as well?
Update:
The COCO dataset contains labeled images covering many complex cases, such as overlapping objects. You may explore the dataset further to find out how they handled different cases.
Another resource, http://vision.stanford.edu/pdf/bbox_submission.pdf.
These links below may help.
PASCAL Visual Object Classes Challenge 2007 (VOC2007) annotation guidelines on what and how to label:
http://host.robots.ox.ac.uk/pascal/VOC/voc2007/guidelines.html
A quote from the link below on labeling best practices:
"For occluded objects, label them entirely. If an object is out of view due to another object being in front of it, label the object out of view as if you could see its entirety. Your model will begin to understand the true bounds of objects this way."
https://blog.roboflow.ai/getting-started-with-labelimg-for-labeling-object-detection-data/
The article below also recommends labeling occluded objects entirely.
https://towardsdatascience.com/how-to-train-a-custom-object-detection-model-with-yolo-v5-917e9ce13208
You can also create labels by color. For example, if there are buses in different colors like black, red, and blue, you can label them with class names like black_bus, red_bus, blue_bus, and default_bus. Accuracy, however, depends on the number of training images: you need thousands of images of each colored bus to get better accuracy.
You can label the example image like this:
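Here is a rough sketch of what the YOLO-format .txt label file could look like; the class names follow the color-naming idea above, and the box coordinates are invented purely for illustration. Note that the black bus's box spans its full extent, including the part hidden behind the blue bus:

```
# classes: 0 = black_bus, 1 = blue_bus
# one object per line: <class_id> <x_center> <y_center> <width> <height>,
# all normalised to [0, 1] relative to the image size (values made up)
0 0.62 0.55 0.58 0.42
1 0.30 0.52 0.45 0.40
```

(A real label file contains only the number lines; the comments are just for readability here.)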
Related
I'm currently an MS student in Medical Physics and I have a great need to be able to overlay an isodose distribution from an RTDOSE file onto a CT image from a .dcm file set.
I've managed to extract the image and the dose pixel arrays myself using pydicom and dicom_numpy, but the two arrays are not the same size! So if I overlay the two, the dose will not sit in the correct position relative to where the Elekta Gamma Plan software placed it.
I've played around with dicompyler and 3DSlicer, and they are obviously able to do this even though the arrays are not the same size. However, I don't think I can export the numerical data from those tools; I can only scroll through and view it as an image. How can I overlay the RTDOSE onto a CT image?
Thank you
For what you want, it sounds like you should use SimpleITK (or an equivalent; my experience is with SITK) to do the DICOM handling, not pydicom.
DICOM has a complete built-in system for specifying the 3D position of all pixel data in patient coordinates. It uses a set of attributes in the DICOM files known as the Image Plane Module tags. See here for a good overview.
SimpleITK fully understands and uses the 3D Image Plane tags to identify and locate any image in patient coordinates by default, irrespective of details such as pixel spacing and slice thickness.
So, in your case, if you use SITK to open your studies, you should be able to overlay them correctly "out of the box", because SITK will do all the work of parsing the Image Plane Module tags and locating the data in patient coordinates, just like 3DSlicer does.
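To make that concrete, here is a minimal sketch (untested; the file paths are placeholders) of resampling an RTDOSE grid onto a CT volume with SimpleITK:

```python
import SimpleITK as sitk

# Read the CT series as one 3D volume; SITK assembles the slices
# using the Image Plane Module tags.
reader = sitk.ImageSeriesReader()
reader.SetFileNames(reader.GetGDCMSeriesFileNames("path/to/ct_series"))
ct = reader.Execute()

# Read the RTDOSE file; origin, spacing, and direction come along for free.
dose = sitk.ReadImage("path/to/rtdose.dcm")

# Resample the dose onto the CT grid so both share the same shape
# and patient coordinates.
dose_on_ct = sitk.Resample(dose, ct, sitk.Transform(),
                           sitk.sitkLinear, 0.0, dose.GetPixelID())

ct_array = sitk.GetArrayFromImage(ct)            # (slices, rows, cols)
dose_array = sitk.GetArrayFromImage(dose_on_ct)  # same shape as ct_array
```

Depending on the reader, you may still need to multiply the dose values by the DoseGridScaling tag (3004,000E) to get dose in Gy.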
Pydicom, in contrast, doesn't try to use any of that information at all. It only gives you the raw pixel arrays (for images).
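For contrast, a pydicom read looks roughly like this (a hedged sketch; the path is a placeholder): you get the raw array and the geometry tags, but combining them into patient coordinates is up to you.

```python
import pydicom

ds = pydicom.dcmread("path/to/rtdose.dcm")
dose = ds.pixel_array * float(ds.DoseGridScaling)  # scale raw values to Gy
origin = ds.ImagePositionPatient   # geometry tags you would have to apply yourself
spacing = ds.PixelSpacing
```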
Note that I use both pydicom and SITK. This isn't a knock on pydicom, but more a question of the right tool for the job. In fact, for many (most?) things I use pydicom, but for any true 3D work, SITK is the easier toolkit to use.
My question might be off topic, but I didn't find a better forum to ask.
I need to change the colour of a product on an eCommerce website. We have many styles and many colours, so taking a picture of every combination is out of the question (about 100 styles and 100 colours would result in 10,000 pictures; we just don't have time to take or manually process that many). However, I could take a picture of every product, plus a picture of one style in every colour, and then write a program that generates all the missing pictures. I was thinking of using something like OpenCV (probably with Python), which provides lots of classic computer vision algorithms off the shelf. I'm sure this is a classic image processing problem. Does it have a name, or are there any algorithms or resources on the topic?
In other words, there are apps and programs that allow you to change the colour of your dress or clothes. Does anybody know how they work, or have useful resources related to this problem?
You separate intensity from colour information. Then you change the colour information and merge the two back together.
This will give you an image with changed colours but maintained brightness. So shadows, highlights and so on stay untouched.
You have to convert your RGB tuples to a colour space that has separate coordinates for intensity and colour, for example Lab:
https://en.wikipedia.org/wiki/Lab_color_space
Of course you may restrict these operations to your "product" so anything else remains unchanged.
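A rough sketch of this with OpenCV (assumptions: the input file name, target colour, and the placeholder mask are all made up; in practice you would segment the product to build the mask):

```python
import cv2
import numpy as np

img = cv2.imread("product.jpg")  # BGR image
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
l_chan, a_chan, b_chan = cv2.split(lab)

# Target colour expressed in Lab (here: pure red in BGR, converted).
target = cv2.cvtColor(np.uint8([[[0, 0, 255]]]), cv2.COLOR_BGR2LAB)[0, 0]

# Placeholder mask covering the whole image; replace it with a real
# segmentation of the product so the background stays untouched.
mask = np.full(img.shape[:2], 255, np.uint8)

# Swap the colour channels inside the mask and keep L (intensity) as-is,
# so shadows and highlights survive the recolouring.
a_chan[mask > 0] = target[1]
b_chan[mask > 0] = target[2]

recoloured = cv2.cvtColor(cv2.merge([l_chan, a_chan, b_chan]), cv2.COLOR_LAB2BGR)
cv2.imwrite("product_recoloured.jpg", recoloured)
```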
I have the following image
What I need to do is connect the edges in MATLAB that obviously belong to the same object, in order to use regionprops later. By 'obviously' I mean the edges of the inner object and those of the outer one. What I thought is that I somehow have to keep the pixels of each edge in a struct, then for each edge find the one closest to it, and then apply some fitting (polynomial, B-spline, etc.). The problem is that I have to do this for thousands of such images, so I need a robust algorithm and cannot do it by hand for all of them. Can somebody help me? The image from which the previous one was obtained is this one. Ideally I have to capture the two interfaces shown there.
Thank you very much in advance
I can use scikit-learn to train a model and recognize objects, but I also need to be able to tell where in my test images the object resides. Is there some way I could get the coordinates of the part of the test image that contains the object I'm trying to recognize?
If not, please refer me to some other library that'll help me achieve this task.
Thank you
I assume that you are talking about a computer vision application. Usually, the way a box is drawn around an identified object is by using a sliding window and running your classifier on each window as it steps across the image. You can keep track of which windows come back with positive results and use those windows as your bounds. You may wish to use windows of various sizes if the object's scale changes from image to image. In that case, you would likely want to prefer the smaller of two overlapping windows.
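An illustrative sketch of that idea (assumptions: a binary scikit-learn classifier trained on flattened fixed-size windows, a grayscale numpy image, and arbitrary window/step sizes):

```python
import numpy as np

def sliding_window_detections(image, clf, win=64, step=16):
    """Return (x, y, win, win) boxes where clf predicts the object.

    image: 2D numpy array; clf: any trained scikit-learn classifier
    whose predict() accepts a flattened win*win window.
    """
    boxes = []
    h, w = image.shape[:2]
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            window = image[y:y + win, x:x + win]
            if clf.predict(window.reshape(1, -1))[0] == 1:
                boxes.append((x, y, win, win))
    return boxes

# For scale changes, rerun this on resized copies of the image and map
# the boxes back, preferring the smaller of two overlapping windows.
```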
I have a few images. Some of them contain text, and a few others don't contain any text at all. I want a robust algorithm that can conclude whether an image contains text or not.
Even probabilistic algorithms are fine.
Can anyone suggest such an algorithm?
Thanks
There are some specifics that you'll want to pin down:
Will there be much text in the image? Or just a character or two?
Will the text be oriented properly? Or does rotation also need to be performed?
How big will you expect the text to be?
How similar to the text will the background be?
Since images can vary significantly, you want to define the problem and find as many constraints as you can to make the problem as simple as possible. It's a difficult problem.
For such an algorithm you'll want to focus on what makes text unique from the background (consistent spacing between characters and lines, consistent height, consistent baseline, etc.). There's an area of research in "text detection" that you'll want to investigate, and you'll find a number of algorithms there. Two surveys of some of these methods can be found here and here.
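As a crude starting point, here is a hedged heuristic sketch using OpenCV's MSER detector (a classic building block of text detectors); the size and aspect-ratio thresholds are arbitrary assumptions you would need to tune for your images:

```python
import cv2

def probably_has_text(path, min_regions=10):
    """Flag an image as containing text when enough MSER regions
    have character-like sizes and aspect ratios (rough heuristic)."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    mser = cv2.MSER_create()
    regions, _ = mser.detectRegions(gray)
    char_like = 0
    for pts in regions:
        x, y, w, h = cv2.boundingRect(pts)
        aspect = w / float(h)
        # Heuristic: characters are small-ish and roughly upright.
        if 5 < h < gray.shape[0] // 2 and 0.1 < aspect < 2.0:
            char_like += 1
    return char_like >= min_regions
```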