I need a solution to compare two scanned images.
I have an image of an unfilled application form. I need to compare it against other scans of the same form and detect whether a given scan is still completely unfilled.
I tried Emgu CV's AbsDiff, MatchTemplate, etc., but none of them gives a 100% match, even when I scan the same form twice on the same scanner, presumably because of scanning noise. I could apply a tolerance, but the problem is that I need to find out whether the user has filled in anything at all; with a tolerance, small changes in the form would go undetected.
I also had a look at the Python Imaging Library (PIL), Accord.NET, etc., but couldn't find an approach for comparing this type of image.
Any suggestions on how to do this type of image comparison?
Is there any free or paid library available for this?
Only inspect the form fields. Without distractions it's easier to detect small changes. You also don't have to inspect each pixel. Use the histogram or mean color. I used SimpleCV:
from SimpleCV import *

form = Image("form.png")
form_field = form.crop(34, 44, 200, 30)  # x, y, width, height of the field
mean_color = form_field.meanColor()[0]  # For simplicity only red channel.
if mean_color < 253:
    print "Field is filled"
else:
    print "Field is empty"
Alternatively look for features. E.g. corners, blobs or key points. Key points will ignore noise and will work better with poor scans:
key_points = form_field.findKeypoints()
if key_points:
    print "Field is filled"
    key_points.show(autocolor=True)
else:
    print "Field is empty"
I want to detect digits on a display. For that I am using a custom 19-class dataset. The chosen model is YOLOv5-X, with a resolution of 640x640. Some of the objects are:
0-9 digits
Some text as objects
Total --> 17 classes
I am having problems detecting all the digits when I want to read, for example, 23, 28 or 22: when the digits are very close to each other the model struggles.
I am using Roboflow to create different folders in which I apply some preprocessing, so I have full control over what I feed into the model. All of them are checked and placed in a new folder called TRAIN_BASE. In total I have 3500 images with digits, and most of the variance is in hue and brightness.
Any advice on how to make the model catch all the digits even when they are very close to each other?
Here are the steps I follow:
First of all, using a mosaic dataset was not a good choice for the purpose of detecting digits on a display, because in a real scenario I was never going to see pieces of digits; that made the model fail to recognise some digits when it was not sure.
(Image: example of the digits problem concept)
Another big improvement was changing the anchor boxes of the YOLO model to adapt them to small objects. To find out which anchor boxes I needed, adding this argument to train.py in the script provided by Ultralytics is enough to print custom anchors, which you can then add to your custom architecture.
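In case it is useful, suitable anchors can also be estimated offline with a simple k-means over the labelled box sizes. Below is a minimal sketch of that idea (not the Ultralytics script itself), assuming the boxes are available as (width, height) pairs in pixels at the training resolution; YOLOv5 also runs an automatic anchor check at training time, so treat this mainly as a way to inspect your data.

# Minimal sketch: estimate anchor boxes by k-means over labelled box sizes.
# Assumes `box_sizes` is an (N, 2) array of (width, height) in pixels,
# already scaled to the training resolution (e.g. 640x640).
import numpy as np
from sklearn.cluster import KMeans

def estimate_anchors(box_sizes, num_anchors=9, seed=0):
    kmeans = KMeans(n_clusters=num_anchors, random_state=seed, n_init=10)
    kmeans.fit(box_sizes)
    anchors = kmeans.cluster_centers_
    return anchors[np.argsort(anchors.prod(axis=1))]  # sorted small to large

# Made-up sizes for small digits on a display:
box_sizes = np.array([[12, 20], [14, 22], [10, 18], [30, 40], [28, 38], [60, 80]])
print(estimate_anchors(box_sizes, num_anchors=3))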
To check which augmentations are helpful and which are not, the following article explains it quite visually.
P.S.: Thanks for the quick help the community gave me.
I am working on a project where my task is to identify a machine part by its part number, which is either written on a label attached to the part or engraved on its surface. One example of each (a label and an engraved part) is shown in the figures below.
My task is to recognise the 9- or 10-character alphanumeric number (03C 997 032 D in the first image, 357 955 531 in the second). This seems like an easy task; however, I am having trouble distinguishing the useful information from the rest of the part: there are many other numbers and characters in both images, and I want to focus only on the numbers mentioned. I have tried many things but with no success so far. Does anyone know which image pre-processing methods or ML/DL models I should apply to get the desired result?
Thanks in advance!
JD
You can use OCR to get all the characters from the image and then use regular expressions to extract the desired patterns.
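For example, here is a minimal sketch assuming pytesseract and Pillow are installed and that the part numbers follow the pattern seen above (three groups of three alphanumeric characters, optionally followed by a single letter); the file name and the regular expression are placeholders to adapt to your data.

# Minimal sketch: OCR the whole image, then filter the text with a regex.
import re
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open("part_label.png"))

# Three space-separated groups of three characters, optional trailing letter,
# e.g. "03C 997 032 D" or "357 955 531".
pattern = re.compile(r"\b[0-9A-Z]{3} [0-9A-Z]{3} [0-9A-Z]{3}(?: [A-Z])?\b")
print(pattern.findall(text))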
You can use an OCR engine, like Tesseract.
You may want to clean the images before running the text-recognition system by applying some filtering to remove noise and extra information, such as the following (a sketch follows this list):
Convert to grayscale (colors are not relevant, are they?)
Crop to the region of interest
Apply a Canny filter
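A minimal sketch of these clean-up steps with OpenCV; the file name and the crop coordinates are placeholders that depend on where the label sits in your images.

# Minimal sketch of the filtering steps listed above.
import cv2

image = cv2.imread("part_label.png")

# 1. Convert to grayscale (colour information is not needed).
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# 2. Crop to the region of interest (placeholder coordinates).
x, y, w, h = 100, 50, 400, 120
roi = gray[y:y + h, x:x + w]

# 3. Edge detection with the Canny filter.
edges = cv2.Canny(roi, 50, 150)

cv2.imwrite("roi_edges.png", edges)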
A good starting point is one of these tutorials:
OpenCV OCR with Tesseract (Python API)
Recognizing text/number with OpenCV (C++ API)
I'm trying to visualize collisions and other events, and I am searching for the best way to update color or other visual element properties after registration with RegisterVisualGeometry.
I've found the GeometryInstance class, which seems like a promising entry point for changing mutable illustration properties, but I have yet to find an example where an instance is retrieved from the plant (from a GeometryId obtained from something like GetVisualGeometriesForBody?) and its properties are changed.
As a basic example, I want to change the color of a box's visual geometry when two seconds have passed. I register the geometry pre-finalize with
// box : Body added to plant
// X_WA : Identity transform
// FLAGS_box_l : box side length
geometry::GeometryId box_visual_id = plant.RegisterVisualGeometry(
    box, X_WA,
    geometry::Box(FLAGS_box_l, FLAGS_box_l, FLAGS_box_l),
    "BoxVisualGeometry",
    Eigen::Vector4d(0.7, 0.5, 0, 1));
Then, I have a while loop that creates a timed event at two seconds, at which point I would like the box to change its color.
double current_time = 0.0;
const double time_delta = 0.008;
bool changed(false);
while (current_time < FLAGS_duration) {
  if (current_time > 2.0 && !changed) {
    std::cout << "Change color for id " << box_visual_id.get_value() << "\n";
    // Change color of box using its GeometryId
    changed = true;
  }
  simulator.StepTo(current_time + time_delta);
  current_time = simulator_context.get_time();
}
Eventually I'd like to call something like this with a more specific trigger like proximity to another object, or velocity, but for now I'm not sure how I would register a simple visual geometry change.
Thanks for the details. This is sufficient for me to provide a meaningful answer of the current state of affairs as well as the future (both near- and far-term plans).
Taking your question as a representative example, changing a visual geometry's color can mean one of two things:
The color of the object changes in an "attached" visualizer (drake_visualizer being the prime example).
The color of the object changes in a simulated rgb camera (what is currently dev::RgbdCamera, but imminently RgbdSensor).
Depending on what other properties you might want to change mid simulation, there might be additional subtleties/nuances. But using the springboard above, here are the details:
A. Up until recently (drake PR 11796), changing properties after registration wasn't possible at all.
B. PR 11796 was the first step in enabling that. However, it only enables it for changing ProximityProperties. (ProximityProperties are associated with the role geometry plays in proximity queries -- contact, signed distance, etc.)
C. Changing PerceptionProperties is a TODO in that PR and will follow in the next few months (single digit unless a more pressing need arises to bump it up in priority). (PerceptionProperties are associated with the properties geometry has in simulated sensors -- how they appear, etc.)
D. Changing IllustrationProperties is not supported and it is not clear what the best/right way to do so may be. (IllustrationProperties are what get fed to an external visualizer like drake_visualizer.) This is the trickiest, due to the way the LCM communication is currently articulated.
So, when we compare possible implications of changing an object's color (1 or 2, above) with the state of the art and near-term art (C & D, above), we draw the following conclusions:
In the near future, you should be able to change it in a synthesized RGB image.
No real plan for changing it in an external visualizer.
(Sorry, it seems the answer is more along the lines of "oops...you can't do that".)
I want to do binary image classification and I thought ImageJ would be a good platform to do it, but alas, I am at a loss. The pseudocode for what I want to do is below. How do I implement this in ImageJ? It does not have to follow the pseudocode exactly, I just hope that gives an idea of what I want to do. (If there is a plugin or a more suitable platform that does this already, can you point me to that?)
For training:
    train_directory = get folder from user
    train_set = get all image files in train_directory
    class_file = get text file from user
    // each row of class_file contains image name and a "1" or "-1"
    // a "1" indicates that image belongs to the positive class
    model = build_model_from_train_set_and_class_file
    write_model_to_output_file
// end train
For testing:
    test_directory = get folder from user
    test_set = get all images in test_directory
    if (user wants to)
        class_file = get text file from user
        // normally for testing one would always know the class of the image
        // the if statement is just in case the user does not
    model = read_model_file
    output = classify_images_as_positive_or_negative
    write_output_to_file
Note that there should not be any preprocessing done by the user: no additional set of masks, no drawings, no additional labels, etc. The algorithm must be able to figure out what is common among the training images from the training images alone and build a model accordingly. Of course, the algorithm can do any preprocessing it wants to; it just cannot rely on that preprocessing having been done already.
I tried using CVIPtools for this but it wants a mask on all of the images to do feature extraction.
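For reference, here is a minimal sketch of the workflow the pseudocode describes, written with Pillow and scikit-learn rather than ImageJ; the grey-level histogram feature is deliberately simple and only illustrates the plumbing, not a model that would actually perform well.

# Minimal sketch of the train/test workflow from the pseudocode above.
import os
import pickle
import numpy as np
from PIL import Image
from sklearn.svm import LinearSVC

def load_labels(class_file):
    # Each row: "<image name> <1 or -1>".
    labels = {}
    with open(class_file) as f:
        for line in f:
            name, label = line.split()
            labels[name] = int(label)
    return labels

def features(path, bins=32):
    # Grey-level histogram as a crude image descriptor.
    img = np.asarray(Image.open(path).convert("L"))
    hist, _ = np.histogram(img, bins=bins, range=(0, 255), density=True)
    return hist

def train(train_directory, class_file, model_file):
    labels = load_labels(class_file)
    names = sorted(labels)
    X = [features(os.path.join(train_directory, n)) for n in names]
    y = [labels[n] for n in names]
    model = LinearSVC().fit(X, y)
    with open(model_file, "wb") as f:
        pickle.dump(model, f)

def test(test_directory, model_file, output_file):
    with open(model_file, "rb") as f:
        model = pickle.load(f)
    with open(output_file, "w") as out:
        for name in sorted(os.listdir(test_directory)):
            pred = model.predict([features(os.path.join(test_directory, name))])[0]
            out.write("%s %d\n" % (name, pred))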
I am a little curious about the cute little kaleidoscopic images associated with each user on this site.
How are those generated? Possibilities are:
A list of images is already there in some folder and it is chosen randomly.
The image is generated whenever a user registers.
In any case, I am more interested in what kind of algorithm is used to generate such images.
It's called an Identicon. If you entered an e-mail address, it's based on a hash of that address. If you didn't enter one, it's based on your IP address.
Jeff posted some .NET code to generate IP based Identicons.
It's usually generated from a hash of either a user name, an e-mail address, or an IP address.
Stack Overflow uses Gravatar to do the image generation.
As far as I know the idea came from Don Parks, who writes about the technique he uses.
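For illustration, requesting an avatar from Gravatar with an identicon fallback only requires an MD5 hash of the trimmed, lower-cased e-mail address; a minimal sketch:

# Minimal sketch: build a Gravatar URL that falls back to an identicon.
import hashlib

def gravatar_identicon_url(email, size=64):
    digest = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
    return "https://www.gravatar.com/avatar/%s?d=identicon&s=%d" % (digest, size)

print(gravatar_identicon_url("someone@example.com"))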
IIRC, it's generated from an IP address.
"IP Hashing" I believe it's called.
I remember reading about it on a blog; he made the code available for download. I have no idea where it was from, however. :(
The images are produced by Gravatar, and details of them are outlined here; however, they do not reveal how they are doing it.
I bet each tiny tile image is given a set of other tile images it looks good with. Think of a graph with the tiles as nodes. You pick a random node for the corner and fill its adjacent spots with partners, then rotate it and apply the same pattern four times. Then pick a color.
Instead of a graph, it could also be a square matrix in which each row represents an image, each column represents an image, and cell values are weights.
I believe the images are a 4×4 grid with the upper 2×2 grid repeated 4 times clockwise, just each time rotated 90 degrees, again clockwise. Seems the two colours are chosen randomly, and each 1×1 block is chosen from a predefined set.
EDIT: obviously my answer was ad hoc. Nice to know about identicons.
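Here is a minimal sketch of the "quadrant repeated with rotations" idea described above, assuming Pillow; the cell layout and colour choice are made up for illustration and are not Gravatar's actual scheme.

# Minimal sketch: derive a colour and a 2x2-cell quadrant from a hash,
# then rotate the quadrant into the other three corners of a 4x4 grid.
import hashlib
from PIL import Image, ImageDraw

def identicon(text, cell=32):
    digest = hashlib.md5(text.encode("utf-8")).digest()
    colour = (digest[0], digest[1], digest[2])
    quadrant = Image.new("RGB", (2 * cell, 2 * cell), "white")
    qdraw = ImageDraw.Draw(quadrant)
    for i in range(4):  # one hash byte decides whether each cell is filled
        if digest[3 + i] % 2:
            x, y = (i % 2) * cell, (i // 2) * cell
            qdraw.rectangle([x, y, x + cell - 1, y + cell - 1], fill=colour)
    img = Image.new("RGB", (4 * cell, 4 * cell), "white")
    corners = [(0, 0), (2 * cell, 0), (2 * cell, 2 * cell), (0, 2 * cell)]
    for k, corner in enumerate(corners):
        img.paste(quadrant.rotate(90 * k), corner)
    return img

identicon("someone@example.com").save("identicon.png")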
Try this: http://www.docuverse.com/blog/9block?code=(32-bit integer)8&size=(16|32|64)
substituting appropriate numbers for the parenthesized items.