Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
Improve this question
I need the ability to verify that a user has drawn a shape correctly, starting with simple shapes like circle, triangle and more advanced shapes like the letter A.
I need to be able to calculate correctness in real time, for example if the user is supposed to draw a circle but is drawing a rectangle, my hope is to be able to detect that while the drawing takes place.
There are a few different approaches to shape recognition, unfortunately I don't have the experience or time to try them all and see what works.
Which approach would you recommend for this specific task?
Your help is appreciated.
We may define "recognition" as the ability to detect features/characteristics in elements and compare them with features of known elements seen in our experience. Objects with similar features probably are similar objects. The higher the amount and complexity of the features, the greater is our power to discriminate similar objects.
In the case of shapes, we can use their geometric properties such as number of angles, the angles values, number of sides, sides sizes and so forth. Therefore, in order to accomplish your task you should employ image processing algorithms to extract such features from the drawings.
Below I present a very simple approach that shows this concept in practice. We gonna recognize different shapes using the numbers of corners. As I said: "The higher the amount and complexity of the features, the greater is our power to discriminate similar objects". Since we are using just one feature, the number of corners, we can differentiate a few different kinds of shapes. Shapes with the same number of corners will not be discriminated. Therefore, in order to improve the approach you might add new features.
UPDATE:
In order to accomplish this task in real time you might extract the features in real time. If the object to be drawn is a triangle and the user is drawing the fourth side of any other figure, you know that he or she is not drawing a triangle. About the level of correctness you might calculate the distance between the feature vector of the desired object and the drawn one.
Input:
The Algorithm
Scale down the input image since the desired features can ben detected in lower resolution.
Segment each object to be processed independently.
For each object, extract its features, in this case, just the number of corners.
Using the features, classify the object shape.
The Software:
The software presented below was developed in Java and using Marvin Image Processing Framework. However, you might use any programming language and tools.
import static marvin.MarvinPluginCollection.floodfillSegmentation;
import static marvin.MarvinPluginCollection.moravec;
import static marvin.MarvinPluginCollection.scale;
public class ShapesExample {
public ShapesExample(){
// Scale down the image since the desired features can be extracted
// in a lower resolution.
MarvinImage image = MarvinImageIO.loadImage("./res/shapes.png");
scale(image.clone(), image, 269);
// segment each object
MarvinSegment[] objs = floodfillSegmentation(image);
MarvinSegment seg;
// For each object...
// Skip position 0 which is just the background
for(int i=1; i<objs.length; i++){
seg = objs[i];
MarvinImage imgSeg = image.subimage(seg.x1-5, seg.y1-5, seg.width+10, seg.height+10);
MarvinAttributes output = new MarvinAttributes();
output = moravec(imgSeg, null, 18, 1000000);
System.out.println("figure "+(i-1)+":" + getShapeName(getNumberOfCorners(output)));
}
}
public String getShapeName(int corners){
switch(corners){
case 3: return "Triangle";
case 4: return "Rectangle";
case 5: return "Pentagon";
}
return null;
}
private static int getNumberOfCorners(MarvinAttributes attr){
int[][] cornernessMap = (int[][]) attr.get("cornernessMap");
int corners=0;
List<Point> points = new ArrayList<Point>();
for(int x=0; x<cornernessMap.length; x++){
for(int y=0; y<cornernessMap[0].length; y++){
// Is it a corner?
if(cornernessMap[x][y] > 0){
// This part of the algorithm avoid inexistent corners
// detected almost in the same position due to noise.
Point newPoint = new Point(x,y);
if(points.size() == 0){
points.add(newPoint); corners++;
}else {
boolean valid=true;
for(Point p:points){
if(newPoint.distance(p) < 10){
valid=false;
}
}
if(valid){
points.add(newPoint); corners++;
}
}
}
}
}
return corners;
}
public static void main(String[] args) {
new ShapesExample();
}
}
The software output:
figure 0:Rectangle
figure 1:Triangle
figure 2:Pentagon
The other way is you can use math with this problem using the average of each point that are smallest distance from the one your'e comparing it from,
first you must resize shape with the ones in your library of shapes and then:
function shortestDistanceSum( subject, test_subject ) {
var sum = 0;
operate( subject, function( shape ){
var smallest_distance = 9999;
operate( test_subject, function( test_shape ){
var distance = dist( shape.x, shape.y, test_shape.x, test_shape.y );
smallest_distance = Math.min( smallest_distance, distance );
});
sum += smallest_distance;
});
var average = sum/subject.length;
return average;
}
function operate( array, callback ) {
$.each(array, function(){
callback( this );
});
}
function dist( x, y, x1, y1 ) {
return Math.sqrt( Math.pow( x1 - x, 2) + Math.pow( y1 - y, 2) );
}
var square_shape = Array; // collection of vertices in a square shape
var triangle_shape = Array; // collection of vertices in a triangle
var unknown_shape = Array; // collection of vertices in the shape your'e comparing from
square_sum = shortestDistanceSum( square_shape, unknown_shape );
triangle_sum = shortestDistanceSum( triangle_shape, unknown_shape );
Where the lowest sum is the closest shape.
You have two inputs - the initial image and the user input - and you are looking for a boolean outcome.
Ideally you would convert all your input data to a comparable format. Instead, you could also parameterize both types of input and use a supervised machine learning algorithm (Nearest Neighbor comes to mind for closed shapes).
The trick is in finding the right parameters. If your input is a flat image file, this could be a binary conversion. If user input is a swiping motion or pen stroke, I'm sure there are ways to capture and map this as binary but the algorithm would probably be more robust if it used data closest to the original input.
Related
Because of the Moodle-STACK environment I am currently limited to JSXGraph 0.99.7. Is there a way to get the union of two curves given by coordinate vectors (polygons) in that version?
In 1.2.1 I do this using Clip.union(), which works fine in jsfiddle (not exactly a minimum working example) but not in STACK.
this.b = board.create('curve', JXG.Math.Clip.union( bneu, this.b, board),
{opacity: true, fillcolor:'lightgray', strokeWidth: normalStyle.strokeWidth,
strokeColor: normalStyle.strokeColor});
In 0.99.7 you have to do the union by hand. As long as the shapes are not overlapping this might be doable without too much work. Define a curve and set its
updataDataArray method:
c = board.create('curve', [[], []]);
c.updateDataArray = function() {
this.dataX = [];
this.dataY = [];
// copy now the coordinates of the polygons / curves into
// these arrays.
};
board.update();
You can access the coordinates of the polygon vertices by
polygon.vertices[i].X();
polygon.vertices[i].Y();
Attention: the last vertex is a copy of the first vertex, to make the polygon a closed curve.
The coordinates of a curve can be accessed by
curve.points[i].usrCoords[1]; // x-coordinate
curve.points[i].usrCoords[2]; // y-coordinate
The curve path can be interrupted by adding NaNs:
this.dataX.push(NaN);
this.dataY.push(NaN);
Hope that helps a little bit.
I am currently using a Project Tango tablet for robotic obstacle avoidance. I want to create a matrix of z-values as they would appear on the Tango screen, so that I can use OpenCV to process the matrix. When I say z-values, I mean the distance each point is from the Tango. However, I don't know how to extract the z-values from the TangoXyzIjData and organize the values into a matrix. This is the code I have so far:
public void action(TangoPoseData poseData, TangoXyzIjData depthData) {
byte[] buffer = new byte[depthData.xyzCount * 3 * 4];
FileInputStream fileStream = new FileInputStream(
depthData.xyzParcelFileDescriptor.getFileDescriptor());
try {
fileStream.read(buffer, depthData.xyzParcelFileDescriptorOffset, buffer.length);
fileStream.close();
} catch (IOException e) {
e.printStackTrace();
}
Mat m = new Mat(depthData.ijRows, depthData.ijCols, CvType.CV_8UC1);
m.put(0, 0, buffer);
}
Does anyone know how to do this? I would really appreciate help.
The short answer is it can't be done, at least not simply. The XYZij struct in the Tango API does not work completely yet. There is no "ij" data. Your retrieval of buffer will work as you have it coded. The contents are a set of X, Y, Z values for measured depth points, roughly 10000+ each callback. Each X, Y, and Z value is of type float, so not CV_8UC1. The problem is that the points are not ordered in any way, so they do not correspond to an "image" or xy raster. They are a random list of depth points. There are ways to get them into some xy order, but it is not straightforward. I have done both of these:
render them to an image, with the depth encoded as color, and pull out the image as pixels
use the model/view/perspective from OpenGL and multiply out the locations of each point and then figure out their screen space location (like OpenGL would during rendering). Sort the points by their xy screen space. Instead of the calculated screen-space depth just keep the Z value from the original buffer.
or
wait until (if) the XYZij struct is fixed so that it returns ij values.
I too wish to use Tango for object avoidance for robotics. I've had some success by simplifying the use case to be only interested in the distance of any object located at the center view of the Tango device.
In Java:
private Double centerCoordinateMax = 0.020;
private TangoXyzIjData xyzIjData;
final FloatBuffer xyz = xyzIjData.xyz;
double cumulativeZ = 0.0;
int numberOfPoints = 0;
for (int i = 0; i < xyzIjData.xyzCount; i += 3) {
float x = xyz.get(i);
float y = xyz.get(i + 1);
if (Math.abs(x) < centerCoordinateMax &&
Math.abs(y) < centerCoordinateMax) {
float z = xyz.get(i + 2);
cumulativeZ += z;
numberOfPoints++;
}
}
Double distanceInMeters;
if (numberOfPoints > 0) {
distanceInMeters = cumulativeZ / numberOfPoints;
} else {
distanceInMeters = null;
}
Said simply this code is taking the average distance of a small square located at the origin of x and y axes.
centerCoordinateMax = 0.020 was determined to work based on observation and testing. The square typically contains 50 points in ideal conditions and fewer when held close to the floor.
I've tested this using version 2 of my tango-caminada application and the depth measuring seems quite accurate. Standing 1/2 meter from a doorway I slid towards the open door and the distance changed form 0.5 meters to 2.5 meters which is the wall at the end of the hallway.
Simulating a robot being navigated I moved the device towards a trash can in the path until 0.5 meters separation and then rotated left until the distance was more than 0.5 meters and proceeded forward. An oversimplified simulation, but the basis for object avoidance using Tango depth perception.
You can do this by using camera intrinsics to convert XY coordinates to normalized values -- see this post - Google Tango: Aligning Depth and Color Frames - it's talking about texture coordinates but it's exactly the same problem
Once normalized, move to screen space x[1280,720] and then the Z coordinate can be used to generate a pixel value for openCV to chew on. You'll need to decide how to color pixels that don't correspond to depth points on your own, and advisedly, before you use the depth information to further colorize pixels.
The main thing is to remember that the raw coordinates returned are already using the basis vectors you want, i.e. you do not want the pose attitude or location
So, here is my situation. I have created a object detection program which is based on color object detection. My program detects the color red and it works perfectly. But here is the problems i am facing:-
Whenever there are more than one red object in the surrounding, my program detects them and it cannot really track one object at that time(i.e it tracks other red objects of various sizes in the background. It shows me the error that "too much noise in the background". As you can see in the "threshold image" attached, it detects the round object (which is my tracking object) and my cap which is red in color. I want my program to detect only my tracking object("which is a round shaped coke cap"). How can i achieve that? Please help me out. I have my engineering design contest in few days and i have to demo my program infront of my lecturers. My program should only be able to detect and track the object which i want. Thanks
My code for the objectdetection program is a little long. So, i am hereby explaining the code as follows- I captured a frame from the webcam frame-converted it to HSV- used HSV Inrange filter to filter out the other colors but red- applied morphological operations on the filtered image. This all goes in my main function
I am using a frame resolution of 1280*720 for my webcam frame. It kind of slows down my program but it was a trade off which i had to do for performing gesture controlled operations. Anyways here is my drawobjectfunction and trackfilteredobjectfunction.
int H_MIN = 0;
int H_MAX = 256;
int S_MIN = 0;
int S_MAX = 256;
int V_MIN = 0;
int V_MAX = 256;
//default capture width and height
const int FRAME_WIDTH = 1280;
const int FRAME_HEIGHT = 720;
//max number of objects to be detected in frame
const int MAX_NUM_OBJECTS=50;
//minimum and maximum object area
const int MIN_OBJECT_AREA = 20*20;
const int MAX_OBJECT_AREA = FRAME_HEIGHT*FRAME_WIDTH/1.5;
void drawObject(int x, int y,Mat &frame){
circle(frame,Point(x,y),20,Scalar(0,255,0),2);
if(y-25>0)
line(frame,Point(x,y),Point(x,y-25),Scalar(0,255,0),2);
else line(frame,Point(x,y),Point(x,0),Scalar(0,255,0),2);
if(y+25<FRAME_HEIGHT)
line(frame,Point(x,y),Point(x,y+25),Scalar(0,255,0),2);
else line(frame,Point(x,y),Point(x,FRAME_HEIGHT),Scalar(0,255,0),2);
if(x-25>0)
line(frame,Point(x,y),Point(x-25,y),Scalar(0,255,0),2);
else line(frame,Point(x,y),Point(0,y),Scalar(0,255,0),2);
if(x+25<FRAME_WIDTH)
line(frame,Point(x,y),Point(x+25,y),Scalar(0,255,0),2);
else line(frame,Point(x,y),Point(FRAME_WIDTH,y),Scalar(0,255,0),2);
putText(frame,intToString(x)+","+intToString(y),Point(x,y+30),1,1,Scalar(0,255,0),2);
}
void trackFilteredObject(int &x, int &y, Mat threshold, Mat &cameraFeed){
Mat temp;
threshold.copyTo(temp);
//these two vectors needed for output of findContours
vector< vector<Point> > contours;
vector<Vec4i> hierarchy;
//find contours of filtered image using openCV findContours function
findContours(temp,contours,hierarchy,CV_RETR_CCOMP,CV_CHAIN_APPROX_SIMPLE );
//use moments method to find our filtered object
double refArea = 0;
bool objectFound = false;
if (hierarchy.size() > 0) {
int numObjects = hierarchy.size();
//if number of objects greater than MAX_NUM_OBJECTS we have a noisy filter
if(numObjects<MAX_NUM_OBJECTS){
for (int index = 0; index >= 0; index = hierarchy[index][0]) {
Moments moment = moments((cv::Mat)contours[index]);
double area = moment.m00;
//if the area is less than 20 px by 20px then it is probably just noise
//if the area is the same as the 3/2 of the image size, probably just a bad filter
//we only want the object with the largest area so we safe a reference area each
//iteration and compare it to the area in the next iteration.
if(area>MIN_OBJECT_AREA && area<MAX_OBJECT_AREA && area>refArea){
x = moment.m10/area;
y = moment.m01/area;
objectFound = true;
refArea = area;
}else objectFound = false;
}
//let user know you found an object
if(objectFound ==true){
putText(cameraFeed,"Tracking Object",Point(0,50),2,1,Scalar(0,255,0),2);
//draw object location on screen
drawObject(x,y,cameraFeed);}
}else putText(cameraFeed,"TOO MUCH NOISE! ADJUST FILTER",Point(0,50),1,2,Scalar(0,0,255),2);
}
}
Here is the link of the image; as you can see it also detects the red hat in the background along with the red cap of the coke bottle.
My observations:- Here is what i think, to achieve my desired goal of not detecting objects of unknown sizes of red color. I think i have to edit the value of maximum object area which i declared in the above program as (const int MAX_OBJECT_AREA = FRAME_HEIGHT*FRAME_WIDTH/1.5;). I think i have to change this value, that might eliminate the detection of bigger continous red pictures. But also, there is another problem some objects are not completely red in color and they have patches of red and other colors. So, if the detected area is within the range specfied in my program then my program detects those red patches too. What i mean to say is i was wearing a tshirt which has mixed colors and when i tested my program by wearing that tshirt, my program was able to detect the red color out of the other colors. Now, how do i solve this issue?
I think you can try out the following procedure:
obtain a circular kernel having roughly the same area as your object of interest. You can do it like: Mat kernel = getStructuringElement(MORPH_ELLIPSE, Size(d, d));
where d is the diameter of the disk.
perform normalized-cross-correlation or convolution of the filtered regions image with this kernel (I think normalized-cross-correlation would be better. And add an empty boarder around the kernel).
the peak of the resulting image should give you the location of the circular region in your filtered image (if you are using normalized-cross-correlation, you'll have to add the shift).
To speed things up, you can perform this at a reduced resolution.
You can filter out non-circular shapes by detecting circles in your thresholded image. OpenCV provides a built-on method to detect circles using Hough transform, more info here. You can take advantage of this function to retain only circles that have a radius in a given range.
Another possibility is to implement connected component labeling (CCL) into your demo program.
I believe that it was removed at some point in verions 2.x of OpenCV, but a basic implementation of the two-pass version is straightforward from the Wikipedia page.
CCL will assign a unique ID for each object after thresholding. You then have to implement matching between the objects at frame (T-1) and objects in frame (T) (for example based on some nearest distance criterion) and possibly trajectory filtering or smoothing, but this would definitely give you some extra-points.
Perhaps this is more of a math question than a programming question, but I've been trying to implement the rotating calipers algorithm in XNA.
I've deduced a convex hull from my point set using a monotone chain as detailed on wikipedia.
Now I'm trying to model my algorithm to find the OBB after the one found here:
http://www.cs.purdue.edu/research/technical_reports/1983/TR%2083-463.pdf
However, I don't understand what the DOTPR and CROSSPR methods it mentions on the final page are supposed to return.
I understand how to get the Dot Product of two points and the Cross Product of two points, but it seems these functions are supposed to return the Dot and Cross Products of two edges / line segments. My knowledge of mathematics is admittedly limited but this is my best guess as to what the algorithm is looking for
public static float PolygonCross(List<Vector2> polygon, int indexA, int indexB)
{
var segmentA1 = NextVertice(indexA, polygon) - polygon[indexA];
var segmentB1 = NextVertice(indexB, polygon) - polygon[indexB];
float crossProduct1 = CrossProduct(segmentA1, segmentB1);
return crossProduct1;
}
public static float CrossProduct(Vector2 v1, Vector2 v2)
{
return (v1.X * v2.Y - v1.Y * v2.X);
}
public static float PolygonDot(List<Vector2> polygon, int indexA, int indexB)
{
var segmentA1 = NextVertice(indexA, polygon) - polygon[indexA];
var segmentB1 = NextVertice(indexB, polygon) - polygon[indexB];
float dotProduct = Vector2.Dot(segmentA1, segmentB1);
return dotProduct;
}
However, when I use those methods as directed in this portion of my code...
while (PolygonDot(polygon, i, j) > 0)
{
j = NextIndex(j, polygon);
}
if (i == 0)
{
k = j;
}
while (PolygonCross(polygon, i, k) > 0)
{
k = NextIndex(k, polygon);
}
if (i == 0)
{
m = k;
}
while (PolygonDot(polygon, i, m) < 0)
{
m = NextIndex(m, polygon);
}
..it returns the same index for j, k when I give it a test set of points:
List<Vector2> polygon = new List<Vector2>()
{
new Vector2(0, 138),
new Vector2(1, 138),
new Vector2(150, 110),
new Vector2(199, 68),
new Vector2(204, 63),
new Vector2(131, 0),
new Vector2(129, 0),
new Vector2(115, 14),
new Vector2(0, 138),
};
Note, that I call polygon.Reverse to place these points in Counter-clockwise order as indicated in the technical document from perdue.edu. My algorithm for finding a convex-hull of a point set generates a list of points in counter-clockwise order, but does so assuming y < 0 is higher than y > 0 because when drawing to the screen 0,0 is the top left corner. Reversing the list seems sufficient. I also remove the duplicate point at the end.
After this process, the data becomes:
Vector2(115, 14)
Vector2(129, 0)
Vector2(131, 0)
Vector2(204, 63)
Vector2(199, 68)
Vector2(150, 110)
Vector2(1, 138)
Vector2(0, 138)
This test fails on the first loop when i equals 0 and j equals 3. It finds that the cross-product of the line (115,14) to (204,63) and the line (204,63) to (199,68) is 0. It then find that the dot product of the same lines is also 0, so j and k share the same index.
In contrast, when given this test set:
http://www.wolframalpha.com/input/?i=polygon+%282%2C1%29%2C%281%2C2%29%2C%281%2C3%29%2C%282%2C4%29%2C%284%2C4%29%2C%285%2C3%29%2C%283%2C1%29
My code successfully returns this OBB:
http://www.wolframalpha.com/input/?i=polygon+%282.5%2C0.5%29%2C%280.5%2C2.5%29%2C%283%2C5%29%2C%285%2C3%29
I've read over the C++ algorithm found on http://www.geometrictools.com/LibMathematics/Containment/Wm5ContMinBox2.cpp but I'm too dense to follow it completely. It also appears to be very different than the other one detailed in the paper above.
Does anyone know what step I'm skipping or see some error in my code for finding the dot product and cross product of two line segments? Has anyone successfully implemented this code before in C# and have an example?
Points and vectors as data structures are essentially the same thing; both consist of two floats (or three if you're working in three dimensions). So, when asked to take the dot product of the edges, I suppose it means taking the dot product of the vectors that the edges define. The code you provided does exactly this.
Your implementation of CrossProduct seems correct (see Wolfram MathWorld). However, in PolygonCross and PolygonDot I think you shouldn't normalize the segments. It will affect the magnitude of the return values of PolygonDot and PolygonCross. By removing the superfluous calls to Vector2.Normalize you can speed up your code and reduce the amount of noise in your floating point values. However, normalization is not relevant to the correctness of the code that you have pasted as it only compares the results with zero.
Note that the paper you refer to assumes that the polygon vertices are listed in counterclockwise order (page 5, first paragraph after "Beginning of comments") but your example polygon is defined in clockwise order. That's why PolygonCross(polygon, 0, 1) is negative and you get the same value for j and k.
I assume DOTPR is a normal vector dot product, crosspr is a crossproduct. dotproduct will return a normal number , crossproduct will return a vector which is perpendicular to the two vectors given. (basic vector math,check wikipedia)
they are actually defined in the paper as DOTPR(i,j) returns dotproduct of vectors from vertex i to i+1 and j to j+1. same for CROSSPR but with cross product.
I'm recently playing with Away3D Library and have a problem in finding Face center in Away3D. Why Away3DLite has a face.center feature while Away3D doesn't have it ? and what is the alternative solution for this ?
If you want to find the center of a face, it's simply the average position of all the vertices making up that face:
function getFaceCenter(f : Face) : Vector3D
{
var vert : Vertex;
var ret : Vector3D = new Vector3D;
for each (vert in f.vertices) {
ret.x += vert.x;
ret.y += vert.y;
ret.z += vert.z;
}
ret.x /= f.vertices.length;
ret.y /= f.vertices.length;
ret.z /= f.vertices.length;
return ret;
}
The above is a very simple function to calculate an average, although on a 3D vector instead of a simple scalar number. That average is the center of all the vertices in the face.
If you need to do this a lot, optimize the method by preventing it from allocating a vector (by passing in a vector to which the return values should be written) and create a temporary variable for the vertex list length instead of dereferencing it through two object references like min (f and vertices), which is unnecessarily heavy.