I'm training a YOLO model, and I have the bounding boxes in this format:
x1, y1, x2, y2 => e.g. (100, 100, 200, 200)
I need to convert them to YOLO format, something like:
X, Y, W, H => 0.436262 0.474010 0.383663 0.178218
I have already calculated the center point X, Y, the height H, and the width W.
But I still need a way to convert them to floating-point numbers as shown above.
For those looking for the reverse of the question (YOLO format to normal bbox format):
def yolobbox2bbox(x,y,w,h):
    x1, y1 = x-w/2, y-h/2
    x2, y2 = x+w/2, y+h/2
    return x1, y1, x2, y2
Here's a code snippet in Python to convert x, y coordinates to YOLO format:
from PIL import Image

def convert(size, box):
    # size = (image width, image height); box = (xmin, xmax, ymin, ymax)
    dw = 1./size[0]
    dh = 1./size[1]
    x = (box[0] + box[1])/2.0
    y = (box[2] + box[3])/2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x,y,w,h)

im=Image.open(img_path)
w= int(im.size[0])
h= int(im.size[1])
print(xmin, xmax, ymin, ymax) #define your x,y coordinates
b = (xmin, xmax, ymin, ymax)
bb = convert((w,h), b)
Check my sample program to convert from LabelMe annotation tool format to Yolo format https://github.com/ivder/LabelMeYoloConverter
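To get the space-separated floating-point line shown in the question, the normalized values only need to be printed with a fixed number of decimals. A minimal sketch, assuming a Pascal VOC style (x1, y1, x2, y2) box in pixels; the name format_yolo_line and the class_id argument are made up for illustration and are not part of any library:

```python
def format_yolo_line(class_id, box, image_w, image_h):
    """Return one label line: class, center x, center y, width, height."""
    x1, y1, x2, y2 = box
    # Center and size, each divided by the matching image dimension.
    cx = (x1 + x2) / 2.0 / image_w
    cy = (y1 + y2) / 2.0 / image_h
    w = (x2 - x1) / image_w
    h = (y2 - y1) / image_h
    # Six decimals matches the style of the numbers in the question.
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


line = format_yolo_line(0, (100, 100, 200, 200), 1000, 1000)
print(line)
```

The image size here (1000 x 1000) is only an example value; in practice it would come from the image itself.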
There is a more straightforward way to do this with pybboxes. Install it with:
pip install pybboxes
and use it as below:
import pybboxes as pbx
voc_bbox = (100, 100, 200, 200)
W, H = 1000, 1000 # WxH of the image
pbx.convert_bbox(voc_bbox, from_type="voc", to_type="yolo", image_size=(W,H))
>>> (0.15, 0.15, 0.1, 0.1)
Note that converting to YOLO format requires the image width and height for scaling.
YOLO normalises the image space to run from 0 to 1 in both x and y directions. To convert between your (x, y) coordinates and YOLO (u, v) coordinates, you need to transform your data as u = x / XMAX and v = y / YMAX, where XMAX and YMAX are the maximum coordinates for the image array you are using.
This all depends on the image arrays being oriented the same way.
Here is a C function to perform the conversion:
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <math.h>

struct yolo {
    float u;
    float v;
};

struct yolo
convert (unsigned int x, unsigned int y, unsigned int XMAX, unsigned int YMAX)
{
    struct yolo point;

    if (XMAX && YMAX && (x <= XMAX) && (y <= YMAX))
    {
        point.u = (float)x / (float)XMAX;
        point.v = (float)y / (float)YMAX;
    }
    else
    {
        point.u = INFINITY;
        point.v = INFINITY;
        errno = ERANGE;
    }
    return point;
}/* convert */

int main()
{
    struct yolo P;

    P = convert (99, 201, 255, 324);
    printf ("Yolo coordinate = <%f, %f>\n", P.u, P.v);
    exit (EXIT_SUCCESS);
}/* main */
There are two potential solutions, depending on the input format. First of all, you have to determine whether your original bounding box is in Coco or Pascal_VOC format. Otherwise you can't do the right math.
Here are the formats:
Coco Format: [x_min, y_min, width, height]
Pascal_VOC Format: [x_min, y_min, x_max, y_max]
Here is some Python code showing how you can do the conversion:
Converting Coco to Yolo
# Convert Coco bb to Yolo
def coco_to_yolo(x1, y1, w, h, image_w, image_h):
    return [((2*x1 + w)/(2*image_w)) , ((2*y1 + h)/(2*image_h)), w/image_w, h/image_h]
Converting Pascal_voc to Yolo
# Convert Pascal_Voc bb to Yolo
def pascal_voc_to_yolo(x1, y1, x2, y2, image_w, image_h):
    return [((x2 + x1)/(2*image_w)), ((y2 + y1)/(2*image_h)), (x2 - x1)/image_w, (y2 - y1)/image_h]
If you need additional conversions, you can check my article on Medium: https://christianbernecker.medium.com/convert-bounding-boxes-from-coco-to-pascal-voc-to-yolo-and-back-660dc6178742
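A quick way to convince yourself that the two formulas agree is to describe the same box in both formats and compare the results. A small sketch, repeating the two functions above so it runs on its own; voc_to_coco is a helper I added for the check, and the 1000 x 1000 image size is an arbitrary example:

```python
import math


def coco_to_yolo(x1, y1, w, h, image_w, image_h):
    return [(2 * x1 + w) / (2 * image_w), (2 * y1 + h) / (2 * image_h),
            w / image_w, h / image_h]


def pascal_voc_to_yolo(x1, y1, x2, y2, image_w, image_h):
    return [(x2 + x1) / (2 * image_w), (y2 + y1) / (2 * image_h),
            (x2 - x1) / image_w, (y2 - y1) / image_h]


def voc_to_coco(x1, y1, x2, y2):
    # Hypothetical helper: same top-left corner, but width/height instead of x_max/y_max.
    return [x1, y1, x2 - x1, y2 - y1]


voc_box = (100, 100, 200, 200)
coco_box = voc_to_coco(*voc_box)

from_voc = pascal_voc_to_yolo(*voc_box, 1000, 1000)
from_coco = coco_to_yolo(*coco_box, 1000, 1000)

# Both routes should give the same normalized center/size values.
print(from_voc, from_coco)
print(all(math.isclose(a, b) for a, b in zip(from_voc, from_coco)))
```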
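For YOLO format to x1, y1, x2, y2 format (dw and dh are the image width and height in pixels):
def yolobbox2bbox(x, y, w, h, dw, dh):
    # Scale the normalized center/size back to pixel corners.
    x1 = int((x - w / 2) * dw)
    x2 = int((x + w / 2) * dw)
    y1 = int((y - h / 2) * dh)
    y2 = int((y + h / 2) * dh)
    # Clamp the corners to the image bounds.
    if x1 < 0:
        x1 = 0
    if x2 > dw - 1:
        x2 = dw - 1
    if y1 < 0:
        y1 = 0
    if y2 > dh - 1:
        y2 = dh - 1
    return x1, y1, x2, y2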
There are two things you need to do:
Divide the coordinates by the image size to normalize them to [0..1] range.
Convert (x1, y1, x2, y2) coordinates to (center_x, center_y, width, height).
If you're using PyTorch, Torchvision provides a function that you can use for the conversion:
from torch import tensor
from torchvision.ops import box_convert
image_size = tensor([608, 608])
boxes = tensor([[100, 100, 200, 200], [300, 300, 400, 400]], dtype=float)
boxes[:, :2] /= image_size
boxes[:, 2:] /= image_size
boxes = box_convert(boxes, "xyxy", "cxcywh")
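If PyTorch is not available, the same two steps can be sketched with plain NumPy. This is only an illustration under the assumption that the boxes are given as (x1, y1, x2, y2) in pixels; xyxy_to_yolo is a name I made up and not a library function:

```python
import numpy as np


def xyxy_to_yolo(boxes, image_w, image_h):
    """Convert an (N, 4) array of pixel xyxy boxes to normalized cxcywh."""
    boxes = np.asarray(boxes, dtype=float)
    # Step 1: divide x values by the width and y values by the height.
    scale = np.array([image_w, image_h, image_w, image_h], dtype=float)
    x1, y1, x2, y2 = (boxes / scale).T
    # Step 2: corners -> center and size.
    return np.stack([(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1], axis=1)


yolo_boxes = xyxy_to_yolo([[100, 100, 200, 200], [300, 300, 400, 400]], 608, 608)
print(yolo_boxes)
```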
I was also looking for this while reading the answers, and I found the following more informative for understanding what is happening behind the scenes.
From here: Source
Assuming x/ymin and x/ymax are your bounding corners, top left and bottom right respectively. Then:
x = xmin
y = ymin
w = xmax - xmin
h = ymax - ymin
You then need to normalize these, which means giving them as a proportion of the whole image, so simply divide each value by the corresponding image dimension (width for x and w, height for y and h):
x = xmin / width
y = ymin / height
w = (xmax - xmin) / width
h = (ymax - ymin) / height
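This assumes a top-left origin; you will have to apply a shift if this is not the case. Note that x and y as computed here are the normalized top-left corner of the box, whereas YOLO label files expect the box center, i.e. x + w / 2 and y + h / 2.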
So the answer
I've got a problem in my code (Love2D framework).
I'm making a small follower object that follows coordinates in real time.
The problem is that the speed of movement changes depending on the direction.
For example, moving straight along an axis is slightly faster than moving up-left.
Can somebody help me? Please point out my mistake.
The code:
function love.load()
    player = love.graphics.newImage("player.png")
    local f = love.graphics.newFont(20)
    love.graphics.setFont(f)
    love.graphics.setBackgroundColor(75,75,75)
    x = 100
    y = 100
    x2 = 600
    y2 = 600
    speed = 300
    speed2 = 100
end

function love.draw()
    love.graphics.draw(player, x, y)
    love.graphics.draw(player, x2, y2)
end

function love.update(dt)
    print(x, y, x2, y2)
    if love.keyboard.isDown("right") then
        x = x + (speed * dt)
    end
    if love.keyboard.isDown("left") then
        x = x - (speed * dt)
    end
    if love.keyboard.isDown("down") then
        y = y + (speed * dt)
    end
    if love.keyboard.isDown("up") then
        y = y - (speed * dt)
    end
    if x < x2 and y < y2 then
        x2 = x2 - (speed2 * dt)
        y2 = y2 - (speed2 * dt)
    end
    if x > x2 and y < y2 then
        x2 = x2 + (speed2 * dt)
        y2 = y2 - (speed2 * dt)
    end
    if x > x2 and y > y2 then
        x2 = x2 + (speed2 * dt)
        y2 = y2 + (speed2 * dt)
    end
    if x < x2 and y > y2 then
        x2 = x2 - (speed2 * dt)
        y2 = y2 + (speed2 * dt)
    end
end
This
if x < x2 and y < y2 then
    x2 = x2 - (speed2 * dt)
    y2 = y2 - (speed2 * dt)
end
if x > x2 and y < y2 then
    x2 = x2 + (speed2 * dt)
    y2 = y2 - (speed2 * dt)
end
if x > x2 and y > y2 then
    x2 = x2 + (speed2 * dt)
    y2 = y2 + (speed2 * dt)
end
if x < x2 and y > y2 then
    x2 = x2 - (speed2 * dt)
    y2 = y2 + (speed2 * dt)
end
has a bunch of problems.
The one that's causing your speed differences is this: if x < x2 but x > (x2 - speed2*dt), you will run through the first branch (if x < x2 and y < y2 then …). This changes x2 so that you will also hit the second branch (if x > x2 and y < y2 then …), which means you move twice in the y direction. Change to if x < x2 and y < y2 then … elseif x > x2 and y < y2 then … so that you cannot fall through into the other branches, or do the math and avoid the whole if-chain.
A thing that you may or may not want and currently have is that it either walks in a certain direction, or it doesn't. Which means there are 8 possible directions that the follower can travel. (Axis-aligned or diagonally – the 4 cases you have plus the situation when dx or dy is (approximately) zero.)
If you want it to move "directly towards you", the follower should be able to move in any direction.
You will have to use the relative distances along the x- and y-direction. There are many common choices of distance definitions, but you probably want the Euclidean distance. (The "shape" of the unit circle under that norm / distance definition determines how fast it's moving in either direction; only the Euclidean distance is "the same in all directions".)
So replace the above block with
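local dx, dy = (x2-x), (y2-y) -- difference in both directions
local dist = (dx^2+dy^2)^(1/2) -- Euclidean distance to the player
if dist > 0 then -- avoid dividing by zero when already at the player
    dx, dy = dx/dist, dy/dist -- unit direction vector (relative x/y parts)
    x2, y2 = x2 - dx*speed2*dt, y2 - dy*speed2*dt -- speed2 is the follower's speed
end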
and you'll get the "normal" notion of distance, which gets you the normal notion of movement speed, which gets you identical-looking speeds when moving towards the player.
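The same idea, sketched in Python so the step length can be checked numerically. This is not Love2D code; step_towards is a name I picked for illustration, and the clamp that stops the follower from overshooting the target is my own addition:

```python
import math


def step_towards(fx, fy, tx, ty, speed, dt):
    """Move the follower (fx, fy) towards the target (tx, ty) by at most speed * dt."""
    dx, dy = tx - fx, ty - fy
    dist = math.hypot(dx, dy)  # Euclidean distance to the target
    if dist == 0.0:
        return fx, fy  # already there, nothing to normalize
    step = min(speed * dt, dist)  # do not overshoot the target
    # dx / dist and dy / dist form a unit vector, so the step length is direction-independent.
    return fx + dx / dist * step, fy + dy / dist * step


# Diagonal and axis-aligned moves cover the same distance per frame.
diag = step_towards(600.0, 600.0, 100.0, 100.0, 100.0, 0.1)
axis = step_towards(600.0, 600.0, 100.0, 600.0, 100.0, 0.1)
print(math.hypot(diag[0] - 600.0, diag[1] - 600.0))
print(math.hypot(axis[0] - 600.0, axis[1] - 600.0))
```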
I am having some trouble with cv2.HoughLines() showing vertical lines when I believe that the real fit should give horizontal lines.
Here is a clip of the code I am using:
rho_resoultion = 1
theta_resolution = np.pi/180
threshold = 200
lines = cv2.HoughLines(image, rho_resoultion, theta_resolution, threshold)
# print(lines)
for line in lines:
    rho, theta = line[0]
    a = np.cos(theta)
    b = np.sin(theta)
    x0 = a*rho
    y0 = b*rho
    x1 = int(x0 + 1000*(-b))
    y1 = int(y0 + 1000*(a))
    x2 = int(x0 - 1000*(-b))
    y2 = int(y0 - 1000*(a))
    cv2.line(image,(x1,y1),(x2,y2),(255,255,255),1)
cv2.namedWindow('thing', cv2.WINDOW_NORMAL)
cv2.imshow("thing", image)
cv2.waitKey(0)
This is the input and output:
I think it would be easier to extract out what is occurring if the Hough space image could be viewed.
However, the documentation does not provide information for how to show the full hough space.
How would one show the whole Hough transform space?
I attempted reducing the threshold to 1 but it did not provide an image.
Maybe you got something wrong when calculating the angles. Feel free to show some code.
Here is an example of how to show all Hough lines in an image:
import cv2
import numpy as np
img = cv2.imread('sudoku.jpg')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray,50,150,apertureSize = 3)
lines = cv2.HoughLines(edges,1,np.pi/180,200)
for line in lines:
    for rho,theta in line:
        a = np.cos(theta)
        b = np.sin(theta)
        x0 = a*rho
        y0 = b*rho
        x1 = int(x0 + 1000*(-b))
        y1 = int(y0 + 1000*(a))
        x2 = int(x0 - 1000*(-b))
        y2 = int(y0 - 1000*(a))
        cv2.line(img,(x1,y1),(x2,y2),(0,0,255),2)
cv2.imshow('Houghlines',img)
if cv2.waitKey(0) & 0xff == 27:
    cv2.destroyAllWindows()
Original image:
Result:
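As far as I am aware, cv2.HoughLines only returns the (rho, theta) pairs that pass the threshold and does not hand back the accumulator, so one way to look at the whole Hough space is to build the accumulator separately. A rough NumPy sketch of the standard voting scheme follows; hough_accumulator is a made-up name, the 1-pixel rho bins and 1-degree theta bins are assumptions, and it will be much slower than OpenCV on large images:

```python
import numpy as np


def hough_accumulator(edges, theta_step_deg=1.0):
    """Vote every non-zero pixel into a (rho, theta) accumulator."""
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))  # largest possible |rho|
    thetas = np.deg2rad(np.arange(0.0, 180.0, theta_step_deg))
    acc = np.zeros((2 * diag + 1, len(thetas)), dtype=np.int64)
    ys, xs = np.nonzero(edges)
    for t_idx, theta in enumerate(thetas):
        # rho = x*cos(theta) + y*sin(theta), shifted by diag so indices are >= 0.
        rhos = np.round(xs * np.cos(theta) + ys * np.sin(theta)).astype(int) + diag
        acc[:, t_idx] = np.bincount(rhos, minlength=2 * diag + 1)
    return acc, thetas, diag


# Synthetic test image: a single horizontal line at y = 20.
edges = np.zeros((50, 50), dtype=np.uint8)
edges[20, :] = 1
acc, thetas, diag = hough_accumulator(edges)
rho_idx, theta_idx = np.unravel_index(np.argmax(acc), acc.shape)
print(rho_idx - diag, np.rad2deg(thetas[theta_idx]))
```

Rows of acc are rho (offset by diag) and columns are theta. To view it, the array can be scaled to the 0..255 range, cast to uint8 and passed to cv2.imshow; a horizontal line should show up as a peak near theta = 90 degrees.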
I have this function, which returns an x and y position; by just incrementing degrees, it makes objects move in a circle, like a satellite around a planet.
In my case it moves along an ellipse because I added +30 to dist.
-(CGPoint)circularMovement:(float)degrees moonDistance:(CGFloat)dist
{
    if(degrees >=360)degrees = 0;
    float x = _moon.position.x + (dist+30 + _moon.size.height/2) *cos(degrees);
    float y = _moon.position.y + (dist + _moon.size.height/2) *sin(degrees);
    CGPoint position= CGPointMake(x, y);
    return position;
}
What I would like is to reverse this function, giving the x and y position of an object and getting back the dist value.
Is this possible?
If so, how would I go about achieving it?
If you have an origin with the coordinates (x1, y1) and a target with the coordinates (x2, y2), the distance between them is found using the Pythagorean theorem.
The distance between the points is the square root of the sum of the squared difference between x2 and x1 and the squared difference between y2 and y1.
In most languages this would look something like this:
x = x2 - x1;
y = y2 - y1;
distance = Math.SquareRoot(x * x + y * y);
Where Math is your language's math library.
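For instance, in Python the same calculation can be sketched with math.hypot, which computes the square root of the sum of squares directly (the function name distance is just for illustration):

```python
import math


def distance(x1, y1, x2, y2):
    # sqrt((x2 - x1)^2 + (y2 - y1)^2)
    return math.hypot(x2 - x1, y2 - y1)


print(distance(0, 0, 3, 4))
```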
float x = _moon.position.x + (dist+30 + _moon.size.height/2) *cos(degrees);
float y = _moon.position.y + (dist + _moon.size.height/2) *sin(degrees);
is the way you have originally calculated the values, so the inverse formula would be:
dist = ((y - _moon.position.y) / (sin(degrees))) - _moon.size.height/2
You could calculate it based on x as well, but it is simpler based on y, as long as sin(degrees) is not zero.
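One caveat with inverting only the y equation is that it divides by sin(degrees), which is zero at angles such as 0. A small Python sketch of a round trip that falls back to the x equation in that case; the function names and the plain-number arguments (moon_x, moon_y, moon_height) are stand-ins for the _moon properties in the question, and the angle is treated as radians because that is what sin and cos expect:

```python
import math


def moon_orbit_position(moon_x, moon_y, moon_height, dist, angle):
    """Forward formula from the question (note the extra +30 on the x axis)."""
    x = moon_x + (dist + 30 + moon_height / 2) * math.cos(angle)
    y = moon_y + (dist + moon_height / 2) * math.sin(angle)
    return x, y


def recover_dist(x, y, moon_x, moon_y, moon_height, angle):
    """Invert the formula, using whichever axis avoids dividing by (nearly) zero."""
    s, c = math.sin(angle), math.cos(angle)
    if abs(s) >= abs(c):
        return (y - moon_y) / s - moon_height / 2
    return (x - moon_x) / c - 30 - moon_height / 2


for angle in (0.0, 1.0, math.pi / 2, 2.5):
    x, y = moon_orbit_position(200.0, 150.0, 40.0, 80.0, angle)
    print(angle, recover_dist(x, y, 200.0, 150.0, 40.0, angle))
```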
I can explain my problem better with an image.
I have a contour and a line which passes through that contour.
At the point where the line intersects the contour, I want to draw a perpendicular line of a particular length.
I know the intersection point as well as the slope of the line.
For reference I am attaching this image.
If the blue line in your picture goes from point A to point B, and you want to draw the red line at point B, you can do the following:
Get the direction vector going from A to B. This would be:
v.x = B.x - A.x; v.y = B.y - A.y;
Normalize the vector:
mag = sqrt (v.x*v.x + v.y*v.y); v.x = v.x / mag; v.y = v.y / mag;
Rotate the vector 90 degrees by swapping x and y, and inverting one of them. Note about the rotation direction: in OpenCV and image processing in general, the x and y axes of the image are not oriented in the usual Euclidean way; in particular, the y axis points down and not up. In the Euclidean convention, inverting the final x (initial y) would rotate counterclockwise (the Euclidean standard), and inverting y would rotate clockwise. In OpenCV it's the opposite. So, for example, to get clockwise rotation in OpenCV: temp = v.x; v.x = -v.y; v.y = temp;
Create a new line at B pointing in the direction of v:
C.x = B.x + v.x * length; C.y = B.y + v.y * length;
(Note that you can make it extend in both directions by creating a point D in the opposite direction by simply negating length.)
This is my version of the function:
import math

def getPerpCoord(aX, aY, bX, bY, length):
    vX = bX-aX
    vY = bY-aY
    #print(str(vX)+" "+str(vY))
    # Only a zero-length vector has no direction; axis-aligned lines are fine.
    if(vX == 0 and vY == 0):
        return 0, 0, 0, 0
    mag = math.sqrt(vX*vX + vY*vY)
    vX = vX / mag
    vY = vY / mag
    # Rotate the unit vector by 90 degrees.
    temp = vX
    vX = 0-vY
    vY = temp
    # C and D lie on either side of B along the perpendicular.
    cX = bX + vX * length
    cY = bY + vY * length
    dX = bX - vX * length
    dY = bY - vY * length
    return int(cX), int(cY), int(dX), int(dY)