I am using the Google ML Kit to recognize text in images. Everything happens live, and I want to draw a visual bounding box over certain recognized words. I've managed the simple part of the visual representation; however, the preview I see on the phone is much larger than the image the model receives, i.e. the model receives roughly a 720p image for faster execution while the preview is roughly 1080p.
I've tried the following algorithm, but the bounding box only looks accurate in the middle of the screen; if I move the camera so that the word of interest is near the edge of the screen, the box gets offset.
private fun Rect.transform(
    originalWidth: Int,
    originalHeight: Int,
    width: Int,
    height: Int
): RectF {
    val scaleX = originalWidth.toFloat() / width.toFloat()
    val scaleY = originalHeight.toFloat() / height.toFloat()
    // Scale all coordinates to match preview
    val scaledLeft = scaleX * left
    val scaledTop = scaleY * top
    val scaledRight = scaleX * right
    val scaledBottom = scaleY * bottom
    return RectF(scaledLeft, scaledTop, scaledRight, scaledBottom)
}
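A minimal sketch of one common fix, assuming the preview center-crops the analysis frame (for example a CameraX PreviewView with FILL_CENTER) and that the rect is already rotated to match the preview orientation. The function name and parameters below are illustrative, not from the original code; the key difference is a single uniform scale plus a crop offset instead of independent x/y scales:

// Sketch only: assumes the preview shows a center-cropped version of the analysis image.
private fun Rect.transformCenterCrop(
    imageWidth: Int,    // width of the image given to ML Kit (after rotation)
    imageHeight: Int,
    previewWidth: Int,  // size of the on-screen preview
    previewHeight: Int
): RectF {
    // Uniform scale: the image is enlarged until it covers the preview in both dimensions.
    val scale = maxOf(
        previewWidth.toFloat() / imageWidth,
        previewHeight.toFloat() / imageHeight
    )
    // The scaled image overflows the preview; half of the overflow is cropped on each side.
    val offsetX = (imageWidth * scale - previewWidth) / 2f
    val offsetY = (imageHeight * scale - previewHeight) / 2f
    return RectF(
        left * scale - offsetX,
        top * scale - offsetY,
        right * scale - offsetX,
        bottom * scale - offsetY
    )
}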
I am working in ARKit/RealityKit/Swift. If I keep a box close to the camera it has the perfect size, but if I move it farther away, e.g. 1 or 2 meters, it looks smaller. I want its apparent size to stay the same in the user's eyes/perception.
I have done some basic math on the Z coordinate of the object, e.g. if z = -0.9 and size = 0.5 meters, what should the new size be at z = -2 or farther? But it does not work.
// Sample code to get an idea (pseudocode)
let original = Position()
let originalSize: Float = 0.5
let newPosition = Position() // Assume it is far from the original position
let newSize = originalSize * newPosition.z / original.z // New size is not proper
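A minimal sketch of one way to approach this (not the asker's code; the function and parameter names are placeholders): instead of using raw z values, scale the entity in proportion to its full distance from the camera, so its angular size stays roughly constant. Assumes a RealityKit Entity and a camera position taken from something like arView.cameraTransform.translation.

import RealityKit
import simd

// referenceDistance is the distance at which the entity should have scale 1.
func keepApparentSize(of entity: Entity,
                      cameraPosition: SIMD3<Float>,
                      referenceDistance: Float) {
    let entityPosition = entity.position(relativeTo: nil)        // world-space position
    let distance = simd_distance(entityPosition, cameraPosition) // full distance, not just z
    entity.scale = SIMD3<Float>(repeating: distance / referenceDistance)
}

// Example usage, called every frame:
// keepApparentSize(of: boxEntity,
//                  cameraPosition: arView.cameraTransform.translation,
//                  referenceDistance: 0.9)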
I'm trying to create something like a canvas in SceneKit using an SCNBox, with a UIImage "wrapped" around from one surface onto the four others adjacent to it.
The only way I can currently think to do this would be to chop up the UIImage into five separate images and put those onto the sides as materials, but I'm sure there must be an easier way.
Can anyone steer me in the right direction here? The box will have a separate texture/material on the side opposite the "front".
The easiest way would probably be to create a custom geometry with matching texture coordinates using +geometryWithSources:elements:
You can use the contentsTransform property of SCNMaterialProperty to adjust the texture coordinates of your image for the SCNBox.
Some explanation with a simplified example:
Let's suppose you are using a cube and you have a texture like this:
Dividing it into rectangles, you get:
You want to skip rectangles 1, 3, 7, 9 and cover your cube with this texture.
To do this, just normalize the size of a side of your SCNBox to the range 0..1 and use it to set the scale and translation in the contentsTransform matrix.
In my example the cube has equal sides, so each side covers a third of the whole texture. To take rectangle 5 from the texture:
let normalizedWidth: Float = 1.0 / 3.0
let normalizedHeight: Float = 1.0 / 3.0
let xOffset: Float = 1 // skip the 1, 4, 7 column
let yOffset: Float = 1 // skip the 1, 2, 3 row

let sideMaterial = SCNMaterial()
sideMaterial.diffuse.contents = textureImage

let scaleMatrix = SCNMatrix4MakeScale(normalizedWidth, normalizedHeight, 1.0)
sideMaterial.diffuse.contentsTransform = SCNMatrix4Translate(scaleMatrix,
    normalizedWidth * xOffset, normalizedHeight * yOffset, 0.0)
You can fill five sides with the configured materials, give the last one (on the back) just a color, and assign them to the materials property of your SCNBox.
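A short sketch of that last step (the side material names are placeholders; each would be configured as above with its own x/y offsets):

let box = SCNBox(width: 1, height: 1, length: 1, chamferRadius: 0)

let backMaterial = SCNMaterial()
backMaterial.diffuse.contents = UIColor.darkGray // plain color on the back

// SCNBox applies its materials in the order: front, right, back, left, top, bottom.
box.materials = [frontMaterial, rightMaterial, backMaterial,
                 leftMaterial, topMaterial, bottomMaterial]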
In the result you will have the image wrapped around five sides of the box.
I have a view that I draw using Core Graphics, which in this example is a segmented circle. The user can touch the circle to create a point along its circumference; this creates a subview on the UIView that contains the circle graphic.
Then I've implemented a pinch-zoom gesture which causes the circle to redraw to its new size. I've seen most implementations of pinch zoom use transform properties, but I've chosen to redraw because it's all vectors and gives a clean result.
My problem is repositioning the point views. I calculate the required position of those points based on the scale of the parent view: as it changes, I update the x/y coordinates of the point views. However, there seem to be some precision issues: as the circle grows, the points drift so they aren't right on the line anymore. Here are a couple of examples:
This is where the circle is at 100% scale. Note the perfect positioning of that black point. But when you zoom in...
The point drifts off-line.
And here's some code. I derive the new size of the circle from the pinch gesture's scale (I modify it a bit to constrain and slow it down for UI purposes, so that's deltaScale) and then draw it like so:
let currentSize = self.shape!.bounds.size
let newSize = CGSize(width: self.originalSize.width * deltaScale, height: self.originalSize.height * deltaScale)
self.shape?.frame.size = newSize
self.shape?.center = self.originalCentre!
self.shape?.shapeSize = newSize
self.shape?.setNeedsDisplay()
As the pinch-zoom gesture completes, I calculate the factor:
let xScale = Double(newSize.width) / Double(currentSize.width)
let yScale = Double(newSize.height) / Double(currentSize.height)
self.points = self.points.map{(thisPoint) -> UIView in
thisPoint.center = CGPoint(x: Double(thisPoint.center.x) * xScale, y: Double(thisPoint.center.y) * yScale)
return thisPoint
}
(I was using CGFloats, but switched to Doubles in the hope that it would give me the precision I needed. Alas.)
You're accumulating roundoff errors. This is getting executed repeatedly:
thisPoint.center = CGPoint(x: Double(thisPoint.center.x) * xScale, y: Double(thisPoint.center.y) * yScale)
Repeating any calculation of the form 'x=f(x)' with anything less than unlimited precision will result in drift.
The trick is to not have 'thisPoint.center' on both sides of the equals sign. The best way to do that is to make thisPoint.center a pure function of some other state. A commenter suggested storing the desired angle; that would work well. Then you could do:
thisPoint.center = f(thisPoint.someRadians), where 'f' converts from polar to rectangular coordinates, factoring in the scale of the circle.
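For example, a sketch of that idea (PointView and someRadians are hypothetical names; the circle's current radius is read from its bounds each time, so nothing accumulates):

import UIKit

// Hypothetical point-view type that stores its angle on the circle.
final class PointView: UIView {
    var someRadians: Double = 0
}

// Recompute the point's center purely from the stored angle and the circle's current geometry.
func reposition(_ point: PointView, on circleView: UIView) {
    let radius = Double(circleView.bounds.width) / 2    // current, post-zoom radius
    let cx = Double(circleView.bounds.midX)
    let cy = Double(circleView.bounds.midY)
    point.center = CGPoint(
        x: cx + radius * cos(point.someRadians),
        y: cy + radius * sin(point.someRadians)
    )
}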
I am currently using a Project Tango tablet for robotic obstacle avoidance. I want to create a matrix of z-values as they would appear on the Tango screen, so that I can use OpenCV to process the matrix. When I say z-values, I mean the distance each point is from the Tango. However, I don't know how to extract the z-values from the TangoXyzIjData and organize the values into a matrix. This is the code I have so far:
public void action(TangoPoseData poseData, TangoXyzIjData depthData) {
    byte[] buffer = new byte[depthData.xyzCount * 3 * 4];
    FileInputStream fileStream = new FileInputStream(
            depthData.xyzParcelFileDescriptor.getFileDescriptor());
    try {
        fileStream.read(buffer, depthData.xyzParcelFileDescriptorOffset, buffer.length);
        fileStream.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
    Mat m = new Mat(depthData.ijRows, depthData.ijCols, CvType.CV_8UC1);
    m.put(0, 0, buffer);
}
Does anyone know how to do this? I would really appreciate help.
The short answer is it can't be done, at least not simply. The XYZij struct in the Tango API does not work completely yet. There is no "ij" data. Your retrieval of buffer will work as you have it coded. The contents are a set of X, Y, Z values for measured depth points, roughly 10000+ each callback. Each X, Y, and Z value is of type float, so not CV_8UC1. The problem is that the points are not ordered in any way, so they do not correspond to an "image" or xy raster. They are a random list of depth points. There are ways to get them into some xy order, but it is not straightforward. I have done both of these:
render them to an image, with the depth encoded as color, and pull out the image as pixels
use the model/view/perspective matrices from OpenGL and multiply out the location of each point, then figure out its screen-space location (like OpenGL would during rendering). Sort the points by their xy screen-space position. Instead of the calculated screen-space depth, just keep the Z value from the original buffer (a sketch follows after this list).
or
wait until (if) the XYZij struct is fixed so that it returns ij values.
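A rough sketch of the second option above (the helper name and the 1280x720 screen size are placeholders, not Tango API calls): project each depth point with a combined model-view-projection matrix, convert the normalized device coordinates to a pixel location, and store the original metric Z there.

float[] mvp = buildModelViewProjection(poseData); // hypothetical helper: column-major 4x4 from pose + camera projection
Mat depthImage = new Mat(720, 1280, CvType.CV_32FC1, new Scalar(0));
FloatBuffer xyz = depthData.xyz;

for (int i = 0; i < depthData.xyzCount; i++) {
    float x = xyz.get(3 * i);
    float y = xyz.get(3 * i + 1);
    float z = xyz.get(3 * i + 2);

    // clip = MVP * (x, y, z, 1), using column-major indexing
    float w  = mvp[3] * x + mvp[7] * y + mvp[11] * z + mvp[15];
    float nx = (mvp[0] * x + mvp[4] * y + mvp[8]  * z + mvp[12]) / w;
    float ny = (mvp[1] * x + mvp[5] * y + mvp[9]  * z + mvp[13]) / w;

    int px = (int) ((nx * 0.5f + 0.5f) * 1280);          // NDC -> pixel column
    int py = (int) ((1.0f - (ny * 0.5f + 0.5f)) * 720);  // NDC -> pixel row (flip y)

    if (px >= 0 && px < 1280 && py >= 0 && py < 720) {
        depthImage.put(py, px, z);                       // keep the metric depth, not the projected one
    }
}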
I too wish to use Tango for object avoidance for robotics. I've had some success by simplifying the use case to be only interested in the distance of any object located at the center view of the Tango device.
In Java:
private Double centerCoordinateMax = 0.020;
private TangoXyzIjData xyzIjData;

final FloatBuffer xyz = xyzIjData.xyz;
double cumulativeZ = 0.0;
int numberOfPoints = 0;
// The buffer holds xyzCount points, i.e. xyzCount * 3 floats (x, y, z per point).
for (int i = 0; i < xyzIjData.xyzCount * 3; i += 3) {
    float x = xyz.get(i);
    float y = xyz.get(i + 1);
    if (Math.abs(x) < centerCoordinateMax &&
            Math.abs(y) < centerCoordinateMax) {
        float z = xyz.get(i + 2);
        cumulativeZ += z;
        numberOfPoints++;
    }
}
Double distanceInMeters;
if (numberOfPoints > 0) {
    distanceInMeters = cumulativeZ / numberOfPoints;
} else {
    distanceInMeters = null;
}
Put simply, this code takes the average distance over a small square centered on the origin of the x and y axes.
centerCoordinateMax = 0.020 was determined to work based on observation and testing. The square typically contains 50 points in ideal conditions and fewer when held close to the floor.
I've tested this using version 2 of my tango-caminada application, and the depth measuring seems quite accurate. Standing half a meter from a doorway, I slid towards the open door and the distance changed from 0.5 meters to 2.5 meters, which is the wall at the end of the hallway.
Simulating a robot being navigated I moved the device towards a trash can in the path until 0.5 meters separation and then rotated left until the distance was more than 0.5 meters and proceeded forward. An oversimplified simulation, but the basis for object avoidance using Tango depth perception.
You can do this by using the camera intrinsics to convert XY coordinates to normalized values; see this post, Google Tango: Aligning Depth and Color Frames. It's talking about texture coordinates, but it's exactly the same problem.
Once normalized, map to screen space (e.g. 1280x720); the Z coordinate can then be used to generate a pixel value for OpenCV to chew on. You'll need to decide on your own how to color pixels that don't correspond to depth points, and advisedly before you use the depth information to further colorize pixels.
The main thing to remember is that the raw coordinates returned already use the basis vectors you want, i.e. you do not need the pose attitude or location.
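Sketched as the body of a loop over the depth points (x, y, z), assuming intrinsics is the TangoCameraIntrinsics for the camera the points are registered to and depthImage is an OpenCV Mat of size intrinsics.height x intrinsics.width:

// Pinhole projection with the camera intrinsics instead of a full MVP matrix.
double u = intrinsics.fx * (x / z) + intrinsics.cx;   // pixel column
double v = intrinsics.fy * (y / z) + intrinsics.cy;   // pixel row
if (u >= 0 && u < intrinsics.width && v >= 0 && v < intrinsics.height) {
    depthImage.put((int) v, (int) u, z);              // again, store the metric depth as the pixel value
}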
When you have to display a series of visual components (sprites) within the context of a game, each taking a literal height and width that needs to be relative to the height and width of the viewport (not necessarily the aspect ratio) of the target device:
Is there a scaling class to help come up with scaling ratio in a dynamic fashion based on current device viewport size?
Will I need to roll my own scaling ratio algorithm?
Any cross platform issues I should be aware of?
This is not a question about loading assets based on the target device, nor about how to perform the scaling of the sprite (which is described here: http://msdn.microsoft.com/en-us/library/bb194913.aspx); rather, it is a question of how to determine the scale of sprites based on viewport size.
You can always create your own implementation of scaling.
For example, the default target viewport dimensions are:
const int defaultWidth = 1280, defaultHeight = 720;
And your current screen dimensions are 800×600, which gives you a (let's use a Vector2 instead of two floats):
int currentWidth = GraphicsDevice.Viewport.Width,
    currentHeight = GraphicsDevice.Viewport.Height;

// Cast to float so this isn't integer division (which would give {0; 0}).
Vector2 scale = new Vector2((float)currentWidth / defaultWidth,
                            (float)currentHeight / defaultHeight);
This gives you a {0.625; 0.83333}. You can now use this in a handy SpriteBatch.Draw() overload that takes a Vector2 scaling variable:
public void Draw (
Texture2D texture,
Vector2 position,
Nullable<Rectangle> sourceRectangle,
Color color,
float rotation,
Vector2 origin,
Vector2 scale,
SpriteEffects effects,
float layerDepth
)
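For instance, a draw call using that scale could look like this (a sketch; spriteBatch, buttonTexture and position are placeholder names):

spriteBatch.Begin();
spriteBatch.Draw(
    buttonTexture,
    position * scale,     // scale the layout position too, so placement follows the viewport
    null,                 // draw the whole texture
    Color.White,
    0f,
    Vector2.Zero,
    scale,                // the Vector2 computed above
    SpriteEffects.None,
    0f);
spriteBatch.End();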
Alternatively, you can draw all your stuff to a RenderTarget2D and copy the resulting image from there to a stretched texture on the main screen, but that will still require the above SpriteBatch.Draw() overload, though it might save you time if you have lots of draw calls.
Another option to generate the scale would be to leverage:
var scaleMatrix = Matrix.CreateScale(
    (float)GraphicsDevice.Viewport.Width / View.Width,
    (float)GraphicsDevice.Viewport.Height / View.Height,
    1f);
http://msdn.microsoft.com/en-gb/library/bb195692.aspx.
But this did not meet my needs, as I would then have to roll my own transform to map touch input location to the 'transformed' sprites (which respond to user touch input by knowing their own position and size).
In the end I used a percentage based approach.
I basically got the viewport height and width...
GraphicsDevice.Viewport.Width
GraphicsDevice.Viewport.Height
...then calculated the height and width of my sprites myself (note: as mentioned in the question, they take a literal height and width) based on their size relative to the screen, using percentages.
// I want the button's height and width to be 20% of the viewport width
var x = GraphicsDevice.Viewport.Width * 0.2f; // 20% of screen width
var y = x;                                    // square button
var btnsize = new Vector2(x, y);
var button = new GameButton(btnsize);
Then, once I have the size of the button, I can calculate the position on screen at which to render it, based on the size of the button and the available viewport size, again working with relative positions based on percentages.
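For example, a sketch of that placement logic (the 5% bottom margin is just an illustrative choice):

float viewW = GraphicsDevice.Viewport.Width;
float viewH = GraphicsDevice.Viewport.Height;

var btnSize = new Vector2(viewW * 0.2f, viewW * 0.2f);   // 20% of the viewport width, square
var btnPosition = new Vector2(
    (viewW - btnSize.X) / 2f,                // horizontally centered
    viewH - btnSize.Y - viewH * 0.05f);      // 5% of the viewport height up from the bottom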