OpenGL ES 2.0 Vertex Transformation Algorithms - ios

I'm developing an image warping iOS app with OpenGL ES 2.0.
I have a good grasp on the setup, the pipeline, etc., and am now moving along to the math.
Since my experience with image warping is nil, I'm reaching out for some algorithm suggestions.
Currently, I'm setting the initial vertices at points in a grid type fashion, which equally divide the image into squares. Then, I place an additional vertex in the middle of each of those squares. When I draw the indices, each square contains four triangles in the shape of an X. See the image below:
After playing with photoshop a little, I noticed adobe uses a slightly more complicated algorithm for their puppet warp, but a much more simplified algorithm for their standard warp. What do you think is best for me to apply here / personal preference?
Secondly, when I move a vertex, I'd like to apply a weighted transformation to all the other vertices to smooth out the edges (instead of what I have below, where only the selected vertex is transformed). What sort of algorithm should I apply here?

As each vertex is processed independently by the vertex shader, it is not easy to have vertexes influence each other's positions. However, because there are not that many vertexes it should be fine to do the work on the CPU and dynamically update your vertex attributes per frame.
Since what you are looking for is for your surface to act like a rubber sheet as parts of it are pulled, how about going ahead and implementing a dynamic simulation of a rubber sheet? There are plenty of good articles on cloth simulation in full 3D such as Jeff Lander's. Your application could be a simplification of these techniques. I have previously implemented a simulation like this in 3D. I required a force attracting my generated vertexes to their original grid locations. You could have a similar force attracting vertexes to the pixels at which they are generated before the simulation is begun. This would make them spring back to their default state when left alone and would progressively reduce the influence of your dragging at more distant vertexes.

Related

Get GLSL vertex shader positions back to use on cpu event collider functions

I'm using python kivy to render meshes with opengl onto a canvas. I want to return vertex data from the fragment shader so i can build a collider (to use on my cpu event listeners after doing projection and model view transforms). I can replicate the matrix multiplications on the cpu (i guess that's the easy way out), but then i would have to do the same calculations twice (not good).
The only way I can think of doing this (after some browsing) is to imprint an object id onto my rendered mesh alpha channel (wouldn't affect much if i'd keep data coding near value 1 for alpha ). And create some kind of 'color picker' on the cpu side to decode it (I'm guessing that's not hard to do using kivy).
Anyone has a better idea to deal with this? Or a better approach?
First criterion here is: do you need collision for picking or for physics simulation?
If it is for physics: you almost never want the same mesh for rendering and for physics collisions. Typically, you use a very rough approximation for the physics shape, nearly always a convex shape, or a union of convex shapes. (Colliding arbitrary concave meshes is something that no physics engine can do well, and if they attempt it at all, performance will be poor.)
If it is for the purpose of picking an object with a mouse-click: you can go two different ways for this:
You replicate the geometry on the CPU, and use the mouse-location plus camera-view to create a ray that intersects this geometry, to see what is hit first.
After rendering your scene, you read back a single pixel from the depth buffer. (The pixel that your mouse is over.) With the depth value you get back, plus camera info, you can reconstruct a corresponding 3D position in your world. Once you have a 3D location, you can query your world to see which object is the closest to this point, and you will have your hit.

How do I design a Animoji 3D model used by ARKit?

I want to create an Animoji in my APP. But when I contact with some designers they didn't know how to design an Animoji 3D model. Where can I find a solution for reference?
Solution I can thought is create many bones on face of 3D model, And when I get blendShapes of ARFaceAnchor, which contain the detail information of face expression, then I use it to update bone animations of partial face.
Thank you for reading. Any advises is appreciated.
First, to clear the air a bit: Animoji is a product built on top of ARKit, not in any way a feature of ARKit itself. There's no simple path to "build a model in this format and it 'just works' in (or like) Animoji".
That said, there are multiple ways to use the face expression data vended by ARKit to perform 3D animation, so how you do it depends more on what you and your artist are comfortable with. And remember, for any of these you can use as many or as few of the blend shapes as you like, depending on how realistic you want the animation to be.
Skeletal animation
As you suggested, create bones corresponding to each of the blend shapes you're interested in, along with a mapping of blend shape values to bone positions. For example, you'll want to define two positions for the bone for the browOuterUpLeft parameter such that one of them corresponds to a value of 0.0 and another to a value of 1.0 and you can modulate its transform anywhere between those states. (And set up the bone influences in the mesh such that moving it between those two positions creates an effect similar to the reference design when applied to your model.)
Morph target animation
Define multiple, topologically equivalent meshes, one for each blend shape parameter you're interested in. Each one should represent the target state of your character for when that blend shape's weight is 1.0 and all other blend shapes are at 0.0.
Then, at render time, set each vertex position to the weighted average of the same vertex's position in all blend shape targets. Pseudocode:
for vertex in i..<vertexCount {
outPosition = float4(0)
for shape in 0..<blendShapeCount {
outPosition += targetMeshes[shape][vertex] * blendShapeWeights[shape]
}
}
An actual implementation of the above algorithm is more likely to be done in a vertex shader on the GPU, so the for vertex part would be implicit there — you'd just need to feed all your blend shape targets in as vertex attributes. (Or use a compute shader?)
If you're using SceneKit, you can let Apple implement the algorithm for you by feeding your blend shape target meshes to SCNMorpher.
This is where the name "blend shape" comes from, by the way. And rumor has it the built-in ARFaceGeometry is built this way, too.
Simpler and Hybrid approaches
As you can see in Apple's sample code, you can go even simpler — breaking a face into separate pieces (nodes in SceneKit) and setting their positions or transforms based on the blend shape parameters.
You can also combine some of these approaches. For example, a cartoon character could use morph targets for skin deformation around the mouth, but have floating 2D eyebrows that animate simply through setting node positions.
Check-out the 'weboji' javascript library on gitHub. The CG artists we hired to create the 3D models get used with the workflow in minutes. Also, it could be an interesting approach to avoid proprietary formats and closed ecosystem issues.
Screenshots of a 3D Fox (THREE.JS based demo) and a 2D Cartman (SVG based demo).
Demo on youtube featuring a 2D 'Cartman'.

Detecting balls on a pool table

I'm currently working on a project where I need to be able to very reliable get the positions of the balls on a pool table.
I'm using a Kinect v2 above the table as the source.
Initial image looks like this (after converting it to 8-bit from 16-bit by throwing away pixels which is not around table level):
Then a I subtract a reference image with the empty table from the current image.
After thresholding and equalization it looks like this: image
It's fairly easy to detect the individual balls on a single image, the problem is that I have to do it constantly with 30fps.
Difficulties:
Low resolution image (512*424), a ball is around 4-5 pixel in diameter
Kinect depth image has a lot of noise from this distance (2 meters)
Balls look different on the depth image, for example the black ball is kind of inverted compared to the others
If they touch each other then they can become one blob on the image, if I try to separate them with depth thresholding (only using the top of the balls) then some of the balls can disappear from the image
It's really important that anything other than balls should not be detected e.g.: cue, hands etc...
My process which kind of works but not reliable enough:
16bit to 8bit by thresholding
Subtracting sample image with empty table
Cropping
Thresholding
Equalizing
Eroding
Dilating
Binary threshold
Contour finder
Some further algorithms on the output coordinates
The problem is that a pool cue or hand can be detected as a ball and also if two ball touches then it can cause issues. Also tried with hough circles but with even less success. (Works nicely if the Kinect is closer but then it cant cover the whole table)
Any clues would be much appreciated.
Expanding comments above:
I recommend improving the IRL setup as much as possible.
Most of the time it's easier to ensure a reliable setup than to try to "fix" that user computer vision before even getting to detecting/tracking anything.
My suggestions are:
Move the camera closer to the table. (the image you posted can be 117% bigger and still cover the pockets)
Align the camera to be perfectly perpendicular to the table (and ensure the sensor stand is sturdy and well fixed): it will be easier to process a perfect top down view than a slightly tilted view (which is what the depth gradient shows). (sure the data can be rotated, but why waste CPU cycles when you can simply keep the sensor straight)
With a more reliable setup you should be able to threshold based on depth.
You can possible threshold to the centre of balls since the information bellow is occluded anyway. The balls do not deform, so it the radius decreases fast the ball probably went in a pocket.
One you have a clear threshold image, you can findContours() and minEnclosingCircle(). Additionally you should contrain the result based on min and max radius values to avoid other objects that may be in the view (hands, pool cues, etc.). Also have a look at moments() and be sure to read Adrian's excellent Ball Tracking with OpenCV article
It's using Python, but you should be able to find OpenCV equivalent call for the language you use.
In terms tracking
If you use OpenCV 2.4 you should look into OpenCV 2.4's tracking algorithms (such as Lucas-Kanade).
If you already use OpenCV 3.0, it has it's own list of contributed tracking algorithms (such as TLD).
I recommend starting with Moments first: use the simplest and least computationally expensive setup initially and see how robuts the results are before going into the more complex algorithms (which will take to understand and get the parameters right to get expected results out of)

Simple flat shading using Stage3D/AGAL

I'm relatively new to 3D development and am currently using Actionscript, Stage3D and AGAL to learn. I'm trying to create a scene with a simple procedural mesh that is flat shaded. However, I'm stuck on exactly how I should be passing surface normals to the shader for the lighting. I would really like to just use a single surface normal for each triangle and do flat, even shading for each. I know it's easy to achieve better looking lighting with normals for each vertex, but this is the look I'm after.
Since the shader normally processes every vertex, not every triangle, is it possible for me to just pass a single normal per triangle, rather than one per vertex? Is my thinking completely off here? If anyone had a working example of doing simple, flat shading I'd greatly appreciate it.
I'm digging up an old question here since I stumbled on it via google and can see there is no accepted answer.
Stage3D does not have an equivalent "GL_FLAT" option for it's shader engine. What this means is that the fragment shader program always receives a "varying" or interpolated value from the output of the three respective vertices (via the vertex program). If you want flat shading, you have basically only one option:
Create three unique vertices for each triangle and set the normal for
each vertex to the face normal of the triangle. This way, each vertex
will calculate the same lighting and result in the same vertex color.
When the fragment shader interpolates, it will be interpolating three
identical values, resulting in flat shading.
This is pretty lame. The requirement of unique vertices per triangle means you can't share vertices between triangles. This will definitely increase your vertex count, causing increased delays during your VertexBuffer3D uploads as well as overall lower frame rates. However, I have not seen a better solution anywhere.

Which is faster: creating a detailed mesh before execution or tessellating?

For simplicity of the problem let's consider spheres. Let's say I have a sphere, and before execution I know the radius, the position and the triangle count. Let's also say the triangle count is sufficiently large (e.g. ~50k triangles).
Would it be faster generally to create this sphere mesh before hand and stream all 50k triangles to the graphics card, or would it be faster to send a single point (representing the centre of the sphere) and use tessellation and geometry shaders to build the sphere on the GPU?
Would it still be faster if I had 100 of these spheres in different positions? Can I use hull/geometry shaders to create something which I can then combine with instancing?
Tessellation is certainly valuable. Especially when combined with displacement from a heightmap. The isolated environment described in your question is bound not to fully answer your question.
Before using tessellation you would need to know that you will become CPU poly/triangle bound and therefore need to start utilizing the GPU to help you increase the overall triangles of your game/scene. Calculations are very fast on the GPU so yes using tessellation multiple subdivision levels is advisable if you are going to do it...though sometimes I've been happy with just subdividing 3-4 times from a 200 tri plane.
Mainly tessellation is used for environmental/static mesh scene objects so that you can spend your tri's on characters and other moving/animated models without becoming CPU bound.
Checkout engines like Unity3D and CryEngine for tessellation examples to help the learning curve.
I just so happen to be working with this at the same time.
In terms of FPS, the pre-computed method would be faster in this situation since you can
dump one giant 50K triangle sphere payload (like any other model) and
draw it in multiple places from there.
The tessellation method would be slower since all the triangles would
be generated from a formula, multiple times per frame.

Resources