Methods to track marked points in a stationary video? - image-processing

Not sure where to ask this. Please redirect me if SO is not the place.
I want make a web app that accurately tracks pose in a stationary video of someone pedaling a stationary bike. The joints can be marked with some stickers to make the process easier and more accurate. Basically, I want to do what does this app.
First i tried markerless tracking using pose estimation models such as mediapipe's Blazepose and google's MoveNet. However, these are not accurate enough. I would also like to track some additional landmarks (ball of the foot,...).
Then I tried OpenCV.js's Lukas-Kanade optical flow method. But the algorithm lost the tracked point quickly. Even when i placed a colored tape on the part of the body that i wanted to track.
I also tried template matching a single marked point in opencv but it was not very robust, and it would probably not work well when using more markers.
What other methods can I try? Since the app i send the video of requires stickers to be placed, I though it is using something like Lukas-Kanade. But as I said, when I tried it, it wasn't able to track the marked point. Because the app is only on iOS I thought it may be using this API. However, this is only my speculation.
Edit: added example video: https://www.youtube.com/watch?v=eCNyyABfWSE
I tried shooting in slowmo to have more fps, but the quality suffered because of this. Also i didn't have blue or green tape so I had to use yellow, which is not very visible on the sweater or on my wrist. But the markers on the pants should be trackable right?

Related

Detecting a real world object using ARKit with iOS

I am currently playing a bit with ARKit. My goal is to detect a shelf and draw stuff onto it.
I did already find the ARReferenceImage and that basically works for a very, very simple prototype, but the image needs to be quite complex it seems? Xcode always complains if I try to use something a lot simpler (like a QR-Code like image). With that marker I would know the position of an edge and then I'd know the physical size of my shelf and know how to place stuff into it. So that would be ok, but I think small and simple markers will not work, right?
But ideally I would not need a marker at all.
I know that I can detect e.g. planes, but I want to detect the shelf itself. But as my shelf is open, it's not really a plane. Are there other possibilities to find an object using ARKit?
I know that my question is very vague, but maybe somebody could point me in the right direction. Or tell me if that's even possible with ARKit or if I need other tools? Like Unity?
There are several different possibilities for positioning content in augmented reality. They are called content anchors, and they are all subclasses of the ARAnchor class.
Image anchor
Using an image anchor, you would stick your reference image on a pre-determined spot on the shelf and position your 3D content relative to it.
the image needs to be quite complex it seems? Xcode always complains if I try to use something a lot simpler (like a QR-Code like image)
That's correct. The image needs to have enough visual detail for ARKit to track it. Something like a simple black and white checkerboard pattern doesn't work very well. A complex image does.
Object anchor
Using object anchors, you scan the shape of a 3D object ahead of time and bundle this data file with your app. When a user uses the app, ARKit will try to recognise this object and if it does, you can position your 3D content relative to it. Apple has some sample code for this if you want to try it out quickly.
Manually creating an anchor
Another option would be to enable ARKit plane detection, and have the user tap a point on the horizontal shelf. Then you perform a raycast to get the 3D coordinate of this point.
You can create an ARAnchor object using this coordinate, and add it to the ARSession.
Then you can again position your content relative to the anchor.
You could also implement a drag gesture to let the user fine-tune the position along the shelf's plane.
Conclusion
Which one of these placement options is best for you depends on the use case of your app. I hope this answer was useful :)
References
There are a lot of informative WWDC videos about ARKit. You could start off by watching this one: https://developer.apple.com/videos/play/wwdc2018/610
It is absolutely possible. If you do this in swift or Unity depends entirely on what you are comfortable working in.
Arkit calls them https://developer.apple.com/documentation/arkit/arobjectanchor. In other implementations they are often called mesh or model targets.
This Youtube video shows what you want to do in swift.
But objects like a shelf might be hard to recognize since their content often changes.

iOS timecode-synced downloadable animation system

As an introduction and context, I'm currently a novice iOS app developer and I want to make sure I'm not reinventing the wheel too much as I make this app (reinventing wheels can get very expensive.)
The app will allow the user to download our videos off the internet and will allow storage for offline usage. The problem with storing these videos on the device is that many of them will be too long and thus too big to be practical to store.
The videos are quite simple however, consisting of a couple short "real" video clips at the beginning and end, with the bulk of the video being still images animated around the screen. The animations would consist solely of opacity and simple transformation keyframes (translate, scale, rotate around static anchor point), and would require a variety of easing functions for each transition.
The hardest part likely would be that the "video" player will also have to be able to track with an audio player's timecode, and will have to support seeking to any arbitrary point like a normal video player.
So, now that I've described the problem, here's the solution I've come up with so far. Hopefully doing it this way will reduce the probability of XY problems. :)
The idea is to basically do a dumbed-down version of what Final Cut and other editing programs do with animations—have a bunch of clips, sometimes overlapping, and be able to animate the position, scale, rotation, and opacity of each using keyframes.
My first instinct as far as implementation goes is to use some of iOS's game engine stuff to do animations (maybe SceneKit because it seems to allow animations to use scene time as opposed to real time, despite the fact that it's primarily 3d and I am doing 2d animations) and manually handle syncing time with the audio player, as well as manually handling the adding and removing of nodes from the scene when seeking through the video and when clips begin/end.
What are some built-in systems, plugins, etc. that I can take advantage of to make this easier and faster to develop and maintain? Double points if I don't have to transcode the animations by hand to some custom format.
As I mentioned in my comment your question is rather broad and contains multiple questions in one, I will address what you mentioned to be likely the hardest part:
https://developer.apple.com/documentation/avfoundation/avplayeritem
https://developer.apple.com/documentation/avfoundation/avasset
Instead of SceneKit, take a look at SpriteKit and its SKVideoNode.
Also, research Metal video processing. There are quit a few example projects available you could use as a starting point.

How to track an opened hand in any environment with RGB camera?

I want to make a movable camera that tracks an opened hand (toward the floor). It just needs to track the opened hand but it has to also know the rotation (2d rotation).
This is what I searched for so far:
Contour- As the camera is movable, the background is unknown, even the lighting is not fixed. It's hard for me to get a clear hand
segment in real time.
Haar- It seems this just returns a rect and can't deal with rotation.
Feature detect- A hand doesn't have enough detail for this.
I am using the Opencv Unity plugin to do this.
EDIT
https://www.codeproject.com/Articles/826377/Rapid-Object-Detection-in-Csharp
I see another library can do something like this. Can OpenCV also do this?

Image tracking - tracking a screen with a camera

I want to track the relative position of a camera aimed at a computer screen.
I can’t control what is displayed on the computer screen but I can receive screen dumps whenever something changes on the screen. Those screen dumps can hopefully be used to find the screen when analyzing the video from the camera.
I see many videos on youtube for face, logo or single colored objects tracking using OpenCV but I’m unsure those methods would work finding and tracking a more detailed image like a screen dump.
Maybe Template Matching is the way to go? But I need to find the screen even at an angle.
Basically I don’t know where to begin and need help from people with experience in this field to find the best way for achieving what I want.
Thanks
Using feature matching should do the trick (Sift/SURF/ORB/...)

Extracting slides from video lectures using OpenCV

I would like to extract out all the slides from a video lecture, using OpenCV. Here is an example of a lecture: http://www.youtube.com/watch?v=-hxOpz9c0bY.
What approaches would you recommend? So far, I've tried:
Comparing the change in grayscale intensity from frame to frame. This can have problems when an object in the foreground moves around. For example, in this lecture, there's a hand that moves around: http://www.youtube.com/watch?v=mNzu42FrlHo#t=07m00s.
Using SURF features and doing comparisons frame by frame. This approach seems kind of slow.
Does anyone have other ideas?
Most of this work is most likely already done by video encoder. You just need to extract key-frames and check how well compressed are frames between them.
It should be also fairly easy to distinguish still images. You can save lot of time by examining just the key-frames. Slides are likely to have high contrast, solid shapes, solid background. Lecture hall has blurry shapes and low contrast.
What you need is a scene change detection. After that, you'll have to classify scenes as "lecture hall" or "presentation". As for the problem with hands - you could use background subtraction with an adaptive background (just make sure you mask the foreground... you don't want the foreground to become a part of the background).
You could try an edge detection and look for a rectangular object - the slides (above a certain area threshold). You could further reduce FPs by looking for some text within the rectangle.
There are several reasons to extract slides/frames from a video presentation, especially in the case of education or conference related videos. It allows you to access the study notes without watching the whole video.
I have faced this issue several times, so I decided to create a solution for it myself using python. I have made the code open-source, you can easily set up this tool and run it in few simple steps.
Refer to this for a youtube video tutorial. Steps on how to use this tool.
Clone this project video2pdfslides
Set up your environment by running "pip install -r requirements.txt"
Copy your video path
Run "python video2pdfslides.py <video_path>"
Boom! the pdf slides will be available in the output folder Make notes and enjoy!

Resources