With Augmented Reality, is there any way to display a 3d model or anything without the need for pattern recognition? There is a service that allows you to try on rings using augmented reality but it required pattern recognition.
Technically there is a pattern, but it's an invisible pattern...
When you do 3D tracking or SLAM, you're not tracking a printed pattern, but a map made of the object or environment. However, hands are hard. It works better with static objects like buildings.
Here's some more info about it:
http://dev.metaio.com/toolbox/
It depends on what exactly you're augmenting. If for example, you have a phone with accelerometers etc, and for example you want to show the cheapest price of a pint of cider above a nightclub's door, you don't need pattern recognition. However, if you want to process what the camera is seeing (e.g. your ring example) and respond to that, then you do need pattern recognition. You also need pattern recognition if you're some how responding to audio - or things like that - not that I can think of how or why you'd want to do that.
Related
This is the first mini project I am working on and I have been searching for some information regarding yolo. I want to know if we could train yolo to recognise objects in a real time webcam and set up a boundary (not to be confused with the boundary boxes) that sends out a simple alert if the so called object (in our case, a face) goes out of the boundary.
This is my first time asking here and I don't know if it is appropriate to do so. please let me know and I will be reading APIs related to motion detection. If there are any suggestions, please do give them.
I would check out this open source CCTV solution called https://shinobi.video/. I used it once to do motion detection and I think it could be much easier for you than building something from scratch.
Here are some articles they have that sound related to what you are trying to do:
https://hub.shinobi.video/articles/view/JtJiGkdbcpAig40
https://hub.shinobi.video/articles/view/xEMps3O4y4VEaYk
My goal is to overlay material/texture on a physical object (it would be an architectural model) that I would have an identical 3d model of. The model would be static (on a table if that helps), but I obviously want to look at the object from any side. The footprint area of my physical models would tend to be no smaller than 15x15cm and could be as large as 2-3m^2, but I would be willing to change the size of the model to work with ARCore's capability.
I know ARCore is mainly designed to anchor digital objects to flat horizontal planes. My main question is, in its current state, is it capable of accompliahing my end goal? If i have this right, it would record physical point cloud data and attempt to match it to point cloud data of my digital model, then overlapping the two on the phone screen?
If that really isn't what ARCore is for, is there an alternative that I should be focusing on? In my head this sounded fairly straightforward, but I'm sure I'll get way out of my depth if I go about it an inefficient way. Speaking of depth, I would prefer not to use a depth sensor, since my target devices are phones.
I most definitely hope that it will be possible in the future - after all an AR toolkit without Computer Vision is not that helpful.
Unfortunately, according to the ARCore employee Ian, this is currently not directly supported but you could try to access the pixels via glReadPixels and then use OpenCV with these image bytes.
Quote from Ian:
I can't speak to future plans, but I agree that it's a desirable
capability. Unfortunately, my understanding is that current Android
platform limitations prevent providing a single buffer that can be
used as both a GPU texture and CPU-accessible image, so care must be
taken in providing that capability.
Updated: 25 September, 2022.
At the moment there's still no 3D Object Recognition API in ARCore 1.33.
But... You can use ML Kit framework and Augmented Images API (ARCore 1.2+) for some tasks.
According to Google documentation, you can use ARCore as input for Machine Learning models.
I am working on an open source package for robot owners. I want to do a decent job of detecting when the robot is having movement problems. One of the problems the robot commonly has is that the back wheel gets "tucked underneath" in a bad way and makes it turn very slowly when on carpet. I believe that with a combination of accelerometer value inspection and (I hope) a relatively simple yet robust vision analysis technique, I will be able to tell when the robot is having this specific problem.
What I need is to be able to analyze two images, separated by about 1/2 second in time, and get a numerical value that tells about how close they are, but in a way that has some intelligence about the objects in the screen instead of just a simple color/hue/etc. analysis. I've heard of an algorithm called optical flow that is used in object and scene tracking, but I'm hoping I don't need something heavyweight.
Is there an machine vision algorithm/function that can analyze two JPEG's and tell if they belong to the same scene and viewpoint, yet can also deliver a numerical monotonically increasing value that tells me rough how different they are? If I could get that numerical value and compare it to the number of milliseconds past, while examining the current accelerometer activity, I believe I can detect when the robot is having the "slow turn of death" problem.
If so, please tell me the basic technique involved, and if you know of machine vision library that implements it, which one it is.
but in a way that has some intelligence about the objects in the screen instead of just a simple color/hue/etc. analysis
What you are suggesting is a complex problem by itself, so forget about 'lightweight' solutions. Probably you are going to need something like optical flow.
Other options I would recommend you looking into are:
Vanishing points detection and variation from image to image. This quite fits into your problem domain Wikipedia
Disparity map: related to optical flow. Used for stereographic vision, but I think you can use it for the kind of application you are looking for. Take a look at this
I'm planning on doing my Final Year Project of my degree on Augmented Reality. It will be using markers and there will also be interaction between virtual objects. (sort of a simulation).
Do you recommend using libraries like ARToolkit, NyARToolkit, osgART for such project since they come with all the functions for tracking, detection, calibration etc? Will there be much work left from the programmers point of view?
What do you think if I use OpenCV and do the marker detection, recognition, calibration and other steps from scratch? Will that be too hard to handle?
I don't know how familiar you are with image or video processing, but writing a tracker from scratch will be very time-consuming if want it to return reliable results. The effort also depends on which kind of markers you plan to use. Artoolkit e.g. compares the marker's content detected from the video stream to images you earlier defined as markers. Hence it tries to match images and returns a value of probability that a certain part of the video stream is a predefined marker. Depending on the threshold you are going to use and the lighting situation, markers are not always recognized correctly. Then there are other markers like datamatrix, qrcode, framemarkers (used by QCAR) that encode an id optically. So there is no image matching required, all necessary data can be retrieved from the video stream. Then there are more complex approaches like natural feature tracking, where you can use predefined images, given that they offer enough contrast and points of interest so they can be recognized later by the tracker.
So if you are more interested in the actual application or interaction than in understanding how trackers work, you should base your work on an existing library.
I suggest you to use OpenCV, you will find high quality algorithms and it is fast. They are continuously developing new methods so soon it will be possible to run it real-time in mobiles.
You can start with this tutorial here.
Mastering OpenCV with Practical Computer Vision Projects
I did the exact same thing and found Chapter 2 of this book immensely helpful. They provide source code for the marker tracking project and I've written a framemarker generator tool. There is still quite a lot to figure out in terms of OpenGL, camera calibration, projection matrices, markers and extending it, but it is a great foundation for the marker tracking portion.
I had an idea for which I need to be able to recognize certain objects or models from a rendered three dimensional digital movie.
After limited research, I know now that what I need is called feature detection in the field of Computer Vision.
So, what I want to do is:
create a few screenshots of a certain character in the movie (eg. front/back/leftSide/rightSide)
play the movie
while playing the movie, continuously create new screenshots of the movie
for each screenshot, perform feature detection (SIFT?, with openCV?) to see if any of our character appearances are there (they must still be recognized if the character is further away and thus appears smaller, or if the character is eg. lying down).
give a notice whenever the character is found
This would be possible with OpenCV, right?
The "issue" is that I would have to learn c++ or python to develop this application. This is not a problem if my movie and screenshots are applicable for what I want to do.
So, I would like to first test my screenshots of the movie. Is there a GUI version of OpenCV that I can input my test data and then execute it's feature detection algorithms manually as a means of prototyping?
Any feedback is appreciated. Thanks.
There is no GUI of OpenCV able to do what you want. You will be able to use OpenCV for some aspects of your problem, but there is no ready-made solution waiting there for you.
While it's definitely possible to solve your problem, the learning curve for this problem is quite long. If you're a professional, then an alternative to learning about it yourself would be to hire an expert to do it for you. It would cost money, but save you time.
EDIT
As far as template matching goes, you wouldn't normally use it to solve such a problem because the thing you're looking for is changing appearance and shape. There aren't really any "dynamic parameters to set". The closest thing you could try is have a massive template collection that would try to cover the expected forms that your target may take. But it would hardly be an elegant solution. Plus it wouldn't scale.
Next, to your point about face recognition. This is kind of related, but most facial recognition applications deal with a controlled environment: lighting, distance, pose, angle, etc. Outside of that controlled environment face detection effectiveness drops significantly. If you're detecting objects in a movie, then your environment isn't really controlled.
You may want to first try a simpler problem of accurately detecting where the characters are, without determining who they are (video surveillance, essentially). While it may sound simple, you'll find that it's actually non-trivial for arbitrary scenes. The result of solving that problem may be useful in identifying the characters.
There is Find-Object by Mathieu Labbé. It was very helpful for me to start getting an understanding of the descriptors since you can change them while your video is running to see what happens.
This is probably too late, but might help someone else looking for a solution.
Well, using OpenCV you would of taking a frame of a video file and do any computations on it.
You can do several different methods of detecting a character on that image, but it's not so easy to have it as flexible so you can even get that person if it's lying on the floor for example, if you only entered reference images of that character standing.
Basically you could try extracting all important features from your set of reference pictures and have a (in your case supervised) learning algorithm that gets a good feature-vector of that character for classification.
You then need to write your code that plays the video and which takes a video frame let's say each 500ms (or other as you desire), gets a segmentation of the object you thing would be that character and compare it with the reference values you get from your learning algorithm. If there's a match, your code can yell "Yehaaawww!" or do other things...
But all this depends on how flexible you want this to be. You could also try a template match or cross-correlation which basically shifts the reference image(s) over the frame and checks how equal both parts are. But this unfortunately is very sensitive for rotation, deformations or other noise... so you wouldn't get that person if its i.e. laying down. And I doubt you can get all those calculations done in realtime...
Basically: Yes OpenCV is good to use for your image processing/computer vision tasks. But it offers a lot of methods and ways and you'd need to find a way that works for your images... it's not a trivial task though...
Hope that helps...
Have you tried looking at some of the work of the Oxford visual geometry group?
Their Video Google system describes to a large extent what you want, instance detection.
Their work into Naming People in TV shows is also pretty relevant. A face detection and facial feature pipeline is included that can be run from Matlab. Are you familiar with Matlab?
Have you tried computer vision frameworks like Cassandra? There you can exactly do that just by some mouse clicks.