Please help me out. I have been struggling for weeks now with ARKit and LiDAR to build an outdoor app. In the app I want to place an object, save the scene with the object, and reconstruct it later on. I am placing several extra (invisible) anchors around the object so that it is easier to reconstruct the saved experience. At first this worked well.
But there is one thing. If I've saved the scene on a cloudy (darker) day, it's impossible to reconstruct it on a sunny day. How is this possible? I thought that with LiDAR it would be possible to use the app in every environment (dark, light, shadows, etc.). I've seen Andy's great article about light estimation (isLightEstimationEnabled), but I don't need to give the object extra light. It looks like the environment cannot be scanned the right way. Am I doing something wrong, or is there a simple command like: configuration.useLiDAROnlyAndIgnoreLighting?
Thanks,
Marc
As far as we all know, AI is heavily involved in ARKit, and that is also true when using the LiDAR scanner. To be honest, I have no idea how to make LiDAR reconstruct a 3D mesh the same way whether it's a sunny day or a cloudy one. Such lighting conditions look different to the combined LiDAR + RGB camera vision, because bright hotspots can make otherwise recognizable objects unrecognizable, and the system effectively remembers the lighting situation it was captured under. And we cannot ignore the fact that our mesh is updated dynamically.
Nevertheless, it seems to me that you can slightly improve this situation by disabling the classifier (use the .mesh option instead of .meshWithClassification):
config.sceneReconstruction = .mesh
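If it helps, here is a minimal sketch of that suggestion as a complete configuration. The availability check, session options, and environmentTexturing are standard ARKit API; the wrapper function is just for illustration.

import ARKit

func runMeshOnlyReconstruction(on session: ARSession) {
    let config = ARWorldTrackingConfiguration()
    // Mesh reconstruction without per-face classification, as suggested above.
    if ARWorldTrackingConfiguration.supportsSceneReconstruction(.mesh) {
        config.sceneReconstruction = .mesh
    }
    // Let ARKit gather environment textures on its own (optional).
    config.environmentTexturing = .automatic
    session.run(config, options: [.resetTracking, .removeExistingAnchors])
}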
Say someone in real life waved their hand and hit the 3D object in AR; how would I detect that? I basically want to know when something crosses over the AR object so I can tell that something "hit" it and react.
Another example would be to place a virtual bottle on the table, then wave your hand in the air where the bottle is, and have it get knocked over.
Can this be done? If so, how? I would prefer help with Unity, but if this can only be done via Xcode and ARKit natively, I would be open to that as well.
ARKit does solve a ton of issues with AR and makes them a breeze to work with. Your issue just isn't one of them.
As #Draco18s notes (and emphasizes well with the xkcd link 👍), you've perhaps unwittingly stepped into the domain of hairy computer vision problems. You have some building blocks to work with, though: ARKit provides pixel buffers for each video frame, and the projection matrix needed for you to work out what portion of the 2D image is overlaid by your virtual water bottle.
Deciding when to knock over the water bottle is then a problem of analyzing frame-to-frame differences over time in that region of the image. (And tracking that region's movement relative to the whole camera image, since the user probably isn't holding the device perfectly still.) The amount of analysis required varies depending on the sophistication of the effect you want... a simple pixel diff might work (for some value of "work"), or there might be existing machine learning models that you could put together with Vision and Core ML...
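To make those building blocks a bit more concrete, here is a rough Swift sketch of the Vision route (not a drop-in solution): it projects the virtual object's position into the camera image with ARKit and checks whether a Vision-detected fingertip lands inside that region. The function name, the fixed 120-point region, the confidence threshold, and the orientation handling are all assumptions for illustration.

import ARKit
import SceneKit
import Vision

func handTouchesObject(frame: ARFrame, objectNode: SCNNode, viewportSize: CGSize) -> Bool {
    // 1) Screen-space position of the object (center only; projecting its
    //    bounding box corners would give a more robust region).
    let projected = frame.camera.projectPoint(objectNode.simdWorldPosition,
                                              orientation: .portrait,
                                              viewportSize: viewportSize)
    let hitRegion = CGRect(x: projected.x - 60, y: projected.y - 60, width: 120, height: 120)

    // 2) Hand pose detection on the current camera frame.
    let request = VNDetectHumanHandPoseRequest()
    let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, orientation: .right)
    try? handler.perform([request])
    guard let hand = request.results?.first,
          let tip = try? hand.recognizedPoint(.indexTip),
          tip.confidence > 0.3 else {
        return false
    }

    // 3) Vision returns normalized coordinates with the origin at the lower left;
    //    mapping them into view coordinates properly depends on your display
    //    transform, so this flip is only an approximation.
    let tipPoint = CGPoint(x: tip.location.x * viewportSize.width,
                           y: (1 - tip.location.y) * viewportSize.height)
    return hitRegion.contains(tipPoint)
}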
You should take a look at ManoMotion: https://www.manomotion.com/
They're working on this problem and are supposed to release a solution in the form of a library soon.
I have the following task: get a 3D reconstruction of a room from multiple images (possibly a video stream, it doesn't matter). There will be a spherical camera (in fact multiple cameras on a sphere-like rig), so my case is the right one in the image.
I decided to code it on the iOS platform, as I'm an iOS developer, and to model the cameras with the iPhone camera by rotating it as shown in the picture above. As far as I can decompose this task, first I need to get the real distance to the objects (walls in most cases, I think). Is that possible? Which algorithms/methods should I use to achieve this? I'm obviously not asking you to do the task for me, but please point me in the right direction, because I have no idea; maybe some equations/tutorials/algorithms with explanations that fit my case. Thank you!
The task of building a 3D model from multiple 2D images is called "scene reconstruction." It's still an active area of research, but solutions involve recognizing the same keypoint (e.g. a distinctive part of an object) in two images. Once you have that, you can use the known camera geometry to solve for the 3D position of that keypoint in the world.
Here's a reference:
http://docs.opencv.org/3.1.0/d4/d18/tutorial_sfm_scene_reconstruction.html#gsc.tab=0
You can google "scene reconstruction" to find lots more, and papers that go into more detail.
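To make the "known camera geometry" step concrete: given the two camera projection matrices $P$ and $P'$ and a keypoint matched at $x = (u, v, 1)^T$ in one image and $x' = (u', v', 1)^T$ in the other, the homogeneous 3D point $X$ satisfies $x \simeq P X$ and $x' \simeq P' X$. Writing $p_i^T$ for the rows of $P$ (and $p_i'^T$ for the rows of $P'$), each view contributes two linear equations, and $X$ is recovered as the least-squares null vector (via SVD) of

$$A X = 0, \qquad A = \begin{pmatrix} u\,p_3^T - p_1^T \\ v\,p_3^T - p_2^T \\ u'\,p_3'^T - p_1'^T \\ v'\,p_3'^T - p_2'^T \end{pmatrix}.$$

This is the standard linear (DLT) triangulation used in scene reconstruction pipelines.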
For SCNFloor, if reflectivity is set to 1 and reflectionFalloffEnd is big enough, it will be like a mirror.
My question is: how do I apply this to other geometries (say a plane or a box)? I want to have a mirror in my game.
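For reference, the mirror-like floor setup I mean is roughly this (a minimal sketch; the scene variable is a placeholder):

import SceneKit
import UIKit

let floor = SCNFloor()
floor.reflectivity = 1.0               // fully mirror-like
floor.reflectionFalloffEnd = 100.0     // keep reflections visible far from the surface
floor.firstMaterial?.diffuse.contents = UIColor.black
scene.rootNode.addChildNode(SCNNode(geometry: floor))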
I have done quite a bit of research on how to make reflections using SceneKit.
Here are the different leads I found (sadly, they will all need a serious amount of code and research):
Screen-Space reflections
Pros:
- Cheap
- Easy to make
Cons:
- Doesn't always look great
- I'm not sure how to output a normal pass with SCNTechnique

Parallax-mapped cubemaps
Pros:
- Cheap
- Looks amazing
Cons:
- No real time objects unless using an image proxy
- No good code sample online, will need research
- Not quite sure how to use it with SCNProgram

Two cameras + Stencil
Pros:
- Realistic
- Real time
- Almost built in
Cons:
- No documentation of the pointOfView of SCNTechnique
- No documentation on Stencils
- Needs to render the scene twice

OpenGL mirror
Pros:
- Actually duplicates the geometry, so very accurate
- This is the technique used by SCNFloor (I think)
Cons:
- Geometry can clip with the mirror plane (happens with SCNFloor)
- Unusable on anything other than a plane
- Needs OpenGL Code

4 Cameras linked to a Cubemap
Pros:
- Easy to set up
- Real time
- Works on any object
- Very popular technique in modern Video Games
Cons:
- I have no idea if this would work
- Will need to render the scene 5 times for a single mirror
- Not very accurate depending on object
My conclusion is that we need more help on using SCNTechnique. We could build amazing things with it but the lack of documentation and examples is a big problem.
If you could specify what kind of mirror you have in mind, I'll be happy to help you choose the best way to go.
I know this is an old question, but I wanted to share what I have done. I created a gist on GitHub that contains the code and explains how it works.
It basically attaches six cameras to a node and automatically creates a cubemap that is then used as the reflective property of the object. The main downside is that it won't work with physically based materials, but in order to simulate roughness, it blurs the cubemap to whatever you set the roughness property to. It works well in real time and you can set how quickly the cubemaps update so that you are not affecting the framerate of your game too much. It can also handle many different reflective objects and automatically stops updating nodes that you can't see.
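The gist itself is linked above; purely to illustrate the idea (this is not the gist's exact code), a stripped-down version of the six-camera cubemap trick could look like the sketch below. The face order follows SceneKit's +X, -X, +Y, -Y, +Z, -Z cubemap convention, and the per-face camera angles are an assumption that may need adjusting for your scene.

import SceneKit
import UIKit

// Renders six 90° views from the reflective node's position and assigns them
// as a cubemap to the node's reflective material property.
// `renderer` is an offscreen SCNRenderer sharing the same scene.
func updateReflectionCubemap(for node: SCNNode,
                             in scene: SCNScene,
                             using renderer: SCNRenderer,
                             faceSize: CGFloat = 256) {
    let cameraNode = SCNNode()
    cameraNode.camera = SCNCamera()
    cameraNode.camera?.fieldOfView = 90        // one cube face per render
    cameraNode.camera?.zNear = 0.01
    cameraNode.position = node.position
    scene.rootNode.addChildNode(cameraNode)

    // Euler angles for the +X, -X, +Y, -Y, +Z, -Z faces (a SceneKit camera looks down -Z).
    let faceAngles: [SCNVector3] = [
        SCNVector3(0, -Float.pi / 2, 0),   // +X
        SCNVector3(0,  Float.pi / 2, 0),   // -X
        SCNVector3( Float.pi / 2, 0, 0),   // +Y
        SCNVector3(-Float.pi / 2, 0, 0),   // -Y
        SCNVector3(0, Float.pi, 0),        // +Z
        SCNVector3(0, 0, 0)                // -Z
    ]

    node.isHidden = true                       // don't let the object capture itself
    renderer.scene = scene

    var faces: [UIImage] = []
    for angles in faceAngles {
        cameraNode.eulerAngles = angles
        renderer.pointOfView = cameraNode
        faces.append(renderer.snapshot(atTime: 0,
                                       with: CGSize(width: faceSize, height: faceSize),
                                       antialiasingMode: .none))
    }

    node.isHidden = false
    cameraNode.removeFromParentNode()
    node.geometry?.firstMaterial?.reflective.contents = faces
}

Calling this on a timer rather than every frame keeps the cost manageable, which matches the gist's configurable update rate.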
This is currently not supported on other geometry types. Please file a feature request with Apple.
I am looking into camera calibration techniques with OpenCV and have seen the chessboard and circles methods, but I want to calibrate the camera with something that already exists in the real world and doesn't have to be printed (printers are also not very accurate in what they print).
Is it possible to do calibration with complex shapes like the Coca Cola logo on the cans? Is it a problem that the surface is curved?
Thanks
Depending on what you want to achieve, this is not necessarily a bad idea at all, and you are not the first one to have it. There was a technology that used a CD, a strongly standardised object which at least used to exist in most households, for a simple camera calibration task. (There is little technical material to be found online about this, as the technology was proprietary. This is a business document where the use of the CD is mentioned. Algorithmically, however, it is not difficult if you know camera calibration.)
The question is whether the precision you get is sufficient for your application. Don't expect any miracles here. Generally you can use almost any object you like to learn something about a camera, as long as you can detect it reliably and you know its geometry. Almost certainly you will have to take several pictures of the object. Curved surfaces are no problem per se. I regularly used a cylinder (larger than a beverage can, though, with an easy-to-detect pattern) to calibrate a complete camera rig of 12 SLRs.
Don't expect to find out-of-the-box solutions, and don't expect the implementation to be trivial. You will have to work your way through the math. I recommend the book by Hartley and Zisserman, Multiple View Geometry in Computer Vision. This paper describes an analysis-by-synthesis approach to calibration, which is the way to go here (it does not describe exactly what you want, but the approach should generalise to arbitrary objects as long as you can detect them).
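For reference, whatever object you use, the model being estimated is the standard pinhole projection from Hartley & Zisserman: a 3D model point $(X, Y, Z)$ maps to the image point $(u, v)$ via

$$s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = K \,[\,R \mid t\,] \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}, \qquad K = \begin{pmatrix} f_x & \gamma & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix},$$

and calibration means estimating $K$ (plus lens distortion) by minimizing the reprojection error between detected and predicted image points over several views. That is why the object can be almost anything, as long as its geometry is known and it can be detected reliably.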
I can understand your wish, but it's a bad idea.
The calibration algorithm works by comparing real-world points from the camera with a synthetic model (yes, you have to supply that, too!). So, while it's easy to calculate a 2D chessboard grid on the fly and use that, it will be very hard to do for your tin can, or any arbitrary household item you grab.
Just give in and print a rectangular chessboard grid on a piece of paper
(OpenCV already comes with a PDF for that).
Don't use a real-life chessboard; a square one is ambiguous under a 90° rotation.
Interesting idea.
What about displaying a checkerboard pattern (or something else) on an LCD screen and using that display as the calibration pattern? You would have to know the displayed size of the pattern, though.
Googling I found this paper:
CAMERA CALIBRATION BASED ON LIQUID CRYSTAL DISPLAY (LCD)
ZHAN Zongqian
http://www.isprs.org/proceedings/XXXVII/congress/3b_pdf/04.pdf
Comment: this doesn't answer the question about the Coca-Cola can, but it gives an idea for a solution to the underlying problem: camera calibration with a common object.
I have a very specific application in which I would like to try structure from motion to get a 3D representation. So far, all the software/code samples I have found for structure from motion work like this: "a fixed object that is photographed from all angles to create the 3D model." That is not my case.
In my case, the camera is moving in the middle of a corridor and looking forward. Sometimes the camera can look in another direction (left, right, up, down). The camera will never go back or look back; it always moves forward. Since the corridor is small, almost everything is visible (no hidden spots). The corridor can sometimes be very long.
I have tried this software and it doesn't work in my particular case (but it's fantastic for normal use). Can anybody suggest a library/software/tool/paper that targets my specific needs? Or have you ever needed to implement something like that? Any help is welcome!
Thanks!
What kind of corridors are you talking about and what kind of precision are you aiming for?
A priori, I don't see why your corridor would not be a fixed object photographed from different angles. The quality of your reconstruction might suffer if you only look forward and you can't get many different views of the scene, but standard methods should still work. Are you sure that the programs you used aren't failing because of your picture quality, arrangement or other reasons?
If you have to do the reconstruction yourself, I would start by
1) Calibrating your camera
2) Undistorting your images
3) Matching feature points in subsequent image pairs
4) Extracting a 3D point cloud for each image pair
You can then orient the point clouds with respect to one another, for example via ICP between two subsequent clouds. More sophisticated methods might not yield much difference if you don't have any closed loops in your dataset (as your camera is only moving forward).
OpenCV and the Point Cloud Library should be everything you need for these steps. Visualization might be more of a hassle, but the pretty pictures are what you pay for in commercial software after all.
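To spell out steps 3 and 4 a little: with matched, undistorted points expressed in normalized coordinates $\hat{x} \leftrightarrow \hat{x}'$ for an image pair, the relative camera motion is encoded in the essential matrix,

$$\hat{x}'^{\,T} E\, \hat{x} = 0, \qquad E = [t]_{\times} R .$$

Estimate $E$ from the matches (e.g. with OpenCV's findEssentialMat), decompose it into the rotation $R$ and the translation direction $t$ (recoverPose), then triangulate each match to obtain the per-pair point cloud of step 4. With a single forward-moving camera the translation, and therefore the cloud, is only recovered up to scale.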
Edit (2017/8): I haven't worked on this in the meantime, but I feel like this answer is missing some pieces. If I had to answer it today, I would definitely suggest looking into the keyword monocular SLAM, which has recently seen a lot of activity, not least because of drones with cameras. Notably, LSD-SLAM is open source and may not be as vulnerable to feature-deprived views since it operates directly on image intensities. There even seem to be approaches combining inertial/odometry sensors with the image matching algorithms.
Good luck!
FvD is right in the sense that your corridor is a static object. Your scenario is the same as moving around an object and taking images from multiple views. Your views just aren't arranged to provide a 360-degree view of the object.
I see you mentioned in your previous comment that the data is coming from a video. In that case, the problem could very well be the camera calibration. A camera calibration tells the SfM algorithm about the internal parameters of the camera (focal length, principal point, lens distortion, etc.). In the absence of knowledge about these, the bundler in VSfM uses information from the EXIF data of the images. However, I don't think video stores any EXIF information (not 100% sure). As a result, I think the entire algorithm is running with bad focal length information and cannot solve for the orientation.
Can you extract a few frames from the video and see if there is any EXIF information?