How does Google Maps do its panoramas? - actionscript

How does Google Maps do its panoramas in Street View?
Yeah, I know it's Flash, but how do they skew bitmaps with correct texture mapping?
Are they doing it at the pixel level like most Flash 3D engines, or just applying some tricky transformation to the Bitmaps in the MovieClips?

Flash Panorama Player can help achieve a similar result!
It uses six cube-face images stitched together seamlessly with some 'magic' ActionScript.
Also see these parts of flashpanos.com for plugins, and tutorials with (possibly) documentation.
A quick guide to shooting panoramas so you can view them with FPP (Flash Panorama Player).
Cubic projection cube faces are actually 90x90 degrees rectilinear images like the ones you get from a normal camera lens. ~ What is VR Photography?
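To make the cube-face idea concrete, here is a minimal sketch of how a viewer could decide which of the six 90x90 degree faces a view direction hits, and where on that face. The axis convention (x right, y up, z forward), the face names and the function name are my own assumptions, not anything taken from Flash Panorama Player.

```python
def direction_to_cube_face(x, y, z):
    """Map a non-zero direction vector to (face, u, v), with u, v in [0, 1].

    Assumed axes: x right, y up, z forward. u grows to the right and
    v grows downwards on each face, like pixel coordinates.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    if az >= ax and az >= ay:
        # Front or back face: the z component dominates.
        face, major, a, b = ("front", az, x, y) if z > 0 else ("back", az, -x, y)
    elif ax >= ay:
        # Right or left face: the x component dominates.
        face, major, a, b = ("right", ax, -z, y) if x > 0 else ("left", ax, z, y)
    else:
        # Up or down face: the y component dominates.
        face, major, a, b = ("up", ay, x, -z) if y > 0 else ("down", ay, x, z)
    # Dividing by the dominant component is the rectilinear (perspective)
    # projection onto a face one unit away, so each face spans 90 degrees.
    u = (a / major + 1) / 2
    v = (1 - b / major) / 2
    return face, u, v
```

Looking straight ahead, `direction_to_cube_face(0, 0, 1)` lands in the middle of the front face, and a direction 45 degrees to the right lands exactly on the edge shared by the front and right faces.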

Check out http://www.panoguide.com/. They have howtos, links to software etc.
Basically there are two components in the process. The stitching software creates a single panoramic photo from many separate source images. The panoramic viewer then distorts that image as you change your POV, to simulate what your eyes would see if you were actually there.

My company uses the Papervision3D flash render engine, and maps a panoramic image (still image or video) onto a 3D sphere. We found that using a spherical object with about 25 divisions along both the axes gives a much better visual result than mapping the same image on the six faces of a cube. Check it for yourself at http://www.panocast.com.
Actually, you could of course distort your image in advance, so that when it is mapped on the faces of a cube, its perspective is just right, but this requires the complete rerendering of your imagery.
With some additional "magic", we can also load still images incrementally, as needed, depending on where the user is looking and at what zoom level (much as Google Street View does).

In terms of what Google actually does, Bork had this right. I'm not sure of the exact details (and not sure I could release the details even if I were), but Google stores individual 360-degree Street View scenes in an equirectangular representation for serving. The Flash player then uses a series of affine transformations to display the image in perspective. Each affine transformation is only approximate, but together they add up to a decent image overall.
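For reference, this is roughly what the exact mapping from a perspective view into an equirectangular panorama looks like. It is only a sketch under my own assumptions (pinhole view, yaw/pitch in degrees, the function name and parameter names are made up), and it is the exact per-pixel version, not the affine approximation the Flash player uses. An affine-based viewer would presumably evaluate something like this only at the corners of small tiles and interpolate in between.

```python
import math

def view_pixel_to_equirect(px, py, view_w, view_h, fov_deg,
                           yaw_deg, pitch_deg, pano_w, pano_h):
    """Return the (u, v) position in the panorama seen at view pixel (px, py)."""
    # Focal length in pixels for the given horizontal field of view.
    f = (view_w / 2) / math.tan(math.radians(fov_deg) / 2)
    # Ray through the pixel in camera space: x right, y up, z forward.
    x, y, z = px - view_w / 2, view_h / 2 - py, f
    # Tilt the ray up by the pitch angle (rotation about the x axis).
    p = math.radians(pitch_deg)
    y, z = y * math.cos(p) + z * math.sin(p), -y * math.sin(p) + z * math.cos(p)
    # Turn the ray by the yaw angle (rotation about the y axis).
    w = math.radians(yaw_deg)
    x, z = x * math.cos(w) + z * math.sin(w), -x * math.sin(w) + z * math.cos(w)
    # Longitude and latitude of the ray, then equirectangular coordinates.
    lon = math.atan2(x, z)                   # -pi .. pi
    lat = math.atan2(y, math.hypot(x, z))    # -pi/2 .. pi/2
    u = (lon / (2 * math.pi) + 0.5) * pano_w
    v = (0.5 - lat / math.pi) * pano_h
    return u, v
```

With yaw and pitch at zero, the centre of the view maps to the centre of the panorama; a yaw of 90 degrees moves it a quarter of the panorama width to the right.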
The calculation of the served images is very involved, since there are many stages of image processing to go through: removing faces, accounting for bloom, and so on. For actually stitching the panoramas there are many algorithms (Wikipedia article). One interesting thing I'd like to point out, as food for thought: in the 360-degree panoramas on Street View you can see the road at the bottom of the image, where there was no camera on the car. Now that's stitching.

An expensive camera makes a 360-degree video.
It is pretty impressive to watch a video that allows panning in every direction... which is what street view is without the bandwidth to support the full video.

For those wondering how the Google VR photographers and editors add the ground to their equirectangular panoramas, check out the feature called Viewpoint Correction, as seen in software like PTGui:
ptgui.com/excamples/vptutorial.html
(Note that this is NOT the software used by Google)
If you take a closer look at the ground in Street View, you can see that the stitching looks stretched, and sometimes it even overlaps with imagery from the neighbouring viewpoint. (By that I mean that you can see a feature in one place, and then that same feature suddenly shows up as the ground at the next location, which reveals the technique used for stitching the ground.)

Related

Panorama of cylindric objects

I want to get a panoramic view of cylindrical objects without using special cameras.
The idea was to take a lot of images from different views, cut out the center of each and join these centers together. But I got bad results.
Does anybody know a better solution for this? Would it be better to work from video?
Hugin is a great, configurable, free, cross-platform tool for stitching panoramic images. You can definitely use it for your task.
If you want to create your own tool for this purpose, you may find it useful to read about Hugin's toolchain workflow to learn what steps may be needed to achieve good results.
A possible workflow may be:
Take images.
Correct projection depending on lens parameters.
Find and verify control points on image pairs (possible algorithms: SIFT, SURF).
Geometric optimisation (shift, 3D rotation, etc).
Photometric optimisation (exposure values, vignetting, white balance).
Stitch and blend output (cut the centers and join them smoothly together).
You may skip some steps depending on your image capturing conditions. The more similar images are (same camera and cylinder positions, same lighting, etc.) the less image correction you will need.
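As an illustration of the last step only (cut the centers and join them smoothly), here is a minimal sketch that joins two strips with a linear cross-fade over a few shared columns. It assumes the strips are already aligned and corrected, and it represents images as plain lists of rows of grey values. The function name and the `overlap` parameter are my own; Hugin's actual blender is considerably more sophisticated.

```python
def blend_strips(left, right, overlap):
    """Join two equally tall strips side by side.

    The last `overlap` columns of `left` and the first `overlap` columns of
    `right` are assumed to show the same part of the object and are
    cross-faded linearly instead of being cut hard.
    """
    out = []
    for row_l, row_r in zip(left, right):
        keep_l = row_l[:-overlap] if overlap else row_l
        seam = []
        for i in range(overlap):
            # Weight of the right strip rises from left to right in the seam.
            w = (i + 1) / (overlap + 1)
            a = row_l[len(row_l) - overlap + i]
            b = row_r[i]
            seam.append((1 - w) * a + w * b)
        out.append(list(keep_l) + seam + list(row_r[overlap:]))
    return out
```

For a full cylinder you would apply this to each neighbouring pair of center strips in turn, so every seam gets a gradual transition rather than a visible edge.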

Computing real depth map of image objects and reconstruction from several images

I have the following task: get a 3D projection of a room from multiple images (possibly a video stream, it doesn't matter). There will be a spherical camera (in fact multiple cameras on a sphere-like construction), so the case is the right one in the image.
I decided to code it on the iOS platform, as I'm an iOS developer, and to model the cameras with an iPhone camera, rotating it as shown in the picture above. As far as I can decompose this task, I first need to get the real distance to the objects (walls in most cases, I think). Is that possible? Which algorithms or methods should I use to achieve this? I'm obviously not asking you to do the task for me, just to point me in the right direction, because I have no idea where to start: maybe some equations, tutorials or algorithms with an explanation for my case. Thank you!
The task of building a 3D model from multiple 2D images is called "scene reconstruction." It's still an active area of research, but solutions involve recognizing the same keypoint (e.g. a distinctive part of an object) in two images. Once you have that, you can use the known camera geometry to solve for the 3D position of that keypoint in the world.
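To give a feel for the "known camera geometry" part, here is the simplest case as a sketch: two identical cameras side by side, looking in the same direction, separated by a known baseline. The setup, the function name and the parameter names are my assumptions for illustration. Note that the two viewpoints must actually be at different positions; rotating a camera about its own optical centre produces no parallax to measure.

```python
def depth_from_disparity(focal_px, baseline_m, x_left, x_right):
    """Distance (in metres) to a keypoint seen in a rectified stereo pair.

    focal_px   : focal length in pixels (from camera calibration)
    baseline_m : distance between the two camera centres in metres
    x_left     : column of the keypoint in the left image
    x_right    : column of the same keypoint in the right image
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("keypoint must appear further left in the right image")
    # Similar triangles: depth / baseline = focal length / disparity.
    return focal_px * baseline_m / disparity
```

For example, with a focal length of 1000 pixels and cameras 10 cm apart, a keypoint that shifts by 20 pixels between the two images is 5 m away.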
Here's a reference:
http://docs.opencv.org/3.1.0/d4/d18/tutorial_sfm_scene_reconstruction.html#gsc.tab=0
You can google "scene reconstruction" to find lots more, and papers that go into more detail.

iOS: Which Augmented Reality SDK should be used for a virtual try-on room?

I am working on an iOS augmented reality project where I need to integrate a virtual dressing concept.
I tried OpenCV. It worked as desired for face detection, but when I tried upper-body detection it didn't.
I used UPPER_BODY_HAAR_CASCADE, but it didn't work as desired.
The result was something like this:
but my desired output is something like this:
If someone has achieved this functionality on iOS, please reply.
Not exactly the answer you are looking for. Your app ends up depending on the SDK you choose. Most of them are quite expensive to use, and their usage policies may change. Additionally, you drag all the extra functionality you don't need into your app, so at the end of the day your app is 60-100 MB in size.
If I were you (and I was in a similar situation), I would develop my own little SDK with just the functionality you need. If you know how to do it, it takes a couple of days to get the basics working. Add OpenCV and you are in good shape.
PS. @Tommy asked an interesting question: how could one approach implementing something like what is shown in this video: youtube.com/watch?v=IBE11ROpxHE
Adding some info which is too long for a comment.
@Tommy Nice video. It seems to have all we need to proceed. First of all, for any AR application you need the calibration info for your camera (the mobile phone camera). In the simple case it consists of two matrices: the camera matrix and the distortion matrix. The camera matrix is then used to create the OpenGL projection matrix (how the 3D model is projected onto the flat 2D screen: field of view, clipping planes, etc.). The distortion matrix is used, for example, for warping parts of your input frame when detecting something. In the example with the watch, we need to detect the strap and the watch body in order to place the 3D model in that position. Since the paper watch is not seen in ideal perspective, at a 90-degree angle to the eye, it needs to be transformed to that view.
In other words, your paper watch looks like this:
/---/
/ /
/---/
And for the analysis and for detecting the model name you need it to look like this:
---
| |
| |
---
This is where the distortion matrix is used, in order to get a precise transformation. Different cameras have their own distortions.
Most applications use so-called offline calibration. A chessboard is fed into OpenCV functions that detect its cells over a series of frames taken from different perspectives and build the matrices based on how the cells are shaped.
In your case, the strap of your watch may be designed so that it contains everything needed for online calibration. In your video it has a special pattern, and I'm pretty sure it's there exactly for this purpose. You could do the same and use a chessboard pattern for simplicity.
Then you could use, let's say, the first 25 frames for online calibration. Once you have all the matrices, you go on to detect the paper watch, build the projection matrix and replace the watch with your 3D model. If all is done right, your paper watch will be at coordinates (0, 0, 0) in 3D space, and you could easily place something else at that position.
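To show what the camera matrix mentioned above actually encodes, here is a minimal sketch of the pinhole projection and of the vertical field of view you would hand to a perspective projection. The 3x3 layout (fx, fy on the diagonal, cx, cy in the last column) is the usual convention; the example values and function names are made up, and lens distortion is ignored.

```python
import math

def project_point(K, point):
    """Project a 3D point in camera coordinates to pixel coordinates."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    X, Y, Z = point
    # Perspective divide by depth, then scale and shift into pixels.
    return fx * X / Z + cx, fy * Y / Z + cy

def vertical_fov_deg(K, image_height):
    """Vertical field of view implied by the camera matrix, in degrees."""
    return math.degrees(2 * math.atan(image_height / (2 * K[1][1])))

# Hypothetical calibration result for a 640x480 camera.
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
```

With this `K`, a point straight ahead of the camera projects to the principal point (320, 240), and `vertical_fov_deg(K, 480)` gives the angle to use when building the projection matrix for rendering the 3D model.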

3D reconstruction -- How to create 3D model from 2D image?

If I take a picture with a camera, so I know the distance from the camera to the object, such as a scale model of a house, I would like to turn this into a 3D model that I can maneuver around so I can comment on different parts of the house.
If I sit down and think about taking more than one picture, labeling direction, and distance, I should be able to figure out how to do this, but, I thought I would ask if someone has some paper that may help explain more.
What language you explain in doesn't matter, as I am looking for the best approach.
Right now I am considering showing the house, then the user can put in some assistance for height, such as distance from the camera to the top of that part of the model, and given enough of this it would be possible to start calculating heights for the rest, especially if there is a top-down image, then pictures from angles on the four sides, to calculate relative heights.
Then I expect that parts will also need to differ in color to help separate out the various parts of the model.
As mentioned, the problem is very hard and is often also referred to as multi-view object reconstruction. It is usually approached by solving the stereo-view reconstruction problem for each pair of consecutive images.
Performing stereo reconstruction requires that pairs of images are taken that have a good amount of visible overlap of physical points. You need to find corresponding points such that you can then use triangulation to find the 3D co-ordinates of the points.
Epipolar geometry
Stereo reconstruction is usually done by first calibrating your camera setup so you can rectify your images using the theory of epipolar geometry. This simplifies finding corresponding points as well as the final triangulation calculations.
If you have:
the intrinsic camera parameters (requiring camera calibration),
the camera's position and rotation (its extrinsic parameters), and
8 or more physical points with matching known positions in two photos (when using the eight-point algorithm)
you can calculate the fundamental and essential matrices using only matrix theory and use these to rectify your images. This requires some theory about co-ordinate projections with homogeneous co-ordinates and also knowledge of the pinhole camera model and camera matrix.
If you want a method that doesn't need the camera parameters and works for unknown camera set-ups you should probably look into methods for uncalibrated stereo reconstruction.
Correspondence problem
Finding corresponding points is the tricky part that requires you to look for points of the same brightness or colour, or to use texture patterns or some other features to identify the same points in pairs of images. Techniques for this either work locally by looking for a best match in a small region around each point, or globally by considering the image as a whole.
If you already have the fundamental matrix, it will allow you to rectify the images such that corresponding points in two images will be constrained to a line (in theory). This helps you to use faster local techniques.
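Here is a small sketch of that line constraint. Given a fundamental matrix F and a pixel in the first image, F times the pixel (in homogeneous coordinates) gives the line in the second image on which the match must lie. The example matrix is the textbook F for an ideally rectified pair with purely horizontal camera displacement, chosen by me for illustration; the function names are also mine.

```python
import math

def epipolar_line(F, x):
    """Line (a, b, c), meaning a*u + b*v + c = 0, in image 2 for pixel x in image 1."""
    h = (x[0], x[1], 1.0)  # homogeneous coordinates
    return tuple(sum(F[r][c] * h[c] for c in range(3)) for r in range(3))

def distance_to_line(line, x):
    """Perpendicular distance in pixels from pixel x to the line."""
    a, b, c = line
    return abs(a * x[0] + b * x[1] + c) / math.hypot(a, b)

# Fundamental matrix of an ideal rectified pair: epipolar lines are image rows.
F_RECTIFIED = [[0.0, 0.0, 0.0],
               [0.0, 0.0, -1.0],
               [0.0, 1.0, 0.0]]
```

For the pixel (100, 50) the line comes out as row 50 of the second image, so a candidate at (80, 50) has distance 0 while one at (80, 53) is 3 pixels off and can be rejected. That is why local matching only has to search along one row after rectification.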
There is currently still no ideal technique to solve the correspondence problem, but possible approaches could fall in these categories:
Manual selection: have a person hand-select matching points.
Custom markers: place markers or use specific patterns/colours that you can easily identify.
Sum of squared differences: take a region around a point and find the closest whole matching region in the other image.
Graph cuts: a global optimisation technique based on optimisation using graph theory.
For specific implementations you can use Google Scholar to search through the current literature. Here is one highly cited paper comparing various techniques:
A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms.
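As a rough illustration of the sum-of-squared-differences approach listed above, here is a sketch that searches along a single row of a rectified pair. It is deliberately simplified: one-dimensional windows on one scanline instead of 2D regions, grey values in plain lists, and function names of my own choosing.

```python
def ssd(a, b):
    """Sum of squared differences between two equally long windows."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def best_match_on_row(left_row, right_row, x, half_window, max_disparity):
    """Disparity d (0..max_disparity) whose window around x - d in the right
    row is most similar to the window around x in the left row.

    Assumes x - half_window >= 0 and x + half_window < len(left_row).
    """
    size = 2 * half_window + 1
    patch = left_row[x - half_window: x - half_window + size]
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        start = x - d - half_window
        if start < 0:
            break  # candidate window would fall off the image
        cost = ssd(patch, right_row[start: start + size])
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

For example, if a bright feature sits at columns 4 to 6 in the left row and at columns 2 to 4 in the right row, searching from x = 5 finds a disparity of 2.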
Multi-view reconstruction
Once you have the corresponding points, you can then use epipolar geometry theory for the triangulation calculations to find the 3D co-ordinates of the points.
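One simple way to do the triangulation, sketched under my own assumptions: each matched point defines a ray from its camera centre, and because of noise the two rays rarely intersect exactly, so take the midpoint of the shortest segment between them. This midpoint method is only one of several triangulation techniques, and the function name and argument layout are mine.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """3D point closest to both rays c1 + s*d1 and c2 + t*d2.

    c1, c2 : camera centres; d1, d2 : ray directions through the matched
    pixels, all given as 3-element sequences in world coordinates.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    w = [a - b for a, b in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel, depth cannot be recovered")
    # Parameters of the closest points on each ray.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(u + v) / 2 for u, v in zip(p1, p2)]
```

With cameras at (0, 0, 0) and (1, 0, 0) both looking at the point (0.5, 0, 2), the rays intersect and the function returns that point.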
This whole stereo reconstruction would then be repeated for each pair of consecutive images (implying that you need an order to the images or at least knowledge of which images have many overlapping points). For each pair you would calculate a different fundamental matrix.
Of course, due to noise or inaccuracies at each of these steps you might want to consider how to solve the problem in a more global manner. For instance, if you have a series of images that are taken around an object and form a loop, this provides extra constraints that can be used to improve the accuracy of earlier steps using something like bundle adjustment.
As you can see, both stereo and multi-view reconstruction are far from solved problems and are still actively researched. The less you want to do in an automated manner the more well-defined the problem becomes, but even in these cases quite a bit of theory is required to get started.
Alternatives
If it's within the constraints of what you want to do, I would recommend considering dedicated hardware sensors (such as the XBox's Kinect) instead of only using normal cameras. These sensors use structured light, time-of-flight or some other range imaging technique to generate a depth image which they can also combine with colour data from their own cameras. They practically solve the single-view reconstruction problem for you and often include libraries and tools for stitching/combining multiple views.
Epipolar geometry references
My knowledge is actually quite thin on most of the theory, so the best I can do is to further provide you with some references that are hopefully useful (in order of relevance):
I found a PDF chapter on Multiple View Geometry that contains most of the critical theory. In fact the textbook Multiple View Geometry in Computer Vision should also be quite useful (sample chapters available here).
Here's a page describing a project on uncalibrated stereo reconstruction that seems to include some source code that could be useful. They find matching points in an automated manner using one of many feature detection techniques. If you want this part of the process to be automated as well, then SIFT feature detection is commonly considered to be an excellent non-real-time technique (since it's quite slow).
A paper about Scene Reconstruction from Multiple Uncalibrated Views.
A slideshow on Methods for 3D Reconstruction from Multiple Images (it has some more references below its slides towards the end).
A paper comparing different multi-view stereo reconstruction algorithms can be found here. It limits itself to algorithms that "reconstruct dense object models from calibrated views".
Here's a paper that goes into lots of detail for the case that you have stereo cameras that take multiple images: Towards robust metric reconstruction via a dynamic uncalibrated stereo head. They then find methods to self-calibrate the cameras.
I'm not sure how helpful all of this is, but hopefully it includes enough useful terminology and references to find further resources.
Research has made significant progress and these days it is possible to obtain pretty good-looking 3D shapes from 2D images. For instance, our recent research work, titled "Synthesizing 3D Shapes via Modeling Multi-View Depth Maps and Silhouettes With Deep Generative Networks", took a big step toward solving the problem of obtaining 3D shapes from 2D images. In our work, we show that you can not only go from 2D to 3D directly and get a good, approximate 3D reconstruction, but also learn a distribution of 3D shapes in an efficient manner and generate/synthesize 3D shapes. Below is an image of our work showing that we are able to do 3D reconstruction even from a single silhouette or depth map (on the left). The ground-truth 3D shapes are shown on the right.
The approach we took has some contributions related to cognitive science or the way the brain works: the model we built shares parameters for all shape categories instead of being specific to only one category. Also, it obtains consistent representations and takes the uncertainty of the input view into account when producing a 3D shape as output. Therefore, it is able to naturally give meaningful results even for very ambiguous inputs. If you look at the citation to our paper you can see even more progress just in terms of going from 2D images to 3D shapes.
This problem is known as Photogrammetry.
Google will supply you with endless references, just be aware that if you want to roll your own, it's a very hard problem.
Check out The Daedalus Project. Although that website does not contain a gallery with illustrative information about the solution, it posts several papers and information about the working method.
I watched a lecture by one of the main researchers of the project (Roger Hubbold), and the image results are quite amazing! It is a long and complex problem, though, with a lot of tricky details to take into account to get an approximation of the 3D data. Take, for example, the 3D information from wall surfaces, for which the heuristic works as follows: take a photo of the scene with normal illumination, then retake the picture from the same position with full flash; subtract one image from the other and divide the result by a pre-taken flash calibration image; apply a box filter to this new result; and then post-process to estimate depth values. The whole process is explained in detail in this paper (which is also posted/referenced on the project website).
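The first three steps of that flash heuristic are simple enough to sketch. This is only my reading of the description above, not the project's code: images are plain lists of rows of grey values, the function names are invented, and the final post-processing into depth values is left out because I don't know its details.

```python
def flash_ratio(ambient, flash, calibration, eps=1e-6):
    """(flash - ambient) / calibration, pixel by pixel.

    `eps` guards against division by zero in dark calibration pixels.
    """
    return [[(f - a) / max(c, eps) for a, f, c in zip(row_a, row_f, row_c)]
            for row_a, row_f, row_c in zip(ambient, flash, calibration)]

def box_filter(img, radius=1):
    """Replace each pixel by the mean of its (2*radius+1)^2 neighbourhood,
    using only the neighbours that lie inside the image at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(h, y + radius + 1))
                    for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out
```

You would call `box_filter(flash_ratio(ambient, flash, calibration))` and feed the smoothed result into whatever depth estimation step the paper describes.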
Google Sketchup (free) has a photo matching tool that allows you to take a photograph and match its perspective for easy modeling.
EDIT: It appears that you're interested in developing your own solution. I thought you were trying to obtain a 3D model of an image in a single instance. If this answer isn't helpful, I apologize.
Hope this helps if you are trying to construct a 3D volume from a 2D stack of images! You can use an open source tool such as ImageJ Fiji, which comes with a 3D viewer plugin.
https://quppler.com/creating-a-classifier-using-image-j-fiji-for-3d-volume-data-preparation-from-stack-of-images/

Satellite Map Analysis for Building Generation

Has anyone ever heard of a program which analyses a satellite map and attempts to generate three-dimensional buildings that roughly match the length/width of their real-life counterparts?
The use in programs like Google Earth or FlightGear would be phenomenal.
Anybody heard of something like this already existing?
EDIT:
Any references to related work would be great as well!
This can be achieved using photogrammetry from stereo imagery (airborne or high-resolution satellite). Stereo imagery consists of a pair of registered images taken from slightly different angles or from different positions and can be used to calculate elevations very precisely. You can also derive information from building shadows if you know the exact date and time at which the image was taken and have information on the sensor and image geometry.
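The shadow idea reduces to basic trigonometry in the simplest case. This sketch assumes flat ground, an image taken looking straight down, a shadow length already converted to metres, and a sun elevation angle that you have looked up for the acquisition date, time and location. The function name is mine.

```python
import math

def building_height_from_shadow(shadow_length_m, sun_elevation_deg):
    """Estimate building height from the length of its shadow on flat ground."""
    if not 0 < sun_elevation_deg < 90:
        raise ValueError("sun elevation must be between 0 and 90 degrees")
    # The building, its shadow and the sun ray form a right triangle.
    return shadow_length_m * math.tan(math.radians(sun_elevation_deg))
```

With the sun 45 degrees above the horizon, a 20 m shadow means a building of about 20 m; at lower sun angles the same shadow corresponds to a lower building.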
Two other options would be 1) to use LIDAR (expensive, not readily available), or 2) to obtain shapefiles with building footprints and heights (sometimes available from local governments or other sources).
Stereo imagery can be a powerful resource to create 3D models. C3 Technologies developed a really interesting app for hitta.se:
Go to http://www.hitta.se/LargeMap.aspx
Click on 3D
Go to Stockholm
Zoom in, zoom in.. it takes a while to load
Really beautiful 3D models from stereo imagery
