How can I scale a model with 4 decimal digits in ARKit/SceneKit? - augmented-reality

I have a model that is 182.925 meters long and I want to scale it down to 0.0182 meters in ARKit. In the editor I cannot enter more than 3 decimal digits.
I want 0.0001 or a similar value in the scale.x field.

In ARKit, SceneKit and RealityKit, distances to objects and sizes of objects are measured in meters. I suppose it's quite logical that the smallest practical size here is 1 mm (a tenth of a centimetre). So 1 mm precision is more than enough for AR and VR models.
If you need to downscale your model, in your case you'll get the following value:
0.018 m, or 18 mm
You can't work with micrometer or nanometer precision in RealityKit and SceneKit, because it would almost certainly cause issues and artifacts on rendered shader surfaces, and it would require unnecessary light-ray computations at such a microscopic scale.
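For reference, here is a quick arithmetic sketch (plain Python, not ARKit API) of the scale factor involved and why the editor's 3-decimal limit is a problem:

```python
# Scale-factor arithmetic for the 182.925 m model (plain Python, no ARKit API).
original_length_m = 182.925
target_length_m = 0.0182

scale = target_length_m / original_length_m      # ~9.95e-05
print(f"exact scale factor: {scale:.6f}")

for digits in (3, 4):
    rounded = round(scale, digits)
    print(f"rounded to {digits} decimals: {rounded} "
          f"-> length {original_length_m * rounded:.4f} m")
# 3 decimals rounds the factor down to 0.0, so at least 4 decimals are needed.
```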

Related

Camera calibration focal length twice as large as expected

I did a camera calibration with the CameraCalibration node in Meshroom 2021.1.0 using a checkerboard grid. From what I understand, Meshroom uses OpenCV, so this question indirectly relates to the calibration process in OpenCV as well.
The lens I'm using is advertised as an 8 mm lens, so I was expecting a focal length somewhere between 7 and 9 mm, but the calibration returned an fx value of 2541.273 and an fy value of 2641.111 (in pixels). I know the sensor pixel size is 6 microns, so converting from pixels to mm I get focal lengths of 15.247 mm and 15.847 mm respectively, which is right around double what I would expect.
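As a sanity check, the pixel-to-mm conversion is just a multiplication by the pixel pitch; a tiny sketch with the question's numbers (plain arithmetic, no OpenCV or Meshroom calls):

```python
# Pixel -> mm conversion of the focal length, using the values quoted above.
pixel_size_mm = 0.006            # 6 micron sensor pixels
fx_px, fy_px = 2541.273, 2641.111

fx_mm = fx_px * pixel_size_mm    # ~15.25 mm
fy_mm = fy_px * pixel_size_mm    # ~15.85 mm
print(fx_mm, fy_mm)              # roughly double the advertised 8 mm
```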
The checkerboard I'm using has 50 mm squares, and I specified the size of the square in the camera calibration, and I double checked the printed dimensions with calipers. I also verified that the size of my images were full resolution compared to the expected size based on sensor specifications, so it wasn't a case where the resolution was half or double the original sensor size or something like that.
Curious if there is anything obvious I may have missed that would cause the focal length in the calibration to come out double what is expected.
I went through a similar calibration process with my smartphone; the camera I was testing with advertised a focal length of 7 mm, and in that case the calibration returned an fx of 7.21 mm and an fy of 7.20 mm. The only difference was that the grid used in that test had 30 mm squares and was 7 x 5 instead of 4 x 3, but the process to get those values was essentially the same.
Update:
I reran the camera calibration with a different set of images, and this time I got an fx of 23.07 mm and an fy of 23.23 mm, so it would seem that the previous run being off by a factor of 2 may just have been a coincidence. Given how inconsistent the focal length values are from one run to the next, and how far off they are from the expected values, I'm guessing that the errors I'm seeing are due to poor calibration images being used in the process? The camera is fixed, so I'm moving the checkerboard on a surface, mostly in a single plane. To get a good calibration, do I just need a better variety of checkerboard orientations, such as different distances and different angles?
Is the size of the grid just too small relative to the field of view to get good calibration values from it? I calibrated with 80 calibration shots similar to the two above, moving the board from one edge to the other.
I got a larger calibration target using the ChArUco pattern, and it looks like the values are more stable now, but every now and then, if I repeat the calibration, I can still get values that are very far off. Should the board below be large enough to get stable calibration values?
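For what it's worth, here is a minimal OpenCV calibration sketch under stated assumptions; a high RMS reprojection error, or focal values that jump between runs, usually point to insufficient variety in board poses (tilt and distance, not just in-plane translation):

```python
# Minimal OpenCV calibration sketch (an assumed workflow, not Meshroom's
# internals). The pattern size and image folder below are illustrative
# assumptions, not values from the question.
import glob
import cv2
import numpy as np

pattern = (9, 6)        # inner corners of the checkerboard (assumed)
square_mm = 50.0        # square size from the question

# 3D corner coordinates in the board's own plane (z = 0)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_mm

obj_points, img_points = [], []
for path in glob.glob("calib/*.png"):                 # hypothetical folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

assert obj_points, "no usable calibration views found"
rms, K, dist, _, _ = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS reprojection error (px):", rms)
print("fx, fy (px):", K[0, 0], K[1, 1])
```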

Required tolerance for camera calibration target

In reading about and experimenting with camera calibration I haven't seen any mention of the required tolerance for the placement of calibration targets. For example, say I have a field of view of 200 mm x 30 mm and I want to be able to measure the position of objects in this field to within 1 mm. I will calibrate my camera using a grid pattern and the OpenCV calibrateCamera flow. Say my calibration target is a printed chessboard grid with 5 mm pitch. What is the tolerance on that 5 mm spacing between corners on my target? Does a tighter tolerance result in a more accurate pixel-to-real-world transformation? Does a tighter tolerance result in better distortion removal?
Note I'm measuring objects on a 2D plane, with no depth measurement, and unfortunately I don't have the ability to move the calibration target around and take multiple views of it. So I'm talking specifically about calibrating from a single view.
Calibration using a single view is a poor idea, generally speaking, because of the small number of independent samples it entails, so the manufacturing tolerance of the calibration grid may well be the least of your worries. But if you must...
The controlling factor here is the sensor's dot pitch. Given the nominal focal length of your lens, and that you want your calibration RMSE to be on the order of a few tenths of a pixel, you can work out the angle spanned by, say, 1/10 of a pixel along the sensor's horizontal axis. Back-projecting that angle at the nominal distance between the lens's exit pupil and the target gives you a length in the 3D world that measures the uncertainty of a target corner location at the calibration optimum. Your physical target points should be known at least that accurately, and normally better.
Example:
Setup: dot pitch 5 µm, 16 mm focal length, 200 mm working distance to the target.
Back-projected 1/10 pixel: 200/16 * 0.5 µm ≈ 6 µm.
Back-projected 1/2 pixel: 200/16 * 2.5 µm ≈ 31 µm.
You can loosen that if you assume perfect chi-square scaling of the errors with the square root of the number of data points. If you have, say, 100 corners, you can multiply that by 10, i.e. ~300 µm for 1/2 pixel.
Note that with these kinds of tolerances, temperature control (for both camera and target) may become a factor to take into account.
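A back-of-the-envelope version of that calculation in plain Python, using the example's numbers:

```python
# Back-projected corner-location uncertainty on the target, per the example.
dot_pitch_um = 5.0        # sensor pixel pitch
focal_mm = 16.0           # nominal focal length
working_dist_mm = 200.0   # lens exit pupil to target
n_corners = 100           # corners on the calibration grid

def backprojected_length_um(pixel_fraction):
    """Length on the target corresponding to `pixel_fraction` of a pixel."""
    return (working_dist_mm / focal_mm) * dot_pitch_um * pixel_fraction

print(backprojected_length_um(0.1))   # ~6 um  for 1/10 pixel
print(backprojected_length_um(0.5))   # ~31 um for 1/2 pixel

# Loosened by sqrt(N) averaging over the grid corners:
print(backprojected_length_um(0.5) * n_corners ** 0.5)   # ~310 um
```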

How does image digitization differ from sound digitization (PCM)?

I am trying to understand the digitization of sound and images.
As far as I know, both require converting an analog signal to a digital signal, and both involve sampling and quantization.
Sound: we have amplitude on the y-axis and time on the x-axis. What is on the x- and y-axes during image digitization?
Is there a standard sample rate for image digitization? For CDs (sound digitization), 44.1 kHz is used. How exactly does a sample rate apply to images?
Quantization: for sound we use bit depth, which means the number of amplitude levels. Images also use bit depth, but does it mean how many intensities we are able to distinguish? (Is that true?)
What are the other differences between sound and image digitization?
Acquisition of images can be summarized as spatial sampling and conversion/quantization steps. The spatial sampling in (x, y) is due to the pixel size. The data (on the third axis, z) is the number of electrons generated by the photoelectric effect on the chip. These electrons are converted to ADUs (analog-to-digital units) and then to bits. What is quantized is the light intensity in grey levels; for example, 8-bit data gives 2^8 = 256 levels of gray.
An image loses information both through the spatial sampling (resolution) and the intensity quantization (levels of gray).
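For instance, a small sketch of what requantizing a normalized grayscale image to a given bit depth does to the number of distinguishable gray levels (the image here is just random noise as a stand-in for sensor data):

```python
# Intensity quantization illustration, assuming pixel values in [0, 1].
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((256, 256))         # stand-in for a normalized sensor image

def quantize(img, bits):
    levels = 2 ** bits                 # e.g. 8 bits -> 2^8 = 256 gray levels
    return np.round(img * (levels - 1)) / (levels - 1)

print(np.unique(quantize(image, 8)).size)   # up to 256 distinct gray levels
print(np.unique(quantize(image, 2)).size)   # at most 4 distinct gray levels
```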
Unless you are talking about video, images are not sampled in units of Hz (1/time) but in 1/distance. What is important is to satisfy the Shannon-Nyquist criterion to avoid aliasing. The spatial frequencies you are able to capture depend directly on the optical design, and the pixel size must be chosen accordingly to avoid aliasing.
EDIT: In the example below I plotted a sine function (white/black stripes). On the left the signal is correctly sampled; on the right it is undersampled by a factor of 4. It is the same signal, but with bigger pixels (coarser sampling) you get aliasing of your data. Here the stripes are horizontal, but you get the same effect for vertical ones.
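A short sketch that generates this kind of striped demo (the stripe period and undersampling factor are assumptions, chosen so the coarser image aliases):

```python
# Horizontal sine stripes sampled finely (left) and undersampled x4 (right).
import numpy as np
import matplotlib.pyplot as plt

height, width = 240, 240
period_px = 6                                   # stripe period in fine pixels
y = np.arange(height)[:, None]
fine = 0.5 + 0.5 * np.sin(2 * np.pi * y / period_px) * np.ones((1, width))

coarse = fine[::4, ::4]          # only 1.5 samples per period -> aliasing

fig, axes = plt.subplots(1, 2)
axes[0].imshow(fine, cmap="gray")
axes[0].set_title("correctly sampled")
axes[1].imshow(coarse, cmap="gray")
axes[1].set_title("undersampled x4")
plt.show()
```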
There is no common standard for the spatial sampling axis of images. A 20-megapixel sensor or camera will produce images at a completely different spatial resolution, in pixels per mm or pixels per degree of view angle, than a 2-megapixel sensor or camera. These images will typically be rescaled to yet another non-standard resolution for viewing (72 ppi, 300 ppi, "Retina", SD/HDTV, CCIR-601, "4k", etc.)
For audio, 48k is starting to become more common than 44.1ksps. (on iPhones, etc.)
("a nice thing about standards is that there are so many of them")
Amplitude scaling in raw formats also has no single standard. When converted or requantized to a storage format, 8-bit, 10-bit, and 12-bit quantizations are the most common for RGB color separations (JPEG, PNG, etc.).
Channel formats are different between audio and image.
X, Y, where X is time and Y is amplitude, is only good for mono audio. Stereo usually needs T, L, R for time, left, and right channels. Images are often represented as X, Y, R, G, B tuples (five values per sample point), where X, Y are spatial location coordinates and R, G, B are the color intensities at that location. The image intensities can be somewhat related (depending on gamma correction, etc.) to the number of incident photons per shutter duration, in certain visible EM frequency ranges, per incident solid angle at the lens.
A low-pass filter for audio, and an optical low-pass (anti-aliasing) filter in front of an image sensor, are commonly used to make the signal closer to bandlimited so it can be sampled with fewer aliasing artifacts.

Is there a way to find mm per pixel value for a camera?

I need to implement dimension inspection of an object with a tolerance of 20 microns using image processing. To measure the dimensions in mm, I need the mm-per-pixel value for the pixel-to-mm conversion.
Camera and lens Specifications:
5 MP Matrix vision camera (2592 x 1944)
25 mm lens
How I tried to do it:
I used a 30 cm ruler to get the actual field of view in mm covered by the camera. I plotted the image using Matplotlib, as shown in the figure.
Image for scaling
From the image I got 31 mm as the actual width covered by the camera, and the camera resolution is 2592 x 1944. So I obtained mm/pixel = 31/2592 = 0.011959876.
But I want to know whether this is the correct way to find the mm/pixel value using a centimeter scale, especially when a tolerance of 20 microns is needed in dimension inspection. If this is not the correct way, a procedure for finding the mm/pixel value would be really helpful.
I believe what you are doing is really borderline. First of all, to be as precise as possible I would use the right (or left) edge of the leftmost and rightmost ruler ticks, as sketched here:
and then use this pixel distance to calculate the mm/pixel calibration value. Even with this method, 20 µm is really tough to achieve. Let's say we can determine a ruler tick edge position with a precision of 2 pixels (very optimistic); then you would have an error of about 31 mm / 2580 * 2, which is about 25 µm.
If you really need 20 µm calibration precision, I would go for a microscope calibration target. I've always used one of those for this kind of calibration task.
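A quick version of that error budget in plain Python, using the same numbers:

```python
# Rough error budget for the ruler-based approach.
fov_mm = 31.0              # measured field of view
span_px = 2580.0           # pixel distance between the outermost tick edges
tick_error_px = 2          # optimistic tick-localization error

mm_per_px = fov_mm / span_px
error_um = mm_per_px * tick_error_px * 1000
print(f"{mm_per_px:.5f} mm/px, error ~{error_um:.0f} um")   # ~24 um > 20 um
```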
20 microns over a field of view of 31 mm = 31000 µm corresponds to 1.7 pixels, so your measurement error must be smaller than that. This is a stringent requirement. Your ruler and manual operation are not appropriate.
In the first place, you should check the magnitude of the lens distortion, which could very well exceed these 1.7 pixels. You will need a precise calibration procedure that can fit a distortion model to the image. For this purpose you should use a certified calibration target, such as a grid of dots or a chessboard pattern.
As the calibration software measures and compensates for the distortion, it will also provide the scale factor between physical units (from the known grid spacing) and pixels. You can measure feature locations on the target by blob analysis or gauging techniques, then use least-squares fitting of a model.
Software packages made for machine vision applications do contain such tools.
Also be aware that there can be a bias in the dimensional measurement of the object due to mis-location of the edges. Simply moving the light source can result in variations of the measured size.
If your objects are always the same and at the same place in the field of view, a cheap solution is to establish a repeatable measurement procedure in pixels, and physically measure one of the parts. This will give you a scale factor valid in the same conditions.
But simply moving the object will have a noticeable effect, both by changing the light reflection/shadows on edges and by having a different distortion.
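As a rough illustration of that workflow, here is a hedged OpenCV sketch: it assumes the camera matrix K and distortion coefficients were already estimated from several views of a certified dot grid, and the file names, grid size, and pitch below are made-up placeholders.

```python
# Derive mm/px from a certified dot grid after removing lens distortion.
import cv2
import numpy as np

pattern = (11, 8)      # dots per row/column on the certified target (assumed)
pitch_mm = 1.0         # certified dot spacing (assumed)

K = np.load("K.npy")                 # intrinsics from a prior calibration
dist = np.load("dist.npy")           # distortion coefficients

img = cv2.imread("target_in_place.png", cv2.IMREAD_GRAYSCALE)
found, centers = cv2.findCirclesGrid(img, pattern)
assert found, "calibration target not detected"

# Undistort the detected dot centers, keeping pixel units (P=K)
pts = cv2.undistortPoints(centers, K, dist, P=K).reshape(-1, 2)

# Consecutive detected centers are assumed to be adjacent dots one pitch
# apart (row-major ordering); averaging over many pairs would be more robust.
px_per_pitch = np.linalg.norm(pts[1] - pts[0])
print("mm per pixel:", pitch_mm / px_per_pitch)
```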

Relationship of standard deviation for Gaussian filter between pixel domain and the real world

I constructed an experiment with Gaussian blur in real-world and MR images. I printed some blurred test images and compared them with blurred augmented images.
What is the best way to express how much blurring I applied in real-world coordinates?
The image is 2560x1440 pixels, corresponding to 533x300 cm in the real world. If this image is blurred with a Gaussian with standard deviation n (filter size is ceil(3 * n) * 2 + 1), how can this be expressed in centimeters? Is it reasonable to express it as the real size of the filter in centimeters?
In short, yes, it is perfectly reasonable to express the size of the kernel in real-world coordinates.
In your case, you have 533 cm == 2560 pixels horizontally, which is 0.2082 cm per pixel. (Please edit if the question has a mistake and this should be mm instead of cm.) Vertically you have approximately the same, so we can assume isotropic sampling and leave it at 0.208 cm/px.
Given that pixel size, a standard deviation of the Gaussian of n is equivalent to a standard deviation of 0.208*n cm in the real world.
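A tiny sketch of that conversion (the sigma value below is just an example, not from the question):

```python
# Convert a Gaussian sigma given in pixels into real-world centimeters.
import math

cm_per_px = 533 / 2560             # ~0.208 cm per pixel, ~same vertically

def sigma_to_cm(sigma_px):
    return sigma_px * cm_per_px

def kernel_size_px(sigma_px):
    # filter-size convention from the question
    return math.ceil(3 * sigma_px) * 2 + 1

sigma_px = 4.0                               # example value
print(sigma_to_cm(sigma_px))                 # ~0.83 cm standard deviation
print(kernel_size_px(sigma_px) * cm_per_px)  # kernel extent ~5.2 cm
```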
