I am a beginner in image compression using deep learning autoencoders. I understand the concept of bits per pixel (bpp) of an image, but I am confused about how to calculate it while performing image compression with autoencoders.
I have read a lot of research articles online (e.g. Variable Rate Deep Image Compression with a Conditional Autoencoder by Yoojin Choi et al., and many similar articles) that report bpp values calculated from the latent space representation in order to compare different autoencoder models, but I could not find any straightforward description of how to calculate the bpp value while performing compression with autoencoders.
How do I calculate bits per pixel in an autoencoder from the latent space representation?
I have spent weeks trying to sort this out, but still no luck.
Kindly guide me in this. Thanks in advance.
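For what it is worth, here is a minimal sketch of the convention I believe these papers use: bpp is the total number of bits needed to store the entropy-coded, quantized latent, divided by the number of pixels of the input image (not of the latent). The function names are hypothetical, and the second function uses the empirical zeroth-order entropy of the latent symbols as a stand-in for the learned probability model that a trained codec would normally use.

```python
import math
from collections import Counter


def bpp_from_bitstream(num_bytes, height, width):
    """bpp when the latent has actually been entropy-coded into num_bytes bytes."""
    return 8.0 * num_bytes / (height * width)


def bpp_from_entropy(quantized_latent, height, width):
    """Estimate bpp from the empirical entropy of the quantized latent symbols.

    quantized_latent: flat list of integer symbols (all channels, all positions).
    height, width: size of the ORIGINAL image in pixels.
    Assumption: an ideal entropy coder spends about `entropy` bits per symbol.
    """
    counts = Counter(quantized_latent)
    n = len(quantized_latent)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    total_bits = entropy * n
    return total_bits / (height * width)


# A 256x256 image whose compressed latent occupies 8192 bytes on disk
print(bpp_from_bitstream(8192, 256, 256))
```

If the model has its own probability model for the latent, the same idea applies with the total bits replaced by the sum of -log2 p over all latent symbols.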
Related
I'm working on a lab where we need to apply lossless predictive coding to an image before compressing it (with Huffman, or some other lossless compression algorithm).
From the example seen below, it's pretty clear that by pre-processing the image with predictive coding, we've modified its histogram and concentrated all of its grey levels around 0. But why exactly does this aid compression?
Is there maybe a formula to determine the compression rate of Huffman, knowing the standard deviation and entropy of the original image? Otherwise, why would the compression ratio be any different; it's not like the range of values has changed between the original image and pre-processed image.
Thank you in advance,
Liam.
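A small sketch that may help with the question above: the average Huffman code length is bounded below by the entropy of the symbol distribution (and is within one bit of it), so what matters is how peaked the histogram is, not the range of values. The example assumes a simple previous-pixel predictor and a synthetic gradient row; both are illustrative choices, not the lab's actual setup.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Empirical entropy in bits per symbol."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def previous_pixel_residual(row):
    """Predict each pixel by its left neighbour and keep the prediction error.

    The first pixel has no neighbour, so it is stored unchanged.
    """
    return [row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]


row = list(range(256))                 # smooth gradient, every grey level used once
residual = previous_pixel_residual(row)

print(entropy_bits(row))               # flat histogram: 8 bits per pixel
print(entropy_bits(residual))          # histogram concentrated on one value: far fewer bits
```

The residual can be inverted exactly with a running sum, so the predictive step loses nothing.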
I'm trying to develop a way to count the number of bright spots in an image. The spots should be Gaussian point sources, but there is a lot of noise. There are probably on the order of 10-20 actual point sources in this image. My first thought was to use a Gaussian convolution with sigma = 15, which seems to do a good job.
First, is there a better way to isolate these bright spots?
Second, how can I 'detect' the bright spots, i.e. count them? I haven't had any luck with circular Hough transforms from OpenCV.
Edit: Here is the original without gridlines, here is the convolved image without gridlines.
I am working with thermal infrared images, which are subject to a lot of noise.
I found that low-rank approaches, such as those based on Singular Value Decomposition (SVD) or Weighted Nuclear Norm Minimization (WNNM), give very good results in terms of reducing the noise while preserving the structure of the information.
Their main drawback is that they are quite slow to compute (several minutes per image).
Here is some literature:
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7067415
https://arxiv.org/abs/1705.09912
The second paper has some MATLAB code available. There are quite a lot of files, but translating them to Python should not be that complex.
OpenCV also provides (and it is available in Python) a very efficient implementation of the Non-Local Means algorithm:
https://docs.opencv.org/master/d5/d69/tutorial_py_non_local_means.html
I trained a CNN (in TensorFlow) for digit recognition using the MNIST dataset.
Accuracy on the test set was close to 98%.
I wanted to predict digits using data which I created myself, and the results were bad.
What did I do to the images I wrote myself?
I segmented out each digit, converted it to grayscale, resized it to 28x28 and fed it to the model.
How come I get such low accuracy on my own data set, whereas I get such high accuracy on the test set?
Are there other modifications that I'm supposed to make to the images?
EDIT:
Here is the link to the images and some examples:
Excluding bugs and obvious errors, my guess would be that you are capturing your handwritten digits in a way that is too different from your training set.
When capturing your data you should try to mimic as closely as possible the process used to create the MNIST dataset.
From the official MNIST dataset website:
The original black and white (bilevel) images from NIST were size
normalized to fit in a 20x20 pixel box while preserving their aspect
ratio. The resulting images contain grey levels as a result of the
anti-aliasing technique used by the normalization algorithm. the
images were centered in a 28x28 image by computing the center of mass
of the pixels, and translating the image so as to position this point
at the center of the 28x28 field.
If your data is processed differently in the training and test phases, then your model is not able to generalize from the training data to the test data.
So I have two pieces of advice for you:
Try to capture and process your digit images so that they look as similar as possible to the MNIST dataset;
Add some of your examples to your training data to allow your model to train on images similar to the ones you are classifying.
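To make the centering step from the MNIST description concrete, here is a rough sketch in plain Python. It assumes the digit has already been segmented and size-normalized to fit a 20x20 box, is stored as a list of rows with bright strokes on a zero background, and contains at least one non-zero pixel. The function names are my own, and the resizing step is not shown.

```python
def center_of_mass(img):
    """Return the (row, column) center of mass of the pixel intensities."""
    total = sum(sum(row) for row in img)
    cy = sum(y * v for y, row in enumerate(img) for v in row) / total
    cx = sum(x * v for row in img for x, v in enumerate(row)) / total
    return cy, cx


def center_in_field(img, size=28):
    """Paste img into a size x size zero canvas, translated so that its
    center of mass lands (to the nearest pixel) at the canvas center."""
    cy, cx = center_of_mass(img)
    dy = round((size - 1) / 2 - cy)
    dx = round((size - 1) / 2 - cx)
    out = [[0] * size for _ in range(size)]
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            ty, tx = y + dy, x + dx
            if 0 <= ty < size and 0 <= tx < size:  # drop pixels shifted off the canvas
                out[ty][tx] = v
    return out


# A 20x20 "digit" whose ink sits in the top-left corner
digit = [[0] * 20 for _ in range(20)]
for y in (0, 1):
    for x in (0, 1):
        digit[y][x] = 255

centered = center_in_field(digit)
print(center_of_mass(centered))
```

Applying the same function to your own digits before feeding them to the model removes one of the differences between your data and the training set.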
For those who still have a hard time with the poor quality of CNN-based models for MNIST:
https://github.com/christiansoe/mnist_draw_test
Normalization was the key.
Currently I am training on small logo datasets similar to FlickrLogos-32 with deep CNNs. To train larger networks I need more data, so I am using augmentation. The best I am doing right now is affine transformations (featurewise normalization, featurewise centering, rotation, width and height shift, horizontal and vertical flip), but for bigger networks I need more augmentation. I tried searching the forum of Kaggle's National Data Science Bowl but couldn't get much help. There's code for some methods given here, but I'm not sure what could be useful. What are some other (or better) image data augmentation techniques, beyond affine transformations, that could be applied to this type of dataset (or to image datasets in general)?
A good recap can be found here, in section 1 on data augmentation: namely flips, random crops, color jittering and lighting noise:
Krizhevsky et al. proposed fancy PCA when training the famous AlexNet in 2012. Fancy PCA alters the intensities of the RGB channels in training images.
Alternatively you can also have a look at the Kaggle Galaxy Zoo challenge: the winners wrote a very detailed blog post. It covers the same kind of techniques:
rotation,
translation,
zoom,
flips,
color perturbation.
As stated they also do it "in realtime, i.e. during training".
For example here is a practical Torch implementation by Facebook (for ResNet training).
I've collected a couple of augmentation techniques in my master's thesis, page 80. It includes:
zoom,
crop,
flip (horizontal / vertical),
rotation,
scaling,
shearing,
channel shifts (RGB, HSV),
contrast,
noise,
vignetting.
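A few of the techniques listed in these answers (flips, random crops, noise) can be sketched in plain Python as below, applied on the fly each time an image is drawn during training. This is only an illustration on a grayscale image stored as a list of rows; the function names, the crop margin and the noise level are arbitrary assumptions, and in practice you would use the equivalent array operations of your framework.

```python
import random


def horizontal_flip(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def vertical_flip(img):
    """Reverse the order of the rows."""
    return [row[:] for row in img[::-1]]


def random_crop(img, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    top = rng.randint(0, len(img) - crop_h)
    left = rng.randint(0, len(img[0]) - crop_w)
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]


def add_noise(img, sigma, rng):
    """Add Gaussian noise and clip the result back to the 0..255 range."""
    return [[min(255, max(0, round(v + rng.gauss(0, sigma)))) for v in row]
            for row in img]


def augment(img, rng):
    """One random augmentation pass: maybe flip, then crop, then add noise."""
    if rng.random() < 0.5:
        img = horizontal_flip(img)
    img = random_crop(img, len(img) - 2, len(img[0]) - 2, rng)
    return add_noise(img, 5.0, rng)


rng = random.Random(0)                      # fixed seed so runs are repeatable
image = [[(x * y) % 256 for x in range(8)] for y in range(8)]
sample = augment(image, rng)                # a new variant on every call
```

Note that flips are not always safe for logos, since a mirrored logo may no longer be a valid example of its class.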
For my Image Processing class project, I am filtering an image with various filter algorithms (bilateral filter, NL-Means, etc.) and trying to compare the results as I change the parameters. I came across the PSNR and SSIM metrics to measure filter quality but could not fully understand what the values mean. Can anybody help me with the following:
Does a higher PSNR value mean higher quality smoothing (getting rid of noise)?
Should SSIM value be close to 1 in order to have high quality smoothing?
Are there any other metrics or methods to measure smoothing quality?
I am really confused. Any help will be highly appreciated. Thank you.
With respect to an ideal result image, the PSNR is computed from the mean squared reconstruction error after denoising: the lower the error, the higher the PSNR. A higher PSNR therefore means more noise removed. However, as a least-squares measure, it is slightly biased towards over-smoothed (= blurry) results, i.e. an algorithm that removes not only the noise but also part of the textures can still get a good score.
SSIM was developed as a reconstruction quality metric that also takes into account the similarity of the edges (high frequency content) between the denoised image and the ideal one. To get a good SSIM score, an algorithm needs to remove the noise while also preserving the edges of the objects.
Hence, SSIM looks like a "better quality measure", but it is more complicated to compute (and the exact formula involves one number per pixel, while PSNR gives you an average value for the whole image).
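For reference, here is a minimal sketch of the PSNR computation, assuming 8-bit grayscale images stored as lists of rows (so the peak value is 255) and an ideal reference image to compare against. SSIM is not shown because its full definition is considerably longer.

```python
import math


def mse(reference, test):
    """Mean squared error between two images of the same size."""
    n = sum(len(row) for row in reference)
    return sum((a - b) ** 2
               for ref_row, test_row in zip(reference, test)
               for a, b in zip(ref_row, test_row)) / n


def psnr(reference, test, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(reference, test)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)


ideal = [[100, 100], [100, 100]]
slightly_off = [[101, 99], [100, 100]]
far_off = [[120, 80], [100, 100]]

print(psnr(ideal, slightly_off))   # small error, high PSNR
print(psnr(ideal, far_off))        # large error, lower PSNR
```

Because the error is averaged over the whole image, two results with the same PSNR can look quite different, which is part of the motivation for SSIM.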
Expanding on @sansuiso's answer:
There are a lot of other image quality measures you can use to evaluate the denoising capability of various filters, in your case NL-Means, the bilateral filter, etc.
Here is a chart that demonstrates the various parameters that could be used
Yes, and the higher the PSNR, the better the denoising capability.
Here is a paper where you can find the details regarding these parameters and the MATLAB codes could be found here
PSNR is a standard measure of reconstructed image quality and is an important feature.
A large value of NAE means that the image is of poor quality.
A large value of SC means that the image is of poor quality.
Regarding this article:
http://icpr2010.org/pdfs/icpr2010_WeAT8.44.pdf
I found out that PSNR can be obtained from SSIM and vice versa, and that PSNR is more sensitive to noise than SSIM. On the other hand, the two metrics are almost equal in sensitivity for the other parameters: Gaussian blur and discriminating quality.