Incrementally reconstruct a high-resolution image from a sequence of low-resolution images

I have a high resolution image that I need to transmit over low-bandwidth, lossy comms. I need to break the high-resolution image down into a sequence of much lower resolution images (of order 64 bytes). I want to transmit these low resolution images one at a time, and incrementally reconstruct a higher-resolution image (so I want to minimise redundancy between the low-resolution images).
I sense a Fourier transform in my future.
My daily driver is Python, in case there's an existing solution to this problem.
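To make the Fourier idea concrete, here is the kind of toy sketch I have in mind (NumPy only, greyscale, and the chunking/encoding is hand-waved rather than actually hitting 64 bytes per packet):

```python
# Toy sketch only: send FFT coefficients from low to high spatial frequency,
# so every packet adds new information and nothing is re-transmitted.
# Chunk size, coordinate encoding and quantisation are all placeholders.
import numpy as np

def make_chunks(img, coeffs_per_chunk=16):
    """Split the image's FFT coefficients into low-to-high frequency chunks."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = F.shape
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.hypot(yy - h // 2, xx - w // 2)   # distance from the DC term
    order = np.argsort(dist, axis=None)         # lowest frequencies first
    coords = np.column_stack(np.unravel_index(order, F.shape))
    for i in range(0, len(coords), coeffs_per_chunk):
        block = coords[i:i + coeffs_per_chunk]
        yield block, F[block[:, 0], block[:, 1]]  # positions + complex values

def progressive_reconstruction(chunks, shape):
    """Accumulate received chunks, yielding an improving reconstruction."""
    F = np.zeros(shape, dtype=complex)
    for block, values in chunks:
        F[block[:, 0], block[:, 1]] = values
        yield np.real(np.fft.ifft2(np.fft.ifftshift(F)))

# usage: for approx in progressive_reconstruction(make_chunks(img), img.shape): ...
```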

Related

TensorFlow for image recognition, size of images

How can the size of an image affect training the model for this task?
My current training set holds images that are 2880 x 1800, but I am worried this may be too large to train on. In total my sample size will be about 200-500 images.
Would this just mean that I need more resources (GPU, RAM, distribution) when training my model?
If this is too large, how should I go about resizing? I want to mimic real-world photo resolutions as closely as possible for better accuracy.
Edit:
I would also be using TFRecord format for the image files
Your memory and processing requirements will be proportional to the pixel size of your image. Whether this is too large for you to process efficiently will depend on your hardware constraints and the time you have available.
With regard to resizing the images, there is no single answer: you have to consider how best to preserve the information your algorithm needs to learn from your data while removing information that won't be useful. Reducing the size of your input images won't necessarily hurt accuracy. Consider two cases:
Handwritten digits
Here the images can be reduced considerably in size and still retain all the structural information necessary to be correctly identified. Have a look at the MNIST data set: these images are distributed at 28 x 28 resolution and are identifiable with 99.7%+ accuracy.
Identifying Tree Species
Imagine a set of images of trees where individual leaves could help identify the species. Here you might find that reducing the image size loses small-scale detail of leaf shape in a way that's detrimental to the model, but that a tight crop (which preserves individual leaves) gives a similar result to a resize. If that's the case, creating multiple crops from the same image gives you an augmented data set for training that can considerably improve results (something worth considering, if possible, given your training set is very small).
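As a rough sketch of that crop-based augmentation (assuming Pillow; the filename, crop size and stride are just placeholders):

```python
# Minimal crop-augmentation sketch, assuming Pillow is installed;
# "tree.jpg", the crop size and the stride are illustrative choices.
from PIL import Image

def tiled_crops(path, crop=512, stride=512):
    """Yield fixed-size crops that keep full-resolution leaf detail."""
    img = Image.open(path)
    w, h = img.size
    for top in range(0, h - crop + 1, stride):
        for left in range(0, w - crop + 1, stride):
            yield img.crop((left, top, left + crop, top + crop))

# each crop becomes its own training sample
for i, patch in enumerate(tiled_crops("tree.jpg")):
    patch.save(f"tree_crop_{i}.png")
```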
Deep learning models are achieving results around human level in many image classification tasks: if you struggle to identify your own images, it's unlikely you'll be able to train an algorithm to do so. This is often a useful starting point when considering how much scaling might be appropriate.
If you are using GPUs to train, this will definitely affect your training time. TensorFlow does most of the GPU allocation, so you don't have to worry about that. But with large photos you will experience long training times even though your dataset is small. You should consider data augmentation.
You could complement the resizing with data augmentation: resize to equal dimensions and then perform reflections and translations (i.e. geometric transformations).
If your images are too big, your GPU might run out of memory before it can start training because it has to store the convolution outputs on its memory. If that happens, you can do some of the following things to reduce memory consumption:
resize the image
reduce batch size
reduce model complexity
To resize your images, there are many scripts just one Google search away, but I will add that in your case 1440 x 900 (half the original resolution) is probably a sweet spot.
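For completeness, in Python that resize is a couple of lines with Pillow (filenames are placeholders):

```python
# One-off resize sketch, assuming Pillow; filenames are placeholders.
from PIL import Image

img = Image.open("photo_2880x1800.jpg")
img.resize((1440, 900), Image.LANCZOS).save("photo_1440x900.jpg")
```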
Higher resolution images will result in a higher training time and an increased memory consumption (mainly GPU memory).
Depending on your concrete task, you might want to reduce the image size so that a reasonable batch size of, say, 32 or 64 fits on the GPU, which helps keep learning stable.
Your accuracy is probably affected more by the size of your training set, so instead of focusing on image size you might want to aim for 500-1000 sample images. Recent publications like SSD (Single Shot MultiBox Detector) achieve high accuracy, e.g. an mAP of 72% on the PASCAL VOC dataset, while using "only" 300x300 image resolution.
Resizing and augmentation: SSD, for instance, just scales every input image down to 300x300, independent of the aspect ratio, and that does not seem to hurt. You could also augment your data by mirroring, translating, etc. (there are built-in methods in TensorFlow for that; a sketch follows below).
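Those built-in methods live in `tf.image`; a hedged sketch (the 300x300 target and the particular distortions are just examples):

```python
# Sketch of SSD-style fixed resize plus simple augmentation using tf.image;
# the target size and the specific random distortions are illustrative only.
import tensorflow as tf

def preprocess(image, training=True):
    image = tf.image.resize(image, (300, 300))            # ignores aspect ratio
    if training:
        image = tf.image.random_flip_left_right(image)    # mirroring
        image = tf.image.random_brightness(image, max_delta=0.1)
    return image

# typically applied inside a tf.data pipeline, e.g. dataset.map(preprocess)
```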

Does image size matter when training with TensorFlow?

I was wondering if there is any benefit to training on high resolution images rather than low resolution. I understand that it will take longer to train on larger images and that the dimensions must be a multiple of 32. My current image set is 1440x1920. Would I be better off resizing to 480x640, or is bigger better?
It's certainly not a requirement that your image dimensions be multiples of 32 or powers of two. There may be some cases where particular sizes speed things up (e.g. GPU memory allocation), but it's not critical.
Smaller images will train significantly faster, and possibly even converge quicker (all other factors held constant) as you will be able to train on bigger batches (e.g. 100-1000 images in one pass, which you might not be able to do on a single machine with high res imagery).
As to whether to resize, you need to ask yourself if every pixel in that image is critical to your task. Often this is not the case - you can probably resize a photo of a bus down to say 128x128 and still recognize that it's a bus.
Using smaller images can also help your network generalise better, as there is less data to overfit to.
A technique often used in image classification networks is to perform distortions (e.g. random cropping, scaling & brightness adjustment) on images to (a) convert odd-sized images to a constant size, (b) synthesize more data and (c) encourage the network to generalise.
This depends largely on the application. As a rule of thumb, I'd ask the question: can I complete the task myself on the resized images? If so, I'd downsize to the lowest resolution at which the task is still doable. If not... you're going to have to be -very- patient training on 1440 x 1920 images. I imagine you'll almost always be better off experimenting with more varied architectures and hyper-parameter sets on smaller images than with fewer models on full-resolution images.
Whatever size you choose, you'll have to design your network for the image size you have in mind. If you're using convolutional layers, a larger image will require larger strides, filter sizes and/or more layers. The number of parameters stays the same for each convolution regardless of input size; it's the feature maps (and the memory they consume) that grow.
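A quick way to convince yourself of the parameter point (a throwaway Keras sketch; the layer sizes are arbitrary):

```python
# The conv parameter count is independent of the input resolution; only the
# feature maps (and hence activation memory) change. Layer sizes are arbitrary.
import tensorflow as tf

def tiny_cnn(height, width):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(height, width, 3)),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
    ])

print(tiny_cnn(480, 640).count_params())    # same number of weights...
print(tiny_cnn(1440, 1920).count_params())  # ...despite the much larger input
```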

JPG compression and noise

I am studying jpeg compression and it seems to work by reducing high frequency components in images. Since noise is usually high frequency, does this imply that jpeg compression somewhat works on reducing noise in images?
JPEG compression can reduce noise by smoothing out the high-frequency components of the image, but it also introduces visual noise in the form of compression artifacts. Here is a zoomed-in (3x) view of part of my avatar (a high-quality JPEG) and part of your avatar (a PNG drawing), on the left as downloaded and on the right as compressed with ImageMagick using -quality 60. To my eye they both look "noisier" when JPEG-compressed.
Strictly speaking, no.
JPEG does remove high frequencies (see below), but not selectively enough to be a denoising algorithm. In other words, it will remove high frequencies if they are noise, but also if they are useful detail information.
To understand this, it helps to know the basics of how JPEG works. First, the image is divided into 8x8 blocks. Then the discrete cosine transform (DCT) is applied, so that each element of the 8x8 block contains the "weight" of a different frequency. The elements are then quantized in a fixed way that depends on the quality level selected a priori. This quantization gains coding performance at the cost of losing precision. The amount of precision lost is fixed a priori and (as I said above) it does not differentiate between noise and useful detail.
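A hedged sketch of that pipeline on a single 8x8 block, using SciPy's DCT (the uniform quantisation step below stands in for the real per-frequency JPEG tables):

```python
# JPEG-style quantisation of one 8x8 block, assuming SciPy is available;
# a single uniform quantisation step stands in for the real JPEG tables.
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def idct2(block):
    return idct(idct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

block = np.random.default_rng(0).integers(0, 256, (8, 8)).astype(float)
coeffs = dct2(block - 128)                  # DCT of the level-shifted block
step = 40                                   # coarser step = lower "quality"
quantised = np.round(coeffs / step) * step  # precision lost here, whether the
                                            # frequency held noise or detail
restored = idct2(quantised) + 128
print(np.abs(block - restored).mean())      # average error introduced
```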
You can test this yourself by saving the same image with different qualities (which technically control the amount of quantization applied to each block) and see that not only noise is removed. There is a nice video showing this effect for different quality levels here: https://upload.wikimedia.org/wikipedia/commons/f/f3/Continuously_varied_JPEG_compression_for_an_abdominal_CT_scan_-_1471-2342-12-24-S1.ogv.
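That experiment is only a couple of lines with Pillow (the filename is a placeholder):

```python
# Re-save the same image at several JPEG quality levels, assuming Pillow;
# "input.png" is a placeholder filename.
from PIL import Image

img = Image.open("input.png").convert("RGB")
for q in (10, 30, 60, 90):
    img.save(f"quality_{q}.jpg", quality=q)
```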

Do PVR textures consume less RAM?

It is my understanding that PVR textures, made with texturetool, are simply compressed images. Therefore the difference lies in the file size.
Frankly, the file size doesn't interest me. What I want to know is, can a PVR texture consume less RAM than a normal .PNG texture? Or does this depend entirely on the texture format (like RGBA8888 etc)?
The essential question would be:
Given X.png and X.pvr, if I display both with texture format RGBA8888, will one consume less RAM than the other?
Yes, the PVR will consume less RAM at all stages — it's unpacked live by the GPU as it's accessed. There's no intermediate decompression.
A PVR-like approach used in digital video is chroma subsampling: instead of storing RGB at every pixel, convert to YUV, then store Y at every pixel but U and V only twice per four-pixel block. So you go from 128 bits for the block to 64 bits. To get back to RGB, the decoder reads the exact Y for each pixel and interpolates or picks the nearest stored U and V as necessary.
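The arithmetic for one 2x2 block, assuming 8 bits per sample and 32 bits per pixel (e.g. RGBX) on the uncompressed side:

```python
# Bit accounting for a single 2x2 pixel block, assuming 8 bits per sample
# and 32 bits per pixel (e.g. RGBX) in the uncompressed representation.
rgb_bits = 4 * 32                 # full colour at every pixel: 128 bits
yuv_bits = 4 * 8 + 2 * 8 + 2 * 8  # Y everywhere, U and V twice: 64 bits
print(rgb_bits, yuv_bits)         # 128 64
```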
Schemes like PVR do a similar thing: rather than storing the full value at every pixel, they infer parts of it from nearby context. What counts as nearby is chosen to match how texture caching is arranged on that GPU. It's usually more involved than just scaling down the sampling resolution of some of the channels; for example, specifying a base value for a block of samples and then storing each sample at tiny precision relative to it is also common.
So the GPU can always get a value for pixel X by reading only values in a very small, local region of the data.
This contrasts with traditional schemes like PNG, where needing to know every pixel in the stream prior to X is acceptable if it improves compression. Decoding such formats live would flood the GPU's memory bandwidth and hence be completely impractical, so those textures are decompressed when loaded and then uploaded uncompressed.
So schemes like PVR tend to lead to poorer compression and lower per-pixel quality but the win is that they can sit in VRAM compressed. A game will often increase the resolution of its textures if using PVR to try to find a comfortable balance.
Uncompressed textures are:
16 bits per pixel (RGB565, RGBA4444),
24 bits per pixel (RGB888), or
32 bits per pixel (RGBA8888).
PVRTC textures are either 4 bpp or just 2 bpp, so yes, they do use less memory.
They also perform better, because less memory bandwidth is needed to fetch texels.
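To put numbers on it, the footprint of a hypothetical 1024x1024 texture at a few of those formats:

```python
# Memory footprint of a hypothetical 1024x1024 texture at various formats.
pixels = 1024 * 1024
for name, bpp in [("RGBA8888", 32), ("RGB565", 16),
                  ("PVRTC 4bpp", 4), ("PVRTC 2bpp", 2)]:
    print(f"{name:11s} {pixels * bpp // 8 // 1024:5d} KiB")
# RGBA8888 4096 KiB, RGB565 2048 KiB, PVRTC 4bpp 512 KiB, PVRTC 2bpp 256 KiB
```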

Best way to store motion changes to reduce memory

I am comparing JPEG to JPEG in a constant 'video stream'. I am using EMGU/OpenCV to compare pixels at the byte level. There are 3 channels to each image (RGB). I had heard that it is common practice to store only the pixels that have changed between frames as a way of conserving memory. But if, for example, I say EVERY pixel has changed (note I am using an exaggerated example to make my point, and I would normally discard such large changes), then the resulting data is 3 times larger than the original JPEG.
How can I store such motion changes efficiently?
thanks
While consecutive images are being captured, the camera may or may not be moving. If the camera is fixed, only the objects in view move, and only some portion of the image changes each frame. If the camera also moves, the image changes significantly even if the objects stand still; there are algorithms to discard the effect of camera motion. The main idea is that, relative to the sampling frequency of the camera (e.g. 25 frames per second), most objects are nearly standing still.
Because most of the image is unchanged between frames, it becomes feasible to store only the difference between images, which provides some compression. However, after some amount of time the newly received image differs too much from the reference image, so it is better to start from a new reference image, which is called a "reference frame".
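A minimal sketch of that difference-plus-reference-frame idea with OpenCV and NumPy (the change threshold and the re-key rule are arbitrary placeholders, not tuned values):

```python
# Store only changed pixels against a reference frame; re-key when too much
# of the image has changed. Threshold and re-key fraction are placeholders.
import cv2
import numpy as np

def encode(frames, threshold=25, rekey_fraction=0.5):
    reference = None
    for frame in frames:
        if reference is None:
            reference = frame
            yield ("key", frame)                       # full reference frame
            continue
        diff = cv2.absdiff(frame, reference)
        changed = np.any(diff > threshold, axis=2)     # per-pixel change mask
        if changed.mean() > rekey_fraction:            # too different: new key
            reference = frame
            yield ("key", frame)
        else:
            coords = np.argwhere(changed)              # positions that changed
            yield ("delta", coords, frame[changed])    # plus their new values

# a decoder keeps its own copy of the reference and applies each delta to it
```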
In fact, modern video compression algorithms use advanced techniques to detect objects and follow them, which results in better compression ratios.
Wikipedia - Different compression techniques
Check This - OpenCV should handle the storing of consecutive images in different video formats.
