I have code to recreate a high quality image at a very poor quality but I am a bit confused by the results I am seeing. The code is here:
NSData *compressedData = UIImageJPEGRepresentation(bigImage, 0);
NSLog(@"compressedData length = %lu", (unsigned long)[compressedData length]);
self.currentImage = [UIImage imageWithData:compressedData];
NSData *dataAfterCompression = UIImageJPEGRepresentation(self.currentImage, 1);
NSLog(@"dataAfterCompression length = %lu", (unsigned long)[dataAfterCompression length]);
Which outputs:
2012-01-02 02:47:05.615 MyApp[349:707] compressedData length = 32671
2012-01-02 02:47:06.143 MyApp[349:707] dataAfterCompression length = 251144
Why would creating a new image with the low quality data result in such a large image?
The input to UIImageJPEGRepresentation is a 2-D array of pixels, not the compressed data from which that 2-D array was created. (The array might not have come from compressed data at all!) UIImageJPEGRepresentation doesn't know anything about where the image came from, and it doesn't have any concept of the "quality" of its input. It just knows that you want it to try very hard to make the output small (when compressionQuality is zero) or you want it to try very hard to make the output accurate (when compressionQuality is one).
The JPEG compression algorithm has some tunable parameters. The compressionQuality value selects a set of those parameters. When you set compressionQuality to 1, the compressor uses a set of parameters that allow very little loss of accuracy of the input data, regardless of what that input data actually is. For any input data, those parameters result in very little loss of accuracy. The tradeoff is that those parameters also result in very little compression. The compressor doesn't then think "Hmm, I should try using other parameters and see if I get the same accuracy with better compression". If that's what you want, you have to do it yourself.
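To make the "tunable parameters" concrete: the quality knob mostly controls the quantization step sizes applied to the image's DCT coefficients. The following is a toy model in Python, not Apple's encoder; the base table is from the JPEG spec's Annex K, and the quality-to-scale mapping mimics the common IJG convention. It counts how many coefficients of one 8x8 block survive quantization at low versus high quality.

```python
import math

# Base luminance quantization table from the JPEG spec (Annex K).
BASE_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def dct2(block):
    """Naive orthonormal 2-D DCT-II of an 8x8 block."""
    n = 8
    def c(u):
        return math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
    return [[c(u) * c(v) * sum(
        block[x][y]
        * math.cos((2 * x + 1) * u * math.pi / (2 * n))
        * math.cos((2 * y + 1) * v * math.pi / (2 * n))
        for x in range(n) for y in range(n))
        for v in range(n)] for u in range(n)]

def quant_table(quality):
    """Scale the base table roughly the way the IJG library does:
    quality near 0 gives huge steps, quality near 1 gives steps near 1."""
    q = max(1, min(100, round(quality * 100)))
    scale = 5000 / q if q < 50 else 200 - 2 * q
    return [[min(255, max(1, int((b * scale + 50) / 100))) for b in row]
            for row in BASE_Q]

def surviving_coeffs(block, quality):
    """How many DCT coefficients are still nonzero after quantization."""
    table = quant_table(quality)
    coeffs = dct2([[p - 128 for p in row] for row in block])  # level shift
    return sum(1 for u in range(8) for v in range(8)
               if round(coeffs[u][v] / table[u][v]) != 0)

block = [[16 * x + 16 * y for y in range(8)] for x in range(8)]  # a gradient
low = surviving_coeffs(block, 0.05)    # like compressionQuality near 0
high = surviving_coeffs(block, 0.95)   # like compressionQuality near 1
print(low, high)  # many more coefficients survive at high quality
```

At high quality the steps are tiny, so nearly every coefficient is kept exactly, regardless of whether the input was once heavily quantized: this is why re-encoding the low-quality image at quality 1 produces a large file.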
I am not sure if this is exactly what you are looking for, but this blog has some classes that extend UIImage, and it handles compression quite nicely. Check out this website:
http://vocaro.com/trevor/blog/2009/10/12/resize-a-uiimage-the-right-way/
Related
My scenario: I am trying to get image data to upload to a server. Here, I am getting a huge data string length. Is there any possibility of getting a shorter string without reducing the image file's resolution?
My Code for Image Data
let image = self.attachment_one_img.image
let imageData = image?.jpegData(compressionQuality: 1)
let base64String = imageData?.base64EncodedString()
let trimmedString_one = base64String?.trimmingCharacters(in: .whitespaces)
print(trimmedString_one ?? "") // I am getting a huge data string length
Base-64 adds approximately 33% overhead to whatever its input data is. The only way to make it smaller is shrink your input data.
You can use other ASCII-based encodings, such as Ascii85, which have a little less overhead, but the encoded data will always be larger than the input data, because you're using fewer bits per byte to hold it (fewer bits per byte means more bytes for the same number of input bits). Since image data is typically already well compressed, it cannot be squeezed into significantly fewer bytes than its existing representation.
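The overhead figures are easy to check with Python's standard base64 module (used here only to illustrate the ratios; Ascii85 is the a85encode variant):

```python
import base64

data = bytes(range(256)) * 4          # 1024 bytes standing in for image data

b64 = base64.b64encode(data)
a85 = base64.a85encode(data)

# Base-64 emits 4 output bytes per 3 input bytes: ~33% overhead.
print(len(b64) / len(data))   # ~1.33
# Ascii85 emits 5 output bytes per 4 input bytes: ~25% overhead.
print(len(a85) / len(data))   # 1.25
```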
If it is JPEG data, you can reduce the quality rather than the resolution. Using a compression quality of 1 is generally excessive, and you should consider PNG rather than JPEG if you're looking for lossless compression. (JPEG is intended for photographs, and is very good at compressing them. PNG is generally better for line-art and other things with very sharp color transitions.) After ensuring your resolution is no higher than needed, tuning the quality value is the best tool for managing size.
I am trying to train a CNN in caffe. I wanted to do a lot of data augmentation, so I'm using a "Python" layer for input, as suggested here.
However, I see from the log that Caffe is using the datatype float32 for all my data. This is really wasteful, because I'm only dealing with 8-bit integers. Is there a way to tell caffe to use dtype='uint8'?
I have tried to typecast the data while setting the top:
top[0].data[...] = someArray.astype(np.uint8, copy=False)
but this doesn't work.
Any suggestions?
AFAIK, caffe is currently compiled to support only float32 or float64. I suppose lmdb/leveldb data can be stored in uint8 format, but caffe converts it internally to float32 upon reading.
The fact that your input data is uint8 does not mean the entire processing chain stays that way. At the first Convolution/InnerProduct layer, the data is multiplied by floating-point weights and can no longer be guaranteed to remain uint8.
So, I suppose you should put up with the little space waste at the input layer and give up the conversion to uint8.
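To see why the cast in the question has no effect, note that `top[0].data` is a preallocated float32 buffer, and NumPy assignment converts the right-hand side to the destination's dtype. A small standalone NumPy sketch (no Caffe required; the `top` array here just stands in for `top[0].data`):

```python
import numpy as np

top = np.zeros((2, 2), dtype=np.float32)   # stands in for top[0].data
src = np.arange(4, dtype=np.float64).reshape(2, 2)

top[...] = src.astype(np.uint8, copy=False)  # same idiom as in the question
print(top.dtype)  # float32 -- the destination buffer's dtype always wins

# And even if the input were uint8, the first layer's float weights
# promote the result to floating point anyway:
x = np.arange(4, dtype=np.uint8)
w = np.full(4, 0.5, dtype=np.float32)      # a layer's weights are float
print((x * w).dtype)  # float32
```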
Say I have a sequence of .dicom files in a folder, with a cumulative size of about 100 MB. It's a lot of data. I tried converting the data to .nrrd and .nii, but those files had roughly the same total size as the source .dicom files (which is fairly predictable, though .nrrd was compressed with gzip). I'd like to know if there is a file format that would give me far smaller sizes, or just some way to solve this. Perhaps .vtk, or something else (not sure it would work). Thanks in advance.
DICOM supports compression of the pixel data within the file itself. The idea of DICOM is that it's format agnostic from the point of view of the pixel data it holds.
DICOM can hold raw pixel data and also can hold JPEG-compressed pixel data, as well as many other formats. The transfer syntax tag of the DICOM file gives you the compression protocol of the pixel data within the DICOM.
The first thing is to figure out whether you need lossless or lossy compression. If lossy is acceptable, there are a lot of options, and the compression ratio of some is quite high; the tradeoff is that you lose fidelity, and the images may no longer be adequate for diagnostic purposes. There are also lossless compression schemes, like lossless JPEG 2000, RLE, and JPEG-LS. These will compress the pixel data while retaining diagnostic quality, without any image degradation.
You can also zip the files, which, if the pixel data is raw, should produce very good results. What are you looking to do with these compressed DICOMs?
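To give a feel for the last point, here is a quick sketch with Python's standard gzip module on synthetic raw pixel data (raw, uncompressed pixel data with smooth regions typically deflates very well; the numbers here are only illustrative, not representative of any particular DICOM series):

```python
import gzip

# Synthetic 256x256 8-bit slice with smooth gradients, standing in for
# raw (uncompressed) pixel data.
raw = bytes((x + y) % 256 for y in range(256) for x in range(256))

packed = gzip.compress(raw, compresslevel=9)
print(len(raw), len(packed))  # the gzipped copy is much smaller
```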
I am using the Jtransforms library which seems to be wicked fast for my purpose.
At this point I think I have a pretty good handle on how FFT works, so now I am wondering whether there is any standard domain used for audio visualizations like spectrograms.
Thanks to Android's native FFT in 2.3, I had been using bytes as the range, although I am still unclear as to whether they are signed or not. (I know Java doesn't have unsigned bytes, but Google implemented these functions natively, and the waveform is 8-bit unsigned PCM.)
However, I am adapting my app to work with mic audio and 2.1 phones. At this point, having the input domain be byte-valued, whether [-128, 127] or [0, 255], no longer seems optimal.
I would like the range of my FFT function to be [0,1] so that I can scale it easily.
So should I use a domain of [-1, 1] or [0, 1]?
Essentially, the input domain does not matter. At most, it causes an offset and a change in scaling on your original data, which will be turned into an offset on bin #0 and an overall change in scaling on your frequency-domain results, respectively.
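This claim is easy to verify with a tiny DFT in plain Python (a naive O(n²) transform, written out just to illustrate the point): scaling the input by a and adding an offset b scales every bin by a and adds b·n to bin #0 only.

```python
import cmath

def dft(x):
    # naive O(n^2) DFT, enough to illustrate the point
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

n = 16
x = [((7 * t) % 5) - 2.0 for t in range(n)]   # arbitrary test signal
X = dft(x)
Y = dft([2 * v + 3 for v in x])               # rescale and offset the input

# every bin is scaled by 2; the offset shows up only in bin #0
assert all(abs(Y[k] - 2 * X[k]) < 1e-9 for k in range(1, n))
assert abs(Y[0] - (2 * X[0] + 3 * n)) < 1e-9
```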
As to limiting your FFT output to [0,1]: that's essentially impossible. In general, the FFT output is complex; there's no way to manipulate your input data so that the output is restricted to positive real numbers.
If you use DCT instead of FFT your output range will be real. (Read about the difference and decide if DCT is suitable for your application.)
FFT implementations for real-valued input exploit the fact that the spectrum of a real signal is conjugate-symmetric, so only about half of the output bins are unique. Therefore the fact that each output sample has both a real and an imaginary part doesn't affect the size of the result much compared to the size of the source (the output holds roughly n/2 + 1 complex values for n real inputs).
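The symmetry in question is X[k] == conj(X[n−k]) for real input, which is why bins above n/2 carry no new information. A quick check with a naive DFT in Python (real-FFT library routines simply return only those first n/2 + 1 bins):

```python
import cmath

def dft(x):
    # naive O(n^2) DFT, enough to demonstrate the symmetry
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

n = 16
x = [float((3 * t * t + 1) % 7) for t in range(n)]  # arbitrary real signal
X = dft(x)

# Hermitian symmetry: X[k] == conj(X[n-k]) for k = 1 .. n-1,
# so only the first n//2 + 1 bins (here 9 of 16) are independent.
assert all(abs(X[k] - X[n - k].conjugate()) < 1e-9 for k in range(1, n))
```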
I have an embedded application where an image scanner sends out a stream of 16-bit pixels that are later assembled to a grayscale image. As I need to both save this data locally and forward it to a network interface, I'd like to compress the data stream to reduce the required storage space and network bandwidth.
Is there a simple algorithm that I can use to losslessly compress the pixel data?
I first thought of computing the difference between two consecutive pixels and then encoding this difference with a Huffman code. Unfortunately, the pixels are unsigned 16-bit quantities so the difference can be anywhere in the range -65535 .. +65535 which leads to potentially huge codeword lengths. If a few really long codewords occur in a row, I'll run into buffer overflow problems.
Update: my platform is an FPGA
PNG provides free, open-source, lossless image compression in a standard format using standard tools. PNG uses zlib as part of its compression. There is also a libpng. Unless your platform is very unusual, it should not be hard to port this code to it.
How many resources do you have available on your embedded platform?
Could you port zlib and do gzip compression? Even with limited resources, you should be able to port something like LZ77 or LZ78.
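If zlib does fit, DEFLATE alone already does reasonably well on smooth 16-bit scan lines. A quick stdlib sketch in Python that packs pixels little-endian and compresses them (the data and the ratio here are synthetic and only illustrative; real scanner data will behave differently):

```python
import struct
import zlib

# A smooth synthetic scan line of unsigned 16-bit pixels.
pixels = [512 * (i % 128) for i in range(8192)]
raw = struct.pack("<%dH" % len(pixels), *pixels)

packed = zlib.compress(raw, 9)
assert zlib.decompress(packed) == raw   # lossless round trip
print(len(raw), len(packed))
```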
There are a wide variety of image compression libraries available. For example, this page lists nothing but libraries/toolkits for PNG images. Which format/library works best for you will most likely depend on the particular resource constraints you are working under (in particular, whether or not your embedded system can do floating-point arithmetic).
The goal with lossless compression is to be able to predict the next pixel based on the previous pixels, and then to encode the difference between your prediction and the real value of the pixel. This is what you initially thought of doing, but you were only using the one previous pixel and predicting that the next pixel would be the same.
Keep in mind that if you have all of the previous pixels, you have more relevant information than just the preceding pixel. That is, if you are trying to predict the value of X, you should use the O pixels:
..OOO...
..OX
Also, you would not want to use the previous pixel, B, in the stream to predict X in the following situation:
OO...B <-- End of row
X <- Start of next row
Instead you would make your prediction based on the Os.
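A minimal version of that idea in Python, using the average of the left and above neighbours as the prediction (similar in spirit to PNG's Average filter, not a full CALIC/LOCO-I predictor). Residuals are stored modulo 2^16, so the round trip is exact, and at the start of a row the prediction comes from the pixel above, never from the previous row's last pixel:

```python
def predict(img, x, y):
    """Prediction from causal neighbours: average of left and above where
    both exist, otherwise whichever is available, otherwise 0."""
    left = img[y][x - 1] if x > 0 else None
    above = img[y - 1][x] if y > 0 else None
    if left is not None and above is not None:
        return (left + above) // 2
    if left is not None:
        return left
    return above if above is not None else 0

def residuals(img):
    """Per-pixel prediction errors, stored modulo 2**16."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] - predict(img, x, y)) & 0xFFFF for x in range(w)]
            for y in range(h)]

def reconstruct(res):
    """Exact inverse: rebuild pixels in raster order from the residuals."""
    h, w = len(res), len(res[0])
    img = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            img[y][x] = (res[y][x] + predict(img, x, y)) & 0xFFFF
    return img

img = [[100, 101, 103], [101, 102, 104], [103, 104, 60000]]
assert reconstruct(residuals(img)) == img   # lossless round trip
```

On smooth images, the residuals cluster near zero, which is exactly what a Huffman or arithmetic coder exploits.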
How 'lossless' do you need?
If this is a real scanner, there is a limit to the bandwidth/resolution, so even if it can send ±64K values, it may be unphysical for adjacent pixels to differ by more than, say, 8 bits.
In that case you can store a starting pixel value for each row and then encode the differences between consecutive pixels.
This will smear out peaks, but it may be that any peaks of more than N bits are noise anyway.
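One wrinkle from the question, the ±65535 delta range, disappears if the differences are stored modulo 2^16: the decoder can still reconstruct the stream exactly, and every symbol fits in 16 bits, so the alphabet never grows beyond 65536 entries (on smooth data the deltas cluster near 0 and 65535, which Huffman coding handles well). A sketch of just the delta step in Python, with the entropy coder left out:

```python
def encode_deltas(pixels):
    """Differences of consecutive 16-bit pixels, taken modulo 2**16."""
    out, prev = [], 0
    for p in pixels:
        out.append((p - prev) & 0xFFFF)
        prev = p
    return out

def decode_deltas(deltas):
    """Exact inverse: running sum modulo 2**16."""
    out, prev = [], 0
    for d in deltas:
        prev = (prev + d) & 0xFFFF
        out.append(prev)
    return out

pixels = [0, 100, 65535, 3, 40000, 39998]
deltas = encode_deltas(pixels)
assert decode_deltas(deltas) == pixels          # lossless round trip
assert all(0 <= d <= 0xFFFF for d in deltas)    # every symbol fits 16 bits
```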
A good LZ77/RLE hybrid with bells and whistles can get wonderful compression that is fairly quick to decompress. It will also beat the bigger, more general compressors on small files, due to the lack of library overhead. For a good, but GPL'd, implementation of this, check out PUCrunch.