For the purpose of identifying and comparing JPG images taken from cameras I want to calculate a MD5 hash of the scan portion of the image inside the JPG. My idea is to take the bytes between the SOS and the EOI marker and perform a hash on those bytes based on the assumption that these bytes will never change unless the actual image is processed and altered.
Apparently this question has come up several times already (1, 2, 3). Rather complicated solutions have been suggested, which I find irritating given my rather simple but apparently effective approach. (Or is it too simple to be true?)
I know there can be multiple pairs of SOS ($FFDA) and EOI ($FFD9) markers in a JPG file; my present files contain three: a thumbnail, the actual image, and an additional 1920x1080 image (Sony). My present approach is to parse the stream, locate the next SOS, then look for EOI, calculate the size, and assume it is the actual image if the size exceeds 50% of the file size.
This approach works with my present files. I stripped all metadata from a JPG file with exiftool -all= image.jpg and found the MD5 hash to be identical. Yet the algorithm seems rather coarse to me.
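For illustration, the approach can be sketched in a few lines of Python (a naive byte scan for the marker pairs, not a full segment parser; function and variable names are mine):

```python
import hashlib

SOS, EOI = b"\xff\xda", b"\xff\xd9"

def scan_hash(jpeg_bytes):
    """MD5 of the largest SOS..EOI span, assumed to be the main image.

    Simplified sketch: scans the raw bytes for marker pairs instead of
    walking the segment structure, mirroring the approach described above.
    """
    spans = []
    pos = 0
    while True:
        sos = jpeg_bytes.find(SOS, pos)
        if sos < 0:
            break
        eoi = jpeg_bytes.find(EOI, sos + 2)
        if eoi < 0:
            break
        spans.append((eoi - sos, sos + 2, eoi))   # (size, start, end)
        pos = eoi + 2
    if not spans:
        return None
    _, start, end = max(spans)                    # largest span = main scan
    return hashlib.md5(jpeg_bytes[start:end]).hexdigest()
```

Since metadata lives in its own segments before the scan data, adding or removing an APPn block leaves the hash of the scan span unchanged, which is the whole point of the approach.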
So here are my questions:
Is there any risk that simply examining the space between SOS and EOI can fail? I have read this, but am still not sure.
Parsing every byte from the SOS of the actual image takes a lot of time. I take it from here that there is no shortcut to finding the end of the compressed data. But I might just leap forward 80% or so from the second SOS marker. I am talking about images from a camera - how much can I rely on the fact that there will be a thumbnail coming first and the actual image after it?
Should I start 6 bytes after the SOS marker (here)?
Any ideas for a better approach?
After doing some research and running a bunch of tests, here is my answer to my own question.
First, I want to make clear that we are not talking about a forensic investigation. There are possibly ways to manipulate a JPG image so that markers appear where they shouldn't and do not appear where they would have to according to the specs.
We are not talking about image identity or similarity, either. If you losslessly rotate a JPG you still have the very same image information, but not the identical image any more. We're not talking, either, about images that have been resized, optimized or altered in any other way.
What we are talking about is identifying simple duplicates or JPGs that have been renamed or where metadata has been modified or removed, but where the image itself has never been processed or tampered with in any way.
Is a hash of the bytes between the SOS and the EOI markers a reliable way to uniquely identify an image?
Yes, it is. Within bounds of reason there is no way two files with identical MD5 checksums of the image scan data can contain non-identical images and vice versa.
I examined sample photos taken with cameras from 12 different makers and edited/stripped the metadata. Actually, this wasn't really necessary, because the specs and the code tell you that all metadata resides in separate blocks (that's why you can hide all kinds of stuff in a JPG) and the scan data is never touched by metadata operations; but yes, identical MD5 checksums all over the place.
Is there any way to quickly locate the (right) SOS marker?
Definitely. The JPG specs are a mess and a punishment. After trying quite a few pieces of code I found NativeJPG by Nils Haeck to be the most straightforward.
This has been adapted from sdJpegImage:
function FindSOSPos(S: TStream): Cardinal;
const
  mkNone = 0; mkSOF0 = $c0; mkSOF1 = $c1; mkSOF2 = $c2; mkSOF3 = $c3; mkSOF5 = $c5;
  mkSOF6 = $c6; mkSOF7 = $c7; mkSOF9 = $c9; mkSOF10 = $ca; mkSOF11 = $cb; mkSOF13 = $cd;
  mkSOF14 = $ce; mkSOF15 = $cf; mkDHT = $c4; mkDAC = $cc; mkSOI = $d8; mkEOI = $d9; mkSOS = $da;
  mkDQT = $db; mkDNL = $dc; mkDRI = $dd; mkDHP = $de; mkEXP = $df; mkAPP0 = $e0; mkAPP15 = $ef; mkCOM = $fe;
var
  B, MarkerTag, BytesRead: Byte;
  Size, W: Word;
begin
  Result := 0;
  repeat
    // Read from the stream until a potential marker ($FF) is encountered
    if S.Read(B, 1) = 0 then
      Exit;
    if B = $FF then
    begin
      // Skip any fill bytes ($FF) preceding the marker tag
      BytesRead := S.Read(MarkerTag, 1);
      while (BytesRead > 0) and (MarkerTag = $FF) do
      begin
        MarkerTag := mkNone;
        BytesRead := S.Read(MarkerTag, 1);
      end;
      Size := 0;
      if MarkerTag in [mkAPP0..mkAPP15, mkDHT, mkDQT, mkDRI,
        mkSOF0, mkSOF1, mkSOF2, mkSOF3, mkSOF5, mkSOF6, mkSOF7, mkSOF9,
        mkSOF10, mkSOF11, mkSOF13, mkSOF14, mkSOF15, mkCOM, mkDNL] then
      begin
        // Read the length of the marker segment so its payload can be skipped
        if S.Read(W, 2) = 2 then
          Size := Swap(W) - 2
        else
          Exit;
      end
      else if MarkerTag = mkSOS then
        Break;
      S.Position := S.Position + Size;
    end
    else
    begin
      // B <> $FF is an error; try to resynchronize on the next $FF
      repeat
        BytesRead := S.Read(B, 1);
      until (BytesRead = 0) or (B = $FF);
      if BytesRead = 0 then
        Exit;
      S.Seek(-1, soFromCurrent);
    end;
  until (MarkerTag = mkSOS) or (MarkerTag = mkNone);
  Result := S.Position;
end;
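For readers who prefer something higher-level, here is a rough Python sketch of the same marker walk (not a port of NativeJPG; it returns the offset just past the SOS header rather than the marker position, and all names are mine):

```python
import struct

def find_sos(data):
    """Return the offset just past the first SOS header, or None.

    Sketch of the segment walk above: read a marker, skip its payload
    using the 2-byte big-endian length field, and stop at SOS ($FFDA).
    """
    if data[:2] != b"\xff\xd8":          # expect SOI first
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None                  # stream out of sync
        marker = data[pos + 1]
        if marker == 0xDA:               # SOS: scan data follows its header
            (size,) = struct.unpack(">H", data[pos + 2:pos + 4])
            return pos + 2 + size
        if 0xD0 <= marker <= 0xD9:       # RSTn/SOI/EOI carry no length field
            pos += 2
            continue
        (size,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + size                  # skip marker plus its payload
    return None
```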
Omit the first 6 Bytes after the SOS marker?
I decided to hash everything between SOS and EOI excluding the markers themselves.
Is there a fast way to locate the trailing EOI marker?
No. But this is irrelevant, since for performing a hash you have to read every single byte anyway.
How reliable is this approach?
As I said, I believe that, within bounds of reason, the chance of this approach producing a false positive is practically zero. As to locating the right image: NativeJPG has been around for more than 10 years and you find very few complaints; those few deal with decoding the image, not missing it.
In my application I offer the option to store the original filename, the EXIF DateTimeDigitized, the camera make, the GPS coordinates and MD5 hashes of the scan data (full and first 16 kB) in the UserComment field. I'm pretty confident that this will make it possible to identify the file later on under most conditions (provided the UserComment has remained intact).
Is there any risk that simply examining the space between SOS and EOI can fail?
Yes, for your purpose if you are only doing a checksum of the scan data. There could be multiple SOS markers and other markers in between them.
Related
I'm trying to satisfy myself that METEOSAT images I'm getting from their FTP server are actually valid images. My doubt arises because all the tools I've used so far complain about "Bogus Huffman table definition" - yet when I simply comment out that error message, the image appears quite plausible (a greyscale segment of the Earth's disc).
From https://github.com/libjpeg-turbo/libjpeg-turbo/blob/jpeg-8d/jdhuff.c#L379:
while (huffsize[p]) {
  while (((int) huffsize[p]) == si) {
    huffcode[p++] = code;
    code++;
  }
  /* code is now 1 more than the last code used for codelength si; but
   * it must still fit in si bits, since no code is allowed to be all ones.
   */
  if (((INT32) code) >= (((INT32) 1) << si))
    ERREXIT(cinfo, JERR_BAD_HUFF_TABLE);
  code <<= 1;
  si++;
}
If I simply comment out the check, or add a check for huffsize[p] to be nonzero (as in the containing loop's controlling expression), then djpeg manages to convert the image to a BMP which I can view with few problems.
Why does the comment claim that all-ones codes are not allowed?
It claims that because they are not allowed. That doesn't mean that there can't be images out there that don't comply with the standard.
The reason they are not allowed is this (from the standard):
Making entropy-coded segments an integer number of bytes is performed
as follows: for Huffman coding, 1-bits are used, if necessary, to pad
the end of the compressed data to complete the final byte of a
segment.
If an all-ones code were allowed, you could end up with an ambiguity in the last byte of the compressed data, where the padding 1-bits could decode as another coded symbol.
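A toy prefix decoder makes the ambiguity concrete (plain Python, nothing JPEG-specific; all names are mine):

```python
def decode(bits, codes):
    """Greedy prefix decode; stops when the remaining bits match no code prefix."""
    out, cur = [], ""
    inv = {v: k for k, v in codes.items()}
    prefixes = {v[:i] for v in codes.values() for i in range(1, len(v) + 1)}
    for b in bits:
        cur += b
        if cur in inv:
            out.append(inv[cur])
            cur = ""
        elif cur not in prefixes:
            break                      # padding detected: not a valid prefix
    return out

# A table with an all-ones code: '11' can be forged by 1-bit padding.
bad = {"a": "0", "b": "10", "c": "11"}
# A table without one: padding 1-bits can never complete a symbol.
good = {"a": "0", "b": "10"}

payload = "0"                          # one symbol, 'a'
padded = payload + "1" * 7             # pad to a full byte with 1-bits

decode(padded, bad)    # ['a', 'c', 'c', 'c'] -- phantom symbols from padding
decode(padded, good)   # ['a'] -- the padding is recognized as invalid
```

With the all-ones code in the table, the padding decodes as extra symbols; without it, the decoder can tell padding from data, which is exactly what the rule in the standard guarantees.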
I have a device that sends serial data over a USB to COM port to my program at various speeds and lengths.
Within the data there is a chunk of several thousand bytes that starts and ends with distinct two-byte codes ('FDDD' for start, 'FEEE' for end).
Due to the stream's length, occasionally not all data is received in one piece.
What is the recommended way to combine all bytes into one message BEFORE parsing it?
(I took care of the buffer size, but I have no control over the serial line quality, and I cannot use hardware flow control with USB.)
Thanks
One possible way to accomplish this is to have something along these lines:
# variables
#   buffer: byte buffer
#   buffer_length: maximum number of bytes in the buffer
#   new_char: char last read from the UART
#   prev_char: second to last char read from the UART
#   n: index into the buffer

new_char := 0
loop forever:
    prev_char := new_char
    new_char := receive_from_uart()
    # start marker
    if prev_char = 0xfd and new_char = 0xdd
        # reset the index to the beginning of the buffer
        n := 0
    # end marker
    else if prev_char = 0xfe and new_char = 0xee
        # the frame is ready, do whatever you need to do with a complete message;
        # the stored 0xfe is dropped, so the length of the payload is n-1 bytes
        handle_complete_message(buffer, n - 1)
    # otherwise: payload byte
    else
        if n < buffer_length
            buffer[n] := new_char
            n := n + 1
A few tips/comments:
you do not necessarily need separate start and end markers (you can use the same code for both purposes)
if you want two-byte markers, it is easier if they share the same first byte
you need to make sure the marker combinations do not occur in your data stream
if you use escape codes to keep the markers out of your payload, it is convenient to handle them in the same code
see HDLC asynchronous framing (simple to encode, simple to decode, and it takes care of the escaping)
handle_complete_message usually either copies the contents of buffer elsewhere or, if in a hurry, swaps in another buffer in place of buffer
if your data frames do not have integrity checking, you should check whether the payload length equals buffer_length - 1, because that may indicate an overflow
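The pseudocode above can be turned into a small, testable state machine; here is a Python sketch under the same assumptions (two-byte markers 0xFD 0xDD and 0xFE 0xEE, store-then-advance indexing; class and method names are mine):

```python
START = b"\xfd\xdd"
END = b"\xfe\xee"

class Framer:
    """Collects bytes between two-byte start/end markers, as outlined above."""

    def __init__(self, max_len=4096, on_message=None):
        self.buf = bytearray()
        self.max_len = max_len
        self.on_message = on_message or (lambda payload: None)
        self.prev = None
        self.in_frame = False

    def feed(self, chunk):
        """Feed any number of bytes; frames may span multiple chunks."""
        for b in chunk:
            pair = bytes([self.prev, b]) if self.prev is not None else b""
            if pair == START:
                self.buf.clear()
                self.in_frame = True
            elif pair == END and self.in_frame:
                # drop the first END byte (0xFE) already stored in the buffer
                self.on_message(bytes(self.buf[:-1]))
                self.in_frame = False
            elif self.in_frame and len(self.buf) < self.max_len:
                self.buf.append(b)
            self.prev = b
```

Because the state (`prev`, `in_frame`, `buf`) survives between `feed` calls, a message split across several serial reads is reassembled before parsing, which is exactly what the question asks for.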
After several tests, I came up with the following simple solution to my own question (in C#).
Shown is a minimal, simplified solution; length checking etc. can be added.
'Start' and 'End' are string markers of any length.
public void comPort_DataReceived(object sender, SerialDataReceivedEventArgs e)
{
    SerialPort port = (SerialPort)sender;
    inData = port.ReadExisting();
    if (inData.Contains("start"))
    {
        // Loop to collect all message parts
        while (!inData.Contains("end"))
            inData += port.ReadExisting();
        // Complete by adding the last data chunk
        inData += port.ReadExisting();
    }
    // Use your collected message
    displaydata(inData);
}
I have a binary image which contains several separated regions. I want to put a threshold on the area (number of pixels) that these regions occupy, such that a region is omitted if it has fewer pixels than the threshold. I have already tried this code (using bwconncomp):
[...]
% let's assume threshold = 50
CC = bwconncomp(my_image);
L = labelmatrix(CC);
A = cell(size(CC.PixelIdxList, 1), size(CC.PixelIdxList, 2));
A = CC.PixelIdxList;
for column = 1 : size(CC.PixelIdxList, 2)
    if numel(CC.PixelIdxList{column}) < 50
        A{column} = 0;
    end
end
But at this point I don't know how to convert the cell A back to the shape of my image and then show it! Are there any tricks to do that?
Is there an easier, more direct way to gain information about objects in an image than the one I used here?
I also need to know length and width of these objects. These objects do not necessarily have any specific geometrical shape!
Thanks
Since no one took the trouble to answer my question here, I found the answer somewhere else. I am copying it here in case another novice like me needs it.
In order to know length and width of objects in an image:
labeledImage = bwlabel(my_image, 8);
regioninfo = regionprops(labeledImage , 'MajorAxisLength', 'MinorAxisLength');
lengths = [regioninfo.MajorAxisLength]; %array
widths = [regioninfo.MinorAxisLength]; %array
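For anyone without access to the Image Processing Toolbox, the same idea (label connected regions, drop those below an area threshold, measure per-region extents) can be sketched in plain Python. This is a toy stand-in for bwconncomp/regionprops, not MATLAB's implementation, and it reports bounding-box height/width rather than major/minor axis lengths:

```python
from collections import deque

def regions(grid, min_area=0):
    """8-connected components of a binary grid (list of lists of 0/1).

    Returns one dict per kept region with its pixel count and the
    height/width of its bounding box, skipping regions smaller than
    min_area -- a toy stand-in for bwconncomp + regionprops.
    """
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    out = []
    for y in range(h):
        for x in range(w):
            if not grid[y][x] or seen[y][x]:
                continue
            # breadth-first flood fill to collect one region
            q, pixels = deque([(y, x)]), []
            seen[y][x] = True
            while q:
                cy, cx = q.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (dy or dx) and 0 <= ny < h and 0 <= nx < w \
                                and grid[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
            if len(pixels) >= min_area:
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                out.append({"area": len(pixels),
                            "height": max(ys) - min(ys) + 1,
                            "width": max(xs) - min(xs) + 1})
    return out
```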
I am using Delphi 6 Pro with the DSPACK DirectShow component library to create a DirectShow filter that delivers data in Wav format from a custom audio source. Just to be very clear, I am delivering the raw PCM audio samples as Byte data. There are no Wave files involved, but other Filters downstream in my Filter Graph expect the output pin to deliver standard WAV format sample data in Byte form.
Note: When I get the data from the custom audio source, I format it to the desired number of channels, sample rate, and bits per sample and store it in a TWaveFile object I created. This object has a properly formatted TWaveFormatEx data member that is set correctly to reflect the underlying format of the data I stored.
I don't know how to properly set up the MediaType parameter during a GetMediaType() call:
function TBCPushPinPlayAudio.GetMediaType(MediaType: PAMMediaType): HResult;
.......
with FWaveFile.WaveFormatEx do
begin
MediaType.majortype := (1)
MediaType.subtype := (2)
MediaType.formattype := (3)
MediaType.bTemporalCompression := False;
MediaType.bFixedSizeSamples := True;
MediaType.pbFormat := (4)
// Number of bytes per sample is the number of channels in the
// Wave audio data times the number of bytes per sample
// (wBitsPerSample div 8);
MediaType.lSampleSize := nChannels * (wBitsPerSample div 8);
end;
What are the correct values for (1), (2), and (3)? I know about the MEDIATYPE_Audio, MEDIATYPE_Stream, and MEDIASUBTYPE_WAVE GUID constants, but I am not sure what goes where.
Also, I assume that I need to copy the WaveFormatEx stucture/record from the my FWaveFile object over to the pbFormat pointer (4). I have two questions about that:
1) I assume that I should use CoTaskMemAlloc() to create a new TWaveFormatEx record, copy my FWaveFile object's TWaveFormatEx record onto it, and then assign the pbFormat pointer to it, correct?
2) Is TWaveFormatEx the correct structure to pass along? Here is how TWaveFormatEx is defined:
tWAVEFORMATEX = packed record
wFormatTag: Word; { format type }
nChannels: Word; { number of channels (i.e. mono, stereo, etc.) }
nSamplesPerSec: DWORD; { sample rate }
nAvgBytesPerSec: DWORD; { for buffer estimation }
nBlockAlign: Word; { block size of data }
wBitsPerSample: Word; { number of bits per sample of mono data }
cbSize: Word; { the count in bytes of the size of extra information (after cbSize) }
end;
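The per-sample arithmetic from the question generalizes to the usual PCM consistency rules between these fields; a quick sanity check in Python (field names mirror the record above; this is my own helper, not part of any API):

```python
def wave_format_fields(n_channels, samples_per_sec, bits_per_sample):
    """Derive the dependent WAVEFORMATEX fields for plain PCM.

    For PCM, nBlockAlign must equal nChannels * wBitsPerSample / 8 and
    nAvgBytesPerSec must equal nSamplesPerSec * nBlockAlign; the media
    type's lSampleSize matches nBlockAlign.
    """
    block_align = n_channels * (bits_per_sample // 8)
    return {
        "nBlockAlign": block_align,
        "nAvgBytesPerSec": samples_per_sec * block_align,
        "lSampleSize": block_align,   # bytes per sample across all channels
        "cbSize": 0,                  # no extra format bytes for plain PCM
    }
```

If the fields you put into the structure disagree with these rules, downstream filters are likely to reject the connection even when the major type and sub-type are right.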
UPDATE: 11-12-2011
I want to highlight one of the comments by @Roman R attached to his accepted reply, where he tells me to use MEDIASUBTYPE_PCM for the sub-type, since it is so important. I lost a significant amount of time chasing down a DirectShow "no intermediate filter combination" error because I had forgotten to use that value for the sub-type and was (incorrectly) using MEDIASUBTYPE_WAVE instead. MEDIASUBTYPE_WAVE is incompatible with many other filters, such as the system capture filters, and that was the root cause of the failure. The bigger lesson here: if you are debugging an inter-filter media format negotiation error, make sure that the formats of the pins being connected are completely equal. During initial debugging I made the mistake of only comparing the WAV format parameters (format tag, number of channels, bits per sample, sample rate), which were identical between the pins. However, the difference in sub-type due to my improper use of MEDIASUBTYPE_WAVE caused the pin connection to fail. As soon as I changed the sub-type to MEDIASUBTYPE_PCM, as Roman suggested, the problem went away.
(1) is MEDIATYPE_Audio.
(2) is typically a mapping from FOURCC code into GUID, see Media Types, Audio Media Types section.
(3) is FORMAT_WaveFormatEx.
(4) is a pointer (typically allocated by COM task memory allocator API) to WAVEFORMATEX structure.
1) - yes you should allocate memory, put valid data there, by copying or initializing directly, and put this pointer to pbFormat and structure size into cbFormat.
2) - yes, it looks good; it is defined like this in the first place: WAVEFORMATEX structure.
I am trying to access files that are stored as JPEG files. Is there an easy way to display these image files without a performance loss?
You can load the JPeg file using an instance of TJPEGImage and then assign it to a TBitmap to display. You find TJPEGImage in unit jpeg.
jpeg := TJPEGImage.Create;
jpeg.LoadFromFile('filename.jpg');
bitm := TBitmap.Create;
bitm.Assign(jpeg);
Image1.Height := bitm.Height;
Image1.Width := bitm.Width;
Image1.Canvas.Draw(0, 0, bitm);
Alternatively, you can let TPicture pick the matching graphic class from the file extension (TBitmap cannot load a JPEG by file name directly, so assigning a string to it would not compile):
pict := TPicture.Create;
pict.LoadFromFile('filename.jpg');
bitm := TBitmap.Create;
bitm.Assign(pict.Graphic);
Image1.Height := bitm.Height;
Image1.Width := bitm.Width;
Image1.Canvas.Draw(0, 0, bitm);
I found this page!
http://cc.embarcadero.com/Item/19723
Enhanced jpeg implementation
Author: Gabriel Corneanu
This unit contains a new jpeg implementation (based on the Delphi original):
fixed a bug accessing pictures one pixel high
added lossless transformation support for jpeg images (based on Thomas G. Lane's C library; visit jpegclub.org/jpegtran)
added CMYK support (read only)
compiled for D5-2010 and BCB5-6
fast MMX CMYK to RGB conversion (not for Delphi 5, which lacks MMX ASM; falls back to a plain Pascal implementation if not available)
fixed a bug in the Delphi 5 ASM (CMYK to RGB function)
You only need the jpeg.dcu file; it can be copied to the program directory or to the LIB directory. I generated obj and hpp files for use with C++Builder 5 and 6 as well. This is what you need to use it:
This is just an enum
TJpegTransform = (
jt_FLIP_H, { horizontal flip }
jt_FLIP_V, { vertical flip }
jt_TRANSPOSE, { transpose across UL-to-LR axis }
jt_TRANSVERSE, { transpose across UR-to-LL axis }
jt_ROT_90, { 90-degree clockwise rotation }
jt_ROT_180, { 180-degree rotation }
jt_ROT_270 { 270-degree clockwise (or 90 ccw) }
);
procedure Crop(xoffs, yoffs, newwidth, newheight: integer); - this method crops the image
procedure Transform(Operation: TJpegTransform); - this method applies the specified transformation; read the transupp.h comments about the limitations (my code uses the crop option)
property IsCMYK: boolean read FIsCMYK; - indicates whether the last jpeg image loaded is CMYK encoded
property InverseCMYK: boolean read FInverseCMYK write SetInverseCMYK; - if set (the default, because I could only find this kind of image), the CMYK image is decoded with inverted CMYK values (I read that Photoshop does this)
jpegex is the same unit compiled under a different name. It can be used to avoid conflicts when you have other components without source code linking to the original jpeg unit. In this case you might need to use qualified class names to resolve name conflicts: jpegex.TJpegImage.xxx. Be careful when you use both versions in one program: even though the classes have the same name, they are not identical and you cannot cast or assign them directly. The only way to exchange data is saving to/loading from a stream.
Send comments to:
gabrielcorneanuATyahooDOTcom
I don't believe D7 can handle CMYK JPEGs.
If you can't open it using the JPEG unit as Ralph posted, you might consider using something like GDI+ to load the graphic file.
Actually, I once modified the Jpeg.pas unit to add partial CMYK support. Basically, after
jpeg_start_decompress(jc.d)
you should check
if jc.d.out_color_space = JCS_CMYK then
and if true, the following jpeg_read_scanlines calls will return 4 bytes of data per pixel instead of 3.
Also, cinfo.saw_Adobe_marker indicates inverted values (probably Adobe was the first to introduce the CMYK jpeg variation).
But the most difficult part is the CMYK to RGB conversion. Since there is no universal formula, the best systems always use a table-based approach. I tried to find a simple approximation, but there is always a picture that does not fit. Just as an example, don't use these formulas as a reference:
R_:=Max(254 - (111*C + 2*M + 7*Y + 36*K) div 128, 0);
G_:=Max(254 - (30*C + 87*M + 15*Y + 30*K) div 128, 0);
B_:=Max(254 - (15*C + 44*M + 80*Y + 24*K) div 128, 0);
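For comparison, the most common naive (profile-free) conversion just folds K into each channel; a Python sketch with 8-bit components (my own helper, illustrating why, as said above, no simple formula matches every real image):

```python
def cmyk_to_rgb(c, m, y, k):
    """Naive CMYK -> RGB for 8-bit components (0..255).

    R = 255 * (1 - C/255) * (1 - K/255), and likewise for G and B.
    This ignores ink profiles entirely, which is exactly why no simple
    formula reproduces every real-world (ICC-profiled) image.
    """
    r = (255 - c) * (255 - k) // 255
    g = (255 - m) * (255 - k) // 255
    b = (255 - y) * (255 - k) // 255
    return r, g, b
```

For Adobe-style inverted CMYK (saw_Adobe_marker), the component values would have to be flipped (255 - value) before applying the formula.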
Easy!
I implemented the CMYK conversion in JPEG.PAS.
Include it in your project to handle CMYK JPEGs.
Get it here:
http://delphi.andreotti.nl/