SHA-1 algorithm source code - sha1

The SHA-1 algorithm compresses input data of any size into a 160-bit digest. I need Java source code for SHA-1 that converts whatever input data is given into that 160-bit format. Please help me solve this.
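The question asks for Java, where the standard library exposes SHA-1 through `java.security.MessageDigest`. As a minimal sketch of the same idea, here it is in Python using the standard `hashlib` module. The helper name `sha1_digest` is made up for illustration.

```python
import hashlib


def sha1_digest(data: bytes) -> bytes:
    """Return the 20-byte (160-bit) SHA-1 digest of the input bytes."""
    return hashlib.sha1(data).digest()


digest = sha1_digest(b"abc")
print(len(digest) * 8)  # 160
print(digest.hex())     # a9993e364706816aba3e25717850c26c9cd0d89d
```

Whatever the input length, the digest is always 20 bytes, which is the "160 bit format" the question refers to.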

Related

Importing MNIST dataset with Fortran

A Linux/GFortran question.
I know exactly what my problem is but I can't figure out how to solve it...
I want to import the MNIST dataset images and labels into Fortran arrays to play around with Machine Learning algorithms using Fortran. I've done this with Python but I can't replicate reading the data files with Fortran.
The dataset files and file layout descriptions are at:
http://yann.lecun.com/exdb/mnist/
The 2 problems I'm struggling with are...
1) The data in the files is stored in unsigned bytes. I can't find a similar datatype in Fortran. I'm using integer(kind=1) to read the first 4 bytes successfully, which constitutes the file magic number, but I'm worried about incorrectly reading the value of one of these bytes into the signed integer(kind=1) datatype.
2) The data is stored in big-endian format. So when I read the number of images, rows and columns, which are stored as 4-byte integers, into my little-endian machine, I receive the obvious gobbledegook. Ideally, I would like to specify the endianness of a variable read from a file in an edit descriptor. Is this possible?
Any assistance would be much appreciated.
Kind regards
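The two conversions described above are plain integer arithmetic, so they carry over to any language. A minimal sketch in Python (not Fortran), using a synthetic header rather than a real MNIST file; the helper names are hypothetical, and the header values (magic number 2051, then image count, rows, columns) are assumed from the file layout on the MNIST page.

```python
import struct


def to_unsigned(signed_byte: int) -> int:
    """Map a signed 8-bit value (-128..127) to its unsigned value (0..255)."""
    return signed_byte & 0xFF


def be_uint32(four_bytes: bytes) -> int:
    """Assemble four bytes, most significant first, into an unsigned 32-bit integer."""
    b0, b1, b2, b3 = four_bytes
    return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3


# Synthetic image-file header: magic number, image count, rows, columns.
header = struct.pack(">IIII", 2051, 60000, 28, 28)
magic, count, rows, cols = (be_uint32(header[i:i + 4]) for i in range(0, 16, 4))
print(magic, count, rows, cols)
```

Reading each field byte by byte and combining the bytes manually avoids depending on the machine's native byte order.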

Deflate and fixed Huffman codes

I'm trying to implement a deflate compressor and I have to decide whether to
compress a block using the static huffman code or create a dynamic one.
What is the rationale behind the length associated with the static code?
(this is the table included in the rfc)
Lit Value    Bits
---------    ----
  0 - 143     8
144 - 255     9
256 - 279     7
280 - 287     8
I thought the static code was more biased towards plain ASCII text; instead it
looks like it slightly prefers compressing the RLE lengths.
What is a good heuristic to decide whether to use the static code?
I was thinking to build a distribution of probabilities from a sample of the
input data and calculate a distance (maybe EMD?) from the probabilities derived
from the static code.
I would guess that the creator of the code took a large sample of literals and lengths from compressed data, likely including executables along with text, and found typical code lengths over the large set. They were then approximated with the table shown. However the author passed away many years ago, so we'll never know for sure.
You don't need a heuristic. Once you have done the work to find matching strings, it is comparatively very fast to compute the number of bits in the block for both a dynamic and static representation. Then simply pick the smaller one. Or the static one if equal (decodes faster).
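As a rough sketch of that comparison, the static side of the calculation only needs the symbol frequencies and the fixed code lengths from the table in the question. The function names and the `match_count` parameter are hypothetical. Extra bits for lengths and distances are left out because they cost the same in both representations, and the dynamic cost (code lengths plus block header) is assumed to be computed elsewhere.

```python
from typing import Mapping


def fixed_code_length(symbol: int) -> int:
    """Bit length of a literal/length symbol (0..287) in the fixed Huffman code."""
    if symbol <= 143:
        return 8
    if symbol <= 255:
        return 9
    if symbol <= 279:
        return 7
    return 8


def static_block_bits(freq: Mapping[int, int], match_count: int = 0) -> int:
    """Bits for the literal/length codes, plus a 5-bit distance code per match."""
    litlen_bits = sum(n * fixed_code_length(sym) for sym, n in freq.items())
    return litlen_bits + 5 * match_count


def choose_block_type(static_bits: int, dynamic_bits: int) -> str:
    """Pick the smaller encoding, preferring static on a tie."""
    return "static" if static_bits <= dynamic_bits else "dynamic"
```

Here `freq` maps each literal/length symbol to how often it occurs in the block, including the single end-of-block symbol 256.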
I don't know about rationale, but there was a small amount of irrationale in choosing the static code lengths:
In the table in your question, the maximum static code number there is 287, but the DEFLATE specification only allows up to code 285, meaning code lengths have wastefully been assigned to two invalid codes. (And not even the longest ones either!) It's a similar story with the table for distance codes, with 32 codes having lengths assigned, but only 30 valid.
So there are some easy improvements that could have been made, but that said, without some prior knowledge of the data, it's not really possible to produce anything that's massively more efficient in general. The "flatness" of the table (no code longer than 9 bits) limits the worst-case overhead to 1 extra bit per byte of incompressible data.
I think the main rationale behind the groupings is that by keeping group sizes to a multiple of 8, it's possible to tell which group a code belongs to by looking at its 5 most significant bits. That also tells you its length, along with the value to add to get the symbol value itself:
00000 00   .. 00101 11     7 bits  + 256  ->  (256..279)
00110 000  .. 10111 111    8 bits  -  48  ->  (  0..143)
11000 000  .. 11000 111    8 bits  +  88  ->  (280..287)
11001 0000 .. 11111 1111   9 bits  - 256  ->  (144..255)
So in theory you could set up a lookup table with 32 entries to quickly read in the codes, but it's an uncommon case and probably not worth optimising for.
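A minimal sketch of such a 32-entry table, with the lengths and offsets derived from the code ranges in RFC 1951. It assumes the caller has already arranged the next 9 bits of the stream so that the first Huffman bit is the most significant one (DEFLATE stores Huffman codes in that order inside an otherwise LSB-first bit stream). The names `PREFIX_TABLE` and `decode_fixed` are hypothetical.

```python
def build_prefix_table():
    """Map each 5-bit prefix to (code length, offset from code to symbol)."""
    table = []
    for prefix in range(32):
        if prefix <= 0b00101:
            table.append((7, 256))    # symbols 256..279
        elif prefix <= 0b10111:
            table.append((8, -48))    # symbols 0..143
        elif prefix == 0b11000:
            table.append((8, 88))     # symbols 280..287
        else:
            table.append((9, -256))   # symbols 144..255
    return table


PREFIX_TABLE = build_prefix_table()


def decode_fixed(peek9: int):
    """Decode one symbol from the next 9 bits; return (symbol, bits consumed)."""
    length, offset = PREFIX_TABLE[peek9 >> 4]
    code = peek9 >> (9 - length)      # drop the bits that belong to the next code
    return code + offset, length
```

The caller consumes only `length` bits and leaves the rest of the peeked bits for the next symbol.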
There are only really two cases (with some overlap) where Fixed Huffman blocks are likely to be the most efficient:
where the input size in bytes is very small, Static Huffman can be more efficient than Uncompressed, because Uncompressed uses a 32-bit header, while Fixed Huffman needs only a 7-bit footer, plus 1 bit potential overhead per byte.
where the output size is very small (ie. small-ish, highly compressible data), Static Huffman can be more efficient than Dynamic Huffman - again because Dynamic Huffman uses a certain amount of space for an additional header. (A practical minimum header size is difficult to calculate, but I'd say at least 64 bits, probably more.)
That said, I've found they are actually helpful from a developer's perspective, because it's very easy to implement a Deflate-compatible function using Static Huffman blocks, and to iteratively improve from there to get more efficient algorithms working.

Validating an HDF5 superblock checksum

I am having a problem writing a program which verifies the checksum in the superblock of an HDF5, Version 2 file. I am not using the HDF5 software, but I have a copy of H5_checksum_fletcher32 (from the HDF5 H5checksum.c) in my code.
I can assume that the file signature block is at position 0.
My logic is:
Let offset = the value of byte 9 of the file.
The superblock spans bytes 0 to (15+4*offset).
The last 4 bytes are the checksum as an unsigned int.
The checksum should equal H5_checksum_fletcher32 applied to bytes 0 to (11+4*offset).
I have applied this logic to several test files from NOAA that I believe to be reliable, but the checksum never matches the result of H5_checksum_fletcher32. The other values in the superblock appear to be correct. Can anyone see the flaw in my logic?
From the HDF5 file format specification:
All checksums used in the format are computed with the Jenkins' lookup3 algorithm.
This is provided in H5checksum.c as H5_checksum_lookup3().
Actually, it seems the correct routine to call is H5_checksum_metadata(), but this just calls H5_checksum_lookup3() using a macro.
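The byte ranges from the question can be sketched as below. This only covers splitting the superblock into the checksummed bytes and the stored checksum. The checksum routine itself is passed in as a parameter, standing in for a port of H5_checksum_lookup3(). The function name is hypothetical, and reading the stored checksum as little-endian is an assumption based on the HDF5 format storing multi-byte integers that way.

```python
import struct
from typing import Callable


def superblock_checksum_ok(data: bytes, checksum: Callable[[bytes], int]) -> bool:
    """Compare the stored version 2 superblock checksum with a computed one."""
    size_of_offsets = data[9]
    payload_end = 12 + 4 * size_of_offsets       # bytes 0 .. 11 + 4*offset
    payload = data[:payload_end]
    (stored,) = struct.unpack_from("<I", data, payload_end)  # assumed little-endian
    return checksum(payload) == stored
```

With this split, swapping H5_checksum_fletcher32 for the lookup3 routine changes only the `checksum` argument.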

Find out if PNG is 8 or 24 bits and alpha

Given a UIImage/NSImage or NSData instance, how do you find out programmatically whether a PNG is 8 bits or 24 bits, and whether it has an alpha channel?
Is there any Cocoa/Cocoa Touch API that helps with this?
To avoid duplication, here is the non-programmatic answer to the question, and here's a way to find out if an image is a PNG.
As someone who programmed on the J2ME platform for a long time, I know the PNG format very well. If you have the raw data as an NSData instance, looking up the information is easy because the PNG format is very straightforward.
See PNG Specification
The PNG file starts with a signature and then it contains a sequence of chunks. You are interested only in the first chunk, IHDR.
So, the code should be along the lines of:
Skip the first 8 bytes (signature)
Skip 4 bytes (IHDR chunk length)
Skip 4 bytes (IHDR chunk type = "IHDR")
Read IHDR
Width: 4 bytes
Height: 4 bytes
Bit depth: 1 byte
Color type: 1 byte
Compression method: 1 byte
Filter method: 1 byte
Interlace method: 1 byte
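The steps above can be sketched as follows, in Python rather than Cocoa, working on the raw file bytes. The function name and the returned dictionary keys are made up for illustration; the channel counts per colour type come from the PNG specification.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Samples per pixel for each PNG colour type.
CHANNELS = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}


def read_ihdr(data: bytes) -> dict:
    """Parse the IHDR chunk that directly follows the 8-byte signature."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    _length, chunk_type = struct.unpack_from(">I4s", data, 8)
    if chunk_type != b"IHDR":
        raise ValueError("first chunk is not IHDR")
    width, height, bit_depth, color_type = struct.unpack_from(">IIBB", data, 16)
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,
        "bits_per_pixel": bit_depth * CHANNELS[color_type],
        "has_alpha_channel": color_type in (4, 6),
    }
```

A palette image (colour type 3) is what is usually called an 8-bit PNG, while colour type 2 with bit depth 8 gives 24 bits per pixel and colour type 6 gives 32.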
For alpha, you should also check if there is a tRNS (transparency) chunk in the file. To find a chunk, you can go with the following algorithm:
Read chunk length (4 bytes)
Read chunk type (4 bytes)
Check chunk type whether it is the type we are looking for
Skip chunk length bytes
Skip 4 bytes of CRC
Repeat
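A minimal sketch of that loop, again in Python on the raw bytes. The function names are hypothetical, the CRC is skipped rather than verified, and the walk simply runs to the end of the data.

```python
import struct


def iter_chunk_types(data: bytes):
    """Yield the 4-byte type of every chunk after the 8-byte PNG signature."""
    pos = 8
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack_from(">I4s", data, pos)
        yield chunk_type
        pos += 8 + length + 4   # length and type fields, chunk data, CRC


def has_trns(data: bytes) -> bool:
    """True if the file contains a tRNS (transparency) chunk."""
    return any(t == b"tRNS" for t in iter_chunk_types(data))
```

Because each chunk stores its own length, unknown chunk types can be skipped without understanding their contents.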
EDIT:
To find info about a UIImage instance, get its CGImage and use one of the CGImageGet... functions.
Note that all integer values in a PNG file are stored in big-endian format.

How could I guess a checksum algorithm?

Let's assume that I have some packets with a 16-bit checksum at the end. I would like to guess which checksum algorithm is used.
For a start, from dump data I can see that one byte change in the packet's payload totally changes the checksum, so I can assume that it isn't some kind of simple XOR or sum.
Then I tried several variations of CRC16, but without much luck.
This question might be more biased towards cryptography, but I'm really interested in any easy to understand statistical tools to find out which CRC this might be. I might even turn to drawing different CRC algorithms if everything else fails.
Background story: I have a serial RFID protocol with some kind of checksum. I can replay messages without a problem and interpret the results (without checking the checksum), but I can't send modified packets because the device drops them on the floor.
Using existing software, I can change the payload of the RFID chip. However, the unique serial number is immutable, so I can't check every possible combination. Although I could generate dumps of values incrementing by one, there would not be enough of them to make an exhaustive search applicable to this problem.
Dump files with data are available if the question itself isn't enough :-)
Need reference documentation? A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS is a great reference, which I found after asking the question here.
In the end, after a very helpful hint in the accepted answer that it's CCITT, I used this CRC calculator and XORed the generated checksum with the known checksum to get 0xffff, which led me to the conclusion that the final XOR is 0xffff instead of CCITT's 0x0000.
There are a number of variables to consider for a CRC:
Polynomial
No of bits (16 or 32)
Normal (LSB first) or Reverse (MSB first)
Initial value
How the final value is manipulated (e.g. subtracted from 0xffff), or is a constant value
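To make those variables concrete, here is a minimal bit-by-bit sketch of a 16-bit CRC with each of them as a parameter. It is written for clarity, not speed, and the parameter names are chosen for illustration. It uses one flag for both input and output reflection, which is a simplification.

```python
def reflect(value: int, width: int) -> int:
    """Reverse the order of the lowest `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def crc16(data: bytes, poly: int, init: int, reflected: bool, xor_out: int) -> int:
    """Bitwise 16-bit CRC; `reflected` selects LSB-first processing."""
    crc = init
    for byte in data:
        if reflected:
            byte = reflect(byte, 8)
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    if reflected:
        crc = reflect(crc, 16)
    return crc ^ xor_out


plain = crc16(b"123456789", 0x1021, 0xFFFF, False, 0x0000)
inverted = crc16(b"123456789", 0x1021, 0xFFFF, False, 0xFFFF)
print(hex(plain ^ inverted))  # 0xffff
```

The last lines reproduce the observation from the question: if a computed CRC and the known checksum differ by a constant XOR across packets, only the final XOR value is wrong.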
Typical CRCs:
LRC: Polynomial=0x81; 8 bits; Normal; Initial=0; Final=as calculated
CRC16: Polynomial=0xa001; 16 bits; Normal; Initial=0; Final=as calculated
CCITT: Polynomial=0x1021; 16 bits; reverse; Initial=0xffff; Final=0x1d0f
Xmodem: Polynomial=0x1021; 16 bits; reverse; Initial=0; Final=0x1d0f
CRC32: Polynomial=0xedb88320; 32 bits; Normal; Initial=0xffffffff; Final=inverted value
ZIP32: Polynomial=0x04c11db7; 32 bits; Normal; Initial=0xffffffff; Final=as calculated
The first thing to do is to get some samples by changing say the last byte. This will assist you to figure out the number of bytes in the CRC.
Is this a "homemade" algorithm? In that case it may take some time. Otherwise, try the standard algorithms.
Try changing either the msb or the lsb of the last byte, and see how this changes the CRC. This will give an indication of the direction.
To make it more difficult, there are implementations that manipulate the CRC so that it will not affect the communications medium (protocol).
From your comment about RFID, it implies that the CRC is communications related. Usually CRC16 is used for communications, though CCITT is also used on some systems.
On the other hand, if this is UHF RFID tagging, then there are a few CRC schemes - a 5 bit one and some 16 bit ones. These are documented in the ISO standards and the IPX data sheets.
IPX: Polynomial=0x8005; 16 bits; Reverse; Initial=0xffff; Final=as calculated
ISO 18000-6B: Polynomial=0x1021; 16 bits; Reverse; Initial=0xffff; Final=as calculated
ISO 18000-6C: Polynomial=0x1021; 16 bits; Reverse; Initial=0xffff; Final=as calculated
Data must be padded with zeroes to make a multiple of 8 bits
ISO CRC5: Polynomial=custom; 5 bits; Reverse; Initial=0x9; Final=shifted left by 3 bits
Data must be padded with zeroes to make a multiple of 8 bits
EPC class 1: Polynomial=custom 0x1021; 16 bits; Reverse; Initial=0xffff; Final=post processing of 16 zero bits
Here is your answer!!!!
Having worked through your logs, the CRC is the CCITT one. The first byte 0xd6 is excluded from the CRC.
It might not be a CRC, it might be an error correcting code like Reed-Solomon.
ECC codes are often a substantial fraction of the size of the original data they protect, depending on the error rate they are meant to handle. If the messages are larger than about 16 bytes, 2 bytes of ECC wouldn't be enough to be useful. So if the message is large, you're most likely correct that it's some sort of CRC.
I'm trying to crack a similar problem here and I found a pretty neat website that will take your file and run checksums on it with 47 different algorithms and show the results. If the algorithm used to calculate your checksum is any of these algorithms, you would simply find it among the list of checksums produced with a simple text search.
The website is https://defuse.ca/checksums.htm
You would have to try every possible checksum algorithm and see which one generates the same result. However, there is no guarantee to what content was included in the checksum. For example, some algorithms skip white spaces, which lead to different results.
I really don't see why would somebody want to know that though.
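A small sketch of that trial-and-error approach, limited to the 0x1021 (CCITT) polynomial via Python's standard `binascii.crc_hqx`. It also tries skipping leading bytes, since it is not known which part of the packet the checksum covers. The function name, the candidate initial values and the `max_skip` limit are assumptions for illustration; other polynomials and bit orders would need their own implementation.

```python
import binascii
from itertools import product


def find_ccitt_variants(packet: bytes, expected: int, max_skip: int = 4):
    """Return (skip, init, xor_out) combinations that reproduce `expected`."""
    matches = []
    candidates = product(range(max_skip + 1), (0x0000, 0xFFFF, 0x1D0F), (0x0000, 0xFFFF))
    for skip, init, xor_out in candidates:
        if skip >= len(packet):
            continue
        # crc_hqx computes an MSB-first CRC with polynomial 0x1021.
        crc = binascii.crc_hqx(packet[skip:], init) ^ xor_out
        if crc == expected:
            matches.append((skip, init, xor_out))
    return matches
```

A combination is only convincing if it matches several captured packets, since a single 16-bit match can happen by chance.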

Resources