How is a cryptographic checksum of an empty file computed? - sha1

I've just ran
$ sha1sum myfile
out of boredom.
myfile is an empty file which I created with
$ touch myfile
I was surprised that sha1sum actually returned a checksum. Aren't these checksums supposed to be computed from some non-empty content? Is the checksum for an empty file just a hardcoded "magic" constant?

There's nothing fundamentally different with an empty message from a message with say a byte of data. The algorithm is described here http://en.wikipedia.org/wiki/SHA-1#Examples_and_pseudocode and it's fine with zero data.
Eg.
Pre-processing:
append the bit '1' to the message append 0 ≤ k < 512 bits '0', so that the resulting message length (in bits) is
congruent to 448 (mod 512)

Related

Checksum identification

I'm trying to understand some data traffic.
Here are some short packages I captured (Hex string):
Data Checksum
------ ---------
87 0087
7639 7639
7739 7739
DA423030 A25A
DA423031 A25B
DA423130 A35A
DA424030 D25A
DA423040 A22A
DA423032 A258
Can anyone identify how the checksum is made up?
(Note: Adding zero-bytes at the beginning of the data does not change the checksum, but adding them at the end does change it.)
The solution is:
Take the whole string up to the last 2 bytes, perform on it CRC-16/XMODEM.
Perform on the result XOR with the last 2 bytes left.
For example:
Data Checksum
------ ---------
DA423030 A25A
DA42 >> CRC-16/XMODEM >> 926A
926A >> XOR With 3030 >> A25A

Parse array of unsigned integers in Julia 1.x.x

I am trying to open a binary file that I have some knowledge of its internal structure, and reinterpret it correctly in Julia. Let us say that I can load it already via:
arx=open("../axonbinaryfile.abf", "r")
databin=read(arx)
close(arx)
The data is loaded as an Array of UInt8, which I guess are bytes.
In the first 4 I can perform a simple Char conversion and it works:
head=databin[1:4]
map(Char, head)
4-element Array{Char,1}:
'A'
'B'
'F'
' '
Then it happens to be that in the positions 13-16 is an integer of 32 bytes waiting to be interpreted. How should I do that?
I have tried reinterpret() and Int32 as function, but to no avail.
You can use reinterpret(Int32, databin[13:16])[1]. The last [1] is needed, because reinterpret returns you a view.
Now note that read supports type passing. So if you first read 12 bytes of data from your file e.g. like this read(arx, 12) and then run read(arx, Int32) you will get the desired number without having to do any conversions or vector allocation.
Finally observe that what conversion to Char does in your code is converting a Unicode number to a character. I am not sure if this is exactly what you want (maybe it is). For example if the first byte read in has value 200 you will get:
julia> Char(200)
'È': Unicode U+00c8 (category Lu: Letter, uppercase)
EDIT one more comment is that when you do a conversion to Int32 of 4 bytes you should be sure to check if it should be encoded as big-endian or little-endian (see ENDIAN_BOM constant and ntoh, hton, ltoh, htol functions)
Here it is. Use view to avoid copying the data.
julia> dat = UInt8[65,66,67,68,0,0,2,40];
julia> Char.(view(dat,1:4))
4-element Array{Char,1}:
'A'
'B'
'C'
'D'
julia> reinterpret(Int32, view(dat,5:8))
1-element reinterpret(Int32, view(::Array{UInt8,1}, 5:8)):
671219712

how to form fields section in xfd file

I'm having problem to form the field section's structure into xfd files after analyse by issuing commnad "vutil32.exe -i -kx pogl.dad". I hope somebody could help me out how to form out field structure as highlighted in below. I've uploaded sample of my file known as "pglc.dad" i hope soneone could guide me how to form .xfd file from his expert skills and guide me.Thanks
Result from vutil32.exe
file size: 250880
record size (min/max): 121/1024 compressed(80%)
# of keys: 4
key size: 16:02 31:03 56:03 15
key offset: 0 0 0 1
duplicates okay: N N N N
block size: 512
blocks per granule: 1
tree height: 4/2/2.7
# of nodes: 200
# of deleted nodes: 1
total node space: 101800
node space used: 67463 (66%)
user count: 0
Key Dups Seg-1 Seg-2 Seg-3 Seg-4 Seg-5 Seg-6
(sz/of) (sz/of) (sz/of) (sz/of) (sz/of) (sz/of)
0 N 1/0 15/1
1 N 1/0 15/66 15/1
2 N 1/0 40/81 15/1
3 N 15/1
Here is my further construction of .xfd file.
XFD,02,PGLC,PGLC
00300,00041,004
1,0,013,00000
01
PGSTAT
3,0,004,00004,020,00021,004,00000
3
PGSTAT
PGDESC
PGLINE
3,0,004,00004,008,00013,004,00000
03
PGSTAT
PGDESC
PGLINE
1,0,012,00021
01
PGSTAT
000
0150,00150,00003 =================>> How can i form this field section.
00000,00013,16,00016,+00,000,000,PGSTAT
00000,00001,16,00001,+00,000,000,PGDESC
00001,00015,16,00015,+00,000,000,PGLINE
here is the link for my pglc.dad : http://files.engineering.com/getfile.aspx?folder=080fdad6-b1d5-4a37-8dd0-b89f9a985c69&file=PGLC.DAD
Thanks appopriate to someone could helps.
I know the XFD format intimately as I have written a couple of parsers of this file format in both Perl and Cobol.
Having said that, I would strongly recommend that you do not try to hand craft an XFD file from scratch.
If you have an AcuCobol (MicroFocus) compiler, and the source of the file's SELECT and FD definitions, then you can create a very small Cobol program that has just the SELECT and FD definitions and then compile the program using:
ccbl32.exe -Fx <program>
That will create an XFD file for the indexed file definition. Note, you can specify a directory for the created XFD file using the -Fo <directory> option.
If you don't have the source of the file definitions, then you are just going to be guessing what and where the fields are. The indexed file by itself will not tell you that information. I can see from extracting the data in your file (using the vutil -e option) that the file contains binary data as well as text, so without knowing exactly what PICture those fields are (COMP-?) you will be struggling to figure out the structure of those fields.

TCP/IP Client / Server commands data

I have a Client/Server architecture (C# .Net 4.0) that send's command packets of data as byte arrays. There is a variable number of parameters in any command, and each paramater is of variable length. Because of this I use delimiters for the end of a parameter and the command as a whole. The operand is always 2 bytes and both types of delimiter are 1 byte. The last parameter_delmiter is redundant as command_delmiter provides the same functionality.
The command structure is as follow:
FIELD SIZE(BYTES)
operand 2
parameter1 x
parameter_delmiter 1
parameter2 x
parameter_delmiter 1
parameterN x
.............
.............
command_delmiter 1
Parameters are sourced from many different types, ie, ints, strings etc all encoded into byte arrays.
The problem I have is that sometimes parameters when encoded into byte arrays contain bytes that are the same value as a delimiter. For example command_delmiter=255.. and a paramater may have that byte inside of it.
There is 3 ways I can think of fixing this:
1) Encode the parameters differently so that they can never be the same value as a delimiter (255 and 254) Modulus?. This will mean that paramaters will become larger, ie Int16 will be more than 2 bytes etc.
2) Do not use delimiters at all, use count and length values at the start of the command structure.
3) Use something else.
To my knowledge, the way TCP/IP buffers work is that SOME SORT of delimiter has to be used to seperate 'commands' or 'bundles of data' as a buffer may contain multiple commands, or a command may span multiple buffers.. So this
BinaryReader / Writer seems like an obvious candidate, the only issue is that the byte array may contain multiple commands ( with parameters inside). So the byte array would still have to be chopped up in order to feel into the BinaryReader.
Suggestions?
Thanks.
The standard way to do this is to have the length of the message in the (fixed) first few bytes of a message. So you could have the first 4 bytes to denote the length of a message, read those many bytes for the content of the message. The next 4 bytes would be the length of the next message. A length of 0 could indicate end of messages. Or you could use a header with a message count.
Also, remember TCP is a byte stream, so don't expect a complete message to be available every time you read data from a socket. You could receive an arbitrary number of bytes at ever read.

Read numbers following a keyword into an array in Fortran 90 from a text file

I have many text files of this format
....
<snip>
'FOP' 0.19 1 24 1 25 7 8 /
'FOP' 0.18 1 24 1 25 9 11 /
/
TURX
560231
300244
70029
200250
645257
800191
900333
600334
770291
300335
220287
110262 /
SUBTRACT
'TURX' 'TURY'/
</snip>
......
where the portions I snipped off contain other various data in various formats. The file format is inconsistent (machine generated), the only thing one is assured of is the keyword TURX which may appear more than once. If it appears alone on one line, then the next few lines will contain numbers that I need to fetch into an array. The last number will have a space then a forward slash (/). I can then use this array in other operations afterwards.
How do I "search" or parse a file of unknown format in fortran, and how do I get a loop to fetch the rest of the data, please? I am really new to this and I HAVE to use fortran. Thanks.
Fortran 95 / 2003 have a lot of string and file handling features that make this easier.
For example, this code fragment to process a file of unknown length:
use iso_fortran_env
character (len=100) :: line
integer :: ReadCode
ReadLoop: do
read (75, '(A)', iostat=ReadCode ) line
if ( ReadCode /= 0 ) then
if ( ReadCode == iostat_end ) then
exit ReadLoop
else
write ( *, '( / "Error reading file: ", I0 )' ) ReadCode
stop
end if
end if
! code to process the line ....
end do ReadLoop
Then the "process the line" code can contain several sections depending on a logical variable "Have_TURX". If Have_TRUX is false you are "seeking" ... test whether the line contains "TURX". You could use a plain "==" if TURX is always at the start of the string, or for more generality you could use the intrinsic function "index" to test whether the string "line" contains TURX.
Once the program is in the mode Have_TRUX is true, then you use "internal I/O" to read the numeric value from the string. Since the integers have varying lengths and are left-justified, the easiest way is to use "list-directed I/O": combining these:
read (line, *) integer_variable
Then you could use the intrinsic function "index" again to test whether the string also contains a slash, in which case you change Have_TRUX to false and end reading mode.
If you need to put the numbers into an array, it might be necessary to read the file twice, or to backspace the file, because you will have to allocate the array, and you can't do that until you know the size of the array. Or you could pop the numbers into a linked list, then when you hit the slash allocate the array and fill it from the linked list. Or if there is a known maximum number of values you could use a temporary array, then transfer the numbers to an allocatable output array. This is assuming that you want the output argument of the subroutine be an allocatable array of the correct length, and the it returns one group of numbers per call:
integer, dimension (:), allocatable, intent (out) :: numbers
allocate (numbers (1: HowMany) )
P.S. There is a brief summary of the language features at http://en.wikipedia.org/wiki/Fortran_95_language_features and the gfortran manual has a summary of the intrinsic procedures, from which you can see what built in functions are available for string handling.
I'll give you a nudge in the right direction so that you can finish your project.
Some basics:
Do/While as you'll need some sort of loop
structure to loop through the file
and then over the numbers. There's
no for loop in Fortran, so use this
type.
Read
to read the strings.
To start you need something like this:
program readlines
implicit none
character (len=30) :: rdline
integer,dimension(1000) :: array
! This sets up a character array with 30 positions and an integer array with 1000
!
open(18,file='fileread.txt')
do
read(18,*) rdline
if (trim(rdline).eq.'TURX') exit !loop until the trimmed off portion matches TURX
end do
See this thread for way to turn your strings into integers.
Final edit: Looks like MSB has got most of what I just found out. The iostat argument of the read is the key to it. See this site for a sample program.
Here was my final way around it.
PROGRAM fetchnumbers
implicit none
character (len=50) ::line, numdata
logical ::is_numeric
integer ::I,iost,iost2,counter=0,number
integer, parameter :: long = selected_int_kind(10)
integer, dimension(1000)::numbers !Can the number of numbers be up to 1000?
open(20,file='inputfile.txt') !assuming file is in the same location as program
ReadLoop: do
read(20,*,iostat=iost) line !read data line by line
if (iost .LT. 0) exit !end of file reached before TURX was found
if (len_trim(line)==0) cycle ReadLoop !ignore empty lines
if (index(line, 'TURX').EQ.1) then !prepare to begin capturing
GetNumbers: do
read(20, *,iostat=iost2)numdata !read in the numbers one by one
if (.NOT.is_numeric(numdata)) exit !no more numbers to read
if (iost2 .LT. 0) exit !end of file reached while fetching numbers
read (numdata,*) number !read string value into a number
counter = counter + 1
Storeloop: do I =1,counter
if (I<counter) cycle StoreLoop
numbers(counter)=number !storing data into array
end do StoreLoop
end do GetNumbers
end if
end do ReadLoop
write(*,*) "Numbers are:"
do I=1,counter
write(*,'(I14)') numbers(I)
end do
END PROGRAM fetchnumbers
FUNCTION is_numeric(string)
IMPLICIT NONE
CHARACTER(len=*), INTENT(IN) :: string
LOGICAL :: is_numeric
REAL :: x
INTEGER :: e
is_numeric = .FALSE.
READ(string,*,IOSTAT=e) x
IF (e == 0) is_numeric = .TRUE.
END FUNCTION is_numeric

Resources