I am working with a compound file that contains several streams, and I'm struggling to figure out the content of each stream. I can't tell whether the bytes are text, MP3, or video.
For example, is there a way to tell what type of data these bytes are?
b'\x00\x00\x00\x00\x00\x00\x00\x00\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x0bz\xcc\xc9\xc8\xc0\xc0\x00\xc2?\x82\x1e<\x0ec\xbc*8\x19\xc8i\xb3W_\x0b\x14bH\x00\xb2-\x99\x18\x18\xfe\x03\x01\x88\xcf\xc0\x01\xc4\xe1\x0c\xf9\x0cE\x0c\xd9\x0c\xc5\x0c\xa9\x0c%\x0c\x86`\xcd \x0c\x020\x1a\x00\x00\x00\xff\xff\x02\x080\x00\x96L~\x89W\x00\x00\x00\x00\x80(\\B\xefI;\x9e}p\xfe\x1a\xb2\x9b>(\x81\x86/=\xc9xH0:Pwb\xb7\xdck-\xd2F\x04\xd7co'
Yes, there is a way to figure out the content of each stream. Most file formats carry a signature in addition to the file name extension, and the extension is not reliable: it might be removed or falsely added.
So what is the signature?
In computing, a file signature is data used to identify or verify the
contents of a file. In particular, it may refer to:
File magic number: bytes within a file used to identify the
format of the file; generally a short sequence of bytes (most are
2-4 bytes long) placed at the beginning of the file; see list of file
signatures
File checksum or more generally the result of a hash function over the file contents: data used to verify the integrity of the file
contents, generally against transmission errors or malicious attacks.
The signature can be included at the end of the file or in a separate
file.
I used the term magic number above. Here is its definition, copied from Wikipedia:
In computer programming, the term magic number has multiple
meanings. It could refer to one or more of the following:
Unique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants
A constant numerical or text value used to identify a file format or protocol; for files, see List of file
signatures
Distinctive unique values that are unlikely to be mistaken for other meanings(e.g., Globally Unique Identifiers)
The second meaning is the relevant one here: a magic number is a specific sequence of bytes, such as
PNG (89 50 4E 47 0D 0A 1A 0A)
or
BMP (42 4D)
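As a minimal sketch of how such signatures can be checked in practice, the snippet below scans a byte string for a few well-known magic numbers. The `SIGNATURES` table and the `find_signatures` helper are hypothetical names used for illustration, and the table is far from complete. The sample reproduces the first bytes shown in the question, which contain the gzip signature (1F 8B 08) at offset 8.

```python
from typing import List, Tuple

# Hypothetical lookup table holding only a handful of well-known signatures.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"BM": "BMP image",
    b"\x1f\x8b\x08": "gzip (deflate) data",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP archive",
}


def find_signatures(data: bytes) -> List[Tuple[int, str]]:
    """Return (offset, description) for every signature occurrence, sorted by offset."""
    hits = []
    for magic, name in SIGNATURES.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, name))
            start = data.find(magic, start + 1)
    return sorted(hits)


# First bytes of the stream from the question: eight zero bytes, then a gzip header.
sample = b"\x00" * 8 + b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x0b"
print(find_signatures(sample))  # -> [(8, 'gzip (deflate) data')]
```

Short signatures such as BMP's two bytes will match by accident in random data, so a hit is only a hint until the rest of the header has been checked.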
So how do you find the magic number of a file?
In the article "Investigating File Signatures Using PowerShell", the author wrote a handy PowerShell function for reading the magic number. He also mentions another tool; quoting from his article:
PowerShell V5 brings in Format-Hex, which can provide an alternative
approach to reading the file and displaying the hex and ASCII value to
determine the magic number.
From the Format-Hex help, here is the description:
The Format-Hex cmdlet displays a file or other input as hexadecimal
values. To determine the offset of a character from the output, add
the number at the leftmost of the row to the number at the top of the
column for that character.
This cmdlet can help you determine the file type of a corrupted file
or a file which may not have a file name extension. Run this cmdlet,
and then inspect the results for file information.
This tool is also a good way to get the magic number of a file.
Another option is an online hex editor, but to be honest I didn't understand how to use it.
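If PowerShell is not available, a similar view can be produced with a few lines of Python. This is only a rough sketch of a hex dump (offset, hex bytes, printable ASCII), not a reimplementation of Format-Hex; `hex_dump` is a hypothetical helper name.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as rows of: offset, hex values, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes are shown as '.' in the text column.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


# The first 16 bytes of a PNG file: the 8-byte signature, then the IHDR chunk header.
print(hex_dump(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))
```

The magic number is then read off the first row, the same way as in the Format-Hex output.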
Now we have the magic number, but how do we know what type of data the file or stream holds?
That is the key question.
Luckily, there are several databases of magic numbers. Here are some:
File Signatures
FILE SIGNATURES TABLE
List of file signatures
For example, the first database has a search feature: enter the magic number without spaces and search.
You may then find the type. Yes, may: there is a good chance that you won't directly find the file type in question.
I ran into this and worked around it by testing the streams against specific signatures. For example, this is how I searched a stream for a PNG:
def GetPngStartingOffset(arr):
    # Targeted magic number for PNG (89 50 4E 47 0D 0A 1A 0A)
    markerFound = False
    startingOffset = 0
    previousValue = 0
    arraylength = range(len(arr))  # scan every byte, including the last one
    for i in arraylength:
        currentValue = arr[i]
        if (currentValue == 137): # 0x89
            markerFound = True
            startingOffset = i
            previousValue = currentValue
            continue
        if currentValue == 80: # 0x50
            if (markerFound and (previousValue == 137)):
                previousValue = currentValue
                continue
            markerFound = False
        elif currentValue == 78: # 0x4E
            if (markerFound and (previousValue == 80)):
                previousValue = currentValue
                continue
            markerFound = False
        elif currentValue == 71: # 0x47
            if (markerFound and (previousValue == 78)):
                previousValue = currentValue
                continue
            markerFound = False
        elif currentValue == 13: # 0x0D
            if (markerFound and (previousValue == 71)):
                previousValue = currentValue
                continue
            markerFound = False
        elif currentValue == 10: # 0x0A
            if (markerFound and (previousValue == 26)):
                return startingOffset
            if (markerFound and (previousValue == 13)):
                previousValue = currentValue
                continue
            markerFound = False
        elif currentValue == 26: # 0x1A
            if (markerFound and (previousValue == 10)):
                previousValue = currentValue
                continue
            markerFound = False
        else:
            # Any other byte breaks a partially matched signature
            markerFound = False
    # Note: 0 is also returned when no signature is found
    return 0
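The same search can be written more compactly with the built-in `bytes.find`, which looks for the whole 8-byte signature at once. This is an alternative sketch, not the function used above; `find_png_offset` is a hypothetical name. It returns -1 when nothing is found, which keeps "not found" distinct from "found at offset 0".

```python
PNG_MAGIC = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def find_png_offset(data: bytes) -> int:
    """Return the offset of the first PNG signature in data, or -1 if absent."""
    return data.find(PNG_MAGIC)


stream_bytes = b"\x00\x01\x02" + PNG_MAGIC + b"rest of the image"
print(find_png_offset(stream_bytes))      # -> 3
print(find_png_offset(b"no image here"))  # -> -1
```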
Once this function finds the magic number, I slice the stream at that offset and open the resulting PNG image:
import io
from PIL import Image  # provided by the Pillow package

arr = stream.read()
a = list(arr)
B = a[GetPngStartingOffset(a):len(a)]
bytesString = bytes(B)
image = Image.open(io.BytesIO(bytesString))
image.show()
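The slice above runs from the signature to the end of the stream. If the stream holds other data after the image, the PNG can be cut at its final chunk instead: a PNG ends with an IEND chunk, whose type and CRC bytes are always 49 45 4E 44 AE 42 60 82. The sketch below assumes the first occurrence of those bytes after the signature is the real end of the image, which is very likely but not guaranteed; `carve_png` is a hypothetical helper name.

```python
from typing import Optional

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
PNG_TRAILER = b"IEND\xaeB`\x82"  # IEND chunk type followed by its fixed CRC


def carve_png(data: bytes) -> Optional[bytes]:
    """Return the bytes from the PNG signature through the IEND chunk, or None."""
    start = data.find(PNG_MAGIC)
    if start == -1:
        return None
    end = data.find(PNG_TRAILER, start)
    if end == -1:
        return None
    return data[start:end + len(PNG_TRAILER)]


# Synthetic stream: junk, a (content-free) PNG skeleton, then trailing data.
skeleton = PNG_MAGIC + b"\x00\x00\x00\x00" + PNG_TRAILER
stream_bytes = b"junk" + skeleton + b"tail"
print(carve_png(stream_bytes) == skeleton)  # -> True
```

The carved bytes can be written to a .png file or passed to an image library in the same way as the slice above.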
This is not an end-to-end solution, but it is one way to figure out the content of a stream.
Thanks for reading, and thanks to #Robert Columbia for his patience.
Related
I have a binary file that shows gibberish if I open it in Notepad.
I am working on a plugin to use with Wireshark.
I am reading in a file and need to find 'V' '0' '0' '1' (0x56 0x30 0x30 0x31) in it, because that sequence is the start of a header, which means a packet follows. I need to do this for the whole file, like a parser. Each frame should start with V 0 0 1 and not end with it.
I currently have code that searches for 0x7E and parses on that. What I need is the length of each frame: when V 0 0 1 is found, the length runs from the V to the position just before the next V 0 0 1 in the file. I can then add that length to a captured length to get positions that Wireshark can work with.
For example, here is my imperfect code for working with 0x7E:
local line = file:read()
local len = 0
for c in (line or ''):gmatch ('.') do
    len = len + 1
    if c:byte() == 0x7E then
        break
    end
end
if not line then
    return false
end
frame.captured_length = len
Another problem here is that the frame ends with 7E, which is wrong. I need something that works reliably for 'V' '0' '0' '1'. Maybe I need to use string.find?
Please help me!
This is an example of what my file looks like in the hex editor in Visual Studio Code.
Lua has some neat pattern tools. Here's a summary:
(...) captures the text matched inside the parentheses and returns it to us.
-, +, *, ? are quantifiers: "-" matches zero or more, as few as possible; "+" matches one or more, as many as possible; "*" matches zero or more, as many as possible; "?" matches zero or one.
^ and $ anchor the match to the start or end of the string, respectively.
We'll be using this universal input and output to test with:
local output = {}
local input = "V001Packet1V001Packet2oooV001aaandweredonehere"
The easiest way to do this is probably to recursively split the string, with one ending at the character before "V", and the other starting at the character after "1". We'll use a pattern which exports the part before and after V001:
local this, next = string.match(input, "(.-)V001(.*)")
print(this,next) --> "", "Packet1V001Packet2..."
Simple enough. Now we need to do it again, and we also need to eliminate the first empty packet, because it's a quirk of the pattern. We can probably just say that any empty this string should not be added:
if this ~= "" then
    table.insert(output, this)
end
Now, the last packet will return nil for both this and next, because there will not be another V001 at the end. We can prepare for that by simply adding the last part of the string when the pattern does not match.
All put together:
local function doStep(str)
    local this, next = string.match(str, "(.-)V001(.*)")
    if this then
        -- There are still more packets left
        if this ~= "" then
            -- This is not an empty packet, so keep it
            table.insert(output, this)
        end
        if next ~= "" then
            -- There is more out there!
            doStep(next)
        end
    else
        -- We are the last survivor.
        table.insert(output, str)
    end
end
Of course, this can be improved, but it should be a good starting point. To prove it works, this script:
doStep(input)
print(table.concat(output, "; "))
prints this:
Packet1; Packet2ooo; aaandweredonehere
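The question also asks for the length of each frame, measured from one V001 to just before the next. A minimal sketch of that bookkeeping follows, written in Python rather than Lua so it can be run standalone; `frame_spans` is a hypothetical helper name. The same loop should carry over to Lua using string.find with its plain-search flag.

```python
MARKER = b"V001"


def frame_spans(data: bytes, marker: bytes = MARKER):
    """Return (offset, length) for each frame that starts with marker."""
    starts = []
    pos = data.find(marker)
    while pos != -1:
        starts.append(pos)
        pos = data.find(marker, pos + len(marker))
    # Each frame ends where the next one starts; the last one ends at end of data.
    ends = starts[1:] + [len(data)]
    return [(start, end - start) for start, end in zip(starts, ends)]


data = b"V001Packet1V001Packet2oooV001aaandweredonehere"
for offset, length in frame_spans(data):
    print(offset, length, data[offset:offset + length])
```

Here each frame includes its own V001 header, so the lengths add up to the size of the data from the first marker onward.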
I have a 4-byte hexadecimal value that my script prints out. I now want to take that value, subtract C8 from it 37 times, and save each result in a different variable. The problem is that I don't know how to do hexadecimal calculations in Lua. If anyone can link me to documentation on how to do this, that would be much appreciated.
You can make a hexadecimal literal in Lua by prefixing it with 0x, as stated in the reference manual. I found this by googling "lua hex"; such searches usually get good results.
"Hexadecimal numbers" aren't anything special, hexadecimal is just a way to represent numbers, same as decimal or binary. You can do 1000-0xC8 and you'll get the decimal number 800.
Code to convert:
function convertHex()
    local decValue = readInteger(0x123456);
    hexValue = decValue
end
function hexSubtract()
    for i = 1,37 do
        local value = 0xC8
        hexValue = hexValue - 0xC8
        result = hexValue
        if i == 37 then
            print(result) --Prints dec value
            print(string.format('%X',result)); --Prints hex value
        end
    end
end
Replace 0x123456 with your address, then call the functions in order: convertHex(), then hexSubtract().
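To keep all 37 intermediate results, as the question asks, one list is usually easier than 37 separate variables. The sketch below shows the idea in Python so it can be run standalone; the starting value 0x12345678 is a made-up example, and a Lua table filled in a for loop would play the same role.

```python
START = 0x12345678  # hypothetical 4-byte starting value
STEP = 0xC8         # 200 in decimal

# values[0] is START minus one step, values[36] is START minus 37 steps.
values = [START - STEP * n for n in range(1, 38)]

print(len(values))          # -> 37
print(f"{values[0]:X}")     # first result, printed as hexadecimal
print(f"{values[-1]:X}")    # last result, printed as hexadecimal
```

The subtraction itself is ordinary integer arithmetic. Hexadecimal only comes in when writing the literal (0xC8) and when formatting the result for display.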
I've found some code that I want to use when I'm writing notes on a MUD I play. Each line of a note can only be 79 characters long, so it's sometimes a hassle to write a note unless you're counting characters. The code is below:
function wrap(str, limit, indent, indent1)
    indent = indent or ""
    indent1 = indent1 or indent
    limit = limit or 79
    local here = 1-#indent1
    return indent1..str:gsub("(%s+)()(%S+)()",
        function(sp, st, word, fi)
            if fi-here > limit then
                here = st - #indent
                return "\n"..indent..word
            end
        end)
end
This would work great; I can type a 300 character line and it will format it to 79 characters, respecting full words.
The problem I'm having, and I cannot seem to figure out how to solve, is that sometimes, I want to add colour codes to the line, and colour codes are not counted against word count. For example:
#GThis is a colour-coded #Yline that should #Bbreak off at 79 #Mcharacters, but ignore #Rthe colour codes (#G, #Y, #B, #M, #R, etc) when doing so.
Essentially, it would strip the colour codes away and break the line appropriately, but without losing the colour codes.
Edited to include what it should check, and what the final output should be.
The function would only check the string below for line breaks:
This is a colour-coded line that should break off at 79 characters, but ignore the colour codes (, , , , , etc) when doing so.
but would actually return:
#GThis is a colour-coded #Yline that should #Bbreak off at 79 #Ncharacters, but ignore
the colour codes (#G, #Y, #B, #M, #R, etc) when doing so.
To complicate things, we also have xterm colour codes, which are similar, but look like this:
#x123
It is always #x followed by a 3-digit number. And lastly, to further complicate things, I don't want it to strip out purpose colour codes (which would be ##R, ##x123, etc.).
Is there a clean way of doing this that I'm missing?
function(sp, st, word, fi)
    local delta = 0
    word:gsub('#([#%a])',
        function(c)
            if c == '#' then delta = delta + 1
            elseif c == 'x' then delta = delta + 5
            else delta = delta + 2
            end
        end)
    here = here + delta
    if fi-here > limit then
        here = st - #indent + delta
        return "\n"..indent..word
    end
end
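The idea behind the callback above is to measure each word by what is displayed rather than by what is typed. As a cross-check of that idea, here is a small sketch in Python (chosen so it can be run standalone). It assumes the code grammar described in the question: "##" displays as one literal "#", "#x" plus three digits is an xterm colour, and "#" plus one letter is a colour. `visible_length` and `wrap` are hypothetical names, and this `wrap` collapses runs of whitespace, unlike the Lua version.

```python
import re

# Order matters: the escaped "##" must be tried before the colour codes.
CODE = re.compile(r"##|#x\d{3}|#[A-Za-z]")


def visible_length(text: str) -> int:
    """Length of text as displayed, with colour codes counted as zero width."""
    return len(CODE.sub(lambda m: "#" if m.group() == "##" else "", text))


def wrap(text: str, limit: int = 79) -> str:
    """Greedy word wrap that measures lines by their visible length."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and visible_length(candidate) > limit:
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return "\n".join(lines)


sample = ("#GThis is a colour-coded #Yline that should #Bbreak off at 79 "
          "#Mcharacters, but ignore #Rthe colour codes (#G, #Y, #B, #M, #R, etc) "
          "when doing so.")
print(wrap(sample))
```

The colour codes stay in the returned text; only the length measurement ignores them.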
I have written several Lua Dissectors for custom protocols we use and they work fine. In order to spot problems with missing packets I need to check the custom protocol sequence numbers against older packets.
The IP source and Destination addresses are always the same for device A to device B.
Inside this packet we have one custom ID.
Each ID has a sequence number so device B can determine if a packet is missing. The sequence number increments by 256 and rolls over when it reaches 65k
I have tried using a global dictionary, but when you scroll up and down the trace the decoder is rerun and the values change.
The two lines below show where the information is read from:
ID = buffer(0,6):bitfield(12,12)
SeqNum = buffer(0,6):bitfield(32,16)
Ideally, I would like each decoded frame to show whether the previous sequence number is more than 256 away, and to produce a table that lists all these bad frames.
Src IP; Dst IP; ID; Seq
1 10.12.1.2; 10.12.1.3; 10; 0
2 10.12.1.2; 10.12.1.3; 11; 0
3 10.12.1.2; 10.12.1.3; 12; 0
4 10.12.1.2; 10.12.1.3; 11; 255
5 10.12.1.2; 10.12.1.3; 12; 255
6 10.12.1.2; 10.12.1.3; 10; 511 Packet with seq 255 is missing
I have now managed to get the dissector to check the current packet against previous packets by using a global array, where I store specific information about each frame. In the current packet being dissected I recheck the most recent packet and work my way back to the start to find a suitable packet.
dict[pinfo.number] = {frame = pinfo.number, dID = ID, dSEQNUM = SeqNum}
local frameCount = 0
local frameFound = false
while frameFound == false do
    if pinfo.number > frameCount then
        frameCount = frameCount + 1
        if dict[(pinfo.number - frameCount)] ~= nil then
            if dict[(pinfo.number - frameCount)].dID == dict[pinfo.number].dID then
                seq_difference = (dict[(pinfo.number)].dSEQNUM - dict[(pinfo.number - frameCount)].dSEQNUM)
                if seq_difference > 256 then
                    pinfo.cols.info = string.format('ID-%d SeqNum-%d missing packet(s) %d last frame %d ', ID,SeqNum, seq_difference, dict[(pinfo.number - frameCount)].frame)
                end
                frameFound = true
            end
        end
    else
        frameFound = true
    end
end
I'm not sure I see a question to answer? If you're asking "how can I avoid having to deal with the dissector being invoked multiple times and screwing up the previous decoding of the values" - the answer to that is using the pinfo.visited boolean. It will be false the first time a given packet is dissected, and true thereafter no matter how much clicking around the user does - until the file is reloaded or a new one loaded.
To handle the reloading/new-file case, you'd hook into the init() function call for your proto by defining a myproto.init() function, and in that you'd clear your entire array table.
Also, you might want to google for related questions/answer on ask.wireshark.org, as that site is more frequently used for wireshark Lua API questions. For example this question/answer is similar and related to your case.
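The gap check itself does not need a backwards scan through earlier frames: remembering the last frame seen per ID is enough, provided it only runs on the first pass over each packet (which is what pinfo.visited makes possible). The sketch below shows that bookkeeping in Python so it can be run standalone; it is not Wireshark API code. `find_gaps` is a hypothetical name, the rollover point of 65536 is an assumption based on the "65k" in the question, and the packet list is made-up data.

```python
SEQ_MODULO = 65536  # assumed rollover point ("65k" in the question)
SEQ_STEP = 256      # expected increment between packets of the same ID


def find_gaps(packets):
    """packets: iterable of (frame_number, packet_id, seq) in capture order.

    Returns (frame, packet_id, previous_frame, delta) for every frame whose
    sequence number is more than SEQ_STEP past the previous one for that ID.
    """
    last_seen = {}  # packet_id -> (frame_number, seq)
    bad = []
    for frame, pid, seq in packets:
        if pid in last_seen:
            prev_frame, prev_seq = last_seen[pid]
            delta = (seq - prev_seq) % SEQ_MODULO  # modulo handles the rollover
            if delta > SEQ_STEP:
                bad.append((frame, pid, prev_frame, delta))
        last_seen[pid] = (frame, seq)
    return bad


packets = [(1, 10, 0), (2, 11, 0), (3, 10, 256), (4, 11, 256), (5, 10, 768), (6, 11, 512)]
print(find_gaps(packets))  # -> [(5, 10, 3, 512)]
```

In a dissector, the result for each frame would be stored under its frame number on the first pass and simply looked up again whenever the frame is redisplayed.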
I have 30000 files to process, and each file has 80000 x 5 lines. I need to read all the files and process them to find the average of each line. I have written the code, in Fortran, to read and extract all the data from the files. It needs an array of (30000 X 80000), but my program could not go beyond (3300 X 80000). I need to add the 4th column of each file in steps of 300 files: the 4th column of the 1st file with the 4th column of the 301st file, the 4th column of the 2nd file with the 4th column of the 302nd file, and so on. Do you think this is because of a limit on the size of array that Fortran can handle? If so, is there any way to increase that limit? What about the number of files? My code looks like this:
This program runs well.
      implicit double precision (a-h,o-z),integer(i-n)
      dimension x(78805,5),y(78805,5),den(78805,5)
      dimension b(3300,78805),bb(78805)
      character*70,fn
      nf = 3300       ! NUMBER OF FILES
      nj = 78804      ! Number of rows in file.
      ns = 300        ! No. of steps for files.
      ncores = 11     ! No of Cores
c--------------------------------------------------------------------
c--------------------------------------------------------------------
!Initialization
      do i = 0,nf
         do j = 1, nj
            x(j,1) = 0.0
            y(j,2) = 0.0
            den(j,4) = 0.0
c           a(i,j) = 0.0
            b(i,j) = 0.0
c           aa(j) = 0.0
            bb(j) = 0.0
         end do
      end do
c-------!Body program-----------------------------------------------
      iout = 6        ! Output Files upto "ns" no.
      DO i= 1,nf      ! LOOP FOR THE NUMBER OF FILES
         write(fn,10)i
         open(1,file=fn)
         do j=1,nj    ! Loop for the no of rows in the domain
            read(1,*)x(j,1),y(j,2),den(j,4)
            if(i.le.ns) then
c              a(i,j) = prob(j,3)
               b(i,j) = den(j,4)
            else
c              a(i,j) = prob(j,3) + a(i-ns,j)
               b(i,j) = den(j,4) + b(i-ns,j)
            end if
         end do
         close(1)
c ----------------------------------------------------------
c -----Write Out put [Probability and density matrix]-------
c ----------------------------------------------------------
         if(i.ge.(nf-ns)) then
            do j = 1, nj
c              aa(j) = a(i,j)/(ncores*1.0)
               bb(j) = b(i,j)/(ncores*1.0)
               write(iout,*) int(x(j,1)),int(y(j,2)),bb(j)
            end do
            close(iout)
            iout = iout + 1
         end if
      END DO
 10   format(i0,'.txt')
      END
It's hard to say for sure because you haven't given all the details yet, but your problem is quite possibly that you are using a 32 bit compiler producing 32 bit executables and you are simply running out of address space.
Although your operating system supports 64 bit address space, your 32 bit process is still limited to 32 bit addresses.
You have found a limit at 3300*78805*8 which is just under 2GB and this supports my theory.
No matter what is the cause of your immediate problem, your fundamental problem is that you appear to be loading everything into memory at once. I've not closely studied your algorithm but on first inspection it seems likely that you could re-arrange it to avoid having everything in memory at once.
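One possible rearrangement, sketched here in Python only to show the bookkeeping (the same indexing works in Fortran): since file i is only ever added to file i-300, it is enough to keep 300 running sums and add each file's 4th column into slot (i-1) mod 300 as it is read. In Fortran that would be a 300 x 78804 array of doubles, roughly 189 MB, regardless of the number of files. `accumulate` is a hypothetical name, and the columns in the example are tiny made-up data standing in for the values read from each file.

```python
def accumulate(columns, ns):
    """Fold a stream of columns into ns running sums.

    columns yields the 4th column of file 1, file 2, file 3, ... in order.
    Slot k of the result holds the sum of files k+1, k+1+ns, k+1+2*ns, ...
    """
    sums = [None] * ns
    for index, column in enumerate(columns):
        slot = index % ns
        if sums[slot] is None:
            sums[slot] = list(column)
        else:
            sums[slot] = [a + b for a, b in zip(sums[slot], column)]
    return sums


# Four tiny "files" with two rows each, folded in steps of ns = 2.
columns = [[1.0, 2.0], [10.0, 20.0], [100.0, 200.0], [1000.0, 2000.0]]
print(accumulate(columns, ns=2))  # -> [[101.0, 202.0], [1010.0, 2020.0]]
```

Dividing each slot by the number of files that went into it then gives the per-row average, and only one file's column needs to be in memory at a time on top of the running sums.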