ATM pcap file to Ethernet pcap file - network-programming

I am looking for the .pcap file format of ATM; it's needed for my project. I am looking for a mechanism by which an ATM pcap file can be converted to an Ethernet pcap file.

There's more than one link-layer header type for ATM. The list of link-layer header type values has both LINKTYPE_ATM_RFC1483/DLT_ATM_RFC1483, where the packets begin with an IEEE 802.2 header, and LINKTYPE_SUNATM/DLT_SUNATM, where the packets begin with a SunATM header. There may be other link-layer header types in use as well; you'll have to determine which of them the ATM pcap files you're looking at use.
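If you're not sure which link-layer header type a given capture uses, you can read it out of the "network" field of the classic pcap global header. Below is a minimal Python sketch, assuming a classic (non-pcapng) capture; the file name is a placeholder, and the numeric values come from the tcpdump.org link-layer header types list:
import struct
LINKTYPES = {1: "LINKTYPE_ETHERNET", 100: "LINKTYPE_ATM_RFC1483", 123: "LINKTYPE_SUNATM"}
def pcap_linktype(path):
    with open(path, "rb") as f:
        header = f.read(24)                  # the classic pcap global header is 24 bytes
    magic = header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":         # little-endian magic number
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":       # big-endian magic number
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    (linktype,) = struct.unpack(endian + "I", header[20:24])  # the 'network' field
    return LINKTYPES.get(linktype, "unknown (%d)" % linktype)
print(pcap_linktype("capture.pcap"))         # placeholder file name
Once you know the source link-layer type, the conversion itself means synthesizing a fake Ethernet header for every record; tcprewrite from the tcpreplay suite (with its --dlt option) may be able to do this for some link-layer types.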

Related

How to parse a binary PDF stream of unknown length?

From the PDF docs: "The keyword stream that follows the stream dictionary shall be followed by an end-of-line marker consisting of either a CARRIAGE RETURN and a LINE FEED or just a LINE FEED, and not by a CARRIAGE RETURN alone. The sequence of bytes that make up a stream lie between the end-of-line marker following the stream keyword and the endstream keyword; the stream dictionary specifies the exact number of bytes."
As the contents may be binary, an occurrence of endstream does not necessarily indicate the end of the stream. Now consider this stream:
%PDF-1.4
%307쏢
5 0 obj
<</Length 6 0 R/Filter /FlateDecode>>
stream
x234+T03203T0^#A(235234˥^_d256220^314^U310^E^#[364^F!endstream
endobj
6 0 obj
30
endobj
The Length is an indirect object that follows the stream. Obviously that length can only be read after the stream has been parsed.
I think allowing Length to be an indirect object that can only be resolved after the stream is a design defect. While it may help PDF writers to output PDFs sequentially, it makes parsing quite difficult for PDF readers. Considering that a PDF file is read more frequently than it is written, I don't understand this choice.
So how can such a stream be parsed correctly?
The Length is an indirect object that follows the stream. Obviously that length can only be read after the stream has been parsed.
This is an understandable conclusion if one assumes that the file is to be read sequentially beginning to end.
This assumption is incorrect, though, because parsing a PDF from the front and determining the PDF objects on the fly is not the recommended way of parsing a PDF.
While ISO 32000-1 is a bit vague here and merely says
Conforming readers should read a PDF file from its end.
(ISO 32000-1, section 7.5.5 File Trailer)
ISO 32000-2 clearly specifies:
With the exception of linearized PDF files, all PDF files should be read using the trailer and cross-reference table as described in the following subclauses. Reading a non-linearized file in a serial manner is not reliable because of the way objects are to be processed after an incremental update. (See 6.3.2, "Conformance of PDF processors".)
(ISO 32000-2, section 7.5 File structure)
Thus, in the case of your PDF excerpt, a PDF processor trying to read object 5 0 (sketched in code after these steps)
looks up object 5 0 in the cross references and gets its offset in the file,
goes to that offset and starts reading the object, first parsing the stream dictionary,
at the stream keyword recognizes that the object is a stream and retrieves its Length value which happens to be an indirect reference to 6 0,
looks up object 6 0 in the cross references and gets its offset in the file,
goes to that offset and reads the object, the number 30,
reads the stream content of the stream object 5 0 knowing its length is 30.
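To make the double lookup concrete, here is that sketch in Python. It assumes the cross-reference table has already been parsed into a dict mapping object numbers to byte offsets, ignores generation numbers, and assumes the stream dictionary fits in the first 512 bytes of the object; a real parser would tokenize the dictionary properly instead of using regular expressions:
import re
def stream_length(pdf_bytes, xref, obj_num):
    head = pdf_bytes[xref[obj_num]:xref[obj_num] + 512]   # random access via the xref
    m = re.search(rb"/Length\s+(\d+)(?:\s+(\d+)\s+R)?", head)
    if m.group(2) is None:                                # direct value, e.g. /Length 30
        return int(m.group(1))
    ref_off = xref[int(m.group(1))]                       # indirect, e.g. /Length 6 0 R
    num = re.search(rb"obj\s+(\d+)", pdf_bytes[ref_off:ref_off + 64])
    return int(num.group(1))                              # the number object, here 30
Both lookups go through the cross-reference table; at no point does the processor scan the file sequentially for endstream.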
An approach like yours is explicitly considered "not reliable".
I think allowing Length to be an indirect object that can only be resolved after the stream is a design defect.
If there were no cross references, you'd be correct. That also is why the FDF format (which does not have mandatory cross references) specifies:
FDF is based on PDF; it uses the same syntax and has essentially the same file structure (7.5, "File structure"). However, it differs from PDF in the following ways:
[...]
The length of a stream shall not be specified by an indirect object.
(ISO 32000-2, section 12.7.8 Forms data format)
Concerning the comments:
So I'm correct that PDF cannot be parsed sequentially,
While the very original design of PDF probably was meant for sequential parsing, it has since been developed with access only via cross references in mind. PDF simply is not meant to be parsed sequentially anymore, and that was already the case when I started dealing with PDFs in the late 90s.
and the only reason is that the required length of binary streams may be defined after the stream.
That's far from the only reason; there are more situations requiring a cross-reference lookup to parse correctly.
As @mkl indicated, a parser has to read somewhere before the end of the PDF file to get startxref, hoping that it does not start parsing in the middle of a binary stream.
That's not correct. The PDF must end with "%%EOF" plus optionally an end-of-line. Before that there must be an end-of-line, before that a number, before that an end-of-line, before that startxref.
This is already expressed clearly in ISO 32000-1:
The last line of the file shall contain only the end-of-file marker, %%EOF. The two preceding lines shall contain, one per line and in order, the keyword startxref and the byte offset in the decoded stream from the beginning of the file to the beginning of the xref keyword in the last cross-reference section.
(ISO 32000-1, section 7.5.5 File Trailer)
Thus, no danger of being "in the middle of a binary stream" if the PDF is valid.
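In code, locating startxref therefore only takes reading a small tail of the file. A minimal Python sketch; the 2048-byte tail size is an arbitrary safety margin, not something the spec mandates:
def find_startxref(path, tail_size=2048):
    with open(path, "rb") as f:
        f.seek(0, 2)                         # seek to end of file
        size = f.tell()
        f.seek(max(0, size - tail_size))
        tail = f.read()
    idx = tail.rfind(b"startxref")           # last occurrence wins (incremental updates)
    if idx < 0:
        raise ValueError("startxref not found - not a valid PDF?")
    return int(tail[idx + len(b"startxref"):].split()[0])  # offset of the last xref section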
The other thing I dislike about the PDF format is this: when developing a parser, you usually create test files with some elements you are working on. This approach seems to work with everything but streams. The absolute file positions of syntax elements and the requirement for multiple random accesses make this task harder.
You seem to be subject to the misconception that the PDF format is a tagged text format like HTML. This is not the case. Even though numerous syntactical elements are defined using some ASCII keyword and there are "lines", PDF is a binary format, the cross reference tables are not a gimmick but the central access hub to the objects, and optimization for random access is done by design.

How to get MIMETYPE of local file?

I download some files, such as Word, PPT, and Excel documents, but I don't know their MIME type or suffix. Is there some way to get the MIME type of these files?
If you don't know the suffix, you're forced to look at the file contents. Typically this starts with looking for magic bytes, the first few bytes of the file. You can often qualify the type of file on that basis (though you obviously can't be sure unless you validate the whole file).
Modern Office documents should conform to OOXML, and the first two bytes should be 0x50 0x4b (i.e. "PK"), the indicator of a ZIP file.
You can then uncompress it (e.g. with ZipArchive).
You can then either parse docProps/app.xml, or see the Office Open XML site (links at the top of the page) for how to parse word-processing, spreadsheet, and presentation documents, respectively.
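As a concrete sketch in Python (the same idea works with PHP's ZipArchive): check the magic bytes, then guess the Office flavor from the package's top-level directories. The directory-to-MIME-type mapping below follows the usual OOXML package layout; a more robust check would parse [Content_Types].xml:
import zipfile
OOXML_TYPES = {
    "word/": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "xl/": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "ppt/": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}
def sniff_ooxml(path):
    with open(path, "rb") as f:
        if f.read(2) != b"PK":               # not a ZIP container at all
            return None
    with zipfile.ZipFile(path) as z:
        names = z.namelist()
    for prefix, mime in OOXML_TYPES.items():
        if any(n.startswith(prefix) for n in names):
            return mime
    return "application/zip"                 # a ZIP, but not a recognized Office type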

file encoding on a mac, charset=binary

I typed in
file -I *
to look at all the encoding of all the CSV files in an entire directory. A lot of the file encodings are charset=binary. I'm not too familiar with this encoding format.
Does anyone know how to handle this encoding?
Thanks a lot for your time.
"Binary" encoding pretty much means that the encoding is unknown.
Everything is binary data under the hood. In text files each byte, or sequence of bytes, represents a specific character; which character in particular depends on the encoding the file was encoded with and the encoding you're interpreting it with. Some encodings are unambiguously recognisable, others aren't (e.g. any file is valid in any single-byte encoding, so you can't easily distinguish one single-byte encoding from another).
What file is telling you with charset=binary is that it doesn't have any more specific information than that the file contains bits and bytes (Capt'n Obvious to the rescue). It's up to you to interpret the file in the correct encoding, or to interpret it as the correct file format.
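A small Python sketch of why that's ambiguous: all you can do is try candidate encodings and see which ones decode without error, and a single-byte encoding such as latin-1 accepts every byte sequence, so it always "succeeds". The candidate list here is an assumption; adjust it to what you expect:
def guess_encoding(path, candidates=("utf-8", "utf-16", "latin-1")):
    with open(path, "rb") as f:
        data = f.read()
    for enc in candidates:
        try:
            data.decode(enc)
            return enc                       # first candidate that decodes cleanly
        except UnicodeDecodeError:
            continue
    return None                              # none matched: treat as binary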

Parse Google Protocol Buffers datagram without .proto file?

Is it possible to parse an incoming Google Protocol Buffers datagram without any .proto file? I merely know it's been serialized using protocol buffers but have no idea about the IDL file.
I'm looking for a way to just iterate through any value by some sort of reflection? Is this possible?
Thank you!
protoc --decode_raw < my_file
You need to take the following things into account when inspecting the output:
None of the field names are visible, just the tag numbers.
All varint fields are shown as integers. This is OK for most types, but sint* values will appear in their "zigzagged" form.
Doubles and floats will be shown as hex.
Bytes, string fields and submessages all appear the same, i.e. just a bunch of bytes.
If you want to decode the messages programmatically, you can write your own .proto file after you have figured out what the fields mean using the above method.
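If you want to iterate over the fields programmatically rather than shell out to protoc, the wire format is simple enough to walk by hand. Here is a minimal Python sketch in the spirit of --decode_raw; it assumes a well-formed message and does not handle the deprecated group wire types (3 and 4):
def read_varint(buf, pos):
    result = shift = 0
    while True:
        b = buf[pos]; pos += 1
        result |= (b & 0x7F) << shift        # accumulate 7 payload bits per byte
        if not b & 0x80:                     # high bit clear means last byte
            return result, pos
        shift += 7
def walk(buf):
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire = key >> 3, key & 7      # tag number and wire type
        if wire == 0:                        # varint
            value, pos = read_varint(buf, pos)
        elif wire == 1:                      # 64-bit (double, fixed64, sfixed64)
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire == 2:                      # length-delimited (string/bytes/submessage)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire == 5:                      # 32-bit (float, fixed32, sfixed32)
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError("unsupported wire type %d" % wire)
        print(field, wire, value)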

Erlang, reading a file with character offset

I have code to find a specific occurrence of text in a file, which gives me an offset so I know where this occurrence ends. Now I want to read the file from that offset to the end of the file. The file contains binary data as well as text. How do I do this in Erlang?
Use file:pread (see the Erlang documentation on the file module). You have to take care of any character encoding yourself, as the function deals only in bytes.
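For example, a minimal Erlang sketch (the path and offset are placeholders) that reads everything from Offset to the end of the file:
read_tail(Path, Offset) ->
    Size = filelib:file_size(Path),
    {ok, Fd} = file:open(Path, [read, binary, raw]),
    {ok, Rest} = file:pread(Fd, Offset, Size - Offset),  % byte count, not characters
    ok = file:close(Fd),
    Rest.
Note that file:pread/3 returns eof rather than {ok, _} if the offset is at or past the end of the file, so guard for that in real code.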
