I have a simple point to point UDP WiFi simulation in NS-3 that outputs data to a trace file. It provides lots of useful data but there is no information that gives a unique ID for each packet. I can't find anything in wireshark either when I open the pcap files.
I have output the results of my simulation to an ascii trace file and pcap files for both nodes but I can't find any packet identifier. I can see the sequence numbers of the packets but that's it.
I am new to NS-3 so I am not sure how to produce this information.
Here is some of the output from the trace file.
t 2.00082 /NodeList/0/DeviceList/0/$ns3::WifiNetDevice/Phy/State/Tx ns3::WifiMacHeader (DATA ToDS=0, FromDS=0, MoreFrag=0, Retry=0, MoreData=0 Duration/ID=0usDA=ff:ff:ff:ff:ff:ff, SA=00:00:00:00:00:01, BSSID=00:00:00:00:00:01, FragNumber=0, SeqNumber=0) ns3::LlcSnapHeader (type 0x806) ns3::ArpHeader (request source mac: 00-06-00:00:00:00:00:01 source ipv4: 10.1.1.1 dest ipv4: 10.1.1.2) ns3::WifiMacTrailer ()
Any suggestions are appreciated.
Thanks.
In case you might not be aware of this already, let me first point out what might seem to be the obvious but: "there is no such thing as unique packet id in real networks" and since pcap traces are designed to contain dumps of real packets in real networks, there is zero chance you will be able to find a unique packet id in a pcap trace generated by ns-3.
Now, ns-3 does contain a per-packet unique id that is available with the Packet::GetId method and you can trivially change the source code of the function that generates your ascii dump to add this in src/wifi/helper/yans-wifi-helper.cc. Grep for "Ascii".
Now if you want to know why it does not do this by default because it is so useful, I honestly can't remember but:
there is probably something related to the ns2 trace format that inspired this ascii format. Compatibility with existing tools might have been an issue
adding a packet id goes against the ns-3 philosophy of matching what real networks do
Related
I have a huge pcap file. I want to know facebook usage in terms of data transfered (upload, download). For that, I am using wireshark to read this file. From a question on stackoverflow , there are many fields that can be used to find bytes.
frame.len==243
ip.len=229
udp.length==209
data.len=201
Now, I have test frame.len and ip.len both gives different results. What I should consider correct ? I am a newbie in networks terminology and I have to just find correct data transfered.
What happens, when you connects to server and requests some simple page:
Server application generates requested data (e.g. <body>Hello world</body> string) and passes it to HTTP layer
HTTP layer generates necessary header according to RFC (specifies HTTP version, status code, content type etc), prepends it to generated data and pass everything to TCP layer
TCP layer may break data into more than one pieces (not our case, message is already too small) and prepend necessary info for transport layer to each piece (src/dst port number, sequence number, some flags, checksum etc), then passes it to IP level
IP layer prepends necessary info for routing (source/dest addresses, TTL and other stuff), then passes it to lower layer (e.g. Ethernet)
Ethernet adds its part (MAC addresses, maybe VLAN tags) and pushes all to physical device
Resulted data is sent byte-by-byte from server's NIC to network
So your question is actually up to you. What do you want to measure? Is it "data, which I need to display excluding all auxiliary info"? Or is it "all number of bytes I need to send/receive for getting this lovely cat picture"? Here is a list of fields to get size of each part:
To get data lenght only (as string, unfortunately): http.content_length_header == "606"
To get (data + HTTP header) length: tcp.len == 973
To get (data + HTTP + TCP + IP layers): ip.len=1013
To get every byte sent: frame.len == 1027
If you want to measure bandwidth occupation, use frame.len. If you're interested in "pure site weight", it should be independent from environment, so use http.content_length_header. Things might become more complicated on high level considering the following:
Era of HTTPS means you can't easily observe HTTP content in traces, so tcp.len might be the highest option
Some data (e.g. audio, video) is transferred over different protocol stack (e.g. IP - UDP - RTP)
Well, even Embarcadero states that it is not guaranteed to return accurate result of the bytes ready to read in the socket buffer, but if you look at it, when you place -1 at Socket.ReceiveBuf (this is what ReceiveLength wraps) it calls ioctlsocket with FIONREAD to determine the amount of data pending in the network's input buffer that can be read from socket s.
so, how is it not safe or bad ?
e.g: ioctlsocket(Socket.SocketHandle, FIONREAD, Longint(i));
The documentation you mention specifically says (emphasis mine)
Note: ReceiveLength is not guaranteed to be accurate for streaming socket connections.
This means that the length is not known ahead of time because it's being supplied by a stream of data. Obviously, if you don't know how big the data is that's being sent ahead of time, you can't properly set the length the client should expect.
Consider it like generic code to copy a file. If you don't know ahead of time how big the file is you'll be copying, you can't predict how many bytes you'll be copying. In the case of the socket, the stream size that's supplying the socket isn't known in advance (for instance, for data being generated real-time and sent), so there's no way to inform the client socket how much to expect.
I tried using PARSE on a PORT! and it does not work:
>> parse open %test-data.r [to end]
** Script error: parse does not allow port! for its input argument
Of course, it works if you read the data in:
>> parse read open %test-data.r [to end]
== true
...but it seems it would be useful to be able to use PARSE on large files without first loading them into memory.
Is there a reason why PARSE couldn't work on a PORT! ... or is it merely not implemented yet?
the easy answer is no we can't...
The way parse works, it may need to roll-back to a prior part of the input string, which might in fact be the head of the complete input, when it meets the last character of the stream.
ports copy their data to a string buffer as they get their input from a port, so in fact, there is never any "prior" string for parse to roll-back to. its like quantum physics... just looking at it, its not there anymore.
But as you know in rebol... no isn't an answer. ;-)
This being said, there is a way to parse data from a port as its being grabbed, but its a bit more work.
what you do is use a buffer, and
APPEND buffer COPY/part connection amount
Depending on your data, amount could be 1 byte or 1kb, use what makes sense.
Once the new input is added to your buffer, parse it and add logic to know if you matched part of that buffer.
If something positively matched, you remove/part what matched from the buffer, and continue parsing until nothing parses.
you then repeat above until you reach the end of input.
I've used this in a real-time EDI tcp server which has an "always on" tcp port in order to break up a (potentially) continuous stream of input data, which actually piggy-backs messages end to end.
details
The best way to setup this system is to use /no-wait and loop until the port closes (you receive none instead of "").
Also make sure you have a way of checking for data integrity problems (like a skipped byte, or erroneous message) when you are parsing, otherwise, you will never reach the end.
In my system, when the buffer was beyond a specific size, I tried an alternate rule which skipped bytes until a pattern might be found further down the stream. If one was found, an error was logged, the partial message stored and a alert raised for sysadmin to sort out the message.
HTH !
I think that Maxim's answer is good enough. At this moment the parse on port is not implemented. I don't think it's impossible to implement it later, but we must solve other issues first.
Also as Maxim says, you can do it even now, but it very depends what exactly you want to do.
You can parse large files without need to read them completely to the memory, for sure. It's always good to know, what you expect to parse. For example all large files, like files for music and video, are divided into chunks, so you can just use copy|seek to get these chunks and parse them.
Or if you want to get just titles of multiple web pages, you can just read, let's say, first 1024 bytes and look for the title tag here, if it fails, read more bytes and try it again...
That's exactly what must be done to allow parse on port natively anyway.
And feel free to add a WISH in the CureCode database: http://curecode.org/rebol3/
I am about to write a message protocol going over a TCP stream. The receiver needs to know where the message boundaries are.
I can either send 1) fixed length messages, 2) size fields so the receiver knows how big the message is, or 3) a unique message terminator (I guess this can't be used anywhere else in the message).
I won't use #1 for efficiency reasons.
I like #2 but is it possible for the stream to get out of sync?
I don't like idea #3 because it means receiver can't know the size of the message ahead of time and also requires that the terminator doesn't appear elsewhere in the message.
With #2, if it's possible to get out of sync, can I add a terminator or am I guaranteed to never get out of sync as long as the sender program is correct in what it sends? Is it necessary to do #2 AND #3?
Please let me know.
Thanks,
jbu
You are using TCP, the packet delivery is reliable. So the connection either drops, timeouts or you will read the whole message.
So option #2 is ok.
I agree with sigjuice.
If you have a size field, it's not necessary to add and end-of-message delimiter --
however, it's a good idea.
Having both makes things much more robust and easier to debug.
Consider using the standard netstring format, which includes both a size field and also a end-of-string character.
Because it has a size field, it's OK for the end-of-string character to be used inside the message.
If you are developing both the transmit and receive code from scratch, it wouldn't hurt to use both length headers and delimiters. This would provide robustness and error detection. Consider the case where you just use #2. If you write a length field of N to the TCP stream, but end up sending a message which is of a size different from N, the receiving end wouldn't know any better and end up confused.
If you use both #2 and #3, while not foolproof, the receiver can have a greater degree of confidence that it received the message correctly if it encounters the delimiter after consuming N bytes from the TCP stream. You can also safely use the delimiter inside your message.
Take a look at HTTP Chunked Transfer Coding for a real world example of using both #2 and #3.
Depending on the level at which you're working, #2 may actually not have an issues with going out of sync (TCP has sequence numbering in the packets, and does reassemble the stream in correct order for you if it arrives out of order).
Thus, #2 is probably your best bet. In addition, knowing the message size early on in the transmission will make it easier to allocate memory on the receiving end.
Interesting there is no clear answer here. #2 is generally safe over TCP, and is done "in the real world" quite often. This is because TCP guarantees that all data arrives both uncorrupted* and in the order that it was sent.
*Unless corrupted in such a way that the TCP checksum still passes.
Answering to old message since there is stuff to correnct:
Unlike many answers here claim, TCP does not guarantee data to arrive uncorrupted. Not even practically.
TCP protocol has a 2-byte crc-checksum that obviously has a 1:65536 chance of collision if more than one bit flips. This is such a small chance it will never be encountered in tests, but if you are developing something that either transmits large amounts of data and/or is used by very many end users, that dice gets thrown trillions of times (not kidding, youtube throws it about 30 times a second per user.)
Option 2: size field is the only practical option for the reasons you yourself listed. Fixed length messages would be wasteful, and delimiter marks necessitate running the entire payload through some sort of encoding-decoding stage to replace at least three different symbols: start-symbol, end-symbol, and the replacement-symbol that signals replacement has occurred.
In addition to this one will most likely want to use some sort of error checking with a serious checksum. Probably implemented in tandem with the encryption protocol as a message validity check.
As to the possibility of getting out of sync:
This is possible per message, but has a remedy.
A useful scheme is to start each message with a header. This header can be quite short (<30 bytes) and contain the message payload length, eventual correct checksum of the payload, and a checksum for that first portion of the header itself. Messages will also have a maximum length. Such a short header can also be delimited with known symbols.
Now the receiving end will always be in one of two states:
Waiting for new message header to arrive
Receiving more data to an ongoing message, whose length and checksum are known.
This way the receiver will in any situation get out of sync for at most the maximum length of one message. (Assuming there was a corrupted header with corruption in message length field)
With this scheme all messages arrive as discrete payloads, the receiver cannot get stuck forever even with maliciously corrupted data in between, the length of arriving payloads is know in advance, and a successfully transmitted payload has been verified by an additional longer checksum, and that checksum itself has been verified. The overhead for all this can be a mere 26 byte header containing three 64-bit fields, and two delimiting symbols.
(The header does not require replacement-encoding since it is expected only in a state whout ongoing message, and the entire 26 bytes can be processed at once)
There is a fourth alternative: a self-describing protocol such as XML.
Is there a way to capture only the data layer and disregard the upper layers in wireshark? If not, is there a different packet dump utility that can do this? PREFERABLY 1 file per packet!
What I am looking for: A utility that dumps only the data (the payload) layer to a file.
This is programming related...! What I really want to do is to compare all of the datagrams in order to start to understand a third party encoding/protocol. Ideally, and what would be great, would be a hex compare utility that compares multiple files!
You should try right-clicking on a packet and select "Follow TCP Stream". Then you can save the TCP communication into a raw file for further processing. This way you won't get all the TCP/IP protocoll junk.
There is a function to limit capture size in Wireshark, but it seems that 68bytes is the smallest value. There are options to starting new files after a certain number of kilo, mega, gigabytes, but again the smallest is 1-kilobyte, so probably not useful.
I would suggest looking at the pcap library and rolling your own. I've done this in the past using the PERL Net::Pcap library, but it could easily be done it other languages too.
If you have Unix/Linux available you might also look into tcpdump. You can limit amount of data captured with -s. For example "-s 14" would typically get you the Ethernet header, which I assume is what you mean by the datalink layer. There are also options for controlling how often files are created by specifying file size with -C. So theoretically if you set the file size to the capture size, you'll get one file per packet.
Using tshark I was able to print data only, by decoding as telnet and printing field telnet.data
tshark -r file.pcap -d tcp.port==80,telnet -T fields -e telnet.data
GET /test.js HTTP/1.1\x0d\x0a,User-Agent: curl/7.35.0\x0d\x0a,Host: 127.0.0.1\x0d\x0a,Accept: */*\x0d\x0a,\x0d\x0a
HTTP/1.1 404 Not Found\x0d\x0a,Server: nginx/1.4.6 (Ubuntu)\x0d\x0a,Date: Fri, 15 Jan 2016 11:32:58 GMT\x0d\x0a,Content-Type: text/html\x0d\x0a,Content-Length: 177\x0d\x0a,Connection: keep-alive\x0d\x0a,\x0d\x0a,<html>\x0d\x0a,<head><title>404 Not Found</title></head>\x0d\x0a,<body bgcolor=\"white\">\x0d\x0a,<center><h1>404 Not Found</h1></center>\x0d\x0a,<hr><center>nginx/1.4.6 (Ubuntu)</center>\x0d\x0a,</body>\x0d\x0a,</html>\x0d\x0a
Not perfect but it was good enough for what I needed, I hope it helps some one.