Writing a Wireshark dissector to count number of TCP flows

I have a very large tcpdump file that I split into 1-minute intervals. Using tshark in a loop, I can extract TCP statistics for each of the 1-minute files and save the results as a CSV file for further analysis in Excel. Now I want to count the number of TCP flows in each 1-minute file and save the counts in a CSV file. A TCP flow here means a group of packets going from a specific source to a specific destination. Each flow has statistics such as source IP, dest IP, #packets from A->B, #bytes from A->B, #packets from B->A, #bytes from B->A, total packets, total bytes, etc., but I just want the number of TCP flows in each of the 1-minute files. From what I've read so far, it seems I need to create a dissector to do that. Can anyone give me pointers or code on how to get started? Thanks.

Tshark has a command to dump all of the necessary information: tshark -qz conv,tcp -r FILE. This writes one line per flow (plus a header and footer) so to count the flows just count the lines and subtract the header/footer.
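As a rough illustration of the line-counting idea, here is a minimal Python sketch. It assumes that every conversation row in the report, and no header or footer row, contains the "<->" separator between the two endpoints, which saves subtracting a fixed number of lines. The function names and the sample text are made up for illustration; the sample only imitates the shape of tshark's report.

```python
import subprocess


def count_tcp_flows(conv_output: str) -> int:
    """Count conversation rows in the text printed by `tshark -q -z conv,tcp`.

    Assumption: each conversation row contains "<->" and no other row does.
    """
    return sum(1 for line in conv_output.splitlines() if "<->" in line)


def count_tcp_flows_in_file(pcap_path: str) -> int:
    """Run tshark on one capture file and count its TCP conversations."""
    result = subprocess.run(
        ["tshark", "-q", "-z", "conv,tcp", "-r", pcap_path],
        capture_output=True, text=True, check=True,
    )
    return count_tcp_flows(result.stdout)


# Made-up sample shaped like tshark's report, for illustration only.
SAMPLE = """\
================================================================================
TCP Conversations
Filter:<No Filter>
                                   |       <-      | |       ->      | |     Total     |
                                   | Frames  Bytes | | Frames  Bytes | | Frames  Bytes |
10.0.0.1:50000 <-> 10.0.0.2:80          5     500       6     600      11    1100
10.0.0.3:50001 <-> 10.0.0.2:443         2     120       3     180       5     300
================================================================================
"""

print(count_tcp_flows(SAMPLE))
```

Calling count_tcp_flows_in_file once per 1-minute file and writing each result as a CSV row would give the per-file flow counts without writing a tap at all.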

Not a dissector, but a tap. See the Wireshark README.tapping document, and see the TShark iousers tap for a, sadly, not at all simple example in C.
It's also possible to write taps in Lua; see, for example, the Lua/Taps page in the Wireshark Wiki and the Lua Support in Wireshark section of the Wireshark User's Manual.
The C structure passed to TCP taps for each packet is:
/* the tcp header structure, passed to tap listeners */
typedef struct tcpheader {
    guint32 th_seq;
    guint32 th_ack;
    gboolean th_have_seglen; /* TRUE if th_seglen is valid */
    guint32 th_seglen;
    guint32 th_win; /* make it 32 bits so we can handle some scaling */
    guint16 th_sport;
    guint16 th_dport;
    guint8 th_hlen;
    guint16 th_flags;
    guint32 th_stream; /* this stream index field is included to help differentiate when address/port pairs are reused */
    address ip_src;
    address ip_dst;

    /* This is the absolute maximum we could find in TCP options (RFC2018, section 3) */
    #define MAX_TCP_SACK_RANGES 4
    guint8 num_sack_ranges;
    guint32 sack_left_edge[MAX_TCP_SACK_RANGES];
    guint32 sack_right_edge[MAX_TCP_SACK_RANGES];
} tcp_info_t;
So, for C-language taps, the "data" argument to the tap listener's "packet" routine points to a structure of that sort.
For Lua taps, the "tapinfo" table passed as the third argument to the tap listener's "packet" routine is described as "a table of info based on the Listener's type, or nil". For a TCP tap, the entries in the table include all the fields in that structure except for sack_left_edge and sack_right_edge; the keys in the table are the structure member names.
The th_stream field identifies the connection; each time the TCP dissector finds a new connection, it assigns a new value. As the comment indicates, "this stream index field is included to help differentiate when address/port pairs are reused", so that if a given connection is closed, and a later connection uses the same endpoints, the two connections have different th_stream values even though they have the same endpoints.
So you'd have a table using the th_stream value as a key. The table would store the endpoints (addresses and ports) and counts of packets and bytes in each direction. For each packet passed to the listener's "packet" routine, you'd look up the th_stream value in the table and, if you don't find it, you'd create a new entry, starting the counts off at zero, and use that new entry; otherwise, you'd use the entry you found. You'd then figure out whether the packet was going from A to B or B to A, and increase the appropriate packet count and byte count.
You'd also keep track of the time stamp. For the first packet, you'd store the time stamp for that packet. For each packet, you'd look at the time stamp and, if it's one minute or more later than the stored time stamp, you'd:
dump out the statistics from the table of connections;
empty out the table of connections;
store the new packet's time stamp, replacing the previous stored time stamp.
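The bookkeeping described above can be sketched in a few lines. This is a plain Python model of the logic, not Wireshark tap code: the class and field names (FlowCounter, Flow, pkts_ab and so on) are hypothetical, the arguments to packet() stand in for values a real tap would take from the packet and the tap data (time stamp, th_stream, ip_src, th_sport, ip_dst, th_dport), and the byte count is assumed to be the frame length.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 60.0


@dataclass
class Flow:
    a: tuple            # (address, port) of the sender of the first packet seen
    b: tuple            # (address, port) of the other endpoint
    pkts_ab: int = 0
    bytes_ab: int = 0
    pkts_ba: int = 0
    bytes_ba: int = 0


class FlowCounter:
    def __init__(self):
        self.flows = {}           # stream index -> Flow, for the current window
        self.window_start = None  # time stamp that opened the current window
        self.reports = []         # (window_start, flows) for each dumped window

    def flush(self):
        """Dump the current table and empty it."""
        if self.window_start is not None:
            self.reports.append((self.window_start, self.flows))
        self.flows = {}

    def packet(self, ts, stream, src, sport, dst, dport, nbytes):
        if self.window_start is None:
            self.window_start = ts            # first packet of the capture
        elif ts - self.window_start >= WINDOW_SECONDS:
            self.flush()                      # a minute or more has passed
            self.window_start = ts
        flow = self.flows.get(stream)
        if flow is None:                      # new connection: counts start at zero
            flow = self.flows[stream] = Flow(a=(src, sport), b=(dst, dport))
        if (src, sport) == flow.a:            # direction A -> B
            flow.pkts_ab += 1
            flow.bytes_ab += nbytes
        else:                                 # direction B -> A
            flow.pkts_ba += 1
            flow.bytes_ba += nbytes


fc = FlowCounter()
fc.packet(0.0, 0, "10.0.0.1", 50000, "10.0.0.2", 80, 60)
fc.packet(0.1, 0, "10.0.0.2", 80, "10.0.0.1", 50000, 1500)
fc.packet(5.0, 1, "10.0.0.3", 50001, "10.0.0.2", 80, 60)
fc.packet(61.0, 2, "10.0.0.1", 50002, "10.0.0.2", 80, 60)
fc.flush()  # dump the last, partial window
for start, flows in fc.reports:
    print(start, len(flows))
```

The number of flows in a window is simply the number of entries in that window's table. Here "A" is taken to be whichever endpoint sent the first packet seen for the stream, which is one possible convention among several.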

Related

Does SCTP really prevent head-of-line blocking?

I've known about SCTP for a decade or so, and although I have never gotten to use it, I've always wanted to, because of some of its promising (purported) features:
multi-homing
multiplexing w/o head-of-line blocking
mixed order/unordered delivery on the same connection (aka association)
no TIME_WAIT
no SYN flooding
A Comparison between QUIC and SCTP however claims
SCTP intended to get rid of HOL-Blocking by substreams, but its
Transmission Sequence Number (TSN) couples together the transmission
of all data chunks. [...] As a result, in SCTP if a packet is lost,
all the packets with TSN after this lost packet cannot be received
until it is retransmitted.
That statement surprised me because:
removing head-of-line blocking is a stated goal of SCTP
SCTP does have a per-stream sequence number, see below quote from RFC 4960, which should allow processing per stream, regardless of the association-global TSN
SCTP has been in use in the telecommunications sector for perhaps close to 2 decades, so how could this have been missed?
Internally, SCTP assigns a Stream Sequence Number to each message
passed to it by the SCTP user. On the receiving side, SCTP ensures
that messages are delivered to the SCTP user in sequence within a
given stream. However, while one stream may be blocked waiting for
the next in-sequence user message, delivery from other streams may
proceed.
Also, there is a paper, Head-of-line Blocking in TCP and SCTP: Analysis and Measurements, that actually measures the round-trip time of a multiplexed echo service in the face of packet loss and concludes:
Our results reveal that [..] a small number of SCTP streams or SCTP unordered mode can avoid this head-of-line blocking. The alternative solution of multiple TCP connections performs worse in most cases.
This answer is not very scholarly, but at least according to the specification in RFC 4960, SCTP seems capable of circumventing head-of-line blocking. The relevant claim seems to be in Section 7.1.
Note: TCP guarantees in-sequence delivery of data to its upper-layer protocol within a single TCP session. This means that when TCP notices a gap in the received sequence number, it waits until the gap is filled before delivering the data that was received with sequence numbers higher than that of the missing data. On the other hand, SCTP can deliver data to its upper-layer protocol even if there is a gap in TSN if the Stream Sequence Numbers are in sequence for a particular stream (i.e., the missing DATA chunks are for a different stream) or if unordered delivery is indicated. Although this does not affect cwnd, it might affect rwnd calculation.
An open question is what "are in sequence for a particular stream" entails. There is a stipulation about delaying delivery to the upper layer until DATA chunks are reordered (see Section 6.6, below), but reordering does not seem to be conditional on filling the gaps at the level of the association. Also note the discussion in Section 6.2 of the subtle distinction between acknowledgement and delivery to the ULP (Upper Layer Protocol).
Whether other stipulations of the RFC indirectly result in the occurrence of HOL blocking, and whether the mechanism is effective in real-life implementations and situations, are questions that warrant further investigation.
Below are some of the excerpts which I've come across in the RFC and which may be relevant.
RFC 4960, Section 6.2 Acknowledgement on Reception of DATA Chunks
When the receiver's advertised window is 0, the receiver MUST drop any new incoming DATA chunk with a TSN larger than the largest TSN received so far. If the new incoming DATA chunk holds a TSN value less than the largest TSN received so far, then the receiver SHOULD drop the largest TSN held for reordering and accept the new incoming DATA chunk. In either case, if such a DATA chunk is dropped, the receiver MUST immediately send back a SACK with the current receive window showing only DATA chunks received and accepted so far. The dropped DATA chunk(s) MUST NOT be included in the SACK, as they were not accepted.
Under certain circumstances, the data receiver may need to drop DATA chunks that it has received but hasn't released from its receive buffers (i.e., delivered to the ULP). These DATA chunks may have been acked in Gap Ack Blocks. For example, the data receiver may be holding data in its receive buffers while reassembling a fragmented user message from its peer when it runs out of receive buffer space. It may drop these DATA chunks even though it has acknowledged them in Gap Ack Blocks. If a data receiver drops DATA chunks, it MUST NOT include them in Gap Ack Blocks in subsequent SACKs until they are received again via retransmission. In addition, the endpoint should take into account the dropped data when calculating its a_rwnd.
These circumstances highlight how senders may receive acknowledgement for chunks which are ultimately not delivered to the ULP (Upper Layer Protocol). Note that this applies to chunks with a TSN higher than the Cumulative TSN (i.e. those reported in Gap Ack Blocks). Together with the unreliability of SACK order, this is a good reason for the stipulation in Section 7.1 (see below).
RFC 4960, Section 6.6 Ordered and Unordered Delivery
Within a stream, an endpoint MUST deliver DATA chunks received with the U flag set to 0 to the upper layer according to the order of their Stream Sequence Number. If DATA chunks arrive out of order of their Stream Sequence Number, the endpoint MUST hold the received DATA chunks from delivery to the ULP until they are reordered.
This is the only stipulation on ordered delivery within a stream in this section; seemingly, reordering does not depend on filling the gaps in ACK-ed chunks.
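To make that reading of Section 6.6 concrete, here is a toy Python model of a receiver that orders delivery per stream by Stream Sequence Number and ignores gaps in the TSN space. It is an illustration of the interpretation above under heavy simplifications (no unordered delivery, no fragmentation, no receive window, no SSN wrap-around), not a description of any real SCTP implementation; the class and attribute names are hypothetical.

```python
from collections import defaultdict


class SctpReceiverSketch:
    """Toy model: ordered delivery per stream, regardless of TSN gaps."""

    def __init__(self):
        self.seen_tsns = set()             # kept only to show TSNs are tracked separately
        self.next_ssn = defaultdict(int)   # stream id -> next SSN to deliver
        self.held = defaultdict(dict)      # stream id -> {ssn: payload} awaiting reordering
        self.delivered = []                # (stream id, ssn, payload) in delivery order

    def receive(self, tsn, stream, ssn, payload):
        self.seen_tsns.add(tsn)            # would feed SACK generation; not used for delivery
        self.held[stream][ssn] = payload
        # Deliver as long as the next in-sequence SSN for THIS stream is available.
        while self.next_ssn[stream] in self.held[stream]:
            n = self.next_ssn[stream]
            self.delivered.append((stream, n, self.held[stream].pop(n)))
            self.next_ssn[stream] = n + 1


rx = SctpReceiverSketch()
# TSN 1 (stream 0, SSN 0) is lost in transit; TSNs 2..4 arrive.
rx.receive(2, stream=1, ssn=0, payload="b0")
rx.receive(3, stream=0, ssn=1, payload="a1")   # held: stream 0 is still waiting for SSN 0
rx.receive(4, stream=1, ssn=1, payload="b1")
before_retransmit = list(rx.delivered)
# The retransmission of TSN 1 finally arrives and unblocks stream 0.
rx.receive(1, stream=0, ssn=0, payload="a0")
print(before_retransmit)
print(rx.delivered)
```

In this model stream 1 is delivered in full while stream 0 waits for the retransmission, which is the behaviour the Section 7.1 note describes. Whether a given implementation behaves this way under receive-buffer pressure is exactly the open question raised above.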
RFC 4960, Section 7.1 SCTP Differences from TCP Congestion Control
Gap Ack Blocks in the SCTP SACK carry the same semantic meaning as the TCP SACK. TCP considers the information carried in the SACK as advisory information only. SCTP considers the information carried in the Gap Ack Blocks in the SACK chunk as advisory. In SCTP, any DATA chunk that has been acknowledged by SACK, including DATA that arrived at the receiving end out of order, is not considered fully delivered until the Cumulative TSN Ack Point passes the TSN of the DATA chunk (i.e., the DATA chunk has been acknowledged by the Cumulative TSN Ack field in the SACK).
This is stated from the perspective of the sending endpoint, and is accurate for the reason highlighted under Section 6.2 above: chunks acknowledged in Gap Ack Blocks may still be dropped by the receiver.
Note: TCP guarantees in-sequence delivery of data to its upper-layer protocol within a single TCP session. This means that when TCP notices a gap in the received sequence number, it waits until the gap is filled before delivering the data that was received with sequence numbers higher than that of the missing data. On the other hand, SCTP can deliver data to its upper-layer protocol even if there is a gap in TSN if the Stream Sequence Numbers are in sequence for a particular stream (i.e., the missing DATA chunks are for a different stream) or if unordered delivery is indicated. Although this does not affect cwnd, it might affect rwnd calculation.
This seems to be the core answer to what interests you.
In support of this argument, see the format of the SCTP SACK chunk as shown here and here.

Counting packets in Wireshark

Is it possible to redo the packet numbering in Wireshark? For example, I have filtered packets down to one side:
So the numbers are (they are not consecutive because of filtering):
416,419,420,423,424,426,427.
But I would like to number them like this, line by line:
1,2,3,4,5,6,7
The reason is that it would be easier to count all the packets. I know tshark has the statistical operation COUNT, but for quick counting this would be a lot better.
You can export the displayed packets into a new file via File -> Export Specified Packets... -> All packets: Displayed. The new capture file will contain sequentially numbered packets starting from 1.
But if you just want to know how many displayed packets there are, you could just look at the Wireshark status line where it will indicate the number of displayed packets.
Statistics -> Capture File Properties will also tell you the number of displayed packets.

Writing a Wireshark Lua/Tap to Count the Number of TCP Flows

I previously posted a question entitled "Writing a Wireshark Dissector to Count Number of TCP Flows." I got some feedback to use Lua/Tap instead so I set out to write one but I need assistance with the code. I currently have the following functions that a tap must have:
Listener.new,
listener.packet,
listener.draw,
listener.reset.
To get a better understanding of what I want to do, please review my previous question here:
Writing a Wireshark dissector to count number of TCP flows
My new question is: do I need to write code that does the equivalent of the tshark command
tshark -r 1min.pcap -q -n -z conv,tcp
in Lua/Tap to extract the statistics first, before I write code to count the TCP flows? Or do I only need to write Lua/Tap code that extracts the TCP flow count? In either case, can someone help me with the code? I've searched the web but can't find an example close enough to what I'm looking for that I could customize it to suit what I'm trying to achieve. Thanks.
I don't have time to write code for you, but here's some information copied from an edit I made to the answer to your other question:
The C structure passed to TCP taps for each packet is:
/* the tcp header structure, passed to tap listeners */
typedef struct tcpheader {
    guint32 th_seq;
    guint32 th_ack;
    gboolean th_have_seglen; /* TRUE if th_seglen is valid */
    guint32 th_seglen;
    guint32 th_win; /* make it 32 bits so we can handle some scaling */
    guint16 th_sport;
    guint16 th_dport;
    guint8 th_hlen;
    guint16 th_flags;
    guint32 th_stream; /* this stream index field is included to help differentiate when address/port pairs are reused */
    address ip_src;
    address ip_dst;

    /* This is the absolute maximum we could find in TCP options (RFC2018, section 3) */
    #define MAX_TCP_SACK_RANGES 4
    guint8 num_sack_ranges;
    guint32 sack_left_edge[MAX_TCP_SACK_RANGES];
    guint32 sack_right_edge[MAX_TCP_SACK_RANGES];
} tcp_info_t;
So, for C-language taps, the "data" argument to the tap listener's "packet" routine points to a structure of that sort.
For Lua taps, the "tapinfo" table passed as the third argument to the tap listener's "packet" routine is described as "a table of info based on the Listener's type, or nil". For a TCP tap, the entries in the table include all the fields in that structure except for sack_left_edge and sack_right_edge; the keys in the table are the structure member names.
The th_stream field identifies the connection; each time the TCP dissector finds a new connection, it assigns a new value. As the comment indicates, "this stream index field is included to help differentiate when address/port pairs are reused", so that if a given connection is closed, and a later connection uses the same endpoints, the two connections have different th_stream values even though they have the same endpoints.
So you'd have a table using the th_stream value as a key. The table would store the endpoints (addresses and ports) and counts of packets and bytes in each direction. For each packet passed to the listener's "packet" routine, you'd look up the th_stream value in the table and, if you don't find it, you'd create a new entry, starting the counts off at zero, and use that new entry; otherwise, you'd use the entry you found. You'd then figure out whether the packet was going from A to B or B to A, and increase the appropriate packet count and byte count.
You'd also keep track of the time stamp. For the first packet, you'd store the time stamp for that packet. For each packet, you'd look at the time stamp and, if it's one minute or more later than the stored time stamp, you'd:
dump out the statistics from the table of connections;
empty out the table of connections;
store the new packet's time stamp, replacing the previous stored time stamp.
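If only the flow count per minute is needed, the table can shrink to a set of stream indexes. The following Python sketch models that reduced logic and writes the counts as CSV. It is not Wireshark tap code: the function names and column headers are hypothetical, and the input is assumed to be (time stamp, th_stream) pairs in capture order, as a tap's "packet" routine would see them.

```python
import csv
import io


def flow_counts_per_window(packets, window_seconds=60.0):
    """Return [(window_start, number_of_distinct_streams), ...].

    packets: iterable of (timestamp, stream_index) pairs in capture order.
    """
    rows = []
    window_start = None
    streams = set()
    for ts, stream in packets:
        if window_start is None:
            window_start = ts                       # first packet opens the first window
        elif ts - window_start >= window_seconds:
            rows.append((window_start, len(streams)))
            streams = set()                         # empty the table
            window_start = ts                       # new packet opens the next window
        streams.add(stream)
    if window_start is not None:
        rows.append((window_start, len(streams)))   # last, possibly partial, window
    return rows


def to_csv(rows):
    """Render the rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["window_start", "tcp_flows"])
    writer.writerows(rows)
    return out.getvalue()


packets = [(0.0, 0), (0.5, 0), (10.0, 1), (59.9, 2), (60.0, 0), (75.0, 3)]
rows = flow_counts_per_window(packets)
print(to_csv(rows), end="")
```

Note that a connection that spans two windows (stream 0 in the sample) is counted once in each window, which matches what running the statistics separately on each 1-minute file would give.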

How to access other layers' fields in a Lua tap listener, or tap a higher layer?

I want to compute some statistics on a trace with Lua. Each IP packet can carry multiple TCAP messages, and each TCAP message may carry multiple CAP operations, like
IP {[SCTP-M3UA-SCCP-TCAP-CAP,CAP] [SCTP-M3UA-SCCP-TCAP-CAP,CAP,CAP]}
Now I want to access the whole tree, or somehow iterate over the TCAP layers, in a Lua listener tap. The purpose of this iteration is something like Follow TCP Stream: the transaction ID is kept in the TCAP layer, while the operations and parameters to be considered are in the sequence of CAMEL (CAP) layers.
How can I access the dissection tree in a listener tap, or dissect the upper layers if I get the lower layer's data part?
For example, the node ID comes in the first operation of a new session in the highest layer (CAP), alongside other sessions in the same packet. Then another parameter that needs to be counted comes in a later operation/packet, and the same TID in TCAP has to be checked to be sure it belongs to the same node.
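This is not an answer to your question, only a tip.
Collect the field extractor's results into an array instead of taking a single value: calling the extractor returns every occurrence of the field in the packet.
For example:
-- Create the field extractor at load time, outside the packet callback.
local diaSessionIdExtr = Field.new("diameter.Session-Id")
local tap_diameter = Listener.new("frame", "diameter && !tcp.analysis.retransmission && !tcp.analysis.lost_segment")

function tap_diameter.packet(pinfo, tvb, tapinfo)
    -- Wrapping the call in {} gathers all returned occurrences into an array.
    local answers = { diaSessionIdExtr() } -- this is how to do it
    for _, fieldinfo in ipairs(answers) do
        debug(tostring(fieldinfo))
    end
end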

Wireshark dissect function

When writing a dissector in Wireshark, is the dissect function in the dissector's source called on each packet in order, only once?
What could be possible reasons for tree values changing as I click on packets multiple times?
It is called once when the packet is first processed, to display the high-level information:
if (check_col(pinfo->cinfo, COL_PROTOCOL))
or
if (check_col(pinfo->cinfo, COL_INFO))
It is called again when showing the body, i.e. when you click on that one packet:
if (tree)
I'd assume that the results of the second call are discarded afterwards: with a large number of packets to decode, keeping the details for each one would be too much overhead. If so, any state the dissector keeps between calls (a static variable, say) would differ from one call to the next, which could explain tree values changing on repeated clicks.
As always, some quick testing (e.g. with a static counter in the dissect function) would show whether this is the case.

Resources