In a system I have a custom protocol and I would like to implement a Wireshark dissector so that I can use Wireshark to analyze the communication.
Objects are sent over the protocol, let us call them "Messages". Each message can be large, maybe up to 100 MB, they can also be very small for example 50 byte.
Each Message is split up in chunks of around 1 kB and tagged with a sequence number and a guid message id and those can be used at the other end to reassemble the messages.
So far I have successfully made a dissector that will individually log all chunks to Wireshark but I want to take this further and also log all messages (chunks assembled into messages) in Wireshark. Can this be done and how? Is it maybe possible to implement a dissector on top of the dissector I have developed below?
If it is possible to implement a dissector on top of the one below, how can I make sure it will only analyze myproto PDUs? The dissector below triggers on a specific tcp port, but that is not going to work for the second phase dissector...
myproto_proto = Proto("myproto", "My Protocol")
function myproto_proto.dissector(buffer, pinfo, tree)
pinfo.cols.protocol = "myproto"
local message_length = buffer(0, 4):le_uint()+4
if message_length>pinfo.len then
pinfo.desegment_len = message_length
pinfo.desegment_offset = 0
return;
end
local subtree = tree:add(myproto_proto, buffer(), "My Protocol Data")
local packet = subtree:add(buffer(0, message_length), "Chunk")
packet:add(buffer(0, 4), "Packet length: " .. buffer(0, 4):le_uint())
packet:add(buffer(32, 16), "Message ID")
packet:add(buffer(48, 4), "Block ID: " .. buffer(48, 4):le_uint())
packet:add(buffer(52, 4), "Max Block ID: " .. buffer(52, 4):le_uint())
packet:add(buffer(68, message_length-68-20), "Data")
pinfo.desegment_len = 0
pinfo.desegment_offset = message_length
return
end
tcp_table = DissectorTable.get("tcp.port")
tcp_table:add(1234, myproto_proto)
Let's say you have created a second dissector msgproto. Since you don't seem to have any multiplexing between chunks and messages, you don't need to setup a dissector table. Instead, at the end of myproto_proto.dissector you do a
msgproto.dissector:call(buffer(68, message_length-68-20):tvb, pinfo, tree)
This will pass all the chunk data to your msgproto . In the message protocol dissector you can use the chunk protocol's fields and of course the tvb that will contain just the data of one chunk. You will now need to piece together the chunks to one gloriously huge tvb. Make the msgprotohave state:
local stateMap = {}
function msgproto.init()
stateMap = {}
end
Convert your the tvb into a ByteArray and store into the stateMap together with the arrays from the other calls to your dissector. When you have assembled all your data into one array, let's call it oarr, make a tvb out of it:
local otvb = ByteArray.tvb(oarr, "message data")
-- starting from 0, need to pass a tvb range, not a tvb.
stree:add(msgproto.fields.payload, otvb(0))
supposed you have payload field of type Protofield.bytes. This code will make a new data pane called "message data" appear next to your regular "Frame" pane at the bottom of your Wireshark window.
I'm not sure how well Wireshark will like your ultra large buffers. Apart from that, I'm quite convinced that the approach outlined above will work. I have not used the described techniques in this exact order, but I have used all the described techniques, for example to make a new byte pane filled with data deflated in pure Lua.
Related
I am writing a custom Wireshark Lua dissector. One field in the dissector is a UTF16 string. I tried to specify this field with
msg_f = ProtoField.string("mydissector.msg", "msg", base.UNICODE)
local getMsg = buffer(13) -- starting on byte 13
subtree:add_le(m.msg_f, getMsg)
However, this only adds the first character rather than the whole string. It also raises an Expert Info warning undecoded trailing/stray characters.
What is the correct way to parse a UTF16 string?
You haven't specified the range of bytes that comprises the string. This is typically determined by either an explicit length field or by a NULL-terminator. The exact method of determining the range is dependent upon the particular protocol and field in question.
An example of each type:
If there's a length field, say of 1 byte in length that precedes the string, then you can use something like:
local str_len = buffer(13, 1):le_uint()
subtree:add_le(m.msg_len_f, buffer(13))
if str_len > 0 then
subtree:add_le(m.msg_f, buffer(14, str_len))
end
And if the string is NULL-terminated, you can use something like:
local str = buffer(13):stringz()
local str_len = str:len()
subtree:add_le(m.msg_f, buffer(13, str_len + 1))
These are just pseudo-examples, so you'll need to apply whatever method, possibly none of these, to fit your data.
Refer to the Wireshark's Lua API Reference Manual for more details, or to the Wireshark LuaAPI wiki pages.
The solution I came up with is simply:
msg_f = ProtoField.string("mydissector.msg", "msg")
local getMsg = buffer(13) -- starting on byte 13
local msg = getMsg:le_ustring()
subtree:add(msg_f, getMsg, msg)
This question is based off a previous question I asked concerning a similar topic: Lua: Working with Bit32 Library to Change States of I/O's . I'm trying to use a Lua program that, when a PLC changes the state of a coil at a given address (only two addresses will be used) then it triggers a reaction in another piece of equipment. I have some code that is basically the exact same as my previous topic. But this has to do with what this code is actually doing and not so much the bit32 library. Usually I run code I don't in understand in my Linux IDE and slowly make changes until I finally can make sense of it. But this is producing some weird reactions that I can't make sense of.
Code example:
local unitId = 1
local funcCodes = {
readCoil = 1,
readInput = 2,
readHoldingReg = 3,
readInputReg = 4,
writeCoil = 5,
presetSingleReg = 6,
writeMultipleCoils = 15,
presetMultipleReg = 16
}
local function toTwoByte(value)
return string.char(value / 255, value % 255)
end
local coil = 1
local function readCoil(s, coil)
local req = toTwoByte(0) .. toTwoByte(0) .. toTwoByte(6) .. string.char(unitId, funcCodes.readCoil) .. toTwoByte(coil - 1) .. toTwoByte(1)
s:write(req) --(s is the address of the I/O module)
local res = s:read(10)
return res:byte(10) == 1 -- returns true or false if the 10th bit is ==1 I think??? Please confirm
end
The line that sets local req is the part I'm truly not making sense of. Because of my earlier post, I understand fully about the toTwoByte function and was quickly refreshed on bits & byte manipulation (truly excellent by the way). But that particular string is the reason for this confusion. If I run this in the demo at lua.org I get back an error "lua number has no integer representation". If I separate it into the following I am given back ascii characters that represent those numbers (which I know string.char returns the ascii representation of a given digit). If I run this in my Linux IDE, it displays a bunch of boxes, each containing four digits; two on top of the other two. Now it is very hard to distinguish all of the boxes and their content as they are overlapping.
I know that there is a modbus library that I may be able to use. But I would much rather prefer to understand this as I'm fairly new to programming in general.
Why do I receive different returned results from Windows vs Linux?
What would that string "local req" look like when built at this point to the I/O module. And I don't understand how this req variable translates into the proper string that contains all of the information used to read/write to a given coil or register.
If anyone needs better examples or has further questions that I need to answer, please let me know.
Cheers!
ETA: This is with the Modbus TCP/IP Protocol, not RTU. Sorry.
This is my first time working with Lua, but not with programming. I have experience in Java, Action Script, and HTML. I am trying to create an addon for Elder Scroll Online. I managed to find the ESO API at the following link:
http://wiki.esoui.com/API#Player_Escorting
I am trying to make a function that returns a count of how many items each guild member has deposited in the bank. The code I have so far is as follows
function members()
for i=0, GetNumGuildEvents(3, GUILD_EVENT_BANKITEM_ADDED)
do
GetGuildEventInfo(3, GUILD_EVENT_BANKITEM_ADDED, i)
end
I am having trouble referencing the character making the specific deposit. Once I am able to do that I foresee making a linked list storing character names and an integer/double counter for the number of items deposited. If anyone has an idea of how to reference the character for a given deposit it would be much appreciated.
I don't have the game to test and the API documentation is sparse, so what follows are educated guesses/tips/hints (I know Lua well and programmed WoW for years).
Lua supports multiple assignment and functions can return multiple values:
function foo()
return 1, "two", print
end
local a, b, c = foo()
c(a,b) -- output: 1, "two"
GetGuildEventInfo says it returns the following:
eventType, secsSinceEvent, param1, param2, param3, param4, param5
Given that this function applies to multiple guild event types, I would expect param1 through param5 are specific to the particular event you're querying. Just print them out and see what you get. If you have a print function available that works something like Lua's standard print function (i.e. accepts multiple arguments and prints them all), you can simple write:
print(GetGuildEventInfo(3,GUILD_EVENT_BANKITEM_ADDED,i))
To print all its return values.
If you don't have a print, you should write one. I see the function LogChatText which looks suspiciously like something that would write text to your chat window. If so, you can write a Lua-esque print function like this:
function print(...)
LogChatText(table.concat({...}, ' '))
end
If you find from your experimentation that, say, param1 is the name of the player making the deposit, you can write:
local eventType, secsSinceEvent, playerName = GetGuildEventInfo(3,GUILD_EVENT_BANKITEM_ADDED, i)
I foresee making a linked list storing character names and an integer/double counter for the number of items deposited.
You wouldn't want to do that with a linked list (not in Lua, Java nor ActionScript). Lua is practically built on hashtables (aka 'tables'), which in Lua are very powerful and generalized, capable of using any type as either key or value.
local playerEvents = {} -- this creates a table
playerEvents["The Dude"] = 0 -- this associates the string "The Dude" with the value 0
print(playerEvents["The Dude"]) -- retrieve the value associated with the string "The Dude"
playerEvents["The Dude"] = playerEvents["The Dude"] + 1 -- this adds 1 to whatever was previous associated with The Dude
If you index a table with a key which hasn't been written to, you'll get back nil. You can use this to determine if you've created an entry for a player yet.
We're going to pretend that param1 contains the player name. Fix this when you find out where it's actually located:
local itemsAdded = {}
function members()
for i=0, GetNumGuildEvents(3, GUILD_EVENT_BANKITEM_ADDED ) do
local eventType, secsSinceEvent, playerName = GetGuildEventInfo(3, GUILD_EVENT_BANKITEM_ADDED, i)
itemsAdded[playerName] = (itemsAdded[playerName] or 0) + 1
end
end
itemsAdded now contains the number of items added by each player. To print them out:
for name, count in pairs(itemsAdded) do
print(string.format("Player %s has added %d items to the bank.", name, count))
end
I noticed that Boost spirit offers some limits, in a question here on SO there is an user asking for help about boost spirit and the other user who gave the answer specified that boost spirit works well with statements and not with "generic text" ( I'm sorry if I don't recall it correctly ).
Now I would like to think about Postscript and PDF in terms of tokens and simplify my approach to this formats this way, the problem is that the PDF is kind of a mix between a markup language and a programming language with jumps and tables in it, and I can't think about something similar when considering the most popular file formats like XML, C++ code and others languages and formats.
There is also another fact: I can't really find people that had some kind of experience with boost::spirit wiriting a pdf parser or writer, so I'm asking, boost::spirit it's capable of parsing a PDF file and output the elements as tokens ?
Although this has nothing to do with Boost, let me assure you that the parsing of PDF (and PostScript) are about as trivial as you could want. Let's say that you have a scanner object that returns a series of tokens. The token types you will get from the scanner are:
String
Dict begin (<<)
Dict End (>>)
Name (/whatever)
Number
Hex array
Left Angle (<)
Right Angle (>)
Array begin ([)
Array end (])
Procedure begin ({)
Procedure end (})
Comment (%foo)
Word
My scanner is a finite-state automata with states for Start, Comment, String, HexArray, Token, DictEnd, and Done.
The way you parse PDF is not by parsing it, but by executing it. Given these tokens, my "parser" looks like this (in C#):
while (true) {
MLPdfToken = scanner.GetToken();
if (token == null)
return MachineExit.EndOfFile;
PdfObject obj = PdfObject.FromToken(token);
PdfProcedure proc = obj as PdfProcedure;
if (proc != null)
{
if (IsExecuting())
{
if (token.Type == PdfTokenType.RBrace)
proc.Execute(this);
else
Push(obj);
}
else {
proc.Execute(this);
}
if (proc.IsTerminal)
return Machine.ParseComplete;
}
else {
Push(obj);
}
}
I'll also add that if you give every PdfObject an Execute() method such that the base class implementation is machine.Push(this) and IsTerminal that returns false, the REPL gets easier:
while (true) {
MLPdfToken = scanner.GetToken();
if (token == null)
return MachineExit.EndOfFile;
PdfObject obj = PdfObject.FromToken(token);
if (IsExecuting())
{
if (token.Type == PdfTokenType.RBrace)
obj.Execute(this);
else
Push(obj);
}
else {
obj.Execute(this);
if (obj.IsTerminal)
return Machine.ParseComplete;
}
}
There's more support in Machine - Machine has a Stack of PdfObject and a few methods for accessing it (Push, Pop, Mark, CountToMark, Index, Dup, Swap), as well as ExecProcBegin and ExecProcEnd.
Beyond that, it's very light. The only thing that is slightly odd is that PdfObject.FromToken takes a token and if it is a primitive type (number, string, name, hex, bool) returns a corresponding PdfObject. Otherwise, it takes the given token and looks in a "proc set" dictionary of procedure names associated with PdfProcedure objects. So when you encounter the token << that gets looked up in a the proc set and comes up with this code:
void DictBegin(PdfMachine machine)
{
machine.Push(new PdfMark(PdfMarkType.Dictionary));
}
So << really means "mark the stack as the start of a dictionary. >> gets more interesting:
void DictEnd(PdfMachine machine)
{
PdfDict dict = new PdfDict();
// PopThroughMark pops the entire stack up to the first matching mark,
// throws an exception if it fails.
PdfObject[] arr = machine.PopThroughMark(PdfMarkType.Dictionary);
if ((arr.Length & 1) != 0)
throw new PdfException("dictionaries need an even number of objects.");
for (int i=0; i < arr.Length; i += 2)
{
PdfObject key = arr[i], val = arr[i + 1];
if (key.Type != PdfObjectType.Name)
throw new PdfException("dictionaries need a /name for the key.");
dict.put((PdfName)key, val);
}
machine.Push(dict);
}
So >> Pops up to the nearest dictionary mark into an array then puts each pair into the dictionary. Now, I could have done this without allocating the array. I could just pop pairs, putting them into the dictionary until I either hit the mark, fail to get a name or underflow the stack.
The important takeaway is that there really isn't any syntax in PDF, nor is there any in PostScript. At least not so much as you'd notice. The only real Syntax (and the read-eval-(push) loop shows it) is '}'.
So when you this is a PDF 14 0 obj << /Type /Annot /SubType /Square >> endobj what your really seeing is a series of procedures:
Push 14
Push 0
Execute obj (Pop two numbers and push a "definition" object).
Execute dictionary begin
Push /Type
Push /Annot
Push /SubType
Push /Square
Execute dictionary end
Execute endobj (pop the top object and then get (not pop) the next one. If the second is a definition, set its "value" to the first object, else throw).
Since "endobj" is terminal, parsing ends and the top of the stack is the result.
So when you are asked to look up object 14 in the PDF, the cross-reference table tells you where to seek to, you make a new Machine with the stream pointer at that location and run it. If the top of the stack is a "definition" object, you've succeeded.
About now you should be nodding but not trusting me, since you're thinking about PDF streams, which look like this:
<< [/key value]* >> stream ...raw data... endstream endobj
Again, there is no syntax. The proc stream looks at the top of the stack, which should be a PdfDict. If it is, it consumes characters until the next newline (scanner does this), stores the current file position in the stream as data start, reads the stream length from the dict (which may cause another Machine to get newed up), and skips past the end of stream and pushes the new stream object on the stack. endstream is a no-op. The only difference between a PdfDict and a PdfStream is that a PdfStream has a start position and a bool saying that it's a stream, otherwise I dual-purpose the object.
PostScript is almost identical except that the execution environment is a little more complex. For example, you need several stacks in your machine: a parameter stack, a dictionary stack, and an execution stack. From there, you more or less just bind your tokenizer into the set of primitive procedures as well as the word exec, and then most of your interpreter is written in PS itself.
If you're talking about boost, you're looking at C++, which means that you can't be as fast and loose with memory as I am, so you'll want to either use smart pointers or figure out where you scope is and be careful to dispose objects instead of blithely throwing them away, but that's just the normal C++ stuff.
Currently, I make PDF tools for my company in .NET, but in a former life I worked on Acrobat versions 1-4, and most of what I described is exactly what Acrobat did under the hood (well, more or less - it was C, not C++, but it's the same approach).
With respect to the xref table (or xref stream), you read that first - the spec tells you that if you jump to EOF and scan back, you find the start of the xref table. You parse that (which is a CS 101 assignment), parse the trailer, seek to the /Prev if any and repeat until no more /Prev entries. That gives you a complete xref for looking up objects.
As for writing - there are a number of approaches that you can take. The most obvious one is that when an object is meant to be referenced, you create a new reference object by assigning the newest available xref entry to it. Whenever objects refer to other objects for writing, they ask if these objects are referenced. If they are, they write the reference (ie, 14 0 R). When it comes time to write a referenced object, you get the current stream pointer and store it in the xref, then write <objnum> <generation> obj <object contents> endobj. For example, my code to write a dictionary looks like this:
public override ToStream(PdfStreamingContext context)
{
if (context.HasReference(this)) // is object referenced in xref
{
PdfUtils.WriteObjectDefinitionBegin(this, context);
}
context.Writer.Indent();
context.Writer.WriteLine("<<");
WriteContents(context);
context.Writer.Exdent();
context.Writer.Writeline(">>");
if (context.HasReference(this))
{
PdfUtils.WriteObjectDefinitionEnd(this, context);
}
}
I've chopped out some chaff so you can see the wheat underneath. The context is an object that holds a new xref table as well as an object for writing to streams that automagically handles appropriate newline discipline, indentation, line wrapping, and so on.
What you should see is that the basics here are straight forward, if not trivial. And now's when you should be asking yourself the question, "if it's trivial, how come there isn't more (serious) competition for Acrobat in the market? The answer is that even though it's trivial, it's still easy to write PDFs that aren't spec compliant and Acrobat handles most of those. The real challenge is to be able to honor the spec and make sure that you include all required values in a dictionary and that they are in range and semantically correct. Hell, even the date time format--which is pretty well-specified--is a mound of special case code in my library to manage where other people have screwed it up royally. Being able to generate consistently correct PDF is hard and consuming the garbage in the sea of PDFs in the world is harder.
I could (and probably should) write a book about how to do this. While a lot of the fringe code is grubby, the overall structure can be very pretty.
tl;dr - If you're thinking of a recursive descent parser for PDF, you're thinking too hard. All you need is a tokenizer and a simple REPL.
I'm trying to write a dissector for the Safari Remote Debug protocol which is based on bplists and have been reasonably successful (current code is here: https://github.com/andydavies/bplist-dissector).
I'm running into difficultly with reassembling packets though.
Normally the protocol sends a packet with 4 bytes containing the length of the next packet, then the packet with the bplist in.
Unfortunately some packets from the iOS simulator don't follow this convention and the four bytes are either tagged onto the front of the bplist packet, or onto the end of the previous bplist packet, or the data is multiple bplists.
I've tried reassembling them using desegment_len and desegment_offset as follows:
function p_bplist.dissector(buf, pkt, root)
-- length of data packet
local dataPacketLength = tonumber(buf(0, 4):uint())
local desiredPacketLength = dataPacketLength + 4
-- if not enough data indicate how much more we need
if desiredPacketLen > buf:len() then
pkt.desegment_len = dataPacketLength
pkt.desegment_offset = 0
return
end
-- have more than needed so set offset for next dissection
if buf:len() > desiredPacketLength then
pkt.desegment_len = DESEGMENT_ONE_MORE_SEGMENT
pkt.desegment_offset = desiredPacketLength
end
-- copy data needed
buffer = buf:range(4, dataPacketLen)
...
What I'm attempting to do here is always force the size bytes to be the first four bytes of a packet to be dissected but it doesn't work I still see a 4 bytes packet, followed by a x byte packet.
I can think of other ways of managing the extra four bytes on the front, but the protocol contains a lookup table thats 32 bytes from the end of the packet so need a way of accurately splicing the packet into bplists.
Here's an example cap: http://www.cloudshark.org/captures/2a826ee6045b #338 is an example of a packet where the bplist size is at the start of the data and there are multiple plists in the data.
Am I doing this right (looking other questions on SO, and examples around the web I seem to be) or is there a better way?
TCP Dissector packet-tcp.c has tcp_dissect_pdus(), which
Loop for dissecting PDUs within a TCP stream; assumes that a PDU
consists of a fixed-length chunk of data that contains enough information
to determine the length of the PDU, followed by rest of the PDU.
There is no such function in lua api, but it is a good example how to do it.
One more example. I used this a year ago for tests:
local slicer = Proto("slicer","Slicer")
function slicer.dissector(tvb, pinfo, tree)
local offset = pinfo.desegment_offset or 0
local len = get_len() -- for tests i used a constant, but can be taken from tvb
while true do
local nxtpdu = offset + len
if nxtpdu > tvb:len() then
pinfo.desegment_len = nxtpdu - tvb:len()
pinfo.desegment_offset = offset
return
end
tree:add(slicer, tvb(offset, len))
offset = nxtpdu
if nxtpdu == tvb:len() then
return
end
end
end
local tcp_table = DissectorTable.get("tcp.port")
tcp_table:add(2506, slicer)