How can I parse info with Logstash?

I have an input:
May 16 12:45:47 host-dev1 kernel: [ 162.648366] wireguard: wg0: Sending keepalive packet to peer 2 (171.12.198.123:51079)
I want to parse the info as: TIMESTAMP "Sending keepalive packet to peer 2" IP:PORT
For the middle sentence I want to parse whatever is after wg0: until the first parenthesis of the port. This sentence can change to "Sending handshake initiation to peer 10" for example.
I've done
filter {
  grok {
    match => { "message" => "%{SYSLOGBASE:timestamp} %{GREEDYDATA:action} %{IP:peerip}:%{NUMBER:port}" }
  }
}
I need to change GREEDYDATA to something that will specifically parse the mentioned boundaries

Give this a try:
%{SYSLOGBASE:timestamp} \[ ?%{NUMBER:TIMESTAMP} ?\]( %{WORD}:)* %{GREEDYDATA:action} \(%{IP:peerip}:%{NUMBER:port}
Here's a breakdown:
%{SYSLOGBASE:timestamp} - the syslog prefix
\[ ?%{NUMBER:TIMESTAMP} ?\] - the application timestamp
( %{WORD}:)* - any words followed by a colon, like 'wg0:', zero or more times
%{GREEDYDATA:action} - any characters ('DATA' would also work)
\(%{IP:peerip}:%{NUMBER:port} - a literal '(' followed by IP and port
The important thing in making GREEDYDATA / DATA work here is that the boundaries (%{WORD}: and \() are properly defined.
You may need to vary the boundary definitions depending on what other log messages look like (specifically, whether you can rely on the colon and parenthesis at the boundaries).
Depending on whether the existing grok patterns cover your other message formats, a named capture group may help. For example, : (?<notColons>[^:]*) \( specifies "a colon, then a space, then any number of non-colon characters, then a space, then an opening parenthesis".
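To try the boundary idea outside Logstash, here is a minimal Python sketch of roughly the same pattern. It is not grok: the names LINE_RE and parse_line are made up for illustration, and the sub-patterns are simplified stand-ins for SYSLOGBASE, NUMBER and IP that are only assumed to fit lines shaped like the sample in the question.

```python
import re

# Simplified stand-ins for the grok patterns; assumed to fit the sample line only.
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "   # syslog timestamp
    r"(?P<host>\S+) (?P<program>[^:\s]+): "                  # host and program
    r"\[\s*(?P<uptime>\d+\.\d+)\s*\]"                        # kernel timestamp in brackets
    r"(?: \w+:)* "                                           # 'wireguard:', 'wg0:' prefixes
    r"(?P<action>.*?) "                                      # everything up to ' ('
    r"\((?P<peerip>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+)\)$"
)


def parse_line(line):
    """Return the captured fields as a dict, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


sample = ("May 16 12:45:47 host-dev1 kernel: [ 162.648366] wireguard: wg0: "
          "Sending keepalive packet to peer 2 (171.12.198.123:51079)")
fields = parse_line(sample)
print(fields["timestamp"], fields["action"], fields["peerip"], fields["port"])
```

The lazy .*? plays the role of DATA: it only works because the ' (' that follows it is spelled out explicitly.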


What's the escape character in AT-commands?

I'm using a BG96 modem to connect to AWS IoT over MQTT.
I'm trying to set my MQTT Last Will and Testament with the following AT-command:
+QMTCFG:"will",(0-5),(0,1),(0-2),(0,1),"willtopic","willmessage"
Which works great.
But now I'm trying to add a JSON formatted string to "willmessage", so I need to add "" (double quotes) in there, which means I need to escape them in my command. But I have no clue if I can escape them or what the escape character is.
Things I tried: \" (backslash) and "" (double double quotes)
I looked in all of the BG96 datasheets, and I don't see it mentioned anywhere.
I had the same issue while using MQTT commands on a SIMCOM SIM800c, and I noticed that the regular backslash (\) escapes the quotation marks (as it does in C) when communicating directly with the GSM unit via a USB to TTL converter. To implement this in software, I printed the following string to the UART connected to the GSM modem:
AT+SMPUB=\"testTopicPost\",0,1,\"{\x5c\x22Key\x5c\x22 : \x5c\x22Value\x5c\x22}\"
What this basically does is send the raw \ and " characters to the GSM unit. Hope this solution works for you as well.
Escaping of " within a string is covered in chapter 5.4.2.2 String constants in the V.250 standard - which is a MUST READ for anyone writing code handling AT commands (read at least all of chapter 5):
String constants shall consist of a sequence of ... except for the characters """ ... . String constants shall be bounded at the beginning and end by the double-quote character (""" ... . ... The double-quote character, used as the beginning and ending string delimiter, shall be represented within a string constant as "\22".
So the escape mechanism is \22 not \x22. This should be universally the case for all modems and not something that is implementation dependent.
I did not find reference to documentation of MQTT and BG96 and you did not link any of your "all of the BG96 datasheets" so I am just providing example syntax for an imaginary command to send a JSON payload of {"key": "value"}:
AT+SOMECOMMAND=...,"{\22key\22: \22value\22}"
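As a sketch of how such a string constant could be built programmatically, the Python helper below replaces each double quote with \22. The function name v250_quote is hypothetical. It also writes a literal backslash as \5C, which is my reading of the same V.250 chapter and should be checked against the standard and your modem before relying on it.

```python
def v250_quote(text):
    # Build a V.250-style string constant: bounded by double quotes,
    # with " written as \22 and \ written as \5C (the latter is an
    # assumption based on my reading of V.250 5.4.2.2).
    pieces = []
    for ch in text:
        if ch == "\\":
            pieces.append("\\5C")
        elif ch == '"':
            pieces.append("\\22")
        else:
            pieces.append(ch)
    return '"' + "".join(pieces) + '"'


payload = '{"key": "value"}'
# AT+SOMECOMMAND is the imaginary command from the answer above.
print("AT+SOMECOMMAND=...," + v250_quote(payload))
```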
Some special characters have different values between the ASCII character set and the GSM character set.
For example:
In the ASCII character set: \ = 0x5C.
In the GSM character set: Ö = 0x5C.
Beyond this point, some special characters must be entered using a specific way, such as a 2-byte representation. I suggest you check the standard/version of AT commands implemented on your hardware (i.e. GSM 07.07, GSM 07.05, manufacturer specific set...).
For example, I'm using a GPS+GPRS modem from Ai-Thinker called A9G. On this one, to use the AT+MQTTPUB command with JSON-formatted data, I need to write \x5c\x32\x32 in place of each double quote. The module then sees \22 and the server sees ".
e.g.
"{\x5c\x32\x32Key\x5c\x32\x32:\x5c\x32\x32Value\x5c\x32\x32}"
In the cloud it arrives as:
{"Key":"Value"}

Send data over TCP/IP with Netcat or Rails

I have an IP of the server and a port on which I'm able to connect via nc on Ubuntu 14.04.
> nc x.x.x.x PORT
In order to communicate with the server, the first step is to send a WAKEUP call and get an acknowledgment. The server expects a 3 byte ID in the wakeup call. The documentation provides an example of the success scenario, in which the ID is sent and the ack is received using a software tool:
The client sends:
<sy><sy><eq>111<et>
And the server responds with:
<sy><ak>A<et><cr>
Here is some detail of <sy>
Within <> brackets is a non-printable ASCII character (<sy> = ASCII 22 or Hex 0x16)
I tried to replicate the exact same scenario but failed. The server doesn't respond to the data I send, although the data is received there. I'm not sure about these tags (<sy><sy><eq> etc.). How do I send the ID (111) along with tags like <sy> correctly?
Also tried to send this data using Rails framework and Bindata ruby gem but don't know how to represent the above format.
netcat is probably the wrong tool for this. Or at least you will want to use some other program to feed it input.
If I were doing this, I would code up something in python or C that would both connect to the server and feed it whatever data I needed to send it (and receive/interpret the responses) leaving out nc altogether. There are many examples on the web.
You can encode the control characters in a byte string in python with the syntax b'\x16' for your <sy> character. Most other languages have an equivalent capability.
I can't be sure exactly what those characters are. It seems likely they are standard ASCII control characters, but they aren't using the standard abbreviations (see http://www.theasciicode.com.ar/ for example). So presumably the documentation you are looking at has a list of the corresponding values. Assuming for the sake of example that <eq> corresponds to the ASCII ENQ character and <et> to the ASCII EOT (and given you already know that <sy> is equivalent to ASCII SYN), your desired string <sy><sy><eq>111<et> can be encoded in a python byte string: b'\x16\x16\x05111\x04'
(or equivalently b'\x16\x16\x05\x31\x31\x31\x04' if you like regularity: the 1 characters are simply ASCII digits, so you can replace each 1 with its hex escape \x31)
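Putting that together, a minimal Python sketch of a client might look like the following. The names build_wakeup and send_wakeup are invented here, and the ENQ/EOT values are the same guesses as above, so check them against the protocol documentation.

```python
import socket

SYN = b"\x16"  # <sy>, value given in the question
ENQ = b"\x05"  # <eq>, ASSUMED to be ASCII ENQ
EOT = b"\x04"  # <et>, ASSUMED to be ASCII EOT


def build_wakeup(device_id):
    """Return the frame <sy><sy><eq>ID<et> for a 3-byte ID."""
    if len(device_id) != 3:
        raise ValueError("the wakeup ID must be exactly 3 bytes")
    return SYN + SYN + ENQ + device_id + EOT


def send_wakeup(host, port, device_id, timeout=5.0):
    """Connect, send the wakeup frame, and return whatever arrives first."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_wakeup(device_id))
        return sock.recv(64)  # may be a partial reply; loop if the protocol needs it


print(build_wakeup(b"111"))
```

Calling send_wakeup("x.x.x.x", PORT, b"111") with the real address and port would then return the raw acknowledgment bytes, with no stray newline added to the request.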
To return to nc, trying to type in the control characters to the nc input from a terminal window is, while possible in most cases, very difficult and error-prone. You will need to know the equivalent control character mapping (for example, 0x16 is "Ctrl-V") and will need to know how to get the terminal to accept that literal character (coincidentally, in linux, you have to precede most control characters with a Ctrl-V in order to enter them as input and avoid having them interpreted in the usual way: Ctrl-D == EOF, Ctrl-C == Interrupt, Ctrl-W == Delete-Previous-Word, etc).
So if you wanted to enter the data above into nc's input from the command line, you would need to type these characters:
Ctrl-V Ctrl-V <sy> / SYN
Ctrl-V Ctrl-V <sy> / SYN
Ctrl-V Ctrl-E <eq> / ENQ
1
1
1
Ctrl-V Ctrl-D <et> / EOT
But also important to note is that ordinarily nc will not actually send anything until you enter a newline (i.e. press the Return key). Then that newline character will also get sent to the server which might not be what you want.

(F) Lex, how do I match negation?

Some language grammars use negations in their rules. For example, in the Dart specification the following rule is used:
~('\'|'"'|'$'|NEWLINE)
Which means: match anything that is not one of the rules inside the parentheses. Now, I know that in flex I can negate character classes (e.g. [^ab]), but some of the rules I want to negate could be more complicated than a single character, so I don't think I could use character classes for that. For example, I may need to negate the sequence '"""' for multiline strings, but I'm not sure how to do that in flex.
(TL;DR: Skip down to the bottom for a practical answer.)
The inverse of any regular language is a regular language. So in theory it is possible to write the inverse of a regular expression as a regular expression. Unfortunately, it is not always easy.
The """ case, at least, is not too difficult.
First, let's be clear about what we are trying to match.
Strictly speaking "not """" would mean "any string other than """". But that would include, for example, x""".
So it might be tempting to say that we're looking for "any string which does not contain """". (That is, the inverse of .*""".*). But that's not quite correct either. The typical usage is to tokenise an input like:
"""This string might contain " or ""."""
If we start after the initial """ and look for the longest string which doesn't contain """, we will find:
This string might contain " or "".""
whereas what we wanted was:
This string might contain " or "".
So it turns out that we need "any string which does not end with " and which doesn't contain """", which is actually the conjunction of two inverses: (~.*" ∧ ~.*""".*)
It's (relatively) easy to produce a state diagram for that:
(Note that the only difference between the above and the state diagram for "any string which does not contain """" is that in that state diagram, all the states would be accepting, and in this one states 1 and 2 are not accepting.)
Now, the challenge is to turn that back into a regular expression. There are automated techniques for doing that, but the regular expressions they produce are often long and clumsy. This case is simple, though, because there is only one accepting state and we need only describe all the paths which can end in that state:
([^"]|\"([^"]|\"[^"]))*
This model will work for any simple string, but it's a little more complicated when the string is not just a sequence of the same character. For example, suppose we wanted to match strings terminated with END rather than """. Naively modifying the above pattern would result in:
([^E]|E([^N]|N[^D]))* <--- DON'T USE THIS
but that regular expression will match the string
ENENDstuff which shouldn't have been matched
The real state diagram we're looking for is
and one way of writing that as a regular expression is:
([^E]|E(E|NE)*([^EN]|N[^ED]))*
Again, I produced that by tracing all the ways to end up in state 0:
[^E] stays in state 0
E in state 1:
(E|NE)*: stay in state 1
[^EN]: back to state 0
N[^ED]: back to state 0 via state 2
This can be a lot of work, both to produce and to read. And the results are error-prone. (Formal validation is easier with the state diagrams, which are small for this class of problems, rather than with the regular expressions which can grow to be enormous).
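One cheap sanity check for hand-derived expressions like these is to try them against a few strings. The sketch below uses Python's re module rather than flex (the character classes used here behave the same way in both); the END pattern is written with a trailing * so that the loop back to state 0 can repeat. The names TRIPLE_BODY, NAIVE_END, FIXED_END and accepts are made up for illustration.

```python
import re

TRIPLE_BODY = re.compile(r'([^"]|"([^"]|"[^"]))*')
NAIVE_END = re.compile(r'([^E]|E([^N]|N[^D]))*')
FIXED_END = re.compile(r'([^E]|E(E|NE)*([^EN]|N[^ED]))*')


def accepts(pattern, text):
    """True if the whole of text is in the language of pattern."""
    return pattern.fullmatch(text) is not None


# The intended string body is accepted; anything containing or ending in quotes is not.
print(accepts(TRIPLE_BODY, 'This string might contain " or "".'))
print(accepts(TRIPLE_BODY, 'a"""b'))
# The naive END pattern wrongly accepts a string containing END; the fixed one rejects it.
print(accepts(NAIVE_END, "ENENDstuff"))
print(accepts(FIXED_END, "ENENDstuff"))
```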
A practical and scalable solution
Practical Flex rulesets use start conditions to solve this kind of problem. For example, here is how you might recognize python triple-quoted strings:
%x TRIPLEQ
start \"\"\"
end \"\"\"
%%
{start}        { BEGIN( TRIPLEQ ); /* Note: no return, flex continues */ }
<TRIPLEQ>.|\n  { /* Append the next token to yytext instead of
                  * replacing yytext with the next token
                  */
                 yymore();
                 /* No return yet, flex continues */
               }
<TRIPLEQ>{end} { /* We've found the end of the string, but
                  * we need to get rid of the terminating """
                  */
                 yylval.str = malloc(yyleng - 2);
                 memcpy(yylval.str, yytext, yyleng - 3);
                 yylval.str[yyleng - 3] = 0;
                 return STRING;
               }
This works because the . rule in start condition TRIPLEQ will not match " if the " is part of a string matched by {end}; flex always chooses the longest match. It could be made more efficient by using [^"]+|\"|\n instead of .|\n, because that would result in longer matches and consequently fewer calls to yymore(); I didn't write it that way above simply for clarity.
This model is much easier to extend. In particular, if we wanted to use <![CDATA[ as the start and ]]> as the terminator, we'd only need to change the definitions
start "<![CDATA["
end "]]>"
(and possibly the optimized rule inside the start condition, if using the optimization suggested above.)
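The same two-mode idea can be sketched outside flex. The Python function below is only an illustration of the start-condition logic, not a replacement for the lexer: scan_delimited is a hypothetical name, and it collects the bodies instead of returning tokens.

```python
def scan_delimited(text, start='"""', end='"""'):
    """Return the body of every start...end block found in text."""
    bodies = []
    pos = 0
    in_block = False      # plays the role of the TRIPLEQ start condition
    body_start = 0
    while pos < len(text):
        if not in_block:
            if text.startswith(start, pos):
                in_block = True           # like BEGIN(TRIPLEQ)
                pos += len(start)
                body_start = pos
            else:
                pos += 1
        elif text.startswith(end, pos):
            bodies.append(text[body_start:pos])   # terminator is dropped
            in_block = False
            pos += len(end)
        else:
            pos += 1                      # like the .|\n rule with yymore()
    return bodies


print(scan_delimited('x = """This string might contain " or ""."""'))
print(scan_delimited("<![CDATA[a]]b]]>", start="<![CDATA[", end="]]>"))
```

As in the flex version, switching delimiters means changing only the two definitions.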

Matching function in erlang based on string format

I have user information coming in from an outside source and I need to check if that user is active. Sometimes I have a User and a Server and other times I have User#Server. The former case is no problem, I just have:
active(User, Server) ->
do whatever.
What I would like to do with the User#Server case is something like:
active([User, "#", Server]) ->
active(User, Server).
Doesn't seem to work. When calling active in the erlang terminal with a#b for example, I get an error that there is no match. Any help would be appreciated!
You can tokenize the string to get the result:
active(UserString) ->
[User,Server] = string:tokens(UserString,"#"),
active(User,Server).
If you need something more elaborate, or with better handling of something like email addresses, it might then be time to delve into using regular expressions with the re module.
active(UserString) ->
RegEx = "^([\\w\\.-]+)#([\\w\\.-]+)$",
{match, [User,Server]} = re:run(UserString,RegEx,[{capture,all_but_first,list}]),
active(User,Server).
Note: The supplied regex is hardly sufficient for email address validation; it's just an example that allows all alphanumeric characters including underscores (\\w), dots (\\.), and dashes (-), separated by a # symbol. It will fail if the match doesn't stretch the whole length of the string (^ to $).
A note on the pattern matching: for the real solution to your problem, I think #chops' suggestions should be used.
When matching patterns against strings, it's useful to keep in mind that Erlang strings are really lists of integers. So the string "#" is actually the same as [35] (35 being the ASCII code for #).
This means that your match pattern [User, "#", Server] will match lists like [97,[35],98], but not "a#b" (which in list form is [97,35,98]).
To match the string you need [User,$#,Server]. The $ syntax gives you the ASCII value of the character.
However, this pattern limits the matching string to one character, followed by #, followed by one more character.
It can be improved with [User, $# | Server], which allows the server part to have arbitrary length, but the User variable will still match only a single character (and I don't see a way around that).
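For readers who don't use Erlang, the same two points can be sketched in Python: a string is a flat sequence of character codes, and splitting on the separator sidesteps the fixed-length limitation of the list pattern. split_user is a hypothetical helper, and rejecting input with more than one separator is a choice made here for the sketch.

```python
def split_user(user_string, sep="#"):
    """Split 'User#Server' into (user, server); raise ValueError otherwise."""
    user, found, server = user_string.partition(sep)
    if not found or not user or not server or sep in server:
        raise ValueError("expected exactly one %r with text on both sides" % sep)
    return user, server


# A string is a flat list of codes, not a list containing a nested list.
codes = [ord(ch) for ch in "a#b"]
print(codes)
print(split_user("alice#example"))
```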

Regex for extracting second level domain from FQDN?

I can't figure this out. I need to extract the second level domain from a FQDN. For example, all of these need to return "example.com":
example.com
foo.example.com
bar.foo.example.com
example.com:8080
foo.example.com:8080
bar.foo.example.com:8080
Here's what I have so far:
Dim host = Request.Headers("Host")
Dim pattern As String = "(?<hostname>(\w+)).(?<domainname>(\w+.\w+))"
Dim theMatch = Regex.Match(host, pattern)
ViewData("Message") = "Domain is: " + theMatch.Groups("domainname").ToString
It fails for example.com:8080 and bar.foo.example.com:8080. Any ideas?
I used this Regex successfully to match "example.com" from your list of test cases.
"(?<hostname>(\w+\.)*)(?<domainname>(\w+\.\w+))"
The dot character (".") needs to be escaped as "\.". An unescaped "." in a regex pattern matches any character.
Also, the regex pattern you provided requires one or more word characters followed by a dot before the domainname match (this part of the pattern: "(?<hostname>(\w+))."; I'm assuming the . was supposed to be escaped). This fails to match the input "example.com" because there's no word character and dot before the domainname match.
I changed the pattern so that the hostname match would have zero or more matches of "1 or more word characters followed by a dot". This will match "foo" in "foo.example.com" and "foo.bar" in "foo.bar.example.com".
This assumes you've validated the contents of the fqdn elsewhere (e.g.: dashes allowed, no underscores or other non-alphanumeric characters), and is otherwise as liberal as possible.
'(?:(?<hostname>.+)\.)?(?<domainname>[^.]+\.[^.]+?)(?:\:(?<port>[^:]+))?$'
Matches the hostname component if present (including multiple additional levels):
bar.foo.example.com:8000 would match:
hostname: bar.foo (optional)
domainname: example.com
port: 8000 (optional)
I'm not familiar with VB.NET or ASP, but on the subject of regular expressions...
First off, you'll want to anchor your expression with ^ and $.
Next, \w may match different things depending on implementation, locale, etc., so you may want to be explicit. For example, \w may not match a hyphen, a valid character in domain names.
You don't seem to be taking into account an optional port number.
I'm sure there's a more RFC-accurate expression out there, but here's a start at something that should work for you.
^([a-z0-9\-]+\.)*([a-z0-9\-]+\.[a-z0-9\-]+)(:[0-9]+)?$
Broken down:
([a-z0-9\-]+\.)*: Start with zero or more hostnames...
([a-z0-9\-]+\.[a-z0-9\-]+): followed by two hostnames...
(:[0-9]+)?: followed by an optional port declaration.
Note that if you're dealing with a domain like example.ne.jp, you will only get ne.jp. Also note that the above expression should be matched case-insensitively.
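The question uses VB.NET, but the expression itself is easy to check in any regex engine. Here is a minimal Python sketch that runs it over the test cases from the question; HOST_RE and second_level_domain are names invented for this example.

```python
import re

HOST_RE = re.compile(
    r"^([a-z0-9\-]+\.)*([a-z0-9\-]+\.[a-z0-9\-]+)(:[0-9]+)?$",
    re.IGNORECASE,  # host names are case-insensitive
)


def second_level_domain(host):
    """Return the last two labels of host, or None if it does not match."""
    match = HOST_RE.match(host)
    return match.group(2) if match else None


for host in ["example.com", "foo.example.com", "bar.foo.example.com",
             "example.com:8080", "foo.example.com:8080",
             "bar.foo.example.com:8080"]:
    print(host, "->", second_level_domain(host))
```

The leading group is greedy, but backtracking hands the last two labels to the second group, which is why group 2 holds the result.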
