Is it possible to parse dynamic XML-structured log contents with Grok?

Is it feasible to use Grok to parse dynamic XML-structured log contents, such as:
<tag_1> contents </tag_1> ... <tag_N> contents </tag_N>
where "tag_*" would be the field name and "contents" - the actual contents.
Therefore the parsed message would look like:
{
  "tag_1": [
    [
      "contents"
    ]
  ],
  ...
  "tag_N": [
    [
      "contents"
    ]
  ]
}

Not with grok. You will need to resort to Ruby code to parse the XML and toss it into the event structure.
If your XML is super regular (i.e. has a root element and only one level under it), you could maybe use code like this:
filter {
  ruby {
    code => "
      # Split '<a>x</a><b>y</b>' into fragments at each '><' boundary
      msg = event['message'].split('><')
      for part in msg
        part = part[1..-1] if part.start_with?('<')  # the first fragment keeps its leading '<'
        endpos = part.index('</')
        startpos = part.index('>')
        if !endpos.nil? && !startpos.nil?
          tag = part[0, startpos]
          text = part[startpos + 1, endpos - startpos - 1]
          event[tag] = text  # pre-5.x event API; newer Logstash uses event.set(tag, text)
        end
      end
    "
  }
}
If your XML is more complex, you are going to have to resort to a real XML parser and figure out how to use it with Logstash (I've never brought an external library into Logstash).
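If all you need is the flat tag/contents mapping from the question, one possibility is REXML from the Ruby standard library, which Logstash's bundled (J)Ruby should be able to require without installing anything. A minimal, untested sketch, assuming the whole message is a sequence of well-formed elements (it wraps them in a synthetic <root> element, and uses the same old-style event[...] API as the snippet above):
filter {
  ruby {
    code => "
      require 'rexml/document'
      # Wrap the fragments so REXML sees one well-formed document
      doc = REXML::Document.new('<root>' + event['message'] + '</root>')
      doc.root.each_element do |el|
        event[el.name] = el.text
      end
    "
  }
}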

Related

Jmeter ForEach Controller failing to write variables to file in order retrieved

I am executing an HTTP request that retrieves a JSON payload with an array of employees. For each record (employee) I need to parse out specific fields, e.g. firstName, lastName, PersonId, and write them to a single CSV file, adding a new row per record.
Unfortunately, the file created has two issues: the PersonId never gets written, and the sequence of the values is not consistent with the original returned values. Sometimes I get the right lastName with the wrong firstName and vice versa. I'm not sure whether the two issues are related; I suspect my regular expression extractor is wrong for a number.
JMeter setup (5.2.1):
Thread group
+ HTTP Request
++ JSON JMESPath Extractor
+ ForEach Controller
++ Regular Expression Extractor: PersonId
++ Regular Expression Extractor: firstName
++ Regular Expression Extractor: lastName
++ BeanShell PostProcessor
getWorker returns the following payload, which is handled by a JSON JMESPath Extractor:
{
  "items" : [
    {
      "PersonId" : 398378,
      "firstName" : "Sam",
      "lastName" : "Shed"
    },
    {
      "PersonId" : 398379,
      "firstName" : "Bob",
      "lastName" : "House"
    }
  ],
  "count" : 2,
  "hasMore" : true,
  "limit" : 2,
  "offset" : 0,
  "links" : [
    {
      "rel" : "self",
      "href" : "https://a.site.on.the.internet.com/employees",
      "name" : "employees",
      "kind" : "collection"
    }
  ]
}
JSON JMESPath Extractor Configuration
Name of created variables: items
JMESPath expressions: items
Match No. -1
Default Values: Not Found
ForEach Controller Configuration
Input variable prefix: items
Start Index: Empty
End Index: Empty
Output variable name: items
Add "_"? Checked
Each of the Regular Expression Extractors follows the same pattern as below.
Extract PersonId with Regular Expression
Apply to: Main Sample Only
Field to check: Body
Name of created variable: PersonId
Regular Expression: "PersonId":"(.+?)"
Template: $1$
Match No. Empty
Default Value: PersonId
The final step in the thread is where I write out the parsed results.
BeanShell PostProcessor
PersonNumber = vars.get("PersonNumber");
DisplayName = vars.get("DisplayName");
f = new FileOutputStream("/Applications/apache-jmeter-5.2.1/bin/scripts/getWorker/responses/myText.csv", true);
p = new PrintStream(f);
this.interpreter.setOut(p);
print(PersonId+", "+ PersonNumber+ ", " + DisplayName);
f.close();
I am new to this and looking for someone either to tell me where I screwed up or to direct me to a place where I can read up on the appropriate topics. (Both are fine.) Thank you.
ForEach Controller doesn't know the structure of the items variable, since it is in JSON format; it only understands an array and traverses through it. I would suggest moving away from the ForEach Controller in your case and using the JSON extractor itself for all the values, one extractor per field, presumably with variable names personId_C, firstName_C, lastName_C and Match No. -1 to match the code below (the original answer shows the three configurations as screenshots):
Person ID
First Name
Last Name
Beanshell Sampler Code
import java.io.FileOutputStream; // classes actually used below
import java.io.PrintStream;      // (BeanShell imports java.io by default, but being explicit doesn't hurt)

int matchNr = Integer.parseInt(vars.get("personId_C_matchNr"));
log.info("Match number is " + matchNr);
f = new FileOutputStream("myText.csv", true);
p = new PrintStream(f);
for (int i = 1; i <= matchNr; i++) {
    PersonId = vars.get("personId_C_" + i);
    FirstName = vars.get("firstName_C_" + i);
    LastName = vars.get("lastName_C_" + i);
    log.info("Iteration is " + i);
    log.info("Person ID is " + PersonId);
    log.info("First Name is " + FirstName);
    log.info("Last Name is " + LastName);
    p.println(PersonId + ", " + FirstName + ", " + LastName);
}
p.close();
f.close();
Output File
HOW THE ABOVE ACTUALLY WORKS
When you extract values using the matchNr, it goes in a sequential order in which the response has arrived. For example, in your case, Sam & Shed appear as first occurrences and Bob & House appear as subsequent occurrences. Hence JMeter captures them with the corresponding match and stores them as 1st First Name = Sam, 2nd First Name = Bob and so on.
GENERIC STUFF
The regular expression which you have used for capturing Person ID seems to be inaccurate. The appropriate one would be
"PersonId" :(.+?),
and not
"PersonId":"(.+?)"
Move to JSR223 processors instead of Beanshell as they are more performant. Source: Which one is efficient : Java Request, JSR223 or BeanShell Sampler for my script. The migration is pretty simple. Just copy the code that you have in Beanshell and paste it in JSR223.
Close any stream or writer that you open, else it might cause issues when other users are trying to write to the file during a load test.
In case you are planning to use this file as a subsequent input within JMeter, please note that there is a space between the comma and the next element. For example, it is "Sam, Shed" and not "Sam,Shed". JMeter by default does not trim any spaces and will use the value just like that. Hence you might want to take a judicious call regarding that space.
Hope this helps!
Since JMeter 3.1 you shouldn't be using Beanshell, go for JSR223 Test Elements and Groovy language for scripting.
Given Groovy has built-in JSON support, you shouldn't need any extractors; you can write the data into a file in a single shot like:
new groovy.json.JsonSlurper().parse(prev.getResponseData()).items.each { item ->
new File('myText.csv') << item.get('PersonId') << ',' << item.get('firstName') << ',' << item.get('lastName') << System.getProperty('line.separator')
}
More information: Apache Groovy - Why and How You Should Use It

Read multiple concatenated json objects in Ruby

I have a file that contains multiple JSON objects that are not separated by a comma:
{
  "field" : "value",
  "another_field": "another_value"
} // no comma
{
  "field" : "value"
}
Each of the objects on its own is a valid JSON object.
Is there a way that I can process this file easily?
I know this is NOT valid JSON, but unfortunately this file is being generated by a 3rd-party tool. I have no option of changing the way the output looks.
I can't open a text editor and smart-insert commas / square brackets before the run, since this is an automated process (I also really don't want to write code that opens the file and manipulates it).
In .NET there's a library that has this exact feature:
https://stackoverflow.com/a/29480032/2970729
https://www.newtonsoft.com/json/help/html/P_Newtonsoft_Json_JsonReader_SupportMultipleContent.htm
Is there anything equivalent in Ruby?
As long as your file is that simple you might want to do something like this:
# content = File.read(filename)
content = <<-EOF
  {
    "field" : "value",
    "another_field": "another_value"
  } // no comma
  {
    "field" : "value"
  }
EOF

require 'json'
JSON.parse("[#{content.gsub(/\}.*?\{/m, '},{')}]")
#=> [{"field"=>"value", "another_field"=>"another_value"}, {"field"=>"value"}]
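If the object values may themselves contain braces inside strings, the gsub above can split in the wrong place. A stdlib-only alternative is to scan the text once, tracking brace depth and string state, and hand each balanced {...} chunk to JSON.parse. A sketch (each_json_object is a made-up helper name, not a library function):
require 'json'

def each_json_object(text)
  depth = 0
  start = nil
  in_string = false
  escaped = false
  text.each_char.with_index do |ch, i|
    if in_string
      # inside a string: only escapes and the closing quote matter
      if escaped then escaped = false
      elsif ch == '\\' then escaped = true
      elsif ch == '"' then in_string = false
      end
      next
    end
    case ch
    when '"' then in_string = true
    when '{'
      start = i if depth.zero?
      depth += 1
    when '}'
      depth -= 1
      yield JSON.parse(text[start..i]) if depth.zero?
    end
  end
end

each_json_object(File.read('file.json')) { |obj| p obj }
#=> {"field"=>"value", "another_field"=>"another_value"}
#=> {"field"=>"value"}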
The yajl-ruby gem enables processing concatenated JSON in Ruby. The parser can read from a String or an IO. Each complete object is yielded to a block.
require 'yajl'

File.open 'file.json' do |f|
  Yajl.load f do |object|
    # do something with each complete object
  end
end
See the documentation for other options (buffer size, symbolized keys, etc).

How to parse JSON with the Oj SAX parser, Saj

I want to parse a 10-20MB JSON file, and figure it's probably a good idea not to parse the entire JSON file at once and cause major memory usage. After looking around, it seems like Oj's Saj or ScHandler APIs might be a good fit.
The only problem is that I can't really wrap my head around how to use them, and the documentation doesn't make it much clearer. I've looked at the example in the Saj source code, and defined a super simple subclass of Oj::Saj like below:
class MySaj < Oj::Saj
  def hash_start(key)
    p key
  end
end
Used like this:
open(URL) do |contents|
  Oj.saj_parse(handler, contents)
end
And this leads to a lot of keys from my JSON being printed out. But I still have no idea how to actually access the values belonging to the keys I'm printing.
Can I access the hash itself somehow, or how am I supposed to do this?
SAX-style parsing is complicated. You have to maintain the state of the parsing and deal with each state change appropriately.
The hash_start and array_start callbacks notify your SAX handler that Saj has found the beginning of a hash, and that the next callbacks that occur will be in the context of that hash. Note that hashes may be nested, may contain (or be contained within) arrays, and may hold simple values.
Here is a simple Saj handler that parses a very simple JSON object:
require 'oj'

class MySaj < ::Oj::Saj
  def initialize()
    @hash_cnt = 0
    @array_cnt = 0
  end

  def hash_start(key)
    @hash_cnt += 1
    puts "Start-Hash[#{@hash_cnt}]: '#{key}'"
  end

  def hash_end(key)
    @hash_cnt -= 1
    puts "End-Hash[#{@hash_cnt}]: '#{key}'"
  end

  def array_start(key)
    @array_cnt += 1
    puts "Start-Array[#{@array_cnt}]: '#{key}'"
  end

  def array_end(key)
    @array_cnt -= 1
    puts "End-Array[#{@array_cnt}]: '#{key}'"
  end

  def add_value(value, key)
    puts "Value: [#{key}] = '#{value}'"
  end

  def error(message, line, column)
    puts "ERRRORRR: #{line}:#{column}: #{message}"
  end
end
json = '[{ "key1": "abc", "key2": 123}, { "test1": "qwerty", "pi": 3.14159 }]'
cnt = MySaj.new()
Oj.saj_parse(cnt, json)
The results of this basic JSON parsing with Saj look like this:
Start-Array[1]: ''
Start-Hash[1]: ''
Value: [key1] = 'abc'
Value: [key2] = '123'
End-Hash[0]: ''
Start-Hash[1]: ''
Value: [test1] = 'qwerty'
Value: [pi] = '3.14159'
End-Hash[0]: ''
End-Array[0]: ''
You may notice that this output is roughly equivalent to one callback per token (omitting ',' and ':'). You essentially have to build into your callbacks the knowledge of what to do with individual JSON elements. Along those lines, you also need to build the hierarchy described by the callbacks. For example, when hash_start is called, push an empty hash on the stack; when hash_end is called, pop the hash or move back one level in the hierarchy.
For example, you could have a handler in hash_end that checks whether this is ending a top-level hash, and when it is, do something with that hash. Note that you often cannot do this with arrays, as the top-level element of a very large number of JSON documents is an array, so you have to determine when an array at the level just below the top ends.
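To make that concrete, here is a sketch (not from the Saj docs; TreeBuilder is an illustrative name) of a handler that applies exactly that stack discipline to rebuild the whole document:
require 'oj'

class TreeBuilder < ::Oj::Saj
  attr_reader :result

  def initialize
    @stack = []   # open containers, innermost last
    @result = nil
  end

  def hash_start(key);  open(key, {});  end
  def array_start(key); open(key, []); end

  def hash_end(key);  @result = @stack.pop; end
  def array_end(key); @result = @stack.pop; end

  def add_value(value, key)
    parent = @stack.last
    if parent.nil?
      @result = value       # top-level scalar document
    elsif key
      parent[key] = value   # inside a hash
    else
      parent << value       # inside an array
    end
  end

  private

  # Attach a new container to its parent, then make it the innermost one.
  def open(key, container)
    parent = @stack.last
    parent[key] = container if parent.is_a?(Hash)
    parent << container if parent.is_a?(Array)
    @stack.push(container)
  end
end

handler = TreeBuilder.new
Oj.saj_parse(handler, '{"a": [1, 2], "b": {"c": 3}}')
p handler.result
#=> {"a"=>[1, 2], "b"=>{"c"=>3}}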
If you like writing compiler backends, this is the JSON parsing solution for you. Personally, I've never enjoyed working in SAX, but for large documents it can be very resource-friendly and highly performant, depending on how well you write the handler. Be prepared for oodles of debugging and slightly mismatched state management, as that's par for the course with SAX-style parsing.
However, you shouldn't be too concerned with 10-20MB JSON, as that's actually not very large. I've processed 80+MB JSON with "regular" Oj (load and dump) quite a lot, and not had a problem with it. Unless you're running on a severely resource-constrained machine, the standard Oj will work well for you.
Saj is a streaming parser. What that means, in practice, is that it doesn't read the file's contents in their entirety and parse them whole; instead, it notifies you of parse events as it encounters them. Your thinking is solid: the larger the file, the more you benefit from parsing in that manner if you wish to pick and choose from it.
hash_start is one such event, fired when Oj sees the beginning of an Object (which will become a Hash in Ruby land).
Take this JSON for instance:
{
  "student-1": {
    "name": "John Doe",
    "age": 42,
    "knownAliases": ["Blabby Joe", "Stack Underflow"],
    "trainingGrades": {
      "Advanced Zumba Dancing": "A+",
      "Introduction to Twitter Arguments": "C-"
    }
  },
  "student-2": {
    "name": "Rebecca Melecca",
    "age": 26,
    "knownAliases": ["Booger Becca", "Tanktop Terror"],
    "trainingGrades": {
      "Intermediate Groin Kickery": "A+",
      "Advanced Quantum Mechanics": "A+"
    }
  }
}
And the following parser:
class StudentParser < Oj::Saj
  def hash_start(key)
    puts "hash_start(#{key.inspect})"
  end

  def hash_end(key)
    puts "hash_end(#{key.inspect})"
  end

  def array_start(key)
    puts "array_start(#{key.inspect})"
  end

  def array_end(key)
    puts "array_end(#{key.inspect})"
  end

  def add_value(value, key)
    puts "add_value(#{value.inspect}, #{key.inspect})"
  end
end
And you'll get the following sequence of events:
hash_start(nil)
hash_start("student-1")
add_value("John Doe", "name")
add_value(42, "age")
array_start("knownAliases")
add_value("Blabby Joe", nil)
add_value("Stack Underflow", nil)
array_end("knownAliases")
hash_start("trainingGrades")
add_value("A+", "Advanced Zumba Dancing")
add_value("C-", "Introduction to Twitter Arguments")
hash_end("trainingGrades")
hash_end("student-1")
hash_start("student-2")
add_value("Rebecca Melecca", "name")
add_value(26, "age")
array_start("knownAliases")
add_value("Booger Becca", nil)
add_value("Tanktop Terror", nil)
array_end("knownAliases")
hash_start("trainingGrades")
add_value("A+", "Intermediate Groin Kickery")
add_value("A+", "Advanced Quantum Mechanics")
hash_end("trainingGrades")
hash_end("student-2")
hash_end(nil)
When you see hash_start(nil), it means the parser has found a top-level object (that very first opening brace). Conversely, hash_end(nil) means that top-level object has been closed, and its innards properly parsed (i.e. no parsing errors have been found).
Parsing in this manner means you have to keep track of nesting (if that's meaningful to you), of adding keys and values at the right level, et cetera. That makes it annoying and hard, but worthwhile if you wish to carve out bits of a large file without committing everything to memory.
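That carving-out can reuse the same stack idea: instead of keeping the whole tree, yield each hash that sits directly under the root as soon as it closes, so only one record lives in memory at a time. Again a sketch with a made-up name (RecordStream), not part of Oj itself:
require 'oj'

class RecordStream < ::Oj::Saj
  def initialize(&on_record)
    @on_record = on_record   # called with (key, record) per top-level entry
    @stack = []              # [key, container] pairs, root included
  end

  def hash_start(key);  @stack.push([key, {}]); end
  def array_start(key); @stack.push([key, []]); end

  def hash_end(key);  close(key); end
  def array_end(key); close(key); end

  def add_value(value, key)
    store(key, value)
  end

  private

  def close(key)
    _key, container = @stack.pop
    store(key, container) unless @stack.empty?   # the root itself is discarded
  end

  def store(key, value)
    if @stack.size == 1 && value.is_a?(Hash)
      @on_record.call(key, value)   # a record directly under the root closed
    else
      parent = @stack.last[1]
      if key
        parent[key] = value
      else
        parent << value
      end
    end
  end
end

handler = RecordStream.new do |key, student|
  puts "#{key}: #{student['name']}"
end
File.open('students.json') do |io|
  Oj.saj_parse(handler, io)   # prints student-1: John Doe, student-2: Rebecca Melecca
end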

Declarative representation of a file path with embedded environment variables

I am looking at a situation where I'd like to bring some structure to what would be a string in a typical language, and I'm wondering how to use Rebol's parts box to do it.
So let's say I've got a line that looks like this in the original language I'm trying to dialect:
something = ("/foo/mumble" "/foo/${BAR}/baz")
I want to use Rebol's primitives, so certainly a file path. Here is a random example of what I thought of off the top of my head:
something: [%/foo/mumble [%/foo/ BAR %/baz]]
If it were code you'd use REJOIN or COMBINE. But this is not designed to be executed, it's more like a configuration file. You're not supposed to be running arbitrary code, just getting a list of files.
I'm not sure how feasible it is to stick with strings and yet still have these typed as FILE!. Not all characters work in a FILE!, for instance:
>> load "%/foo/${BAR}/baz"
== [%/foo/$ "BAR" /baz]
It makes me wonder what my options are in Rebol data that's supposed to represent a configuration file. I can use plain old strings and do substitutions like other things do. Maybe REWORD with an OBJECT block to represent the environment?
What is the 'reword' function in Rebol and how do I use it?
In any case, I want to know how to represent a filename in a declarative context with environment variable substitutions like this.
You should use file!. Your example needs double quotes after the %:
f: load {%"/foo/${BAR}/baz"}
replace f "${BAR}" "MYVALUE" ;== %/foo/MYVALUE/baz
You could use path! with parens.
The only issue is the root, for which you can use another character to replace the "%" used for files... let's use '! (note: this should be a word!-valid character).
When calling to-block on a path! value, it returns each part as its own token... useful.
to-block '!/path/(foo)/file.txt
== [! path (foo) file.txt]
Here is a little script which loads three paths, uses parens as a constructed part of the path, and uses tags to escape path-illegal characters (like a space!):
environments: make object! [
    foo: "FU"
    bar: "BR"
]

paths: [
    !/path/(foo)/file.txt
    !/root/<escape weird chars $>/(bar ".txt")
    !/("__" foo)/path/(bar)
]

parse paths [
    some [
        (print "------")
        set data path! here: (insert/only here to-block data)
        (out-path: copy %"")
        into [
            path-parts: (?? path-parts)
            '!
            some [
                [set data [word! | tag! | number!] (
                    append out-path rejoin ["/" to-string data]
                )]
                |
                into [
                    (append out-path "/")
                    some [
                        set data word! (append out-path rejoin [to-string get in environments data])
                        | set data skip (append out-path rejoin [to-string data])
                    ]
                ]
                | here: set data skip (to-error rejoin ["invalid path token (" type? data ") here: " mold here])
            ]
        ]
        (?? out-path)
    ]
]
Note this works both in Rebol3 and Rebol2
output is as follows:
------
path-parts: [! path (foo) file.txt]
out-path: %/path/FU/file.txt
------
path-parts: [! root <escape weird chars $> (bar ".txt")]
out-path: %/root/escape%20weird%20chars%20$/BR.txt
------
path-parts: [! ("__" foo) path (bar)]
out-path: %/__FU/path/BR
------

XML Tags using Builder

I am currently building an XML export for a Real Estate app using the Builder gem in Rails, and I am looking for a way to do the following:
<commercialRent>
  "commercialRent figure"
  <range>
    ...
  </range>
</commercialRent>
I can't seem to find a way to implement the text part ("commercialRent figure").
My code:
b.commercialRent(period: "annual", plusOutgoings: self.plus_outgoings) {
  self.rent_price;
  b.range {
    b.min(self.rent_psm_pa_min);
    b.max(self.rent_psm_pa_max)
  };
};
Returns:
<commercialRent period="annual" plusOutgoings="no">
  <rentPerSquareMeter>
    <range>
      <min>1000</min>
      <max>10000</max>
    </range>
  </rentPerSquareMeter>
</commercialRent>
Everything prints fine, except the self.rent_price is missing.
I can't figure it out.
You use the text! method to produce a text node:
- (Object) text!(text)
Append text to the output target. Escape any markup. May be used within the markup brackets as:
builder.p { |b| b.br; b.text! "HI" } #=> <p><br/>HI</p>
So something like this should do the trick:
b.commercialRent(period: "annual", plusOutgoings: self.plus_outgoings) do
  b.text! self.rent_price
  #...
end
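As a standalone check (outside the model, with made-up values), the same pattern with Builder::XmlMarkup shows the text node interleaved with the child element:
require 'builder'

xml = Builder::XmlMarkup.new
xml.commercialRent(period: "annual", plusOutgoings: "no") do |b|
  b.text! "1500"   # text! wants a String; call .to_s if rent_price is numeric
  b.range do
    b.min 1000
    b.max 10000
  end
end
puts xml.target!
#=> <commercialRent period="annual" plusOutgoings="no">1500<range><min>1000</min><max>10000</max></range></commercialRent>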
