Flume morphline interceptor-split command - flume

Hi, I'm trying to use the morphline interceptor to convert my syslog to JSON.
To start, I tried to use the split command for splitting my string, but I'm getting the error below:
"" Source r1 has been removed due to an error during configuration
com.typesafe.config.ConfigException$WrongType: /root/flume.17/conf/morph.conf: 21: Cannot concatenate object or list with a non-object-or-list, ConfigString("split") and SimpleConfigObject({"outputFields":"substrings","inputField":"message","addEmptyStrings":false,"isRegex":false,"trim":true,"separator":" "}) are not compatible""
My morphline configuration file:
morphlines : [
  {
    # Name used to identify a morphline. E.g. used if there are multiple
    # morphlines in a morphline config file
    id : morphline1
    # Import all morphline commands in these java packages and their
    # subpackages. Other commands that may be present on the classpath are
    # not visible to this morphline.
    importCommands : ["org.kitesdk.**"]
    commands : [
      {
        # Parse input attachment and emit a record for each input line
        readLine {
          charset : UTF-8
        }
      }
      ,split {
        inputField : message
        outputFields : "substrings"
        separator : " "
        isRegex : false
        addEmptyStrings : false
        trim : true }
      }
    }
    ]
  }
]
What do I have to do? I'm new to this.

From the morphline documentation:
outputField - The name of the field to add output values to, i.e. a single string. Example: tokens. One of outputField or outputFields must be present, but not both.
outputFields - The names of the fields to add output values to, i.e. a list of strings. Example: [firstName, lastName, "", age]. An empty string in a list indicates omit this column in the output. One of outputField or outputFields must be present, but not both.
So you should just specify
outputField : substrings
instead of
outputFields : "substrings"
http://kitesdk.org/docs/1.1.0/morphlines/morphlines-reference-guide.html#split
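For reference, a corrected command entry might look like the following (a sketch based on the config in the question; note that, like the readLine example there, each entry in the commands list is wrapped in its own pair of braces):
{
  split {
    inputField : message
    outputField : substrings
    separator : " "
    isRegex : false
    addEmptyStrings : false
    trim : true
  }
}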

Related

Jmeter ForEach Controller failing to write variables in original order correctly
I am executing a http request retrieving a json payload with an array of employees. For each record (employee) I need to parse the record for specific fields e.g. firstName, lastName, PersonId and write to a single csv file, incrementing a new row per record.
Unfortunately, the created file has two issues. The PersonId never gets written, and the sequence of the values is not consistent with the original returned values: sometimes I get the lastName of one record with the firstName of another, and vice versa. I'm not sure if the two issues are related; I suspect my regular expression extractor is wrong for the numeric value.
JMeter setup (5.2.1):
Thread group
+ HTTP Request
++ JSON JMESPath Extractor
+ ForEach Controller
++ Regular Expression Extractor: PersonId
++ Regular Expression Extractor: firstName
++ Regular Expression Extractor: lastName
++ BeanShell PostProcessor
getWorker returns the following JSON payload, which I handle with a JSON JMESPath Extractor:
{
  "items" : [
    {
      "PersonId" : 398378,
      "firstName" : "Sam",
      "lastName" : "Shed"
    },
    {
      "PersonId" : 398379,
      "firstName" : "Bob",
      "lastName" : "House"
    }
  ],
  "count" : 2,
  "hasMore" : true,
  "limit" : 2,
  "offset" : 0,
  "links" : [
    {
      "rel" : "self",
      "href" : "https://a.site.on.the.internet.com/employees",
      "name" : "employees",
      "kind" : "collection"
    }
  ]
}
JSON JMESPath Extractor Configuration
Name of created variables: items
JMESPath expressions: items
Match No. -1
Default Values: Not Found
ForEach Controller Configuration
Input variable prefix: items
Start Index: Empty
End Index: Empty
Output variable name: items
Add "_"? Checked
Each of the Regular Expression Extractors follows the same pattern as below.
Extract PersonId with Regular Expression
Apply to: Main Sample Only
Field to check: Body
Name of created variable: PersonId
Regular Expression: "PersonId":"(.+?)"
Template: $1$
Match No. Empty
Default Value: PersonId
The final step in the thread is where I write out the parsed results.
BeanShell PostProcessor
PersonNumber = vars.get("PersonNumber");
DisplayName = vars.get("DisplayName");
f = new FileOutputStream("/Applications/apache-jmeter-5.2.1/bin/scripts/getWorker/responses/myText.csv", true);
p = new PrintStream(f);
this.interpreter.setOut(p);
print(PersonId+", "+ PersonNumber+ ", " + DisplayName);
f.close();
I am new to this and looking either for someone to tell me where I screwed up or direct me to a place I can read up on the appropriate topics. (Both are fine). Thank you.
The ForEach Controller doesn't know the structure of the items variable, since it is JSON; it only understands a plain array and traverses it. I would suggest moving away from the ForEach Controller in your case and using the JSON extractor itself for all the values, with one extractor per field, like below:
JSON extractor for Person ID
JSON extractor for First Name
JSON extractor for Last Name
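The extractor settings aren't shown here, but judging from the variable names used in the sampler code below, each one would presumably look something like this (the variable names and expressions are assumptions inferred from the code, not taken from the original answer):
Name of created variable: personId_C
JMESPath expression: items[*].PersonId
Match No.: -1
and likewise firstName_C with items[*].firstName, and lastName_C with items[*].lastName.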
Beanshell Sampler Code
import java.io.FileOutputStream; // import the stream classes actually used below
import java.io.PrintStream;

int matchNr = Integer.parseInt(vars.get("personId_C_matchNr"));
log.info("Match number is " + matchNr);
f = new FileOutputStream("myText.csv", true); // append to the CSV file
p = new PrintStream(f);
for (int i = 1; i <= matchNr; i++) {
    PersonId = vars.get("personId_C_" + i);
    FirstName = vars.get("firstName_C_" + i);
    LastName = vars.get("lastName_C_" + i);
    log.info("Iteration is " + i);
    log.info("Person ID is " + PersonId);
    log.info("First Name is " + FirstName);
    log.info("Last Name is " + LastName);
    p.println(PersonId + ", " + FirstName + ", " + LastName);
}
p.close();
f.close();
Output File
HOW THE ABOVE ACTUALLY WORKS
When you extract values using the matchNr, it goes in a sequential order in which the response has arrived. For example, in your case, Sam & Shed appear as first occurrences and Bob & House appear as subsequent occurrences. Hence JMeter captures them with the corresponding match and stores them as 1st First Name = Sam, 2nd First Name = Bob and so on.
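With the sample payload above and Match No. -1, the extractors would therefore leave variables roughly like the following for the sampler code to read (values taken from the example response):
personId_C_1 = 398378, firstName_C_1 = Sam, lastName_C_1 = Shed
personId_C_2 = 398379, firstName_C_2 = Bob, lastName_C_2 = House
personId_C_matchNr = 2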
GENERIC STUFF
The regex expression for capturing Person ID which you have used seems to be inaccurate. The appropriate one would be
"PersonId" :(.+?),
and not
"PersonId":"(.+?)"
Move to JSR223 processors instead of Beanshell as they are more performant. Source: Which one is efficient : Java Request, JSR223 or BeanShell Sampler for my script. The migration is pretty simple. Just copy the code that you have in Beanshell and paste it in JSR223.
Close any stream or writer that is open appropriately else it might cause issues when other users are trying to write to the file during load test
In case you are planning to use this file as a subsequent input within JMeter, please note that there is a space between the comma and the next element. For example, it is "Sam, Shed" and not "Sam,Shed". JMeter by default does not trim any spaces and will use the value just like that, so you might want to take a judicious call regarding that space.
Hope this helps!
Since JMeter 3.1 you shouldn't be using Beanshell, go for JSR223 Test Elements and Groovy language for scripting.
Given Groovy has built-in JSON support you shouldn't need any extractors, you can write the data into a file in a single shot like:
new groovy.json.JsonSlurper().parse(prev.getResponseData()).items.each { item ->
    new File('myText.csv') << item.get('PersonId') << ',' << item.get('firstName') << ',' << item.get('lastName') << System.getProperty('line.separator')
}
More information: Apache Groovy - Why and How You Should Use It

Read multiple concatenated json objects in Ruby

I have a file that contains multiple JSON objects that are not separated by commas:
{
  "field" : "value",
  "another_field": "another_value"
} // no comma
{
  "field" : "value"
}
Each of the objects on its own is a valid JSON object.
Is there a way that I can process this file easily?
I know this is NOT valid JSON, but unfortunately this file is being generated by a 3rd-party tool; I have no option of changing what the output looks like.
I can't open a text editor and smart-insert commas / square brackets before the run, since this is an automated process (I also really don't want to write code that opens the file and manipulates it).
In .NET there's a library that has this exact feature :
https://stackoverflow.com/a/29480032/2970729
https://www.newtonsoft.com/json/help/html/P_Newtonsoft_Json_JsonReader_SupportMultipleContent.htm
Is there anything equivalent in Ruby?
As long as your file is that simple you might want to do something like this:
# content = File.read(filename)
content = <<-EOF
{
  "field" : "value",
  "another_field": "another_value"
} // no comma
{
  "field" : "value"
}
EOF
require 'json'
JSON.parse("[#{content.gsub(/\}.*?\{/m, '},{')}]")
#=> [{"field"=>"value", "another_field"=>"another_value"}, {"field"=>"value"}]
The yajl-ruby gem enables processing concatenated JSON in Ruby. The parser can read from a String or an IO. Each complete object is yielded to a block.
require 'yajl'

File.open 'file.json' do |f|
  Yajl.load f do |object|
    # do something with object
  end
end
See the documentation for other options (buffer size, symbolized keys, etc).

Jenkins Customize Editable Email Content

In my Jenkins step I have a Windows batch command which runs a java jar file (java -Dfile.encoding=UTF-8 -jar C:\Test1\Test.jar C:\Test\test.log), and its output is a string value (verified in the Jenkins console; the string is getting printed). How do I take this string and insert it into the editable email content body so I can send it as an email? I don't want the whole Jenkins console in the email, only this string. I would assume the string has to be set as an environment variable after the script runs, but I'm not sure how exactly I can use the EnvInject plugin for my scenario, if it can be used at all.
Try to use pre-send script.
For example, you have a line in the log like "this random integer should be in email content: 3432805" and want to add the randomly generated integer to the email content.
Set the Default Content to whatever you want, but include some placeholder value that will be replaced. For example:
This is the random int from build.log: TO_REPLACE
Then click "Advanced Settings" and add Pre-send Script:
String addThisStringToContent = "";
build.getLog(1000).each() { line ->
    java.util.regex.Pattern p = java.util.regex.Pattern.compile("random\\sinteger.+\\:\\s(\\d+)");
    java.util.regex.Matcher m = p.matcher(line);
    if (m.find()) {
        addThisStringToContent = m.group(1);
    }
}
if (addThisStringToContent == "") {
    logger.println("Proper string not found. Email content has not been updated.");
} else {
    String contentToSet = ((javax.mail.Multipart)msg.getContent()).getBodyPart(0).getContent().toString().replace("TO_REPLACE", addThisStringToContent);
    msg.setContent(contentToSet, "text/plain");
}
where:
build.getLog(1000) - retrieves the last 1000 lines of build output.
Pattern.compile("random\\sinteger.+\\:\\s(\\d+)") - regex to find the proper string
"text/plain" - Content Type
String contentToSet = ((javax.mail.Multipart)msg.getContent()).getBodyPart(0).getContent().toString().replace("TO_REPLACE", addThisStringToContent); - replaces the string TO_REPLACE with your value
Hope it will help you.
Unfortunately I don't have enough reputation to comment on Alex's great answer, so I'm writing a new one. The call
msg.setContent(contentToSet, "text/plain")
has two disadvantages:
Special characters are garbled
An attachment gets lost
So I use the following call to set the modified text
((javax.mail.Multipart)msg.getContent()).getBodyPart(0).setContent(contentToSet, "text/plain;charset=UTF-8")

ElasticSearch: Altering indexed version of text

Before the text in a field is indexed, I want to run code on it to transform it, basically what's going on here https://www.elastic.co/guide/en/elasticsearch/reference/master/gsub-processor.html (but that feature isn't out yet).
For example, I want to be able to transform all . in a field into - for the indexed version.
Any advice? Doing this in elasticsearch-rails.
Use a char_filter that replaces every . with -; this will change the characters of the indexed terms, not the _source itself. Something like this:
"char_filter" : {
"my_mapping" : {
"type" : "mapping",
"mappings" : [
". => -"
]
}
}
or use Logstash with mutate and gsub filter to pre-process the data before being sent to Elasticsearch. Or you do it in your own indexer (whatever that is).
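If you stay with the char_filter approach, it only takes effect once it is referenced from an analyzer in the index settings and that analyzer is assigned to the field in the mapping. A minimal sketch (the analyzer name my_analyzer is just an assumption):
"settings" : {
  "analysis" : {
    "char_filter" : {
      "my_mapping" : {
        "type" : "mapping",
        "mappings" : [ ". => -" ]
      }
    },
    "analyzer" : {
      "my_analyzer" : {
        "type" : "custom",
        "tokenizer" : "standard",
        "char_filter" : [ "my_mapping" ]
      }
    }
  }
}
The field to be transformed would then declare "analyzer" : "my_analyzer" in its mapping.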

Is it possible to parse dynamic xml-structured log contents with Grok?

Is it feasible using Grok to parse dynamic xml-structured log contents, such as:
<tag_1> contents </tag_1> ... <tag_N> contents </tag_N>
where "tag_*" would be the field name and "contents" - the actual contents.
Therefore the parsed message would look like:
{
  "tag_1": [
    [
      "contents"
    ]
  ],
  ....
  "tag_N": [
    [
      "contents"
    ]
  ]
}
Not with grok. You will need to resort to ruby code to parse the XML and toss it into the event structure.
If your XML is super regular (i.e. has a root element and only one level under it), you could maybe use code like this:
filter {
  ruby {
    code => "
      msg = event['message'].split('><');
      for part in msg
        endpos = part.index('</')
        startpos = part.index('>')
        if !endpos.nil? && !startpos.nil? then
          tag = part[0,startpos];
          text = part[startpos+1,endpos-startpos-1];
          event[tag] = text
        end
      end
    "
  }
}
If your xml is more complex, you are going to have to resort to a real XML parser and figure out how to use it with logstash (I've never brought an external library into logstash).
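One option in that direction (a sketch, not tested against this kind of input) is Logstash's built-in xml filter, which hands the parsing to a real XML parser for you, provided the log line is well-formed XML with a single root element:
filter {
  xml {
    source => "message"
    target => "parsed"
  }
}
The parsed values end up nested under the target field (e.g. [parsed][tag_1]) rather than at the top level of the event.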
