How to remove a line with a quote - parsing

I have a bunch of logs that I want to process with logstash. Between each entry there is this annoying line with just a double quote, which my pattern can't parse and which results in a _grokparsefailure. What's the best way of ignoring or removing it?
My input is:
2016-09-18 00:00:02,013 UTC, idf="639b26a731284b43beac8b26f829bcab", message="24308 * thread http-nio-8443-exec-4
24308 > host: localhost:8443
24308 > user-agent: curl/7.40.0
"
2016-09-18 00:02:35,555 UTC, idf="7d65da6966ec4c26a685b04ec7bfd851", message="24309 * thread http-nio-8443-exec-1
24309 > host: example.com
24309 > user-agent: Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)
24309 > x-forwarded-for: 69.162.124.236
"
2016-09-18 00:07:35,591 UTC, idf="8998b9c5f2de414182ce3143b4ec39af", message="24310 * thread http-nio-8443-exec-10
24310 > HEAD https://example.com/status
I've tried adding another grok filter at the beginning and then removing the field
grok {
  match => { "message" => "(?<onlyquote>\" )" }
  remove_field => [ "onlyquote" ]
}
And gsub
mutate {
  gsub => [ "message", "\" \n", "" ]
}
But both approaches result in even more failures. Any idea?

Use the drop filter:
if (/^"$/) { // match line contains only "
drop {}
}
And put it before your grok filter
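Putting it together, a minimal sketch of the filter section with the drop conditional placed before grok (the grok pattern below is only a stand-in; keep the one you already use):
filter {
  # Drop the stray lines that consist of nothing but a double quote
  if [message] =~ /^"$/ {
    drop {}
  }

  grok {
    # Stand-in pattern for illustration only -- replace with your own match
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} UTC, idf=%{QUOTEDSTRING:idf}, %{GREEDYDATA:rest}" }
  }
}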

yandex-tank-api-client example doesn't work

I don't understand how the yandex-tank-api-client example works.
I've made the following corrections in tank-example.py:
I replaced (it raised an error otherwise)
84: tsk = tankapi.shoot(*cfg)
with
84: tsk = tankapi.shoot(*cfg, f)
where f is the function:
def f(status, t):
    print "Status: %s" % status
I've replaced the example's load.yaml file with my own:
- tanks: ['localhost']
  config: ./load.ini
  log_name: task-A
  download: ['*.log', '*.ini']
  upload: ['*.ammo']
  expected_codes: [0]
Where load.ini is:
[DEFAULT]
use_caching = 0
[phantom]
phantom_path = /home/krylov/prj/yandex/phantom/bin/phantom
phantom_modules_path = /home/krylov/prj/yandex/phantom/lib/phantom
address=localhost:8000
rps_schedule=const(10, 10s)
; Headers and URIs for GET requests
header_http = 1.1
headers = /
  [Host: 127.0.0.1]
  [Connection: close]
uris = /
  /instances
  /datastore
  /datastore-indexes
After I run yandex-tank-api-server on localhost and start the tank-example.py script, the test starts but it cannot leave the configure stage:
2016-04-19 12:44:10,263 [DEBUG] Tank localhost: API returned code 200, contents:
{
  "status": "running",
  "retcode": null,
  "stage_completed": true,
  "break": "configure",
  "current_stage": "lock",
  "tank_status": {},
  "failures": []
}
Status: 0
The tank.log file ends with the following lines (full tank.log available here):
2016-04-19 12:44:00,287 [INFO] yandex_tank_api.worker Changing the next break from lock to configure
2016-04-19 12:44:00,288 [WARNING] yandextank.core.tankcore Lock file present: /var/lock/lunapark_1WgKl6.lock
2016-04-19 12:44:00,288 [DEBUG] root No process[3]: [Errno 3] No such process
2016-04-19 12:44:00,288 [DEBUG] yandextank.core.tankcore Lock PID 13895 not exists, ignoring and trying to remove
I'd like to understand why the PID does not exist. Does the example have an error?

Checking variables exist before building an array

I am generating a string from a number of components (title, authors, journal, year, journal volume, journal pages). The idea is that the string will be a citation, like so:
#citation = article_title + " " + authors + ". " + journal + " " + year + ";" + journal_volume + ":" + journal_pages
I am guessing that some components occasionally do not exist. I am getting this error:
no implicit conversion of nil into String
Is this indicating that it is trying to build the string and one of the components is nil? If so, is there a neat way to build a string from an array while checking that each element exists to circumvent this issue?
It's easier to use interpolation:
#citation = "#{article_title} #{authors}. #{journal} #{year}; #{journal_volume}:#{journal_pages}"
Nils will be substituted with empty strings.
array = [
  article_title, authors, journal,
  year, journal_volume, journal_pages
]
#citation = "%s %s. %s %s; %s:%s" % array
This uses the String#% format-string method.
Demo:
>> "%s: %s" % [ 'fo', nil ]
=> "fo: "
Considering that you presumably are doing this for more than one article, you might consider doing it like so:
SEPARATORS = [" ", ". ", " ", ";", ":", ""]
article = ["Ruby for Fun and Profit", "Matz", "Cool Tools for Coders",
2004, 417, nil]
article.map(&:to_s).zip(SEPARATORS).map(&:join).join
# => "Ruby for Fun and Profit Matz. Cool Tools for Coders 2004;417:"

Parse dig output export csv

I'm using a dig command in a shell script and want to output the flags and the authority section in CSV format.
dig @ns1.hosangit.com djzah.com +noall +authority +comments
output:
; <<>> DiG 9.8.3-P1 <<>> @ns1.hosangit.com djzah.com +noall +authority +comments
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64505
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 0
;; WARNING: recursion requested but not available
;; AUTHORITY SECTION:
djzah.com. 3600 IN NS ns3.eventguyz.com.
djzah.com. 3600 IN NS ns1.eventguyz.com.
djzah.com. 3600 IN NS ns2.eventguyz.com.
The expected CSV output is (domain, the flags (not always these three), the authority section (could be 5 entries)):
djzah.com,qr,aa,rd,ns3.eventguyz.com,ns1.eventguyz.com,ns2.eventguyz.com
I was trying to use awk and/or sed but am having difficulty searching for a pattern, like for the flags section:
;; flags: (then use a space delimiter until you reach ;)
Then for the authority section, I assume you would search for
;; AUTHORITY SECTION:
Then create an array and only use the last field.
I don't know what I'm doing.
#!/usr/bin/awk -f
BEGIN { OFS = "," }

/^;; flags:/ {
    sub(/;; flags: /, "")
    sub(/;.*$/, "")
    $1 = $1
    flags = "," $0
    next
}

/^;/ || NF < 5 { next }

!($1 in a) {
    keys[++k] = $1
}

{
    t = $5
    sub(/[.][ \t\r]*$/, "", t)
    a[$1] = a[$1] "," t
}

END {
    for (i = 1; i <= k; ++i) {
        key = keys[i]
        t = key
        sub(/[.][ \t\r]*$/, "", t)
        print t flags a[key]
    }
}
Usage:
dig @ns1.hosangit.com djzah.com +noall +authority +comments | awk -f script.awk
Test:
awk -f script.awk sample
Output:
djzah.com,qr,aa,rd,ns3.eventguyz.com,ns1.eventguyz.com,ns2.eventguyz.com
BEGIN { OFS = "," }: every block in awk normally runs each time a record is processed; a BEGIN block runs only once, at the start. This one simply sets OFS to ,.
/^;; flags:/ matches ;; flags:. The block it guards extracts the flags from the record (line). The sub commands remove the unnecessary parts of the record. $1 = $1 just makes sure $0 is rebuilt with OFS. flags = "," $0 assigns the now comma-separated flags to the flags variable. next makes awk jump to the next record.
/^;/ || NF < 5 { next } makes awk skip comment lines and other lines we don't need.
!($1 in a) { keys[++k] = $1 }: when $1 (e.g. djzah.com.) is encountered for the first time, it is added to the keys array.
{ t = $5; sub(/[.][ \t\r]*$/, "", t); a[$1] = a[$1] "," t } adds the value of the 5th column (e.g. ns3.eventguyz.com) to the collection, with the trailing . removed.
When processing is finished, the END block executes. It iterates through the keys found and prints the data bound to each.

Using flourish lib, double quotes in email subject are not fetched: FETCH command issue

When I check with the FETCH command, I get something like this:
2148 FETCH (UID 2159 INTERNALDATE "06-Nov-2013 06:36:15 +0000" RFC822.SIZE 3702 ENVELOPE ("Wed, 6 Nov 2013 12:06:39 +0530" {19} Reg: "test subject" (("karthick kumar" NIL "ngkarthick" "aroxo.com")) (("karthick kumar" NIL "ngkarthick" "aroxo.com")) (("karthick kumar" NIL "ngkarthick" "aroxo.com")) ((NIL NIL "phpkarthick" "gmail.com")) NIL NIL NIL ""))
There is something unwanted, {19}, in the FETCH response.
The {19} followed by a CRLF is called a literal. It says that the next 19 characters are to be read and used without any additional interpretation. Literals are part of the IMAP protocol, and they are one way that strings with difficult characters can be transferred. They are usually used to transmit bodies, since those tend to have CRLFs of their own.
In this case, the server has decided to transfer the subject like this so that it doesn't have to escape the quotes, which would otherwise be significant to the protocol.
Perhaps your library is not entirely protocol compliant?
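For illustration, these are the two equivalent ways the server could have encoded that subject; it picked the literal form, which announces that the next 19 octets are raw data (Reg: "test subject" is exactly 19 characters):
As a quoted string:
"Reg: \"test subject\""
As a literal:
{19}
Reg: "test subject"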
I solved this issue by doing a pattern replace, and successfully fetched all emails.
Below is the code which helped me achieve this.
Please add this code to fMailbox.php after the line
$output = array();
(around line 1117):
$response = $this->write('FETCH ' . $messages . ' (UID INTERNALDATE RFC822.SIZE ENVELOPE)');
$hintter = implode(' ', $response);
$pattern = '(\{[0-9]+\})';
if (preg_match($pattern, $hintter, $match)) {
    $responses = preg_replace($pattern, '', $response);
    $responsesnew = array();
    $i = 0;
    $j = 0;
    foreach ($responses as $reps) {
        if (substr(trim($reps), 0, 1) != '*') {
            $responsesnew[$i] = $responses[$j - 1] . $reps;
            $i++;
        }
        $j++;
    }
} else {
    $responsesnew = $response;
}
and change the foreach over the response to
foreach ($responsesnew as $line)
from
foreach ($response as $line)

How to parse text in Groovy

I need to parse some text (the output of an svn command) in order to retrieve a number (the svn revision).
This is my code. Note that I need to retrieve the whole output stream as text to do other operations.
def proc = cmdLine.execute()   // Call *execute* on the string
proc.waitFor()                 // Wait for the command to finish
def output = proc.in.text
// other stuff happening here
output.eachLine { line ->
    def revisionPrefix = "Last Changed Rev: "
    if (line.startsWith(revisionPrefix)) res = new Integer(line.substring(revisionPrefix.length()).trim())
}
This code is working fine, but since I'm still a novice in Groovy, I'm wondering if there is a more idiomatic way to avoid the ugly if...
Example of svn output (but of course the problem is more general):
Path: .
Working Copy Root Path: /svn
URL: svn+ssh://svn.company.com/opt/svnserve/repos/project/trunk
Repository Root: svn+ssh://svn.company.com/opt/svnserve/repos
Repository UUID: 516c549e-805d-4d3d-bafa-98aea39579ae
Revision: 25447
Node Kind: directory
Schedule: normal
Last Changed Author: ubi
Last Changed Rev: 25362
Last Changed Date: 2012-11-22 10:27:00 +0000 (Thu, 22 Nov 2012)
I got inspiration from the answer below and solved it using find(). My solution is:
def revisionPrefix = "Last Changed Rev: "
def line = output.readLines().find { line -> line.startsWith(revisionPrefix) }
def res = new Integer(line?.substring(revisionPrefix.length())?.trim()?:"0")
3 lines, no if, very clean
One possible alternative is:
def output = cmdLine.execute().text
Integer res = output.readLines().findResult { line ->
    (line =~ /^Last Changed Rev: (\d+)$/).with { m ->
        if( m.matches() ) {
            m[ 0 ][ 1 ] as Integer
        }
    }
}
Not sure whether it's better or not. I'm sure others will have different alternatives.
Edit:
Also, beware of using proc.text: if your proc outputs a lot of stuff, you could end up blocking when the input stream gets full...
Here is a heavily commented alternative, using consumeProcessOutput:
// Run the command
String output = cmdLine.execute().with { proc ->
    // Then, with a StringWriter
    new StringWriter().with { sw ->
        // Consume the output of the process
        proc.consumeProcessOutput( sw, System.err )
        // Make sure we worked
        assert proc.waitFor() == 0
        // Return the output (goes into `output` var)
        sw.toString()
    }
}

// Extract the version by looking through all the lines
Integer version = output.readLines().findResult { line ->
    // Pass the line through a regular expression
    (line =~ /Last Changed Rev: (\d+)/).with { m ->
        // And if it matches
        if( m.matches() ) {
            // Return the \d+ part as an Integer
            m[ 0 ][ 1 ] as Integer
        }
    }
}
