multi-line grep not making list - grep

I have a grep line that someone else provided for me that I don't know how to change.
The original line was this:
grep id=\"desc\"* $ADDON_SETTINGS | awk -v ORS=, '{gsub(/"/, "");print $2}' | tr -s 'value=' ' ' | sed 's/ //g' | echo "[$(cat)]"
And it pulled from a file that contained the following (this is a sample segment):
<settings>
<setting id="cfirst" value="false" />
<setting id="cicons" value="false" />
<setting id="days" value="3" />
<setting id="delay" value="0.000000" />
<setting id="desc01" value="10" />
<setting id="desc02" value="18" />
<setting id="desc03" value="6" />
<setting id="desc04" value="13" />
<setting id="desc05" value="6" />
...
It pulled out the value for lines with "desc" in the id and resulted in a list:
10,18,6,13,6...
Now the program that generates the data file has changed the data to look like this:
<settings version="2">
<setting id="allc" default="true">false</setting>
<setting id="cfirst" default="true">false</setting>
<setting id="cicons" default="true">false</setting>
<setting id="days">3</setting>
<setting id="delay" default="true">0</setting>
<setting id="desc01">10</setting>
<setting id="desc02">18</setting>
<setting id="desc03">6</setting>
<setting id="desc04">13</setting>
...
I figured this might be easier, since I just need to pull the value between > and <, but when I use this:
grep id=\"desc\"* $ADDON_SETTINGS | awk -v ORS=, '{">|<";print $3}' | echo "[$(cat)]"
it doesn't work right. Not sure what I'm missing.

Try:
$ awk -F'[<>]' '/"desc/{printf "%s%s",c,$3; c=","} END{print""}' file
10,18,6,13
How it works:
-F'[<>]'
This tells awk to use < or > as field separators.
/"desc/{printf "%s%s",c,$3; c=","}
For any line that contains "desc, this tells awk to print the variable c followed by the third field. The third field is the number we want. The variable c is initially the empty string, but after the first print we set it to a comma. This causes the numbers we want to be printed, each separated by a comma.
END{print""}
After we have finished reading the file, this tells awk to print a newline character.
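The original pipeline also wrapped the list in square brackets with echo "[$(cat)]". A minimal sketch of doing the same around the awk command above, assuming a small sample file (the temporary file and the variable names settings, list and sep are made up for illustration):

```shell
# Hypothetical sample in the new settings format, written to a temp file.
settings=$(mktemp)
cat > "$settings" <<'EOF'
<settings version="2">
    <setting id="days">3</setting>
    <setting id="desc01">10</setting>
    <setting id="desc02">18</setting>
</settings>
EOF

# Split each line on < or >, so $3 is the text between the tags.
# sep is empty for the first match and a comma afterwards.
list=$(awk -F'[<>]' '/id="desc/ {printf "%s%s", sep, $3; sep=","}' "$settings")

echo "[$list]"   # prints [10,18]
rm -f "$settings"
```

The END block is left out here because command substitution strips the trailing newline anyway.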

The reason your solution doesn't work is '{">|<";print $3}', which doesn't make sense. The expression ">|<" is just a string constant that is evaluated and thrown away; it does not change the field separator. You want a line like:
$ grep id=\"desc input.txt | awk -F"<|>" '{print $3}'
However, a single awk solution (this needs GNU awk for the third argument to match()) is:
awk 'match($0,/id=\"desc[0-9]+\">([0-9]+)/, a){printf "%s%s",sep,a[1];sep=","} END{print ""}' input.txt
10,18,6,13
Or, with the script in a file:
$ cat tst.awk
match($0,/id=\"desc[0-9]+\">([0-9]+)/, a){
printf "%s%s",sep,a[1];sep=","
}
END{print ""}
$ awk -f tst.awk input.txt
10,18,6,13
Explanation:
The match with the regex id=\"desc[0-9]+\">([0-9]+) puts the number captured by the parentheses in a[1].
Print a[1] preceded by the separator sep, which is still empty the first time around.
END: print a final newline.
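On systems without GNU awk, a rough equivalent can be sketched with the two-argument match(), which sets RSTART and RLENGTH in any POSIX awk. The sample lines and the variable names result and value are assumptions for illustration:

```shell
result=$(printf '%s\n' \
    '<setting id="delay" default="true">0</setting>' \
    '<setting id="desc01">10</setting>' \
    '<setting id="desc02">18</setting>' |
  awk '
    # RSTART and RLENGTH describe where the regex matched on this line.
    match($0, /id="desc[0-9]+">[0-9]+/) {
        value = substr($0, RSTART, RLENGTH)   # for example: id="desc01">10
        sub(/.*>/, "", value)                 # keep only the digits after >
        printf "%s%s", sep, value
        sep = ","
    }
    END { print "" }
  ')
echo "$result"   # prints 10,18
```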

Your content has html/xml format. The proper way would be to use html/xml parsers.
xmlstarlet solution:
Sample input.html content:
<settings version="2">
<setting id="allc" default="true">false</setting>
<setting id="cfirst" default="true">false</setting>
<setting id="cicons" default="true">false</setting>
<setting id="days">3</setting>
<setting id="delay" default="true">0</setting>
<setting id="desc01">10</setting>
<setting id="desc02">18</setting>
<setting id="desc03">6</setting>
<setting id="desc04">13</setting>
</settings>
The job:
res=($(xmlstarlet sel -t -v "//setting[contains(@id, 'desc')]" input.html | tr '\n' ' '))
This will extract the values of <setting> tags whose id attribute contains "desc" and make them items of the array res.
Check the 2nd array item value:
echo ${res[1]}
18

grep digits between desc\d+"> and <
grep -oP 'desc\d+">\K\d+(?=<)' file | paste -sd ","
This will extract the digits between desc\d+"> and <.
Note: desc\d+ matches desc01, desc02, etc.
-o prints only the matched part of each line
-P tells grep the pattern is a Perl-compatible regex (GNU grep)
\K discards everything matched so far, so the reported match starts right after it
(?=<) is a lookahead assertion: a < must follow, but it is not included in the match
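To see what \K changes, here is a small sketch that runs the pattern with and without it on one sample line. It assumes GNU grep built with -P support; the variable names are made up for illustration:

```shell
line='<setting id="desc01">10</setting>'

# Without \K, -o prints the whole match, prefix included.
whole=$(printf '%s\n' "$line" | grep -oP 'desc\d+">\d+(?=<)')

# With \K, the part matched before it is dropped from the output.
digits=$(printf '%s\n' "$line" | grep -oP 'desc\d+">\K\d+(?=<)')

echo "$whole"    # prints desc01">10
echo "$digits"   # prints 10
```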

Related

xmllint extract value without xpath

I need to verify the text value of an element in my XML, as shown below. So far I can get the value of every dmdindex:field, but I only need the last one, the drep.rightsFacet result. Keep in mind that my version of xmllint does not have xpath support, so I have to resort to xmllint --shell. Any help is appreciated.
here's the xml file snippet:
<?xml version="1.0" encoding="UTF-8"?>
<dmdindex:dmdindex xmlns:dmdindex="http://www.example.com/example/dmdindex/"
xmlns:functx="http://www.functx.com"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<dmdindex:record>
<dmdindex:field name="drep.yearStart">1881</dmdindex:field>
<dmdindex:field name="drep.pubConcat">[Detroit : Parke, Davis &amp; Co., 1881?]</dmdindex:field>
<dmdindex:field name="drep.rights">blahblahblah</dmdindex:field>
<dmdindex:field name="drep.rightsLink">https://creativecommons.org/publicdomain/mark/1.0/</dmdindex:field>
<dmdindex:field name="drep.rightsFacet">Public domain</dmdindex:field>
</dmdindex:record>
</dmdindex:dmdindex>
here's the command I used
echo "cat //*[local-name()='dmdindex']/*[local-name()='record']/*[local-name()='field']" | xmllint --shell example.xml | sed '/^\/ >/d' | sed 's/<[^>]*.//g'
which returned
-------
Philadelphia : Sunshine Press, [1897]
-------
blahblahblah
-------
https://creativecommons.org/publicdomain/mark/1.0/
-------
Public domain
I only need to grab the value of "Public domain" and ignore the rest. thanks!
You can add a predicate to your xpath to test the value of the name attribute...
echo "cat //*[local-name()='dmdindex']/*[local-name()='record']/*[local-name()='field'][@name='drep.rightsFacet']/text()" | xmllint --shell example.xml | sed '/^\/ >/d'

How to check for some specific values in web.config using powershell

I am trying to find some specific lines/values in web.config, but I'm not sure how to do that. For example, if there is a line
<add key="webpages:Version" value="2.0.0.0"/>
how do I write a script that finds this line in the web.config file and also checks that the values are correct for a specific environment?
You want to use Select-XML, do a select -expand Node, and then a Where{$_.Key -eq "webpages:Version"} to specify your specific node. It would look something like this:
Select-XML -Path C:\Path\To\web.config -XPath "//add" | Select -ExpandProperty Node | Where{$_.Key -eq "webpages:Version"}
That will output:
key value
--- -----
webpages:Version 2.0.0.0
Or you can load the file into a variable with Get-Content and use -Xml $Variable instead of -Path C:\Path\To\file.xml, like:
[XML]$WebConfig = Get-Content C:\Path\To\web.config
Select-XML -XML $WebConfig -XPath "//add" | Select -ExpandProperty Node | Where{$_.Key -eq "webpages:Version"}

Use awk to parse and modify every CSV field

I need to parse and modify each field of a CSV header line for a dynamic sqlite create table statement. Below is what works from the command line, with the appropriate output:
echo ",header1,header2,header3"| awk 'BEGIN {FS=","}; {for(i=2;i<=NF;i++){printf ",%s text ", $i}; printf "\n"}'
,header1 text ,header2 text ,header3 text
Well, it breaks when it is run from within a bash shell script. I got it to work by writing the output to a file like below:
echo $optionalHeaders | awk 'BEGIN {FS=","}; {for(i=2;i<=NF;i++){printf ",%s text ", $i}; printf "\n"}' > optionalHeaders.txt
This sucks! There are a lot of examples that show how to parse/modify specific Nth fields. This issue requires each field to be modified. Is there a more concise and elegant Awk one liner that can store its contents to a variable rather than writing to a file?
sed is usually the right tool for simple substitutions on a single line. Take your pick:
$ echo ",header1,header2,header3" | sed 's/[^,][^,]*/& text/g'
,header1 text,header2 text,header3 text
$ echo ",header1,header2,header3" | sed -r 's/[^,]+/& text/g'
,header1 text,header2 text,header3 text
The last one above requires GNU sed, where -r switches from BREs to EREs (many sed implementations accept -E for the same purpose). You can do the same in awk using gsub() if you prefer:
$ echo ",header1,header2,header3" | awk '{gsub(/[^,]+/,"& text")}1'
,header1 text,header2 text,header3 text
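To keep the result in a variable instead of a file, command substitution is enough. A minimal sketch, assuming the header string lives in optionalHeaders as in the question; the table name demo and the leading id column are hypothetical:

```shell
optionalHeaders=",header1,header2,header3"

# Command substitution stores the sed output in a variable, no temp file needed.
columns=$(printf '%s\n' "$optionalHeaders" | sed 's/[^,][^,]*/& text/g')

# Hypothetical table name and leading column, purely for illustration.
sql="CREATE TABLE demo (id integer$columns);"
echo "$sql"
# prints CREATE TABLE demo (id integer,header1 text,header2 text,header3 text);
```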
I found the problem and it was me... I forgot to echo the contents of the variable to the Awk command. Brianadams' comment was so simple that it forced me to re-look at my code and find the problem! Thanks!
I am ok with resolving this but if anyone wants to propose a more concise and elegant Awk one liner - that would be cool.
You can try the following:
#! /bin/bash
header=",header1,header2,header3"
newhead=$(awk 'BEGIN {FS=OFS=","}; {for(i=2;i<=NF;i++) $i=$i" text"}1' <<<"$header")
echo "$newhead"
with output:
,header1 text,header2 text,header3 text
Instead of modifying fields one by one, another option is with a simple substitution:
echo ",header1,header2,header3" | awk '{gsub(/[^,]+/, "& text", $0); print}'
That is, replace a sequence of non-comma characters with text appended.
Another alternative would be replacing the commas, but due to the irregularities of your header line (first comma must be left alone, no comma at the end), that's a bit less easy:
echo ",header1,header2,header3" | awk '{gsub(/,/, " text,", $0); sub(/^ text,/, ",", $0); print $0 " text"}'
Btw, the rough equivalent of the two commands in sed:
echo ",header1,header2,header3" | sed -e 's/[^,]\{1,\}/& text/g'
echo ",header1,header2,header3" | sed -e 's/\(.\),/\1 text,/g' -e 's/$/ text/'

Groovy Pretty Print XML assertion fails

I'm writing a unit test that verifies whether the XML is formatted correctly, but it is failing and I can't figure out why.
So I decided to try the code from this blog post in the Grails console, and it fails there too.
import groovy.xml.*
def prettyXml = '''\
<?xml version="1.0" encoding="UTF-8"?>
<languages>
<language id="1">Groovy</language>
<language id="2">Java</language>
<language id="3">Scala</language>
</languages>
'''
// Pretty print a non-formatted XML String.
def xmlString = '<languages><language id="1">Groovy</language><language id="2">Java</language><language id="3">Scala</language></languages>'
assert XmlUtil.serialize(xmlString) == prettyXml
Assertion fails with:
Assertion failed:
assert XmlUtil.serialize(xmlString) == prettyXml
| | | |
| | | <?xml version="1.0" encoding="UTF-8"?>
| | | <languages>
| | | <language id="1">Groovy</language>
| | | <language id="2">Java</language>
| | | <language id="3">Scala</language>
| | | </languages>
| | false
| <languages><language id="1">Groovy</language><language id="2">Java</language><language id="3">Scala</language></languages>
<?xml version="1.0" encoding="UTF-8"?>
<languages>
<language id="1">Groovy</language>
<language id="2">Java</language>
<language id="3">Scala</language>
</languages>
I'm using Grails 2.2.1, which uses Groovy 2.0.7, on Windows 7.
Maybe it's something related to the OS line separator?
EDIT
I saved both strings to file, and checked with Notepad++
The parsed xml (XmlUtil) has CR+LF as line separator, but prettyXml has only LF. I also tested using \n instead of a multi-line declaration, with the same result!
Shouldn't Groovy always use CR+LF, since this is the Windows line separator?
In the Groovy String/GString docs, it says in relation to multi-line literals:
There[sic] are always represented by the character '\n', regardless of
the line-termination conventions of the host system.
They don't really say why, unfortunately.

simple filtering with `grep` , `awk`, `sed` or whatever else that's capable

I have a file, each line of which can be described by this grammar:
<text> <colon> <fullpath> <comma> <"by"> <text> <colon> <text> <colon> <text> <colon> <text>
Eg.,
needs fixing (Sunday): src/foo/io.c, by Smith : in progress : <... random comment ...>
How do I get the <fullpath> portion, which lies between the first <colon> and the first <comma>
(I'm not very inclined to write a program to parse this, though this looks like it could be done easily with javacc. Hoping to use some built-in tools like sed, awk, ...)
Or with a regex substitution
sed -n 's/^[^:]*: *\([^:,]*\),.*/\1/p' file
The " *" after the colon keeps the leading space out of the result. This uses basic regular expression syntax; if you switch to extended syntax with -E, drop the backslashes before the round parentheses. Or just go with Perl instead:
perl -nle 'print $1 if m/:\s*(.*?),/' file
Assuming the input will be similar to what you have above:
awk '{print $4}' | tr -d ,
To process an entire file, put the file name right after the awk program in the command above. Note that $4 relies on the path always being the fourth whitespace-separated word.
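If the number of words before the path varies, one possible variation is to split on the colon and comma instead of on whitespace. A sketch under that assumption, with a made-up two-line sample file:

```shell
# Hypothetical two-line sample; the second line has a different word count.
sample=$(mktemp)
cat > "$sample" <<'EOF'
needs fixing (Sunday): src/foo/io.c, by Smith : in progress : <... random comment ...>
done: lib/bar/net.c, by Jones : closed : none
EOF

# Fields are split at a colon or comma plus any following spaces,
# so $2 is the text between the first colon and the first comma.
paths=$(awk -F'[:,] *' '{print $2}' "$sample")
echo "$paths"
rm -f "$sample"
```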
If you're using bash script to parse this stuff, you don't even need tools like awk or sed.
$ text="needs fixing (Sunday): src/foo/io.c, by Smith : in progress : <... comment ...>"
$ text=${text%%,*}
$ text=${text#*: }
$ echo "$text"
src/foo/io.c
Read about this on the bash man page under Parameter Expansion.
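The same two expansions can be applied to every line of a file with a read loop. A minimal sketch; extract_path is a hypothetical helper name and the sample file contents are made up:

```shell
# extract_path is a hypothetical helper: prints the path part of one line.
extract_path() {
    path=${1%%,*}        # drop everything from the first comma onward
    path=${path#*: }     # drop everything up to the first ": "
    printf '%s\n' "$path"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
needs fixing (Sunday): src/foo/io.c, by Smith : in progress : <... comment ...>
done: lib/bar/net.c, by Jones : closed : none
EOF

# Read the file line by line and collect the extracted paths.
paths=$(while IFS= read -r line; do
    extract_path "$line"
done < "$sample")
echo "$paths"
rm -f "$sample"
```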
with GNU grep:
grep -oP '(?<=: ).*?(?=,)'
This may find more than one substring if there are subsequent commas in the line.

Resources