I need to verify the text value of an element in my XML, shown below. So far I get the values of all the dmdindex:field elements, but I only need the last one, drep.rightsFacet. Keep in mind that my version of xmllint does not have the --xpath option, so I have to resort to xmllint --shell. Any help is appreciated.
Here's the XML file snippet:
<?xml version="1.0" encoding="UTF-8"?>
<dmdindex:dmdindex xmlns:dmdindex="http://www.example.com/example/dmdindex/"
xmlns:functx="http://www.functx.com"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<dmdindex:record>
<dmdindex:field name="drep.yearStart">1881</dmdindex:field>
<dmdindex:field name="drep.pubConcat">[Detroit : Parke, Davis &amp; Co., 1881?]</dmdindex:field>
<dmdindex:field name="drep.rights">blahblahblah</dmdindex:field>
<dmdindex:field name="drep.rightsLink">https://creativecommons.org/publicdomain/mark/1.0/</dmdindex:field>
<dmdindex:field name="drep.rightsFacet">Public domain</dmdindex:field>
</dmdindex:record>
</dmdindex:dmdindex>
Here's the command I used:
echo "cat //*[local-name()='dmdindex']/*[local-name()='record']/*[local-name()='field']" | xmllint --shell example.xml | sed '/^\/ >/d' | sed 's/<[^>]*.//g'
which returned:
-------
Philadelphia : Sunshine Press, [1897]
-------
blahblahblah
-------
https://creativecommons.org/publicdomain/mark/1.0/
-------
Public domain
I only need to grab the value "Public domain" and ignore the rest. Thanks!
You can add a predicate to your XPath that tests the value of the name attribute:
echo "cat //*[local-name()='dmdindex']/*[local-name()='record']/*[local-name()='field'][@name='drep.rightsFacet']/text()" | xmllint --shell example.xml | sed '/^\/ >/d'
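If even xmllint --shell is unavailable, a line-oriented fallback is possible with awk. This is a swapped-in technique, not the answer's method: a minimal sketch assuming each dmdindex:field element sits on its own line, as in the snippet above. The function name get_field is hypothetical, and because it matches text instead of parsing XML, entities such as &amp; are printed still escaped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: get_field NAME FILE
# Text-matching sketch, NOT an XML parser. Splitting on "<" and ">" turns
#   <dmdindex:field name="drep.rightsFacet">Public domain</dmdindex:field>
# into awk fields: $2 = the start tag's contents, $3 = the element's text.
get_field() {
  local name=$1 file=$2
  # Match the element name, then the exact name="..." attribute (closing quote
  # included, so drep.rights does not also match drep.rightsFacet).
  awk -F'[<>]' -v want="name=\"$name\"" \
    '$2 ~ /^dmdindex:field / && index($2, want) { print $3 }' "$file"
}

# Usage (file name from the question):
# get_field drep.rightsFacet example.xml
```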
I am trying to find some specific lines/values in web.config, but I'm not sure how to do that. For example, if there is a line
<add key="webpages:Version" value="2.0.0.0"/>
how do I write a script that finds this line in the web.config file and also checks whether the values are correct for a specific environment?
You want to use Select-Xml, expand the matched nodes with Select -ExpandProperty Node, and then filter with Where{$_.Key -eq "webpages:Version"} to pick out the specific node. It would look something like this:
Select-XML -Path C:\Path\To\web.config -XPath "//add" | Select -ExpandProperty Node | Where{$_.Key -eq "webpages:Version"}
That will output:
key              value
---              -----
webpages:Version 2.0.0.0
Or you can read the file into an [XML] variable with Get-Content and pass -Xml $Variable instead of -Path C:\Path\To\file.xml, like this:
[XML]$WebConfig = Get-Content C:\Path\To\web.config
Select-XML -XML $WebConfig -XPath "//add" | Select -ExpandProperty Node | Where{$_.Key -eq "webpages:Version"}
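The question also asks how to check whether the value is correct for a given environment. On a Unix-like system without PowerShell, a rough equivalent can be sketched with sed. This is a swapped-in, line-oriented technique rather than XML parsing: it assumes the add element is on one line with key before value, as in the example, and the helper name check_app_setting is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_app_setting FILE KEY EXPECTED
# Line-oriented sketch, NOT an XML parser: it assumes <add key="..." value="..."/>
# is on a single line with key before value. KEY is used inside a sed regex,
# so characters such as "." or "/" in it are treated as regex/delimiter syntax.
check_app_setting() {
  local file=$1 key=$2 expected=$3 actual
  # Print the value attribute of the first matching <add> element only.
  actual=$(sed -n "s/.*<add key=\"$key\" value=\"\([^\"]*\)\".*/\1/p" "$file" | head -n 1)
  if [[ $actual == "$expected" ]]; then
    echo "OK: $key = $actual"
  else
    echo "MISMATCH: $key = '$actual' (expected '$expected')" >&2
    return 1
  fi
}

# Usage, with the expected value for the environment being checked:
# check_app_setting /path/to/web.config webpages:Version 2.0.0.0
```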
I need to parse and modify each field of a CSV header line to build a dynamic SQLite CREATE TABLE statement. Below is what works from the command line, with the expected output:
echo ",header1,header2,header3"| awk 'BEGIN {FS=","}; {for(i=2;i<=NF;i++){printf ",%s text ", $i}; printf "\n"}'
,header1 text ,header2 text ,header3 text
However, it breaks when run from within a bash script. I got it to work by writing the output to a file, like this:
echo "$optionalHeaders" | awk 'BEGIN {FS=","}; {for(i=2;i<=NF;i++){printf ",%s text ", $i}; printf "\n"}' > optionalHeaders.txt
This sucks! There are a lot of examples that show how to parse/modify a specific Nth field, but here every field needs to be modified. Is there a more concise and elegant awk one-liner whose output can be stored in a variable rather than written to a file?
sed is usually the right tool for simple substitutions on a single line. Take your pick:
$ echo ",header1,header2,header3" | sed 's/[^,][^,]*/& text/g'
,header1 text,header2 text,header3 text
$ echo ",header1,header2,header3" | sed -r 's/[^,]+/& text/g'
,header1 text,header2 text,header3 text
The last one above uses -r, a GNU sed option that switches to EREs instead of BREs. You can do the same in awk using gsub() if you prefer:
$ echo ",header1,header2,header3" | awk '{gsub(/[^,]+/,"& text")}1'
,header1 text,header2 text,header3 text
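Since the question is really about keeping the result in a variable inside a script, here is a minimal sketch that captures the sed substitution with command substitution and builds the CREATE TABLE statement from it. The table name records and the leading id integer column are hypothetical placeholders, not from the question.

```shell
#!/usr/bin/env bash
# Capture the sed substitution in a variable with $(...) instead of writing a file.
# "records" and the leading "id integer" column are hypothetical placeholders.
optionalHeaders=",header1,header2,header3"

# Quote the variable so the shell does not word-split or glob the header line.
columns=$(printf '%s\n' "$optionalHeaders" | sed 's/[^,][^,]*/& text/g')

# The header's leading comma lets the generated columns follow a fixed first column.
createStmt="CREATE TABLE records (id integer${columns});"
echo "$createStmt"
```

This prints CREATE TABLE records (id integer,header1 text,header2 text,header3 text); and the string could then be passed to the sqlite3 command-line shell as its SQL argument, e.g. sqlite3 mydb.sqlite "$createStmt" (database file name hypothetical).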
I found the problem, and it was me: I forgot to echo the contents of the variable into the awk command. Brianadams' comment was so simple that it forced me to re-examine my code and find the problem. Thanks!
I'm OK with considering this resolved, but if anyone wants to propose a more concise and elegant awk one-liner, that would be cool.
You can try the following:
#! /bin/bash
header=",header1,header2,header3"
newhead=$(awk 'BEGIN {FS=OFS=","}; {for(i=2;i<=NF;i++) $i=$i" text"}1' <<<"$header")
echo "$newhead"
with output:
,header1 text,header2 text,header3 text
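The FS=OFS="," part of this answer is what keeps the commas. A minimal sketch showing the difference: assigning to any field makes awk rebuild $0 with OFS between the fields, and OFS defaults to a single space.

```shell
#!/usr/bin/env bash
# Assigning to a field ($i = ...) makes awk rebuild $0, joining fields with OFS.
header=",header1,header2,header3"

# OFS left at its default " ": the commas are replaced by spaces.
without_ofs=$(awk 'BEGIN {FS = ","} {for (i = 2; i <= NF; i++) $i = $i " text"} 1' <<<"$header")

# OFS set to "," as in the answer: the commas are preserved.
with_ofs=$(awk 'BEGIN {FS = OFS = ","} {for (i = 2; i <= NF; i++) $i = $i " text"} 1' <<<"$header")

printf '[%s]\n' "$without_ofs" "$with_ofs"
```

The first line comes out as " header1 text header2 text header3 text" (the empty $1 becomes a leading space), the second as the desired ",header1 text,header2 text,header3 text".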
Instead of modifying fields one by one, another option is a simple substitution:
echo ",header1,header2,header3" | awk '{gsub(/[^,]+/, "& text", $0); print}'
That is, replace each sequence of non-comma characters with itself followed by " text".
Another alternative would be replacing the commas, but due to the irregularities of your header line (the first comma must be left alone, and there is no comma at the end), that's a bit trickier:
echo ",header1,header2,header3" | awk '{gsub(/,/, " text,", $0); sub(/^ text,/, ",", $0); print $0 " text"}'
By the way, here are rough sed equivalents of the two commands:
echo ",header1,header2,header3" | sed -e 's/[^,]\{1,\}/& text/g'
echo ",header1,header2,header3" | sed -e 's/\(.\),/\1 text,/g' -e 's/$/ text/'
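One behavioral difference between the approaches, not raised in the question, shows up with an empty column name: the regex substitutions only touch non-empty runs of non-comma characters, while the field loop appends text to every field from $2 on. A small sketch with a hypothetical header:

```shell
#!/usr/bin/env bash
# Hypothetical header with an empty third column (two adjacent commas).
header=",header1,,header3"

# Regex substitution: the empty column is left untouched.
by_regex=$(sed 's/[^,][^,]*/& text/g' <<<"$header")

# Field loop: every field from $2 on gets " text", including the empty one.
by_fields=$(awk 'BEGIN {FS = OFS = ","} {for (i = 2; i <= NF; i++) $i = $i " text"} 1' <<<"$header")

printf 'regex : %s\n' "$by_regex"
printf 'fields: %s\n' "$by_fields"
```

Neither result is a valid column list for CREATE TABLE, so which one you prefer depends on whether you would rather detect the empty name or have it fail loudly.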
I'm writing a unit test that verifies that the XML is formatted correctly, but it is failing and I can't figure out why.
So I tried the code from this blog post in the Grails console, and it also fails.
import groovy.xml.*
def prettyXml = '''\
<?xml version="1.0" encoding="UTF-8"?>
<languages>
  <language id="1">Groovy</language>
  <language id="2">Java</language>
  <language id="3">Scala</language>
</languages>
'''
// Pretty print a non-formatted XML String.
def xmlString = '<languages><language id="1">Groovy</language><language id="2">Java</language><language id="3">Scala</language></languages>'
assert XmlUtil.serialize(xmlString) == prettyXml
Assertion fails with:
Assertion failed:
assert XmlUtil.serialize(xmlString) == prettyXml
| | | |
| | | <?xml version="1.0" encoding="UTF-8"?>
| | | <languages>
| | | <language id="1">Groovy</language>
| | | <language id="2">Java</language>
| | | <language id="3">Scala</language>
| | | </languages>
| | false
| <languages><language id="1">Groovy</language><language id="2">Java</language><language id="3">Scala</language></languages>
<?xml version="1.0" encoding="UTF-8"?>
<languages>
<language id="1">Groovy</language>
<language id="2">Java</language>
<language id="3">Scala</language>
</languages>
I'm using Grails 2.2.1, which uses Groovy 2.0.7, on Windows 7.
Maybe it's something related to the OS line separator?
EDIT
I saved both strings to file, and checked with Notepad++
The serialized XML (from XmlUtil) has CR+LF as the line separator, but prettyXml has only LF. I also tested using \n instead of a multi-line declaration, with the same result!
Shouldn't Groovy always use CR+LF, since this is the Windows line separator?
In the Groovy String/GString docs, it says in relation to multi-line literals:
There[sic] are always represented by the character '\n', regardless of
the line-termination conventions of the host system.
They don't really say why, unfortunately.
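Outside Notepad++, the same line-ending difference can be confirmed from a shell. A minimal sketch, assuming the two strings were saved to files as described in the question's EDIT; the helper names crlf_lines and same_ignoring_cr are hypothetical. Stripping carriage returns before comparing mirrors the '\n'-only convention the Groovy docs describe for multi-line literals.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for two saved files (e.g. the XmlUtil output and prettyXml).

# Count lines that contain a carriage return; grep -c prints 0 but exits 1 when
# nothing matches, so "|| true" keeps the exit status at 0 in that case.
crlf_lines() {
  grep -c $'\r' "$1" || true
}

# Succeed if the files are identical once every CR has been stripped,
# i.e. if CRLF versus LF is the only difference between them.
same_ignoring_cr() {
  cmp -s <(tr -d '\r' < "$1") <(tr -d '\r' < "$2")
}

# Usage (hypothetical file names):
# crlf_lines serialized.xml; crlf_lines pretty.xml
# same_ignoring_cr serialized.xml pretty.xml && echo "same apart from CR characters"
```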
I have a file, each line of which can be described by this grammar:
<text> <colon> <fullpath> <comma> <"by"> <text> <colon> <text> <colon> <text> <colon> <text>
E.g.,
needs fixing (Sunday): src/foo/io.c, by Smith : in progress : <... random comment ...>
How do I get the <fullpath> portion, which lies between the first <colon> and the first <comma>?
(I'm not very inclined to write a program to parse this, though it looks like it could be done easily with JavaCC. I'm hoping to use built-in tools like sed, awk, ...)
Or with a regex substitution:
sed -n 's/^[^:]*: *\([^:,]*\),.*/\1/p' file
This is the GNU (Linux) sed dialect; on a different platform you may need the -E option (and then remove the backslashes before the round parentheses), or just go with Perl instead:
perl -nle 'print $1 if m/:\s*(.*?),/' file
Assuming the input will be similar to what you have above:
awk '{print $4}' | tr -d ,
To process the entire file, just add the file name after the awk program in the command above.
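The $4 approach relies on exactly three words preceding the path and on the path containing no spaces. A minimal sketch of a variant that splits on ": " instead, so the word count before the colon does not matter; it assumes the text before the first colon never contains ": " itself, and the helper name extract_fullpath_awk is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract_fullpath_awk [FILE...]  (reads stdin if no file)
# Split each line on ": ", then cut the second field at its first comma.
extract_fullpath_awk() {
  awk -F': ' '{ sub(/,.*/, "", $2); print $2 }' "$@"
}

# Usage: extract_fullpath_awk file
```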
If you're using a bash script to parse this, you don't even need tools like awk or sed:
$ text="needs fixing (Sunday): src/foo/io.c, by Smith : in progress : <... comment ...>"
$ text=${text%%,*}
$ text=${text#*: }
$ echo "$text"
src/foo/io.c
Read about this in the bash man page under "Parameter Expansion".
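To apply the same two expansions to every line of a file, they can be wrapped in a read loop. A minimal sketch; the function name extract_fullpath is hypothetical, and it assumes every line follows the grammar in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the two expansions to every line read from stdin.
#   ${line%%,*}  removes the longest suffix starting at a comma (from the first comma on)
#   ${line#*: }  removes the shortest prefix ending in ": " (up to the first ": ")
extract_fullpath() {
  local line
  # "|| [[ -n $line ]]" also handles a last line with no trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line%%,*}
    line=${line#*: }
    printf '%s\n' "$line"
  done
}

# Usage: extract_fullpath < file
```

For large files, the sed or awk answers will be much faster than a bash read loop.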
With GNU grep:
grep -oP '(?<=: ).*?(?=,)'
This may find more than one substring if there are subsequent commas in the line.