Rails - strip xml import from whitespace and line break - ruby-on-rails

I am stuck with something quite simple but really annoying:
I have an XML file with one node whose content includes line breaks and whitespace.
Sadly I can't change the xml.
<?xml version="1.0" encoding="utf-8" ?>
<ProductFeed>
ACME Ltd.
Fooproduct
Foo Root :: Bar Category
I get to the node and can read from it without trouble:
url = "http://feeds.somefeed/feed.xml.gz"
@source = open((url), :http_basic_authentication=>["USER", "PW"])
@gz = Zlib::GzipReader.new(@source)
@result = @gz.read
@doc = Nokogiri::XML(@result)
@doc.xpath("/ProductFeed/Vendors/Vendor").each do |manuf|
  vendor = manuf.css("Name").first.text
  manuf.xpath("//child::Product").each do |product|
    product_name = product.css("Name").text
    foocat = product.css("Category").text
    puts "#{vendor} ---- #{product_name} ---- #{foocat} "
  end
end
This results in:
ACME Ltd. ---- Fooproduct ----
Foo Root :: Bar Category
Obviously there are line breaks and tab stops or spaces in the string returned by product.css("Category").text.
Does anyone know how to strip the line breaks and tabs or spaces from the result right here?
Alternatively I could do that in the next step, where I do a find on 'foocat' like
barcat = Category.find_by_foocat(foocat)
Thanks for helping!
Val

You could use XSLT to remove all the unnecessary characters.
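If XSLT feels heavy for this, plain Ruby string methods may be enough. The following is a minimal sketch, assuming the text returned by product.css("Category").text looks roughly like the sample string below (the sample value is made up for illustration). In a Rails app, ActiveSupport's String#squish does the same collapsing in one call.

```ruby
# Assumed sample of what product.css("Category").text might return.
raw = "\n\t\t  Foo Root ::   Bar Category\n\t"

# strip removes leading and trailing whitespace, including newlines and tabs.
stripped = raw.strip

# gsub collapses every inner run of whitespace into one space before stripping.
normalized = raw.gsub(/\s+/, " ").strip

puts stripped
puts normalized
```

The normalized value can then be passed on to the lookup, for example Category.find_by_foocat(normalized).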

Related

How Can I Use Config Entries with Dots When Parsing with XmlSlurper

I'm trying to use a groovy Config entry to parse an xml file with XmlSlurper.
Here's the Config file:
sample {
xml {
frompath = "Email.From"
}
}
Here's the XML
<xml>
  <Email>
    <From>
      <Address>foo@bar.com</Address>
      <Alias>Foo Bar</Alias>
    </From>
  </Email>
</xml>
This is what I tried initially:
XmlSlurper slurper = new XmlSlurper()
def record = slurper.parseText((new File("myfile.xml")).text)
def emailFrom = record?."${grailsApplication.config.sample.xml.frompath}".Address.text()
This doesn't work because XmlSlurper allows one to use special characters in path names as long as they're surrounded by quotes, so the app is translating this as:
def emailFrom = record?."Email.From".Address.text()
and not
def emailFrom = record?.Email.From.Address.text()
I tried setting the frompath property to be "Email"."From" and then '"Email"."From"'. I tried tokenizing the property in the middle of the parse statement (don't ask.)
Can someone please point me towards some resources to find out if/how I can do this?
I feel like the question "issue getting dynamic Config parameter in Grails taglib" and this post, https://softnoise.wordpress.com/2013/07/29/grails-injecting-config-parameters/, may have whispers of a solution, but I need fresh eyes to see it.
The solution in "issue getting dynamic Config parameter in Grails taglib" is a proper way to dereference such a path. E.g.
def emailFrom = 'Email.From'.tokenize('.').inject(record){ r,it -> r."$it" }
def emailFromAddress = emailFrom.Address.text()
If your path there can get complex and you rather go with the potentially more dangerous way, you could also use Eval. E.g.
def path = "a[0].b.c"
def map = [a:[[b:[c:666]]]] // dummy map, same as xmlslurper
assert Eval.x(map, "x.$path") == 666
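The tokenize/inject idea is not specific to Groovy: split the dotted path and step one level deeper per segment. As a rough illustration of the same fold in Ruby, here is a sketch that assumes the parsed document is represented by nested hashes (the record hash and from_path variable are hypothetical stand-ins, not XmlSlurper objects).

```ruby
# Hypothetical nested data standing in for a parsed document.
record = {
  "Email" => {
    "From" => { "Address" => "foo@bar.com", "Alias" => "Foo Bar" }
  }
}

# Dotted path as it might be read from configuration.
from_path = "Email.From"

# Split on dots, then descend one level per segment.
# fetch raises KeyError on an unknown segment instead of returning nil.
email_from = from_path.split(".").inject(record) { |node, key| node.fetch(key) }

puts email_from["Address"]
```

Unlike Eval, this only ever performs key lookups, so a malformed path fails with an error rather than executing arbitrary code.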

Groovy - searching for and extracting XML code from a log file

My log file contains a lot of text, but sometimes a response arrives as XML, and I have to cut that XML out and move it to other files.
For example:
sThread1....dsadasdsadsadasdasdasdas.......dasdasdasdadasdasdasdadadsada
important xml code to cut and move to other file: <response><important> 1 </import...></response>
important xml code to other file: <response><important> 2 </important...></response>
sThread2....dsadasdsadsadasdasdasdas.......dasdasdasdadasdasdasdadadsada
The catch: the XML does not always start at the same character position in the line.
Please help me find a way to locate the XML in the text.
So far I have tried the substring() method, but the XML does not always start at the same position :(
EDIT:
I found what I wanted: the function I was looking for is indexOf().
I needed the character position in the line where the text "Response is : " ends and the XML begins, so I used:
int positionOfXmlInLine = lineTxt.indexOf("<response")
After that I can cut the string from that position to the end of the line:
def cuttedText = lineTxt.substring(positionOfXmlInLine);
Now I have only the XML text from the log file.
The next step is parsing the XML values, as BDKosher describes in the answer below.
Hopefully this helps someone.
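For comparison, the indexOf()/substring() approach from the edit above can be sketched in Ruby with String#index and a range slice. This is only an illustration: the extract_xml helper, the "<response" marker default, and the sample log are assumptions, not part of the original code.

```ruby
# Returns the part of the line starting at the first marker,
# or nil when the line contains no marker at all.
def extract_xml(line, marker = "<response")
  start = line.index(marker)
  start && line[start..-1]
end

# Made-up log excerpt in the spirit of the example above.
log = <<~LOG
  sThread1....some noise.......more noise
  Response is : <response><important>1</important></response>
  sThread2....some noise.......more noise
LOG

# Keep only the lines that actually contain XML.
xml_lines = log.each_line.map { |line| extract_xml(line.chomp) }.compact

puts xml_lines
```

Checking for nil matters here: lines without the marker would otherwise break the slice, which is the Ruby counterpart of indexOf() returning -1.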
You might be able to leverage XmlSlurper for this, assuming your XML is valid enough. The code below will take each line of the log, wrap it in a root element, and parse it. Once parsed, it extracts and prints out the value of the <important> element's value attribute, but instead you could do whatever you need to do with the data:
def input = '''
sThread1..sdadassda..sdadasdsada....sdadasdas...
important code to cut and move to other file: **<response><important value="1"></important></response>**
important code to other file: ****<response><important value="3"></important></response>****
sThread2..dsadasd.s.da.das.d.as.das.d.as.da.sd.a.
'''
def parser = new XmlSlurper()
input.eachLine { line, lineNo ->
    def output = parser.parseText("<wrapper>$line</wrapper>")
    if (!output.response.isEmpty()) {
        println "Line $lineNo is of importance ${output.response.important.@value.text()}"
    }
}
This prints out:
Line 2 is of importance 1
Line 3 is of importance 3

Read XML file with Nokogiri

I currently have an XML file that is read correctly except for one part. It is an item list, and sometimes one item has multiple barcodes. My code only pulls out the first one. How can I iterate over multiple barcodes? Please see the code below:
def self.pos_import(xml)
  Plu.transaction do
    Plu.delete_all
    xml.xpath('//Item').each do |xml|
      plu_import = Plu.new
      plu_import.update_pointer = xml.at('Update_Type').content
      plu_import.plu = xml.at('item_no').content
      plu_import.dept = xml.at('department').content
      plu_import.item_description = xml.at('item_description').content
      plu_import.price = xml.at('item_price').content
      plu_import.barcodes = xml.at('UPC_Code').content
      plu_import.sync_date = Time.now
      plu_import.save!
    end
  end
end
My test XML file looks like this:
<?xml version="1.0" encoding="UTF-16" standalone="no"?>
<items>
  <Item>
    <Update_Type>2</Update_Type>
    <item_no>0000005110</item_no>
    <department>2</department>
    <item_description>DISC-ALCOHOL PAD STERIL 200CT</item_description>
    <item_price>7.99</item_price>
    <taxable>No</taxable>
    <Barcode>
      <UPC_Code>0000005110</UPC_Code>
      <UPC_Code>1234567890</UPC_Code>
    </Barcode>
  </Item>
</items>
Any ideas how to pull both UPC_Code fields out and write them to my database?
.at will always return a single element. To get an array of elements use xpath like you do to get the list of Item elements.
plu_import.barcodes = xml.xpath('//UPC_Code').map(&:content)
Thanks for all the great tips. They definitely led me in the right direction. I got it to work by adding a period before the //, which makes the XPath relative to the current Item instead of matching every UPC_Code in the document.
plu_import.barcodes = xml.xpath('.//UPC_Code').map(&:content)

Sphinx references to other sections containing section number and section title

I am using Sphinx to write a document with lots of references:
.. _human-factor:
The Human Factor
================
...
(see :ref:`human-factor` for details)
The compiled document contains something like this:
(see The Human Factor for details)
Instead I would like to have it formatted like this:
(see 5.1 The Human Factor for details)
I tried to google the solution and I found out that the latex hyperref package can do this but I have no idea how to add this to the Sphinx build.
I resolved it by basically using numsec.py from here: https://github.com/jterrace/sphinxtr
I had to replace the doctree_resolved function with this one to get section number + title (e.g. "5.1 The Human Factor").
def doctree_resolved(app, doctree, docname):
    secnums = app.builder.env.toc_secnumbers
    for node in doctree.traverse(nodes.reference):
        if 'refdocname' in node:
            refdocname = node['refdocname']
            if refdocname in secnums:
                secnum = secnums[refdocname]
                emphnode = node.children[0]
                textnode = emphnode.children[0]
                toclist = app.builder.env.tocs[refdocname]
                anchorname = None
                for refnode in toclist.traverse(nodes.reference):
                    if refnode.astext() == textnode.astext():
                        anchorname = refnode['anchorname']
                if anchorname is None:
                    continue
                linktext = '.'.join(map(str, secnum[anchorname]))
                node.replace(emphnode, nodes.Text(linktext + ' ' + textnode))
To make it work one needs to include the numsec extension in conf.py and also to add :numbered: in the toctree like so:
.. toctree::
   :maxdepth: 1
   :numbered:

Generate a file list based on an array

I tried a few things, but this week I feel like my brain's on holiday and I need to complete this thing, so I hope someone can help me.
I need to create a file list based on an array which is saved in a database. The array looks like this:
['file1', 'dir1/file2', 'dir1/subdir1/file3']
Output should be like this:
file1
dir1
  file2
  subdir1
    file3
in HTML, preferably like this (so I can extend it with JS for folding and multi-select):
<ul>
  <li>file1</li>
  <li>dir1</li>
  <ul>
    <li>file2</li>
    <li>subdir1</li>
    <ul>
      <li>file3</li>
    </ul>
  </ul>
</ul>
I'm using Ruby on Rails and am trying to achieve this in an RJS template, but that doesn't really matter. Detailed pseudo-code would help too.
Does anyone know how to solve this?
Edit
Thanks to everyone for these solutions. Listing works, and I extended it to a foldable solution to show/hide directory contents. I still have one problem: the code should put complete file paths in checkboxes next to the entries, for a synchronisation. Based on sris' solution, I can only read the current file and its children, but not the whole path from the root. For a better understanding:
Currently:
[x] dir1
  [x] dir2
    [x] file1
gives me a checkbox with the same value as the displayed text, e.g. "file1" for [x] file1. But what I need is the full path, e.g. "dir1/dir2/file1" for [x] file1.
Does anyone have another hint on how to add this?
Here's a quick implementation you can use for inspiration. This implementation disregards the order of files in the input Array.
I've updated the solution to save the entire path as you required.
dirs = ['file1', 'dir1/file2', 'dir1/subdir1/file3', 'dir1/subdir1/file5']
tree = {}
dirs.each do |path|
  current = tree
  path.split("/").inject("") do |sub_path,dir|
    sub_path = File.join(sub_path, dir)
    current[sub_path] ||= {}
    current = current[sub_path]
    sub_path
  end
end
def print_tree(prefix, node)
  puts "#{prefix}<ul>"
  node.each_pair do |path, subtree|
    puts "#{prefix}  <li>[#{path[1..-1]}] #{File.basename(path)}</li>"
    print_tree(prefix + "  ", subtree) unless subtree.empty?
  end
  puts "#{prefix}</ul>"
end
print_tree "", tree
This code will produce properly indented HTML like your example. But since Hashes in Ruby (1.8.6) aren't ordered the order of the files can't be guaranteed.
The output produced will look like this:
<ul>
  <li>[dir1] dir1</li>
  <ul>
    <li>[dir1/subdir1] subdir1</li>
    <ul>
      <li>[dir1/subdir1/file3] file3</li>
      <li>[dir1/subdir1/file5] file5</li>
    </ul>
    <li>[dir1/file2] file2</li>
  </ul>
  <li>[file1] file1</li>
</ul>
I hope this serves as an example of how you can get both the path and the filename.
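For the checkbox requirement from the question's edit, one option is to carry the path from the root down through the recursion. The sketch below is one possible shape, not sris's code: build_tree, render_checkboxes, and the paths[] field name are hypothetical, and the tree is keyed by single path segments rather than full sub-paths.

```ruby
require "cgi"

# Build a nested hash keyed by single path segments.
def build_tree(paths)
  tree = {}
  paths.each do |path|
    path.split("/").inject(tree) { |node, part| node[part] ||= {} }
  end
  tree
end

# Render the tree as an array of HTML lines. The parent path is passed
# down so each checkbox value is the full path, not just the entry name.
def render_checkboxes(node, parent = nil, indent = "")
  lines = ["#{indent}<ul>"]
  node.sort.each do |name, subtree|
    full_path = parent ? "#{parent}/#{name}" : name
    value = CGI.escapeHTML(full_path)
    label = CGI.escapeHTML(name)
    lines << "#{indent}  <li>"
    lines << "#{indent}    <input type=\"checkbox\" name=\"paths[]\" value=\"#{value}\"> #{label}"
    unless subtree.empty?
      lines.concat(render_checkboxes(subtree, full_path, indent + "    "))
    end
    lines << "#{indent}  </li>"
  end
  lines << "#{indent}</ul>"
end

puts render_checkboxes(build_tree(["dir1/dir2/file1"]))
```

Here the nested lists sit inside their parent li, which differs from the layout in the question but keeps the markup valid; the values are escaped because file names may contain characters such as & or quotes.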
Think tree.
# setup phase
for each pathname p in list
do
    add_path_to_tree(p)
od
walk tree depth first, emitting HTML
add_path_to_tree is recursive:
given pathname p
    parse p into first_element, rest
    # that is, "foo/bar/baz" becomes "foo", "bar/baz"
    add first_element to tree
    add_path_to_tree(rest)
I'll leave the optimal data structure for the tree (a list of lists) as an exercise.
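The pseudo-code above maps fairly directly onto Ruby. The following is a minimal sketch under one assumption that differs from the answer: it uses nested hashes for the tree instead of a list of lists. The walk helper is a hypothetical name, and it returns indented lines rather than HTML to keep the example short.

```ruby
# Recursively insert one pathname into a nested-hash tree.
def add_path_to_tree(tree, path)
  # "foo/bar/baz" becomes "foo", "bar/baz"
  first, rest = path.split("/", 2)
  subtree = (tree[first] ||= {})
  add_path_to_tree(subtree, rest) if rest && !rest.empty?
  tree
end

# Depth-first walk returning one indented line per entry.
def walk(tree, depth = 0)
  tree.flat_map do |name, subtree|
    ["#{'  ' * depth}#{name}"] + walk(subtree, depth + 1)
  end
end

# Setup phase: add every pathname from the list.
tree = {}
["file1", "dir1/file2", "dir1/subdir1/file3"].each do |path|
  add_path_to_tree(tree, path)
end

puts walk(tree)
```

On Ruby 1.9 and later, hashes keep insertion order, so entries come out in the order they were first seen; on 1.8 the order would not be guaranteed.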
Expanding on sris's answer, if you really want everything sorted and the files listed before the directories, you can use something like this:
def files_first_traverse(prefix, node = {})
  puts "#{prefix}<ul>"
  node_list = node.sort
  # files (entries without children) first
  node_list.each do |base, subtree|
    puts "#{prefix}  <li>#{base}</li>" if subtree.empty?
  end
  # then directories, each followed by its own contents
  node_list.each do |base, subtree|
    next if subtree.empty?
    puts "#{prefix}  <li>#{base}</li>"
    files_first_traverse(prefix + '  ', subtree)
  end
  puts "#{prefix}</ul>"
end

Resources