How to extract text from a node but not from its children - nokogiri

I'm trying to parse this fragment with nokogiri from a page(so named a var) that contains
...
<dd>
Access
- 19/07/11
</dd>
...
page.at("dd").text shows me the whole text, not only dd's but also it's descendants text too. I mean
"Access - 19/07/11"
How can I extract only "- 19/07/11"?
(this is only an example)

Try page.xpath("//dd/text()").text

Related

Cannot get class name from node after locating it in playwright

I have an SVG object like that:
<svg class="class-a color-green marker" .../>
In Playwright I want to get an exact list of classes of this element. I use the following code to locate it:
page.locator(".status-marker").first
The node is located properly, but when I call evaluate("node => node.className") on it, I get an empty dict, like the locator stripped all information about classes at all.
In general, it doesn't matter how I get the handle of this element, I always get an empty dict on evaluate("node => node.className").
Calling page.locator(".status-marker").first.is_visible() returns True, so the object exists.
Also if I run page.locator(".status-marker").first.evaluate("node => node.outerHTML") I'll get the full HTML of that node, that does have the class name included. I could parse that, but it would be pretty clunky solution.
I found out that I could use expect(locator).to_have_class(), but If the node has more than one class I need to put all of them for it to pass, when I care only on one of them (the other classes are dynamically generated, so I can't even know about them during tests).
Edit:
Here's some additional sample:
assert page.locator(".marker").first.evaluate("node => node.className") == {}
expect(page.locator(".marker").first).to_have_class("text-green-1")
The first assert passes - the evaluate("node => node.className") returns an empty dict. The expect() fails with the following error:
AssertionError: Locator expected to have class 'text-green-1'
E Actual value: inline pr-2 text-green-1 marker svelte-fa s-z-WEjw8Gh1FG
I've found a way to reproduce it (it happens to me in font awesome plugin for svelte):
def test_svelte_fa(page):
page.goto("https://cweili.github.io/svelte-fa/")
item = page.locator(".svelte-fa").first
assert item.is_visible()
assert "svelte-fa" in item.evaluate("node => node.className")
In your example, the className of an SVG is an SVGAnimatedString object. Which is not serializable.
If you do JSON.stringify($('.svelte-fa').className) on the browser, you will see that the value is {}.
Values returned by the evaluate function needs to be serializable.

Strip string from a field with defined separator in invoice report

first of all - I'm still newbie in Odoo so this is maybe explained wrong but I will try.
In inherited invoice_report xml document i have conditional field that needs to be shown - if column (field) in db is equal to another column. To be more precise - if invoice_origin from account_move is equal to name in sale_order.
This is it's code:
<t t-foreach="request.env['sale.order'].search([('name', '=', o.invoice_origin)])" t-as="obj">
For example in database this invoice_origin is [{'invoice_origin': 'S00151-2022'}]
On invoices that are created from more than one sales orders it is this [{'invoice_origin': 'S00123-2022, S00066-2022'}]
How can I strip this data to use in foreach separately the part [{'invoice_origin': 'S00123-2022'}] and separately [{'invoice_origin': 'S00066-2022'}]
Thank you.
You can try to split up the invoice origin and use the result for your existing code:
<t t-set="origin_list" t-value="o.invoice_origin and o.invoice_origin.split(', ') or []" />
<t t-foreach="request.env['sale.order'].search([('name', 'in', origin_list)])" t-as="obj">
<!-- do something -->
</t>

'<' character in property value strips rest of string

When adding a node in Neo4j with Cypher by:
CREATE (a: testNode {text: 'te>st<ignored' })
RETURN a.text
This returns: "te>st".
How come the rest of the string is stripped due to the character '<'?
Neo4j is not ignoring the rest of the string, It'd storing as it is and also returning the same. But there is some issue with the Neo4j browser, It's not showing it properly.
You can verify this by viewing results in text format in the Neo4j browser.
Please find screenshot below:
Neo4j browser considers it as a start of HTML tag and hence doesn't show it properly. You can try adding HTML tags in the text and it will render the same in the output.
Here is an example:
If you add an input tag it shows the input box.
CREATE (a: testNode {text: 'test input <input type="text" name="fname">' })
RETURN a.text

Rails Microsoft Word, XML databinding, repeat rows

Those willing to jump straight to my questions can go to the paragraph "Please help with". You will find there my beginning of implementation, along with short XML samples
The story
The famous problem of inserting repeating content, like table rows, into a word template, using the rails framework.
I decided to implement a 'cleaner' solution for replacing some variables in a Word document with rails, using XML databinding. This solution works very well for non-repetitive content, but for repetitive content, a little extra dirty work must be done and I need help with it.
No C#, No Visual, just plain olde ruby on rails & XML
The databinded document
I have a Word document with some content controls, tagged with "human-readable" text, so my users know what should be inside.
I have used Word 2007 Content Control Toolkit to add some custom XML to a .docx file. Therefore in each .docx I have some customXml/itemsx.xml that contains my custom XML.
I have manually databinded this XML to text content control I have in my word template, using drag & drop with Word 2007 Content Control Toolkit.
The replacing process with nokogiri
Basically I already have some code that replaces every XML node by the corresponding value from a hash. For example if I provide this hash to my function :
variables = {
"some_xml-node" => "some_value"
}
It will properly replace XML in customXml/itemsx.xml of .docx file :
<root> <some> <xml-node>some_value</xml-node></some> </root>
So this is taken care of !
The repetitive content
Now as I said, this works perfectly for non-repetitive content. For repetitive content (in my case I want to repeat some <w:tr> in a document), the solution I'd like to go with, is
Manually insert some tags in word/document.xml of .docx file (this is dirty, but hell I can't think of anything else) before every <tr> that needs to be duplicated
In rails, parse the XML and locate the <tr> that needs duplicating using Nokogiri
Copy the tr as many times as I need
Look at some text inside this <tr>, find the databinding (which looks like <w:dataBinding w:xpath="/root[1]/movies[1]/movie[1]/name[1]"
Replace movie[1] by movie[index]
Repeat for every table that needs <tr> duplication
With this solution Therefore I ensure 100% compatibility with my existing system ! It's some kind of preprocessing...
Please help with
Finding an XML comment containing a custom string, and selecting the node just below it (using Nokogiri)
Changing attributes in many sub-nodes of the node found in 1.
XML/Hash samples that could be used (my beginning of implementation after that):
Sample of .docx word/document.xml
<w:document>
<!-- My_Custom_Tag_ID -->
<w:tr someparam="something">
<w:td></w:td>
<w:td><w:sthelse></w:sthelse><w:dataBinding w:xpath="/root[1]/movies[1]/movie[1]/name[1]><w:sth>Value</w:sth></w:td>
<w:td></<:td>
</w:tr>
</w:document>
Sample of input parameter repeat_tag hash
repeat_tags_sample = [
{
"tag" => "My_Custom_Tag_ID",
"repeatable-content" => "movie"
},
{
"tag" => "My_Custom_Tag_ID_2",
"repeatable-content" => "cartoons"
}
]
Sample of input parameter contents hash
contents_sample =
{
"movies" => [{"name" => "X-Men",
"year" => 1998,
"property-xxx" => 42
}, { "name" => "X-Men-4",
"year" => 2007,
"property-xxx" => 42
}],
"cartoons" => [{"name" => "Tom_Jerry",
"year" => 1995,
"property-yyy" => "cat"
}, { "name" => "Random_name",
"year" => 2008,
"property-yyy" => 42
}]
}
My beginning of implementation :
def dynamic_table_content(zip, repeat_tags, contents)
doc = zip.find_entry("word/document.xml")
xml = Nokogiri::XML.parse(doc.get_input_dtream)
# repeat_tags_sample = [ {
# "tag" => My_Custom_Tag_ID",
# "repeatable-content" => "movie"},
# ...]
repeat_tags.each do |rpt|
content = contents[rpt[:repeatable-content]]
# content now looks like [
# {"name" => "X-Men",
# "year" => 1998,
# "property-xxx" => 42, ...},
# ...]
content_name = rpt[:repeateable_content].to_s
# the 'movie' of '/root[1]/movies[1]/movie[1]/name[1]' (see below)
puts "Processing #{rpt[:tag]}, adding #{content_name}s"
# Word document.xml sample code looks like this :
# <!-- My_Custom_Tag_ID_inserted_manually -->
# <w:tr ...>
# ...
# <w:dataBinding w:xpath="/root[1]/movies[1]/movie[1]/name[1]>
# ...
# </w:tr>
Find a comment containing a custom string, and select the node just below
# Find starting <w:tr > tag located after <!-- rpt[:tag] -->
base_tr_node = find the node after
# Duplicate it as many times as we want.
content.each_with_index do |content, index|
puts "Adding #{content_name} : #{content}.to_s"
new_tr_node = base_tr_node.add_next_sibling(base_tr_node)
# inside this new node there are many
# <w:dataBinding w:xpath="/root[1]/movies[1]/movie[1]/name[1]>
# <w:dataBinding w:xpath="/root[1]/movies[1]/movie[1]/year[1]>
# ..../movie[1]/property-xxx[1]
# GOAL : replace every movie[1] by movie[index]
Change attributes in many sub-nodes of the node found in 1.
new_tr_node.change_attributes as shown in (see GOAL in previous comments)
# Maybe, it would be something like
# new_tr_node.gsub("(#{content_name})\[([1-9]+)\]", "\1\[#{index}\]")
# ... But new_tr_node is a nokogiri element so .gsub doesn't exist
end
end
#replace["word/document.xml"] = xml.serialize :save_zip_with => 0
end
I have looked at the DoPE extension for Word documents. It looks great ! But alas I had already done a lot of work, and just now I (almost) finished building my own preprocessor.
What I needed was more complicated than what I originally asked. But nevertheless, the answers would be :
EDIT : fixed bad regex/xpath
# 1. Find a comment containing a custom string, and select the node just below
comment_nodes = doc.xpath("//comment()")
# Loop like comment_nodes.each do |comment|
base_tr_node = comment.next_sibling.next_sibling
# For some reason, need to apply next_sibling twice, thought the comment is indeed just above the <w:tr> node
# 2. Change attributes in many sub-nodes of the node found in 1.
matches = tr_node.search('.//*[name()='w:dataBinding']')
matches.each do |databinding_node|
# replace '.*phase[1].*' by '.*phase[index].*'
databinding_node['w:xpath'].gsub("#{comment.text}\[1\]", "#{comment.text}\[#{index}\]")
end

Hpricot Element intersection

I want to remove all images from a HTML page (actually tinymce user input) which do not meet certain criteria (class = "int" or class = "ext") and I'm struggeling with the correct approach. That's what I'm doing so far:
hbody = Hpricot(input)
#internal_images = hbody.search("//img[#class='int']")
#external_images = hbody.search("//img[#class='ext']")
But I don't know how to find images where the class has the wrong value (not "int" or "ext").
I also have to loop over the elements to check other attributes which are not standard html (I use them for setting internal values like the DB id, which I set in the attribute dbsrc). Can I access these attributes too and is there a way to remove certain elements (which are in the hpricot search result) when they don't meet my criteria?
Thanks for your help!
>> doc = Hpricot.parse('<html><img src="foo" class="int" /><img src="bar" bar="42" /><img src="foobar" class="int"></html>')
=> #<Hpricot::Doc {elem <html> {emptyelem <img class="int" src="foo">} {emptyelem <img src="bar" bar="42">} {emptyelem <img class="int" src="foobar">} </html>}>
>> doc.search("img")[1][:bar]
=> "42"
>> doc.search("img") - doc.search("img.int")
=> [{emptyelem img src"bar" bar"42"}]
Once you have results from search you can use normal array operations. nonstandard attributes are accessible through [].
Check out the not CSS selector.
(hbody."img:not(.int)")
(hbody."img:not(.ext)")
Unfortunately, it doesn't seem you can concat not expressions. You might want to fetch all img nodes and remove those where the .css selector doesn't include neither .int nor .ext.
Additionally, you could use the difference operator to calculate which elements are not part of both collections.
Use the .remove method to remove nodes or elements: Hpricot Altering documentation.

Resources