Nokogiri::XML::Element has a method called attributes that returns all attributes as a hash. NodeSet has no such method, so we need to specify an attribute key to get its value. I know that XPath can extract attributes, but I couldn't think of a solution for the following situation:
Normally, there is only one attribute, match-type, in the match element:
<D:match match-type="starts-with">appren</D:match>
But now I need to enforce that match-type is the only attribute allowed on this element, so a tag like this should be caught:
<D:match caseless="bogus" match-type="starts-with">appren</D:match>
My idea is to get all the attributes of this element and count the ones other than 'match-type'.
Is there a way to do that? Thanks!
This isn't going to directly answer your question, because it's not clear whether you've tried anything. Instead, this code can be modified to do what you want but you're going to need to figure out what to change:
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<html>
<body>
<a id="some_id" href="/foo/bar/index.html" class='bold'>anchor text</a>
<a id="some_other_id" href="/foo/bar/index2.html" class='italic'>anchor text</a>
</body>
</html>
EOT
doc.search('a').map{ |node| node.keys.reject{ |k| k == 'id' }.map{ |p| node[p].size }.inject(:+) } # => [23, 26]
Related
There is the structure like:
<div class="parent">
<div>
<div class="fieldRow">...</div>
</div>
<div>
<div class="fieldRow">
<div class="CheckBox">
</div>
</div>
</div>
<div>
<div class="fieldRow">...</div>
</div>
<div>
<div class="fieldRow">...</div>
</div>
</div>
In my script I am writing a loop over each of the 4 divs under div[@class='parent'], aiming to click the checkbox if there is one, i.e.
members = page.all(:xpath, "//div[@class='parent']")
members.each do |a|
if **page.has_xpath?(a).find(:xpath, "div[@class='fieldRow']/div[@class='CheckBox']")**
a.find(:xpath, "div[@class='fieldRow']/div[@class='CheckBox']").click
end
end
However, I can't find the correct usage of has_xpath? with an XPath that includes a variable.
Please advise. Thank you!
has_xpath? takes an XPath expression (not an element) and returns a boolean (true/false) based on whether there are any elements that match that expression within the current scope - http://www.rubydoc.info/gems/capybara/Capybara/Node/Matchers#has_xpath%3F-instance_method. Since it returns true/false you can't then call find on it. For the example you posted there's no need for XPath or checking for the existence of the elements, just find all the matching elements and call click on them. Something like
page.all('div.parent div.fieldRow div.CheckBox').each { |cb| cb.click }
or
page.all('div.parent div.CheckBox').each { |cb| cb.click }
if the fieldRow class isn't something you really need to check.
Note: this assumes clicking the elements doesn't invalidate any of the other matched elements/change the page.
If you REALLY need to loop over members, use XPath, and check for presence, then it would be something like
members = page.all(:xpath, ".//div[@class='parent']")
members.each do |a|
if a.has_xpath?(".//div[@class='fieldRow']/div[@class='CheckBox']")
a.find(:xpath, ".//div[@class='fieldRow']/div[@class='CheckBox']").click
end
end
Note: the .// at the beginning of the XPath expressions is needed for scoping to work correctly (see https://github.com/teamcapybara/capybara#beware-the-xpath--trap). CSS selectors don't have this issue, so you should really prefer CSS selectors whenever possible.
I have a Thymeleaf template with a rather complex data- attribute, like so:
<div data-dojo-props="required: true, placeholder:'Foo bar baz', more: stuff" ...>
I'd like to have Thymeleaf provide the placeholder, like so:
<div th:data-dojo-props="placeholder:'#{foo.bar.baz}'" ...>
It doesn't work, of course. I'm supposed to use th:attr like so:
<div th:attr="data-dojo-props=placeholder:'#{foo.bar.baz}'" ...>
Which also doesn't work. As soon as you add a : or ' within the th:attr, the template breaks. I also tried escaping them, e.g. \: and \', and also tried using HTML entities, e.g. &#38;, but that also didn't work.
So I tried th:prependattr and th:appendattr:
<div th:prependattr="data-dojo-props=placeholder:'"
th:attr="data-dojo-props=#{foo.bar.baz}"
th:appendattr="data-dojo-props='"
...>
But they also can't handle : and ', nor escaping them:
<div th:prependattr="data-dojo-props=placeholder&#58;&#39;"
th:attr="data-dojo-props=#{foo.bar.baz}"
th:appendattr="data-dojo-props=&#39;"
...>
Any way to make this work that I'm missing?
You can use parameters in a Thymeleaf message property for example:
Messages.properties:
dojo.props=required: {0}, placeholder: {1}, more: {...}
dojo.props.required=true
dojo.props.placeholder=Foo bar baz
HTML with message properties:
<div th:attr="data-dojo-props=#{dojo.props(#{dojo.props.required}, #{dojo.props.placeholder})}"></div>
Or if you want to get the values from an object:
<div th:attr="data-dojo-props=#{dojo.props(${dojo.props.required}, ${dojo.props.placeholder})}"></div>
Even selectors work:
<div th:attr="data-dojo-props=#{dojo.props(*{dojo.props.required}, *{dojo.props.placeholder})}"></div>
I am taking my first steps in writing Cucumber features in a Ruby on Rails application and am struggling to get the value of an element.
The structure is something like this:
<div class="selectize-dropdown-content">
<div data-value="test1" data-selectable="" class="option">TEST 1</div>
<div data-value="test2" data-selectable="" class="option">TEST 2</div>
</div>
I would like to get the value of the div element when the data-value is "test1" ... so, TEST 1
Currently I am doing it this way:
within(:xpath, '//div[@class="selectize-dropdown-content"]') do
find(:xpath, '//div[@data-value="' + value + '"]')
end
But it fails for not finding the "within" div.
So, I guess I am doing it wrong.
How does one go about it?
Thx
You need to call the text method on the desired element. Note the leading . in the XPath, which keeps the search inside the within block:
within('.selectize-dropdown-content') do
find(:xpath, ".//div[@data-value='#{value}']").text
end
If the block has a parent element with an ID, you can do it like this:
text = page.find('#parentID div:nth-child(1) div:first-child', visible: true).text
If it doesn't, try it with JavaScript:
text = page.evaluate_script('document.getElementsByClassName("selectize-dropdown-content")[0].getElementsByTagName("div")[0].textContent')
I'm trying to get the src value of a block of HTML. I am specifically trying to achieve this using the at_css and not using XPath.
So far all I'm getting is either nil or a blank string.
This is the HTML:
<div class="" id="imageProductContainer">
<a id="idLinkProductMainImage" href='URL'>
<img id="productMainImage" src="SRC.jpg" alt="alt" title="A Title" align="left" class="product_image_productpage_main selectorgadget_selected">
</a>
</div>
The code I have is:
item = page.doc.at_css("#productMainImage img").text.strip unless page.doc.at_css("#productMainImage img").nil?
puts item #prints blank
item = item["src"]
puts item #prints blank
Where page.doc is the Nokogiri HTML element.
If you need the src attribute, you can do it like this:
page.doc.at_css('#idLinkProductMainImage img').attr('src')
Also, I believe the problem is the way you are selecting the img tag. You are searching for img tags inside #productMainImage, but that id belongs to the image itself, so the search finds nothing.
If you use the link id #idLinkProductMainImage instead, there is an img tag inside it to find.
I have a document which look like this:
<div id="block">
<a href="http://google.com">link</a>
</div>
I can't get Nokogiri to get me the value of href attribute. I'd like to store the address in a Ruby variable as a string.
html = <<HTML
<div id="block">
<a href="http://google.com">link</a>
</div>
HTML
doc = Nokogiri::HTML(html)
doc.xpath('//div/a/@href')
#=> [#<Nokogiri::XML::Attr:0x80887798 name="href" value="http://google.com">]
Or if you want to be more specific about the div:
>> doc.xpath('//div[@id="block"]/a/@href')
=> [#<Nokogiri::XML::Attr:0x80887798 name="href" value="http://google.com">]
>> doc.xpath('//div[@id="block"]/a/@href').first.value
=> "http://google.com"
require 'open-uri'
doc = Nokogiri::HTML(URI.open("[insert URL here]"))
href = doc.css('#block a')[0]["href"]
The variable href is assigned the value of the "href" attribute of the <a> element inside the element with id 'block'. doc.css('#block a') returns a NodeSet containing the matching <a> element. [0] selects that element, and ["href"] reads its href attribute, returning a string containing the URL.
Having struggled with this question in various forms, I decided to write myself a tutorial disguised as an answer. It may be helpful to others.
Starting with this snippet:
require 'rubygems'
require 'nokogiri'
html = <<HTML
<div id="block1">
<a href="http://google.com">link1</a>
</div>
<div id="block2">
<a href="http://stackoverflow.com">link2</a>
<a id="tips">just a bookmark</a>
</div>
HTML
doc = Nokogiri::HTML(html)
extracting all the links
We can use xpath or css to find all the elements and then keep only the ones that have an href attribute:
nodeset = doc.xpath('//a') # Get all anchors via xpath
nodeset.map {|element| element["href"]}.compact # => ["http://google.com", "http://stackoverflow.com"]
nodeset = doc.css('a') # Get all anchors via css
nodeset.map {|element| element["href"]}.compact # => ["http://google.com", "http://stackoverflow.com"]
But there's a better way: in the above cases, the .compact is necessary because the searches return the "just a bookmark" element as well. We can use a more refined search to find just the elements that contain an href attribute:
attrs = doc.xpath('//a/@href') # Get anchors w href attribute via xpath
attrs.map {|attr| attr.value} # => ["http://google.com", "http://stackoverflow.com"]
nodeset = doc.css('a[href]') # Get anchors w href attribute via css
nodeset.map {|element| element["href"]} # => ["http://google.com", "http://stackoverflow.com"]
finding a specific link
To find a link within the <div id="block2">
nodeset = doc.xpath('//div[@id="block2"]/a/@href')
nodeset.first.value # => "http://stackoverflow.com"
nodeset = doc.css('div#block2 a[href]')
nodeset.first['href'] # => "http://stackoverflow.com"
If you know you're searching for just one link, you can use at_xpath or at_css instead:
attr = doc.at_xpath('//div[@id="block2"]/a/@href')
attr.value # => "http://stackoverflow.com"
element = doc.at_css('div#block2 a[href]')
element['href'] # => "http://stackoverflow.com"
find a link from associated text
What if you know the text associated with a link and want to find its url? A little xpath-fu (or css-fu) comes in handy:
element = doc.at_xpath('//a[text()="link2"]')
element["href"] # => "http://stackoverflow.com"
element = doc.at_css('a:contains("link2")')
element["href"] # => "http://stackoverflow.com"
find text from a link
And what if you want to find the text associated with a particular link?
Not a problem:
element = doc.at_xpath('//a[@href="http://stackoverflow.com"]')
element.text # => "link2"
element = doc.at_css('a[href="http://stackoverflow.com"]')
element.text # => "link2"
useful references
In addition to the extensive Nokogiri documentation, I came across some useful links while writing this up:
a handy Nokogiri cheat sheet
a tutorial on parsing HTML with Nokogiri
interactively test CSS selector queries
doc = Nokogiri::HTML("HTML ...")
link = doc.at_css("div[id='block'] > a") # at_css returns the first matching element, not a NodeSet
result = link['href'] # => "http://google.com"
data = '<html lang="en" class="">
<head>
<a href="https://example.com/9f40a.css" media="all" rel="stylesheet" /> link1</a>
<a href="https://example.com/4e5fb.css" media="all" rel="stylesheet" />link2</a>
<a href="https://example.com/5s5fb.css" media="all" rel="stylesheet" />link3</a>
</head>
</html>'
Here is my attempt for the above sample of HTML code:
doc = Nokogiri::HTML(data)
doc.xpath('//@href').map(&:value)
# => ["https://example.com/9f40a.css", "https://example.com/4e5fb.css", "https://example.com/5s5fb.css"]
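document.at_css("#block a")["href"]
where document is the parsed Nokogiri HTML document. Use at_css here rather than css, because css returns a NodeSet and the attribute has to be read from a single element.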