Nokogiri get element if type is in brackets - ruby-on-rails

I have a content like this:
[caption id="attachment_3182" align="aligncenter" width="800" caption="blah blah"]<img class="size-full wp-image-3182" title="blah" src="http://www.test.com/blah.jpg" alt="" width="800" height="533" />[/caption]
<div>other code here</div>
I want to get all caption elements from it, so I'm trying to do something like this:
doc.css("[caption]") and doc.xpath('.//[caption]')
but have had no success.

Try doc.css("[caption]").attr("caption")

I transformed [caption] to a <caption> tag. In my case it was:
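text = text.gsub("[caption", "<caption").gsub('"]', '">').gsub("[/caption]", "</caption>")
(I use gsub rather than gsub! here because gsub! returns nil when nothing is replaced, which breaks the chain.)
After that I was able to get the <caption> tag with Nokogiri.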
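If only the attribute values are needed, the shortcodes can also be read straight from the string, without converting them to tags first. The following is a rough sketch using only Ruby's standard library. CAPTION_RE and parse_captions are illustrative names, not part of Nokogiri, and the pattern assumes double-quoted attribute values, no "]" inside a value, and no nested captions.

```ruby
# Hypothetical pattern: matches [caption ...]...[/caption] and captures
# the attribute string and the inner markup.
CAPTION_RE = /\[caption([^\]]*)\](.*?)\[\/caption\]/m

# Hypothetical helper: returns one hash per caption shortcode found in text.
def parse_captions(text)
  text.scan(CAPTION_RE).map do |attr_string, inner|
    # Turn key="value" pairs into a hash of attribute names to values.
    attrs = attr_string.scan(/(\w+)="([^"]*)"/).to_h
    { attrs: attrs, inner_html: inner }
  end
end

text = '[caption id="attachment_3182" align="aligncenter" width="800" caption="blah blah"]' \
       '<img class="size-full wp-image-3182" title="blah" src="http://www.test.com/blah.jpg" alt="" width="800" height="533" />' \
       '[/caption]<div>other code here</div>'

captions = parse_captions(text)
puts captions.first[:attrs]["caption"]
```

The inner_html value can then be handed to Nokogiri if the <img> inside the caption needs to be parsed as well.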

Related

Nokogiri: Get text which is not inside the <a> tag

Take a look at this example:
<li><a href="...">This is a website</a>, it belongs to John Sullivan</li>
I can get the content of the <li> tag by using:
nodeset = doc.css('li')
I also can get the text inside the <a> tag by using:
nodeset.each do |element|
ahref = element.css('a')  # the <a> node(s) inside this <li>
name = ahref.text.strip   # => "This is a website"
end
But how do I get the rest of the text within the <li> tag but without the text from the <a> tag?
From this example, I'd like to get
", it belongs to John Sullivan"
How can I do this?
This is straightforward using XPath and the text() node test. If you have extracted the lis into nodeset, you can get the text with:
nodeset.xpath('./text()')
Or you can get it directly from the whole doc:
doc.xpath('//li/text()')
This uses the text() node test as part of the XPath expression, not the Ruby text method. It extracts only the text nodes that are direct children of the li node, so it doesn't include the contents of the a element.
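Here is a small self-contained sketch of the same text() node test. It uses REXML instead of Nokogiri purely so that it runs without installing a gem; the XPath expression is the same. The sample markup, including the href value, is assumed for illustration.

```ruby
require "rexml/document"

# Sample markup; the <a> element and its href are assumptions for this sketch.
html = '<ul><li><a href="#">This is a website</a>, it belongs to John Sullivan</li></ul>'
doc = REXML::Document.new(html)

# text() selects only the text nodes that are direct children of <li>,
# so the text inside <a> is skipped.
own_text = REXML::XPath.match(doc, "//li/text()").map(&:value).join
puts own_text
```

With Nokogiri the equivalent call would be doc.xpath('//li/text()'), as in the answer above.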
I found a cheap way to get the rest of the text:
ahref = element.css('a')
name = ahref.text.strip
suppl = element.text.strip.gsub(name, '')

Is there a way to use the {@select} tag inside an array in Dust?

Here is the code snippet I tried in order to achieve index-based selection of elements from an array:
{#result}
{@select key={$idx}}
{@lte value=3}
<p>{notes}</p>
<p style="color:grey;">{createdBy}-{createdDate}</p>
{/lte}
{/select}
{/result}
But the above code throws the error "SyntaxError: Expected end tag for result but it was not found". Can anyone please suggest a fix for this error?
It looks like the error is caused by the curly braces surrounding $idx. Dust references in parameters are either written without curly braces (e.g. {@select key=$idx}) or wrapped in quotes (e.g. {@select key="{$idx}"}). So, your template would look something like:
{#result}
{@select key=$idx}
{@lte value=3}
<p>{notes}</p>
<p style="color:grey;">{createdBy}-{createdDate}</p>
{/lte}
{/select}
{/result}

Got the right node with Nokogiri, but need to search further

I am using this.
doc = Nokogiri::HTML(open(url))
pic = doc.search "[text()*='hiRes']"
to get this script node:
<script type="text/javascript">
var data = {
'colorImages': { 'initial':
[{"hiRes":"http://ecx.images-joes.com/images
/I/71MBTEP1W9L._UL1500_.jpg","thumb":"http://ecx.images-joes.com/images
/I/41xE2XADIvL._US40_.jpg","large":"http://ecx.images-joes.com/images
/I/41xE2XADIvL.jpg","main":{"http://ecx.images-joes.com/images
/I/71MBTEP1W9L._UX395_.jpg":[395,260],"http://ecx.images-joes.com/images
/I/71MBTEP1W9L._UX500_.jpg":[500,329],"http://ecx.images-joes.com/images
/I/71MBTEP1W9L._UX535_.jpg":[535,352],"http://ecx.images-joes.com/images
/I/71MBTEP1W9L._UX575_.jpg":[575,379]}
and the node keeps going from there..
But the only thing I need to pull out is the entire URL that contains the string "UL1500", i.e. the URL that follows "hiRes":, for example http://ecx.images-joes.com/images/I/71MBTEP1W9L._UL1500_.jpg
I looked up the class that Nokogiri returns, and it's a Nokogiri::XML::NodeSet.
But I'm not sure how to interact with it in order to get what I need.
Thanks
I went from just using Nokogiri to a regular expression, but ended up finding this and it worked like magic!
https://stackoverflow.com/a/5939906/4386626
Yeah. It's a NodeSet because search can match any number of nodes, so it always returns a collection, even when only one node matches.
See: http://www.rubydoc.info/github/sparklemotion/nokogiri/master/Nokogiri/XML/NodeSet#children-instance_method
In this case you could try:
pic.children.first.content
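Once you have the script content as a string, the URL can be pulled out with a regular expression. This is only a sketch: script_text below is a shortened stand-in for what pic.first.content would return, and the pattern assumes the URL is a double-quoted value directly after the "hiRes" key with no line breaks inside it.

```ruby
# Shortened stand-in for the text of the matched <script> node.
script_text = <<~JS
  var data = {
    'colorImages': { 'initial':
      [{"hiRes":"http://ecx.images-joes.com/images/I/71MBTEP1W9L._UL1500_.jpg","thumb":"http://ecx.images-joes.com/images/I/41xE2XADIvL._US40_.jpg"}]
    }
  };
JS

# Capture every quoted value that follows a "hiRes" key.
hi_res_urls = script_text.scan(/"hiRes"\s*:\s*"([^"]+)"/).flatten
puts hi_res_urls.first
```

If the page lists several images, hi_res_urls holds one entry per "hiRes" key, in document order.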

How to extract xpath from element name using Appium on iOS

I'd like to find a way to extract the full element's XPath by using its name.
For instance I got something like this:
name: Moses
type: UIAStaticText
xpath: "//UIAApplication[1]/UIAWindow[1]/UIAScrollView[1]/UIAStaticText[3]"
Now I'd like to find the full xpath using the "Moses" name tag.
SeleniumHelper.GetElement("//UIAStaticText[(@name='Moses')]");
But this doesn't seem to work.
Cheers, Pavel
Try this:
MobileElement obj1 = (MobileElement)driver.findElementByClassName("UIAStaticText");
Or:
MobileElement obj1 = (MobileElement)driver.findElementByClassName("UIAWindow");
Once you get the object, then:
obj1.getAttribute("name");
If you still see the issue, please attach the screen shot with XML.

Hpricot Element intersection

I want to remove all images from an HTML page (actually TinyMCE user input) which do not meet certain criteria (class = "int" or class = "ext"), and I'm struggling to find the correct approach. This is what I'm doing so far:
hbody = Hpricot(input)
@internal_images = hbody.search("//img[@class='int']")
@external_images = hbody.search("//img[@class='ext']")
But I don't know how to find images where the class has the wrong value (not "int" or "ext").
I also have to loop over the elements to check other attributes which are not standard html (I use them for setting internal values like the DB id, which I set in the attribute dbsrc). Can I access these attributes too and is there a way to remove certain elements (which are in the hpricot search result) when they don't meet my criteria?
Thanks for your help!
>> doc = Hpricot.parse('<html><img src="foo" class="int" /><img src="bar" bar="42" /><img src="foobar" class="int"></html>')
=> #<Hpricot::Doc {elem <html> {emptyelem <img class="int" src="foo">} {emptyelem <img src="bar" bar="42">} {emptyelem <img class="int" src="foobar">} </html>}>
>> doc.search("img")[1][:bar]
=> "42"
>> doc.search("img") - doc.search("img.int")
=> [{emptyelem <img src="bar" bar="42">}]
Once you have results from search you can use normal array operations. Nonstandard attributes are accessible through [].
Check out the not CSS selector.
(hbody/"img:not(.int)")
(hbody/"img:not(.ext)")
Unfortunately, it doesn't seem that you can chain not expressions. You might want to fetch all img nodes and remove those whose class is neither int nor ext.
Alternatively, you could use the difference operator to find the elements that are in neither collection.
Use the .remove method to remove nodes or elements: Hpricot Altering documentation.
