I wish to run a test* to check for the presence of something like this:
<tr>
<td> name </td> <td> date </td>
</tr>
Code such as the following:
assert_select "tr" do
  assert_select "td", name
  assert_select "td", date
end
looks plausible but is not correct, because markup like the following (which is not the match required) would also pass:
<tr>
<td> name </td>
</tr>
<tr>
<td> date </td>
</tr>
I’m struggling to see how this should be approached from the documentation of assert_select.
Thank you
Daniel
* Within a default Rails integration test (I believe this means MiniTest).
I ended up using a regex on the result of css_select, which seems a bit inelegant but worked for my purposes. If there is a better way, I'd be interested to hear it. I used something like this:
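rows = css_select "tr"
# Regexp.escape keeps characters in name/date from being read as regex syntax
regex = /<td>#{Regexp.escape(name.to_s)}<\/td>.*<td>#{Regexp.escape(date.to_s)}<\/td>/m
# passes only if a single row contains both cells, in this order
assert rows.any? { |row| row.to_s =~ regex }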
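A row-by-row comparison of cell text avoids matching on raw markup. The sketch below only illustrates that idea and is not taken from the assert_select documentation: row_with_cells? is a made-up helper, and it parses with REXML purely so that it runs outside Rails (which assumes well-formed markup). Inside a Rails test, the same per-row check could presumably be run over the nodes returned by css_select "tr".

```ruby
require 'rexml/document'

# Hypothetical helper: true when one <tr> holds a <td> with `name`
# immediately followed by a <td> with `date` (cell text is stripped first).
def row_with_cells?(html, name, date)
  # Wrap in a root element so a fragment with several rows parses as XML.
  doc = REXML::Document.new("<root>#{html}</root>")
  REXML::XPath.match(doc, '//tr').any? do |row|
    cells = row.get_elements('td').map { |td| td.text.to_s.strip }
    cells.each_cons(2).include?([name, date])
  end
end

# Invented sample data, mirroring the two layouts from the question.
same_row   = '<tr><td> Ann </td> <td> 2012-04-09 </td></tr>'
split_rows = '<tr><td> Ann </td></tr><tr><td> 2012-04-09 </td></tr>'

puts row_with_cells?(same_row, 'Ann', '2012-04-09')   # both cells in one row
puts row_with_cells?(split_rows, 'Ann', '2012-04-09') # cells in different rows
```

Comparing stripped cell text also sidesteps the whitespace around the values in the markup, which a regex on `<td>name</td>` would have to allow for.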
With the help of FirePath, I got this:
.//*[@id='#table-row-51535240d7037e70b9000062']/td[1]
Part of my HTML looks like this:
<table class="table table-bordered table-striped">
<tbody>
<tr>
<tr>
<tr id="#table-row-51535240d7037e70b9000062"> #this is the id that i want to get
<td> 54 </td> #this is the td that i know
<td>
<td>
<td>Open</td>
<td/>
What I really want to do is, given the td value (54), get the id of its row (parse the id). Any hints on how I can achieve that?
Thanks in advance.
PS: sorry for my English, and for my lack of knowledge :)
First of all, your HTML is invalid (because it contains nested <tr> nodes). Nokogiri may be able to parse it, but you should fix it first if you can.
You can fetch that id with the following Ruby code:
doc.at_xpath("//td[contains(text(), '54')]/..")['id']
//td[contains(text(), '54')] will grab all the <td> nodes which contain 54, /.. will go to their parents.
Document#at_xpath will fetch only the first matching item
['id'] will get the attribute of the matching node.
Using jQuery:
$(function(){
  // (I don't know whether that td has an id; it would be easier if it did)
  console.log($('table tbody tr td:first').closest('tr').attr('id')); // you can remove :first if you want to
});
Oops, I misread your question, and one more thing, there is a problem in your tr tag.
I have the following HTML snippet:
<div id="items">
<tr>
<td>
<p>Cars</p>
<a>Toyota</a>
<a>Opel</a>
<a>Audi</a>
</td>
</tr>
<tr>
<td>
<p>Planes</p>
<a>A320</a>
<a>B787</a>
<a>B767</a>
</td>
</tr>
</div>
What I want is to create an XPath query so I can retrieve only the Cars.
Currently I am using this: //div[@id='items']/tr/td. But with this I also get the Planes items. I do not know how to test for the 'p' tag.
Can anyone help me?
Thanks.
//div[@id='items']/tr/td[p='Cars']
The last predicate tests the existence of a <p> child element with Cars text content and thus filters out the <td> with <p>Planes</p>.
If picking the first group is enough, then you can use:
//div[@id='items']/tr/td[1]
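To try the predicate out, here is a small sketch. It assumes the snippet is well-formed and parses it as XML with REXML (the question does not say which library is in use), then extends the path by one step to collect the link texts.

```ruby
require 'rexml/document'

xml = <<~XML
  <div id="items">
    <tr>
      <td>
        <p>Cars</p>
        <a>Toyota</a>
        <a>Opel</a>
        <a>Audi</a>
      </td>
    </tr>
    <tr>
      <td>
        <p>Planes</p>
        <a>A320</a>
        <a>B787</a>
        <a>B767</a>
      </td>
    </tr>
  </div>
XML

doc = REXML::Document.new(xml)

# Only the <a> children of the <td> whose <p> child reads "Cars".
cars = REXML::XPath.match(doc, "//div[@id='items']/tr/td[p='Cars']/a").map(&:text)

puts cars.inspect
```

The p='Cars' test keeps working if the rows are reordered, which a positional predicate would not.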
I have the following HTML code :
<table class="report" width="100%">
<thead>
</thead>
<tbody>
<tr class="alt">
<td>
<a onclick="window.open(this.href);return false;" href="/search/searches/1563/reports/946">56175-746-45619568-noor.fli.zip</a>
</td>
<td class="_"> Report </td>
<td class="_"> 09 Apr 2012</td>
<td class="_"> Noor</td>
<td class="_"> 2.8 MB</td>
<td class="_">Ready</td>
</tr>
I want to click on the link href="/search/searches/1563/reports/946" (text 56175-746-45619568-noor.fli.zip), but I do not want to use XPath. I tried a lot of things but failed. Is there a way to click on this link without using XPath? Thanks a lot.
You can use the href
br.link(:href => '/search/searches/1563/reports/946').click
or the text
br.link(:text => '56175-746-45619568-noor.fli.zip').click
or you can use variations with regex matches
br.link(:href => /reports/).click
or
br.link(:text => /noor\.fli\.zip/).click
Is it the only link in that table, or always the first link in that table?
browser.table(:class => 'report').a.click
If there are multiple tables, then you have to figure out how to find the one you want, perhaps by the text inside the table. If, in your example, the text Noor is unique to that table, then you could try something like this:
browser.table(:class => 'report', :text => /Noor/).a.click
or, if you know the structure above will persist (where the link and the info about the report are on a single table row):
browser.row(:text => /Noor/).a.click
You'd have to experiment to decide which is going to be the most robust (or least brittle).
I have been scratching my head over this for a while. Help me out before I start picking my brain.
I have an HTML document with an events table that has 'In' and 'Out' among its columns. A record can be either an In or an Out event. I want to get only the rows with values in the 'In' column and then save the text in an event model with the same attributes. The code below is what I have, which returns '0'.
#!/usr/bin/env ruby
require 'rubygems'
require 'nokogiri'
doc = Nokogiri::HTML <<-EOS
<table><thead><th>Reference</th><th>Event Date</th><th>Event Details</th><th>In</th><th>Out</th></thead><tbody><tr><td>BCE16</td><td>2011-08-16 11:14:52</td><td>Received from Arap Moi</td><td>30.00</td><td></td></tr><tr><td>B07K2</td><td>2011-08-16 11:10:06</td><td>Sent out to John Doe.</td><td> </td><td>-50.00</td></tr></tbody><tfoot></tfoot></table>
EOS
minus_received = doc.xpath('//td[contains(text(), "Received from")]').each do |node|
node.parent.remove
end
p minus_received.to_s
Human-readable markup:
<table>
<thead>
<th>Reference</th>
<th>Event Date</th>
<th>Event Details</th>
<th>In</th>
<th>Out</th>
</thead>
<tbody>
<tr>
<td>BCE16</td>
<td>2011-08-16 11:14:52</td>
<td>Received from Arap Moi.</td>
<td>30.00</td>
<td></td>
</tr>
<tr>
<td>B07K2</td>
<td>2011-08-16 11:10:06</td>
<td>Sent out to John Doe.</td>
<td> </td>
<td>-50.00</td>
</tr>
</tbody>
<tfoot></tfoot>
</table>
I appreciate your help.
You're outputting the return value of .each. If you look at doc after your each call finishes, the HTML contains only the header and the John Doe row.
I have an HTML document of this format:
<tr><td colspan="4"><span class="fullName">Bill Gussio</span></td></tr>
<tr>
<td class="sectionHeader">Contact</td>
<td class="sectionHeader">Phone</td>
<td class="sectionHeader">Home</td>
<td class="sectionHeader">Work</td>
</tr>
<tr valign="top">
<td class="sectionContent"><span>Screen Name:</span> <span>bhjiggy</span><br><span>Email 1:</span> <span>wmgussio@erols.com</span></td>
<td class="sectionContent"><span>Mobile: </span><span>2404173223</span></td>
<td class="sectionContent"><span>NY</span><br><span>New York</span><br><span>78642</span></td>
<td class="sectionContent"><span>MD</span><br><span>Owings Mills</span><br><span>21093</span></td>
</tr>
<tr><td colspan="4"><hr class="contactSeparator"></td></tr>
<tr><td colspan="4"><span class="fullName">Eddie Osefo</span></td></tr>
<tr>
<td class="sectionHeader">Contact</td>
<td class="sectionHeader">Phone</td>
<td class="sectionHeader">Home</td>
<td class="sectionHeader">Work</td>
</tr>
<tr valign="top">
<td class="sectionContent"><span>Screen Name:</span> <span>eddieOS</span><br><span>Email 1:</span> <span>osefo@wam.umd.edu</span></td>
<td class="sectionContent"></td>
<td class="sectionContent"><span></span></td>
<td class="sectionContent"><span></span></td>
</tr>
<tr><td colspan="4"><hr class="contactSeparator"></td></tr>
So it alternates - chunk of contact info and then a "contact separator". I want to grab the contact info so my first obstacle is to grab the chunks in between the contact separator. I have already figured out the regular expression using rubular. It is:
/<tr><td colspan="4"><span class="fullName">((.|\s)*?)<hr class="contactSeparator">/
You can check on rubular to verify that this isolates chunks.
However, my big issue is that I am having trouble with the Ruby code. I use the built-in match function and print the results, but do not get what I expect. Here is the code:
page = agent.get uri.to_s
chunks = page.body.match(/<tr><td colspan="4"><span class="fullName">((.|\s)*?)<hr class="contactSeparator">/).captures
chunks.each do |chunk|
puts "new chunk: " + chunk.inspect
end
Note that page.body is just the body of the html document grabbed by Mechanize. The html document is much larger but has this format. So, the unexpected output is below:
new chunk: "Bill Gussio</span></td></tr>\r\n\t<tr>\r\n\t\t<td class=\"sectionHeader\">Contact</td>\r\n\t\t<td class=\"sectionHeader\">Phone</td>\r\n\t\t<td class=\"sectionHeader\">Home</td>\r\n\t\t<td class=\"sectionHeader\">Work</td>\r\n\t</tr>\r\n\t<tr valign=\"top\">\r\n\t\t<td class=\"sectionContent\"><span>Screen Name:</span> <span>bhjiggy</span><br><span>Email 1:</span> <span>wmgussio@erols.com</span></td>\r\n\t\t<td class=\"sectionContent\"><span>Mobile: </span><span>2404173223</span></td>\r\n\t\t<td class=\"sectionContent\"><span>NY</span><br><span>New York</span><br><span>78642</span></td>\r\n\t\t<td class=\"sectionContent\"><span>MD</span><br><span>Owings Mills</span><br><span>21093</span></td>\r\n\t</tr>\r\n\t\r\n\t<tr><td colspan=\"4\">"
new chunk: ">"
There are 2 surprises here for me:
1) There are not 2 matches that contain the chunks of contact info, even though on rubular I have verified that these chunks should be extracted.
2) All of the \r\n\t (line feeds, tabs, etc.) are showing up in the matches.
Can anyone see the issue here?
Alternatively, if anyone knows of a good free AOL contacts importer, that would be great. I have been using blackbook but it keeps failing for me on AOL and I am attempting to fix it. Unfortunately, AOL has no contacts API yet.
Thank you!
See "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why this is a bad idea. Use an HTML parser instead.
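As for the two surprises themselves, both have plain-Ruby explanations, whichever tool ends up being used. String#match returns only the first match, and .captures lists that one match's groups: the chunk, plus the inner (.|\s) group, which holds the last character it consumed (hence the ">"). The \r\n\t sequences are simply part of the matched text, and inspect prints them escaped. String#scan returns every match. Here is a sketch on a trimmed version of the markup, with (.|\s)*? swapped for .*? under the /m flag so there is a single group:

```ruby
# Trimmed sample in the same shape as the page described in the question.
html = <<~HTML
  <tr><td colspan="4"><span class="fullName">Bill Gussio</span></td></tr>
  <tr valign="top"><td class="sectionContent"><span>Mobile: </span><span>2404173223</span></td></tr>
  <tr><td colspan="4"><hr class="contactSeparator"></td></tr>
  <tr><td colspan="4"><span class="fullName">Eddie Osefo</span></td></tr>
  <tr valign="top"><td class="sectionContent"></td></tr>
  <tr><td colspan="4"><hr class="contactSeparator"></td></tr>
HTML

# /m lets "." match newlines, so one capture group is enough.
pattern = /<tr><td colspan="4"><span class="fullName">(.*?)<hr class="contactSeparator">/m

# match stops at the first hit; captures are the groups of that single hit.
first_only = html.match(pattern).captures

# scan collects every non-overlapping hit (one inner array per match).
all_chunks = html.scan(pattern).flatten

# Each chunk starts with the contact's name, up to the first "<".
names = all_chunks.map { |chunk| chunk[/\A[^<]+/] }

puts first_only.size
puts all_chunks.size
puts names.inspect
```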
If you're just extracting information out of XML, it might be easier to use something other than regular expressions. XPath is a good tool for extracting info from XML. I believe there are some libraries available for Ruby that support XPath, maybe try REXML:
http://www.germane-software.com/software/rexml/
http://redhanded.hobix.com/inspect/noXpathOnMessyHtmlIsJustAsEasyInRuby.html
Using an HTML parser such as Hpricot will save you lots of headaches :)
sudo gem install hpricot
It's mostly written in C, so it's fast as well.
Here is how to use it:
http://wiki.github.com/why/hpricot/hpricot-basics
This is the code that parses that HTML. Feel free to suggest something better:
contacts = []
email, mobile = "", ""
names = page.search("//span[@class='fullName']")
# Every contact has a fullName node, so for each fullName node, we grab the chunk of contact info
names.each do |n|
  # next_sibling.next_sibling skips:
  #   <tr>
  #     <td class=\"sectionHeader\">Contact</td>
  #     <td class=\"sectionHeader\">Phone</td>
  #     <td class=\"sectionHeader\">Home</td>
  #     <td class=\"sectionHeader\">Work</td>
  #   </tr>
  # to give us the actual chunk of contact information
  # then taking the children of that chunk gives us rows of contact info
  contact_info_rows = n.parent.parent.next_sibling.next_sibling.children
  # Iterate through the rows of contact info
  contact_info_rows.each do |row|
    # Iterate through the contact info in each row
    row.children.each do |info|
      # Get Email. There are two ".next_siblings" because space after "Email 1" element is processed as a sibling
      if info.content.strip == "Email 1:" then email = info.next_sibling.next_sibling.content.strip end
      # If the contact info has a screen name but no email, use screenname@aol.com
      if (info.content.strip == "Screen Name:" && email == "") then email = info.next_sibling.next_sibling.content.strip + "@aol.com" end
      # Get Mobile #'s
      if info.content.strip == "Mobile:" then mobile = info.next_sibling.content.strip end
      # Maybe we can try and get zips later. Right now the zip field can look like the street address field
      # so we can not tell the difference. There is no label node
      #zip_match = /\A\D*(\d{5})-?\d{4}\D*\z/i.match(info.content.strip)
      #zip_match = /\A\D*(\d{5})[^\d-]*\z/i.match(info.content.strip)
    end
  end
  contacts << { :name => n.content, :email => email, :mobile => mobile }
  # clear variables
  email, mobile = "", ""
end