I tried the following code:
page.execute_script " $('#{selector}').trigger('mouseenter').click();"
I can't use use jquery with capybara. (unknown error (Selenium::WebDriver::Error::JavascriptError))
Can someone suggest to me what I am missing?
I am using capybara (1.1.2), selenium-webdriver(2.29)
You can't have double quotation marks in strings unless you escape them. In this scenario though, that can get messy rather quickly. Try this
page.execute_script("$('#{selector}').trigger('mouseenter').click();")
Related
I am trying to get auto-corrected spelling from Google's home page using Nokogiri.
For example, if I am typing "hw did" and the correct spelling is "how did", I have to get the correct spelling.
I tried with the xpath and css methods, but in both cases, I get the same empty array.
I got the XPath and CSS paths using FireBug.
Here is my Nokogiri code:
#requ=params[:search]
#requ_url=#requ.gsub(" ","+") //to encode the url(if user inputs space than it should be convet into + )
#doc=Nokogiri::HTML(open("https://www.google.co.in/search?q=#{#requ_url}"))
binding.pry
Here are my XPath and CSS selectors:
Using XPath:
pry(#<SearchController>)> #doc.xpath("/html/body/div[5]/div[2]/div[6]/div/div[4]/div/div/div[2]/div/p/a").inspect
=> "[]"
Using CSS:
pry(#<SearchController>)> #doc.css('html body#gsr.srp div#main div#cnt.mdm div.mw div#rcnt div.col div#center_col div#taw div div.med p.ssp a.spell').inner_text()
=> ""
First, use the right tools to manipulate URLs; They'll save you headaches.
Here's how I'd find the right spelling:
require 'nokogiri'
require 'uri'
require 'open-uri'
requ = 'hw did'
uri = URI.parse('https://www.google.co.in/search')
uri.query = URI.encode_www_form({'q' => requ})
doc = Nokogiri::HTML(open(uri.to_s))
doc.at('a.spell').text # => "how did"
it works fine with "how did",check it with "bnglore" or any one word string,it gives an error. the same i was facing in my previous code. it is showing undefined method `text'
It's not that hard to figure out. They're changing the HTML so you have to change your selector. "Inspect" the suggested word "bangalore" and see where it exists in relation to the previous path. Once you know that, it's easy to find a way to access the word:
doc.at('span.spell').next_element.text # => "bangalore"
Don't trust Google to do things the easy way, or even the best way, or be consistent. Just because they return HTML one way for words with spaces, doesn't mean they're going to do it the same way for a single word. I would do it consistently, but they might be trying to discourage you from mining their pages so don't be surprised if you see variations.
Now, you need to figure out how to write code that knows when to use one selector/method or the other. That's for you to do.
Not sure how to do this in Ruby/Rails only thing I could come up with is gsub. Which is giving me an undefined method.
What I am doing is via jQuery $.post is passing data to jRuby. My problem is, some browsers (most) are converting # to %40, and , to %2C which is throwing a wrench in my works. So I want to catch specific ones to covert them to what they are supposed to be literally when "decoded".
I know this is a novice question, and I am almost positive the answer has to be on the web somewhere. But as I said I keep coming up on pages that suggest regex (which i want to avoid, and am not versed well with). Or others that suggest "gsub".
Your string is URL encoded. Try to use CGI::unescape:
require 'cgi'
CGI::unescape('%40') # => "#"
I'm trying to store regexes in a database but they're not working when used in a .sub(), even though the same regex works when used directly in .sub() as a string.
regex = Class.object.field // Class.object is an active record containing "\w*\s\/\s"
mystring = "first / second"
mystring.sub(/#{regex}/, '')
// => nil
mystring.sub(/\w*\s\/\s/, '')
// => second
Any insight appreciated!
Thanks,
Matt.
Editing to correct class/object terminology (thanks) & correcting my 2nd example as I had shown #{} wrapped around the working regex (cut & paste SNAFU).
To answer your question: It is not quite what kind of thing your Class.object is. If it's an ActiveRecord, it won't work.
Edit: You obviously found that the problem is Rails escaping the regexp.
An ActiveRecord cannot "contain" your regular expression directly; the regexp will be in one of the fields of your record. In which case you'd want to do something like regexp = Class.object.field_containing_the_regexp.
Even if that is not the case, I suspect that the problem is that your regexp is something other than a string. You can quickly test this by using
puts "My regexp: #{regexp}"
The string that you will see in the output will be the one that is used for the regexp.
A String is not a Regexp. You have to create a Regexp object first.
regex = Regexp.new("\w*\s\/\s")
Turns out my regexp didn't cater for all cases - \w didn't account for symbols. After checking in rails console, and seeing the screwey escaping I was alreasdy half-way down the wrong track.
Thanks for the help.
I am creating a parser that wards off against spamming and harvesting of emails from a block of text that comes from tinyMCE (so it may or may not have html tags in it)
I've tried regexes and so far this has been successful:
/\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b/i
problem is, i need to ignore all email addresses with mailto hrefs. for example:
test#mail.com
should only return the second email add.
To get a background of what im doing, im reversing the email addresses in a block so the above example would look like this:
moc.liam#tset
problem with my current regex is that it also replaces the one in href. Is there a way for me to do this with a single regex? Or do i have to check for one then the other? Is there a way for me to do this just by using gsub or do I have to use some nokogiri/hpricot magicks and whatnot to parse the mailtos? Thanks in advance!
Here were my references btw:
so.com/questions/504860/extract-email-addresses-from-a-block-of-text
so.com/questions/1376149/regexp-for-extracting-a-mailto-address
im also testing using this:
http://rubular.com/
edit
here's my current helper code:
def email_obfuscator(text)
text.gsub(/\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b/i) { |m|
m = "<span class='anti-spam'>#{m.reverse}</span>"
}
end
which results in this:
<a target="_self" href="mailto:<span class='anti-spam'>moc.liamg#tset</span>"><span class="anti-spam">moc.liamg#tset</span></a>
Another option if lookbehind doesn't work:
/\b(mailto:)?([A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4})\b/i
This would match all emails, then you can manually check if first captured group is "mailto:" then skip this match.
Would this work?
/\b(?<!mailto:)[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b/i
The (?<!mailto:) is a negative lookbehind, which will ignore any matches starting with mailto:
I don't have Ruby set up at work, unfortunately, but it worked with PHP when I tested it...
Why not just store all the matched emails in an array and remove any duplicates? You can do this easily with the ruby standard library and (I imagine) it's probably quicker/more maintainable than adding more complexity to your regex.
emails = ["email_one#example.com", "email_one#example.com", "email_two#example.com"]
emails.uniq # => ["email_one#example.com", "email_two#example.com"]
Greetings everyone:
I would love to get some information from a huge collection of Google Search Result pages.
The only thing I need is the URLs inside a bunch of <cite></cite> HTML tags.
I cannot get a solution in any other proper way to handle this problem so now I am moving to ruby.
This is so far what I have written:
require 'net/http'
require 'uri'
url=URI.parse('http://www.google.com.au')
res= Net::HTTP.start(url.host, url.port){|http|
http.get('/#hl=en&q=helloworld')}
puts res.body
Unfortunately I cannot use the recommended hpricot ruby gem (because it misses a make command or something?)
So I would like to stick with this approach.
Now that I can get the response body as a string, the only thing I need is to retrieve whatever is inside the ciite(remove an i to see the true name :)) HTML tags.
How should I do that? using regular expression? Can anyone give me an example?
Here's one way to do it using Nokogiri:
Nokogiri::HTML(res.body).css("cite").map {|cite| cite.content}
I think this will solve it:
res.scan(/<cite>([^<>]*)<\/cite>/imu).flatten
# This one to ignore empty tags:
res.scan(/<cite>([^<>]*)<\/cite>/imu).flatten.select{|x| !x.empty?}
If you're having problems with hpricot, you could also try nokogiri which is very similar, and allows you to do the same things.
Split the string on the tag you want. Assuming only one instance of tag (or specify only one split) you'll have two pieces I'll call head and tail. Take tail and split it on the closing tag (once), so you'll now have two pieces in your new array. The new head is what was between your tags, and the new tail is the remainder of the string, which you may process again if the tag could appear more than once.
An example that may not be exactly correct but you get the idea:
head1, tail1 = str.split('<tag>', 1) # finds the opening tag
head2, tail2 = tail1.split('</tag>', 1) # finds the closing tag