Nokogiri unable to get by CSS class? - ruby-on-rails

I'm trying to get the bio of a Twitter profile using Nokogiri, and have tried everything to get an element using CSS. The following returns an empty string:
doc = Nokogiri::HTML(open("https://twitter.com/n"))
puts doc.css('.ProfileHeaderCard-bio').text
However, if I try to output everything in the body, the content is indeed there. So the following works:
puts doc.css('body').text
But selecting by a CSS class fails: puts doc.css('.ProfileHeaderCard-bio').text
Any idea why?
Update:
Apparently Twitter changes CSS classes after load, so the source shown in a browser is entirely different from what wget fetched.
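One way to confirm this kind of mismatch is to list the class names that actually appear in the HTML Ruby receives, and compare them with what the browser inspector shows. The following is a minimal sketch: `class_names_in` is a hypothetical helper, and the sample markup is made up for illustration. In a real script the string would be the body of the fetched page.

```ruby
# Hypothetical helper: collect every class name used in a raw HTML string.
# A regex is enough for a quick diagnostic; it is not a real HTML parser.
def class_names_in(html)
  html.scan(/class\s*=\s*["']([^"']*)["']/) # capture each class attribute value
      .flatten
      .flat_map(&:split)                     # split "a b c" into separate names
      .uniq
end

# Made-up markup standing in for what the server actually returned.
served_html = '<div class="profile-card"><p class="bio u-dir">Hello</p></div>'

names = class_names_in(served_html)
puts names.inspect
puts names.include?('ProfileHeaderCard-bio') # false here: the class is not in the served HTML
```

If the class you selected in the browser is missing from this list, the selector cannot match, no matter how it is written.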

Related

JSON-LD recognized by Google, but not Facebook Pixel (Ruby on Rails)

I have implemented dynamic JSON-LD generation to boost my SEO. The JSON is created with Jbuilder (the code is in a partial) and rendered in a script tag with a type of "application/ld+json". All of it is wrapped in a content_for so that I can reuse the logic.
Once it was implemented, I started to get this error in my console: "[Facebook Pixel] - Unable to parse JSON-LD tag. Malformed JSON found: ' "
I tested my JSON-LD with the Google structured data tool and everything came back OK.
When I put hand-written JSON-LD in my script tag instead of the aforementioned logic, everything looked OK: no error was displayed in the console, and the Chrome Facebook Pixel Helper was able to find my JSON-LD.
Bottom line: it appears that using my dynamic logic with the partials creates a stray " ' ", which makes no sense to me.
Has anyone had the same issue, or something similar?
Maybe the templating engine is messing things up. You might consider using the json-ld gem to validate the output as part of continuous integration (you can also semantically validate the content using other gems).
I’ve had success using JSON-LD in Haml, but I just use to_json from a Hash hierarchy which has always worked well for me.
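The Hash-to-JSON approach can be sketched with nothing but the standard library. The data below is hypothetical placeholder content (the keys follow schema.org conventions), and the round-trip parse is just a cheap sanity check, not a replacement for a JSON-LD validator.

```ruby
require 'json'

# Hypothetical structured data; values are placeholders.
data = {
  '@context' => 'https://schema.org',
  '@type'    => 'Organization',
  'name'     => "Joe's Bakery", # an apostrophe is valid inside a JSON string
  'url'      => 'https://example.com'
}

json_ld = JSON.generate(data)

# Round-trip check: JSON::ParserError here would mean the output is malformed.
JSON.parse(json_ld)

script_tag = %(<script type="application/ld+json">#{json_ld}</script>)
puts script_tag
```

If the generated string parses here but the tag in the rendered page does not, it may be worth looking at the page source to see whether the template escaped or altered the JSON on the way out.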

Test PDF content sent with ActionMailer

I'm trying to test specific content inside a pdf document which is being generated via the prawn gem and attached to an ActionMailer email. I'm using RSpec and Capybara for testing.
I've managed to test the filename like this
expect(ActionMailer::Base.deliveries[0].filename).to eq("my_file.pdf")
I thought that I've read somewhere that I have to test the pdf itself like this but it doesn't work.
expect(ActionMailer::Base.deliveries[0].body.encoded).to have_content(user.first_name)
When running the test, I get the following error:
Failure/Error: expect(ActionMailer::Base.deliveries[0].body.encoded).to have_content(user.first_name)
expected to find text "John" in "JVBERi0xLjQKJf////8KMSAwIG9iago8PCAvQ3JlYXRvciA8ZmVmZjAwNTAw MDcyMDA2MTAwNzcwMDZlPgovUHJvZHVjZXIgPGZlZmYwMDUwMDA3MjAwNjEw MDc3MDA2ZT4KPj4KZW5kb2JqCjIgMCBvYmoKPDwgL1R5cGUgL0NhdGFsb2cK L1BhZ2VzIDMgMCBSCj4+CmVuZG9iagozIDAgb2JqCjw8IC9UeXBlIC9QYWdl cwovQ291bnQgMQovS2lkcyBbNSAwIFJdCj4+CmVuZG9iago0IDAgb2JqCjw8 IC9MZW5ndGggMzE2NAo+PgpzdHJlYW0KcQpxCi9UcjEgZ3MKMjA2LjAwMCA2 NTYuMDAwIDIwMC4wMDAgMTAwLjAwMCByZQpTClEKCnEKNDAwLjAwMCAwIDAg NTAuMDAwIDEwNi4wMDAgNzA2LjAwMCBjbQovSTEgRG8KUQpxCi9UcjEgZ3MK MjA2LjAwMCA2NTYuMDAwIDIwMC4wMDAgMTAwLjAwMCByZQpTClEKCkJUCjIx OC40ODI1NTg1OTM3NSA2NDIuODk2IFRkCi9GMS4wIDE4IFRmCls8NTQ+IDEx MC44Mzk4NDM3NSA8NjU2OTZjNmU2MTY4NmQ2NTYyNjU3Mzc0OGE3NDY5Njc3 NTZlNjc+XSBUSgpFVAoKCkJUCjM2IDYyMy4zMjk5OTk5OTk5OTk5IFRkCkVU CgoKQlQKMzYgNjAzLjc2Mzk5OTk5OTk5OTkgVGQKRVQKCgpCVAoyNDguODUy
This text continues a long time.
Maybe it's just me, but this doesn't exactly look "testable" to me. Does someone know how to do this? Thanks!
Firstly, your PDF-generation code should really be separate from your mailer code, and should have its own tests separate from your mailer tests. Please do this first.
Once you've separated your PDF-generation code and tests you can generate your PDF and test its content using the pdf-inspector gem, which is helpfully maintained by the same folks who make Prawn. Then in your mailer tests you can simply check whether the file is attached, using something like this.
P.S. The reason the email content looks garbled (JVBERi0xLjQ...) is that email attachments are (usually) Base64-encoded. But even if you decoded it, you might not be able to search the PDF content for a plaintext string without a library like pdf-inspector because it might be compressed (I don't know if Prawn does this by default). But really, your PDF code and tests and your email code and tests should be completely separate.
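The Base64 point is easy to check by decoding the first few characters of the failure output. This small sketch uses only core Ruby (`String#unpack1` with the `m` directive); the variable names are illustrative.

```ruby
# First 12 characters of the encoded body shown in the failure message.
encoded_prefix = 'JVBERi0xLjQK'

# 'm' is the Base64 directive for String#unpack1 (core Ruby, no require needed).
decoded = encoded_prefix.unpack1('m')

puts decoded.inspect # => "%PDF-1.4\n"
puts decoded.start_with?('%PDF')
```

So the attachment really is a PDF file; the matcher was simply comparing against its encoded form.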

Why is it returning an empty array when it has content?

I am trying to get auto-corrected spelling from Google's home page using Nokogiri.
For example, if I am typing "hw did" and the correct spelling is "how did", I have to get the correct spelling.
I tried with the xpath and css methods, but in both cases, I get the same empty array.
I got the XPath and CSS paths using FireBug.
Here is my Nokogiri code:
@requ = params[:search]
@requ_url = @requ.gsub(" ", "+") # to encode the URL (if the user inputs a space, it should be converted into +)
@doc = Nokogiri::HTML(open("https://www.google.co.in/search?q=#{@requ_url}"))
binding.pry
Here are my XPath and CSS selectors:
Using XPath:
pry(#<SearchController>)> @doc.xpath("/html/body/div[5]/div[2]/div[6]/div/div[4]/div/div/div[2]/div/p/a").inspect
=> "[]"
Using CSS:
pry(#<SearchController>)> @doc.css('html body#gsr.srp div#main div#cnt.mdm div.mw div#rcnt div.col div#center_col div#taw div div.med p.ssp a.spell').inner_text()
=> ""
First, use the right tools to manipulate URLs; they'll save you headaches.
Here's how I'd find the right spelling:
require 'nokogiri'
require 'uri'
require 'open-uri'
requ = 'hw did'
uri = URI.parse('https://www.google.co.in/search')
uri.query = URI.encode_www_form({'q' => requ})
doc = Nokogiri::HTML(URI.open(uri.to_s)) # Kernel#open no longer fetches URLs on Ruby 3.0+
doc.at('a.spell').text # => "how did"
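To illustrate why a proper encoder beats replacing spaces by hand, here is a small comparison using only the standard library. The search string is made up; it contains a reserved character that a space-only substitution leaves untouched.

```ruby
require 'uri'

query = 'rock & roll' # made-up search string with a reserved character

naive   = query.gsub(' ', '+')                  # only handles spaces
encoded = URI.encode_www_form({ 'q' => query }) # escapes every reserved character

puts naive   # rock+&+roll  (the bare & would start a new query parameter)
puts encoded # q=rock+%26+roll
```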
It works fine with "how did", but check it with "bnglore" or any one-word string and it gives an error, the same one I was facing in my previous code: undefined method `text'.
It's not that hard to figure out. They're changing the HTML so you have to change your selector. "Inspect" the suggested word "bangalore" and see where it exists in relation to the previous path. Once you know that, it's easy to find a way to access the word:
doc.at('span.spell').next_element.text # => "bangalore"
Don't trust Google to do things the easy way, or even the best way, or be consistent. Just because they return HTML one way for words with spaces, doesn't mean they're going to do it the same way for a single word. I would do it consistently, but they might be trying to discourage you from mining their pages so don't be surprised if you see variations.
Now, you need to figure out how to write code that knows when to use one selector/method or the other. That's for you to do.

How to strip the CSS code in Rails

I have a Rails application which accepts XML output from another application. Under some conditions the content of an XML tag comes with CSS code in it.
For example :
<sample> .headermenu{float:left;no-repeat right;font-size:0.75em; padding-bottom:3px}, #div{float:left} This is the test value from another site </sample>
In my Ruby application I parse the XML content and display it.
It starts displaying CSS content like the above. I want to strip the CSS code from the content if it exists.
Is there any way we can do this? Please help.
The raw method might help you. It outputs a string without escaping it. Check http://apidock.com/rails/ActionView/Helpers/RawOutputHelper/raw for more details.
I don't know if this is what you are looking for, but you can try a CSS parser. By the way, whenever you need a Rails or Ruby gem, just search for it on RubyGems.
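Neither answer shows code, so here is a different, deliberately simple technique: a regular expression that drops anything shaped like a CSS rule. It is a heuristic sketch, not a CSS parser. `CSS_RULE` and `strip_css_rules` are hypothetical names, only simple class, id, and tag selectors are handled, and legitimate text containing braces would be affected too.

```ruby
# Heuristic: match "selector { declarations }" plus a trailing comma and
# whitespace. Handles simple selectors such as .name, #name, or name only.
CSS_RULE = /[.#]?[\w-]+\s*\{[^}]*\}\s*,?\s*/

def strip_css_rules(text)
  text.gsub(CSS_RULE, '').strip
end

# The tag content from the question, without the surrounding tags.
sample = '.headermenu{float:left;no-repeat right;font-size:0.75em; padding-bottom:3px}, ' \
         '#div{float:left} This is the test value from another site'

puts strip_css_rules(sample)
```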

using 'puts' to get information from external domain

I've just started with Ruby on Rails the other day and I was wondering whether it is possible to use the puts function to get the content of a div from a page on an external site.
Something like puts "http://www.example.com #about"
Would something like this work? Or would you have to get the entire page and then puts the section that you wanted?
Additionally, if the content of the #about div on "example.com" is constantly changing, would puts constantly update its output, or would it only run the script each time the page is refreshed?
The open-uri library (for fetching the page) and the Nokogiri gem (for parsing and retrieving specific content) can assist with this.
require 'open-uri'
require 'nokogiri'
doc = Nokogiri::HTML(URI.open('http://www.example.com/')) # Kernel#open no longer fetches URLs on Ruby 3.0+
puts doc.at('#about').text
puts will not work that way. Ruby makes parsing HTML fairly easy, though. Take a look at the Nokogiri library; you can use XPath queries to get to the div you want to print out. I believe you would need to reopen the file if the div changes, but I'm not positive about that. You can easily test it (or someone here can confirm or reject that statement).
