Ruby Rails Screen Scrape different results in Rails Console - ruby-on-rails

I'm confused about a difference I'm seeing in Nokogiri commands run from Rails Console and what I get from the same commands run in a Rails Helper.
In Rails Console, I am able to capture the data I want with these commands:
endpoint = "https://basketball-reference.com/leagues/BAA_1947_totals.html"
browser = Watir::Browser.new(:chrome)
browser.goto(endpoint)
@doc_season = Nokogiri::HTML.parse(URI.open("https://basketball-reference.com/leagues/BAA_1947_totals.html"))
player_season_table = @doc_season.css("tbody")
rows = player_season_table.css("tr")
rows.search('.thead').each(&:remove) #THIS WORKED
rows[0].at_css("td").try(:text) # Gets single player name
rows[0].at_css("a").attributes["href"].try(:value) # Gets that player page URL
However, this is my Rails helper, which is meant to fold those commands into methods:
module ScraperHelper
  def target_scrape(url)
    browser = Watir::Browser.new(:chrome)
    browser.goto(url)
    doc = Nokogiri::HTML.parse(browser.html)
  end
  def league_year_prefix(year, league = 'NBA')
    # aba_seasons = 1968..1976
    baa_seasons = 1947..1949
    baa_seasons.include?(year) ? league_year = "BAA_#{year}" : league_year = "#{league}_#{year}"
  end
  def players_total_of_season(year, league = 'NBA')
    # always the latter year of the season, first year is 1947 no quotes
    # ABA is 1968 to 1976
    league_year = league_year_prefix(year, league)
    @doc_season = target_scrape("http://basketball-reference.com/leagues/#{league_year}_totals.html")
  end
  def gather_players_from_season
    player_season_table = @doc_season.css("tbody")
    rows = player_season_table.css("tr")
    rows.search('.thead').each(&:remove)
    puts rows[0].at_css("td").try(:text)
    puts rows[0].at_css("a").attributes["href"].try(:value)
  end
end
In that module, I try to reproduce the Rails console commands and break them into methods. To test it (since I don't have any other functionality or views built yet), I start the Rails console, include this helper, and run the methods.
But I get wildly different results.
In the gather_players_from_season method, I can see that
player_season_table = @doc_season.css("tbody")
is no longer grabbing the same data it grabbed when the commands were run line by line. It also doesn't like the attributes method here:
puts rows[0].at_css("a").attributes["href"].try(:value)
So my first thought is a difference in gems, maybe? Watir is launching the headless browser. Nokogiri isn't causing errors, as near as I can tell.

Comparing the gem versions is a good first step, but there is also a difference between the two pieces of code:
In the Rails console
the code parses what URI.open returns, which is the raw HTTP response: Nokogiri::HTML.parse(URI.open("some url"))
In the ScraperHelper code
the code does not call URI.open; it parses browser.html, the page as rendered by the Watir browser: Nokogiri::HTML.parse(browser.html)
The two inputs can contain different markup (for example, after JavaScript has run in the browser), and that difference could make the rest of ScraperHelper return unexpected results.
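Separately from the parser input, the helper shares state through @doc_season, so gather_players_from_season only works on an object where players_total_of_season has already run. The plain-Ruby sketch below shows that ordering dependency. SeasonState, load_season and first_row are hypothetical names, and an array of strings stands in for the parsed document. It illustrates the pattern under those assumptions and is not a diagnosis of the helper itself.

```ruby
# Hypothetical stand-in for ScraperHelper: one method stores state in an
# instance variable, the other one reads it.
module SeasonState
  def load_season(rows)
    @doc_season = rows
  end

  def first_row
    # An instance variable that was never assigned evaluates to nil.
    return nil if @doc_season.nil?

    @doc_season.first
  end
end

scraper = Object.new.extend(SeasonState)
puts scraper.first_row.inspect   # nothing has been loaded on this object yet

scraper.load_season(["first placeholder row", "second placeholder row"])
puts scraper.first_row.inspect   # now the stored rows are visible
```

Having each method return its result and pass it on as an argument, rather than relying on a shared instance variable, would remove this dependency on call order.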

Related

Problem when scraping data from webpage. Ruby on rails 5

I'm developing a web scraper. I wrote some code and don't understand why the loop doesn't work. Can anyone help me with that?
scraper_service.rb:
browser = Watir::Browser.new
browser.goto('some_link_here')
browser.is(class: /event--head-block/).each do |event|
  event.is(class: /event--more/).button.click
  puts "Hello world"
  binding.pry
end
When I executed the code, I didn't see 'Hello world' in the console. In addition, to check whether the class 'event--head-block' is present on the page, I ran browser.element(class: /event--head-block/).exists? and it returned true.
Update
I forgot to mention that there are 8-10 elements with the class 'event--head-block'. Could that be the reason?

RoR - Validate string on website, no error

In my Ruby on Rails app I have built a function to validate whether a piece of JavaScript has been added to a certain website. When I run this code I don't get any errors in my log, but my app says:
We're sorry, but something went wrong.
If you are the application owner check the logs for more information.
But when I check the logs I don't see any errors. The code I have used is the following:
def validate_installation
  data = HTTParty.get(self.website)
  url = "http://www.smartnotif.com/sn.js"
  if data.body.include? url
    return true
  else
    return false
  end
end
When I run this code on my local development machine it runs fine, but when I run it on my production machine on DigitalOcean I get this problem with the same code, and no errors are logged.
Try adding
require 'httparty'
and then restart the Rails server:
rails s
Also check the permissions on the log folder to find out why errors are not being written there.
Also try the self keyword, in case you are calling it as a class method:
def self.validate_installation
  data = HTTParty.get(self.website)
  url = "http://www.smartnotif.com/sn.js"
  if data.body.include? url
    return true
  else
    return false
  end
end
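The check itself can also be separated from the HTTP request, which makes it testable without network access and leaves the caller free to rescue and log request errors. This is a minimal sketch, assuming the response body is available as a string. script_installed? and SCRIPT_URL are hypothetical names that do not appear in the original code.

```ruby
# Hypothetical helper: String#include? already returns true or false,
# so no if/else is needed around it.
SCRIPT_URL = "http://www.smartnotif.com/sn.js"

def script_installed?(body, script_url = SCRIPT_URL)
  # to_s guards against a nil body (nil.to_s is an empty string).
  body.to_s.include?(script_url)
end

puts script_installed?('<script src="http://www.smartnotif.com/sn.js"></script>')
puts script_installed?("<html></html>")
```

In the model, validate_installation would then pass the fetched body to this method.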

Configuration of ContentfulModel gem is lost after initialization

Completely new Rails 4.2.3 application. The only changes to the Gemfile have been removal of Spring, addition of dotenv, and the latest contentful_rails and contentful_model gems as published on rubygems.org.
For unknown reasons, the configuration details defined in the initializer are gone by the time the app comes up. It's the same object (same value for ContentfulModel.configuration.object_id) but the values that were previously correct are now nil.
I added an initializer as shown in the README.
$ cat config/initializers/contentful_model.rb
ContentfulModel.configure do |config|
  byebug
  config.access_token = ENV['CONTENTFUL_ACCESS_TOKEN']
  config.preview_access_token = ENV['CONTENTFUL_PREVIEW_ACCESS_TOKEN']
  config.space = ENV['CONTENTFUL_SPACE']
  # config.options = {
  #extra options to send to the Contentful::Client
  # }
end
And I defined one model, Category.
$ cat app/models/category.rb
class Category < ContentfulModel::Base
  self.content_type_id = "[category content type string]"
end
So here's what happens when I fire up the Rails console:
$ rails c
[1, 9] in /home/trevor/code/chef/www-contentful-rails/config/initializers/contentful_model.rb
1: ContentfulModel.configure do |config|
2: config.access_token = ENV['CONTENTFUL_ACCESS_TOKEN']
3: config.preview_access_token = ENV['CONTENTFUL_PREVIEW_ACCESS_TOKEN']
4: config.space = ENV['CONTENTFUL_SPACE']
5: # config.options = {
6: #extra options to send to the Contentful::Client
7: # }
8: byebug
=> 9: end
(byebug) ContentfulModel.configuration
#<ContentfulModel::Configuration:0x00000005bc7be0 @access_token="[my actual token string]", @entry_mapping={}, @preview_access_token="[my actual preview token string]", @space="[my actual space]">
(byebug) continue
/home/trevor/.rvm/gems/ruby-2.2.2@www-contentful-rails/gems/actionpack-4.2.3/lib/action_dispatch/http/mime_type.rb:163: warning: already initialized constant Mime::JSON
/home/trevor/.rvm/gems/ruby-2.2.2@www-contentful-rails/gems/actionpack-4.2.3/lib/action_dispatch/http/mime_type.rb:163: warning: previous definition of JSON was here
Loading development environment (Rails 4.2.3)
2.2.2 :001 > ContentfulModel.configuration
=> #<ContentfulModel::Configuration:0x00000005bc7be0 @access_token=nil, @entry_mapping={"[category content type string]"=>Category}, @preview_access_token=nil, @space=nil>
2.2.2 :002 >
I've spent a bunch of time sifting through the gem source and stepping through the debugger without results. I've posted an issue for the project on GitHub because I haven't been able to identify the source of the problem and I have to assume it's within the gem. Any assistance with how to troubleshoot this further would be very welcome!
The solution was to use the undocumented approach required due to changes a few months ago.
The contentful_rails gem requires contentful_model (and vice versa), and the only configuration documentation for contentful_model is in that project's README, describing the approach in my question. Configuration made in this way was then completely wiped when contentful_rails was initialized, which expected the configuration to be done in its own initializer.
So I have deleted config/initializers/contentful_model.rb and now my config/initializers/contents_rails.rb file looks like:
ContentfulRails.configure do |config|
  config.authenticate_webhooks = true # false here would allow the webhooks to process without basic auth
  config.webhooks_username = ENV['CONTENTFUL_WEBHOOK_USERNAME']
  config.webhooks_password = ENV['CONTENTFUL_WEBHOOK_PASSWORD']
  config.access_token = ENV['CONTENTFUL_ACCESS_TOKEN']
  config.preview_access_token = ENV['CONTENTFUL_PREVIEW_ACCESS_TOKEN']
  config.space = ENV['CONTENTFUL_SPACE']
  config.contentful_options = {
    #extra options to send to the Contentful::Client
  }
end
Of note is that config.options is no longer a thing; it's config.contentful_options.

Mix debugger commands and ruby code evaluation

I'm currently upgrading an old project from an old version of Ruby (1.8.7) and Rails (3.0) to 1.9.3/3.1, as a stepping stone to newer versions.
I'm using the debugger gem for 1.9.3 and ruby-debug for 1.8.7.
When I run the debugger, I can run commands like info variables to get the list and values of all currently scoped variables:
...
@current_phone = nil
@fields = {}
@global = {:source_type=>"pdf"}
@images = []
@index = {}
@lines = []
...
Also I can run arbitrary ruby code - a useful one I've been using is
File.open("/tmp/new_version", "w"){|f|f.write(@fields)}
which is useful for me to quickly compare between the old version and the new version using a file diff program.
Can I link these together so I can write to a file all the output of info variables? It would be sufficient if I could do
tempvar = info variables
or something along those lines, of course, but that gives
*** NameError Exception: undefined local variable or method `variables' for <ClassWhatever>
instance_variables.map { |v| [v, instance_variable_get(v)] }
Not exactly a hash map, but it should be good enough.
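To get from that list of pairs to a file that can be diffed, the pairs can be turned into a Hash and written out with pretty_inspect. The sketch below uses only the standard library. dump_instance_variables and ExampleParser are hypothetical names introduced for illustration. Note that this covers instance variables only, while info variables also lists local variables.

```ruby
require "pp"
require "tmpdir"

# Hypothetical helper: collect an object's instance variables into a Hash
# and write a readable dump of it to a file.
def dump_instance_variables(object, path)
  pairs = object.instance_variables.sort.map do |name|
    [name, object.instance_variable_get(name)]
  end
  snapshot = Hash[pairs]
  File.open(path, "w") { |f| f.write(snapshot.pretty_inspect) }
  snapshot
end

# Small stand-in class so the sketch runs on its own.
class ExampleParser
  def initialize
    @fields = {}
    @lines = []
  end
end

path = File.join(Dir.tmpdir, "new_version_example")
dump_instance_variables(ExampleParser.new, path)
puts File.read(path)
```

Inside a debugger session the call would be something like dump_instance_variables(self, "/tmp/new_version"), run once under each Ruby version, followed by a diff of the two files.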

HOW TO: View Redis Data inside Rails application (using Soulmate)

I am new with ruby on rails.
Currently I am using Redis/Soulmate for an autocomplete feature. I am starting up a new loader and putting in my appointments model like so:
loader = Soulmate::Loader.new("appointments")
puts loader.inspect
I get the output:
#<Soulmate::Loader:0x007fdca25bd840 @type="appointments">
But if I begin adding to the loader like so:
loader.add("term"=>"randomappointment", "id"=>1)
How do I view the output of this command inside my Rails application? I want to see the data that I have just put into the loader (the soulmate hash). I am trying something like this, but nothing is working:
puts soulmate-data:appointments 1 or
puts soulmate-data["appointments"]
NOTE: I can do this in my terminal using
$ redis-cli
hget soulmate-data:appointments 1
which gives the output:
"{\"term\":\"randomappointment\",\"id\":1}"
Any ideas? I'm using Redis 2.8.19 and Rails 4.1.6.
I figured it out. I did this using the Soulmate::Matcher class, like so:
term = "randomappointment"
result = Soulmate::Matcher.new("appointments").matches_for_term(term)
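As the redis-cli output in the question shows, the raw value stored under soulmate-data:appointments is a JSON string. If that string is read through a Redis client (that step is assumed and not shown here), it can be decoded with Ruby's JSON library. decode_soulmate_item is a hypothetical name, and the input is the exact string from the question.

```ruby
require "json"

# Hypothetical helper: turn the stored JSON string back into a Ruby Hash.
def decode_soulmate_item(raw)
  JSON.parse(raw)
end

# The value printed by redis-cli in the question.
raw = "{\"term\":\"randomappointment\",\"id\":1}"

item = decode_soulmate_item(raw)
puts item["term"]
puts item["id"]
```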
