I am trying to learn how to get data via a screen scrape and then save it to a model. So far I can grab the data. I know this because if I run:
puts home_team
I get all the home teams returned.
get_match.rb #grabbing the data
require 'open-uri'
require 'nokogiri'
module MatchGrabber::GetMatch
  FIXTURE_URL = "http://www.bbc.co.uk/sport/football/premier-league/fixtures"
  def get_fixtures
    doc = Nokogiri::HTML(open(FIXTURE_URL))
    home_team = doc.css(".team-home.teams").text
  end
end
Then I want to update my model:
match_fixtures.rb
module MatchFixtures
  class MatchFixtures
    include MatchGrabber::GetMatch
    def perform
      update_fixtures
    end
    private
    def update_fixtures
      Fixture.destroy_all
      fixtures = get_fixtures
    end
    def update_db(matches)
      matches.each do |match|
        fixture = Fixture.new(
          home_team: match.first
        )
        fixture.save
      end
    end
  end
end
So the next step is where I am getting stuck. First of all, do I need to put the home_team results into an array?
Secondly, I am passing matches to my update_db method, but that's not correct. What should I pass here: the home_team results from my update_fixtures method, or the method itself?
To run the task I do:
namespace :grab do
  task :fixtures => :environment do
    MatchFixtures::MatchFixtures.new.perform
  end
end
Nothing is saved, but that is to be expected.
Steep learning curve here and would appreciate a push in the right direction.
Calling css(".team-home.teams").text does not return the matching DOM elements as an array; it returns the text of all matches joined into a single string.
To obtain an array of elements, refactor get_fixtures into something like this:
def get_teams
  doc = Nokogiri::HTML(open(FIXTURE_URL)) # on Ruby 3.0+, use URI.open(FIXTURE_URL)
  doc.css(".team-home.teams").map { |el| el.text.strip }
end
This returns an array containing the text of each element matching your selector, stripped of leading and trailing whitespace (including newlines). You can then loop over the returned array and pass each team as an argument to your model's create method:
get_teams.each { |team| Fixture.create(home_team: team) }
You could just pass the array directly to the update method (this assumes get_fixtures has been changed to return an array of team names):
def update_fixtures
  Fixture.destroy_all
  update_db(get_fixtures)
end
def update_db(matches)
  matches.each { |match| Fixture.create(home_team: match) }
end
Or do away with the method altogether:
def update_fixtures
  Fixture.destroy_all
  get_fixtures.each { |match| Fixture.create(home_team: match) }
end
I have this method in my models/images.rb model. I am starting with testing and having a hard time coming up with tests for it. Would appreciate your help.
def self.tags
  t = "db/data.csv"
  @arr = []
  csvdata = CSV.read(t)
  csvdata.shift
  csvdata.each do |row|
    row.each_with_index do |l, i|
      unless l.nil?
        @arr << l
      end
    end
  end
  @arr
end
First off a word of advice - CSV is probably the worst imaginable data format and is best avoided unless absolutely unavoidable - like if the client insists that manipulating data in MS Excel is a good idea (it is not).
If you have to use CSV, don't use a method name like .tags, which can be confused with a regular ActiveRecord relation.
Testing methods that read from the file system can be quite difficult.
To start with you might want to alter the signature of the method so that you can pass a file path.
def self.tags(file = "db/data.csv")
  # ...
end
That way you can pass a fixture file so that you can test it deterministically.
RSpec.describe Image do
  describe "tags" do
    let(:file) { Rails.root.join('spec', 'support', 'fixtures', 'tags.csv') }
    it 'returns an array' do
      expect(Image.tags(file)).to eq [ { foo: 'bar' }, { foo: 'baz' } ]
    end
  end
end
However, your method is very idiosyncratic:
def self.tags
  t = "db/data.csv"
  @arr = []
self.tags makes it a class method, yet you are declaring @arr as an instance variable.
Additionally, Ruby's Enumerable module provides so many methods for manipulating arrays that using an outer variable in a loop is not needed.
def self.tags(file = "db/data.csv")
  csv_data = CSV.read(file)
  csv_data.shift # drops the header row
  csv_data.flatten.compact # flattens the rows and removes nil cells
end
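A minimal sketch of the file-path idea, runnable outside Rails: because the method takes a path, a test can hand it a throwaway file. The names tags_from_csv and sample_tags are hypothetical, and flattening the rows before compact is an assumption about the intended result (one flat list of non-empty cells, as the loop in the question builds).

```ruby
require 'csv'
require 'tempfile'

# Hypothetical stand-in for Image.tags(file); not the original model method.
def tags_from_csv(path)
  rows = CSV.read(path)
  rows.shift            # drop the header row
  rows.flatten.compact  # one flat array, with nil (empty) cells removed
end

# Writes a small throwaway CSV, reads it back, and returns the parsed tags.
def sample_tags
  Tempfile.create(['tags', '.csv']) do |file|
    file.write("tag_a,tag_b\nruby,rails\nrspec,\n")
    file.flush
    tags_from_csv(file.path)
  end
end

p sample_tags # => ["ruby", "rails", "rspec"]
```

The same pattern carries over to a spec: point the method at a small fixture file and compare the result with the values you put in that file.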
I am coming from a C# background and trying to learn Ruby and Ruby on Rails. I have the following Car class - note the build_xml method I need in order to build XML in that syntax and then pass to a WebService
class Car
  @@array = Array.new
  # this will allow us to get list of all instances of cars created if needed
  def self.all_instances
    @@array
  end
  def initialize(id, model_number, engine_size, no_doors)
    # Instance variables
    @id = id
    @model_number = model_number
    @engine_size = engine_size
    @no_doors = no_doors
    @@array << self
  end
  def build_car_xml
    car = { 'abc:Id'=> @id, 'abc:ModelNo' => @model_number, 'abc:ES' => @engine_size, 'abc:ND' => @no_doors}
    cars = {'abc:Car' => [car] }
  end
end
In another class then I was using this as below:
car1 = Car.new('1', 18, 3.0, 4)
request = car1.build_car_xml
This works as expected and the request is formatted how I need and the webservice returns the results. I now want to expand this however so I can pass in an array of cars and produce the request XML - however I am struggling to get this part working.
So far I have been trying the following (for now I am ok with just the Id changing as it is the only parameter required to be unique):
car_array = []
(1..10).each do |i|
  car_array << Car.new(i.to_s, 18, 3.0, 4)
end
Am I correct in saying that I would need to define a new build_car_xml method on my Car class that can take an array of cars and then build the xml so my request call would be something like:
request = Car.build_car_xml(car_array)
What I am unsure of is 1) whether this is the correct way of doing things in Ruby, and 2) how to construct the method so that it builds the XML in the same format as when I call it on a single object, i.e. with the namespaces added before the actual value.
def build_car_xml(car_array)
  # here is where I am unsure how to construct this method
end
Possible solution ('abc:Car' is a misleading name; it should be Cars if you want it to hold an array):
class Car
  ...
  def self.build_cars_xml(cars)
    { 'abc:Car' => cars.map(&:build_car_xml) }
  end
  def build_car_xml
    { 'abc:Id' => @id, 'abc:ModelNo' => @model_number, 'abc:ES' => @engine_size, 'abc:ND' => @no_doors }
  end
end
cars =
  (1..10).map do |i|
    Car.new(i.to_s, 18, 3.0, 4)
  end
Car.build_cars_xml(cars)
This doesn't quite meet your requirements, because the instance method build_car_xml no longer generates the 'abc:Car' wrapper, but to me the original was inconsistent. Your XML is actually a collection, even when it has just one element, and an instance method should not be responsible for a collection. Car.build_cars_xml([Car.new(...)]) looks more logical to me.
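For reference, here is a self-contained sketch of that split between a per-instance hash and a class-level wrapper. It fills in the elided part of the class with only the constructor from the question; the 'abc:' keys are copied from the question and the web service call itself is not modelled.

```ruby
# Sketch only: mirrors the answer's structure so it can be run on its own.
class Car
  def initialize(id, model_number, engine_size, no_doors)
    @id = id
    @model_number = model_number
    @engine_size = engine_size
    @no_doors = no_doors
  end

  # Hash for a single car, without the collection wrapper.
  def build_car_xml
    { 'abc:Id' => @id, 'abc:ModelNo' => @model_number,
      'abc:ES' => @engine_size, 'abc:ND' => @no_doors }
  end

  # Wraps any number of cars (including just one) under the 'abc:Car' key.
  def self.build_cars_xml(cars)
    { 'abc:Car' => cars.map(&:build_car_xml) }
  end
end

cars = (1..3).map { |i| Car.new(i.to_s, 18, 3.0, 4) }
request = Car.build_cars_xml(cars)
p request['abc:Car'].map { |car| car['abc:Id'] } # => ["1", "2", "3"]
```

A single car goes through the same path, Car.build_cars_xml([car1]), so the request shape is identical whether one car or many are sent.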
I'm trying to mock PriceInspector#get_latest_price below to test OrderForm. Two orders are passed in, so I need to return two different values when mocking PriceInspector#get_latest_price. It all works fine with the Supplier model (ActiveRecord) but I can't run a mock on the PriceInspector class:
# inside the test / example
expect(Supplier).to receive(:find).and_return(supplier_1) # first call, works
expect(PriceInspector).to receive(:get_latest_price).and_return(price_item_1_supplier_1) # returns nil
expect(Supplier).to receive(:find).and_return(supplier_2) # second call, works
expect(PriceInspector).to receive(:get_latest_price).and_return(price_item_2_supplier_1) # returns nil
class OrderForm
  include ActiveModel::Model
  def initialize(purchaser)
    @purchaser = purchaser
  end
  def submit(orders)
    orders.each do |supplier_id, order_items|
      @supplier = Organization.find(supplier_id.to_i)
      @order_item = OrderItem.save(
        price_unit_price: PriceInspector.new(@purchaser).get_latest_price.price_unit_price
      )
      [...]
    end
  end
end
class PriceInspector
  def initialize(purchaser)
    @purchaser = purchaser
  end
  def get_latest_price
    [...]
  end
end
Edit
Here's the updated test code based on Bogieman's answer:
before(:each) do
  expect(Organization).to receive(:find).and_return(supplier_1, supplier_2)
  price_inspector = PriceInspector.new(purchaser, item_1)
  PriceInspector.stub(:new).and_return price_inspector
  expect(price_inspector).to receive(:get_latest_price).and_return(price_item_1_supplier_1)
  expect(price_inspector).to receive(:get_latest_price).and_return(price_item_2_supplier_2)
end
it "saves correct price_unit_price for first OrderItem", :focus do
  order_form.submit(params)
  expect(OrderItem.first.price_unit_price).to be_within(0.01).of(price_item_1_supplier_1.price_unit_price)
end
I think this should fix the instance method problem and allow you to check for the two different returns (provided you pass in the purchaser or a double). Passing both values to a single and_return makes consecutive calls return them in order:
price_inspector = PriceInspector.new(purchaser)
allow(PriceInspector).to receive(:new).and_return(price_inspector)
expect(price_inspector).to receive(:get_latest_price).and_return(price_item_1_supplier_1, price_item_2_supplier_1)
Slowly getting there with what I am trying to achieve. I am grabbing data via a screen scrape and want to save it to my model, which has two columns, home_team and away_team. So far I can grab the data.
FIXTURE_URL = "http://www.bbc.co.uk/sport/football/premier-league/fixtures"
def get_fixtures # Get me all Home and away Teams
  doc = Nokogiri::HTML(open(FIXTURE_URL))
  home_team = doc.css(".team-home.teams").map {|h| h.text.strip }
  away_team = doc.css(".team-away.teams").map {|a| a.text.strip }
  #team_clean = Hash[:home_team => home_team, :away_team => away_team]
  #team_clean = Hash[:team_clean => [Hash[:home_team => home_team, :away_team => away_team]]]
end
I have commented out two ways of getting the data into a hash: one is a flat hash and the other is a hash nested within a hash. I am not sure which one I need (if either).
So if I want to save the data received from my home_team, I run a rake task to do this:
def update_fixtures #rake task method
  Fixture.destroy_all
  get_fixtures.each {|home| Fixture.create(:home_team => home )}
end
What I want to achieve is to save home_team and away_team at the same time. Do I need to access the data within the hash, and if so, how? I am a bit lost here, but this is the first time I am attempting this.
Any help appreciated.
Try this,
FIXTURE_URL = "http://www.bbc.co.uk/sport/football/premier-league/fixtures"
def get_fixtures # Get me all Home and away Teams
  doc = Nokogiri::HTML(open(FIXTURE_URL))
  matches = doc.css('tr.preview')
  matches.each do |match|
    home_team = match.css('.team-home').text.strip
    away_team = match.css('.team-away').text.strip
    Fixture.create!(home_team: home_team, away_team: away_team)
  end
end
This will loop through the matches and create a new Fixture with away and home teams for each match.
Edit:
Added .text.strip
Edit 2:
This should get you the dates too,
FIXTURE_URL = "http://www.bbc.co.uk/sport/football/premier-league/fixtures"
def get_fixtures # Get me all Home and away Teams
  doc = Nokogiri::HTML(open(FIXTURE_URL))
  doc.css('#fixtures-data h2').each do |h2_tag|
    date = Date.parse(h2_tag.text.strip)
    matches = h2_tag.xpath('following-sibling::*[1]').css('tr.preview')
    matches.each do |match|
      home_team = match.css('.team-home').text.strip
      away_team = match.css('.team-away').text.strip
      Fixture.create!(home_team: home_team, away_team: away_team, date: date)
    end
  end
end
It's a bit more complicated than the previous code because it uses XPath to select the HTML element that immediately follows each h2 tag containing the date.
It loops through all the h2 tags inside div#fixtures-data, then grabs the table directly after each h2.
I am using Nokogiri to grab data from a webpage. So far I can save to one column in the model:
def update_fixtures #rake task method
  Fixture.destroy_all
  get_fixtures.each {|match| Fixture.create(home_team: match )}
end
def get_fixtures # Get me all Home Teams
  doc = Nokogiri::HTML(open(FIXTURE_URL))
  home_team = doc.css(".team-home.teams").map {|h| h.text.strip }
end
What I am wondering is the most efficient way to save to 2, 3 or 4 columns at the same time.
So, as an example, I have another column called away_team, and I would grab that data in the same way as the home team:
away_team = doc.css(".team-away.teams").map {|a| a.text.strip }
Is it advisable to put this within the get_fixtures method, and then add it to update_fixtures with something like:
def update_fixtures #rake task method
  Fixture.destroy_all
  get_fixtures.each {|match| Fixture.create(home_team: match, away_team: match )}
end
After trying this, the same data gets posted to the home and away columns. After reading it back I can see why (I think it's because match only holds the home_team data?). How can I pass the attributes of the away team along with the home team?
This is all very new, so any help provided is appreciated.
This isn't the right approach: home_team and away_team are both assigned the same match value, so you get the same data in both columns.
Do the following:
UPDATE:
Your model:
attr_accessible :home_team, :away_team
def update_fixtures #rake task method
  Fixture.destroy_all
  doc = Nokogiri::HTML(open(FIXTURE_URL))
  home_teams = doc.css(".team-home.teams").map { |h| h.text.strip }
  away_teams = doc.css(".team-away.teams").map { |a| a.text.strip }
  # pair each home team with the away team at the same position
  home_teams.zip(away_teams).each do |home, away|
    Fixture.create(home_team: home, away_team: away)
  end
end
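To store one row per match, the two arrays have to be paired position by position, which Array#zip does. A minimal sketch, with a hypothetical FixtureRow struct standing in for the Fixture model so it runs without Rails; build_fixtures and the sample team names are also made up for illustration.

```ruby
# FixtureRow is a hypothetical stand-in for the Fixture model.
FixtureRow = Struct.new(:home_team, :away_team)

# Pairs the arrays by position: first home team with first away team, and so on.
def build_fixtures(home_teams, away_teams)
  home_teams.zip(away_teams).map { |home, away| FixtureRow.new(home, away) }
end

fixtures = build_fixtures(['Arsenal', 'Chelsea'], ['Aston Villa', 'Hull City'])
fixtures.each { |f| puts "#{f.home_team} v #{f.away_team}" }
```

Note that zip pads with nil when the second array is shorter, so this relies on both selectors matching the same number of elements in the same order. Scraping row by row avoids that assumption.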