Nokogiri Scraping In Rails - ruby-on-rails

So I have this code in my index action, would love to move it to a model, just a little confused on how to do it.
Original Code
def index
  urls = %w[http://cltampa.com/blogs/potlikker http://cltampa.com/blogs/artbreaker http://cltampa.com/blogs/politicalanimals http://cltampa.com/blogs/earbuds http://cltampa.com/blogs/dailyloaf http://cltampa.com/blogs/bedpost]
  @final_images = []
  @final_urls = []
  urls.each do |url|
    blog = Nokogiri::HTML(open(url))
    images = blog.xpath('//*[@class="postBody"]/div[1]//img/@src')
    images.each do |image|
      @final_images << image
    end
    story_path = blog.xpath('//*[@class="postTitle"]/a/@href')
    story_path.each do |path|
      @final_urls << path
    end
  end
end
I tested this code in my model and it works perfectly for one url, just not sure how to integrate all of the urls like the original code.
New Code
Model
class Photocloud < ActiveRecord::Base
  attr_reader :url, :data
  def initialize(url)
    @url = url
  end
  def data
    @data ||= Nokogiri::HTML(open(url))
  end
  def get_elements(path)
    data.xpath(path)
  end
end
Controller
def index
  @scraper = Photocloud.new('http://cltampa.com/blogs/artbreaker')
  @photos = @scraper.get_elements('//*[@class="postBody"]/div[1]//img/@src')
  @story_urls = @scraper.get_elements('//*[@class="postBody"]/div[1]//img/@src')
end
My main questions are how would I initialize multiple urls and loop through them like my original code. I have tried different things but feel like I have hit a wall. I need to save them to the database, but would like to get this working first. Any help is greatly appreciated.
Updated Controller - WIP
def index
  start_urls = %w[http://cltampa.com/blogs/potlikker
                  http://cltampa.com/blogs/artbreaker
                  http://cltampa.com/blogs/politicalanimals
                  http://cltampa.com/blogs/earbuds
                  http://cltampa.com/blogs/dailyloaf
                  http://cltampa.com/blogs/bedpost]
  @scraper = Photocloud.new(start_urls)
  @images =
  @paths =
end
Need some help with this part...

It seems that you don't persist scraped images and paths to the database so Photocloud doesn't need to inherit from ActiveRecord::Base - it can be just a plain old ruby object (PORO):
class Photocloud
  attr_reader :start_urls
  attr_accessor :images, :paths
  def initialize(start_urls)
    @start_urls = start_urls
    @images = []
    @paths = []
  end
  def scrape
    start_urls.each do |start_url|
      # Needs require 'open-uri'; on Ruby 3.0+ call URI.open(start_url) instead of open
      blog = Nokogiri::HTML(open(start_url))
      scrape_images(blog)
      scrape_paths(blog)
    end
  end
  private
  def scrape_images(blog)
    # A distinct local name keeps the images accessor from being shadowed
    found_images = blog.xpath('//*[@class="postBody"]/div[1]//img/@src')
    found_images.each do |image|
      images << image
    end
  end
  def scrape_paths(blog)
    story_path = blog.xpath('//*[@class="postTitle"]/a/@href')
    story_path.each do |path|
      paths << path
    end
  end
end
In controller:
scraper = Photocloud.new(start_urls)
scraper.scrape
@images = scraper.images
@paths = scraper.paths
This is only one of the ways you could structure the code, of course.
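One detail worth checking in a class shaped like this: inside a method, a local variable with the same name as a reader method shadows that method, so appending to it never reaches the instance's array. Below is a minimal plain-Ruby sketch of the difference. Collector and its two methods are hypothetical names, and plain strings stand in for Nokogiri results so it runs without any gems.

```ruby
# Hypothetical stand-in for the scraper; no Nokogiri or network needed.
class Collector
  attr_reader :images

  def initialize
    @images = []
  end

  # Buggy variant: assigning to `images` creates a local variable that
  # hides the reader method, so @images is never touched.
  def collect_shadowed(found)
    images = []
    found.each { |image| images << image }
    images
  end

  # Working variant: no local named `images` exists here, so `images`
  # calls the reader and appends to @images.
  def collect(found)
    found.each { |image| images << image }
    images
  end
end

collector = Collector.new
collector.collect_shadowed(%w[a.png b.png]) # leaves collector.images empty
collector.collect(%w[a.png b.png])          # fills collector.images
```

Giving the local a different name, or writing to the instance variable directly, avoids the problem.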

Related

Unable to convert unpermitted parameters to hash

I've followed multiple stack overflow posts regarding this same issue, but I don't think I have the rails proficiency to know how to apply these fixes to my code.
Been following an old railscast show: http://railscasts.com/episodes/217-multistep-forms
And I'm aware that the issue is something to do with permitting objects/hashes but I just don't understand it all.
The error I'm getting is from this line of code in my controller:
session[:zerch_params].deep_merge!(params[:zerch]) if params[:zerch]
Controller
class ZerchesController < InheritedResources::Base
  def index
    @zerches = Zerch.all
  end
  def show
    @zerch = Zerch.find(params[:id])
  end
  def new
    session[:zerch_params] ||= {}
    @zerch = Zerch.new(session[:zerch_params])
    @zerch.current_step = session[:zerch_step]
  end
  def create
    session[:zerch_params].deep_merge!(params[:zerch]) if params[:zerch]
    @zerch = Zerch.new(session[:zerch_params])
    @zerch.current_step = session[:zerch_step]
    if @zerch.valid?
      if params[:back_button]
        @zerch.previous_step
      elsif @zerch.last_step?
        @zerch.save if @zerch.all_valid?
      else
        @zerch.next_step
      end
      session[:zerch_step] = @zerch.current_step
    end
    if @zerch.new_record?
      render "new"
    else
      session[:zerch_step] = session[:zerch_params] = nil
      flash[:notice] = "zerch complete!"
      redirect_to @zerch
    end
  end
  private
  def zerch_params
    params.require(:zerch).permit(:location, :category, :price)
  end
end
So I was able to solve this with changes in the model and in the controller:
I still had attr_accessor in the model while also having the params method under private in the controller. I removed it, and in the controller I changed the line of code from this:
session[:zerch_params].deep_merge!(params[:zerch]) if params[:zerch]
to this:
session[:zerch_params].deep_merge!(params.permit![:zerch]) if params[:zerch]
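Note that permit! allows every submitted parameter through. The underlying idea of strong parameters is that only a plain hash of explicitly allowed keys gets merged into the stored data. The snippet below is a plain-Ruby sketch of that idea, not Rails code: ALLOWED_KEYS and merge_step are hypothetical names, and ordinary hashes stand in for params and the session. In the controller above, the likely equivalent would be merging the result of the existing zerch_params method converted with to_h, though that is an assumption about the setup rather than something tested here.

```ruby
# Keys the form is allowed to set (mirrors the permit list in zerch_params).
ALLOWED_KEYS = %w[location category price].freeze

# Merge only whitelisted keys from the submitted hash into the stored hash.
def merge_step(stored, submitted)
  permitted = submitted.slice(*ALLOWED_KEYS) # drops anything not whitelisted
  stored.merge(permitted)
end

stored    = { "location" => "Tampa" }
submitted = { "category" => "food", "admin" => "true" }
result    = merge_step(stored, submitted)
# result keeps "location" and "category"; the "admin" key is discarded
```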

One method for two models. How to pass name of model as variable to controller?

I have two methods in two different controllers (Posts & Boards). They are almost the same; the only difference is the name of the model, the instance variable, and the association. To DRY this up I am thinking of writing the method in a module, but how do I share it between Post and Board?
def init_post_comments
  @user = current_user
  a = @user.posts.pluck(:id) # not very nice...
  b = params[:post_ids] ||= []
  b = b.map(&:to_i)
  follow = b - a
  unfollow = a - b
  follow.each do |id| # checkbox just checked
    @post = Post.find_by_id(id)
    if @post.users.empty?
      @post.update_attribute(:new_follow, true)
    end
    @user.posts << @post
  end
  unfollow.each do |id| # if checkbox was unchecked
    @post = Post.find_by_id(id)
    remove_post_from_user(@post) # here we destroy association
  end
  if follow.size > 0
    get_post_comments_data
  end
  redirect_to :back
end
UPDATE: OK, if I move the methods to a model concern, how should I work with the associations? What can I replace posts and boards with in @user.posts.pluck(:id) and @user.boards.pluck(:id) so that it works with both of them?
So, I did it! I don't know if it's the right way, but I DRYed up this code.
Two controllers:
posts_controller.rb
def init_comments
  if Post.comments_manipulator(current_user, params[:post_ids] ||= []) > 0
    @posts = Post.new_post_to_follow
    code = []
    @posts.each do |post|
      group = post.group
      code = code_constructor('API.call')
    end
    Post.comments_init(get_request(code), @posts)
  end
  redirect_to :back
end
boards_controller.rb
def init_comments
  if Board.comments_manipulator(current_user, params[:board_ids] ||= []) > 0
    @boards = Board.new_board_to_follow
    code = []
    @boards.each do |board| # preparing the request
      group = board.group
      code = code_constructor('API.call')
    end
    Board.comments_init(get_request(code), @boards)
  end
  redirect_to :back
end
As you can see, they are exactly the same.
In models board.rb and post.rb - include CommentsInitializer
And in models\concerns
module CommentsInitializer
  extend ActiveSupport::Concern
  module ClassMethods
    def comments_manipulator(user, ids)
      relationship = self.name.downcase + 's'
      a = user.send(relationship).pluck(:id)
      b = ids.map(&:to_i)
      follow = b - a
      unfollow = a - b
      follow.each do |id| # start to follow newly checked obj
        @obj = self.find_by_id(id)
        if @obj.users.empty?
          @obj.update_attribute(:new_follow, true)
        end
        user.send(relationship) << @obj
      end
      unfollow.each do |id| # remove from following
        @obj = self.find_by_id(id)
        remove_assoc_from_user(@obj, user) # destroy relation with current user
      end
      follow.size
    end
    def comments_init(comments, objs)
      i = 0
      objs.each do |obj| # updating comments data
        if comments[i]['count'] == 0
          obj.update(new_follow: false)
        else
          obj.update(new_follow: false, last_comment_id: comments[i]['items'][0]['id'])
        end
        i += 1
      end
    end
    def remove_assoc_from_user(obj, user)
      user = user.id
      if user
        obj.users.delete(user)
      end
    end
  end
end
My code works. If you know how to make it better please answer!
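The concern rests on two small ideas: deriving the association name from the class name, and using array difference to work out what to follow and unfollow. Here is a plain-Ruby sketch of just those two ideas. FakeUser, FollowDiff, and follow_changes are hypothetical names made up for illustration, and plain arrays of ids stand in for ActiveRecord associations.

```ruby
# Stand-in for a user holding the ids it already follows, per association.
FakeUser = Struct.new(:posts, :boards)

module FollowDiff
  # Returns [ids_to_follow, ids_to_unfollow] for the extending class.
  def follow_changes(user, submitted_ids)
    relationship = "#{name.downcase}s"         # "Post" becomes "posts"
    current = user.public_send(relationship)   # ids currently followed
    wanted  = submitted_ids.map(&:to_i)        # checkbox values arrive as strings
    [wanted - current, current - wanted]
  end
end

class Post
  extend FollowDiff
end

class Board
  extend FollowDiff
end

user = FakeUser.new([1, 2, 3], [10])
follow, unfollow = Post.follow_changes(user, %w[2 3 4])
board_follow, board_unfollow = Board.follow_changes(user, [])
```

public_send is used instead of send so that only public methods can be reached through the derived name. In a Rails app, name.underscore.pluralize would probably cope better with multi-word or irregular model names than appending "s".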

better way to build association in controller

I need a link in a show method of a parent class for creating associated models, so I have the code:
link_to "incomplete", new_polymorphic_path(part_c.underscore, :survey_id => survey.id)
in a helper.
This links to a part, which has new code like this:
# GET /source_control_parts/new
def new
  get_collections
  if params[:survey_id]
    @s = Survey.find(params[:survey_id])
    if @s.blank?
      @source_control_part = SourceControlPart.new
    else
      @source_control_part = @s.create_source_control_part
    end
  else
    @source_control_part = SourceControlPart.new
  end
end
I know this is not very DRY. How can I simplify this? Is there a RAILS way?
How about this:
def new
  get_collections
  get_source_control_part
end
private
def get_source_control_part
  survey = params[:survey_id].blank? ? nil : Survey.find(params[:survey_id])
  @source_control_part = survey ? survey.create_source_control_part : SourceControlPart.new
end

How should I transform this concern into a service object?

I have a concern that gives the back-end user the ability to sort elements. I use it for a few different elements. The Rails community seems to be pretty vocal against concerns and callbacks, so I'd like a few pointers on how to better model the following code:
require 'active_support/concern'
module Rankable
  extend ActiveSupport::Concern
  included do
    validates :row_order, :presence => true
    scope :next_rank, lambda { |rank| where('row_order > ?', rank).order("row_order asc").limit(1) }
    scope :previous_rank, lambda { |rank| where('row_order < ?', rank).order("row_order desc").limit(1) }
    scope :bigger_rank, order("row_order desc").limit('1')
    # The callback name must match the private method defined below
    before_validation :assign_default_rank
  end
  def invert(target)
    a = self.row_order
    b = target.row_order
    self.row_order = target.row_order
    target.row_order = a
    if self.save
      if target.save
        true
      else
        self.row_order = a
        self.save
        false
      end
    else
      false
    end
  end
  def increase_rank
    return false unless self.next_rank.first && self.invert(self.next_rank.first)
  end
  def decrease_rank
    return false unless self.previous_rank.first && self.invert(self.previous_rank.first)
  end
  private
  def assign_default_rank
    if !self.row_order
      if self.class.bigger_rank.first
        self.row_order = self.class.bigger_rank.first.row_order + 1
      else
        self.row_order = 0
      end
    end
  end
end
I think a Concern is a good choice for what you are trying to accomplish (particularly with validations and scopes, because ActiveRecord does those two very well). However, if you did want to move things out of the Concern, apart from validations and scopes, here is a possibility. Just looking at the code, it seems like you have a concept of rank which is represented by an integer but can become its own object:
class Rank
  def initialize(rankable)
    @rankable = rankable
    @klass = rankable.class
  end
  def number
    @rankable.row_order
  end
  def increase
    next_rank ? RankableInversionService.call(@rankable, next_rank) : false
  end
  def decrease
    previous_rank ? RankableInversionService.call(@rankable, previous_rank) : false
  end
  private
  def next_rank
    # The next_rank scope takes the current rank as its argument
    @next_rank ||= @klass.next_rank(number).first
  end
  def previous_rank
    # The previous_rank scope takes the current rank as its argument
    @previous_rank ||= @klass.previous_rank(number).first
  end
end
To extract out the #invert method we could create a RankableInversionService (referenced above):
class RankableInversionService
  def self.call(rankable, other)
    new(rankable, other).call
  end
  def initialize(rankable, other)
    @rankable = rankable
    @other = other
    @original_rankable_rank = rankable.rank
    @original_other_rank = other.rank
  end
  def call
    # Swap using the saved originals; reading @rankable.rank after
    # reassigning it would give both records the same rank
    @rankable.rank = @original_other_rank
    @other.rank = @original_rankable_rank
    if @rankable.save && @other.save
      true
    else
      @rankable.rank = @original_rankable_rank
      @other.rank = @original_other_rank
      @rankable.save
      @other.save
      false
    end
  end
end
To extract out the callback you could have a RankableUpdateService which will assign the default rank prior to saving the object:
class RankableUpdateService
  def self.call(rankable)
    new(rankable).call
  end
  def initialize(rankable)
    @rankable = rankable
    @klass = rankable.class
  end
  def call
    @rankable.rank = bigger_rank unless @rankable.ranked?
    @rankable.save
  end
  private
  def bigger_rank
    @bigger_rank ||= @klass.bigger_rank.first.try(:rank)
  end
end
Now your concern becomes:
module Rankable
  extend ActiveSupport::Concern
  included do
    # validations
    # scopes
  end
  def rank
    @rank ||= Rank.new(self)
  end
  def rank=(rank)
    self.row_order = rank.number
    @rank = rank
  end
  def ranked?
    rank.number.present?
  end
end
I'm sure there are issues with this code if you use it as is, but you get the concept. Overall I think the only thing that might be good to do here is extracting out a Rank object, other than that it might be too much complexity that the concern encapsulates pretty nicely.
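When swapping two values it is easy to overwrite one before it has been read, so capturing both originals first is the safer pattern. The following is a minimal plain-Ruby sketch of swap-with-rollback. Item and swap_rank are hypothetical names; Item is a Struct whose save simply returns a preset true or false, standing in for an ActiveRecord model.

```ruby
# Hypothetical stand-in for a rankable record; `saveable` controls
# whether save reports success.
Item = Struct.new(:row_order, :saveable) do
  def save
    saveable
  end
end

# Swaps row_order between two items and restores both values
# in memory if either save fails.
def swap_rank(a, b)
  original_a = a.row_order
  original_b = b.row_order
  a.row_order = original_b
  b.row_order = original_a
  return true if a.save && b.save

  a.row_order = original_a
  b.row_order = original_b
  false
end

first  = Item.new(1, true)
second = Item.new(2, true)
swapped = swap_rank(first, second)

stuck  = Item.new(5, false) # save always fails for this one
other  = Item.new(6, true)
failed = swap_rank(other, stuck)
```

With real records, wrapping both saves in a database transaction would likely be a more robust way to get the rollback than restoring values by hand.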

How to parse variable in wunderground api url with HTTParty?

I use the Wunderground API to show the weather forecast on my city pages.
city_controller.rb
def show
  @region = Region.find(params[:region_id])
  @city = City.find(params[:id])
  @weather_lookup = WeatherLookup.new
end
weather_lookup.rb
class WeatherLookup
  attr_accessor :temperature, :icon, :condition
  def fetch_weather
    HTTParty.get("http://api.wunderground.com/api/a8135a01b8230bfb/hourly10day/lang:NL/q/IT/#{@city.name}.xml")
  end
  def initialize
    weather_hash = fetch_weather
  end
  def assign_values(weather_hash)
    hourly_forecast_response = weather_hash.parsed_response['response']['hourly_forecast']['forecast'].first
    self.temperature = hourly_forecast_response['temp']['metric']
    self.condition = hourly_forecast_response['condition']
    self.icon = hourly_forecast_response['icon_url']
  end
  def initialize
    weather_hash = fetch_weather
    assign_values(weather_hash)
  end
end
show.html.haml(city)
= @weather_lookup.temperature
= @weather_lookup.condition.downcase
= image_tag @weather_lookup.icon
To fetch the correct weather forecast I thought I could place the @city variable in the HTTParty.get URL as I did in the example, but I get the error message undefined method `name'
What am I doing wrong here?
If you need the city in WeatherLookup, you are going to need to pass it to the initializer. Instance variables set in the controller are only available to the controller and its views; inside a WeatherLookup instance, @city is a different, unset variable and evaluates to nil.
# in the controller
@weather_lookup = WeatherLookup.new(@city)
# in WeatherLookup
attr_accessor :city # optional
def initialize(city)
  @city = city
  weather_hash = fetch_weather
  assign_values(weather_hash)
end
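To make the data flow explicit, here is a plain-Ruby sketch in which the city is handed to the object and the URL is built from it. City is a hypothetical Struct standing in for the model, WeatherUrl is a made-up helper class, API_KEY is a placeholder, and the HTTP call is left out so the snippet runs without HTTParty. URL-encoding the city name is an addition not in the original code, assumed here because city names can contain spaces.

```ruby
require "erb"

# Hypothetical stand-in for the City model.
City = Struct.new(:name)

class WeatherUrl
  API_KEY = "YOUR_API_KEY" # placeholder, not a real key

  # The city is passed in explicitly; the controller's @city is not
  # visible from inside this object.
  def initialize(city)
    @city = city
  end

  def to_s
    escaped = ERB::Util.url_encode(@city.name) # e.g. spaces become %20
    "http://api.wunderground.com/api/#{API_KEY}/hourly10day/lang:NL/q/IT/#{escaped}.xml"
  end
end

url = WeatherUrl.new(City.new("San Remo")).to_s
```

The resulting string could then be passed to HTTParty.get in fetch_weather.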
