Best way to write a flexible importer module - ruby-on-rails

A user can import their data from other websites. All they need to do is type in their username on the foreign website, and we grab all of their pictures and save them into their own gallery. Some of the pictures need to be transformed with RMagick (rotating, watermarking); which transformations apply depends on the importer, i.e. on which website the user chooses to import from.
We are discussing the most elegant and flexible way to do this. We are using CarrierWave, but we will switch to Paperclip if it fits us better.
Importer Structure
The current structure looks roughly like this (it is pseudocode):
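module Importer
class Website1
def grab_pictures(foreign_username)
[] # site-specific scraping goes here; returns a list of pictures
end
end
class Website2
def grab_pictures(foreign_username)
[] # site-specific scraping goes here; returns a list of pictures
end
end
end
class ImporterJob
def perform(user, type, foreign_username)
# type is the importer's class name, e.g. "Website1"
pictures = Importer.const_get(type).new.grab_pictures(foreign_username)
pictures.each do |picture|
user.pictures.create picture
end
end
end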
We are struggling to decide what the importer should return.
Solution1:
The importer returns an array of URL strings: ["http://...", "http://...", "http://..."].
We can easily loop over that array and tell CarrierWave/Paperclip to download the remote images. After that, we run a processor to transform the pictures if needed.
def get_picture_urls username
pictures = []
page = get_html(username)
page.scan(/\/p\/\d{4}-\d{2}\/#{username}\/[\w\d]{32}-thumb.jpg/).each do |path|
pictures << path
end
pictures.uniq.collect{|x| "http://www.somewebsite.com/#{x.gsub(/medium|thumb/, "big")}"}
end
This returns an array ["url_to_image", "url_to_image", "url_to_image"].
Then, in the Picture model's after_create callback, we call something to remove the watermark from that image.
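To make Solution 1 concrete, here is a minimal sketch of the URL extraction run against a canned page instead of a live request. SAMPLE_HTML, picture_urls and the host keyword are hypothetical names for illustration; the path pattern is the one from the question, with the username escaped and the dot in the file extension made literal.

```ruby
# Hypothetical sample markup standing in for the page returned by get_html.
SAMPLE_HTML = <<~HTML
  <img src="/p/2013-04/alice/0123456789abcdef0123456789abcdef-thumb.jpg">
  <img src="/p/2013-04/alice/0123456789abcdef0123456789abcdef-thumb.jpg">
  <img src="/p/2013-05/alice/fedcba9876543210fedcba9876543210-thumb.jpg">
HTML

def picture_urls(html, username, host: "http://www.somewebsite.com")
  # Regexp.escape guards against usernames containing regex metacharacters.
  pattern = %r{/p/\d{4}-\d{2}/#{Regexp.escape(username)}/[a-z0-9]{32}-thumb\.jpg}
  # Deduplicate the thumbnail paths, then point each one at the big variant.
  html.scan(pattern).uniq.map do |path|
    "#{host}#{path.sub('-thumb.jpg', '-big.jpg')}"
  end
end

urls = picture_urls(SAMPLE_HTML, "alice")
```

The importer stays free of any download or image-processing concerns here; it only turns a page into a list of strings.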
Solution2:
grab_pictures downloads each picture to a tempfile and transforms it. It returns an array of tempfiles [tempfile, tempfile, tempfile].
The code for that is:
def read_pictures username
pictures = []
page = get_html(username)
page.scan(/\/p\/\d{4}-\d{2}\/#{username}\/[a-z0-9]{32}-thumb.jpg/).each do |path|
pictures << path
end
pictures.uniq.map { |pic_url| remove_logo(pic_url) }
end
def remove_logo pic_url
big = Magick::Image.from_blob(@agent.get(pic_url.gsub(/medium\.jpg|thumb\.jpg/, 'big.jpg')).body).first
# ... do some transformation and watermarking on big, producing result
file = Tempfile.new(['tempfile', '.jpg'])
result.write(file.path)
file
end
This returns an array of [Tempfile, Tempfile, Tempfile].
Summary
The result is the same for the user, but internally these are two different ways of handling the data.
We want to keep logic where it belongs and stay as generic as possible.
Can you help us choose the right approach? In the long term we want to have around 15 different importers.

I've had a similar situation to this recently - I recommend an array of strings for several reasons:
Familiarity: How often are you working with tempfiles? What about the other developers on your team? How easy is it to manipulate strings vs manipulating tempfiles?
Flexibility: Now you want to just process the picture, but maybe in the future you'll need to keep track of the picture id for each picture from the external site. That's trivial with an array of strings. With an array of tempfiles, it's more difficult (just how much depends, but the fact is it will be more difficult). That of course goes for other as-yet-unknown objectives as well.
Speed: It's faster and uses less disk space to process an array of strings than a group of files. That's perhaps a small issue, but if you get flooded with a lot of photos at the same time, it could be a consideration depending on your environment.
Ultimately, the best thing I can say is start with strings, make a few importers, and then see how it looks and feels. Pretend you're a project manager or a client - start making strange, potentially unreasonable demands of the data you've collected. How easy will it be for you to meet those demands with your current implementation? Would it be easier if you were using tempfiles?
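As a rough illustration of the flexibility point, this sketch starts from plain URL strings and later derives a remote picture id from each one. RemotePicture and the idea that the id is the file name without its suffix are assumptions made for the example, not something the foreign site guarantees.

```ruby
# RemotePicture and its fields are hypothetical names used for illustration.
RemotePicture = Struct.new(:url, :remote_id)

# Start simple: the importer hands back plain URL strings.
urls = [
  "http://www.somewebsite.com/p/2013-04/alice/0123456789abcdef0123456789abcdef-big.jpg"
]

# A later requirement (tracking the remote id) only needs a small mapping step.
pictures = urls.map do |url|
  remote_id = File.basename(url, ".jpg").sub(/-big\z/, "")
  RemotePicture.new(url, remote_id)
end
```

Doing the same with an array of tempfiles would mean carrying the original URL alongside each file from the start.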

I am doing this for a similar project, where I have to browse different websites and collect information from them. On each of those websites I have to reach the same goal by performing roughly the same actions, and they are of course all structured differently.
The solution is inspired by the basic principles of OOP:
Main class: handles the high-level operations, database operations, image operations, and error management.
class MainClass
def import
# Main method: prepare the download and loop through each image
log_in
go_to_images_page
images = get_list_of_images
images.each do |url|
begin
image_record = download_image url
transform_image image_record
rescue
manage_error
end
end
display_logs
send_emails
end
def download_image(url)
# Once the specific class has returned the image URLs, this common method
# is responsible for downloading each one and creating a database record
record = Image.new picture: url
record.save!
record
end
def transform_image(record)
# Transformation is common so this method sits in the main class
record.watermark!
end
# ... the same for all common methods (manage_error, display_logs, ...)
end
Specific classes (one per targeted website): handle low-level operations and return data to the main class. The only interaction this class should have is with the website, meaning no database access and as little error management as possible (don't get stuck by your design ;))
Note: In my design I simply inherit from the MainClass, but you can use module inclusion if you prefer.
class Target1Site < MainClass
def log_in
# Perform the site-specific actions to log the user in
visit '/log_in'
fill_in :user_name, with: ENV['user_name']
# ...
end
def go_to_images_page
# Go to specific url
visit '/account/gallery'
end
def get_list_of_images
# Use specific css paths
images = all :css, 'div#image-listing img'
images.collect{|i| i['src']}
end
# ...
end
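The split above is essentially the template method pattern. Here is a stripped-down sketch that runs without a browser driver or a database; BaseImporter, FakeSiteImporter, image_urls and store are hypothetical stand-ins for the main class, a site-specific class, and the shared download step.

```ruby
# All class and method names here are hypothetical stand-ins.
class BaseImporter
  attr_reader :imported, :errors

  def initialize
    @imported = []
    @errors = []
  end

  # Template method: the sequence is fixed, the site-specific steps are not.
  def import
    image_urls.each do |url|
      begin
        @imported << store(url)
      rescue StandardError => e
        # Error handling lives in the base class, not in the site classes.
        @errors << "#{url}: #{e.message}"
      end
    end
    self
  end

  # Subclasses must supply the list of image URLs for their site.
  def image_urls
    raise NotImplementedError, "#{self.class} must implement image_urls"
  end

  private

  # Stand-in for the shared download-and-persist step.
  def store(url)
    raise ArgumentError, "not an image URL" unless url.end_with?(".jpg")
    { source: url }
  end
end

class FakeSiteImporter < BaseImporter
  def image_urls
    ["http://example.com/a.jpg", "http://example.com/broken"]
  end
end

result = FakeSiteImporter.new.import
```

One failing image is recorded in errors and the loop carries on, which is the behaviour the begin/rescue in the main class is after.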

I solved a similar problem. I had to import different resource types from an xls file, using:
The importer class (ResourcesGroupsImporter).
A base mapper class (ResourceMapper). It acts as the interface for specific mappers. It has the methods common to all resources and raises NotImplementedError to push you to implement the remaining methods when you add a new resource type.
One mapper per resource type (DetentionsPollMapper, FrontCycleMapper). Each one implements the logic specific to one resource.
Implementation example:
The importer...
class ResourcesGroupsImporter
attr_reader :group
attr_reader :mappers
def initialize(_source, _resources_group)
@group = _resources_group
@source = _source
@xls = Roo::Spreadsheet.open(@source.path, extension: :xlsx)
@mappers = Resource::RESOURCEABLE_CLASSES.map { |klass| resource_mapper(klass) }
end
def import
ActiveRecord::Base.transaction do
self.mappers.each { |mapper| create_resource(mapper) }
relate_source_with_group unless self.has_errors?
raise ActiveRecord::Rollback if self.has_errors?
end
end
def has_errors?
!self.mappers.select { |mapper| mapper.has_errors? }.empty?
end
private
def resource_mapper(_class)
"#{_class}Mapper".constantize.new(@xls, @group)
end
def create_resource(_mapper)
return unless _mapper.resource
_mapper.load_resource_attributes
_mapper.resource.complete
_mapper.resource.force_validation = true
if _mapper.resource.save
create_resource_items(_mapper)
else
_mapper.load_general_errors
end
end
def create_resource_items(_mapper)
_mapper.set_items_sheet
columns = _mapper.get_items_columns
@xls.each_with_index(columns) do |data, index|
next if data == columns
break if data.values.compact.size.zero?
item = _mapper.build_resource_item(data)
_mapper.add_detail_errors(index, item.errors.messages) unless item.save
end
end
def relate_source_with_group
@group.reload
@group.source = @source
@group.save!
end
end
The interface...
class ResourceMapper
attr_reader :general_errors
attr_reader :detailed_errors
attr_reader :resource
def initialize(_xls, _resource_group)
@xls = _xls
@resource = _resource_group.resourceable_by_class_type(resource_class)
end
def resource_class
raise_implementation_error
end
def items_sheet_number
raise_implementation_error
end
def load_resource_attributes
raise_implementation_error
end
def get_items_columns
raise_implementation_error
end
def build_resource_item(_xls_item_data)
resource_items.build(_xls_item_data)
end
def raise_implementation_error
raise NotImplementedError.new("#{caller[0]} method not implemented on inherited class")
end
def has_errors?
!self.general_errors.nil? || !self.detailed_errors.nil?
end
def resource_items
self.resource.items
end
def human_resource_name
resource_class.model_name.human
end
def human_resource_attr(_attr)
resource_class.human_attribute_name(_attr)
end
def human_resource_item_attr(_attr)
"#{resource_class}Item".constantize.human_attribute_name(_attr)
end
def load_general_errors
@general_errors = self.resource.errors.messages
end
def add_detail_errors(_xls_row_idx, _error)
@detailed_errors ||= []
@detailed_errors << [ _xls_row_idx+1, _error ]
end
def set_items_sheet
@xls.default_sheet = items_sheet
end
def general_sheet
sheet(0)
end
def items_sheet
sheet(self.items_sheet_number)
end
def sheet(_idx)
@xls.sheets[_idx]
end
def general_cell(_col, _row)
@xls.cell(_col, _row, general_sheet)
end
end
Specific mapper types...
class DetentionsPollMapper < ResourceMapper
def items_sheet_number
6
end
def resource_class
DetentionsPoll
end
def load_resource_attributes
self.resource.crew = general_cell("N", 3)
self.resource.supervisor = general_cell("N", 4)
end
def get_items_columns
{
issue: "Problema identificado",
creation_date: "Fecha",
workers_count: "N° Trabajadores esperando",
detention_hours_string: "HH Detención",
lost_hours: "HH perdidas",
observations: "Observación"
}
end
def build_resource_item(_xls_item_data)
activity = self.resource.activity_by_name(_xls_item_data[:issue])
data = {
creation_date: _xls_item_data[:creation_date],
workers_count: _xls_item_data[:workers_count],
detention_hours_string: _xls_item_data[:detention_hours_string],
lost_hours: _xls_item_data[:lost_hours],
observations: _xls_item_data[:observations],
activity_id: !!activity ? activity.id : nil
}
resource_items.build(data)
end
end
class FrontCycleMapper < ResourceMapper
def items_sheet_number
8
end
def resource_class
FrontCycle
end
def load_resource_attributes
self.resource.front = general_cell("S", 3)
end
def get_items_columns
{
task: "Tarea",
start_time_string: "Hora",
task_type: "Tipo de Tarea",
description: "Descripción"
}
end
def build_resource_item(_xls_item_data)
activity = self.resource.activity_by_name_and_category(
_xls_item_data[:task], _xls_item_data[:task_type])
data = {
description: _xls_item_data[:description],
start_time_string: _xls_item_data[:start_time_string],
activity_id: !!activity ? activity.id : nil
}
resource_items.build(data)
end
end

A helper should provide whatever access to the pictures you prefer.
However, storing bare strings such as "http://...", "http://...", "http://..." is a security weakness.
I would prefer a hash like this: domain_name = {"name_on_url.jpg" => path_on_disk, ...}
That keeps access flexible.

Related

How can I refactor this ruby code using the Open/Closed principle or Strategy pattern

How can I refactor this Ruby code using the Open/Closed principle or the Strategy pattern?
I know that the main idea is 'software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification', but how can I use this in practice?
class PaymentService
def initialize(payment, payment_type)
@payment = payment
@payment_type = payment_type
end
def process
result = case @payment_type
when 'first'
process_first
when 'second'
process_second
end
@payment.save(result)
end
def process_first
'process_first'
end
def process_second
'process_second'
end
end
In this example, instead of passing a payment_type you can build an object with a class that processes a payment:
class FirstPayment
def process
'process_first'
end
end
class SecondPayment
def process
'process_second'
end
end
class PaymentService
def initialize(payment, payment_strategy)
@payment = payment
@payment_strategy = payment_strategy
end
def process
result = @payment_strategy.process
@payment.save(result)
end
end
PaymentService.new(payment, FirstPayment.new)
As a result, PaymentService's behaviour can be extended by passing a new strategy (for example, ThirdPayment), and the class itself does not need to be modified when the logic for processing the first or second payment changes.
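Here is a small runnable sketch of that extension step. FakePayment is a hypothetical stand-in for the payment record (its save just remembers the result), and ThirdPayment is the new strategy; PaymentService is the strategy-based version and is not touched when the new strategy is added.

```ruby
# FakePayment is hypothetical; save only records the result it was given.
class FakePayment
  attr_reader :saved

  def save(result)
    @saved = result
    true
  end
end

class FirstPayment
  def process
    'process_first'
  end
end

# New behaviour arrives as a new class, not as a new branch in PaymentService.
class ThirdPayment
  def process
    'process_third'
  end
end

class PaymentService
  def initialize(payment, payment_strategy)
    @payment = payment
    @payment_strategy = payment_strategy
  end

  def process
    @payment.save(@payment_strategy.process)
  end
end

payment = FakePayment.new
PaymentService.new(payment, ThirdPayment.new).process
```

Any object that responds to process can be passed in, which is all the duck-typed "interface" that Ruby needs here.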

How to pass dynamic params in Rails?

I want some of my model attributes to be predefined dynamically. I have various models, and now I want my Bill model to create objects using data from other model instances.
Models:
leave.rb # belongs_to :residents
resident.rb # has_many:leaves,has_many:bills,has_one:account
bill.rb # belongs_to:residents
rate_card.rb # belongs_to:hostel
account.rb # belongs_to:resident
hostel.rb
Here is my bills controller's create method:
def create
@bill = Resident.all.each { |resident| resident.bills.create(?) }
if @bill.save
flash[:success]="Bills successfully generated"
else
flash[:danger]="Something went wrong please try again !"
end
end
I want to build each bill using all of the models, e.g.:
resident.bills.create(is_date: <from form>, to_date: <from form>, expiry_date: <from form>, amount: 30 * resident.rate_card.diet + resident.rate_card.charge1 + resident.rate_card.charge2 + resident.account.leaves * 10 + resident.account.fine)
Is this possible?
And how do I use strong params here?
Please help me out, thanks.
I think the Rails way to handle this is with callbacks: use them for calculated attributes, i.e. attributes that depend on other models, whether on create, update, or delete. For instance:
class Bill < ActiveRecord::Base
...
before_create :set_amount
...
protected
def set_amount
self.amount = 30 * self.resident.rate_card.diet + self.resident.rate_card.charge1 + self.resident.rate_card.charge2 + (self.resident.account.leaves) * 10 + self.resident.account.fine
end
end
If you want this logic to be used when updating the record also, then you should use before_save instead of before_create.
After you do this, you should accept the usual params (strong) of Bill model, as in:
def bill_params
params.require(:bill).permit(:is_date, :to_date, :expiry_date)
end
So your create call would be like:
resident.bills.create(bill_params)
Also, be wary of your create action: you should probably add a method on either your Bill or your Resident model that uses a transaction to create all bills at once, because you probably want either every bill created or none. This way you won't have the Resident.all.each logic in your BillsController.
create takes a hash, so you can do:
create_params = { amount: 30*(resident.rate_card.diet) }
create_params[:some_field] = params[:some_field]
# and so on
resident.bills.create(create_params)
or:
obj = resident.bills.build(your_strong_parameters_as_usual)
obj.amount = 30 * resident.rate_card.diet # plus the rest of that calculation
obj.save!
I'm confused by the syntax in your controller. @bill is being set to the return value of a loop, which feels off. each returns the enumerable you iterate over, so you'll end up with @bill = Resident.all, with some bills being created on the side.
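The difference is easy to see with a plain Array (the names below are made up for the example): each hands back its receiver no matter what the block returns, while map collects the block results.

```ruby
residents = %w[alice bob]

# each returns its receiver, whatever the block returns.
from_each = residents.each { |name| "bill for #{name}" }

# map returns the block results.
from_map = residents.map { |name| "bill for #{name}" }
```

So calling save on the result of each is calling it on the collection, not on a bill.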
What your controller really wants to know is, did my many new bills save correctly?
This seems like a perfect place to use a ruby object (or, colloquially, a Plain Old Ruby Object, as opposed to an ActiveRecord object) to encapsulate the specifics of this bill-generator.
If I'm reading this right, it appears that you are generating many bills at once, based on form-inputted data like:
is_date
to_date
expiry_date
...as well as some data about each individual resident.
Here's the model I'd create:
app/models/bill_generator.rb
class BillGenerator
include ActiveModel::Model
# This lets you do validations
attr_accessor :is_date, :to_date, :expiry_date
# This lets your form builder see these attributes when you go form.input
attr_accessor :bills
# ...for the bills we'll be generating in a sec
validates_presence_of :is_date, :to_date, :expiry_date
# You can do other validations here. Just an example.
validate :bills_are_valid?
def initialize(attributes = {})
super # This calls the Active Model initializer
build_new_bills # Called as soon as you do BillGenerator.new
end
def build_new_bills
@bills = []
Resident.all.each do |r|
@bills << r.bills.build(
# Your logic goes here. Not sure what goes into a bill-building...
# Note that I'm building (which means not-yet-saved), not creating
)
end
end
def save
if valid?
@bills.each { |b| b.save }
true
else
false
end
end
private
def bills_are_valid?
# A validate method has to add an error to mark the object invalid;
# its return value is ignored.
invalid = @bills.reject(&:valid?)
errors.add(:bills, "are invalid") if invalid.any?
end
end
Why all this mess? Because in your controller you can do...
app/controllers/bill_controller.rb
def create
@bill_generator = BillGenerator.new(bill_generator_params)
if @bill_generator.save
# Redirect to somewhere with a flash?
else
# Re-render the form with a flash?
end
end
def bill_generator_params
params.require(:bill_generator).permit(:is_date, :to_date, :expiry_date)
# No extra garbage. No insecurity by letting all kinds of crud through!
end
...like a BillGenerator is any old object. Did it save? Great. If it didn't, show the form again.
Now, my BillGenerator isn't meant to be copied and pasted as is. Your 'build_new_bills' will probably contain some of that math you alluded to, which I'll leave to you.
Let me know what you think!
You can do it by using params.permit!, which allows any parameters to be passed. Here's an example:
def create
...
@bill = Resident.all.each { |resident| resident.bills.create(any_params) }
end
private
def any_params
params.permit!
end
Be careful with this, of course, as it opens you up to potential exploits.

Single Table Inheritance or Type Table

I am facing a design decision I cannot solve. In the application a user will have the ability to create a campaign from a set of different campaign types available to them.
Originally, I implemented this by creating a Campaign and CampaignType model where a campaign has a campaign_type_id attribute to know which type of campaign it was.
I seeded the database with the possible CampaignType models. This allows me to fetch all CampaignType's and display them as options to users when creating a Campaign.
I was looking to refactor because in this solution I am stuck using switch or if/else blocks to check what type a campaign is before performing logic (no subclasses).
The alternative is to get rid of CampaignType table and use a simple type attribute on the Campaign model. This allows me to create Subclasses of Campaign and get rid of the switch and if/else blocks.
The problem with this approach is I still need to be able to list all available campaign types to my users. This means I need to iterate Campaign.subclasses to get the classes. This works except it also means I need to add a bunch of attributes to each subclass as methods for displaying in UI.
Original
CampaignType.create! :fa_icon => "fa-line-chart", :avatar=> "spend.png", :name => "Spend Based", :short_description => "Spend X Get Y"
In STI
class SpendBasedCampaign < Campaign
def name
"Spend Based"
end
def fa_icon
"fa-line-chart"
end
def avatar
"spend.png"
end
end
Neither way feels right to me. What is the best approach to this problem?
A not very performant solution using phantom methods. This technique only works with Ruby >= 2.0, because since 2.0, unbound methods from modules can be bound to any object, while in earlier versions, any unbound method can only be bound to the objects kind_of? the class defining that method.
# app/models/campaign.rb
class Campaign < ActiveRecord::Base
enum :campaign_type => [:spend_based, ...]
def method_missing(name, *args, &block)
campaign_type_module.instance_method(name).bind(self).call
rescue NameError
super
end
def respond_to_missing?(name, include_private=false)
super || campaign_type_module.instance_methods(include_private).include?(name)
end
private
def campaign_type_module
Campaigns.const_get(campaign_type.camelize)
end
end
# app/models/campaigns/spend_based.rb
module Campaigns
module SpendBased
def name
"Spend Based"
end
def fa_icon
"fa-line-chart"
end
def avatar
"spend.png"
end
end
end
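The Ruby >= 2.0 behaviour this relies on can be checked in isolation. In this sketch the Campaign class is a bare stand-in (no ActiveRecord) and label is a hypothetical method; the point is only that a module's unbound method can be bound to an object that never included the module.

```ruby
module SpendBased
  def label
    "Spend Based (#{self.class.name})"
  end
end

# A bare stand-in class; it does not include SpendBased.
class Campaign; end

campaign = Campaign.new

# Take the method off the module and bind it to an unrelated object.
unbound = SpendBased.instance_method(:label)
label = unbound.bind(campaign).call
```

The object still does not respond to label on its own, which is why the model above needs method_missing and respond_to_missing? to forward the calls.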
Update
Use class macros to improve performance, and keep your models as clean as possible by hiding the nasty parts in a concern and a builder.
This is your model class:
# app/models/campaign.rb
class Campaign < ActiveRecord::Base
include CampaignAttributes
enum :campaign_type => [:spend_based, ...]
campaign_attr :name, :fa_icon, :avatar, ...
end
And this is your campaign type definition:
# app/models/campaigns/spend_based.rb
Campaigns.build 'SpendBased' do
name 'Spend Based'
fa_icon 'fa-line-chart'
avatar 'spend.png'
end
A concern providing campaign_attr to your model class:
# app/models/concerns/campaign_attributes.rb
module CampaignAttributes
extend ActiveSupport::Concern
module ClassMethods
private
def campaign_attr(*names)
names.each do |name|
class_eval <<-EOS, __FILE__, __LINE__ + 1
def #{name}
Campaigns.const_get(campaign_type.camelize).instance_method(:#{name}).bind(self).call
end
EOS
end
end
end
end
And finally, the module builder:
# app/models/campaigns/builder.rb
module Campaigns
class Builder < BasicObject
def initialize
@mod = ::Module.new
end
def method_missing(name, *args)
value = args.shift
@mod.send(:define_method, name) { value }
end
def build(&block)
instance_eval(&block)
@mod
end
end
def self.build(module_name, &block)
const_set module_name, Builder.new.build(&block)
end
end

How can I run some code on all objects retrieved from ActiveRecord?

I want to initialize some attributes in retrieved objects with values received from an external API. The after_find and after_initialize callbacks won't work for me, because that way I would have to call the API for each retrieved object, which is quite slow. I want something like the following:
class Server < ActiveRecord::Base
attr_accessor :dns_names
...
after_find_collection do |servers|
all_dns_names = ForeignLibrary.get_all_dns_entries
servers.each do |s|
s.dns_names = all_dns_names.select{|r| r.ip == s.ip}.map{|r| r.fqdn}
end
end
end
Please note that caching is not a solution, as I need to always have current data, and the data may be changed outside the application.
You'd want to have a class method that enhances each server found with your data. so, something like:
def index
servers = Server.where(condition: params[:condition]).where(second: params[:second])
@servers = Server.with_domain_names(servers)
end
class Server
def self.with_domain_names(servers)
all_dns_names = ForeignLibrary.get_all_dns_entries
servers.each do |s|
s.dns_names = all_dns_names.select{|r| r.ip == s.ip}.map{|r| r.fqdn}
end
end
end
This way, the ForeignLibrary.get_all_dns_entries only gets run once, and you can enhance your servers with that extra information.
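If the list of DNS entries is large, the select inside the loop rescans it for every server. A possible refinement, sketched here with hypothetical DnsEntry and ServerStub structs in place of the API records and the ActiveRecord model, is to group the entries by IP once and then look each server up in the resulting hash.

```ruby
# DnsEntry and ServerStub are hypothetical stand-ins for the API records and the model.
DnsEntry = Struct.new(:ip, :fqdn)
ServerStub = Struct.new(:ip, :dns_names)

def attach_dns_names(servers, entries)
  # Build the lookup once so each server is a hash access, not a full scan.
  by_ip = entries.group_by(&:ip)
  servers.each { |s| s.dns_names = by_ip.fetch(s.ip, []).map(&:fqdn) }
end

entries = [
  DnsEntry.new("10.0.0.1", "web1.example.com"),
  DnsEntry.new("10.0.0.1", "www.example.com"),
  DnsEntry.new("10.0.0.2", "db1.example.com")
]
servers = [ServerStub.new("10.0.0.1"), ServerStub.new("10.0.0.3")]
attach_dns_names(servers, entries)
```

Servers with no matching entry end up with an empty list rather than nil.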
If you wanted this available every time a server object is initialized, I'd simply delegate to the library rather than use after_initialize. You'd effectively store all the DNS entries in a class-level variable and cache them for a period of time, so the expensive ForeignLibrary.get_all_dns_entries call is not repeated for every server. It would be something like:
class Server
def dns_names
ForeignLibrary.dns_for_server(self)
end
end
class ForeignLibrary
MUTEX = Mutex.new # one shared lock; a new Mutex per call would protect nothing
def self.reset
MUTEX.synchronize { @@all_dns_names = nil }
end
def self.dns_for_server(server)
all_dns_names.select{|r| r.ip == server.ip}.map{|r| r.fqdn}
end
def self.all_dns_names
MUTEX.synchronize do
@@all_dns_names ||= call_the_library_expensively
end
end
end
(I also used a mutex here since we are doing ||= with class variables.)
To use it, you would:
class ApplicationController
before_action do # named before_filter before Rails 4
ForeignLibrary.reset #ensure every page load has the absolute latest data
end
end

About strategy pattern and Rails

I want to incorporate the strategy pattern in my application.
I have stored under lib the following classes.
class Network
def search
raise "NO"
end
def w_read
raise "NO"
end
#...
end
AND
class FacebookClass < Network
def search
# FacebookClass specific...
end
def w_read
raise OneError.new("...")
end
end
AND
class TwitterClass < Network
def search
# TwitterClass specific...
end
def w_read
# TwitterClass specific...
end
def write
# TwitterClass specific...
end
end
Now I want to call the search method of TwitterClass from app/model/network_searcher.rb. How can I do that? Did I implement the strategy pattern successfully here?
Going by the example on Wikipedia, I think your app/model/network_searcher.rb should be something like this:
class NetworkSearcher
def initialize(search_class)
@search_class = search_class
end
def search_social
@search_class.search
end
def w_read_social
@search_class.w_read
end
def write_social
@search_class.write
end
end
Then in controller or where you want to invoke it, you can call like this:
search_class = TwitterClass.new # or FacebookClass.new
network_searcher = NetworkSearcher.new(search_class)
network_searcher.search_social # or network_searcher.w_read_social or network_searcher.write_social
Also, if you are keeping these classes in lib, then for Rails 3 you need to add this line to config/application.rb in order to get them autoloaded:
config.autoload_paths += %W(#{config.root}/lib)
You also need to follow the Rails naming convention for the file names (for example, TwitterClass should live in twitter_class.rb). Otherwise you will have to require these files wherever you use these classes.
The strategy pattern lets the algorithm be selected at runtime. Without more details it's hard to say whether it is appropriate to your problem. Assuming that it is, what you need is a way to set the searcher on your model, so that the selected algorithm can then be used elsewhere in the model, e.g.
class TheInformation
attr_writer :searcher
def other_method
# ...
# can use the selected searcher here
@searcher.search
# ...
end
end
Does that help?
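A runnable sketch of that idea, with the strategy swapped on the same object at runtime. TwitterSearcher, FacebookSearcher and the lookup method are hypothetical names, and the searchers return canned strings instead of calling any network API.

```ruby
# Searcher classes are hypothetical; each returns a canned result.
class TwitterSearcher
  def search(term)
    "twitter results for #{term}"
  end
end

class FacebookSearcher
  def search(term)
    "facebook results for #{term}"
  end
end

class TheInformation
  attr_writer :searcher

  def lookup(term)
    # Fail early if no strategy has been selected yet.
    raise ArgumentError, "no searcher selected" unless @searcher
    @searcher.search(term)
  end
end

info = TheInformation.new
info.searcher = TwitterSearcher.new
first = info.lookup("ruby")

# Swap the algorithm without touching TheInformation.
info.searcher = FacebookSearcher.new
second = info.lookup("ruby")
```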

Resources