Extract and validate data from attachment - ruby-on-rails

I'm using paperclip to manage my file uploads in rails.
From the attachments users give me, I'd like to extract some data to associate with the model the attachment is associated with.
has_attached_file :resume, #...
# ...
def extract_resume_summary
path_to_resume = self.resume.queued_for_write[:original].path
extracted = parse_resume_file(path_to_resume)
self.number_of_jobs = extracted.number_of_jobs
self.highest_level_of_education = extracted.highest_level_of_education
rescue ResumeParseError => e
@problem_with_resume = e.message
end
I'm having some trouble figuring out exactly when and where to do this.
I could use a custom Paperclip::Processor:
class ::Paperclip::Summary < ::Paperclip::Processor
def make
@attachment.instance.extract_resume_summary
Tempfile.new('unused')
end
end
# ...
has_attached_file :resume,
:styles => { :summary => {} },
:processors => [ :summary ], #...
But the fit isn't great. I think processors are intended to create new files (which I don't need, and thus the spurious Tempfile).
Also my extraction might fail, which means my user gave me bad data. I want that to be a validation-time problem, so I can report it along with other validation errors, and post-processing happens strictly AFTER validation.
I've tried hacking it in at initialization:
validate :successfully_parses_resume
def successfully_parses_resume
errors.add(:resume, @problem_with_resume) if @problem_with_resume
end
def initialize(attributes=nil, options={})
super
extract_resume_summary
end
But I'm not quite sure that's right either, as that's not only when the file is uploaded, but also when I read the model in later. To say nothing of the havoc that could happen if I assumed #resume= or #[:resume]= was auto updating the extracted data too.
I think in an ideal world, I'd just subclass Paperclip::Attachment and make my extracted data peers of resume_file_name, resume_file_size, resume_content_type, resume_created_at, extracting that at the same time the mime-type is calculated and the file size is calculated. But looking at the source, those are pretty hard-coded throughout.
Is there another way to do this that I'm overlooking?

The solution I figured out was to wrap the attachment's setter. That's what'll be called during initialize and will give me a chance to detect problems before validation.
The only trick is that since the attachment's setter is created by has_attached_file and not inherited from ActiveRecord::Base, I can't just use super, I need to get an explicit reference to the version defined by has_attached_file (either by alias or, my preference, by instance_method):
has_attached_file :resume
old_setter = instance_method :resume=
define_method :resume= do |file|
old_setter.bind(self).call(file)
begin
extracted = parse_resume_file(resume.path)
self.number_of_jobs = extracted.number_of_jobs
self.highest_level_of_education = extracted.highest_level_of_education
rescue ResumeParseError => e
@problem_with_resume = e.message
end
end
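The wrapping technique itself does not depend on Paperclip. Below is a minimal plain-Ruby sketch of it. The `has_thing` macro, the `Document` class, and the word-count "extraction" are hypothetical stand-ins for `has_attached_file`, the model, and `parse_resume_file`; only the `instance_method` / `bind` / `define_method` part mirrors the answer above.

```ruby
# Hypothetical stand-in for has_attached_file: it defines the setter on the
# class itself, so `super` is not available when the setter is redefined.
module ThingMacro
  def has_thing(name)
    define_method("#{name}=") { |value| instance_variable_set("@#{name}", value) }
    define_method(name) { instance_variable_get("@#{name}") }
  end
end

class Document
  extend ThingMacro
  has_thing :resume

  attr_reader :word_count, :problem_with_resume

  # Capture the generated setter before redefining it.
  old_setter = instance_method(:resume=)

  define_method(:resume=) do |text|
    old_setter.bind(self).call(text) # run the original setter first
    begin
      # Stand-in for the real extraction step, which may fail.
      raise ArgumentError, "resume is empty" if text.to_s.strip.empty?
      @word_count = text.split.size
      @problem_with_resume = nil
    rescue ArgumentError => e
      # Remember the failure so a validation can report it later.
      @problem_with_resume = e.message
    end
  end
end
```

Because the extraction runs inside the setter, it happens on every assignment (including the one made during initialization) and not when a record is merely loaded.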

Related

Rails; Fetch records within initializer

I've been wondering whether it is common to fetch records within an initializer.
Here is an example of a service object that fetches records and generates a PDF receipt file.
The input is an invoice UUID, and the related records, such as card details and invoice items, are fetched within the initializer.
class Pdf::GenerateReceipt
include Service
attr_reader :invoice, :items, :card_detail
def initialize(invoice_uuid)
@invoice ||= find_invoice!(invoice_uuid) # caching
@items = invoice.invoice_items
@card_detail = card_detail
end
.....
def call
return ReceiptGenerator.new(
id: invoice.uuid, # required
outline: outline, # required
line_items: line_items, # required
customer_info: customer_info
)
rescue => e
[false, e]
end
.....
def card_detail
card_metadata = Account.find(user_detail[:id]).credit_cards.primary.last
card_detail = {}
card_detail[:number] = card_metadata.blurred_number
card_detail[:brand] = card_metadata.brand
card_detail
end
end
Pdf::GenerateReceipt.('28ed7bb1-4a3f-4180-89a3-51cb3e621491') # => then generate pdf
The problem is that if the records are not found, this raises an error.
I could rescue within the initializer, but that does not seem common.
How could I work around this in a more Ruby-like way?
This is mostly opinion and anecdotal, but I prefer to deal with casting my values as far up the chain as possible. So I would find the invoice before this object and pass it in as an argument, same with the card_detail.
If you do that, this class's responsibility is limited to coordinating those two objects, which is much easier to test but also adds another layer that you have to reason about in the future.
So here is how I would handle it: split this into 4 separate things:
Invoice Finder thing
Card Finder thing
Pdf Generator that takes invoice and card as arguments
Finally, something to orchestrate the 3 actions above
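A minimal plain-Ruby sketch of that four-way split follows. Every name in it (`InvoiceFinder`, `CardFinder`, `ReceiptBuilder`, `GenerateReceipt`, `RecordNotFound`, the in-memory hash stores) is hypothetical and only stands in for the ActiveRecord lookups and the PDF generator from the question.

```ruby
Invoice = Struct.new(:uuid, :items)
Card = Struct.new(:number, :brand)

class RecordNotFound < StandardError; end

# 1. Invoice finder: hypothetical in-memory lookup keyed by uuid.
class InvoiceFinder
  def initialize(store)
    @store = store
  end

  def call(uuid)
    @store.fetch(uuid) { raise RecordNotFound, "invoice #{uuid} not found" }
  end
end

# 2. Card finder: looks up the card that belongs to an invoice.
class CardFinder
  def initialize(store)
    @store = store
  end

  def call(invoice)
    @store.fetch(invoice.uuid) { raise RecordNotFound, "card for #{invoice.uuid} not found" }
  end
end

# 3. Generator: takes already-loaded objects, so it never touches the database.
class ReceiptBuilder
  def call(invoice, card)
    { id: invoice.uuid, line_items: invoice.items, card: "#{card.brand} #{card.number}" }
  end
end

# 4. Orchestrator: the only place that decides what a missing record means.
class GenerateReceipt
  def initialize(invoice_finder:, card_finder:, builder: ReceiptBuilder.new)
    @invoice_finder = invoice_finder
    @card_finder = card_finder
    @builder = builder
  end

  def call(uuid)
    invoice = @invoice_finder.call(uuid)
    card = @card_finder.call(invoice)
    [true, @builder.call(invoice, card)]
  rescue RecordNotFound => e
    [false, e]
  end
end
```

With this shape the initializer only stores collaborators, and the "record not found" case is handled in one place, in `call`, instead of inside `initialize`.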
Hope this helps.
Addition: Check out the book Confident Ruby by Avdi Grimm. It's really great at outlining how to handle this type of scenario.

Rails: Continue to save other attributes despite validation error in one of them

I am using Delayed_Job to process Paperclip images retrieved from an Amazon S3 bucket.
The basic issue I am trying to resolve is how to still save the other attributes to the DB when a validation error occurs while processing the Paperclip image.
The basic code I have is the following:
class Provider < ActiveRecord::Base
after_save :queue_image_processing, if: Proc.new {|provider| provider.image_s3_key}
has_attached_file :image, styles: {original: "1000x1000", medium: "530x530#", thumb: "300x300#"}, default_url: "/default-avatar_:style.png"
validates_attachment_content_type :image, content_type: /\Aimage\/.*\Z/
def self.process_image(id)
provider = Provider.find(id)
s3_image_object = S3_BUCKET.objects[provider.image_s3_key]
begin
provider.image = s3_image_object.public_url
provider.image_s3_key = nil
provider.image_started_processing_at = nil
provider.error = false
provider.save!
rescue ActiveRecord::RecordInvalid
logger.info "$$$$$ Record Invalid called $$$$"
provider.image_s3_key = nil
provider.image_started_processing_at = nil
provider.error = true
provider.save
end
end
private
def queue_image_processing
Provider.delay.process_image(id)
end
end
When the image is not of the correct format (e.g. because of spoofing of file extension), a Rollback is performed, the save! throws ::RecordInvalid. This is all good and the debug text in logger.info gets shown.
The problem I have is that the rest of the code in the rescue clause does not seem to be performed (i.e. those attributes are not stored in the DB).
This code is just one of several attempts, but my basic need is to write to the DB the other attributes, the ones that did not receive a validation error. As you can see, I need to be careful because I have an after_save callback and must avoid it being called multiple times (hence the condition on running this callback).
All the help greatly appreciated! Thanks in advance
PS: Rails 4.0
Actually I found the way to do it, thanks to the handy cheat sheet on the Different ways to set attributes in Rails by David Verhasselt.
It might not be the purest way of doing it, but hey, it works.
The way to do it is to use model.update_columns:
def self.process_image(id)
provider = Provider.find(id)
s3_image_object = S3_BUCKET.objects[provider.image_s3_key]
begin
provider.image_s3_key = nil
provider.image_started_processing_at = nil
provider.error = false
provider.image = s3_image_object.public_url
provider.save!
rescue ActiveRecord::RecordInvalid
logger.info "$$$$$ Record Invalid called $$$$"
provider.update_columns(image_s3_key: nil, image_started_processing_at: nil, error: true)
end
end
This has the added benefit (in my case, but it might not fit all cases) that updated_at is not updated, the validations are not performed, and the callbacks are not called. In my case that is good, because the record is not updated after a user action, but only due to a failure.
Hope this helps others.

How to write short, clean rspec tests for method with many model calls?

I'm having trouble coming up with some tests for a method I want to write.
The method is going to take a hash of some data and create a bunch of associated models with it. The problem is, I'm having a hard time figuring out what the best practice for writing this sort of test is.
For example, the code will:
Take a hash that looks like:
{
:department => 'CS',
:course_title => 'Algorithms',
:section_number => '01B',
:term => 'Fall 2012',
:instructor => 'Bob Dylan'
}
And save it to the models Department, Course, Section, and Instructor.
This will take many calls to model.find_or_create, etc.
How could I go about testing each separate purpose of this method, e.g.:
it 'should find or create department' do
# << Way too many stubs here for each model and all association calls
dept = mock_model(Department)
Department.should_receive(:find_or_create).with(:name => 'CS').and_return(dept)
end
Is there a way to avoid the massive amounts of stubs to keep each test FIRST (fast independent repeatable self-checking timely) ? Is there a better way to write this method and/or these tests? I'd really prefer to have short, clean it blocks.
Thank you so much for any help.
Edit:
The method will probably look like this:
def handle_course_submission(param_hash)
department = Department.find_or_create(:name => param_hash[:department])
course = Course.find_or_create(:title => param_hash[:course_title])
instructor = Instructor.find_or_create(:name => param_hash[:instructor])
section = Section.find_or_create(:number => param_hash[:section_number], :term => param_hash[:term])
# Maybe put this stuff in a different method?
course.department = department
section.course = course
section.instructor = instructor
end
Is there a better way to write the method? How would I write the tests? Thanks!
For passing an array of sections to be created:
class SectionCreator
# sections is the array of parameters
def initialize(sections)
@sections = sections
end
# Adding the ! here because I think you should use the save! methods
# with exceptions as mentioned in one of my myriad comments.
def create_sections!
@sections.each do |section|
create_section!(section)
end
end
def create_section!(section_params)
section = find_or_create_section(section_params[:section_number], section_params[:term])
section.add_course!(section_params)
end
# The rest of my original example goes here
end
# In your controller or wherever...
def action
SectionCreator.new(params_array).create_sections!
rescue ActiveRecord::RecordInvalid => ex
errors = ex.record.errors
render json: errors
end
Hopefully this covers it all.
My first thought is that you may be suffering from a bigger design flaw. Without seeing the greater context of your method it is hard to give much advice. However, in general it is good to break the method up into smaller pieces and follow the single level of abstraction principle.
http://www.markhneedham.com/blog/2009/06/12/coding-single-level-of-abstraction-principle/
Here is something you could try although as mentioned before this is definitely still not ideal:
def handle_course_submission(param_hash)
department = find_or_create_department(param_hash[:department])
course = find_or_create_course(param_hash[:course_title])
# etc.
# call another method here to perform the actual work
end
private
def find_or_create_department(department)
Department.find_or_create(name: department)
end
def find_or_create_course(course_title)
Course.find_or_create(title: course_title)
end
# Etc.
In the spec...
let(:param_hash) do
{
:department => 'CS',
:course_title => 'Algorithms',
:section_number => '01B',
:term => 'Fall 2012',
:instructor => 'Bob Dylan'
}
end
describe "#handle_course_submission" do
before do
subject.stub(:find_or_create_department).as_null_object
subject.stub(:find_or_create_course).as_null_object
# etc.
end
after do
subject.handle_course_submission(param_hash)
end
it "should save the department" do
subject.should_receive(:find_or_create_department).with(param_hash[:department])
end
it "should save the course title" do
subject.should_receive(:find_or_create_course).with(param_hash[:course_title])
end
# Etc.
end
describe "#find_or_create_department" do
it "should find or create a Department" do
Department.should_receive(:find_or_create).with(name: "Department Name")
subject.find_or_create_department("Department Name")
end
end
# etc. for the rest of the find_or_create methods as well as any other
# methods you add
Hope some of that helped a little. If you post more of your example code I may be able to provide less generalized and possibly useful advice.
Given the new context provided, I would split the functionality up amongst your models a little more. Again, this is really just the first thing that comes to mind and could definitely be improved upon. It seems to me like the Section is the root object here. So you could either add a Section.create_course method or wrap it in a service object like so:
Updated this example to use exceptions
class SectionCreator
def initialize(param_hash)
number = param_hash.delete(:section_number)
term = param_hash.delete(:term)
@section = find_or_create_section(number, term)
@param_hash = param_hash
end
def create!
@section.add_course!(@param_hash)
end
private
def find_or_create_section(number, term)
Section.find_or_create(number: number, term: term)
end
end
class Section < ActiveRecord::Base
# All of your current model stuff here
def add_course!(course_info)
department_name = course_info[:department]
course_title = course_info[:course_title]
instructor_name = course_info[:instructor]
self.course = find_or_create_course_with_department(course_title, department_name)
self.instructor = find_or_create_instructor(instructor_name)
save!
self
end
def find_or_create_course_with_department(course_title, department_name)
course = find_or_create_course(course_title)
course.department = find_or_create_department(department_name)
course.save!
course
end
def find_or_create_course(course_title)
Course.find_or_create(title: course_title)
end
def find_or_create_department(department_name)
Department.find_or_create(name: department_name)
end
def find_or_create_instructor(instructor_name)
Instructor.find_or_create(name: instructor_name)
end
end
# In your controller (this needs more work but..)
def create_section_action
@section = SectionCreator.new(params).create!
rescue ActiveRecord::RecordInvalid => ex
# @section is never assigned when create! raises, so read the errors from the exception
flash[:alert] = ex.record.errors
end
Notice how adding the #find_or_create_course_with_department method allowed us to add the association of the department in there while keeping the #add_course! method clean. That is why I like to add those methods even though they sometimes seem superfluous, as in the case of the #find_or_create_instructor method.
The other advantage of breaking out the methods in this fashion is that they become easier to stub in tests as I showed in my first example. You can easily stub all of these methods to make sure the database isn't actually being hit and your tests run fast while at the same time guarantee through the test expectations that the functionality is correct.
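To illustrate why small methods are easy to stub, here is a plain-Ruby sketch that needs no test framework. `CourseSubmission` and the `stub_and_record` helper are hypothetical names; in a real spec, RSpec's own stubbing and expectations would take the place of the helper.

```ruby
# Hypothetical class shaped like the examples above: one coordinating
# method plus small finder methods that would normally hit the database.
class CourseSubmission
  def handle(params)
    department = find_or_create_department(params[:department])
    course = find_or_create_course(params[:course_title])
    [department, course]
  end

  def find_or_create_department(name)
    raise NotImplementedError, "would hit the database"
  end

  def find_or_create_course(title)
    raise NotImplementedError, "would hit the database"
  end
end

# Replaces one method on a single object and records each call's arguments.
def stub_and_record(object, method_name, return_value, calls)
  object.define_singleton_method(method_name) do |*args|
    calls << [method_name, args]
    return_value
  end
end

submission = CourseSubmission.new
calls = []
stub_and_record(submission, :find_or_create_department, :fake_department, calls)
stub_and_record(submission, :find_or_create_course, :fake_course, calls)

# No database is touched; `calls` now holds what the method asked for.
result = submission.handle({ department: "CS", course_title: "Algorithms" })
```

Each small method can then get its own short test, while the coordinating method is checked only for which collaborators it calls and with what arguments.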
Of course, a lot of this comes down to personal preference on how you want to implement it. In this case the service object is probably unnecessary. You could just as easily have implemented that as the Section.create_course method I referenced earlier like so:
class Section < ActiveRecord::Base
def self.create_course(param_hash)
section = find_or_create(number: param_hash.delete(:section_number), term: param_hash.delete(:term))
section.add_course!(param_hash)
section
end
# The rest of the model goes here
end
As to your final question, you can definitely stub out methods in RSpec and then apply expectations like should_receive on top of those stubs.
It's getting late so let me know if I missed anything.

Add http(s) to URL if it's not there?

I'm using this regex in my model to validate a URL submitted by the user. I don't want to force the user to type the http part, but would like to add it myself if it's not there.
validates :url, :format => { :with => /^((http|https):\/\/)?[a-z0-9]+([-.]{1}[a-z0-9]+).[a-z]{2,5}(:[0-9]{1,5})?(\/.)?$/ix, :message => " is not valid" }
Any idea how I could do that? I have very little experience with validation and regex..
Use a before_validation callback to add it if it is not there:
before_validation :smart_add_url_protocol
protected
def smart_add_url_protocol
unless url[/\Ahttp:\/\//] || url[/\Ahttps:\/\//]
self.url = "http://#{url}"
end
end
Leave the validation you have in, that way if they make a typo they can correct the protocol.
Don't do this with a regex, use URI.parse to pull it apart and then see if there is a scheme on the URL:
u = URI.parse('/pancakes')
if(!u.scheme)
# prepend http:// and try again
elsif(%w{http https}.include?(u.scheme))
# you're okay
else
# you've been given some other kind of
# URL and might want to complain about it
end
Using the URI library for this also makes it easy to clean up any stray nonsense (such as userinfo) that someone might try to put into a URL.
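The branches above could be filled in roughly as follows. This is only a sketch: the helper name `with_default_scheme` and the choice to raise `ArgumentError` for other schemes are assumptions, not part of the answer.

```ruby
require "uri"

# Returns the URL with a default scheme prepended when none is present.
# Returns nil for blank input and raises ArgumentError for any scheme
# other than http or https. URI::InvalidURIError from URI.parse is not
# rescued here, so malformed input still surfaces to the caller.
def with_default_scheme(raw, default_scheme = "http")
  url = raw.to_s.strip
  return nil if url.empty?

  scheme = URI.parse(url).scheme
  if scheme.nil?
    "#{default_scheme}://#{url}"
  elsif %w[http https].include?(scheme.downcase)
    url
  else
    raise ArgumentError, "unsupported scheme: #{scheme}"
  end
end

with_default_scheme("example.com/path")
with_default_scheme("https://example.com")
```

One caveat worth testing against your own data: input of the form host:port with no scheme may be parsed as if the host were the scheme, so such values may need separate handling.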
The accepted answer is quite okay.
But if the field (url) is optional, it may raise an error such as undefined method + for nil class.
The following should resolve that:
def smart_add_url_protocol
if self.url && !url_protocol_present?
self.url = "http://#{self.url}"
end
end
def url_protocol_present?
self.url[/\Ahttp:\/\//] || self.url[/\Ahttps:\/\//]
end
Preface, justification and how it should be done
I hate it when people change a model in a before_validation hook. If someday the model needs to be persisted with save(validate: false), a filter that was supposed to always run on assigned fields does not get run. Sure, having invalid data is usually something you want to avoid, but that option would not exist if nobody used it. Another problem is that these modifications take place every time you ask a model whether it is valid. The fact that simply asking if a model is valid may modify the model is unexpected, perhaps even unwanted. Therefore, if I had to choose a hook, I'd go for a before_save hook. However, that won't do for me, since we provide preview views for our models, and the URIs in the preview would be broken because the hook would never get called. Therefore, I decided it's best to separate the concept into a module or concern and provide a nice way to apply a "monkey patch" ensuring that changing the field's value always runs through a filter that adds a default protocol if it is missing.
The module
#app/models/helpers/uri_field.rb
module Helpers::URIField
def ensure_valid_protocol_in_uri(field, default_protocol = "http", protocols_matcher="https?")
alias_method "original_#{field}=", "#{field}="
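define_method "#{field}=" do |new_uri|
# Prepend the default protocol when the value starts with none of the accepted ones.
# The group keeps alternations such as "https?|ftp" anchored as a whole.
if new_uri.present? and not new_uri =~ /\A(?:#{protocols_matcher}):\/\//
new_uri = "#{default_protocol}://#{new_uri}"
end
self.send("original_#{field}=", new_uri)
end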
end
end
In your model
extend Helpers::URIField
ensure_valid_protocol_in_uri :url
#Should you wish to default to https or support other protocols e.g. ftp, it is
#easy to extend this solution to cover those cases as well
#e.g. with something like this
#ensure_valid_protocol_in_uri :url, "https", "https?|ftp"
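For readers who want to try the idea outside Rails or Mongoid, here is a plain-Ruby sketch of the same macro. The `URIField` module name, the `Profile` class, and its `attr_accessor` fields are hypothetical. It captures the original setter with `instance_method` instead of `alias_method`, and uses `to_s.empty?` where the Rails version uses `present?`.

```ruby
module URIField
  # Wraps the existing setter for `field` so that values without an accepted
  # protocol get `default_protocol` prepended before being stored.
  def ensure_valid_protocol_in_uri(field, default_protocol = "http", protocols_matcher = "https?")
    original_setter = instance_method("#{field}=")
    # Group the matcher so alternations like "https?|ftp" stay anchored.
    pattern = /\A(?:#{protocols_matcher}):\/\//i

    define_method("#{field}=") do |new_uri|
      if !new_uri.to_s.empty? && new_uri !~ pattern
        new_uri = "#{default_protocol}://#{new_uri}"
      end
      original_setter.bind(self).call(new_uri)
    end
  end
end

class Profile
  extend URIField
  attr_accessor :url, :mirror

  ensure_valid_protocol_in_uri :url
  ensure_valid_protocol_in_uri :mirror, "https", "https?|ftp"
end

profile = Profile.new
profile.url = "example.com"            # stored with the default http protocol
profile.mirror = "ftp://files.example" # already has an accepted protocol
```

Because the filter lives in the setter, it runs on every assignment, regardless of whether validations or save callbacks are ever triggered.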
As a concern
If for some reason you'd rather use the Rails Concern pattern, it is easy to convert the above module to a concern module (it is used in exactly the same way, except that you use include Concerns::URIField):
#app/models/concerns/uri_field.rb
module Concerns::URIField
extend ActiveSupport::Concern
included do
def self.ensure_valid_protocol_in_uri(field, default_protocol = "http", protocols_matcher="https?")
alias_method "original_#{field}=", "#{field}="
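define_method "#{field}=" do |new_uri|
# Prepend the default protocol when the value starts with none of the accepted ones.
# The group keeps alternations such as "https?|ftp" anchored as a whole.
if new_uri.present? and not new_uri =~ /\A(?:#{protocols_matcher}):\/\//
new_uri = "#{default_protocol}://#{new_uri}"
end
self.send("original_#{field}=", new_uri)
end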
end
end
end
P.S. The above approaches were tested with Rails 3 and Mongoid 2.
P.P.S If you find this method redefinition and aliasing too magical you could opt not to override the method, but rather use the virtual field pattern, much like password (virtual, mass assignable) and encrypted_password (gets persisted, non mass assignable) and use a sanitize_url (virtual, mass assignable) and url (gets persisted, non mass assignable).
Based on mu's answer, here's the code I'm using in my model. This runs whenever :link is assigned, without the need for model filters. super is required to call the default setter.
def link=(_link)
u=URI.parse(_link)
if (!u.scheme)
link = "http://" + _link
else
link = _link
end
super(link)
end
Using some of the aforementioned regexps, here is a handy method for overriding the default url on a model (If your ActiveRecord model has an 'url' column, for instance)
def url
_url = read_attribute(:url).try(:downcase)
if(_url.present?)
unless _url[/\Ahttp:\/\//] || _url[/\Ahttps:\/\//]
_url = "http://#{_url}"
end
end
_url
end
I had to do it for multiple columns on the same model.
before_validation :add_url_protocol
def add_url_protocol
[
:facebook_url, :instagram_url, :linkedin_url,
:tiktok_url, :youtube_url, :twitter_url, :twitch_url
].each do |url_method|
url = self.send(url_method)
if url.present? && !(%w{http https}.include?(URI.parse(url).scheme))
self.send("#{url_method.to_s}=", 'https://'.concat(url))
end
end
end
I wouldn't try to do that in the validation, since it's not really part of the validation.
Have the validation optionally check for it; if they screw it up it'll be a validation error, which is good.
Consider using a callback (after_create, after_validation, whatever) to prepend a protocol if there isn't one there already.
(I voted up the other answers; I think they're both better than mine. But here's another option :)

Rails Paperclip, DRY configuration

In order to DRY up my code for attachment pictures, I created an initializer to override the @default_options variable used by Paperclip.
This way, I don't have to specify again and again the url, path and storage I want.
I'd like to go a step further and include the validation in it but I can't make it work...
Any Idea?
EDIT 1: I want at least to validate both presence and size.
EDIT 2: Part of my current code
module Paperclip
class Attachment
def self.default_options
if Rails.env != "production"
@default_options = {
:url => "/assets/:class/:attachment/:id/:style/:normalized_name",
:path => ":rails_root/public/assets/:class/:attachment/:id/:style/:normalized_name",
:default_style => :original,
:storage => :filesystem,
:whiny => Paperclip.options[:whiny] || Paperclip.options[:whiny_thumbnails]
}
else
...
end
end
end
normalized_name is an outside function, feat: http://blog.wyeworks.com/2009/7/13/paperclip-file-rename
EDIT 3:
This blog: http://omgsean.com/2009/02/overriding-paperclip-defaults-for-your-entire-rails-app/ presents the default_options hash with a validations key.
So it could be possible, though I have not found out how yet.
You will not be able to move the validations into a default_options hash, as these validations are performed outside the attachment class (inside a Paperclip module). My thought is that if you have the same validations across all your models, you might need to look into using inheritance to decrease code duplication. I would advise against moving validations into an initializer.
