ActiveStorage Image import from Remote URL via activerecord-import gem - ruby-on-rails

I am trying to import a list of products from a CSV file that has an image column containing remote URLs. I am using the activerecord-import gem. With the setup below I don't get any errors or logs, but the images are not created.
require 'open-uri'
products = []
# The loop header was not shown in the question; iterating over parsed CSV rows is assumed.
csv.each do |row|
  file = URI.open(row['image'])
  product = Product.new(
    name: row['name'],
    sku: row['sku'],
    brand_id: row['brand']
  )
  product.image.attach(io: file, filename: row['sku'] + '.jpg', content_type: 'image/jpeg')
  products << product
end
importing = Product.import products, recursive: true
Am I missing something, or is there a better way to handle this?
I was able to import the images with a regular CSV parse. I'm still not sure why activerecord-import didn't work; I think I will ask about this on the gem's GitHub.
csv_text = resp.body
csv = CSV.parse(csv_text, :headers => true, :encoding => 'utf-8')
csv.each do |row|
  file = URI.open(row['image'])
  t = Product.new
  t.name = row['name']
  t.image.attach(io: file, filename: row['name'] + '.jpg', content_type: 'image/jpeg')
  t.save
  puts "#{t.name} saved"
end

Giving these prompts as an answer then:
The part of the code you shared (assuming other parts behave as expected) seems ok and like it is doing what it should but since your images are not created ...
(removed guesses part)
Edit: after your update revealed more context, my guess is that the first version (with activerecord-import) didn't work because the gem doesn't invoke callbacks, and so probably skips a step that ActiveStorage needs in order to upload and persist the file attachment.
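One way to act on that guess is to let activerecord-import do the bulk insert and attach the images in a second pass, once the records exist. The following is a minimal sketch, not a confirmed fix: the method name import_products_with_images and the opener: parameter are hypothetical, and it assumes sku is unique so each row can be matched back to its record with find_by.

```ruby
require 'open-uri'

# Hypothetical helper: bulk-insert first, attach images afterwards.
# `model` is assumed to respond to `new`, `import` (activerecord-import)
# and `find_by`, as the Product model in the question would.
def import_products_with_images(model, rows, opener: URI.method(:open))
  products = rows.map do |row|
    model.new(name: row['name'], sku: row['sku'], brand_id: row['brand'])
  end
  # Bulk INSERT; model callbacks are not run here.
  model.import(products)

  rows.each do |row|
    # Assumes sku is unique, so it identifies the freshly inserted record.
    product = model.find_by(sku: row['sku'])
    next if product.nil?

    product.image.attach(
      io: opener.call(row['image']),
      filename: "#{row['sku']}.jpg",
      content_type: 'image/jpeg'
    )
  end
end
```

In the question's setup this would be called as import_products_with_images(Product, csv.map(&:to_h)). The attach calls still run one record at a time, so only the row inserts benefit from the bulk import.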

Related

ActiveStorage how to prevent duplicate file uploads ; find by filename

I am parsing email attachments and uploading them to S3 through ActiveStorage.
We would like it to ignore duplicates, but I cannot see how to query by these attributes.
class Task < ApplicationRecord
has_many_attached :documents
end
then in my email webhook job
attachments.each do |attachment|
  tempfile = URI.open(attachment[:url], http_basic_authentication: ["api", ENV.fetch("MAILGUN_API_KEY")])
  # I'd like to do something like this
  next if task.documents.where(filename: tempfile.filename, bytesize: tempfile.bytesize).exists?
  # this is what I'm currently doing
  task.documents.attach(
    io: tempfile,
    filename: attachment[:name],
    content_type: attachment[:content_type]
  )
end
Unfortunately, if someone forwards the same files, we end up with duplicates, often several of them.
Edit with current solution:
tempfile = URI.open(attachment[:url], http_basic_authentication: ["api", ENV.fetch("MAILGUN_API_KEY")])
# Digest the contents; open-uri may return a StringIO (no file path) for small bodies
md5_digest = Digest::MD5.base64digest(tempfile.read)
tempfile.rewind
# if this digest already exists as attached to the task then we're all good.
next if ActiveStorage::Blob.joins(:attachments).where(
  checksum: md5_digest,
  active_storage_attachments: { name: 'documents', record_type: 'Task', record_id: task.id }
).exists?
Rails utilizes 2 tables for storing attachment data; active_storage_attachments and active_storage_blobs
The active_storage_blobs table houses a checksum of the uploaded file.
You can easily join this table to verify the existence of a file.
Going from @gustavo's answer I came up with the following:
attachments.each do |attachment|
  tempfile = Tempfile.new
  tempfile.binmode
  # Copy the downloaded body into the tempfile, then flush so the digest sees every byte
  tempfile.write URI.open(attachment[:url], http_basic_authentication: ["api", ENV.fetch("MAILGUN_API_KEY")]).read
  tempfile.flush
  checksum = Digest::MD5.file(tempfile.path).base64digest
  if task.documents_blobs.exists?(checksum: checksum)
    tempfile.close
    tempfile.unlink
    next
  end
  #... Your attachment saving code here
end
Note: Remember to require 'tempfile' in the class where you are using this
What happens if they change the filename anyway (which happens many times with things like filename(2).xlsx) but the content is the same?
Maybe a better approach would be to compare the checksum? I believe that the ActiveStorage object will already store that, for saved files. You could do something like:
attachments.each do |attachment|
  tempfile = URI.open(attachment[:url], http_basic_authentication: ["api", ENV.fetch("MAILGUN_API_KEY")])
  checksum = Digest::MD5.base64digest(tempfile.read)
  tempfile.rewind
  # I'd like to do something like this
  next if task.documents_blobs.where(checksum: checksum).exists?
  #...
end
That way you know it is the same physical file regardless of the incoming filename.
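To illustrate why the checksum is a better key than the filename: ActiveStorage stores each blob's checksum as a Base64-encoded MD5 digest, so the same digest computed locally can be compared against it. A small sketch in plain Ruby, where blob_checksum is a hypothetical helper name and the two tempfiles stand in for the same attachment forwarded under different names:

```ruby
require 'digest/md5'
require 'tempfile'

# Hypothetical helper: Base64 MD5 digest of an IO's full contents,
# the same format ActiveStorage keeps in active_storage_blobs.checksum.
def blob_checksum(io)
  io.rewind
  digest = Digest::MD5.base64digest(io.read)
  io.rewind # leave the IO ready to be attached afterwards
  digest
end

# Two files with different names but identical bytes.
original = Tempfile.new(['report', '.xlsx'])
renamed  = Tempfile.new(['report(2)', '.xlsx'])
[original, renamed].each do |file|
  file.binmode
  file.write('same bytes')
  file.flush
end

same_content = blob_checksum(original) == blob_checksum(renamed)
puts "Duplicate content: #{same_content}"
```

Because the helper only needs rewind and read, it also works on the StringIO that open-uri returns for small downloads.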

Rails 5.2 Import XLSX with ActiveStorage and Creek

I have a model called ImportTemp, which is used to store imported XLSX files in the database. I'm using ActiveStorage to store the files.
this is the model code:
class ImportTemp < ApplicationRecord
belongs_to :user
has_one_attached :file
has_one_attached :log_result
end
this is my import controller code:
def import
# Check filetype
case File.extname(params[:file].original_filename)
when ".xlsx"
# Add File to ImportFile model
import = ImportTemp.new(import_type: 'UnitsUpload', user: current_user)
import.file.attach(params[:file])
import.save
# Import unit via sidekiq with background jobs
ImportUnitWorker.perform_async(import.id)
# Notice
flash.now[:notice] = "We are processing your xlsx, we will inform you after it's done via notifications."
# Unit.import_file(xlsx)
else flash.now[:error] = t('shared.info.unknown')+": #{params[:file].original_filename}"
end
end
After the xlsx file is uploaded, the import is processed in Sidekiq. This is the worker code (it doesn't actually do the import yet):
class ImportUnitWorker
  include Sidekiq::Worker
  sidekiq_options retry: false
  def perform(file_id)
    import_unit = ImportTemp.find(file_id)
    # Open the uploaded xlsx to Creek
    creek = Creek::Book.new(Rails.application.routes.url_helpers.rails_blob_path(import_unit.file, only_path: true))
    sheet = creek.sheets[0]
    puts "Opening Sheet #{sheet.name}"
    sheet.rows.each do |row|
      puts row
    end
    units = []
    # Unit.import(units)
  end
end
But when I tried it, it gave me this error:
Zip::Error (File /rails/active_storage/blobs/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBCZz09IiwiZXhwIjpudWxsLCJwdXIiOiJibG9iX2lkIn19--3960b6ba5b55f7004e09967d16dfabe63f09f0a9/2018-08-10_10_39_audit_gt.xlsx not found)
But if I open it in my browser, using a link that looks like this:
http://localhost:3000/rails/active_storage/blobs/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBCZz09IiwiZXhwIjpudWxsLCJwdXIiOiJibG9iX2lkIn19--3960b6ba5b55f7004e09967d16dfabe63f09f0a9/2018-08-10_10_39_audit_gt.xlsx
it works and the xlsx is downloaded. My question is: what's wrong with it? Why is the file not found in Sidekiq?
I ended up using Tempfile as suggested by George Claghorn. I don't know if this is the best solution or best practice, but it works for me now. I'm going to use this solution while waiting for a stable Rails 6 release with the ActiveStorage::Blob#open feature.
def perform(file_id)
  import = ImportTemp.find(file_id)
  temp_unit = Tempfile.new([ 'unit_import_temp', '.xlsx' ], :encoding => 'ascii-8bit')
  units = []
  begin
    # Write xlsx from ImportTemp to Tempfile
    temp_unit.write(import.file.download)
    # Flush buffered bytes to disk so Creek sees the whole file when it opens the path
    temp_unit.flush
    # Open the temp xlsx to Creek
    book = Creek::Book.new(temp_unit.path)
    sheet = book.sheets[0]
    sheet.rows.each do |row|
      # Skip the header
      next if row.values[0] == 'Name' || row.values[1] == 'Abbreviation'
      cells = row.values
      # Add cells to new Unit
      unit = Unit.new(name: cells[0], abbrev: cells[1], desc: cells[2])
      units << unit
    end
    # Import the unit
    Unit.import(units)
  ensure
    temp_unit.close
    temp_unit.unlink # deletes the temp file
  end
end
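The tempfile steps in that worker (binary write, flush, hand over the path, always clean up) can be pulled into a small helper so they are not repeated in every worker. This is a sketch in plain Ruby; with_tempfile_copy is a hypothetical name, and in the worker the bytes would come from import.file.download.

```ruby
require 'tempfile'

# Hypothetical helper: write `bytes` to a tempfile with the given extension,
# yield its path, and delete the file afterwards even if the block raises.
def with_tempfile_copy(bytes, extension)
  file = Tempfile.new(['import', extension])
  file.binmode
  file.write(bytes)
  file.flush # make sure readers that open the path see every byte
  yield file.path
ensure
  if file
    file.close
    file.unlink
  end
end

# Assumed usage inside the worker:
#   with_tempfile_copy(import.file.download, '.xlsx') do |path|
#     book = Creek::Book.new(path)
#     # read the rows here, while the file still exists
#   end
```

The file is removed as soon as the block returns, so anything that reads from the path has to happen inside the block.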
Rails.application.routes.url_helpers.rails_blob_path doesn’t return the path to the file on disk. Rather, it returns a path that can be combined with a hostname to produce a URL for downloading the file, for use in links.
You have two options:
If you’d prefer to keep ImportUnitWorker indifferent to the storage service in use, “download” the file to a tempfile on disk. Switch to Rails master and use ActiveStorage::Blob#open:
def perform(import_id)
  import = ImportTemp.find(import_id)
  units = []
  import.file.open do |file|
    book = Creek::Book.new(file.path)
    sheet = book.sheets[0]
    # ...
  end
  Unit.import(units)
end
If you don’t mind ImportUnitWorker knowing that you use the disk service, ask the service for the path to the file on disk. ActiveStorage::Service::DiskService#path_for(key) is private in Rails 5.2, so either forcibly call it with send or upgrade to Rails master, where it’s public:
def perform(import_id)
  import = ImportTemp.find(import_id)
  units = []
  path = ActiveStorage::Blob.service.send(:path_for, import.file.key)
  book = Creek::Book.new(path)
  sheet = book.sheets[0]
  # ...
  Unit.import(units)
end
The answer now seems to be (unless I am missing something):
Creek::Book.new file.service_url, check_file_extension: false, remote: true

How to handle a file_as_string (generated by Prawn) so that it is accepted by Carrierwave?

I'm using Prawn to generate a PDF from the controller of a Rails app,
...
respond_to do |format|
format.pdf do
pdf = GenerateReportPdf.new(@object, view_context)
send_data pdf.render, filename: "Report", type: "application/pdf", disposition: "inline"
end
end
This works fine, but I now want to move GenerateReportPdf into a background task, and pass the resulting object to Carrierwave to upload directly to S3.
The worker looks like this
def perform
  pdf = GenerateReportPdf.new(@object)
  fileString = ???????
  document = Document.new(
    object_id: @object.id,
    file: fileString )
  # file is field used by Carrierwave
end
How do I handle the object returned by Prawn (?????) to ensure it is in a format that can be read by Carrierwave?
fileString = pdf.render_file 'filename' writes the object to the root directory of the app. As I'm on Heroku this is not possible.
file = pdf.render returns ArgumentError: string contains null byte
fileString = StringIO.new( pdf.render_file 'filename' ) returns TypeError: no implicit conversion of nil into String
fileString = StringIO.new( pdf.render ) returns ActiveRecord::RecordInvalid: Validation failed: File You are not allowed to upload nil files, allowed types: jpg, jpeg, gif, png, pdf, doc, docx, xls, xlsx
fileString = File.open( pdf.render ) returns ArgumentError: string contains null byte
....and so on.
What am I missing? StringIO.new( pdf.render ) seems like it should work, but I'm unclear why it's generating this error.
It turns out StringIO.new( pdf.render ) should indeed work.
The problem I was having was that the filename was being set incorrectly. Despite following the advice below on Carrierwave's wiki, a bug elsewhere in the code meant that the filename was returned as an empty string. I'd overlooked this and assumed that something else was needed.
https://github.com/carrierwaveuploader/carrierwave/wiki/How-to:-Upload-from-a-string-in-Rails-3
my code ended up looking like this
def perform
  pdf = GenerateReportPdf.new(@object)
  s = StringIO.new(pdf.render)
  def s.original_filename; "my file name"; end
  document = Document.new(
    object_id: @object.id
  )
  document.file = s
  document.save!
end
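For anyone puzzled by the earlier errors: File.open treats its argument as a path, which is why passing the rendered PDF bytes raises "string contains null byte", while StringIO wraps the bytes themselves. A small sketch in plain Ruby, where the helper names are hypothetical and pdf_bytes is a made-up stand-in for pdf.render:

```ruby
require 'stringio'

# Hypothetical helper: try to use the bytes as a path, returning the
# error message instead of raising.
def try_open_as_path(bytes)
  File.open(bytes) { |file| file.read }
rescue ArgumentError => e
  e.message
end

# Hypothetical helper: wrap the bytes in memory and give this one instance
# the original_filename method that CarrierWave asks for.
def named_string_io(bytes, filename)
  io = StringIO.new(bytes)
  io.define_singleton_method(:original_filename) { filename }
  io
end

# Made-up stand-in for pdf.render: binary data containing a NUL byte.
pdf_bytes = "%PDF-1.4\n\x00\x01 binary payload".b

puts try_open_as_path(pdf_bytes)
puts named_string_io(pdf_bytes, 'report.pdf').original_filename
```

The filename passed in needs an extension that the uploader's whitelist accepts, which is what the empty filename in my case was breaking.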
You want to create a tempfile (which is fine on Heroku as long as you don't expect it to persist across requests).
def perform
  # Create instance of your Carrierwave Uploader
  uploader = MyUploader.new
  # Generate your PDF
  pdf = GenerateReportPdf.new(@object)
  # Create a tempfile; the .pdf suffix keeps the extension visible to the uploader
  tmpfile = Tempfile.new(["my_filename", ".pdf"])
  # set to binary mode to avoid UTF-8 conversion errors
  tmpfile.binmode
  # Use render to write the file contents
  tmpfile.write pdf.render
  # Flush buffered bytes to disk before the uploader reads the file
  tmpfile.flush
  # Upload the tempfile with your Carrierwave uploader
  uploader.store! tmpfile
  # Close the tempfile and delete it
  tmpfile.close
  tmpfile.unlink
end
Here's a way you can use StringIO like Andy Harvey mentioned, but without adding a method to the StringIO instance's eigenclass.
class VirtualFile < StringIO
  attr_accessor :original_filename
  def initialize(string, original_filename)
    @original_filename = original_filename
    super(string)
  end
end
def perform
  pdf_string = GenerateReportPdf.new(@object).render
  file = VirtualFile.new(pdf_string, 'filename.pdf')
  document = Document.new(object_id: @object.id, file: file)
end
This one took me a couple of days. The key is to call render_file with a file path you control, so you can keep track of the file. Something like this:
In one of my models (e.g. Policy) I have a list of documents. This is the method for updating the model connected to Carrierwave (e.g. PolicyDocument < ApplicationRecord with mount_uploader :pdf_file, PdfDocumentUploader):
def upload_pdf_document_file_to_s3_bucket(document_type, file_path)
  policy_document = self.policy_documents.where(policy_document_type: document_type)
                        .where(status: 'processing')
                        .where(pdf_file: nil).last
  policy_document.pdf_file = File.open(file_path, "rb")
  policy_document.status = 's3_uploaded'
  policy_document.save(validate: false)
  policy_document
rescue => e
  # policy_document is nil if the lookup failed or matched nothing
  if policy_document
    policy_document.status = 's3_uploaded_failed'
    policy_document.save(validate: false)
  end
  Rails.logger.error "Error uploading policy documents: #{e.inspect}"
end
In one of my Prawn PDF file generators (e.g. PolicyPdfDocumentX), note how I render the file and return the file path so the worker can pick it up:
def generate_prawn_pdf_document
Prawn::Document.new do |pdf|
pdf.draw_text "Hello World PDF File", size: 8, at: [370, 462]
pdf.start_new_page
pdf.image Rails.root.join('app', 'assets', 'images', 'hello-world.png'), width: 550
end
end
def generate_tmp_file(filename)
  file_path = File.join(Rails.root, "tmp/pdfs", filename)
  self.generate_prawn_pdf_document.render_file(file_path)
  return file_path
end
In the "global" worker for creating files and uploading them to the S3 bucket (e.g. PolicyDocumentGeneratorWorker):
def perform(filename, document_type, policy)
#here we create the instance of the prawn pdf generator class
pdf_generator_class = document_type.constantize.new
#here we are creating the file, but also `returning the filepath`
file_path = pdf_generator_class.generate_tmp_file(filename)
#here we are simply updating the model with the new file created
policy.upload_pdf_document_file_to_s3_bucket(document_type, file_path)
end
Finally, to test it, run rails c and:
the_policy = Policies.where....
PolicyDocumentGeneratorWorker.new.perform('report_x.pdf', 'PolicyPdfDocumentX',the_policy)
NOTE: I'm using metaprogramming in case we have multiple different file generators. constantize.new just creates a new Prawn PDF generator instance, so it is similar to PolicyPdfDocument.new. That way a single worker class can handle all of your Prawn PDF documents; if you need a new document, you can simply call PolicyDocumentGeneratorWorker.new.perform('report_y.pdf', 'PolicyPdfDocumentY', the_policy)
:D
hope this helps someone to save some time

Rails: Upload CSV file without header

I followed railscasts #396 Importing CSV and implemented CSV upload in my rails project.
This is my view file:
<%= form_tag import_customers_path, multipart: true do %>
<%= file_field_tag :file %>
<%= submit_tag "Import" %>
<% end %>
This is my controller action:
def import
current_user.customers.import(params[:file])
redirect_to customers_path, notice: "Users imported."
end
And these are my model methods:
def self.to_csv(options = {})
CSV.generate(options) do |csv|
csv << column_names
all.each do |customer|
csv << customer.attributes.values_at(*column_names)
end
end
end
def self.import(file)
CSV.foreach(file.path, headers: true) do |row|
Customer.create! row.to_hash
end
end
Here I don't want the user to have to include a header in the CSV. When I replace headers: true with headers: false, I get this error:
NoMethodError in CustomersController#import
undefined method `to_hash' for ["abc@wer.com"]:Array
Can anybody tell me how to upload CSV files without needing a header line?
As far as upload and handling of the CSV file goes, you're very, very close. You just have an issue with reading the rows of data to populate the database with, via the Customer.create! call
It looks like you've been testing with a CSV file that only has a single line of data. With the headers: true, that single line was converted to headers and subsequently ignored in the CSV.foreach iterator. So, in effect, you had no data in the file, and no iterations occurred. If you had two rows of data in the input file, you'd have encountered the error, anyway.
Now, when you use headers: false, that line of data is treated as data. And that's where the issue lies: handling the data isn't done correctly.
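The difference is easy to check in plain Ruby. In this sketch the one-line input mirrors the email address from the error message, and count_rows is a hypothetical helper that counts how many rows the parser yields:

```ruby
require 'csv'

# Hypothetical helper: number of data rows CSV yields for the given options.
def count_rows(csv_text, **options)
  count = 0
  CSV.parse(csv_text, **options) { |_row| count += 1 }
  count
end

one_line = "abc@wer.com\n"

# With headers: true the only line is consumed as the header, leaving no rows.
with_headers = count_rows(one_line, headers: true)
# With headers: false the same line is yielded as a plain Array.
without_headers = count_rows(one_line, headers: false)
first_row = CSV.parse(one_line, headers: false).first

puts "headers: true  -> #{with_headers} row(s)"
puts "headers: false -> #{without_headers} row(s), first row: #{first_row.inspect}"
```

An Array has no to_hash method, which is exactly the NoMethodError from the question.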
Since there's no schema in your question, I'll assume a little bit of leeway on fields; you should be able to extrapolate pretty easily to make it work in your situation. This code shows how it works:
CSV.parse(csv_data, headers: false) do |row|
hash = {
first_name: row[0],
last_name: row[1],
age: row[2],
phone: row[3],
address: row[4]
}
Customer.create!(hash)
end
If you wanted a CSV version with headers, this would work well in this case, and has the benefit of not allowing arbitrary access to columns that shouldn't be assigned from an outside source:
CSV.parse(csv_data, headers: true, header_converters: :symbol) do |row|
  hash = {
    first_name: row[:first_name],
    surname: row[:last_name],
    # row values are strings, so convert the age before subtracting
    birth_year: Date.today.year - row[:age].to_i,
    phone: row[:phone],
    street_address: row[:address]
  }
  Customer.create!(hash)
end
Note that the Customer.to_csv in your model does not fit this use, either. First, it creates the CSV file with a header, so you wouldn't be able to export and then import again with a headerless importer. Next, column_names is ActiveRecord's list of every column, so the export includes id, created_at and updated_at, which you probably don't want to assign from an imported file. Listing the columns explicitly avoids both problems and keeps the column order under your control. A non-header version of this is very simple:
csv_data = CSV.generate do |csv|
  Customer.all.each do |customer|
    csv << [customer.first_name, customer.last_name, customer.age, customer.phone, customer.address]
  end
end
The header-based version is this:
csv_data = CSV.generate do |csv|
  csv << ["First Name","Last Name","Age","Phone","Address"]
  Customer.all.each do |customer|
    csv << [customer.first_name, customer.last_name, customer.age, customer.phone, customer.address]
  end
end
Personally, I'd use the header-based version, because it's far more robust, and it's easy to understand which columns are which. If you've ever received a headerless CSV file and had to figure out how to make sense of it without any keys, you'd know why the header is important.
You could just load the CSV file into an array of arrays and remove the first row:
data = CSV.read("path/to/file.csv")
data = data[1..-1]
However this will store the data as an array of values only.
When you use headers: true it uses a hash where the keys are the column header names.
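If the file has no header line at all, the column names have to come from your code instead. A minimal sketch, assuming a hypothetical column order of first_name, last_name, age, phone, address; COLUMNS and rows_to_attributes are illustrative names, not part of the CSV library:

```ruby
require 'csv'

# Assumed column order of the headerless file (hypothetical field names).
COLUMNS = %i[first_name last_name age phone address].freeze

# Turn each headerless row (an Array) into a Hash keyed by COLUMNS.
def rows_to_attributes(csv_text)
  CSV.parse(csv_text, headers: false).map do |values|
    COLUMNS.zip(values).to_h
  end
end

rows_to_attributes("Ann,Lee,30,555-0100,1 Main St\n").each do |attributes|
  puts attributes.inspect
end
```

Each resulting hash could then be passed to Customer.create!, and because the keys are fixed in code, a file cannot set attributes you did not list.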

How can I save the response created by my Rails application?

My application has a CSV export for some objects (such as tasks, contacts, etc.). It just renders a CSV file like this:
respond_to do |format|
format.html
format.csv { render text: Task.to_csv } # I have self.to_csv def in model
end
It generates a CSV file when I go to '/tasks.csv' without a problem.
Now I want to export all the objects and zip them. I'm using the rubyzip gem to create zip files. My code for creating a zip file with all the CSVs currently looks like this:
Zip::ZipFile.open("#{path_to_file}.zip", Zip::ZipFile::CREATE) do |zipfile|
zipfile.file.open("tasks.csv", "w") { |f| f << open("http://#{request.host}:#{request.port.to_s}/tasks.csv").read }
# the same lines for contacts and other objects
end
But it seems that there is something wrong with it because it's executing for a long time (I'm getting Timeout::Error even if there is just one line in CSV) and the resulting zip-archive contains something broken.
How can I save my "/tasks.csv", "/contacts.csv", etc as a file on server (inside of zip-archive in this case)?
I did it! The code is:
Zip::ZipFile.open("#{path_to_file}.zip", Zip::ZipFile::CREATE) do |zipfile|
zipfile.file.open("tasks.csv", "w") do |f|
CSV.open(f, "w") do |csv|
CSV.parse(Task.to_csv) { |row| csv << row }
end
end
end
