save and show scraping image with nokogiri rails - ruby-on-rails

UPDATE 2
I use Nokogiri to download content and images into a Rails app. For this, I wrote the script/scraping.rb file:
require 'open-uri'

page_url = "<site_url>"
url = "<page_of_site_url>"

# URI.open is used instead of a bare open(url), which no longer accepts URLs on Ruby 3.0+.
b = Nokogiri::HTML(URI.open(url))
b.css(".col").each do |item|
  # Drop the query string and resolve the image path against the site root.
  img_url = item.at(".field-content a img")['src'].split('?')[0]
  root_img_url = URI.join(page_url, img_url).to_s
  # Save the image into app/assets/images under its original file name.
  File.open(File.join(Rails.root, 'app', 'assets', 'images', File.basename(root_img_url)), 'wb') do |f|
    f.write(URI.open(root_img_url).read)
  end
  Book.create(
    :cover => File.basename(root_img_url),
    :title => item.at_css(".views-field-title a").text,
    :onsaledate => item.at_css(".date-display-single")['content'])
end
Then I run the script in the terminal:
rails runner script/scraping.rb
Seconds later, I have my database with the content info, and the images in the folder assets/images.
Thanks to @JoshLewis for his interest and special thanks to @mario_chavez for his help.
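The URL handling in the script can be checked in isolation, without touching the network. A minimal sketch, assuming a hypothetical helper name `image_target` and example.com URLs that are not from the original post:

```ruby
require 'uri'

# Hypothetical helper (not part of the original script): given the site root
# and an <img> src attribute, return the absolute image URL and the file name
# the script would store in Book#cover.
def image_target(page_url, src)
  path = src.split('?')[0]                   # drop the query string
  absolute = URI.join(page_url, path).to_s   # resolve relative paths against the site root
  [absolute, File.basename(absolute)]
end

absolute, filename = image_target("http://example.com/", "/sites/default/files/cover.jpg?itok=abc")
puts absolute
puts filename
```

Keeping this logic in one small method makes it easy to try against the `src` values a given site actually uses before running the full scrape.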

Related

RubyZip : Unable to find path of file stored in Active Storage

I am using Rails 5.2 and ActiveStorage to let my users upload files in my app. I have all the uploaded files displayed in a table and I can select which ones I want to download. After selecting, I should be able to download all of them in a zip file. To zip the files, I am using Rubyzip and I can't get it to work properly.
I have tried two ways:
1 - I tried this way and got this error:
No such file or directory @ rb_sysopen - /rails/active_storage/blobs/..../2234.py
This is my controller:
def batch_download
if params["record"].present?
ids = params["record"].to_unsafe_h.map(&:first)
if ids.present?
folder_path = "#{Rails.root}/public/downloads/"
zipfile_name = "#{Rails.root}/public/archive.zip"
FileUtils.remove_dir(folder_path) if Dir.exist?(folder_path)
FileUtils.remove_entry(zipfile_name) if File.exist?(zipfile_name)
Dir.mkdir("#{Rails.root}/public/downloads")
Record.where(id: ids).each do |attachment|
open(folder_path + "#{attachment.file.filename}", 'wb') do |file|
file << open("#{rails_blob_path(attachment.file)}").read
end
end
input_filenames = Dir.entries(folder_path).select {|f| !File.directory? f}
Zip::File.open(zipfile_name, Zip::File::CREATE) do |zipfile|
input_filenames.each do |attachment|
zipfile.add(attachment,File.join(folder_path,attachment))
end
end
send_file(File.join("#{Rails.root}/public/", 'archive.zip'), :type => 'application/zip', :filename => "#{Time.now.to_date}.zip")
end
else
redirect_back fallback_location: root_path
end
end
2 - Secondly, I tried to follow the rubyzip documentation, and the error was a bit different:
No such file or directory @ rb_file_s_lstat - /rails/active_storage/blobs/..../2234.py
if ids.present?
folder = []
input_filenames = []
Record.where(id: ids).each do |attachment|
input_filenames.push("#{attachment.file.filename}")
pre_path = "/rails/active_storage/blobs/"
path_find = "#{rails_blob_path(attachment.file)}"
folder.push(pre_path + path_find.split('/')[4])
end
container = Hash[folder.zip(input_filenames)]
zipfile_name = "/Users/fahimabdullah/Documents/archive.zip"
Zip::File.open(zipfile_name, Zip::File::CREATE) do |zipfile|
# input_filenames.each do |filename|
container.map do |path, filename|
zipfile.add(filename, File.join(path, filename))
end
zipfile.get_output_stream("myFile") { |f| f.write "myFile contains just this" }
end
I expect it to download a zip file containing all the files inside. And this is my first question so please excuse me if the question was too long. Thank you.
I just had a similar problem and saw this hasn't been answered yet. Even though it's a little older and you probably already solved this, here is my approach. The trick was to create a tempfile in between:
def whatever
zip_file = Tempfile.new('invoices.zip')
Zip::File.open(zip_file.path, Zip::File::CREATE) do |zipfile|
invoices.each do |invoice|
next unless invoice.attachment.attached?
overlay = Tempfile.new(['overlay', '.pdf'])
overlay.binmode
overlay.write(invoice.attachment.download)
overlay.close
overlay.path
zipfile.add(invoice.filename, File.join(overlay.path))
end
end
invoices_zip = File.read(zip_file.path)
UserMailer.with(user: user).invoice_export(invoices_zip, 'invoices.zip').deliver_now
ensure
zip_file.close
zip_file.unlink
end
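The "tempfile in between" step is needed because rubyzip's `add` takes a path on the local disk, while the attachment only hands back bytes. Below is a minimal sketch of just that step using the standard library. The helper name `with_local_copies` and the sample data are assumptions for illustration; in the answer above the bytes would come from `invoice.attachment.download`. The copies are kept until the block returns, on the assumption that rubyzip may not read the source files until the archive is actually written.

```ruby
require 'tempfile'

# Hypothetical helper: write each {file name => raw bytes} pair to its own
# temp file, yield a {file name => local path} hash, then delete the copies.
def with_local_copies(blobs)
  files = blobs.map do |name, bytes|
    ext = File.extname(name)
    file = Tempfile.new([File.basename(name, ext), ext])
    file.binmode          # binary data: no newline or encoding translation
    file.write(bytes)
    file.close            # flush to disk; the path stays valid until unlink
    [name, file]
  end
  yield files.to_h { |name, file| [name, file.path] }
ensure
  files&.each { |_, file| file.unlink }
end

sizes = with_local_copies('a.pdf' => '%PDF-1.4'.b, 'b.txt' => 'hello') do |paths|
  # Zip::File.open(...) { |zip| paths.each { |name, path| zip.add(name, path) } }
  # would go here, so the copies outlive the whole archive write.
  paths.transform_values { |path| File.size(path) }
end
p sizes
```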

Export large amount of data in a zip

I'm exporting some data from my server to the client.
It's a zip archive, but when the amount of data is too big, the request times out!
# In my controller
def export
  filename = 'my_archive.zip'
  temp_file = Tempfile.new(filename)
  begin
    # Initialize the temp file as an empty zip archive
    Zip::OutputStream.open(temp_file) { |zos| }
    Zip::File.open(temp_file.path, Zip::File::CREATE) do |zip|
      @videos.each do |v|
        video_file_name = v.title + '.mp4'
        zip.add(video_file_name, v.source.file.path(:original))
      end
    end
    zip_data = File.read(temp_file.path)
    send_data(zip_data, :type => 'application/zip', :filename => filename)
  ensure
    temp_file.close
    temp_file.unlink
  end
end
I'm using Paperclip to attach videos in my app.
Is there any way to create and send the zip (with a stream?) without such a long wait?
You could try the zipline gem. It claims to be "Hacks on Hacks on Hacks" so heads up! Looks very easy to use though, worth a shot.

Invalid encoding with rqrcode

I'm having an invalid encoding error that doesn't let me save the image to a carrierwave uploader.
require 'rqrcode_png'
img = RQRCode::QRCode.new( 'test', :size => 4, :level => :h ).to_img.to_s
img.valid_encoding?
=> false
I'm not sure if this is what you're looking for. In my case I needed to associate the generated QR code with a Rails model using CarrierWave. What I ended up doing was saving the image to a temp file, associating that file with the model, and deleting the temp file afterwards. Here's my code:
def generate_qr_code!
tmp_path = Rails.root.join('tmp', "some-filename.png")
tmp_file = RQRCode::QRCode.new(self.hash_value).to_img.resize(200,200).save(tmp_path)
# Stream is handed closed, we need to reopen it
File.open(tmp_file.path) do |file|
self.qr_code = file
end
File.delete(tmp_file.path)
self.save!
end
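On the `valid_encoding?` check in the question: PNG data is raw bytes, and Ruby reports raw bytes as invalid whenever the string is tagged with a text encoding such as UTF-8. Whether that is what happens inside rqrcode_png is an assumption here. The sketch below only shows the general Ruby behavior, with a hand-written PNG signature standing in for the real image data:

```ruby
# First eight bytes of every PNG file, tagged (wrongly) as UTF-8 text.
png_bytes = "\x89PNG\r\n\x1A\n".dup.force_encoding(Encoding::UTF_8)
puts png_bytes.valid_encoding?   # the 0x89 byte cannot start a UTF-8 character

# Retagging the same bytes as binary changes no data, only the label.
binary = png_bytes.dup.force_encoding(Encoding::BINARY)
puts binary.valid_encoding?
puts binary.bytesize
```

Writing the image to a file in binary mode, as the answer above does, sidesteps the question of string encodings entirely.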

Migrating paperclip S3 images to new url/path format

Is there a recommended technique for migrating a large set of paperclip S3 images to a new :url and :path format?
The reason is that after upgrading to Rails 3.1, new versions of thumbs are not shown after cropping (the previously cached version is shown instead). This is because the filename no longer changes (since asset_timestamp was removed in Rails 3.1). I'm using :fingerprint in the url/path format, but this is generated from the original, which doesn't change when cropping.
I was intending to insert :updated_at in the url/path format, and update attachment.updated_at during cropping, but after implementing that change all existing images would need to be moved to their new location. That's around half a million images to rename over S3.
At this point I'm considering copying them to their new location first, then deploying the code change, then moving any images which were missed (ie uploaded after the copy), but I'm hoping there's an easier way... any suggestions?
I had to change my Paperclip path in order to support image cropping, so I ended up creating a rake task to help out.
namespace :paperclip_migration do
desc 'Migrate data'
task :migrate_s3 => :environment do
# Make sure that all of the models have been loaded so any attachments are registered
puts 'Loading models...'
Dir[Rails.root.join('app', 'models', '**/*')].each { |file| File.basename(file, '.rb').camelize.constantize }
# Iterate through all of the registered attachments
puts 'Migrating attachments...'
attachment_registry.each_definition do |klass, name, options|
puts "Migrating #{klass}: #{name}"
klass.find_each(batch_size: 100) do |instance|
attachment = instance.send(name)
unless attachment.blank?
attachment.styles.each do |style_name, style|
old_path = interpolator.interpolate(old_path_option, attachment, style_name)
new_path = interpolator.interpolate(new_path_option, attachment, style_name)
# puts "#{style_name}:\n\told: #{old_path}\n\tnew: #{new_path}"
s3_copy(s3_bucket, old_path, new_path)
end
end
end
end
puts 'Completed migration.'
end
#############################################################################
private
# Paperclip Configuration
def attachment_registry
Paperclip::AttachmentRegistry
end
def s3_bucket
ENV['S3_BUCKET']
end
def old_path_option
':class/:id_partition/:attachment/:hash.:extension'
end
def new_path_option
':class/:attachment/:id_partition/:style/:filename'
end
def interpolator
Paperclip::Interpolations
end
# S3
def s3
AWS::S3.new(access_key_id: ENV['S3_KEY'], secret_access_key: ENV['S3_SECRET'])
end
def s3_copy(bucket, source, destination)
source_object = s3.buckets[bucket].objects[source]
destination_object = source_object.copy_to(destination, {metadata: source_object.metadata.to_h})
destination_object.acl = source_object.acl
puts "Copied #{source}"
rescue Exception => e
puts "*Unable to copy #{source} - #{e.message}"
end
end
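To see what the new path option in the task expands to, here is a simplified stand-in for the interpolation, written without Paperclip. It assumes the usual `:id_partition` behavior for integer ids (zero-pad to nine digits, then split into three directories) and uses hypothetical values for class, attachment, style, and file name. The old format is left out because `:hash` depends on the app's hash secret.

```ruby
# Simplified stand-in for Paperclip's :id_partition (integer ids only).
def id_partition(id)
  format('%09d', id).scan(/\d{3}/).join('/')
end

# Expand ':class/:attachment/:id_partition/:style/:filename' for one
# hypothetical record.
def new_path(klass:, attachment:, id:, style:, filename:)
  [klass, attachment, id_partition(id), style, filename].join('/')
end

puts id_partition(1234)
puts new_path(klass: 'users', attachment: 'avatars', id: 1234, style: 'thumb', filename: 'me.png')
```

Printing a few expanded keys like this before running the task is a cheap way to confirm the copy targets look right, similar to the commented-out `puts` inside the loop.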
Didn't find a feasible method for migrating to a new url format. I ended up overriding Paperclip::Attachment#generate_fingerprint so it appends :updated_at.

Storing image using open URI and paperclip having size less than 10kb

I want to import some icons from my old site. The icons are smaller than 10kb, so when I try to import them, I get back a stringio.txt file.
require "open-uri"
class Category < ActiveRecord::Base
has_attached_file :icon, :path => ":rails_root/public/:attachment/:id/:style/:basename.:extension"
def icon_from_url(url)
self.icon = open(url)
end
end
In the rake task:
category = Category.new
category.icon_from_url "https://xyz.com/images/dog.png"
category.save
Try:
def icon_from_url(url)
extname = File.extname(url)
basename = File.basename(url, extname)
file = Tempfile.new([basename, extname])
file.binmode
open(URI.parse(url)) do |data|
file.write data.read
end
file.rewind
self.icon = file
end
To override the default filename of a "fake file upload" in Paperclip (stringio.txt on small files or an almost random temporary name on larger files) you have 2 main possibilities:
Define an original_filename on the IO:
def icon_from_url(url)
io = open(url)
io.original_filename = "foo.png"
self.icon = io
end
You can also get the filename from the URI:
io.original_filename = File.basename(URI.parse(url).path)
Or replace :basename in your :path:
has_attached_file :icon, :path => ":rails_root/public/:attachment/:id/:style/foo.png", :url => "/:attachment/:id/:style/foo.png"
Remember to always change the :url when you change the :path, otherwise the icon.url method will be wrong.
You can also define your own custom interpolations (e.g. :rails_root/public/:whatever).
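Taking the file name from the parsed URI rather than from the raw string matters when the URL carries a query string. A small sketch, where the `?v=2` suffix is a hypothetical addition to the URL from the question:

```ruby
require 'uri'

url = "https://xyz.com/images/dog.png?v=2"

# On the raw string, the query string leaks into the name.
raw_name = File.basename(url)
puts raw_name

# URI#path excludes the query string, so the name comes out clean.
clean_name = File.basename(URI.parse(url).path)
puts clean_name
```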
You are almost there, I think. Try opening the parsed URI, not the string.
require "open-uri"
class Category < ActiveRecord::Base
has_attached_file :icon, :path => ":rails_root/public/:attachment/:id/:style/:basename.:extension"
def icon_from_url(url)
self.icon = open(URI.parse(url))
end
end
Of course, this doesn't handle errors.
You can also disable OpenURI from ever creating a StringIO object, and force it to create a temp file instead. See this SO answer:
Why does Ruby open-uri's open return a StringIO in my unit test, but a FileIO in my controller?
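The approach usually given for this is to lower open-uri's in-memory threshold to zero, so that every download is spooled to a Tempfile. The sketch below assumes the threshold lives in the internal constant `OpenURI::Buffer::StringMax`. That constant is undocumented and could change between Ruby versions, so treat this as a workaround rather than a supported setting:

```ruby
require 'open-uri'

# Replace the internal size threshold below which open-uri keeps the
# response body in a StringIO. With 0, any non-empty body should be
# written to a Tempfile instead.
OpenURI::Buffer.send(:remove_const, :StringMax) if OpenURI::Buffer.const_defined?(:StringMax)
OpenURI::Buffer.const_set(:StringMax, 0)

puts OpenURI::Buffer::StringMax
```

In a Rails app this would typically go in an initializer so it runs once at boot.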
In the past, I found the most reliable way to retrieve remote files was by using the command line tool "wget". The following code is mostly copied straight from an existing production (Rails 2.x) app with a few tweaks to fit with your code examples:
class CategoryIconImporter
  # Initialized here so the nil checks below do not raise NameError
  @@tempfile = nil
  @@wget = nil

  def self.download_to_tempfile(url)
    system(wget_download_command_for(url))
    @@tempfile.path
  end

  def self.clear_tempfile
    @@tempfile.delete if @@tempfile && @@tempfile.path && File.exist?(@@tempfile.path)
    @@tempfile = nil
  end

  def self.set_wget
    # used for retrieval in NrlImage (and in future from other sites?)
    if !@@wget
      stdin, stdout, stderr = Open3.popen3('which wget')
      @@wget = stdout.gets
      @@wget ||= '/usr/local/bin/wget'
      @@wget.strip!
    end
  end

  def self.wget_download_command_for(url)
    set_wget
    @@tempfile = Tempfile.new url.sub(/\?.+$/, '').split(/[\/\\]/).last
    command = [ @@wget ]
    command << '-q'
    if url =~ /^https/
      command << '--secure-protocol=auto'
      command << '--no-check-certificate'
    end
    command << '-O'
    command << @@tempfile.path
    command << url
    command.join(' ')
  end

  def self.import_from_url(category_params, url)
    clear_tempfile
    filename = url.sub(/\?.+$/, '').split(/[\/\\]/).last
    found = MIME::Types.type_for(filename)
    content_type = !found.empty? ? found.first.content_type : nil
    download_to_tempfile url
    nicer_path = RAILS_ROOT + '/tmp/' + filename
    File.copy @@tempfile.path, nicer_path
    Category.create(category_params.merge({:icon => ActionController::TestUploadedFile.new(nicer_path, content_type, true)}))
  end
end
The rake task logic might look like:
[
  ['Cat', 'cat'],
  ['Dog', 'dog'],
].each do |name, icon|
  # Parentheses are required here; otherwise Ruby parses the hash literal as a block.
  CategoryIconImporter.import_from_url({:name => name}, "https://xyz.com/images/#{icon}.png")
end
This uses the mime-types gem for content type discovery:
gem 'mime-types', :require => 'mime/types'
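One design note on the command building in the importer: joining the pieces with spaces hands the URL to a shell, so a URL containing `&` or `;` can break the command or run something unintended. A minimal sketch of an alternative that passes the arguments as an array, so no shell is involved. The helper name `wget_args`, the destination path, and the query string on the URL are assumptions for illustration:

```ruby
# Hypothetical helper: build the wget argument list as an array.
def wget_args(url, destination, wget: 'wget')
  [wget, '-q', '-O', destination, url]
end

args = wget_args('https://xyz.com/images/dog.png?size=1&v=2', '/tmp/dog.png')
p args

# With multiple arguments, Kernel#system runs the program directly and
# passes each element through untouched, so the "&" is not interpreted.
# system(*args)   # uncomment where wget is installed
```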
