Extracting excel data after unzipping using rubyzip - ruby-on-rails

I'm trying to get spreadsheet data from zipped .xlsx files. I'm using rubyzip to access the contents of the zip file:
Zip::File.open(file_path) do |zip_file|
  zip_file.each do |entry|
    # process entry
  end
end
My problem is that rubyzip gives me a Zip::Entry object, which I can't get to work with gems like roo or creek.
I've done something similar with a .csv file, which was as simple as CSV.parse(entry.get_input_stream.read). However, that just gives me a string of encoded gibberish when I use it on an .xlsx file.
I've looked around, and the closest answer I found was to temporarily extract the files, but I want to avoid doing this since the files can get pretty large.
Does anyone have any suggestions? Thanks in advance.

So what you need to do is convert the stream into an IO object that Roo can understand.
To determine if the object passed to Roo::Spreadsheet.open is a "stream", Roo uses the following method:
def is_stream?(filename_or_stream)
  filename_or_stream.respond_to?(:seek)
end
Since a Zip::InputStream does not respond to seek, you cannot use this object directly. To get around this we simply need an object that does respond to seek (like a StringIO).
We can just read the input stream into the StringIO directly:
stream = StringIO.new(entry.get_input_stream.read)
Or the Zip library also provides a method to copy a Zip::InputStream to another IO object through the IOExtras module, which I think reads fairly nicely as well.
Knowing all of the above we can implement as follows:
Zip::File.open(file_path) do |zip_file|
  zip_file.each do |entry|
    # make sure Roo can handle the file (at least based on the extension);
    # Roo keys its readers by extension symbol without the dot, e.g. :xlsx
    ext = File.extname(entry.name).delete('.').downcase.to_sym
    next unless Roo::CLASS_FOR_EXTENSION[ext]
    # stream = StringIO.new(entry.get_input_stream.read)
    ::Zip::IOExtras.copy_stream(stream = StringIO.new, entry.get_input_stream)
    stream.rewind # move back to the start of the copied data
    spreadsheet = Roo::Spreadsheet.open(stream, extension: ext)
    # process file
  end
end
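A minimal sketch of the seek check described above, using only the standard library. The to_seekable helper and the ForwardOnlyStream class are hypothetical names introduced for illustration; they are not part of rubyzip or Roo. The idea is simply to buffer any stream that cannot seek into a StringIO, which can.

```ruby
require 'stringio'

# Hypothetical helper (not part of rubyzip or Roo): returns the object
# unchanged if it can already seek, otherwise buffers it into a StringIO.
def to_seekable(io)
  return io if io.respond_to?(:seek)

  StringIO.new(io.read)
end

# Hypothetical stand-in for a forward-only stream: it can be read,
# but it has no seek method.
class ForwardOnlyStream
  def initialize(data)
    @data = data
  end

  def read
    @data
  end
end

stream = to_seekable(ForwardOnlyStream.new('spreadsheet bytes'))
stream.seek(12)  # jump ahead, which the original stream could not do
tail = stream.read
```

Note that this buffers the whole entry in memory, so it trades disk usage for RAM; for very large files that may or may not be acceptable.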

Related

How can I get an image from a direct download link in ruby?

I'm pulling records from a website and attempting to store one of the images it returns (a BMP file). The issue is that the website only returns a direct download link, with no preview. It's a lot like this link (except mine is a BMP, not a PDF).
There's no preview, just an immediate download.
There doesn't appear to be a way to generate a different link, and I don't know how to handle this url with rails! I just need to save it to my project/local file tree. Any ideas?
You have to perform 3 operations: get the contents of the URL, write them to disk, and attach them to your model.
Get contents of url
Ruby has various libraries to handle HTTP GET requests. There is of course the standard library Net::HTTP, and there are other HTTP client gems, as max said. I have used some of them; my favorite choice is http.rb, but you can choose whatever you like.
Write stream to disk
You should choose a folder and a filename and write the stream.
Attach data to model
There are several gems to deal with attachments too. If you prefer ActiveStorage, you can check the attach method.
A naive implementation may look like this:
# You can use any other HTTP gem or the standard Net::HTTP
gem 'http'
require 'http'
require 'securerandom'
url = 'https://www.ruby-lang.org/images/header-ruby-logo.png'
# You can set timeouts and other options here
response = HTTP.follow(max_hops: 2).get(url)
# You can check for statuses or other responses
return if response.code != 200 || response.content_type.mime_type.nil?
# You can grab filename from url or set another filename
filename = SecureRandom.hex
path = File.join('tmp', filename)
# Write stream somewhere; the block form closes the file when done
File.open(path, 'wb') do |file|
  response.body.each do |chunk|
    file.write(chunk)
  end
end
# Suppose you use ActiveStorage, you can use the `attach` method
your_model.your_attribute.attach(
  io: File.open(path),
  filename: filename,
  content_type: response.content_type.mime_type
)
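For comparison, here is a rough sketch of the same first two steps using only the standard library, plus the "grab filename from url" idea from the comments above. The helper names filename_from_url and download_to are hypothetical, and this version does not follow redirects, so treat it as a starting point rather than a finished implementation.

```ruby
require 'net/http'
require 'securerandom'
require 'uri'

# Hypothetical helper: use the last path segment of the URL as the filename,
# falling back to a random name when the URL has no usable path.
def filename_from_url(url)
  name = File.basename(URI.parse(url).path.to_s)
  name.empty? || name == '/' ? SecureRandom.hex : name
end

# Hypothetical helper: stream the response body to disk chunk by chunk.
# Assumption: the URL answers directly with a 2xx status (no redirects).
def download_to(url, path)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(Net::HTTP::Get.new(uri)) do |response|
      raise "unexpected status #{response.code}" unless response.is_a?(Net::HTTPSuccess)

      File.open(path, 'wb') do |file|
        response.read_body { |chunk| file.write(chunk) }
      end
    end
  end
  path
end

# Example usage (performs a real HTTP request, so it is left commented out):
# url = 'https://www.ruby-lang.org/images/header-ruby-logo.png'
# download_to(url, File.join('tmp', filename_from_url(url)))
```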

Errno::ENOENT: No such file or directory in finding duration of video from S3

I'm using the paperclip-ffmpeg gem in Rails for videos. It works fine, but when I try to find the duration of a video in seconds it gives me an error:
Errno::ENOENT: No such file or directory - the file 'http://getpayad-dev.s3.amazonaws.com/ads/videos/000/000/014/original/Ufone_Tarzan_commercial_%28Ufone_Network_Quality%29_most_Funny_Ad.mp4?1451555000' does not exist
from /home/des0071/.rvm/gems/ruby-2.2.1/gems/streamio-ffmpeg-1.0.0/lib/ffmpeg/movie.rb:11:in `initialize'
My code is
movie = FFMPEG::Movie.new("#{self.video.url}")
Well, the FFMPEG::Movie.new definition is found here: streamio-ffmpeg/movie.rb
raise Errno::ENOENT, "the file '#{path}' does not exist" unless File.exists?(path)
#ruby 2.2.0p0 (2014-12-25 revision 49005)
File.exists?("http://getpayad-dev.s3.amazonaws.com/ads/videos/000/000/014/original/Ufone_Tarzan_commercial_%28Ufone_Network_Quality%29_most_Funny_Ad.mp4?1451555000")
=> false
The problem is with ruby's File Class. So I tried this:
File.exists?("http://www.google.com")
=> false
OK, so either google isn't online or File can't take URI as a parameter.
A File is an abstraction of any file object accessible by the program and is closely associated with class IO. File includes the methods of module FileTest as class methods, allowing you to write (for example) File.exist?("foo").
Class: File Ruby 2.2.0
So, File Class is really a child of IO, what does IO say?
Many of the examples in this section use the File class, the only standard subclass of IO. The two classes are closely associated. Like the File class, the Socket library subclasses from IO (such as TCPSocket or UDPSocket).
Class:IO Ruby 2.2.0
It looks like the reason for the error is that File only deals with paths on the local filesystem, so File.exists? returns false for any URL; the gem is not designed to stream the file over HTTP.
As the previous answer suggests, the ffmpeg gem doesn't appear to be able to fetch the file over HTTP; it's expecting a local file.
Depending upon how your files are encoded, they may have metadata at the beginning or end of the file which contains this information.
Consequently, a potential approach is to grab the first or last ~100KB of the file and check for the MOOV atom/metadata there.
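Since the gem expects a local path, another possible workaround is to copy the remote data into a temporary file first and hand that path to FFMPEG::Movie. The sketch below shows only the copy step with the standard library; with_local_copy is a hypothetical helper name, and it downloads the whole file, which may be slow for large videos.

```ruby
require 'tempfile'
require 'stringio'

# Hypothetical helper: copies any readable IO into a temporary file,
# yields the local path, and removes the file when the block returns.
def with_local_copy(io, extension = '.mp4')
  Tempfile.create(['video', extension]) do |tmp|
    tmp.binmode
    IO.copy_stream(io, tmp)
    tmp.flush # make sure the data is on disk before the path is used
    yield tmp.path
  end
end

# Example usage (assumes the open-uri library and the streamio-ffmpeg gem):
# require 'open-uri'
# URI.open(video.url) do |remote|
#   with_local_copy(remote) { |path| FFMPEG::Movie.new(path).duration }
# end
```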

How to implement a user creating a file via form and then downloading it in rails?

I am building an online translation platform. When a job is done being translated and saved, I want the user to be able to download the translated version as a text file. It is currently saved in a string in the model called "target_text".
I know in ruby I can use this method:
File.open("translation.txt", 'w') {|f| f.write("my translated string") }
I am assuming I could tack the location for the file to be saved in front of the "translation.txt", but I am not sure what folder within my app I should specify?
Furthermore I want this file to be attached to the "job" object in the same way that paperclip can attach files, the difference being it's initiated server side. How should I go about this?
I have googled all over looking for an answer to this, and I want to make sure I do it in the cleanest way possible. I would really appreciate even directions to a good place to look to understand this concept.
I don't quite understand the question, but I hope this could help...
Instead of using
File.open("translation.txt", 'w') {|f| f.write("my translated string") }
try using the following
Tempfile.open(['translation', '.txt'], Rails.root.join('tmp')) do |file|
  # this will create a temp file in the RAILS_ROOT/tmp/ folder
  # you can replace the 'translation' part with any generated text, for example
  # Tempfile.open([@user.id.to_s, '_translation.txt']) will create something like
  # RAILS_ROOT/tmp/1<generated>_translation.txt
  # the <generated> part is added by Tempfile to avoid conflicts
  begin
    file << "my translated string"
    file.flush # make sure the data is on disk before anything reads the file
    # this creates the file
    # add all the processing you need here... because the ensure block below
    # will close and delete this temp file... so that the tmp dir doesn't get big.
    # you can for example add the file to a paperclip attachment
    @user.translation = file
    # assuming that user has a paperclip attachment called translation
  ensure
    # close and delete file
    file.close
    file.unlink
  end
end
Also check the Tempfile docs. This is the practice I've been using; I'm not sure if it's the best, but it hasn't caused any issues so far (even with Paperclip S3 storage).
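To try the same pattern outside Rails, here is a small standalone sketch. The write_translation helper is a hypothetical name, and it writes to the system temp directory instead of RAILS_ROOT/tmp. It also shows how Tempfile builds the filename from the prefix and suffix pair.

```ruby
require 'tempfile'

# Hypothetical helper: writes the text to a temp file, yields the path
# while the file still exists, then closes and deletes the file.
def write_translation(text)
  Tempfile.open(['translation', '.txt']) do |file|
    begin
      file << text
      file.flush # push buffered data to disk so readers see it
      yield file.path
    ensure
      file.close
      file.unlink
    end
  end
end

seen_path = nil
contents = write_translation('my translated string') do |path|
  seen_path = path
  File.read(path)
end
```

The generated name starts with the prefix and ends with the suffix, with a unique part inserted between them.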

How do you access the raw content of a file uploaded with Paperclip / Ruby on Rails?

I'm using Paperclip / S3 for file uploading. I upload text-like files (not .txt, but they are essentially a .txt). In a show controller, I want to be able to get the contents of the uploaded file, but don't see contents as one of its attributes. What can I do here?
attachment_file_name: "test.md", attachment_content_type: "application/octet-stream", attachment_file_size: 58, attachment_updated_at: "2011-06-22 01:01:40"
PS - Seems like all the Paperclip tutorials are about images, not text files.
In Paperclip 3.0.1 you could just use the io_adapter which doesn't require writing an extra file to (and removing from) the local file system.
Paperclip.io_adapters.for(attachment.file).read
@jon-m's answer needs to be updated to reflect the latest changes to Paperclip. For it to work, it needs to change to something like:
class Document
  has_attached_file :revision
  def revision_contents(path = 'tmp/tmp.any')
    revision.copy_to_local_file :original, path
    File.read(path)
  end
end
This is a bit convoluted. As @jwadsack mentioned, using the Paperclip.io_adapters.for method accomplishes the same thing and seems like a better, cleaner way to do this IMHO.
To access the file you can use the path method:
csv_file.path
http://rdoc.info/gems/paperclip/Paperclip/Attachment#path-instance_method
This can be used along with for example the CSV reader.
Here's how I access the raw contents of my attachment:
class Document
  has_attached_file :revision
  def revision_contents
    revision.copy_to_local_file.read
  end
end
Please note, I've omitted my paperclip configuration options and any sort of error handling.
You would need to load the contents of the file (using Ruby's File.open) into a variable before you show it. This may be an expensive operation if your app gets lots of use, so it may be worthwhile to read the contents of the file and put them into a text column in your database after uploading it.
Attachment already inherits from IOStream. http://rdoc.info/github/thoughtbot/paperclip/master/Paperclip/Attachment
So it should just be "#{attachment}" or <%= RDiscount.new(attachment).to_html %> or send_data(attachment), depending on how you want to display the data.
This is a method I used to move uploads from Paperclip to Active Storage, and it should provide some guidance on temporarily working with a file. Note: this should only be used for relatively small files.
Written for gem paperclip 6.1.0.
Where I have a simple model:
class Post
  has_attached_file :image
end
Working with a temp file in Ruby using the block form, so we do not have to worry about closing the file:
Tempfile.create do |tmp_file|
  post.image.copy_to_local_file(nil, tmp_file.path)
  post.image_temp.attach(
    io: tmp_file,
    filename: post.image_file_name,
    content_type: post.image_content_type
  )
end

Importing old data with Rails and Paperclip

I'm using paperclip for attachments in my application. I'm writing an import script for a bunch of old data, but I don't know how to create paperclip objects from files on disk. My first guess is to create mock CGI multipart objects, but that seems like a bit of a crude solution, and my initial attempt failed, I think because I didn't get the to_tempfile method right.
Is there a Right Way to do this? It seems like something that should be fairly easy.
I know that I've done the same thing, and I believe that I just created a File object from the path to each file, and assigned it to the image attribute. Paperclip will run on that file:
thing.image = File.new("/path/to/file.png")
thing.save
This works great for local files, but it doesn't work as well for remote files. I have an app that uses Paperclip for uploading images, and those images are stored on Amazon S3. I had some old data that I needed to import, so I tried the following (on Ruby 3.0+, Kernel#open no longer opens URLs, so this uses URI.open from open-uri):
thing.image = URI.open('http://www.someurl.com/path/to/image.jpg')
thing.save
If the file is small (say, less than 10K), open-uri returns a StringIO object and my file gets stored on S3 as stringio.txt.
If the file is larger than around 10K, open-uri returns a Tempfile object, and the filename on S3 ends up unique but unrelated to the original filename of image.jpg.
I was able to fix the problem by doing the following:
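require 'open-uri'
remote_photo = URI.open('http://www.someurl.com/path/to/image.jpg')
# give the downloaded IO the original filename so Paperclip stores it under that name
def remote_photo.original_filename
  base_uri.path.split('/').last
end
thing.image = remote_photo
thing.save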
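The same trick can be wrapped in a small reusable helper. This is only a sketch: with_original_filename is a hypothetical name, and it takes the URL as an argument instead of relying on base_uri, so it also works for IO objects that did not come from open-uri.

```ruby
require 'stringio'
require 'uri'

# Hypothetical helper: adds an original_filename method to a single IO
# object, derived from the last path segment of the given URL.
def with_original_filename(io, url)
  name = File.basename(URI.parse(url).path)
  io.define_singleton_method(:original_filename) { name }
  io
end

photo = with_original_filename(
  StringIO.new('fake image bytes'),
  'http://www.someurl.com/path/to/image.jpg'
)
name = photo.original_filename
```

Because the method is defined on that one object only, other StringIO or Tempfile instances are left untouched.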

Resources