Rails: open() returns StringIO instead of Tempfile - ruby-on-rails

I have two valid URLs to two images.
When I run open() on the first URL, it returns an object of type Tempfile (which is what the fog gem expects to upload the image to AWS).
When I run open() on the second URL, it returns an object of type StringIO (which causes the fog gem to crash and burn).
Why is open() not returning a Tempfile for the second URL?
Further, can open() be forced to always return Tempfile?
From my Rails Console:
2.2.1 :011 > url1
=> "https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xpf1/v/t1.0-1/c0.0.448.448/10298878_10103685138839040_6456490261359194847_n.jpg?oh=e2951e1a1b0a04fc2b9c0a0b0b191ebc&oe=56195EE3&__gda__=1443959086_417127efe9c89652ec44058c360ee6de"
2.2.1 :012 > url2
=> "https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xfa1/v/t1.0-1/c0.17.200.200/1920047_10153890268465074_1858953512_n.jpg?oh=5f4cdf53d3e59b8ce4702618b3ac6ce3&oe=5610ADC5&__gda__=1444367255_396d6fdc0bdc158e4c2e3127e86878f9"
2.2.1 :013 > t1 = open(url1)
=> #<Tempfile:/var/folders/58/lpjz5b0n3yj44vn9bmbrv5180000gn/T/open-uri20150720-24696-1y0kvtd>
2.2.1 :014 > t2 = open(url2)
=> #<StringIO:0x007fba9c20ae78 #base_uri=#<URI::HTTPS https://fbcdn-profile-a.akamaihd.net/hprofile-ak-xfa1/v/t1.0-1/c0.17.200.200/1920047_10153890268465074_1858953512_n.jpg?oh=5f4cdf53d3e59b8ce4702618b3ac6ce3&oe=5610ADC5&__gda__=1444367255_396d6fdc0bdc158e4c2e3127e86878f9>, #meta={"last-modified"=>"Tue, 25 Feb 2014 19:47:06 GMT", "content-type"=>"image/jpeg", "timing-allow-origin"=>"*", "access-control-allow-origin"=>"*", "content-length"=>"7564", "cache-control"=>"no-transform, max-age=1209600", "expires"=>"Mon, 03 Aug 2015 22:01:40 GMT", "date"=>"Mon, 20 Jul 2015 22:01:40 GMT", "connection"=>"keep-alive"}, #metas={"last-modified"=>["Tue, 25 Feb 2014 19:47:06 GMT"], "content-type"=>["image/jpeg"], "timing-allow-origin"=>["*"], "access-control-allow-origin"=>["*"], "content-length"=>["7564"], "cache-control"=>["no-transform, max-age=1209600"], "expires"=>["Mon, 03 Aug 2015 22:01:40 GMT"], "date"=>["Mon, 20 Jul 2015 22:01:40 GMT"], "connection"=>["keep-alive"]}, #status=["200", "OK"]>
This is how I'm using fog:
tempfile = open(params["avatar"])
user.avatar.store!(tempfile)

I assume you are using Ruby's built-in open-uri library that allows you to download URLs using open().
In this case Ruby is only obligated to return an IO object. There is no guarantee that it will be a file. My guess is that Ruby makes a decision based on memory consumption: if the download is large, it puts it into a file to save memory; otherwise it keeps it in memory with a StringIO.
As a workaround, you could write a method that writes the stream to a tempfile if it is not already downloaded to a file:
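require "open-uri"
require "tempfile"

def download_to_file(uri)
  # On Ruby 2.5+ use URI.open(uri, "rb"); since Ruby 3.0, Kernel#open no longer accepts URLs.
  stream = open(uri, "rb")
  return stream if stream.respond_to?(:path) # Already file-like

  # Small downloads arrive as a StringIO, so copy them into a real file.
  # Tempfile.new needs an explicit basename on Ruby versions before 2.3.
  Tempfile.new("download").tap do |file|
    file.binmode
    IO.copy_stream(stream, file)
    stream.close
    file.rewind
  end
end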
If you're looking for a full-featured gem that does something similar, take a look at "down": https://github.com/janko-m/down

The open-uri library uses a 10 KB size limit to choose between StringIO and Tempfile: a response body up to that size is kept in a StringIO, and anything larger is written to a Tempfile.
My suggestion is to change the constant OpenURI::Buffer::StringMax, which open-uri uses as that threshold.
In an initializer you could do this:
require 'open-uri'
OpenURI::Buffer.send :remove_const, 'StringMax' if OpenURI::Buffer.const_defined?('StringMax')
OpenURI::Buffer.const_set 'StringMax', 0
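The threshold can be observed without any network access. The following is a minimal sketch, assuming the default (unpatched) StringMax; it pokes at OpenURI::Buffer, which is an internal, undocumented class of open-uri, so treat it as an illustration rather than a supported API. The helper name buffered_io_class is made up for this example.

```ruby
require "open-uri"
require "stringio"
require "tempfile"

# OpenURI::Buffer is the internal class open-uri uses to collect a response
# body. It is used here only to observe when the StringIO becomes a Tempfile.
# buffered_io_class is a hypothetical helper, not part of open-uri.
def buffered_io_class(byte_count)
  buffer = OpenURI::Buffer.new
  buffer << ("x" * byte_count)
  buffer.io.class
end

puts OpenURI::Buffer::StringMax   # the threshold in bytes
puts buffered_io_class(7_564)     # content-length of the second URL in the question
puts buffered_io_class(20_000)    # a body well above the threshold
```

Both images that came back as StringIO in the questions on this page report a content-length below 10 KB, which is consistent with this behavior.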

This doesn't answer my question - but it provides a working alternative using the httparty gem:
require "httparty"
File.open("file.jpg", "wb") do |tempfile|
  tempfile.write HTTParty.get(params["avatar"]).parsed_response
  user.avatar.store!(tempfile)
end

Related

How do I parse an Excel file that will give me data exactly as it appears visually?

I'm on Rails 5 (Ruby 2.4). I want to read an .xls doc and I would like to get the data into CSV format, just as it appears in the Excel file. Someone recommended I use Roo, and so I have
book = Roo::Spreadsheet.open(file_location)
sheet = book.sheet(0)
text = sheet.to_csv
arr_of_arrs = CSV.parse(text)
However, what is returned is not the same as what I see in the spreadsheet. For instance, a cell in the spreadsheet has
16:45.81
and when I get the CSV data from above, what is returned is
"0.011641319444444444"
How do I parse the Excel doc and get exactly what I see? I don't care whether I use Roo to parse or not, as long as I can get CSV data that represents what I see rather than some weird internal representation. For reference, the file I was parsing gives this when I run "file name_of_file.xls" ...
Composite Document File V2 Document, Little Endian, Os: Windows, Version 5.1, Code page: 1252, Author: Dwight Schroot, Last Saved By: Dwight Schroot, Name of Creating Application: Microsoft Excel, Create Time/Date: Tue Sep 21 17:05:21 2010, Last Saved Time/Date: Wed Oct 13 16:52:14 2010, Security: 0
You need to store the value as text on the .xls side. This won't work if you're reading the .xls file straight from the internet, but it will fix your problem if you can edit the file. You can do this using the function =TEXT(A2, "mm:ss.0"), where A2 is just the cell I'm using as an example.
book = ::Roo::Spreadsheet.open(file_location)
puts book.cell('B', 2)
=> '16.45.8'
If manipulating the file is not an option you could just pass a custom converter to CSV.new() and convert the decimal time back to the correct format you need.
require 'roo-xls'
require 'csv'
CSV::Converters[:time_parser] = lambda do |field, info|
  case info[:header].strip
  when "time" then begin
    # 0.011641319444444444 * 24 hours * 3600 seconds = 1005.81
    parse_time = field.to_f * 24 * 3600
    # 1005.81.divmod(60) = [16, 45.809999999999999945]
    mm, ss = parse_time.divmod(60)
    # returns "16:45.81"
    time = "#{mm}:#{ss.round(2)}"
    time
  rescue
    field
  end
  else
    field
  end
end
book = ::Roo::Spreadsheet.open(file_location)
sheet = book.sheet(0)
csv = CSV.new(sheet.to_csv, headers: true, converters: [:time_parser]).map {|row| row.to_hash}
puts csv
=> {"time "=>"16:45.81"}
{"time "=>"12:46.0"}
Under the hood roo-xls gem uses the spreadsheet gem to parse the xls file. There was a similar issue to yours logged here, but it doesn't appear that there was any real resolution. Internally xls stores 16:45.81 as a Number and associates some formatting with it. I believe the issue has something to do with the spreadsheet gem not correctly handling the cell format.
I did try messing around with adding a format mm:ss.0 by following this guide but I couldn't get it to work, maybe you'll have more luck.
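Since the number that comes back is a fraction of a 24-hour day, the displayed value can also be rebuilt outside of CSV. This is a minimal sketch, assuming the cell is meant to be shown as minutes and seconds with two decimals; the helper name excel_fraction_to_mmss is made up for this example and is not part of roo or spreadsheet.

```ruby
# Convert an Excel serial time (a fraction of a 24-hour day) to "m:ss.ss".
# excel_fraction_to_mmss is a hypothetical helper, not part of any gem.
def excel_fraction_to_mmss(fraction)
  # Work in whole hundredths of a second so that rounding cannot
  # produce a seconds field of "60.00".
  centiseconds = (fraction.to_f * 24 * 3600 * 100).round
  minutes, rest = centiseconds.divmod(60 * 100)
  format("%d:%05.2f", minutes, rest / 100.0)
end

puts excel_fraction_to_mmss("0.011641319444444444") # the value from the question
```

Durations of an hour or more simply show up as a minute count above 59, which may or may not be what the original cell format does.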
You can use the converters option. It looks something like this:
arr_of_arrs = CSV.parse(text, converters: :date_time)
http://ruby-doc.org/stdlib-2.0.0/libdoc/csv/rdoc/CSV.html
Your problem seems to be with the way you're parsing (reading) the input file.
roo parses only Excel 2007-2013 (.xlsx) files. From your question, you want to parse .xls, which is a different format.
Like the documentation says, use the roo-xls gem instead.

How do I create a checksum of carrierwave upload to verify the download?

How do I create a checksum (MD5, sha512, whatever) of a file when I upload it, so that when I download (using cache_stored_file!), I can verify that it is indeed the original file that was uploaded?
The Ruby Digest module can help with this.
One solution would be to read the file on upload and assign it a digest in a before_create callback. I would store it as a column on the file table in your database.
Here's some output from IRB to show how it would work:
2.2.2 :001 > require 'digest'
=> true
2.2.2 :002 > f = File.read 'test.rb'
=> "Original content\n"
2.2.2 :003 > Digest::SHA256.hexdigest(f)
=> "646722e7ee99e28d618142b9d3a1bfcbe2196d8332ae632cc867ae5d1c8c57b5"
# (... file modified ...)
2.2.2 :004 > f = File.read 'test.rb'
=> "Original content with more content\n"
2.2.2 :005 > Digest::SHA256.hexdigest(f)
=> "c29f2f77c0777a78dbdf119bf0a58b470c098635dfc8279542e4c49d6f20e62c"
You can use this digest in your download method to check the integrity of the file. If you read the file again, produce a digest, and it matches the original digest, you can be confident the file hasn't been altered since it was uploaded.
Ruby Digest Module
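The upload/download check described above can be sketched in plain Ruby as follows. The names checksum_for, intact? and stored_checksum are made up for this example; stored_checksum stands in for the value you would save in the database column, and nothing here is CarrierWave-specific.

```ruby
require "digest"
require "tempfile"

# Digest::SHA256.file reads the file in chunks instead of loading it whole.
def checksum_for(path)
  Digest::SHA256.file(path).hexdigest
end

# stored_checksum stands in for the digest saved at upload time.
def intact?(path, stored_checksum)
  checksum_for(path) == stored_checksum
end

Tempfile.create("upload") do |file|
  file.write("Original content\n")
  file.flush
  stored_checksum = checksum_for(file.path) # computed at upload time

  puts intact?(file.path, stored_checksum)  # file unchanged since upload

  file.write("with more content\n")         # simulate a modified file
  file.flush
  puts intact?(file.path, stored_checksum)  # digest no longer matches
end
```

In an uploader you would run the same comparison on the path of the cached file after downloading it.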
md5 = Digest::MD5.file('path_to_file').hexdigest
This reads the file in blocks and avoids loading the whole file into RAM, which is what File.read() does.
For SHA checksum
Digest::SHA2.hexdigest(File.read("/path/to/my/file.txt"))
OR
Digest::SHA2.file(myFile).hexdigest
=> "fa5880ac744f3c05c649a864739530ac387c8c8b0231a7d008c27f0f6a2753c7"
More details on SHA checksum generation: SHA Checksum

Open URI Wrong Output

I am trying to download images from the web and upload them back to Cloudinary. The code I have works for some images, but not for others. I have isolated the problem down to this line (it requires open-uri):
image = open(params[:product_image][:main])
For this image, it works fine. image is
#<Tempfile:/var/folders/49/bmhbmmzj5fl31dm9j6m6gxr00000gn/T/open-uri20150526-7662-1b676ws>
and cloudinary accepts this. However, when I try to pull this image, image becomes
#<StringIO:0x007fa0267c8f80 #base_uri=#<URI::HTTP:0x007fa0267c92c8 URL:http://www.spiresources.net/WebImages/480/swatch/CELW.JPG>,
#meta={"date"=>"Tue, 26 May 2015 22:17:47 GMT", "server"=>"Apache/2.2.22 (Ubuntu)",
"last-modified"=>"Mon, 29 Jun 2009 00:00:00 GMT", "etag"=>"\"44700f-c35-46d715f090000\"",
"accept-ranges"=>"bytes", "content-length"=>"3125", "content-type"=>"image/jpeg"}, #metas={"date"=>["Tue, 26 May 2015 22:17:47 GMT"], "server"=>["Apache/2.2.22 (Ubuntu)"],
"last-modified"=>["Mon, 29 Jun 2009 00:00:00 GMT"], "etag"=>["\"44700f-c35-46d715f090000\""], "accept-ranges"=>["bytes"],
"content-length"=>["3125"], "content-type"=>["image/jpeg"]}, #status=["200", "OK"]>
which cloudinary rejects and raises an error of "No conversion of StringIO to string". Why does open-uri return different objects for what would seem like similar images? How can I make open-uri return a tempfile or at least turn my StringIO to a tempfile?
You can simply give the URL to the Cloudinary upload method. Then Cloudinary will fetch the remote resource directly.
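If a real file is still needed locally, the StringIO can be copied into a Tempfile. This is a minimal sketch using only the standard library; ensure_tempfile and its basename default are made up for this example, and the three bytes below are just a stand-in for a small downloaded image.

```ruby
require "stringio"
require "tempfile"

# ensure_tempfile is a hypothetical helper: it returns file-like objects
# unchanged and copies anything else (such as a StringIO) into a Tempfile.
def ensure_tempfile(io, basename = "open-uri-copy")
  return io if io.respond_to?(:path) # Tempfile or File: nothing to do

  Tempfile.new(basename).tap do |file|
    file.binmode
    IO.copy_stream(io, file)
    file.rewind
  end
end

small_download = StringIO.new("\xFF\xD8\xFF".b) # stand-in for a small image body
file = ensure_tempfile(small_download)
puts file.class
puts file.read.bytesize
```

The result responds to path, which is what uploaders that expect a file usually rely on.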

Parse a date in rails

I have a date (actually parsed from a PDF) that could be in any of the following formats:
MM/DD/YYYY
MM/DD/YY
M/D/YY
October 15, 2007
Oct 15, 2007
Is there any gem or function available in rails or ruby to parse my date?
Or do I need to parse it using a regex?
BTW I'm using ruby on rails 3.2.
You can try Date.parse(date_string).
You might also use Date#strptime if you need a specific format:
> Date.strptime("10/15/2013", "%m/%d/%Y")
=> Tue, 15 Oct 2013
For a general solution:
format_str = "%m/%d/" + (date_str =~ /\d{4}/ ? "%Y" : "%y")
date = Date.parse(date_str) rescue Date.strptime(date_str, format_str)
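One thing to watch for: as far as I know, Date.parse on Ruby 1.9 and later reads slash-separated dates day-first, so a string like 01/02/2013 parses without error as 1 February and the rescue never fires. A minimal sketch that avoids this by choosing the format from the shape of the string first, assuming the slash formats are always month/day/year as listed in the question (parse_us_date is a made-up name):

```ruby
require "date"

# parse_us_date is a hypothetical helper: slash-separated dates are treated
# as month/day/year, and everything else is left to Date.parse.
def parse_us_date(date_str)
  str = date_str.strip
  case str
  when %r{\A\d{1,2}/\d{1,2}/\d{4}\z} then Date.strptime(str, "%m/%d/%Y")
  when %r{\A\d{1,2}/\d{1,2}/\d{2}\z}  then Date.strptime(str, "%m/%d/%y")
  else Date.parse(str) # "October 15, 2007", "Oct 15, 2007"
  end
end

["10/15/2007", "10/15/07", "1/5/07", "October 15, 2007", "Oct 15, 2007"].each do |s|
  puts "#{s} -> #{parse_us_date(s)}"
end
```

Anything that matches none of the shapes and that Date.parse cannot read still raises ArgumentError, so callers can decide how to handle bad input.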
I find the chronic gem very easy to use for time parsing, and it should work for you. I tried the examples you gave.
https://github.com/mojombo/chronic

Download file from PostgreSQL bytea escape

I'm having trouble letting users download a file that is stored in a PostgreSQL bytea field in escape format (http://www.postgresql.org/docs/current/interactive/datatype-binary.html).
1.9.3p385 :023 > data = PG::Connection.unescape_bytea(m[:data])
=> "JVBERi0xLjMKJcTl8uXrp/Og0MTGCjQgMCBvYmoKPDwgL0xlbmd0aCA1IDAg\r\nUiAvRmlsdGVyIC9GbGF0ZURlY29kZSA+Pgpzd..."
1.9.3p385 :023 > data.encoding.name
=> "ASCII-8BIT"
1.9.3p385 :023 > data.bytesize
=> 3878164
But when I use "send_data", or "send_file" with a tempfile, the file I get is in an invalid format (it should be a PDF file). It is much bigger than the original and PDF readers cannot open it.
The data in this field is a MIME part of an email. If I build a raw email from all these parts (using the boundary as separator), the email contains a valid PDF attachment.
How should I convert this data to bytes so that users can download the file separately?
See the following: http://rubyforge.org/tracker/index.php?func=detail&aid=27845&group_id=234&atid=967
The syntax is something like PGConn.unescape_bytea($field);
Depending on your version of pg, you may need to upgrade that gem.
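Another thing worth checking: the unescaped value in the question's console output begins with JVBERi0x, which is what %PDF-1 looks like after Base64 encoding. So the field may hold the Base64 text of the MIME part rather than the raw PDF bytes, which would also fit the file coming out larger than the original. A minimal sketch under that assumption, using only the first characters shown in the question:

```ruby
require "base64"

# First 12 characters of the unescaped value from the question's console output.
sample = "JVBERi0xLjMK"

decoded = Base64.decode64(sample)
puts decoded.inspect
puts decoded.start_with?("%PDF-") # a PDF file starts with this signature
```

If that holds for the whole value, passing Base64.decode64(data) to send_data instead of data itself might be all that is needed, though I have not verified this against the full field.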
