Retrieve file from GridFS and pass as regular IO::File - ruby-on-rails

I am storing Excel files, among other things, using GridFS. I'd like to use the Spreadsheet gem to parse these.
I've tried this, but it (obviously!) did not work:
1.9.3p194 :036 > db = Mongo::Connection.new.db(Mongoid.database.name)
1.9.3p194 :037 > grid = Mongo::GridFileSystem.new(db)
1.9.3p194 :038 > f = grid.open('test1.xls', 'r')
=> #<GridIO _id: 500ef7cdc5ebb515c9000005>
1.9.3p194 :039 > Spreadsheet.open(f)
NoMethodError: undefined method `flush' for #<GridIO _id: 500ef7cdc5ebb515c9000005>
Would you have a good suggestion for how to 'transform' or 'wrap' the GridIO object into an IO::File-like instance, so that I can pass the Excel file to the Spreadsheet open method?
The spreadsheet open method takes either an IO instance or a String specifying the path on disk (the latter not being useful when using GridFS):
(Object) open(io_or_path, mode = "rb+", &block)
Thanks!

It looks like this is a desired feature of the ruby driver that doesn't exist yet.
https://jira.mongodb.org/browse/RUBY-368
You could probably pass a block to Spreadsheet.open, as is suggested in the jira ticket:
db = Mongo::Connection.new.db(Mongoid.database.name)
Spreadsheet.open('filename', 'w') do |f|
  gridfs = Mongo::GridFileSystem.new(db)
  gridfs_file = gridfs.open('test1.xls', 'r')
  f.write(gridfs_file.read()) until gridfs_file.eof?
end

Emily's pointer was quite helpful. For the time being, I ended up using a temporary file:
Tempfile.open(["test", ".xls"]) do |fh|
  gridfs = Mongo::GridFileSystem.new(Mongoid.database)
  gridfs_file = gridfs.open('test1.xls', 'r')
  fh.binmode
  fh.write(gridfs_file.read)
  #xls = Excel.new(fh)
  fh.close
end
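If a file on disk is not wanted, an in-memory buffer might also be worth trying. The following is only a sketch: `to_string_io` is a hypothetical helper name, and it assumes that Spreadsheet.open is satisfied by a StringIO (which, unlike GridIO, does respond to flush). That has not been verified here against any particular Spreadsheet version.

```ruby
require 'stringio'

# Hypothetical helper (not part of any gem): reads everything from an object
# that responds to #read, such as the GridIO above, and wraps the bytes in a
# StringIO, which offers the usual IO methods (read, rewind, flush, ...).
def to_string_io(source)
  data = source.read.to_s.b # .b returns a binary (ASCII-8BIT) copy
  StringIO.new(data)
end

# Stand-in for a GridIO; any readable object works for the demonstration.
fake_grid_file = StringIO.new("binary xls bytes")
io = to_string_io(fake_grid_file)
puts io.respond_to?(:flush)
```

With a real GridFS file the call to try would be Spreadsheet.open(to_string_io(gridfs_file)). Note that this loads the whole file into memory, so the temporary file may remain the better choice for large spreadsheets.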

Related

Unknown Attribute when importing from CSV

I am trying to do the following in IRB:
file = CSV.read('branches.csv', headers:true)
file.each do |branch|
Branch.create(attributes:branch.to_hash)
end
branches.csv contains one header entitled business_name which should map onto the attribute for Branch of the same name, but I see the error:
ActiveRecord::UnknownAttributeError: unknown attribute 'business_name' for Branch.
Strangely, doing Branch.create(business_name:'test') works just fine with no issues.
Update:
I think this has something to do with the encoding of the text in the UTF-8 CSV produced by Excel, as suggested in the comments below. Not sure if this IRB session gives any clues, but the header title business_name != "business_name":
2.3.3 :348 > file = CSV.read('x.csv', headers:true)
#<CSV::Table mode:col_or_row row_count:165>
2.3.3 :349 > puts file.first.to_hash.first.first
business_name
2.3.3 :350 > file = CSV.read('x.csv', headers:true)
#<CSV::Table mode:col_or_row row_count:165>
2.3.3 :351 > puts file.first.to_hash.first.first == "business_name"
false
Just skip the attributes: part. It is not needed at all, because branch.to_hash already returns exactly the format you describe in your last sentence.
file = CSV.read('branches.csv', headers:true)
file.each { |branch| Branch.create(branch.to_hash) }
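If the header really does differ from "business_name" because of an invisible byte order mark (BOM) at the start of the Excel export, as the update in the question suggests, asking Ruby to strip the BOM while reading may help. This is a minimal sketch under that assumption; `read_csv_without_bom` is a made-up helper name and the sample file content is invented for the demonstration.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: reads a CSV with headers and lets Ruby drop a leading
# UTF-8 byte order mark, if there is one, via the "bom|utf-8" encoding.
def read_csv_without_bom(path)
  CSV.read(path, headers: true, encoding: 'bom|utf-8')
end

# Made-up sample file that starts with a BOM, like some Excel exports.
sample = Tempfile.new(['branches', '.csv'])
sample.binmode
sample.write("\uFEFFbusiness_name\nAcme\n")
sample.close

table = read_csv_without_bom(sample.path)
puts table.headers.first == 'business_name'
sample.unlink
```

Each row's to_hash could then be passed to Branch.create as in the answer above.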

rails saving recognized link from text to database with rinku

I have a Rails app with chat. For the chat messages I use the rinku gem to recognize links, which works well. On top of this, I would like to save the links as message.link, without the rest of the text around them from message.body.
So for example in the code below the user sent the message.body "hi there www.stackoverflow.com" and I would like to save only the "www.stackoverflow.com" as message.link. How can I do that?
view
<p><%= find_links(message.body) %></p>
controller
def find_links(message_body)
Rinku.auto_link(message_body, mode=:all, 'target="_blank"', skip_tags=nil).html_safe
end
it will appear in the DOM as:
<p>hey there http://stackoverflow.com/</p>
and will appear in the db as message.body:
"hey there http://stackoverflow.com/"
UPDATE:
messages controller
require "uri"
def create
.....
if message.save
link = URI.extract(message.body)
update_attribute(message.link = link)
end
You need a regular expression to identify URLs in text. Try the following regular expression:
/(https?:\/\/(?:www\.|(?!www))[^\s\.]+\.[^\s]{2,}|www\.[^\s]+\.[^\s]{2,})/
Working demo: http://rubular.com/r/bHQdFHZYFH
2.1.2 :001 > str = "hey hi hello www.google.com https://stackoverflow.com http://tech-brains.blogspot.in"
2.1.2 :002 > regexp = /(https?:\/\/(?:www\.|(?!www))[^\s\.]+\.[^\s]{2,}|www\.[^\s]+\.[^\s]{2,})/
2.1.2 :003 > str.scan(regexp)
=> [["www.google.com"], ["https://stackoverflow.com"], ["http://tech-brains.blogspot.in"]]
You can use Ruby code:
2.0.0-p247 :001 > require "uri"
=> true
2.0.0-p247 :002 > URI.extract("hey there http://stackoverflow.com/")
=> ["http://stackoverflow.com/"]
Hope it helps!
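To connect this to the original question, the regular expression from the first answer can be wrapped in a small method that returns only the first link found in a message body. This is a sketch, not a definitive implementation: `URL_PATTERN` and `first_link` are names introduced here for illustration.

```ruby
# The pattern is the one quoted in the answer above.
URL_PATTERN = /(https?:\/\/(?:www\.|(?!www))[^\s\.]+\.[^\s]{2,}|www\.[^\s]+\.[^\s]{2,})/

# Hypothetical helper: returns the first URL-like substring, or nil if the
# text contains none.
def first_link(text)
  text.to_s[URL_PATTERN]
end

puts first_link("hi there www.stackoverflow.com")
```

In the controller, something like message.update_attribute(:link, first_link(message.body)) after a successful save would then store just the link. That assumes the Message model has a link attribute, as the question implies.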

Ruby - checking if file is a CSV

I have just written some code that takes a CSV file passed as an argument and processes it line by line; so far, everything is okay. Now I would like to make my code more robust by making sure that what we receive as an argument is a .csv file.
I saw in the Ruby docs that there is a == "--file" option, but using it generates an error: the way I understood it, this option only works for txt files.
Is there a specific method that lets me check whether my file is a CSV? Here is some of my code:
if ARGV.empty?
puts "nothing received"
# option to check, doesn't work
elsif ARGV[0].shift == "--file"
# my code so far, without checking
else CSV.foreach(ARGV.shift) do |row|
etc, etc...
I think it is impossible to make a truly safe test without additional information.
Just some notes on what you can do:
You get a filename in a variable filename.
First, check if it is a file:
File.exist?
Then you could check, if the encoding is correct:
raise "Wrong encoding" unless content.valid_encoding?
Does your CSV always have the same number of columns? And does each record fit on a single line?
If so, this could be the next check:
content.each_line{|line|
return false if line.count(sep) < columns - 1
}
This check can be modified for your case, e.g. if you always have an exact number of rows.
In total you can define something like:
require 'csv'
# columns defines the expected number of columns per line
def csv?(filename, sep: ';', columns: 3)
  return false unless File.exist?(filename) # no file
  content = File.read(filename, :encoding => 'utf-8')
  return false unless content.valid_encoding? # wrong encoding
  content.each_line { |line|
    return false if line.count(sep) < columns - 1
  }
  CSV.parse(content, :col_sep => sep)
end
if csv = csv?('test.csv')
csv.each do |row|
p row
end
end
You can use the ruby-filemagic gem:
gem install ruby-filemagic
Usage:
$ irb
irb(main):001:0> require 'filemagic'
=> true
irb(main):002:0> fm = FileMagic.new
=> #<FileMagic:0x7fd4afb0>
irb(main):003:0> fm.file('foo.zip')
=> "Zip archive data, at least v2.0 to extract"
irb(main):004:0>
https://github.com/ricardochimal/ruby-filemagic
Use File.extname() to check the extension of the original file:
File.extname("test.rb") #=> ".rb"
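Building on File.extname, a small predicate could look like the sketch below. `csv_extension?` is a hypothetical name, and the check only looks at the file name, so it says nothing about whether the content is valid CSV.

```ruby
# Hypothetical helper: true when the file name ends in ".csv",
# ignoring upper/lower case. It does not inspect the file content.
def csv_extension?(path)
  File.extname(path.to_s).casecmp?(".csv")
end

puts csv_extension?("data.CSV")
puts csv_extension?("test.rb")
```

In the script from the question this could guard the call, for example by aborting with a message unless csv_extension?(ARGV[0]) is true.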

Stream Closed IO Error when using CSV Library

I am trying to get an array of hashes from parsing a CSV file using CSV library.
I currently have this method which works:
def rows
  rows = []
  CSV.foreach(@csv_file.path, headers: true) do |row|
    rows << row.to_hash
  end
  rows
end
but when I change it to this I get the stream closed error.
def rows
  CSV.foreach(@csv_file.path, headers: true).map(&:to_hash)
end
thanks
If you look at the source code of ::foreach :
def self.foreach(path, options = Hash.new, &block)
  encoding = options.delete(:encoding)
  mode = "rb"
  mode << ":#{encoding}" if encoding
  open(path, mode, options) do |csv|
    csv.each(&block)
  end
end
Internally, it opens the file using CSV::open with a block. Once that block finishes, the IO object is closed. Because you are then trying to read from the closed IO object, you get the error.
From the doc of CSV::open
This method works like Ruby’s open() call, in that it will pass a CSV object to a provided block and close it when the block terminates,...
The object that ::foreach returns is whatever CSV::open returns inside def self.foreach, and by that point the underlying IO is already closed.
Example :
2.1.0 :016 > require 'csv'
=> true
2.1.0 :017 > CSV.open("Gemfile")
=> <#CSV io_type:File io_path:"Gemfile" encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"">
2.1.0 :018 > CSV.open("Gemfile") { |c| c }
=> <#CSV io_type:File io_path:"Gemfile" encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"">
2.1.0 :019 > CSV.open("Gemfile") { |c| c }.read
IOError: closed stream
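One way to keep the one-liner style without touching a closed stream is to read the whole file first and only then map over the rows. The sketch below assumes the file is small enough to load into memory; `rows_from` is a name made up for the example, and the sample data is invented.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: CSV.read loads and closes the file before returning a
# CSV::Table, so mapping over the rows afterwards does not need an open IO.
def rows_from(path)
  CSV.read(path, headers: true).map(&:to_h)
end

# Made-up sample data for the demonstration.
sample = Tempfile.new(['items', '.csv'])
sample.write("name,qty\nwidget,1\ngadget,2\n")
sample.close

p rows_from(sample.path)
sample.unlink
```

In the method from the question, the body would become CSV.read(@csv_file.path, headers: true).map(&:to_hash).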

Full url for an image-path in Rails 3

I have an Image, which contains carrierwave uploads:
Image.find(:first).image.url #=> "/uploads/image/4d90/display_foo.jpg"
In my view, I want to find the absolute URL for this. Prepending the root_url results in a double /.
root_url + image.url #=> http://localhost:3000//uploads/image/4d90/display_foo.jpg
I cannot use url_for (that I know of), because it accepts either a path, or a list of options to identify the resource together with the :only_path option. Since I don't have a resource that can be identified through "controller" + "action", I cannot use the :only_path option.
url_for(image.url, :only_path => true) #=> wrong amount of parameters, 2 for 1
What would be the cleanest and best way to create a path into a full url in Rails3?
You can also set CarrierWave's asset_host config setting like this:
# config/initializers/carrierwave.rb
CarrierWave.configure do |config|
config.storage = :file
config.asset_host = ActionController::Base.asset_host
end
This tells CarrierWave to use your app's config.action_controller.asset_host setting, which can be defined in one of your config/environments/[environment].rb files.
Or set it explicitly:
config.asset_host = 'http://example.com'
Restart your app, and you're good to go - no helper methods required.
* I'm using Rails 3.2 and CarrierWave 0.7.1
Try the path method:
Image.find(:first).image.path
Update:
request.host + Image.find(:first).image.url
and you can wrap it as a helper to DRY it forever
request.protocol + request.host_with_port + Image.find(:first).image.url
Another simple option is URI.parse; in your case that would be:
require 'uri'
(URI.parse(root_url) + image.url).to_s
and some examples:
1.9.2p320 :001 > require 'uri'
=> true
1.9.2p320 :002 > a = "http://asdf.com/hello"
=> "http://asdf.com/hello"
1.9.2p320 :003 > b = "/world/hello"
=> "/world/hello"
1.9.2p320 :004 > c = "world"
=> "world"
1.9.2p320 :005 > d = "http://asdf.com/ccc/bbb"
=> "http://asdf.com/ccc/bbb"
1.9.2p320 :006 > e = "http://newurl.com"
=> "http://newurl.com"
1.9.2p320 :007 > (URI.parse(a)+b).to_s
=> "http://asdf.com/world/hello"
1.9.2p320 :008 > (URI.parse(a)+c).to_s
=> "http://asdf.com/world"
1.9.2p320 :009 > (URI.parse(a)+d).to_s
=> "http://asdf.com/ccc/bbb"
1.9.2p320 :010 > (URI.parse(a)+e).to_s
=> "http://newurl.com"
Just taking floor's answer and providing the helper:
# Use with the same arguments as image_tag. Returns the same, except including
# a full path in the src URL. Useful for templates that will be rendered into
# emails etc.
def absolute_image_tag(*args)
raw(image_tag(*args).sub /src="(.*?)"/, "src=\"#{request.protocol}#{request.host_with_port}" + '\1"')
end
There's quite a bunch of answers here. However, I didn't like any of them since all of them rely on me to remember to explicitly add the port, protocol etc. I find this to be the most elegant way of doing this:
full_url = URI( root_url )
full_url.path = Image.first.image.url
# Or maybe you want a link to some asset, like I did:
# full_url.path = image_path("whatevar.jpg")
full_url.to_s
The best thing about it is that you can easily change just one part, and no matter which part that is, you always do it the same way. Say you wanted to drop the protocol and use a protocol-relative URL: do this before the final conversion to a string.
full_url.scheme = nil
Yay, now I have a way of converting my asset image urls to protocol relative urls that I can use on a code snippet that others might want to add on their site and they'll work regardless of the protocol they use on their site (providing that your site supports either protocol).
I used default_url_options, because request is not available in a mailer, and this avoids duplicating the hostname in config.action_controller.asset_host if you haven't specified it before.
config.asset_host = ActionDispatch::Http::URL.url_for(ActionMailer::Base.default_url_options)
You can't refer to the request object in an email, so how about:
def image_url(*args)
raw(image_tag(*args).sub /src="(.*?)"/, "src=\"//#{ActionMailer::Base.default_url_options[:protocol]}#{ActionMailer::Base.default_url_options[:host]}" + '\1"')
end
You can actually easily get this done by
root_url[0..-2] + image.url
I agree it doesn't look too good, but gets the job done.. :)
I found this trick to avoid double slash:
URI.join(root_url, image.url)
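As a self-contained illustration of the URI.join approach, using the values from the question in place of root_url and image.url (the helper name `absolute_url` is an assumption for this sketch):

```ruby
require 'uri'

# Hypothetical helper: resolves a path against a base URL. URI.join returns
# a URI object, so to_s is needed to get a plain string.
def absolute_url(base, path)
  URI.join(base, path).to_s
end

puts absolute_url("http://localhost:3000/", "/uploads/image/4d90/display_foo.jpg")
```

Because the path is resolved the way a browser resolves a link, a leading slash replaces the base's path rather than being appended to it, which is what avoids the double slash.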
