Download file from Slack in Ruby - ruby-on-rails

I'm trying to work with the Slack API in Ruby. They have this snippet as an example on their site:
def fetch_and_compose_image(file, channel)
  filename = file.timestamp
  if file.filetype == "jpg"
    File.open("./tmp/#{filename}", 'wb') do |f|
      f << fetch_image(file.url_private)
    end
    fd = FaceDetection.new
    if fd.process_image
      file_id = upload(file, channel)
      add_reactions(file_id, fd)
    end
  end
end
What I don't understand is, how are they adding the fetched image to 'f', and then somehow uploading the file with the variable 'file'. Where does 'f' come into play?

If you are talking about this block
File.open("./tmp/#{filename}", 'wb') do |f|
  f << fetch_image(file.url_private)
end
then it opens the file for writing in binary mode (that is what 'wb' means), writes the fetched content to it, and closes the file when the block ends. The same can be achieved with
to_write_file = File.open("./tmp/#{filename}", 'wb')
to_write_file << fetch_image(file.url_private)
to_write_file.close
but the block form is the better way to write it, because the file is closed for you even if an exception is raised inside the block.
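To make the difference between the two forms concrete, here is a minimal sketch. It uses a temporary directory and a made-up payload in place of the Slack download (both are assumptions for illustration, not part of the original snippet). The manual form needs an explicit ensure to give the same guarantee the block form gives by default.

```ruby
require 'tmpdir'

# Stand-in for the bytes returned by fetch_image (hypothetical data).
payload = "\xFF\xD8\xFF fake jpeg bytes".b

block_handle = nil
block_result = nil
manual_result = nil

Dir.mktmpdir do |dir|
  # Block form: File.open yields the handle and closes it when the block exits.
  block_path = File.join(dir, 'block.jpg')
  File.open(block_path, 'wb') do |f|
    block_handle = f
    f << payload
  end
  block_result = File.binread(block_path)

  # Manual form: the ensure clause does what the block form does for you.
  manual_path = File.join(dir, 'manual.jpg')
  to_write_file = File.open(manual_path, 'wb')
  begin
    to_write_file << payload
  ensure
    to_write_file.close
  end
  manual_result = File.binread(manual_path)
end

puts block_handle.closed?           # the block form already closed the handle
puts block_result == manual_result  # both files hold the same bytes
```

Note that f only exists inside the block. Anything that happens afterwards, such as the upload in the question, works with the file on disk or with the original file object, not with f.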

Here is the fetch_image method, which you need to define in your controller or model:
def fetch_image(url)
  res = RestClient.get(url, { "Authorization" => "Bearer #{@team.access_token}" })
  if res.code == 200
    return res.body
  else
    raise 'Download failed'
  end
end
This is a rough example, I admit, but basically, the file download is the first part of this script. You want to use this part:
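filename = file.timestamp
if file.filetype == "jpg"
  File.open("./tmp/#{filename}", 'wb') do |f|
    f << fetch_image(file.url_private)
  end
end
Inside the block, f is the open file handle; once the block ends, the downloaded file is on disk at ./tmp/#{filename} and you can do something with it. You can also use open-uri
require 'open-uri'
file_path = URI.open(file.url_private, "Authorization" => "Bearer #{@team.access_token}").path
to download the file to a temporary file. Note that open-uri returns a StringIO, which has no path, for small responses (under about 10 KB).
In the provided example, they seem to use a model called FaceDetection and then upload the file to perform other tasks.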
I hope this makes sense and helps.

Related

Rails: Helper method behaving differently between console and application

I am trying to write a helper method that can download a CSV file from S3 storage, read the first few rows of the file and then save those first few rows to a new local file.
All is working well when I include the helper in the rails console and call the methods on the object, but when calling it in exactly the same way through the controller, the local file contains all of the rows from the S3 file, rather than just the first few.
My code, in the helper file (I've replaced AWS credentials with comments for the purpose of posting the question):
def download_file(data_source)
s3 = Aws::S3::Client.new(#API keys etc.)
File.open(data_source.file.data['id'], 'wb') do |file|
reap = s3.get_object({ bucket:#Bucket Name, key: 'store/' + data_source.file.data['id'] }, target: file)
end
end
def reduce_csv(filename)
data = CSV.open(filename, 'r') { |csv| csv.first(3) }
csv_string = CSV.generate do |csv|
data.each do |d|
csv << d
end
end
File.open('test.csv', 'wb') do |file|
file << csv_string
end
end
def make_small_data_source(data_source)
download_file(data_source)
reduce_csv(data_source.file.data['id'])
end
And in the controller:
if @data_source.save
  make_small_data_source(@data_source)
Any ideas would be much appreciated!

Check if SVG exists in Rails

I have the following code as a helper that I want to use to check if an image exists and, if so, return its raw SVG data:
def svg(name: 'default')
file = asset_path('images/' + name + '.svg')
if( File.file?(file) )
file = File.open(file, 'rb')
contents = file.read
file.close
contents.html_safe
end
end
However, the File.file? check always comes back false... (the SVG exists!)
Is the way I'm getting the file incorrect?

How to handle a file_as_string (generated by Prawn) so that it is accepted by Carrierwave?

I'm using Prawn to generate a PDF from the controller of a Rails app,
...
respond_to do |format|
format.pdf do
pdf = GenerateReportPdf.new(@object, view_context)
send_data pdf.render, filename: "Report", type: "application/pdf", disposition: "inline"
end
end
This works fine, but I now want to move GenerateReportPdf into a background task, and pass the resulting object to Carrierwave to upload directly to S3.
The worker looks like this
def perform
  pdf = GenerateReportPdf.new(@object)
  fileString = ???????
  document = Document.new(
    object_id: @object.id,
    file: fileString )
  # file is field used by Carrierwave
end
How do I handle the object returned by Prawn (?????) to ensure it is a format that can be read by Carrierwave.
fileString = pdf.render_file 'filename' writes the object to the root directory of the app. As I'm on Heroku this is not possible.
file = pdf.render returns ArgumentError: string contains null byte
fileString = StringIO.new( pdf.render_file 'filename' ) returns TypeError: no implicit conversion of nil into String
fileString = StringIO.new( pdf.render ) returns ActiveRecord::RecordInvalid: Validation failed: File You are not allowed to upload nil files, allowed types: jpg, jpeg, gif, png, pdf, doc, docx, xls, xlsx
fileString = File.open( pdf.render ) returns ArgumentError: string contains null byte
....and so on.
What am I missing? StringIO.new( pdf.render ) seems like it should work, but I'm unclear why it's generating this error.
It turns out StringIO.new( pdf.render ) should indeed work.
The problem I was having was that the filename was being set incorrectly. Despite following the advice on Carrierwave's wiki (linked below), a bug elsewhere in the code meant that the filename came back as an empty string. I'd overlooked this and assumed that something else was needed.
https://github.com/carrierwaveuploader/carrierwave/wiki/How-to:-Upload-from-a-string-in-Rails-3
My code ended up looking like this:
def perform
  pdf = GenerateReportPdf.new(@object)
  s = StringIO.new(pdf.render)
  def s.original_filename; "my file name"; end
  document = Document.new(
    object_id: @object.id
  )
  document.file = s
  document.save!
end
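The errors listed in the question follow from how each call treats the rendered string. The sketch below illustrates this with plain Ruby only: pdf_bytes is a made-up stand-in for pdf.render, and 'report.pdf' is an arbitrary name. It shows that File.open interprets the bytes as a path, while StringIO wraps them as an IO-like object that can be given an original_filename.

```ruby
require 'stringio'

# Hypothetical stand-in for pdf.render: binary data containing a null byte.
pdf_bytes = "%PDF-1.4\x00binary body".b

# File.open treats its argument as a path, so raw PDF bytes are rejected.
open_error = nil
begin
  File.open(pdf_bytes)
rescue ArgumentError => e
  open_error = e.message
end

# StringIO wraps the same bytes in an IO-like object without touching the disk.
io = StringIO.new(pdf_bytes)

# A singleton method supplies a filename for this one object only.
def io.original_filename
  'report.pdf'
end

puts open_error
puts io.original_filename
puts io.string.bytesize
```

Giving the filename a real extension such as .pdf is likely to matter when the uploader restricts allowed file types, as the validation message in the question suggests.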
You want to create a tempfile (which is fine on Heroku as long as you don't expect it to persist across requests).
def perform
  # Create instance of your Carrierwave Uploader
  uploader = MyUploader.new
  # Generate your PDF
  pdf = GenerateReportPdf.new(@object)
  # Create a tempfile
  tmpfile = Tempfile.new("my_filename")
  # set to binary mode to avoid UTF-8 conversion errors
  tmpfile.binmode
  # Use render to write the file contents
  tmpfile.write pdf.render
  # Flush buffered data so it is on disk before the uploader reads the file
  tmpfile.flush
  # Upload the tempfile with your Carrierwave uploader
  uploader.store! tmpfile
  # Close the tempfile and delete it
  tmpfile.close
  tmpfile.unlink
end
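The tempfile lifecycle on its own, without Carrierwave or Prawn, can be sketched as follows. The data string is a made-up stand-in for pdf.render, and the ['report', '.pdf'] argument is an assumption added here so that the temporary file gets a .pdf extension.

```ruby
require 'tempfile'

# Hypothetical stand-in for pdf.render.
data = "%PDF-1.4\x00binary body".b

# Passing [basename, extension] makes the tempfile name end in .pdf.
tmpfile = Tempfile.new(['report', '.pdf'])
tmpfile.binmode       # write raw bytes, no encoding conversion
tmpfile.write(data)
tmpfile.flush         # make sure the bytes are on disk for other readers

path = tmpfile.path   # remember the path; it becomes nil after unlink
on_disk = File.binread(path)

tmpfile.close
tmpfile.unlink        # delete the file right away instead of waiting for GC

puts on_disk == data
puts File.exist?(path)
```

Anything that needs the file, such as an uploader, has to run between the flush and the unlink.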
Here's a way you can use StringIO like Andy Harvey mentioned, but without adding a method to the StringIO instance's eigenclass.
class VirtualFile < StringIO
  attr_accessor :original_filename
  def initialize(string, original_filename)
    @original_filename = original_filename
    super(string)
  end
end
def perform
  pdf_string = GenerateReportPdf.new(@object).render
  file = VirtualFile.new(pdf_string, 'filename.pdf')
  document = Document.new(object_id: @object.id, file: file)
end
This one took me a couple of days. The key is to call render_file with a file path you control, so you can keep track of the file. Something like this:
In one of my models (e.g. Policy) I have a list of documents, and this is the method that updates the model connected to Carrierwave (e.g. PolicyDocument < ApplicationRecord with mount_uploader :pdf_file, PdfDocumentUploader):
def upload_pdf_document_file_to_s3_bucket(document_type, file_path)
  policy_document = self.policy_documents.where(policy_document_type: document_type)
                        .where(status: 'processing')
                        .where(pdf_file: nil).last
  policy_document.pdf_file = File.open(file_path, "r")
  policy_document.status = 's3_uploaded'
  policy_document.save(validate:false)
  policy_document
rescue => e
  policy_document.status = 's3_uploaded_failed'
  policy_document.save(validate:false)
  Rails.logger.error "Error uploading policy documents: #{e.inspect}"
end
In one of my Prawn PDF file generators (e.g. PolicyPdfDocumentX). Note how I'm rendering the file and returning the file path so that the worker can pick it up:
def generate_prawn_pdf_document
  Prawn::Document.new do |pdf|
    pdf.draw_text "Hello World PDF File", size: 8, at: [370, 462]
    pdf.start_new_page
    pdf.image Rails.root.join('app', 'assets', 'images', 'hello-world.png'), width: 550
  end
end
def generate_tmp_file(filename)
  file_path = File.join(Rails.root, "tmp/pdfs", filename)
  self.generate_prawn_pdf_document.render_file(file_path)
  return file_path
end
In the "global" worker for creating files and uploading them to the S3 bucket (e.g. PolicyDocumentGeneratorWorker):
def perform(filename, document_type, policy)
  # here we create the instance of the prawn pdf generator class
  pdf_generator_class = document_type.constantize.new
  # here we are creating the file, but also `returning the filepath`
  file_path = pdf_generator_class.generate_tmp_file(filename)
  # here we are simply updating the model with the new file created
  policy.upload_pdf_document_file_to_s3_bucket(document_type, file_path)
end
finally how to test, run rails c and:
the_policy = Policies.where....
PolicyDocumentGeneratorWorker.new.perform('report_x.pdf', 'PolicyPdfDocumentX',the_policy)
NOTE: I'm using metaprogramming in case we have multiple, different file generators. constantize.new just creates a new Prawn PDF document generator instance, so it is similar to PolicyPdfDocument.new. That way a single worker class can handle all of your Prawn PDF documents. For instance, if you need a new document, you can simply call PolicyDocumentGeneratorWorker.new.perform('report_y.pdf', 'PolicyPdfDocumentY',the_policy)
:D
hope this helps someone to save some time
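The class-name dispatch described in the note can be sketched in plain Ruby. Everything here is hypothetical: the two generator classes are stubs that only return a path, Object.const_get stands in for ActiveSupport's constantize, and the allowlist is an extra precaution that the original worker does not have.

```ruby
# Hypothetical stubs standing in for the Prawn document generator classes.
class PolicyPdfDocumentX
  def generate_tmp_file(filename)
    "tmp/pdfs/x-#{filename}"
  end
end

class PolicyPdfDocumentY
  def generate_tmp_file(filename)
    "tmp/pdfs/y-#{filename}"
  end
end

# Only names on this list may be turned into classes (an added assumption).
ALLOWED_GENERATORS = %w[PolicyPdfDocumentX PolicyPdfDocumentY].freeze

def build_generator(document_type)
  unless ALLOWED_GENERATORS.include?(document_type)
    raise ArgumentError, "unknown document type: #{document_type}"
  end
  # Object.const_get looks up the class by name, much like constantize does.
  Object.const_get(document_type).new
end

puts build_generator('PolicyPdfDocumentX').generate_tmp_file('report_x.pdf')
puts build_generator('PolicyPdfDocumentY').generate_tmp_file('report_y.pdf')
```

Restricting the lookup to known names is worth considering whenever the document type string could come from outside the application.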

Carrierwave: Process Temp file and then upload via fog

I am processing a PDF uploaded by a user by extracting the text from it and saving the output in a text file for processing later.
Locally I store the pdf in my public folder but when I work on Heroku I need to use S3.
I thought that the pdf path was the problem, so I included
if Rails.env.test? || Rails.env.cucumber?
But still I receive
ArgumentError (input must be an IO-like object or a filename):
Is there a way of temporarily storing the pdf in my root/tmp folder on Heroku, get the text from it, and then after that is done, upload the document to S3?
def convert_pdf
  if Rails.env.test? || Rails.env.cucumber?
    pdf_dest = File.join(Rails.root, "public", @application.document_url)
  else
    pdf_dest = @application.document_url
  end
  txt_file_dest = Rails.root + 'tmp/pdf-parser/text'
  document_file_name = /\/uploads\/application\/document\/\d{1,}\/(?<file_name>.*).pdf/.match(@application.document_url)[:file_name]
  PDF::Reader.open(pdf_dest) do |reader|
    File.open(File.join(txt_file_dest, document_file_name + '.txt'), 'w+') do |f|
      reader.pages.each do |page|
        f.puts page.text
      end
    end
  end
end
You're going to want to set up a custom processor in your uploader. And on top of that, since the output file (.txt) isn't going to have the same extension as the input file (.pdf), you're going to want to change the filename. The following belongs in your Uploader:
process :convert_to_text
def convert_to_text
  temp_dir = Rails.root.join('tmp', 'pdf-parser', 'text')
  temp_path = temp_dir.join(filename)
  FileUtils.mkdir_p(temp_dir)
  PDF::Reader.open(current_path) do |pdf|
    File.open(temp_path, 'w') do |f|
      pdf.pages.each do |page|
        f.puts page.text
      end
    end
  end
  File.unlink(current_path)
  FileUtils.cp(temp_path, current_path)
end
def filename
  super + '.txt' if original_filename.present?
end
I haven't run this code, so there are probably some bugs, but that should give you the idea at least.

How to edit docx with nokogiri and rubyzip

I'm using a combination of rubyzip and nokogiri to edit a .docx file. I use rubyzip to unzip the .docx file and nokogiri to parse and change the body of the word/document.xml file, but every time I close rubyzip at the end it corrupts the file and I can't open or repair it. When I unzip the .docx file on my desktop and check word/document.xml, the content is updated to what I changed it to, but all the other files are messed up. Could someone help me with this issue? Here is my code:
require 'rubygems'
require 'zip/zip'
require 'nokogiri'
zip = Zip::ZipFile.open("test.docx")
doc = zip.find_entry("word/document.xml")
xml = Nokogiri::XML.parse(doc.get_input_stream)
wt = xml.root.xpath("//w:t", {"w" => "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}).first
wt.content = "New Text"
zip.get_output_stream("word/document.xml") {|f| f << xml.to_s}
zip.close
I ran into the same corruption problem with rubyzip last night. I solved it by copying everything to a new zip file, replacing files as necessary.
Here's my working proof of concept:
#!/usr/bin/env ruby
require 'rubygems'
require 'zip/zip' # rubyzip gem
require 'nokogiri'
class WordXmlFile
  def self.open(path, &block)
    self.new(path, &block)
  end
  def initialize(path, &block)
    @replace = {}
    if block_given?
      @zip = Zip::ZipFile.open(path)
      yield(self)
      @zip.close
    else
      @zip = Zip::ZipFile.open(path)
    end
  end
  def merge(rec)
    xml = @zip.read("word/document.xml")
    doc = Nokogiri::XML(xml) {|x| x.noent}
    (doc/"//w:fldSimple").each do |field|
      if field.attributes['instr'].value =~ /MERGEFIELD (\S+)/
        text_node = (field/".//w:t").first
        if text_node
          text_node.inner_html = rec[$1].to_s
        else
          puts "No text node for #{$1}"
        end
      end
    end
    @replace["word/document.xml"] = doc.serialize :save_with => 0
  end
  def save(path)
    # Copy every entry into a new zip, swapping in the replaced files
    Zip::ZipFile.open(path, Zip::ZipFile::CREATE) do |out|
      @zip.each do |entry|
        out.get_output_stream(entry.name) do |o|
          if @replace[entry.name]
            o.write(@replace[entry.name])
          else
            o.write(@zip.read(entry.name))
          end
        end
      end
    end
    @zip.close
  end
end
if __FILE__ == $0
  file = ARGV[0]
  out_file = ARGV[1] || file.sub(/\.docx/, ' Merged.docx')
  w = WordXmlFile.open(file)
  # w.force_settings  # not defined in this excerpt
  w.merge('First_Name' => 'Eric', 'Last_Name' => 'Mason')
  w.save(out_file)
end
I stumbled across this post and know nothing about Ruby or Nokogiri, but ...
It looks like you are re-zipping the new content incorrectly.
I don't know rubyzip, but you need a way to tell it to update the entry word/document.xml
and then re-save/re-zip the file.
It looks like you are just overwriting the entry with new data, which of course is going to be a different size and will corrupt the rest of the zip file.
I give an example for Excel in this post, "Parse text file and create an excel report",
which may be of use even though I am using a different zip library and VB (I'm still doing exactly what you are trying to do; my code is about halfway down).
Here is the part that applies:
Using z As ZipFile = ZipFile.Read(xlStream.BaseStream)
'Grab Sheet 1 out of the file parts and read it into a string.
Dim myEntry As ZipEntry = z("xl/worksheets/sheet1.xml")
Dim msSheet1 As New MemoryStream
myEntry.Extract(msSheet1)
msSheet1.Position = 0
Dim sr As New StreamReader(msSheet1)
Dim strXMLData As String = sr.ReadToEnd
'Grab the data in the empty sheet and swap out the data that I want
Dim str2 As XElement = CreateSheetData(tbl)
Dim strReplace As String = strXMLData.Replace("<sheetData/>", str2.ToString)
z.UpdateEntry("xl/worksheets/sheet1.xml", strReplace)
'This just rezips the file with the new data it doesnt save to disk
z.Save(fiRet.FullName)
End Using
According to the official GitHub documentation, you should use write_buffer instead of open. There's also a code example at the link.
