Writing temp files to Heroku from an S3 hosted file - ruby-on-rails

I have a Rails app hosted on Heroku. Here's the situation: a user should be able to upload a PDF (an instance of Batch) to our app using S3; a user should also be able to take the S3 web address of the uploaded PDF and split it into more PDFs using HyPDF by specifying the file path and the pages to split out (creating instances of Essay).
All of this is happening in the same POST request to /essays.
Here's the code I've been working with today:
def create
  if params[:essay].class == String
    batch_id = params[:batch_id].gsub(/[^\d]/, '').to_i
    break_up_batch(params, batch_id)
    redirect_to Batch.find(batch_id), notice: 'Essays were successfully created.'
  else
    @essay = Essay.new(essay_params)
    respond_to do |format|
      if @essay.save
        format.html { redirect_to @essay, notice: 'Essay was successfully created.' }
        format.json { render :show, status: :created, location: @essay }
      else
        format.html { render :new }
        format.json { render json: @essay.errors, status: :unprocessable_entity }
      end
    end
  end
end
# this is a private method
def break_up_batch(params, batch_id)
  essay_data = []
  # create a separate essay for each grouped essay
  local_batch = File.open(Rails.root.join('tmp').to_s + "temppdf.pdf" , 'wb') do |f|
    f.binmode
    f.write HTTParty.get(Batch.find(batch_id).document.url).parsed_response
    f.path
  end
  params["essay"].split("~").each do |data|
    data = data.split(" ")
    hypdf_url = HyPDF.pdfextract(
      local_batch,
      first_page: data[1].to_i,
      last_page: data[2].to_i,
      bucket: 'essay101',
      public: true
    )
    object = {student_name: data[0], batch_id: batch_id, url: hypdf_url[:url]}
    essay_data << object
  end
  essay_data.each {|essay| Essay.create(essay)}
  File.delete(local_batch)
end
I can't get the file to show up on Heroku; I'm checking with heroku run bash and ls tmp. When the method runs, a blank file is uploaded to S3. I've written some jQuery to populate a hidden field, which is why there's the funky splitting in the middle of the code.
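For readers following the "funky splitting", here is a minimal sketch of what that parsing amounts to. It assumes each "~"-separated entry has the form "<name> <first_page> <last_page>" and that names contain no spaces; the method name parse_essay_ranges is made up for illustration and is not part of the original code.

```ruby
# Parse a string such as "alice 1 3~bob 4 6" into one hash per essay.
# Assumption: entries look like "<name> <first_page> <last_page>".
def parse_essay_ranges(raw)
  raw.split('~').map do |entry|
    name, first_page, last_page = entry.split(' ')
    # Integer() raises on missing or non-numeric pages instead of
    # silently turning them into 0 the way to_i does.
    { student_name: name, first_page: Integer(first_page), last_page: Integer(last_page) }
  end
end

ranges = parse_essay_ranges('alice 1 3~bob 4 6')
ranges.each { |r| puts "#{r[:student_name]}: pages #{r[:first_page]}-#{r[:last_page]}" }
```

Using Integer() rather than to_i is a design choice here: a malformed hidden field fails loudly before any PDF work starts.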

Because of Heroku's ephemeral filesystem, I'd highly recommend getting that file off your filesystem as fast as possible, perhaps along these lines:
User uploads to S3 (preferably direct: https://devcenter.heroku.com/articles/direct-to-s3-image-uploads-in-rails)
Kick off a background worker to fetch the file and do the processing necessary in-memory
If the user needs to be informed when the file is properly processed, set a "status" field in your DB and allow the front-end app to poll the web server for updates. Show "Processing" to the user until the background worker changes its status.
This method also allows your web process to respond quickly without tying up resources, and potentially triggering an H12 (request timeout) error.
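The status-field idea above can be sketched in plain Ruby. This is only an illustration: BatchJob is a hypothetical stand-in for a database-backed record with a "status" column, and the block passed to perform stands in for the real PDF processing done by a background worker.

```ruby
# Hypothetical stand-in for a DB record with a "status" column.
class BatchJob
  attr_reader :status, :error

  def initialize
    @status = 'queued'
  end

  # Runs the given block (the actual processing) and records the outcome,
  # so a polling endpoint only ever has to read `status`.
  def perform
    @status = 'processing'
    yield
    @status = 'done'
  rescue StandardError => e
    @error = e.message
    @status = 'failed'
  end
end

job = BatchJob.new
job.perform { :split_the_pdf_here } # placeholder for the real work
puts job.status
```

The front end would keep showing "Processing" until the polled status changes to done or failed.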

Turns out using the File class wasn't the right way to go about it. But using Tempfile works!
def break_up_batch(params, batch_id, current_user)
  essay_data = []
  # create a separate essay for each grouped essay
  tempfile = Tempfile.new(['temppdf', '.pdf'], Rails.root.join('tmp'))
  tempfile.binmode
  tempfile.write HTTParty.get(Batch.find(batch_id).document.url).parsed_response
  tempfile.close
  save_path = tempfile.path
  params["essay"].split("~").each do |data|
    data = data.split(" ")
    hypdf_url = HyPDF.pdfextract(
      save_path,
      first_page: data[1].to_i,
      last_page: data[2].to_i,
      bucket: 'essay101',
      public: true
    )
    object = {student_name: data[0], batch_id: batch_id, url: hypdf_url[:url]}
    essay_data << object
  end
  essay_data.each do |essay|
    saved_essay = Essay.create(essay)
    saved_essay.update_attributes(:company_id => current_user.company_id) if current_user.company_id
  end
  tempfile.unlink
end
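A stripped-down sketch of the same Tempfile pattern, using only the standard library, may make the moving parts easier to see. The helper name with_temp_pdf is an assumption for illustration. Two side notes on the question's File version: Rails.root.join('tmp').to_s + "temppdf.pdf" concatenates without a path separator, so the file ends up next to tmp rather than inside it, and heroku run bash starts a separate one-off dyno, so it would not show files written by the web dyno in any case.

```ruby
require 'tempfile'

# Write binary data to a temp file, hand its path to the block,
# and remove the file afterwards even if the block raises.
def with_temp_pdf(bytes)
  tempfile = Tempfile.new(['temppdf', '.pdf'])
  tempfile.binmode
  tempfile.write(bytes)
  tempfile.close # flush to disk; the path stays valid until unlink
  yield tempfile.path
ensure
  tempfile&.unlink
end

size = with_temp_pdf('%PDF-1.4 fake bytes') { |path| File.size(path) }
puts size
```

Wrapping the cleanup in ensure means a failure in the middle of processing does not leave the temp file behind.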

Related

Redirect to another endpoint with large data - Rails/Ruby

I have a question about how to serve a generated CSV file (containing a large amount of data) to the user. Here is the task I have to do.
App: I have a film that has many characters.
Task:
allow users to upload characters via CSV (ok, done)
if there are errors, show them for each row (ok, done)
in the results page, also show a link to a new CSV file only with the remaining characters - the ones that couldn’t be created (I’m stuck here)
Here is part of my code (upload method):
def upload
  saved_characters = []
  characters_with_errors = []
  errors = {}
  begin
    CSV.parse(params[:csv].read, **csv_options) do |row|
      row_hash = clear_input(row.to_h)
      new_character = Character.new(row_hash)
      if new_character.save
        add_images_to(new_character, row)
        saved_characters << new_character
      else
        characters_with_errors << new_character
        errors[new_character.name] = new_character.errors.full_messages.join(', ')
      end
    end
  rescue CSV::MalformedCSVError => e
    errors = { 'General error': e.message }.merge(errors)
  end
  @upload = {
    errors: errors,
    characters: saved_characters,
    characters_with_errors: characters_with_errors
  }
end
The issue: large amount of data
In the end, almost everything in upload.html.erb works fine: it shows the results and the errors per column. BUT I'm not sure how to create a link on this page that sends the user to the new CSV file (containing only the characters with errors). If the link sends the user to another method / GET endpoint (for the view in CSV format), how can I pass such a large amount of data (params won't work because they would get too long)? What would be the best practice here?
You can use a session variable to store the data and then redirect to a new action to download the file. In the new action, read the data from the session variable and generate the CSV file. Keep in mind that the default cookie-based session store holds only about 4 KB, so a large data set needs a server-side session store.
For example, in the upload action, you can do something like this:
session[:characters_with_errors] = characters_with_errors
redirect_to download_csv_path
In the download_csv action, you can do something like this:
characters_with_errors = session[:characters_with_errors]
session[:characters_with_errors] = nil
respond_to do |format|
  format.csv { send_data generate_csv(characters_with_errors) }
end
In the generate_csv method, you can do something like this:
def generate_csv(characters_with_errors)
  CSV.generate do |csv|
    csv << ['name', 'age' ]
    characters_with_errors.each do |character|
      csv << [character.name, character.age]
    end
  end
end
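Since the goal is a file the user can correct and re-upload, it may help to include the error message next to each failed row. Here is a minimal standalone sketch using only Ruby's csv library; the row shape (hashes with :name, :age and :errors keys) and the method name failed_rows_csv are assumptions for illustration, not part of the original app.

```ruby
require 'csv'

# rows: array of hashes with :name, :age and :errors keys (hypothetical shape)
def failed_rows_csv(rows)
  CSV.generate do |csv|
    csv << %w[name age errors]
    rows.each { |row| csv << [row[:name], row[:age], row[:errors]] }
  end
end

csv_text = failed_rows_csv([{ name: 'Leia', age: nil, errors: "Age can't be blank" }])
puts csv_text
```

Plain hashes (rather than unsaved model objects) are also much easier to serialize if the data has to be stashed somewhere between two requests.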
Another option is to use a temporary file to store the data and then send the user the new CSV file. Here is an example:
def upload
  saved_characters = []
  characters_with_errors = []
  errors = {}
  begin
    CSV.parse(params[:csv].read, **csv_options) do |row|
      row_hash = clear_input(row.to_h)
      new_character = Character.new(row_hash)
      if new_character.save
        add_images_to(new_character, row)
        saved_characters << new_character
      else
        characters_with_errors << new_character
        errors[new_character.name] = new_character.errors.full_messages.join(', ')
      end
    end
  rescue CSV::MalformedCSVError => e
    errors = { 'General error': e.message }.merge(errors)
  end
  @upload = {
    errors: errors,
    characters: saved_characters,
    characters_with_errors: characters_with_errors
  }
  respond_to do |format|
    format.html
    format.csv do
      # Create a temporary file
      tmp = Tempfile.new('characters_with_errors')
      # Write the CSV data (built by generate_csv above) to the temporary file
      tmp.write(generate_csv(characters_with_errors))
      # Close the file so the full contents are flushed to disk before sending
      tmp.close
      # Send the user the new CSV file
      send_file tmp.path, filename: 'characters_with_errors.csv'
    end
  end
end
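A variation on the temp file idea that avoids passing large data between requests: write the CSV once, keep only a short random token, and put that token in the download link. The sketch below is a standalone illustration under stated assumptions; EXPORT_DIR, store_export and fetch_export are made-up names, and on a multi-server or ephemeral host a shared store (database, S3, cache) would have to take the place of the local directory.

```ruby
require 'securerandom'
require 'tmpdir'

# Assumption: a writable scratch directory local to this process.
EXPORT_DIR = Dir.mktmpdir('csv_exports')

# Save the CSV text under a random token and return the token.
def store_export(csv_text)
  token = SecureRandom.hex(16)
  File.write(File.join(EXPORT_DIR, "#{token}.csv"), csv_text)
  token
end

# Look the file up again; returns nil for unknown or malformed tokens.
def fetch_export(token)
  return nil unless token.to_s.match?(/\A\h{32}\z/)

  path = File.join(EXPORT_DIR, "#{token}.csv")
  File.exist?(path) ? File.read(path) : nil
end

token = store_export("name,age\nLeia,\n")
puts fetch_export(token)
```

In a controller, the upload action would render a link carrying the token, and the download action would pass the fetched text to send_data. Validating the token format before building the path keeps user input from reaching the filesystem unchecked.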

aws-sdk for Ruby v2: check success status after I PUT object in S3 bucket

I'm using the aws-sdk v2 for Ruby and I find the methods available on objects really limiting. I created a bucket like so:
client = Aws::S3::Client.new(region: 'us-west-2')
s3 = Aws::S3::Resource.new(client: client)
S3_BUCKET = s3.bucket(ENV['AWS_BUCKET'])
I've found that the only method available to write an object to my bucket is put. However, I don't see a 'success_action_status' option for this method. I've deployed my app to Elastic Beanstalk. Locally, I can write to this bucket, but when I try to write from my Elastic Beanstalk app it doesn't work, and I'm working blind trying to figure out what's happening. Any info to help determine where my PUT request is going wrong would be helpful.
Here's what my method looks like now:
def create
  username = params[:user][:user_alias]
  key = "uploads/#{username}"
  obj = S3_BUCKET.object(key)
  obj.put({
    acl: 'public-read',
    body: params[:user][:image_uri],
  })
  @user = User.new(user_params)
  if @user.save
    render json: @user, status: :created, location: @user
  else
    render json: @user.errors, status: :unprocessable_entity
  end
end
Here's the documentation I'm referring to for PUT methods: http://docs.aws.amazon.com/sdkforruby/api/Aws/S3/Object.html#put-instance_method
It doesn't seem like you're putting the image file, only the URI. The docs say this about the put method's body option:
body: source_file, # file/IO object, or string data
You'll have to read or open the image file in order to actually upload the image:
obj.put({
  acl: 'public-read',
  body: File.open(params[:user][:image_uri], 'rb'),
})
Then you can check the success status with obj.exists?
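To avoid "working blind", one option is to wrap the upload so that any error is logged and the result is confirmed with exists?. The sketch below is hedged: upload_and_verify is a made-up helper, and FakeS3Object is only a stand-in so the example runs without AWS credentials. With the real SDK, obj would be S3_BUCKET.object(key); a failed request raises an error there, and exists? issues a HEAD request.

```ruby
require 'stringio'

# Stand-in for Aws::S3::Object so the sketch runs without credentials.
class FakeS3Object
  def initialize(fail_with: nil)
    @fail_with = fail_with
    @stored = false
  end

  def put(acl:, body:)
    raise @fail_with if @fail_with

    @stored = !body.read.empty?
  end

  def exists?
    @stored
  end
end

# Upload an IO object and report whether it is really there afterwards.
# Any error raised by put is logged instead of being swallowed.
def upload_and_verify(obj, io)
  obj.put(acl: 'public-read', body: io)
  obj.exists?
rescue StandardError => e
  warn "S3 upload failed: #{e.class}: #{e.message}"
  false
end

puts upload_and_verify(FakeS3Object.new, StringIO.new('image bytes'))
```

On Elastic Beanstalk the logged error class and message would typically point at the cause, for example missing credentials or a bucket policy.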

Save docx render with a specific path

I want to save a docx that I create. I use the htmltoword gem.
I think render is a function provided by the gem.
In my controller (this doesn't work):
respond_to do |format|
  format.html
  format.docx do
    @filepath = "#{Rails.root}/app/template/#{@cvmodif.nom}.docx"
    render docx: 'show', filename: 'show.docx'
    send_file(@filepath, :type => 'application/docx', :disposition => 'attachment')
  end
end
I have a link. When I click on it, the docx is downloaded correctly. But I also want to save it to a custom path.
<%= link_to 'WORD', cv_path(@cvmodif, :format => 'docx') %>
How can I do that?
Both render and send_file do the same thing: generate a document and send it as an attachment.
If you want to save the document you have to do it manually before sending:
respond_to do |format|
  format.docx do
    # Generate the document
    my_html = '<html><head></head><body><p>Hello</p></body></html>'
    file_path = "test-#{Time.now.sec}.docx"
    document = Htmltoword::Document.create(my_html)
    # Save it in the custom file
    File.open(file_path, "wb") do |out|
      out << document
    end
    # Send the custom file
    send_file(file_path, :type => 'application/docx', :disposition => 'attachment')
  end
end
P.S. According to the Htmltoword source code, version 0.4.4 has a create_and_save function, but it is missing from the currently distributed gem. If you use this scenario often in your application, I'd recommend creating a common method for this purpose.
UPDATE
Then there is no straightforward solution, because in this case sending the file is part of the rendering process, which is the last step of loading the page and runs deep inside Htmltoword.
The most correct solution is to make this an Htmltoword feature (create a feature request or even implement it yourself).
But for the moment you can take the renderer for *.docx files from the library and make minimal changes to achieve your goal.
Create a file RailsApp/config/initializers/application_controller.rb.
Add this docx renderer code, taken from GitHub:
ActionController::Renderers.add :docx do |filename, options|
  formats[0] = :docx unless formats.include?(:docx) || Rails.version < '3.2'
  # This is ugly and should be solved with regular file utils
  if options[:template] == action_name
    if filename =~ %r{^([^\/]+)/(.+)$}
      options[:prefixes] ||= []
      options[:prefixes].unshift $1
      options[:template] = $2
    else
      options[:template] = filename
    end
  end
  # disposition / filename
  disposition = options.delete(:disposition) || 'attachment'
  if file_name = options.delete(:filename)
    file_name += '.docx' unless file_name =~ /\.docx$/
  else
    file_name = "#{filename.gsub(/^.*\//, '')}.docx"
  end
  # other properties
  save_to = options.delete(:save_to)
  word_template = options.delete(:word_template) || nil
  extras = options.delete(:extras) || false
  # content will come from property content unless not specified
  # then it will look for a template.
  content = options.delete(:content) || render_to_string(options)
  document = Htmltoword::Document.create(content, word_template, extras)
  File.open(save_to, "wb") { |out| out << document } if save_to
  send_data document, filename: file_name, type: Mime::DOCX, disposition: disposition
end
If you compare this file to the original, you'll find that I've added a save_to option; when this option is set, the renderer saves the document to the given location.
Usage in the controller:
format.docx do
  render docx: 'my_view', filename: 'my_file.docx', save_to: "test-#{Time.now.sec}.docx"
end

No server response

I really can't find the solution for this. I'm trying to access a game API to fill in platform (PS3, Xbox) information for the games I insert manually. All the information I have about the game (like its name) is in strong params, so I use the name I entered manually to get the platform from the API.
The first example works fine, but with the second example I get no answer from the server. Nothing happens when I submit to the create action, not a single line in the server logs :-/
The only difference between the examples is how I get the game's name to make the API request. I'm using the Nokogiri gem to parse the API's XML, and the code looks fine to me.
Can you help me?
Thanks,
Example 1 (works well):
def create
  @game = Game.new(game_params)
  firstUrl = "http://thegamesdb.net/api/GetGamesList.php?name=sonic"
  # I'm hardcoding a game just for debug
  gameList = Nokogiri::XML(open(firstUrl))
  gameApiId = gameList.css("Game id").first.text
  secondUrl = "http://thegamesdb.net/api/GetGame.php?id=" + gameApiId
  gameInformation = Nokogiri::XML(open(secondUrl))
  @game.platform = gameInformation.xpath("//Game//Platform").text
  if @game.save
    redirect_to @game, notice: 'Game was successfully created.'
  else
    render :new
  end
end
private
def game_params
  params.require(:game).permit(:name, :publisher, :year, :description, :image, levels_attributes: [:id, :name, :sort_order, :_destroy])
end
Example 2 (no response from server)
def create
  @game = Game.new(game_params)
  firstUrl = "http://thegamesdb.net/api/GetGamesList.php?name=" + @game.name.gsub(/\s+/, "")
  # I'm hardcoding a game just for debug
  gameList = Nokogiri::XML(open(firstUrl))
  gameApiId = gameList.css("Game id").first.text
  secondUrl = "http://thegamesdb.net/api/GetGame.php?id=" + gameApiId
  gameInformation = Nokogiri::XML(open(secondUrl))
  @game.platform = gameInformation.xpath("//Game//Platform").text
  if @game.save
    redirect_to @game, notice: 'Game was successfully created.'
  else
    render :new
  end
end
private
def game_params
  params.require(:game).permit(:name, :publisher, :year, :description, :image, levels_attributes: [:id, :name, :sort_order, :_destroy])
end
In summary, the difference is this:
firstUrl = "http://thegamesdb.net/api/GetGamesList.php?name=" + @game.name.gsub(/\s+/, "")
What you're going to get here are tips on how to debug this issue. The problem is that since you're pinging a third-party API, we have scant information on what the returned data is meant to look like, which makes any "solution" a stab in the dark.
--
Debug
I would do several things:
Make sure this new link is accessible & returns data
Make sure the rest of the variables & data is intact
Ensure the third party API is able to provide the return you want
I suspect the issue is with the way you're calling @game.name.gsub. Have you confirmed this is indeed being called?
I just tested the URL http://thegamesdb.net/api/GetGamesList.php?name=StarFox and it brought back data. Upon initial inspection, I suspect the crux of the issue is in how you call this URL.
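One way to rule out the URL itself while debugging is to build it with the standard library's form encoding and fail early on a blank name, then print the result and try it in a browser. This is a sketch, not a confirmed fix: games_list_url is a made-up helper, and whether the API treats an encoded "Star+Fox" the same as "StarFox" is an assumption that would need checking against the API.

```ruby
require 'uri'

GAMES_LIST_ENDPOINT = 'http://thegamesdb.net/api/GetGamesList.php'

# Build the lookup URL, escaping spaces and special characters.
# Raises early if the name is nil or blank, which would otherwise
# surface later as a NoMethodError on nil or an empty search.
def games_list_url(name)
  raise ArgumentError, 'game name is blank' if name.to_s.strip.empty?

  "#{GAMES_LIST_ENDPOINT}?#{URI.encode_www_form(name: name.strip)}"
end

puts games_list_url('Star Fox')
```

Logging the generated URL before the request makes it easy to confirm that this line is actually reached and what exactly is sent.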

Retrieve file and thumbnail url from AWS Elastic Transcoder job

I have a Rails app that uploads videos to an AWS S3 bucket using its CORS configuration. When the upload completes and the Rails video object is created, an Elastic Transcoder job is created to encode the video to .mp4 format and generate a thumbnail image. AWS SNS is enabled to send push notifications when the job is complete.
The process all works nicely and I receive an SNS notification when the upload is complete. I can fetch the video URL just fine, but the notification contains only the thumbnail pattern rather than the actual filename.
Below is a typical notification I receive from AWS SNS (NB: this is from the outputs hash):
{"id"=>"1", "presetId"=>"1351620000001-000040", "key"=>"uploads/video/150/557874e9-4c67-40f0-8f98-8c59506647e5/IMG_0587.mp4", "thumbnailPattern"=>"uploads/video/150/557874e9-4c67-40f0-8f98-8c59506647e5/{count}IMG_0587", "rotate"=>"auto", "status"=>"Complete", "statusDetail"=>"The transcoding job is completed.", "duration"=>10, "width"=>202, "height"=>360}
As you can see, thumbnailPattern holds just the file pattern to use, not the actual file created.
Does anyone know how I can get the URLs of the files created by Elastic Transcoder via SNS?
transcoder.rb # => I create a new transcoder object when a video has been saved
class Transcoder < Video
  def initialize(video)
    @video = video
    @directory = "uploads/video/#{@video.id}/#{SecureRandom.uuid}/"
    @filename = File.basename(@video.file, File.extname(@video.file))
  end
  def create
    transcoder = AWS::ElasticTranscoder::Client.new(region: "us-east-1")
    options = {
      pipeline_id: CONFIG[:aws_pipeline_id],
      input: {
        key: @video.file.split("/")[3..-1].join("/"), # slice off the amazon.com bit
        frame_rate: "auto",
        resolution: 'auto',
        aspect_ratio: 'auto',
        interlaced: 'auto',
        container: 'auto'
      },
      outputs: [
        {
          key: "#{@filename}.mp4",
          preset_id: '1351620000001-000040',
          rotate: "auto",
          thumbnail_pattern: "{count}#{@filename}"
        }
      ],
      output_key_prefix: "#{@directory}"
    }
    job = transcoder.create_job(options)
    @video.job_id = job.data[:job][:id]
    @video.save!
  end
end
VideosController #create
class VideosController < ApplicationController
  def create
    @video = current_user.videos.build(params[:video])
    respond_to do |format|
      if @video.save
        transcode = Transcoder.new(@video)
        transcode.create
        format.html { redirect_to videos_path, notice: 'Video was successfully uploaded.' }
        format.json { render json: @video, status: :created, location: @video }
        format.js
      else
        format.html { render action: "new" }
        format.json { render json: @video.errors, status: :unprocessable_entity }
      end
    end
  end
end
It doesn't appear that the actual names of the thumbnails are passed back, either in SNS notifications or in the response to the job-creation request:
http://docs.aws.amazon.com/elastictranscoder/latest/developerguide/create-job.html#create-job-examples
http://docs.aws.amazon.com/elastictranscoder/latest/developerguide/notifications.html
Because the base path/name of your thumbnails is known, and the sequence number will always start at 00001, you can iterate from there to determine if/how many of the thumbnails exist upon job completion. Be sure to use HEAD requests against the objects in S3 to determine their presence; it's about 10x cheaper than doing a LIST request.
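The iteration described above might look like the following sketch. The helper thumbnail_keys is made up for illustration, and the existence check is passed in as a callable so the example runs without AWS; with the aws-sdk it could wrap bucket.object(key).exists?, which performs a HEAD request. The file extension is an assumption here, since it depends on the thumbnail format of the preset.

```ruby
# Expand a thumbnailPattern into concrete keys until one is missing.
# `exists` is any callable answering "is this key in the bucket?".
# Assumption: the extension (png or jpg) matches the preset's thumbnail format.
def thumbnail_keys(pattern, exists:, extension: 'png', limit: 1000)
  keys = []
  (1..limit).each do |n|
    key = "#{pattern.sub('{count}', format('%05d', n))}.#{extension}"
    break unless exists.call(key)

    keys << key
  end
  keys
end

# Simulated bucket contents for demonstration only.
present = ['uploads/00001IMG_0587.png', 'uploads/00002IMG_0587.png']
found = thumbnail_keys('uploads/{count}IMG_0587', exists: ->(key) { present.include?(key) })
puts found
```

The limit parameter is just a safety cap so a misbehaving existence check cannot loop forever.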
Four years have passed since the last reply. A new Cold War has started and there is a lot of political tension, but Amazon still hasn't fixed this issue.
As a workaround I found another solution: transcoded files (video/thumbnail) are usually placed in a new bucket, or at least under some prefix. I created a new S3 event for ObjectCreate (All) on the target bucket with the specified prefix and connected it to a pre-created SNS topic. This topic pings my backend's endpoint twice: first when the video is transcoded and again when the thumbnail is created. Using a regexp it is quite easy to distinguish which is which.
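The regexp step could be sketched like this. The patterns are assumptions based on the key layout shown in the question (an .mp4 output and thumbnails whose file name starts with a five-digit counter); classify_key is a made-up helper and would need adjusting to the actual prefixes and formats in use.

```ruby
# Assumed key shapes, taken from the notification example in the question:
#   .../IMG_0587.mp4        -> transcoded video
#   .../00001IMG_0587.png   -> thumbnail (five-digit counter prefix)
VIDEO_KEY     = /\.mp4\z/i
THUMBNAIL_KEY = %r{(?:\A|/)\d{5}[^/]*\.(?:png|jpe?g)\z}i

def classify_key(key)
  case key
  when VIDEO_KEY     then :video
  when THUMBNAIL_KEY then :thumbnail
  else :unknown
  end
end

prefix = 'uploads/video/150/557874e9-4c67-40f0-8f98-8c59506647e5/'
puts classify_key("#{prefix}IMG_0587.mp4")
puts classify_key("#{prefix}00001IMG_0587.png")
```

The endpoint receiving the S3 event would extract the object key from the event payload and branch on the returned symbol.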
