Save large text file without using too much memory - ruby-on-rails

I have a model that creates a KML file. I treat that KML as a string and then forward it to a mailer, which delivers it:
def write_kml(coords3d, time)
  kml = String.new
  kml << header
  coords3d.each do |coords|
    coordinates = String.new
    coords.each do |coord|
      lat = coord[0].to_f
      lng = coord[1].to_f
      coordinates << "#{lng},#{lat},0 "
      kml << polygon(coordinates)
    end
  end
  kml << footer
  kml
end
This gets used here:
CsvMailer.kml_send(kml,time, mode, email).deliver
Mailer:
def kml_send(kml, time, mode, email)
  @time = (time / 60).to_i
  @mode = mode
  gen_time = Time.now
  file_name = gen_time.strftime('%Y-%m-%d %H:%M:%S') + " #{@mode}" + " #{@time}(mins)"
  attachments[file_name + '(KML).kml'] = { mime_type: 'text/kml', content: kml }
  mail to: email, subject: 'KML File'
end
It takes up a huge amount of memory. Some of these files are quite large (200MB), so on Heroku, for example, they take up too much memory.
I had some ideas using S3, but I would need to create the file first, so it would still use the memory. Can I write straight to S3 without using memory?

You can do this with S3 multipart uploads, since they don't require you to know the file size upfront.
Parts have to be at least 5MB in size, so the easiest way to use this is to write your data to an in-memory buffer and upload a part to S3 every time you get past 5MB. There's a limit of 10,000 parts for an upload, so if your file size is going to be > 50GB then you'd need to know that ahead of time so that you can make the parts bigger.
Using the fog library, that would look a little like this:
def upload_chunk(connection, upload_id, chunk, index)
  md5 = Base64.encode64(Digest::MD5.digest(chunk)).strip
  connection.upload_part('bucket', 'a_key', upload_id, index, chunk, 'Content-MD5' => md5)
end

connection = Fog::Storage::AWS.new(:aws_access_key_id => '...', :region => '...', :aws_secret_access_key => '...')
upload_id = connection.initiate_multipart_upload('bucket', 'a_key').body['UploadId']
chunk_index = 1
kml = String.new
kml << header
coords3d.each do |coords|
  # append to kml
  if kml.bytesize > 5 * 1024 * 1024
    upload_chunk(connection, upload_id, kml, chunk_index)
    chunk_index += 1
    kml = ''
  end
end
upload_chunk(connection, upload_id, kml, chunk_index)

# when you've uploaded all the chunks
connection.complete_multipart_upload('bucket', 'a_key', upload_id)
You could probably come up with something neater if you created an uploader class to wrap the buffer and put all the S3 logic in there. Then your KML code doesn't have to know whether it has an actual string or a string that flushes to S3 periodically.
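For example, a wrapper along these lines would let write_kml stay almost unchanged. This is only a sketch: the class name S3ChunkedWriter and its interface are made up for illustration, and it mirrors the fog calls used above; depending on your fog version you may also need to collect the part ETags returned by upload_part and pass them to complete_multipart_upload.

class S3ChunkedWriter
  MIN_PART_SIZE = 5 * 1024 * 1024 # S3 multipart parts must be at least 5MB

  def initialize(connection, bucket, key)
    @connection = connection
    @bucket = bucket
    @key = key
    @buffer = String.new
    @part_index = 1
    @upload_id = connection.initiate_multipart_upload(bucket, key).body['UploadId']
  end

  # Behaves like appending to a string, but flushes a part to S3 once the
  # buffer passes the minimum part size
  def <<(data)
    @buffer << data
    flush_part if @buffer.bytesize >= MIN_PART_SIZE
    self
  end

  # Upload whatever is left and complete the multipart upload
  def finish
    flush_part unless @buffer.empty?
    @connection.complete_multipart_upload(@bucket, @key, @upload_id)
  end

  private

  def flush_part
    md5 = Base64.encode64(Digest::MD5.digest(@buffer)).strip
    @connection.upload_part(@bucket, @key, @upload_id, @part_index, @buffer, 'Content-MD5' => md5)
    @part_index += 1
    @buffer = String.new
  end
end

Your KML code would then write to it instead of to a plain string:

kml = S3ChunkedWriter.new(connection, 'bucket', 'a_key')
kml << header
# ... append polygons exactly as before ...
kml << footer
kml.finish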

Related

Validate PDF is stampable - Rails, Prawn, CombinePDF

I'm working at a company where we upload a good amount of PDFs, which we later stamp using Prawn. Occasionally these PDFs upload and save fine, but when we try to stamp them later they don't work and our managers have to re-convert the file, and re-input a bunch of data.
As such we're looking for ways to validate the PDFs before they're attached to ensure they're going to be stampable later, or convert them to a PDF format we know is going to work with Prawn.
I have two questions:
1. Is there anything wrong with our stamping code? (posted below)
2. Is there any way to do that sort of validation? Including:
   - converting to a Prawn doc before uploading
   - converting to a Prawn doc and attempting some trivial operation before uploading
   - any other solutions
begin
  paid_stamp_pdf_file = Tempfile.new('paid')
  Prawn::Document.generate(paid_stamp_pdf_file.path) do |pdf|
    if self.is_paid_by_trust? && self.submitted_to_trust_date.present?
      text = "Submitted to Trust - " + self.submitted_to_trust_date.strftime('%m/%d/%Y') + "\nPAID #{Date.parse(paid_on_date).strftime('%m/%d/%Y')}" + " - $#{'%.2f' % amount}" + payment_method_text
    else
      text = "PAID #{Date.parse(paid_on_date).strftime('%m/%d/%Y')}" + " - $#{'%.2f' % amount}" + payment_method_text
    end
    pdf.transparent(0.6) do
      pdf.fill_color "ff0000"
      pdf.text text, size: 30, style: :bold, align: :center, valign: :center
    end
  end

  # Stamp "PAID" to every page of the file
  paid_stamp = CombinePDF.load(paid_stamp_pdf_file.path).pages[0]
  URI.open(self.account_statement_file.blob.url) do |tmp_pdf_file|
    pdf = CombinePDF.load tmp_pdf_file.path
    pdf.pages.each { |page| page << paid_stamp }
    ActiveRecord::Base.transaction do
      if pdf.save tmp_pdf_file.path
        file_name = self.account_statement_file.filename
        self.account_statement_file.purge
        self.account_statement_file.attach(io: File.open(tmp_pdf_file.path), filename: file_name, content_type: 'application/pdf')
        self.update(is_paid: true, paid_date: paid_on_date, marked_paid_by_user_id: user.id)
        return true
      else
        return false
      end
    end
  end
rescue Exception => e
  Rails.logger.error("Failed to mark statement ID #{self.id}: #{e.message}")
  return false
end
Any help is greatly appreciated!
ruby 2.7.2
rails 6.1.1
prawn 2.4.0
combine_pdf 1.0.21
Edit:
Was able to replicate the error when trying to load the file from its URL; the same error occurs when trying to parse the downloaded file.
For anyone else who sees this: it was related to CombinePDF only parsing up to the length the file's metadata declares, but some files lie about that, which causes them to fail with RangeError: index out of range. Adding the workaround below, then using the relaxed option it adds, fixed the issue for me; hopefully it gets merged into the gem itself soon.
https://github.com/boazsegev/combine_pdf/issues/191
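For the validation half of the question, one option is to attempt a cheap CombinePDF parse at upload time and treat any parse failure as "not stampable", so the file can be re-converted before it is ever attached. This is only a sketch: the helper name is made up, and the rescued error list is an assumption based on the failures described above.

# Hypothetical pre-upload check: returns false for PDFs CombinePDF can't parse
def stampable_pdf?(uploaded_io)
  CombinePDF.parse(uploaded_io.read)
  true
rescue CombinePDF::ParsingError, RangeError, EOFError
  false
ensure
  uploaded_io.rewind if uploaded_io.respond_to?(:rewind)
end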

Ruby on Rails : Add multiple signatures on a PDF

I would like to digital sign several times (twice would be fine) a PDF using Ruby on Rails.
I have tried using the Origami gem, which works great for a single signature (thank you MrWater for your very helpful post on inserting a digital signature into an existing PDF file).
However, I can't sign the document twice. When I do, using the same method, my PDF file displays a signature error (signature invalid).
Do you have any idea of how to make that work using Origami or any other Ruby Gem?
Thank you in advance for your help.
Update 2015-10-25:
You will find my code to sign a document below (maybe it can help in finding a solution, and at least it shows how to make a single signature, which works fine). My failing attempt at a double signature is in the comments. I also tried signing once with the whole process, then signing a second time with the same process, but without any success:
PDF_ORI = "contrat_old.pdf"
PDF_NEW = "contrat_new.pdf"

pdf = Origami::PDF.read(PDF_ORI)

# Open certificate files
key = OpenSSL::PKey::RSA.new 2048
cert = OpenSSL::X509::Certificate.new
cert.version = 2
cert.serial = 0
cert.not_before = Time.now
cert.not_after = Time.now + 10 * 365 * 24 * 60 * 60 # 10 years validity
cert.public_key = key.public_key
cert.issuer = OpenSSL::X509::Name.parse('CN=Test')
cert.subject = OpenSSL::X509::Name.parse('CN=test1 ESSAI1')

# Second certificate and key for the failing double-signature attempt (commented out)
#key2 = OpenSSL::PKey::RSA.new 2048
#cert2 = OpenSSL::X509::Certificate.new
#cert2.version = 2
#cert2.serial = 0
#cert2.not_before = Time.now
#cert2.not_after = Time.now + 10 * 365 * 24 * 60 * 60 # 10 years validity
#cert2.public_key = key2.public_key
#cert2.issuer = OpenSSL::X509::Name.parse('CN=Test2')
#cert2.subject = OpenSSL::X509::Name.parse('CN=test2 ESSAI2')

sigannot = Origami::Annotation::Widget::Signature.new
sigannot.Rect = Origami::Rectangle[:llx => 89.0, :lly => 386.0, :urx => 190.0, :ury => 353.0]
pdf.get_page(1).add_annot(sigannot)

#sigannot2 = Origami::Annotation::Widget::Signature.new
#sigannot2.Rect = Origami::Rectangle[:llx => 89.0, :lly => 386.0, :urx => 190.0, :ury => 353.0]
#pdf.get_page(1).add_annot(sigannot2)

# Sign the PDF with the specified keys
pdf.sign(cert, key,
  :method => 'adbe.pkcs7.sha1',
  :annotation => sigannot,
  :location => "France",
  :contact => "tmp@security.org",
  :reason => "Proof of Concept"
)

# Second signature attempt (commented out - this is what fails)
#pdf.sign(cert2, key2,
#  :method => 'adbe.pkcs7.sha1',
#  :annotation => sigannot2,
#  :location => "France",
#  :contact => "tmp@security.org",
#  :reason => "Proof of Concept"
#)

# Save the resulting file
pdf.save(PDF_NEW)
I know it is quite tricky, but can no one help me?
Or is there maybe another solution?
You can use CombinePDF to add a watermark to a PDF. I have used it in the past and it works great:
To add content to existing PDF pages, first import the new content from an existing PDF file. After that, add the content to each of the pages in your existing PDF.
In this example, we will add a company logo to each page:
company_logo = CombinePDF.load("company_logo.pdf").pages[0]
pdf = CombinePDF.load "content_file.pdf"
pdf.pages.each {|page| page << company_logo}
# notice the << operator is on a page and not a PDF object.
pdf.save "content_with_logo.pdf"
Notice the << operator is on a page and not a PDF object. The << operator acts differently on PDF objects and on Pages.
The << operator defaults to secure injection by renaming references to avoid conflicts. For overlaying pages using compressed data that might not be editable (due to limited filter support), you can use:
pdf.pages(nil, false).each {|page| page << stamp_page}
You can see more details here:
https://github.com/boazsegev/combine_pdf

Rails 3 ActionMailer sending 0 bytes pdf attachments

I have managed to send an email with PDF attachments that are stored on S3:
def welcome_pack1(website_registration)
  require 'open-uri'
  @website_registration = website_registration
  email_attachments = EmailAttachment.find(:all, :conditions => { :goes_to_us => true })
  email_attachments.each do |a|
    tempfile = File.new("#{Rails.root.to_s}/tmp/#{a.pdf_file_name}", "w")
    tempfile << open(a.pdf.url)
    tempfile.puts
    attachments[a.pdf_file_name] = File.read("#{Rails.root.to_s}/tmp/#{a.pdf_file_name}")
  end
  mail(:to => website_registration.email, :subject => "Welcome")
end
The attachments are attached to the email, but they come through as 0 bytes. I was using the example posted in "paperclip + ActionMailer - Adding an attachment?". Am I missing something?
File objects are buffered: until you close the file (which you're not doing), the bytes you've written may not be on disk. A great way to not forget to call close is to use the block form:
File.open(path, 'w') do |f|
  # do stuff with f
end # the file is closed for you when the block is exited
I'm not sure why you're using a file at all though - why not do
attachments[a.pdf_file_name] = open(a.pdf.url)
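Putting both points together, the mailer can skip the temp file entirely and read each attachment straight from its URL. A sketch only, reading the body explicitly rather than assigning the IO, and keeping the original Rails 3 finder syntax:

def welcome_pack1(website_registration)
  require 'open-uri'
  @website_registration = website_registration
  email_attachments = EmailAttachment.find(:all, :conditions => { :goes_to_us => true })
  email_attachments.each do |a|
    # open-uri returns an IO for the S3 URL; read its contents into the attachment
    attachments[a.pdf_file_name] = open(a.pdf.url).read
  end
  mail(:to => website_registration.email, :subject => "Welcome")
end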

Memory issue with huge CSV Export in Rails

I'm trying to export a large amount of data from a database to a CSV file, but it is taking a very long time and I fear I'll have major memory issues.
Does anyone know of any better way to export a CSV without the memory build up? If so, can you show me how? Thanks.
Here's my controller:
def users_export
  File.new("users_export.csv", "w") # creates new file to write to
  @todays_date = Time.now.strftime("%m-%d-%Y")
  @outfile = @todays_date + ".csv"
  @users = User.select('id, login, email, last_login, created_at, updated_at')
  FasterCSV.open("users_export.csv", "w+") do |csv|
    csv << [ @todays_date ]
    csv << [ "id", "login", "email", "last_login", "created_at", "updated_at" ]
    @users.find_each do |u|
      csv << [ u.id, u.login, u.email, u.last_login, u.created_at, u.updated_at ]
    end
  end
  send_file "users_export.csv",
            :type => 'text/csv; charset=iso-8859-1; header=present',
            :disposition => "attachment; filename=#{@outfile}"
end
You're building up one giant string, so you have to keep the entire CSV file in memory. You're also loading all of your users, which will also take a bunch of memory. It won't make any difference if you only have a few hundred or a few thousand users, but at some point you will probably need to do two things.
Use
User.find_each do |user|
  csv << [...]
end
This loads users in batches (1000 by default) rather than all of them.
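find_each also accepts a :batch_size option if the default is not a good fit for your rows; for example:

# Fetch users 500 at a time instead of the default 1000
User.find_each(:batch_size => 500) do |user|
  csv << [ user.id, user.login, user.email ]
end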
You should also look at writing the csv to a file rather than buffering the entire thing in memory. Assuming you have created a temporary file,
FasterCSV.open('/path/to/file','w') do |csv|
...
end
will write your CSV to a file. You can then use send_file to send it. If you already have a file open, FasterCSV.new(io) should work too.
Lastly, on Rails 3.1 and higher you might be able to stream the CSV file as you create it, but that isn't something I've tried before.
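For completeness, that streaming approach could look roughly like this on Rails 3.1+. A sketch only, untested here: it relies on response_body accepting any object that responds to each, and on your web server not buffering the response.

def users_export
  headers['Content-Type'] = 'text/csv'
  headers['Content-Disposition'] = 'attachment; filename="users_export.csv"'

  # Each yielded string is sent to the client as it is generated
  self.response_body = Enumerator.new do |rows|
    rows << FasterCSV.generate_line(%w[id login email last_login created_at updated_at])
    User.select('id, login, email, last_login, created_at, updated_at').find_each do |u|
      rows << FasterCSV.generate_line([u.id, u.login, u.email, u.last_login, u.created_at, u.updated_at])
    end
  end
end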
In addition to the tips on CSV generation, be sure to optimize the database call as well.
Select only the columns you need.
@users = User.select('id, login, email, last_login, created_at, updated_at').order('login')
@users.find_each do |user|
  ...
end
If you have, for example, 1000 users, and each has password, password_salt, city, country, and so on, then several thousand fewer objects are transferred from the database, created as Ruby objects, and finally garbage collected.

Buffered/RingBuffer IO in Ruby + Amazon S3 non-blocking chunk reads

I have huge CSV files (100MB+) on Amazon S3 and I want to read them in chunks and process them using the Ruby CSV library. I'm having a hard time creating the right IO object for CSV processing:
buffer = TheRightIOClass.new
bytes_received = 0
RightAws::S3Interface.new(<access_key>, <access_secret>).retrieve_object(bucket, key) do |chunk|
  bytes_received += buffer.write(chunk)
  if bytes_received >= 1*MEGABYTE
    bytes_received = 0
    csv(buffer).each do |row|
      process_csv_record(row)
    end
  end
end

def csv(io)
  @csv ||= CSV.new(io, headers: true)
end
I don't know what the right setup should be here, or what TheRightIOClass is. I don't want to load the entire file into memory with StringIO. Is there a buffered IO or ring buffer in Ruby to do this?
If anyone has a good solution using threads (not processes) and pipes, I would love to see it.
You can use StringIO and do some clever error handling to ensure you have an entire row in a chunk before handling it. The Packer class in this example just accumulates the parsed rows in memory until you flush them to disk or a database (a minimal sketch of it follows the example below).
packer = Packer.new
object = AWS::S3.new.buckets[bucket].objects[path]
io = StringIO.new
csv = ::CSV.new(io, headers: true)

object.read do |chunk|
  # Append the most recent chunk and rewind the IO
  io << chunk
  io.rewind
  last_offset = 0
  begin
    while row = csv.shift do
      # Store the parsed row unless we're at the end of a chunk
      unless io.eof?
        last_offset = io.pos
        packer << row.to_hash
      end
    end
  rescue ArgumentError, ::CSV::MalformedCSVError => e
    # Only rescue malformed UTF-8 and CSV errors if we're at the end of a chunk
    raise e unless io.eof?
  end
  # Seek to our last offset, create a new StringIO with that partial row & advance the cursor
  io.seek(last_offset)
  io.reopen(io.read)
  io.read
  # Flush our accumulated rows to disk every 1 MB
  packer.flush if packer.bytes > 1*MEGABYTES
end

# Read the last row
io.rewind
packer << csv.shift.to_hash
packer
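The Packer class isn't shown in the original answer; a minimal version might look like this (entirely illustrative: the byte accounting and the flush target are assumptions, reusing process_csv_record from the question):

class Packer
  attr_reader :bytes

  def initialize
    @rows = []
    @bytes = 0
  end

  # Accumulate a parsed row (a hash) and keep a rough running size
  def <<(row_hash)
    @rows << row_hash
    @bytes += row_hash.to_s.bytesize
    self
  end

  # Hand the buffered rows off (here to the question's process_csv_record)
  # and reset the buffer
  def flush
    @rows.each { |row| process_csv_record(row) }
    @rows = []
    @bytes = 0
  end
end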