prawn pdf group, transaction and rollback method problems - ruby-on-rails

I'm trying to create a PDF report using Prawn in a Rails application. There are lots of sections containing user-generated content that I want to group together. Sometimes this content spans more than one page, which results in a CannotGroup error. I then tried to use a transaction so that, in the event of an error, I can roll back and output the content without using the group method.
The problem is that the rollback messes up the pages. It removes the extra page from the PDF but still reports the wrong page count and outputs overlapping content when I try to redo it. I reset the y position after the rollback, as per the Prawn documentation, but the problems persist.
For example, the following test code writes two pages of numbers, rolls back to the start, and then tries to write the same numbers again. It results in a single-page PDF with the second page of numbers overlapping the first, and a page count of 2. The page numbers at the bottom of the page also overlap one another, even though I'm using Prawn's number_pages method.
class TestReport < Prawn::Document
  def to_pdf
    font('Helvetica')
    bounding_box([bounds.left, bounds.top - 50], :width => bounds.width, :height => bounds.height - 100) do
      text 'begin'
      y_pos = y
      transaction do
        begin
          group do
            64.times do |i|
              text i.to_s
            end
          end
        rescue
          rollback
        end
      end
      self.y = y_pos
      64.times do |i|
        text i.to_s
      end
      text 'end'
      text page_number.to_s
    end
    page_numbers(1)
    #render
  end

  def page_numbers(start)
    string = "page <page> of <total>"
    options = { :at => [bounds.right - 150, 40],
                :width => 150,
                :align => :right,
                :start_count_at => start,
                :color => "000000" }
    number_pages string, options
  end
end

def test_report
  pdf = TestReport.new()
  pdf.to_pdf
  send_data pdf.render, filename: "test.pdf",
                        type: "application/pdf",
                        disposition: "inline"
end
The problems seem to be with transaction rollbacks. The main thing I want is to be able to use the group method. Is there another way?
Is my code wrong? Am I missing something, or do transactions not currently work?
I'm currently using the master Prawn branch in a Ruby on Rails application (gem 'prawn', :git => 'git://github.com/prawnpdf/prawn.git', :branch => 'master').

This question is quite old now, but I'll post an answer since it is one of the first hits on Google when searching for the exception.
Transactions still don't work with page breaks (v1.0.0.rc2), so I created a helper method that tries grouping first and, if the exception occurs, retries without grouping, letting the content span more than one page.
def group_if_possible(pdf, &block)
  begin
    pdf.group { block.call }
  rescue Prawn::Errors::CannotGroup
    block.call
  end
end
Example: Using it when creating a table:
group_if_possible(pdf) do
  pdf.table(rows)
end
EDIT:
Grouping was removed from Prawn 1.x, but there is an unofficial grouping gem that works well with Prawn 2:
https://github.com/ddengler/prawn-grouping

Looks like Brad Ediger answered your question on Google Groups, but for the benefit of anyone else looking for help with this, here's his response:
Sadly, transactions do not yet work correctly when they start new
pages or change the pages collection. It's a known issue:
https://github.com/prawnpdf/prawn/issues/268
-be

Related

Validate PDF is stampable - Rails, Prawn, CombinePDF

I'm working at a company where we upload a good number of PDFs, which we later stamp using Prawn. Occasionally these PDFs upload and save fine, but when we try to stamp them later they fail, and our managers have to re-convert the file and re-enter a bunch of data.
As such we're looking for ways to validate the PDFs before they're attached to ensure they're going to be stampable later, or convert them to a PDF format we know is going to work with Prawn.
I have two questions:
1) Is there anything wrong with our stamping code? (posted below)
2) Is there any way to do that sort of validation? For example:
- converting to a Prawn doc before uploading
- converting to a Prawn doc and attempting some trivial operation before uploading
- any other solutions
begin
  paid_stamp_pdf_file = Tempfile.new('paid')
  Prawn::Document.generate(paid_stamp_pdf_file.path) do |pdf|
    if self.is_paid_by_trust? && self.submitted_to_trust_date.present?
      text = "Submitted to Trust - " + self.submitted_to_trust_date.strftime('%m/%d/%Y') + "\nPAID #{Date.parse(paid_on_date).strftime('%m/%d/%Y')}" + " - $#{'%.2f' % amount}" + payment_method_text
    else
      text = "PAID #{Date.parse(paid_on_date).strftime('%m/%d/%Y')}" + " - $#{'%.2f' % amount}" + payment_method_text
    end
    pdf.transparent(0.6) do
      pdf.fill_color "ff0000"
      pdf.text text, :size => 30, style: :bold, align: :center, valign: :center
    end
  end
  # Stamp "PAID" to every page of the file
  paid_stamp = CombinePDF.load(paid_stamp_pdf_file.path).pages[0]
  URI.open(self.account_statement_file.blob.url) do |tmp_pdf_file|
    pdf = CombinePDF.load tmp_pdf_file.path
    pdf.pages.each { |page| page << paid_stamp }
    ActiveRecord::Base.transaction do
      if pdf.save tmp_pdf_file.path
        file_name = self.account_statement_file.filename
        self.account_statement_file.purge
        self.account_statement_file.attach(io: File.open(tmp_pdf_file.path), filename: file_name, content_type: 'application/pdf')
        self.update(is_paid: true, paid_date: paid_on_date, marked_paid_by_user_id: user.id)
        return true
      else
        return false
      end
    end
  end
rescue Exception => e
  Rails.logger.error("Failed to mark statement ID #{self.id}: #{e.message}")
  return false
end
Any help is greatly appreciated!
ruby 2.7.2
rails 6.1.1
prawn 2.4.0
combine_pdf 1.0.21
Edit:
Was able to replicate the error when trying to load from the file URL.
Occurs at line
The same error occurs when trying to parse the downloaded file.
For anyone else who sees this: it was related to CombinePDF only parsing up to the length declared in the file's metadata, but some files lie about that, which causes them to fail with RangeError: index out of range. Adding this workaround, and then using the relaxed option it adds, fixed the issue for me; hopefully it gets merged into the gem itself soon.
https://github.com/boazsegev/combine_pdf/issues/191
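For the validation side of the question, one pragmatic pre-upload check is simply to attempt a CombinePDF load and reject files that raise. A minimal sketch (stampable_pdf? is a hypothetical helper, and rescuing CombinePDF::ParsingError is an assumption worth verifying against your combine_pdf version):

require 'combine_pdf'

# Hypothetical helper: returns true if CombinePDF can parse the file at the
# given path, which is a reasonable proxy for "we will be able to stamp it later".
def stampable_pdf?(path)
  CombinePDF.load(path)
  true
rescue CombinePDF::ParsingError, RangeError => e
  Rails.logger.warn("PDF failed CombinePDF validation: #{e.message}")
  false
end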

Rails "Data too long for column" error after truncation

I have an app that creates text excerpts from body text entered into the model, and it seems to work fine except, for some reason, when I try to enter one particular string into the body text.
In my blog_post model I have:
t.string "excerpt", limit: 114
In my controller I am creating the excerpt string by doing this:
def create
  @blogpost = Blogpost.new(blogpost_params)
  @excerpt = @blogpost.body
  @blogpost.excerpt = truncate(@excerpt, :length => 114)
  respond_to do |format|
    if @blogpost.save
      # etc, etc.
end
This seems to work fine most of the time, but I entered the following text as a test:
You know how they used to say It's #Sinatra's world, the rest of us are just living in it. - well, it's as true today as it was then. Check out Frank. When he gets out of a #chopper, dressed in a perfect lounge suit, #cocktail in hand, his #hat stays perfectly tilted. When I get out of a #chopper (and I'm not talking about once or twice but every single time I ever get out of a chopper) the spinning blades blow my hat all over the place. #Milliners should think about that and you should too the next time you're out hat shopping.
(sorry it's a bit long) I get the following error:
ActiveRecord::StatementInvalid in MoansController#create
Mysql2::Error: Data too long for column 'excerpt' at row 1....
It looks like the truncate isn't working for some reason. Is it something to do with this text, or have I missed something else?
I think you should remove the database restriction and handle this by using a setter that truncates to the wanted length by default. In your model, add excerpt_setter to the attr_accessible list and then define it like this:
def excerpt_setter=(str)
  self.excerpt = truncate(str, :length => 114)
end

def excerpt_setter
  self.excerpt
end
And then in the controller
def create
  @blogpost = Blogpost.new(blogpost_params)
  @blogpost.excerpt_setter = @blogpost.body
  respond_to do |format|
    if @blogpost.save
      # etc, etc.
end
Another thing: you can also define an excerpt method in your model and drop the field if there isn't any good reason to store a part of the body in another column.
include ActionView::Helpers::TextHelper # this is needed to make the truncate method available in the model
...
def excerpt
  truncate(self.body, :length => 114)
end
If you don't need the data stored in the database for performance reasons, this would be my preferred solution.
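If you do keep the excerpt column, another option in the same spirit is to truncate in a model callback so every write path is covered. A sketch assuming the same 114-character limit (the callback approach is my addition, not the answerer's):

class Blogpost < ActiveRecord::Base
  include ActionView::Helpers::TextHelper # makes truncate available in the model

  before_save :truncate_excerpt

  private

  # Keep the stored excerpt within the column limit regardless of which
  # code path sets the body.
  def truncate_excerpt
    self.excerpt = truncate(body, :length => 114)
  end
end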

Unit Testing Tire (Elastic Search) - Filtering Results with Method from to_indexed_json

I am testing my Tire / ElasticSearch queries and am having a problem with a custom method I'm including in to_indexed_json. For some reason, it doesn't look like it's getting indexed properly - or at least I cannot filter with it.
In my development environment, my filters and facets work fine and I get the expected results. However, in my tests I continually see zero results. I cannot figure out where I'm going wrong.
I have the following:
def to_indexed_json
  to_json methods: [:user_tags, :location_users]
end
My user_tags method looks as follows:
def user_tags
  tags.map(&:content) if tags.present?
end
Tags is a polymorphic relationship with my user model:
has_many :tags, :as => :tagable
My search block looks like this:
def self.online_sales(params)
  s = Tire.search('users') { query { string '*' } }
  filter = []
  filter << { :range => { :created_at => { :from => params[:start], :to => params[:end] } } }
  filter << { :terms => { :user_tags => ['online'] } }
  s.facet('online_sales') do
    date :created_at, interval: 'day'
    facet_filter :and, filter
  end
end
I have checked that the user_tags are included, using User.last.to_indexed_json:
{"id":2,"username":"testusername", ... "user_tags":["online"] }
In my development environment, if I run the following query, I get a per-day list of online sales for my users:
#sales = User.online_sales(start_date: Date.today - 100.days).results.facets["online_sales"]
"_type"=>"date_histogram", "entries"=>[{"time"=>1350950400000, "count"=>1, "min"=>6.0, "max"=>6.0, "total"=>6.0, "total_count"=>1, "mean"=>6.0}, {"time"=>1361836800000, "count"=>7, "min"=>3.0, "max"=>9.0, "total"=>39.0, "total_count"=>7, "mean"=>#<BigDecimal:7fabc07348f8,'0.5571428571 428571E1',27(27)>}....
In my unit tests, I get zero results unless I remove the facet filter:
{"online_sales"=>{"_type"=>"date_histogram", "entries"=>[]}}
My test looks like this:
it "should test the online sales facets", focus: true do
User.index.delete
User.create_elasticsearch_index
user = User.create(username: 'testusername', value: 'pass', location_id: #location.id)
user.tags.create content: 'online'
user.tags.first.content.should eq 'online'
user.index.refresh
ws = User.online_sales(start: (Date.today - 10.days), :end => Date.today)
puts ws.results.facets["online_sales"]
end
Is there something I'm missing, doing wrong or have just misunderstood to get this to pass? Thanks in advance.
-- EDIT --
It appears to be something to do with the tags relationship. I have another method, location_users, which is a has_many :through relationship. This is updated on index using:
def location_users
  location.users.map(&:id)
end
I can see an array of location_users in the results when searching. It doesn't make sense to me why the other, polymorphic relationship wouldn't work.
-- EDIT 2 --
I have fixed this by putting this in my test:
User.index.import User.all
sleep 1
Which is silly. And, I don't really understand why this works. Why?!
Elasticsearch by default updates its indexes once per second.
This is a performance thing because committing your changes to Lucene (which ES uses under the hood) can be quite an expensive operation.
If you need it to update immediately, include refresh=true in the URL when inserting documents. You normally don't want this, since committing on every insert is expensive when indexing lots of documents, but unit testing is one of those cases where you do want to use it.
From the documentation:
refresh
To refresh the index immediately after the operation occurs, so that the document appears in search results immediately, the refresh parameter can be set to true. Setting this option to true should ONLY be done after careful thought and verification that it does not lead to poor performance, both from an indexing and a search standpoint. Note, getting a document using the get API is completely realtime.
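Applied to the test above, that usually means replacing the sleep with an explicit refresh after the import, reusing calls that already appear in the question:

User.index.import User.all # re-index so user_tags picks up the tag created earlier
User.index.refresh         # commit the index now instead of waiting out the refresh interval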

Generate CSV file from rails

I've been reading similar questions, but many of the answers are outdated or not clear enough for me.
I'd like to be able to just do something like (in a controller action):
respond_to do |format|
  format.html
  format.csv
end
I know I'd then need a view such as action.csv.erb
So my questions are:
1) What do I need to configure in Rails to allow this to happen in general?
2) How should I set up the CSV view to display some basic fields from a model?
UPDATE:
So I've tried to go the comma route; I installed and vendored the gem.
Then, according to the README, I threw this into my model (customized to my needs):
comma do
  user_id 'User'
  created_at 'Date'
  name 'Name'
end
I then threw this into the controller for the index action (according to the README):
format.csv { render :csv => MyModel.limited(50) }
Then when accessing the index (not in CSV format) I receive the following ActionController Exception error:
undefined method `comma' for
So then I googled that, and I read that I should put require 'comma' in my model.
After doing that, I refreshed (my local index page), and the error changed to:
no such file to load -- comma
At this point I decided it obviously wasn't finding the comma files, so I copied the files from the vendored comma gem's lib folder into the Rails lib folder. I then refreshed the page and landed on this error:
uninitialized constant Error
Then I pretty much gave up.
The errors from the trace were:
/Users/elliot/.gem/ruby/1.8/gems/activesupport-2.3.5/lib/active_support/dependencies.rb:443:in `load_missing_constant'
/Users/elliot/.gem/ruby/1.8/gems/activesupport-2.3.5/lib/active_support/dependencies.rb:80:in `const_missing'
/Users/elliot/.gem/ruby/1.8/gems/activesupport-2.3.5/lib/active_support/dependencies.rb:92:in `const_missing'
Other notes: I have already installed FasterCSV.
Hope that's enough info :)
I suggest taking a look at comma. It works very well and allows you to handle stuff at the model level, as opposed to the view level.
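One note on the "no such file to load -- comma" error from the update: with a vendored gem on the Rails 2.3 stack shown in the trace, the gem still has to be declared so it lands on the load path. A sketch of the usual declaration (not verified against the poster's exact setup):

# config/environment.rb
Rails::Initializer.run do |config|
  # ...
  config.gem "comma"
end

After that, rake gems:unpack (or gems:install) keeps the vendored copy in sync.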
Have a look at FasterCSV.
csv_string = FasterCSV.generate do |csv|
  cols = ["column one", "column two", "column three"]
  csv << cols
  @entries.each do |entry|
    csv << [entry.column_one, entry.column_two, entry.column_three]
  end
end
filename = "data-#{Time.now.to_date.to_s}.csv"
send_data(csv_string, :type => 'text/csv; charset=utf-8; header=present', :filename => filename)
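Wired into the respond_to block from the question, that could look roughly like this (a sketch; the model and column names are placeholders):

def index
  @entries = Entry.all
  respond_to do |format|
    format.html
    format.csv do
      csv_string = FasterCSV.generate do |csv|
        csv << ["column one", "column two", "column three"]
        @entries.each { |e| csv << [e.column_one, e.column_two, e.column_three] }
      end
      send_data csv_string, :type => 'text/csv; charset=utf-8; header=present',
                            :filename => "data-#{Date.today}.csv"
    end
  end
end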
This is terrible, but the CSV library (in 1.9, == FasterCSV) won't play nice with meta_where, so I did it this way:
lines = []
@customers.collect { |c| lines.push ["#{c.lastname}", "#{c.firstname}", "#{c.id}", "#{c.type}"] }
lines = lines.collect { |line| line.join(',') }
csv_string = lines.join("\n")
respond_to do |format|
  format.html
  format.csv { send_data(csv_string, :filename => "#{@plan.name.camelize}.csv", :type => "text/csv") }
end
It's ugly, but effective.
Take a look at CSV Shaper.
https://github.com/paulspringett/csv_shaper
It has a nice DSL and works really well with Rails models.

Logging Search Results in a Rails Application

We're interested in logging and computing the number of times an item comes up in search or on a list page. With 50k unique visitors a day, we're expecting we could produce 3-4 million 'impressions' per day, which isn't a terribly high amount, but one we'd like to architect well.
We don't need to read this data in real time, but would like to be able to generate daily totals and analyze trends, etc. Similar to a business analytics tool.
We're planning to do this with an Ajax post after the page is rendered - this will allow us to count results even if those results are cached. We can do this in a single post per page, sending a comma-delimited list of IDs and their positions on the page.
I am hoping there is some sort of design pattern/gem/blog post about this that would help me avoid the common first-timer mistakes that may come up. I also don't really have much experience logging or reading logs.
My current strategy: make something to write events to a log file, and a background job to tally up the results at the end of the day and put them back into MySQL.
Ok, I have three approaches for you:
1) Queues
In your AJAX Handler, write the simplest method possible (use a Rack Middleware or Rails Metal) to push the query params to a queue. Then, poll the queue and gather the messages.
Queue pushes from a rack middleware are blindingly fast. We use this on a very high traffic site for logging of similar data.
An example Rack middleware is below (extracted from our app; it can handle a request in <2 ms or so):
class TrackingMiddleware
  CACHE_BUSTER = {"Cache-Control" => "no-cache, no-store, max-age=0, must-revalidate", "Pragma" => "no-cache", "Expires" => "Fri, 29 Aug 1997 02:14:00 EST"}
  IMAGE_RESPONSE_HEADERS = CACHE_BUSTER.merge("Content-Type" => "image/gif").freeze
  IMAGE_RESPONSE_BODY = [File.open(Rails.root + "public/images/tracker.gif").read].freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    if env["PATH_INFO"] =~ %r{^/track.gif}
      request = Rack::Request.new(env)
      YOUR_QUEUE.push([Time.now, request.GET.symbolize_keys])
      [200, IMAGE_RESPONSE_HEADERS, IMAGE_RESPONSE_BODY]
    else
      @app.call(env)
    end
  end
end
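To actually mount the middleware you would register it with the stack; a sketch for the Rails 2.3-era setup implied above (the string form avoids load-order issues):

# config/environment.rb
Rails::Initializer.run do |config|
  # ...
  config.middleware.use "TrackingMiddleware"
end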
For the queue I'd recommend Starling; I've had nothing but good times with it.
On the parsing end, I would use the super-poller toolkit - but then I would say that, since I wrote it.
2) Logs
Pass all the params along as query params to a static file (/1x1.gif?foo=1&bar=2&baz=3).
This will not hit the rails stack and will be blindingly fast.
When you need the data, just parse the log files!
This is the best scaling home brew approach.
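The parsing side can stay just as simple; a rough sketch that tallies IDs from the access log using only the standard library (the log path and the id parameter name are assumptions):

require 'cgi'

counts = Hash.new(0)
File.foreach("log/access.log") do |line|
  query = line[/1x1\.gif\?(\S+)/, 1] or next          # query string of the tracking-pixel request
  CGI.parse(query).fetch("id", []).each { |id| counts[id] += 1 }
end
# counts now maps each id to its impression total for that log file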
3) Google Analytics
Why handle the load when Google will do it for you? You would be surprised at how good Google Analytics is; before you home-brew anything, check it out!
This will scale infinitely, because Google buys servers faster than you do.
I could rant on this for ages, but I have to go now. Hope this helps!
Depending on the action required to list items, you might be able to do it in the controller and save yourself a round trip. You can do it with an after_filter, to make the addition unobtrusive.
This only works if all the actions that list items you want to log require parameters, because page caching ignores GET requests with parameters.
Assuming you only want to log search data on the search action.
class ItemsController < ApplicationController
  after_filter :log_searches, :only => :search

  def log_searches
    @items.each do |item|
      # write to log here
    end
  end
  ...
  # rest of controller remains unchanged
  ...
end
Otherwise you're right on track with the AJAX, and an onload remote function.
As for processing, you could use a rake task run by a cron job to collect statistics, and possibly update items with a popularity rating.
Either way, you will want to read up on Ruby's Logger class. Learning about cron jobs and rake tasks wouldn't hurt either.
This is what I ultimately did - it was enough for our use for now, and with some simple benchmarking, I feel OK about it. We'll be watching to see how it does in production before we expose the results to our customers.
The components:
class EventsController < ApplicationController
  def create
    logger = Logger.new("#{RAILS_ROOT}/log/impressions/#{Date.today}.log")
    logger.info "#{DateTime.now.strftime} #{params[:ids]}" unless params[:ids].blank?
    render :nothing => true
  end
end
This is called from an ajax call in the site layout...
<% javascript_tag do %>
  var list = '';
  $$('div.item').each(function(item) { list += item.id + ','; });
  <%= remote_function(:url => { :controller => :events, :action => :create }, :with => "'ids=' + list") %>
<% end %>
Then I made a rake task to import these rows of comma-delimited IDs into the db. It is run the following day:
desc "Calculate impressions"
task :count_impressions => :environment do
date = ENV['DATE'] || (Date.today - 1).to_s # defaults to yesterday (yyyy-mm-dd)
file = File.new("log/impressions/#{date}.log", "r")
item_impressions = {}
while (line = file.gets)
ids_string = line.split(' ')[1]
next unless ids_string
ids = ids_string.split(',')
ids.each {|i| item_impressions[i] ||= 0; item_impressions[i] += 1 }
end
item_impressions.keys.each do |id|
ActiveRecord::Base.connection.execute "insert into item_stats(item_id, impression_count, collected_on) values('#{id}',#{item_impressions[id]},'#{date}')", 'Insert Item Stats'
end
file.close
end
One thing to note: the logger variable is declared in the controller action, not in environment.rb as you would normally do with a logger. I benchmarked this - 10,000 writes took about 20 seconds, averaging about 2 milliseconds a write. With the file name set in environment.rb, it took about 14 seconds. We made this trade-off so we could dynamically determine the file name, which is an easy way to switch files at midnight.
Our main concern at this point: we have no idea how many different items will be counted per day, i.e. we don't know how long the tail is. This will determine how many rows are added to the db each day. We expect we'll need to limit how far back we keep daily reports and will roll up results even further at that point.
