I have searched a lot and have no choice but to ask here. Does anyone know of an online converter with an API, or a gem, that can convert a PDF to an Excel or CSV file?
I am not sure this is the best place to ask, either.
My application is in Rails 4.2.
The PDF file contains a header and a big table with about 10 columns.
More info:
The user uploads the PDF via a form, then I need to grab the PDF, parse it to CSV and read the content. I tried to read the content with the pdf-reader gem, but the result wasn't really promising.
I have used freepdfconvert.com/pdf-excel. Unfortunately they don't supply an API (I have contacted them).
Sample PDF
This piece of code converts the PDF into text, which is handy.
Gem: pdf-reader
def self.parse
  reader = PDF::Reader.new("pdf_uploaded_by_user.pdf")
  reader.pages.each do |page|
    puts page.text
  end
end
Now, if you check the attached sample PDF, you will see that some fields might be empty, which means I can't simply split each text line on spaces and put it in an array, as I won't be able to map the array to the correct fields.
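To make the problem concrete, here is a minimal sketch of one possible workaround. It assumes the extracted text keeps its columns aligned with spaces, which may not hold for every PDF. The slice_by_columns helper, the sample line and the column offsets are all made up for illustration.

```ruby
# Hypothetical helper: cut a text line at fixed character offsets instead of
# splitting on whitespace, so an empty cell stays an empty string.
def slice_by_columns(line, starts)
  starts.each_with_index.map do |start, i|
    stop = starts[i + 1] || line.length
    (line[start...stop] || "").strip
  end
end

# Made-up sample row whose middle cell is empty.
line   = "1001" + " " * 16 + "Sydney"
starts = [0, 10, 20] # assumed column offsets, e.g. measured from the header line

p line.split                      # the empty cell disappears, so fields shift left
p slice_by_columns(line, starts)  # the empty cell is kept as ""
```

Whether this helps depends entirely on how consistently the text extraction lays out the columns.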
Thank you.
OK, after lots of research I couldn't find an API or even proper software that does this. Here is how I did it.
I first extract the table out of the PDF into an HTML table with the pdftables API. It is cheap.
Then I convert the HTML table to CSV.
(This is not ideal, but it works.)
Here is the code:
require 'httmultiparty'
require 'nokogiri'
require 'csv'

class PageTextReceiver
  include HTTMultiParty
  base_uri 'http://localhost:3000'

  # Upload the PDF to pdftables and save the HTML it returns.
  def run
    response = PageTextReceiver.post('https://pdftables.com/api?key=myapikey', :query => { f: File.new("/path/to/pdf/uploaded_pdf.pdf", "r") })
    File.open('/path/to/save/as/html/response.html', 'w') do |f|
      f.puts response
    end
  end

  # Read the saved HTML and write each table row as a CSV row.
  def convert
    doc = File.open("/path/to/saved/html/response.html") { |f| Nokogiri::HTML(f) }
    csv = CSV.open("path/to/csv/t.csv", 'w', :col_sep => ",", :quote_char => '\'', :force_quotes => true)
    doc.xpath('//table/tr').each do |row|
      tarray = []
      row.xpath('td').each do |cell|
        tarray << cell.text
      end
      csv << tarray
    end
    csv.close
  end
end
Now Run it like this:
#> page = PageTextReceiver.new
#> page.run
#> page.convert
It is not refactored; it is just a proof of concept. You need to consider performance.
I might use the Sidekiq gem to run it in the background and pass the result back to the main thread.
Check the Tabula-Extractor project, and also check how it is used in projects like the NYPD Moving Summonses Parser and the CompStat criminal complaints parser.
Ryan Bates covers CSV exports in his RailsCasts episode at http://railscasts.com/episodes/362-exporting-csv-and-excel, which might give you some pointers.
Edit: as you now mention that you need the raw data from an uploaded PDF, you could use JavaScript to read the PDF file and then populate the data into Ryan Bates' export method. Reading PDFs was covered excellently in the following question:
extract text from pdf in Javascript
I would imagine the flow would be something like:
PDF new action
user uploads PDF
PDF show action
PDF is displayed
JavaScript reads PDF
JavaScript populates Ryan's raw data
Raw data is exported with PDF data included
Related
My rails app contains code to handle large file uploads, which basically consists of splitting up the file in javascript and making a number of posts for each chunk to a route where they are then reconstructed back to the original file.
I'm trying to figure out how to write tests for this logic, as up until now I've simply used fixture_file_upload for posting files.
I basically need to split a given file up into a range of bytes, and post that in a way that my route would handle it just as though it has been posted by my javascript.
Anyone know of a way to accomplish this in a rails test?
You could just create multiple fixture files (e.g. file.part1.txt, file.part2.txt, etc.), upload all the parts and then check that they get concatenated together.
For example, if there are 10 fixture files:
(1..10).each do |part_no|
  fixture_name = "file.part#{part_no}.txt"
  fixture_file = fixture_file_upload("/files/#{fixture_name}", "text/plain")
  post :part_upload, :part => fixture_file
end
# code to check result here
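If keeping separate fixture files around is inconvenient, the parts could also be cut from a single file inside the test. This is only a sketch using plain Ruby; split_into_chunks is a hypothetical helper and the 4-byte chunk size is chosen just to keep the example small.

```ruby
require "tempfile"

# Hypothetical helper: split a file into Tempfiles of at most chunk_size bytes.
def split_into_chunks(path, chunk_size)
  chunks = []
  File.open(path, "rb") do |source|
    # read(n) returns nil at end of file, which ends the loop.
    while (data = source.read(chunk_size))
      part = Tempfile.new("chunk")
      part.binmode
      part.write(data)
      part.rewind
      chunks << part
    end
  end
  chunks
end

# Made-up source file standing in for a real fixture.
source = Tempfile.new("original")
source.write("abcdefghij")
source.flush

parts    = split_into_chunks(source.path, 4)
contents = parts.map(&:read)
p contents
```

In a functional test, each part could then presumably be wrapped with Rack::Test::UploadedFile.new(part.path, "text/plain") and posted the same way as the fixture files; that step is not shown here.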
In my application I wish to allow the user to upload a CSV file and then be presented with a view of their data mapped to my columns, so that the user can confirm their data is correct, ideally allowing them to edit incorrect data.
Are there any existing solutions to this via a gem, any other standard solution, or any resources that might help with what I want to achieve?
Help very much appreciated.
you can do something like:
require 'csv'
file_content = File.read(params[:file].tempfile.path)
csv = CSV.parse(file_content, :headers => true)
File.unlink(params[:file].tempfile.path)
This depends on the params passed to your controller, but CSV can parse the file, which is usually written to a tmp dir when uploaded. Presentation of the result is up to your view layer.
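As a rough illustration of what the parsed result looks like, here is a small sketch with made-up content in place of the uploaded file. With headers enabled, each row can be turned into a Hash keyed by column name, which is a convenient shape to hand to a confirmation view.

```ruby
require "csv"

# Made-up upload content; in a controller this would come from the uploaded file.
file_content = "name,email\nAnn,ann@example.com\nBob,\n"

csv  = CSV.parse(file_content, headers: true)
rows = csv.map(&:to_h) # one Hash per data row, keyed by header

p csv.headers
rows.each { |row| p row }
```

Note that an empty field comes back as nil, so the view would need to handle missing values.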
I'm trying to save the results of a survey to a CSV file, so that every time the survey is completed it adds a new line to the file. I have code that exports database rows to a CSV and lets you download it, but I don't know how to incorporate saving the survey to begin with, or whether this is even possible. I have a CSV file set up with the correct headers.
When your create action is called (the controller action that the form's submit is directed to; create on REST controllers), you can add some custom logic there to convert the form data into the CSV structure you want.
Ruby has a built-in CSV module, which can be used to both read and write CSV files.
So you want something like the following:
require "csv"
CSV.open "output.csv", "a+" do |csv|
  # example logic from another script how to populate the file
  times.each do |key, value|
    csv << [ key, value ]
  end
end
You just need to define the structure of the rows as you want; this example writes two columns per row.
EDIT: a+ makes the file be written from the end (new rows), rather than the original w+, which truncates the file.
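For the survey case specifically, the append step might look roughly like this. It is a sketch only: append_survey_row is a hypothetical helper, the column names are made up, and a Tempfile stands in for the existing file that already has its header row.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: append one survey as a single CSV row.
def append_survey_row(path, values)
  CSV.open(path, "a") { |csv| csv << values }
end

# Stand-in for the existing file that already has the header row.
file = Tempfile.new(["surveys", ".csv"])
file.write("name,rating\n")
file.close

append_survey_row(file.path, ["Ann", 5])
append_survey_row(file.path, ["Bob", 4])

puts File.read(file.path)
```

In the create action, the values array would be built from the submitted form fields in the same order as the headers.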
A possible solution could be to use a logger. In your application controller:
def surveys
  @@surveys_log ||= Logger.new("#{Rails.root}/log/surveys.log")
end
Anywhere you would like to log the survey:
surveys.info @survey.to_csv # you'll need to implement "to_csv" yourself
This will result in a surveys.log file in your log/ folder.
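One way to_csv could be written is sketched below, using the standard CSV library so that commas inside a value are quoted properly. The Survey struct and its attribute names are hypothetical stand-ins for the real model.

```ruby
require "csv"

# Hypothetical stand-in for the survey model; the attribute names are made up.
Survey = Struct.new(:respondent, :rating, :comment) do
  # One CSV line without the trailing newline, since Logger adds its own.
  def to_csv
    CSV.generate_line(to_a).chomp
  end
end

survey = Survey.new("Ann", 5, "Quick, easy")
puts survey.to_csv
```

Keep in mind that Logger's default formatter prefixes each line with the severity and a timestamp, so the log will not be a pure CSV file unless a custom formatter is set.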
I have done a lot of searching and cannot find a solution for getting pdf-stamper to work in my Rails application. From the tutorials it appears that I write a method in the model? I wrote a simple app with two fields: nameLast and nameFirst. All I want to do is write these to a PDF I have that contains fields for user info. Two of the fields happen to be FirstName and LastName, so it's the perfect time to use pdf-stamper, right? I just want to take user data from the Rails application and then be able to push a button and generate a PDF. Here is the method I have in my model.
def savePDF
  pdf = PDF::Stamper.new("sample.pdf")
  pdf.text :nameFirst, "Jason"
  pdf.text :nameLast, "Yates"
  pdf.save_as "my_output.pdf"
end
That was clearly taken from a tutorial that I must not properly understand. I can actually get this working in java pretty easy, but I don't want to use jRuby. I am using rjb which is working fine. I just don't think I properly understand what needs to happen to get this working. Any help is greatly appreciated!
I'm the author of the pdf-stamper gem.
The save_as method saves the created PDF to the filesystem. If you are building a Rails application, I don't think that is what you want.
I'm guessing from your question you want to send a "stamped" PDF back to the browser. If that is the case, you should call to_s on the created PDF and then pass the output of that to Rails send_data method.
In your controller (not the model) you'll want to add some code like this.
# Named send_pdf rather than send, which would shadow Ruby's Object#send.
def send_pdf
  pdf = PDF::Stamper.new("sample.pdf")
  pdf.text :nameFirst, "Jason"
  pdf.text :nameLast, "Yates"
  send_data(pdf.to_s, :filename => "output.pdf", :type => "application/pdf", :disposition => "inline")
end
The problem here really is the documentation for the pdf-stamper gem. The feature you wanted was there, just undocumented, hence your confusion. I'll have to fix that.
I was doing the same thing using XFDF as the source data for the fields. The following code worked for me; maybe it will be helpful to you as well:
pdfreader = Rjb::import('com.itextpdf.text.pdf.PdfReader')
pdfstamper = Rjb::import('com.itextpdf.text.pdf.PdfStamper')
pdffields = Rjb::import('com.itextpdf.text.pdf.AcroFields')
xfdfreader = Rjb::import('com.itextpdf.text.pdf.XfdfReader')
# filestream is not defined in the original snippet; java.io.FileOutputStream is assumed here.
filestream = Rjb::import('java.io.FileOutputStream')
pdf = pdfreader.new("#{Rails.root}/public/out/temp/form1.pdf", nil)
xfdf = xfdfreader.new(f) # f: the XFDF source, defined elsewhere
stamp = pdfstamper.new(pdf, filestream.new("/tmp/text#{i}.pdf")) # i: a counter defined elsewhere
pdffields = stamp.getAcroFields()
pdffields.setFields(xfdf)
stamp.close
I'm trying to export data from my models to an Excel spreadsheet. I have seen 3 ways:
Using the spreadsheet gem, which I didn't understand how to use. The examples I saw were writing to a local file, but I'm looking to generate a file every time the user clicks on a link.
Creating a method called export and running the query there, then making an export.xls file in my view that creates the table I want exported to the Excel file. But this approach doesn't allow me to create multiple sheets.
Following this tutorial, http://oldwiki.rubyonrails.org/rails/pages/HowToExportToExcel, but it doesn't show how to put the link in the view. It looks to me like I'm missing something in the routes. I can give a GitHub link so you can take a look at my code if needed.
My choice is to just manually generate a CSV file, like:
File.open("data.csv", "w+") do |f|
  @my_data.each do |data|
    f << [data.title, data.body, ...].join(", ") + "\n"
  end
end
A CSV file can be opened with Excel or any other spreadsheet software.
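One caveat with joining fields by hand is that a comma or a quote inside a value would break the row. A minimal sketch of the same idea using Ruby's standard CSV library, which quotes such fields automatically, could look like this. The Article struct and its sample records are hypothetical stand-ins for the data being exported.

```ruby
require "csv"

# Hypothetical records standing in for the data being exported.
Article = Struct.new(:title, :body)
my_data = [
  Article.new("Hello, world", "First post"),
  Article.new("Plain title", 'She said "hi"')
]

# CSV.generate builds the whole file as a String and quotes fields as needed.
csv_text = CSV.generate do |csv|
  my_data.each { |data| csv << [data.title, data.body] }
end

puts csv_text
```

The resulting string can be written to a file or, in a controller, passed to send_data.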
I'm using writeexcel in my most recent Rails project. It is a fast and simple way to export Excel files directly, with no CSV!
To use it directly in your views you have to register writeexcel as a template handler, which is exactly what my gist does. Then create a new template like export.xls.writeexcel, insert your code and you're good to go.
Plugging my own gem here, but you might have a look at https://github.com/randym/acts_as_xlsx
It gives you a bit more than writeexcel or spreadsheet in terms of localization, graphs, tables and formatting from the axlsx gem.
It also integrates with Active Record scoping and method chains.
Blog posts with detailed usage examples:
http://axlsx.blogspot.com/
http://axlsx.blogspot.jp/2011/12/using-actsasxlsx-to-generate-excel-data.html
http://axlsx.blogspot.jp/2011/12/axlsx-making-excel-reports-with-ruby-on.html
On Github: https://github.com/randym/axlsx
On Rubygems: https://rubygems.org/gems/axlsx
On Ruby Toolbox: https://www.ruby-toolbox.com/projects/axlsx
Basically it involves setting up a responder in your controller
format.xlsx {
  xlsx_package = Post.to_xlsx
  begin
    temp = Tempfile.new("posts.xlsx")
    xlsx_package.serialize temp.path
    send_file temp.path, :filename => "posts.xlsx", :type => "application/xlsx"
  ensure
    temp.close
    temp.unlink
  end
}
and the following on your model
class Post < ActiveRecord::Base
  acts_as_xlsx
end
The two blog posts above give a fairly clear walk-through.