In my application I want to let the user upload a CSV file and then show them a view of their data mapped to my columns, so that they can confirm the data is correct, ideally with the option to edit incorrect values.
Are there any existing solutions for this, such as a gem or another standard approach, or any resources that might help with what I want to achieve?
Help very much appreciated.
You can do something like:
require 'csv'
file_content = File.read(params[:file].tempfile.path)
csv = CSV.parse(file_content, :headers => true)
File.unlink(params[:file].tempfile.path)
It depends on the params passed to your controller, but CSV can parse the file, which is usually written to a tmp dir when uploaded. Presentation of the result is up to your view layer.
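As a rough illustration of the "mapped to my columns" step, here is a minimal sketch that turns parsed rows into hashes keyed by the application's own column names, ready to render in a confirmation view. The COLUMN_MAP constant, the header names, and the preview_rows helper are all hypothetical; they are not part of any gem.

```ruby
require 'csv'

# Hypothetical mapping from the headers in the user's file to our own column names.
COLUMN_MAP = {
  'Full Name' => :name,
  'E-mail'    => :email
}.freeze

# Returns one hash per CSV row, keyed by our column names.
# Headers that are not in COLUMN_MAP are ignored; empty fields stay nil.
def preview_rows(csv_text, column_map = COLUMN_MAP)
  CSV.parse(csv_text, headers: true).map do |row|
    column_map.each_with_object({}) do |(header, column), mapped|
      mapped[column] = row[header]
    end
  end
end

sample = "Full Name,E-mail,Ignored\nAda Lovelace,ada@example.com,x\nBob,,y\n"
rows = preview_rows(sample)
rows.each { |r| puts r.inspect }
```

The view can then loop over these hashes and render each value in an editable field, so the user can correct anything before it is saved.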
Related
I have searched a lot and found nothing, so I am asking here. Does anyone know of an online converter with an API, or a gem, that can convert a PDF to an Excel or CSV file?
I am not sure whether this is the best place to ask, either.
My application is in Rails 4.2.
The PDF file contains a header and a big table with about 10 columns.
More info:
The user uploads the PDF via a form, then I need to take the PDF, convert it to CSV, and read the content. I tried to read the content with the PDF Reader gem, but the result wasn't really promising.
I have used freepdfconvert.com/pdf-excel. Unfortunately they don't supply an API (I have contacted them).
Sample PDF
This piece of code converts the PDF into text, which is handy.
Gem: pdf-reader
require 'pdf-reader'

def self.parse
  reader = PDF::Reader.new("pdf_uploaded_by_user.pdf")
  # Print the extracted text of every page.
  reader.pages.each do |page|
    puts page.text
  end
end
Now, if you check the attached sample PDF, you will see that some fields may be empty, which means I can't simply split each line of text on spaces and put it in an array, because I won't be able to map the array to the correct fields.
Thank you.
OK, after lots of research I couldn't find an API or even proper software that does this. Here is how I did it.
I first extract the table from the PDF into an HTML table with the pdftables API. It is cheap.
Then I convert the HTML table to CSV.
(This is not ideal, but it works.)
Here is the code:
require 'httmultiparty'
require 'nokogiri'
require 'csv'

class PageTextReceiver
  include HTTMultiParty
  base_uri 'http://localhost:3000'

  # Upload the PDF to pdftables and save the HTML it returns.
  def run
    response = PageTextReceiver.post('https://pdftables.com/api?key=myapikey', :query => { f: File.new("/path/to/pdf/uploaded_pdf.pdf", "r") })
    File.open('/path/to/save/as/html/response.html', 'w') do |f|
      f.puts response.body
    end
  end

  # Read the saved HTML table and write each row out as a CSV line.
  def convert
    doc = File.open("/path/to/saved/html/response.html") { |f| Nokogiri::HTML(f) }
    # Options are passed as keyword arguments (required on Ruby 3+).
    csv = CSV.open("path/to/csv/t.csv", 'w', col_sep: ",", quote_char: "'", force_quotes: true)
    doc.xpath('//table/tr').each do |row|
      tarray = []
      row.xpath('td').each do |cell|
        tarray << cell.text
      end
      csv << tarray
    end
    csv.close
  end
end
Now run it like this:
#> page = PageTextReceiver.new
#> page.run
#> page.convert
It is not refactored; it is just a proof of concept. You need to consider performance.
I might use the Sidekiq gem to run it in the background and hand the result back to the main thread.
Check the Tabula-Extractor project, and also see how it is used in projects like the NYPD Moving Summonses Parser and the CompStat criminal complaints parser.
Ryan Bates covers CSV exports in his RailsCasts episode at http://railscasts.com/episodes/362-exporting-csv-and-excel, which might give you some pointers.
Edit: as you now mention that you need the raw data from an uploaded PDF, you could use JavaScript to read the PDF file and then populate the data into Ryan Bates' export method. Reading PDFs was covered excellently in the following question:
extract text from pdf in Javascript
I would imagine the flow would be something like:
PDF new action
user uploads PDF
PDF show action
PDF is displayed
JavaScript reads PDF
JavaScript populates Ryan's raw data
Raw data is exported with PDF data included
So I have this currency .xml file:
http://www.ecb.int/stats/eurofxref/eurofxref-daily.xml
Now, I am wondering, how can I make my rails application read it? Where do I even have to put it and how do I include it?
I am basically making a currency exchange rate calculator.
I am going to populate a dropdown menu with the currency names from the .xml file so that the user can pick from them.
First of all you're going to have to be able to read the file--I assume you want the very latest from that site, so you'll be making an HTTP request (otherwise, just store the file anywhere in your app and read it with File.read with a relative path). Here I use Net::HTTP, but you could use HTTParty or whatever you prefer.
It looks like it changes on a daily basis, so maybe you'll only want to make one HTTP request every day and cache the file somewhere along with a timestamp.
Let's say you have a directory in your application called rates where we store the cached xml files, the heart of the functionality could look like this (kind of clunky but I want the behaviour to be obvious):
require 'net/http'

def get_rates
  today_path = Rails.root.join 'rates', "#{Date.today.to_s}.xml"
  # File.exist? replaces File.exists?, which was removed in Ruby 3.2
  xml_content = if File.exist? today_path
    # Read it from local storage
    File.read today_path
  else
    # Go get it and store it!
    xml = Net::HTTP.get URI 'http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml'
    File.write today_path, xml
    xml
  end
  # Now convert that XML to a hash. Lots of ways to do this, but this is very simple xml.
  currency_list = Hash.from_xml(xml_content)["Envelope"]["Cube"]["Cube"]["Cube"]
  # Now currency_list is an Array of hashes e.g. [{"currency"=>"USD", "rate"=>"1.3784"}, ...]
  # Let's say you want a single hash like "USD" => "1.3784", you could do a conversion like this
  Hash[currency_list.map &:values]
end
The important part there is Hash.from_xml. Where you have XML that is essentially key/value pairs, this is your friend. For anything more complicated you will want to look for an XML library like Nokogiri. The ["Envelope"]["Cube"]["Cube"]["Cube"] is digging through the hash to get to the important part.
Now, you can see how sensitive this will be to any changes in the XML structure, and you should make the endpoint configurable, and that hash is probably small enough to cache up in memory, but this is the basic idea.
To get your list of currencies out of the hash just say get_rates.keys.
As long as you understand what's going on, you can make that smaller:
def get_rates
  today_path = Rails.root.join 'rates', "#{Date.today.to_s}.xml"
  Hash[Hash.from_xml(if File.exist? today_path
    File.read today_path
  else
    xml = Net::HTTP.get URI 'http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml'
    File.write today_path, xml
    xml
  end)["Envelope"]["Cube"]["Cube"]["Cube"].map &:values]
end
If you do choose to cache the xml you will probably want to automatically clear out old versions of the cached XML file, too. If you want to cache other conversion lists consider a naming scheme derived automatically from the URI, e.g. eurofxref-daily-2013-10-28.xml.
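Clearing out old cached files could look roughly like the sketch below. It assumes the date-stamped naming scheme just described; the cache_file_name and prune_cache helpers are hypothetical names, and the demo runs in a temporary directory rather than a real rates folder.

```ruby
require 'date'
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: cache file name derived from the feed name and a date.
def cache_file_name(feed, date)
  "#{feed}-#{date.iso8601}.xml"
end

# Deletes every cached file for `feed` in `dir` except the one for `keep_date`.
# Returns the sorted base names of the files that were removed.
def prune_cache(dir, feed, keep_date)
  keep = cache_file_name(feed, keep_date)
  stale = Dir.glob(File.join(dir, "#{feed}-*.xml")).reject do |path|
    File.basename(path) == keep
  end
  stale.each { |path| File.delete(path) }
  stale.map { |path| File.basename(path) }.sort
end

# Demo in a throwaway directory.
demo_dir = Dir.mktmpdir
today = Date.new(2013, 10, 28)
[today - 2, today - 1, today].each do |d|
  File.write(File.join(demo_dir, cache_file_name('eurofxref-daily', d)), '<xml/>')
end
removed = prune_cache(demo_dir, 'eurofxref-daily', today)
remaining = Dir.children(demo_dir)
puts "removed: #{removed.inspect}, remaining: #{remaining.inspect}"
FileUtils.remove_entry(demo_dir)
```

Calling something like prune_cache right after a fresh file is written keeps the directory down to a single file per feed.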
Edit: let's say you want to cache the converted xml in memory--why not!
module CurrencyRetrieval
  def get_rates
    if defined?(@@rates_retrieved) && (@@rates_retrieved == Date.today)
      @@rates
    else
      @@rates_retrieved = Date.today
      @@rates = Hash[Hash.from_xml(Net::HTTP.get URI 'http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml')["Envelope"]["Cube"]["Cube"]["Cube"].map &:values]
    end
  end
end
Now just include CurrencyRetrieval wherever you need it and you're golden. @@rates and @@rates_retrieved are class variables that live on the CurrencyRetrieval module itself (class variables are resolved lexically), so every class that includes the module shares them. You must test that this persists between calls in your production setup (otherwise fall back to the file-based approach or store those values elsewhere).
Note, if the XML structure changes, or the XML is unavailable today, you'll want to invalidate @@rates and handle exceptions in some nice way...better safe than sorry.
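For the calculator itself, a hash of the shape get_rates returns is enough to do the arithmetic. The sketch below is only an illustration: the sample rates are made up, the convert and currency_codes helpers are hypothetical names, and it assumes every rate is quoted per 1 EUR, with EUR itself absent from the hash.

```ruby
# Rates are quoted per 1 EUR, as strings, in the shape get_rates returns.
# These sample values are made up for illustration.
SAMPLE_RATES = { 'USD' => '1.3784', 'GBP' => '0.8530' }.freeze

# Converts `amount` from one currency to another by going through EUR.
# EUR is assumed not to be in the hash, so it is treated as a rate of 1.
def convert(amount, from, to, rates = SAMPLE_RATES)
  rate = ->(code) { code == 'EUR' ? 1.0 : Float(rates.fetch(code)) }
  (amount / rate.call(from) * rate.call(to)).round(2)
end

# Options for a currency dropdown: every code in the hash plus EUR.
def currency_codes(rates = SAMPLE_RATES)
  (['EUR'] + rates.keys).sort
end

puts convert(100, 'EUR', 'USD')
puts convert(200, 'USD', 'EUR')
puts currency_codes.inspect
```

Floats are fine for a sketch; for real money handling you would probably want BigDecimal or a dedicated money library instead.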
My Rails app contains code to handle large file uploads. It splits the file up in JavaScript and posts each chunk to a route, where the chunks are reassembled into the original file.
I'm trying to figure out how to write tests for this logic, as up until now I've simply used fixture_file_upload for posting files.
I basically need to split a given file into byte ranges and post each one in such a way that my route handles it just as though it had been posted by my JavaScript.
Does anyone know of a way to accomplish this in a Rails test?
You could just create multiple fixture files (e.g. file.part1.txt, file.part2.txt, etc.), upload all the parts, and then check that they get concatenated together.
For example, if there are 10 fixture files:
(1..10).each do |part_no|
  fixture_name = "file.part#{part_no}.txt"
  fixture_file = fixture_file_upload("/files/#{fixture_name}", "text/plain")
  post :part_upload, :part => fixture_file
end
# code to check result here
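If you would rather not maintain separate fixture files, the parts can be cut from a single file inside the test. This is a minimal sketch of the splitting step only; split_into_chunks is a hypothetical helper, and how each chunk is then posted depends on what your route expects.

```ruby
require 'stringio'

# Splits `data` into consecutive chunks of at most `chunk_size` bytes.
def split_into_chunks(data, chunk_size)
  io = StringIO.new(data.b) # work on raw bytes, not characters
  chunks = []
  while (chunk = io.read(chunk_size))
    chunks << chunk
  end
  chunks
end

original = 'abcdefghij' * 3 + 'xyz' # 33 bytes of sample data
chunks = split_into_chunks(original, 10)
puts chunks.map(&:bytesize).inspect
puts chunks.join == original
```

In the test, each chunk could then be written to a temporary file and posted in turn, with a final assertion that the reassembled file matches the original.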
I'm trying to save the results of a survey to a CSV file, so that every time the survey is completed a new line is added to the file. I have code that exports database rows to a CSV and lets you download it, but I don't know how to save the survey results in the first place, or whether this is even possible. I have a CSV file set up with the correct headers.
When your create function is called (the controller action that the form's submit is directed to; create on REST controllers), you can add some custom logic there to convert the form data into the CSV structure you want.
Ruby has a built-in CSV module, which can be used to both read and write CSV files.
So you want something like the following:
require "csv"
CSV.open "output.csv", "a+" do |csv|
  # example logic from another script how to populate the file
  times.each do |key, value|
    csv << [ key, value ]
  end
end
You just need to define the structure of the rows however you want; this example writes two columns per row.
EDIT: a+ appends new rows to the end of the file, rather than truncating it as the original w+ did.
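Applied to the survey case, the append could be sketched as below. The HEADERS list and the append_survey helper are hypothetical, and the header check is only there so the sketch also works when the file does not exist yet.

```ruby
require 'csv'
require 'tmpdir'

HEADERS = %w[name rating comment].freeze # hypothetical survey columns

# Appends one survey response to the CSV at `path`,
# writing the header row first if the file is new or empty.
def append_survey(path, answers)
  write_headers = !File.exist?(path) || File.zero?(path)
  CSV.open(path, 'a') do |csv|
    csv << HEADERS if write_headers
    csv << HEADERS.map { |h| answers[h] }
  end
end

path = File.join(Dir.mktmpdir, 'surveys.csv')
append_survey(path, { 'name' => 'Ada', 'rating' => 5, 'comment' => 'Great, thanks' })
append_survey(path, { 'name' => 'Bob', 'rating' => 3, 'comment' => nil })
puts File.read(path)
```

Going through the CSV library means values containing commas or quotes are escaped for you.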
A possible solution could be to use a logger. In your application controller:
def surveys
  @@surveys_log ||= Logger.new("#{Rails.root}/log/surveys.log")
end
Anywhere where you would like to log the survey:
surveys.info @survey.to_csv # you'll need to implement "to_csv" yourself
Which will result in a surveys.log in your log/ folder.
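One thing to watch with the logger approach: Ruby's default Logger formatter prefixes every line with the severity and a timestamp, so the file would not be plain CSV. A minimal sketch of a message-only formatter follows; a StringIO stands in for log/surveys.log, and the sample rows are made up.

```ruby
require 'logger'
require 'stringio'
require 'csv'

io = StringIO.new # stands in for log/surveys.log
surveys = Logger.new(io)
# Emit only the message, so each log line is a plain CSV row.
surveys.formatter = proc { |_severity, _time, _progname, msg| "#{msg}\n" }

# Array#to_csv (added by require 'csv') handles quoting; chomp drops its newline.
surveys.info ['Ada', 5, 'Great, thanks'].to_csv.chomp
surveys.info ['Bob', 3, nil].to_csv.chomp
puts io.string
```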
I'm trying to export data from my models to an Excel spreadsheet. I have seen three ways:
Using the spreadsheet gem, which I didn't understand how to use. The examples I saw wrote to a local file, but I want to generate a file every time a user clicks on a link.
Creating a method called export and running the query there, then making an export.xls file in my view that builds the table I want exported to the Excel file. This approach doesn't allow me to create multiple sheets.
Following this tutorial, http://oldwiki.rubyonrails.org/rails/pages/HowToExportToExcel, but it doesn't show how to put the link in the view. It looks like I'm missing something in the routes. I can share my GitHub repository so you can take a look at my code if needed.
My choice is to just generate the CSV file manually. Like:
# File.open (not File.new) yields the file to the block and closes it afterwards
File.open("data.csv", "w+") do |f|
  @my_data.each do |data|
    f << [data.title, data.body, ...].join(", ") + "\n"
  end
end
A CSV file can be opened with Excel or any other spreadsheet software.
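Joining fields with ", " by hand breaks as soon as a value contains a comma or a quote. As a hedged alternative, Ruby's CSV library can build the same text with proper quoting. In this sketch the Item struct is a hypothetical stand-in for a model with title and body attributes.

```ruby
require 'csv'

Item = Struct.new(:title, :body) # hypothetical stand-in for a model
items = [
  Item.new('Hello', 'Plain text'),
  Item.new('Tricky', 'Contains a comma, and a "quote"')
]

# CSV.generate returns the whole document as a string.
csv_text = CSV.generate do |csv|
  csv << %w[title body]
  items.each { |item| csv << [item.title, item.body] }
end
puts csv_text
```

In a controller, a string like csv_text could be handed to send_data so the file is generated on each request rather than written to disk.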
I'm using writeexcel in my most recent Rails project. It is a fast and simple way to export Excel files directly, with no CSV!
To use it directly in your views you have to register writeexcel as a template handler, which is exactly what my gist does. Then create a new template like export.xls.writeexcel, insert your code, and you're good to go.
Plugging my own gem here, but you might have a look at https://github.com/randym/acts_as_xlsx
It gives you a bit more than writeexcel or spreadsheet in terms of localization, graphs, tables and formatting from the axlsx gem.
It also integrates with Active Record scoping and method chains.
Blog posts with detailed usage examples:
http://axlsx.blogspot.com/
http://axlsx.blogspot.jp/2011/12/using-actsasxlsx-to-generate-excel-data.html
http://axlsx.blogspot.jp/2011/12/axlsx-making-excel-reports-with-ruby-on.html
On Github: https://github.com/randym/axlsx
On Rubygems: https://rubygems.org/gems/axlsx
On Ruby Toolbox: https://www.ruby-toolbox.com/projects/axlsx
Basically it involves setting up a responder in your controller:
format.xlsx {
  xlsx_package = Post.to_xlsx
  begin
    temp = Tempfile.new("posts.xlsx")
    xlsx_package.serialize temp.path
    send_file temp.path, :filename => "posts.xlsx", :type => "application/xlsx"
  ensure
    temp.close
    temp.unlink
  end
}
and the following on your model:
class Post < ActiveRecord::Base
  acts_as_xlsx
end
The two blog posts above give a fairly clear walk-through.