Parse word docs heroku/s3 - ruby-on-rails

I want to implement a feature that parses Word documents, which will be uploaded by users and stored on Amazon S3. The application will run on Heroku. I tried catdoc, but it doesn't parse URLs. Can anyone suggest a tool that can be used on Heroku to parse Word documents?
UPDATE
I want to check whether an uploaded MS Word (.doc) file contains particular words and tag it accordingly.

If you just want to upload the Word document, you could take a look at something like the paperclip gem.
This would allow you to save the file on Amazon S3 and simply download it. You could also extend paperclip to run post-processing on the file, although that is slightly more complicated.
As willglynn says, it would be good to know exactly what parsing you need to do.
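For the tagging described in the update, a minimal sketch in plain Ruby might look like the following. It assumes the text has already been extracted from the .doc file by some other tool, and the tag names, keyword lists, and the tags_for helper are all hypothetical examples rather than part of any gem.

```ruby
require "set"

# Hypothetical mapping from a tag name to the words that trigger it.
TAG_KEYWORDS = {
  "finance" => %w[invoice payment budget],
  "legal"   => %w[contract liability clause]
}.freeze

# Returns the sorted list of tags whose keywords occur in the text.
# Matching is case-insensitive and on whole words only.
def tags_for(text, keywords = TAG_KEYWORDS)
  words = text.downcase.scan(/[a-z0-9']+/).to_set
  matching = keywords.select do |_tag, terms|
    terms.any? { |term| words.include?(term) }
  end
  matching.keys.sort
end
```

Calling tags_for with the extracted text returns an array of tag names that could then be stored alongside the uploaded file's record.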

Related

configure paperclip to generate styles when they are requested

I need a user to upload an original file and process that into a thumbnail (paperclip's got this - check).
Then I would like to be able to retrieve different styles for that attachment, but I do not need to store those different styles on disk anywhere. I would prefer they are generated during the request.
The reason is that these styles are single-use, so paperclip becomes a glorified one-time-use image resizer. I'd prefer not to incur the S3 cost if I don't have to.
I'm wondering if there's a way to do this out of the box. Or maybe carrierwave supports something like this?
Thanks!
I'm pretty sure that the Refile library supports it. Refile is basically a modern version of CarrierWave, written by the same author.
You can read more about the on-the-fly processing here:
https://github.com/refile/refile#processing

Creating a table entry while using CarrierWaveDirect

Is it possible to use something like CarrierWaveDirect to upload directly to S3 and still be able to gather some data on the files being uploaded?
For instance, is it possible to change the filename, and save the size in a table before/after the upload? I don't need to do any kind of manipulation to the file either (I was reading some documentation regarding the use of Resque).
I realize that this is a very novice question but I couldn't find the answer anywhere.
After a lot of fiddling, I found out that with the S3 secure upload form, the redirect triggered by the hidden field success_action_redirect returns some params on the uploaded file.
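As a rough illustration of reading those params: on a successful POST upload, S3 appends bucket, key, and etag to the success_action_redirect URL as query parameters. The sketch below parses such a URL with Ruby's standard library only. The uploaded_file_params helper and the example.com URL are hypothetical, and in a Rails controller the same values would normally already be available through params.

```ruby
require "uri"

# Pulls the upload details out of the URL that S3 redirects the browser to.
def uploaded_file_params(redirect_url)
  query  = URI.parse(redirect_url).query.to_s
  params = URI.decode_www_form(query).to_h
  key    = params["key"]
  {
    bucket:   params["bucket"],
    key:      key,
    etag:     params["etag"].to_s.delete('"'), # S3 wraps the ETag in quotes
    filename: key && File.basename(key)
  }
end

info = uploaded_file_params(
  "https://example.com/uploads/done?bucket=my-bucket" \
  "&key=uploads%2Fabc%2Freport.doc&etag=%22d41d8cd9%22"
)
puts info[:filename]
```

The resulting hash could be saved to a table row at that point. The file size is not among the redirect parameters, so it would have to be obtained some other way.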

Creating a Rails Music Player App (something like Rdio)

I want to create a Rails app that has a lot of mixtapes, which the user can listen to and download (like datpiff.com). All the mixtapes would be uploaded by me. Each mixtape would have its own page, with the title, artist name, cover, etc.
I'm having trouble getting the architecture of the app right. What's the best way to upload all the mixtapes? (I'm thinking something like Amazon S3.)
Do I have to upload a zipped file with the entire mixtape and each individual song, or just the zipped file?
How do I show the information for each song (title, length, etc.)?
Of course, the biggest problems are streaming the mixtape and downloading the file.
Can anyone guide me as to what's the best way to create this app? (Is Rails the best way to do it?)
Thanks in advance.
You're on the right track with S3. Use paperclip in conjunction with it if you want to make some sort of GUI for you to upload stuff with.
For streaming check out jPlayer, which is a jQuery plugin.
Downloading is no biggie. Check out Rails' send_file. For sending from a remote source like S3, look here.

Best way to download and process images during user data import feature?

I have a feature where users bulk import data via CSV. The CSV can have image URLs nested in it. How can I grab the images via the given URLs and post them to my site?
I'm using Ruby 1.9.2, Rails 3, Paperclip w/ S3 as my backend.
Thanks for reading and possibly helping!
I assume you already have all the logic to parse the CSV and get the URL (if not, look into either csv in the standard library or FasterCSV). To grab the file you could:
Use Net::HTTP (see this page)
Use open-uri (which I guess is really just a wrapper for Net::HTTP, but I prefer the syntax (see this question))
Call off to wget or curl to grab the file for you (see this question if calling command line programs from Ruby is a question)
After that, I suppose it's simply a matter of putting an img tag on a page and pointing it to wherever you saved the file.
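A minimal sketch of the CSV and Net::HTTP options above, using only the standard library. The image_url column name and both helper names are assumptions made for illustration, and the redirect handling assumes the Location header is an absolute URL.

```ruby
require "csv"
require "net/http"
require "uri"

# Collects the http(s) URLs found in one column of the CSV text.
# The column name "image_url" is a hypothetical example.
def image_urls_from_csv(csv_text, column: "image_url")
  CSV.parse(csv_text, headers: true).filter_map do |row|
    value = row[column].to_s.strip
    next if value.empty?
    begin
      value if URI.parse(value).is_a?(URI::HTTP) # URI::HTTPS is a subclass
    rescue URI::InvalidURIError
      nil # skip cells that are not valid URIs
    end
  end
end

# Downloads the body at url and returns it as a string.
# Follows up to `limit` redirects; raises on any other non-success response.
def fetch_image(url, limit = 3)
  raise ArgumentError, "too many redirects" if limit.zero?
  response = Net::HTTP.get_response(URI.parse(url))
  case response
  when Net::HTTPSuccess     then response.body
  when Net::HTTPRedirection then fetch_image(response["location"], limit - 1)
  else response.value # raises an HTTP error for 4xx/5xx responses
  end
end
```

Each URL returned by image_urls_from_csv could be passed to fetch_image, and the returned data handed to whatever attachment handling the app already uses. For a bulk import it may be worth running the downloads in a background job so that one slow host does not hold up the request.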

rails 3: how to add (or perhaps substitute) a text string to an existing PDF document?

Users of our app need to print a PDF document we have pre-created, but with a placeholder string in the PDF template, "YOUR_NAME_HERE", replaced with their name. (Or, alternatively, we could not use a placeholder and instead add a new string with a certain font/style at a certain X,Y offset.)
Doing full PDF creation is overkill, since ALL we need to do is add their name to the PDF doc.
To make it more fun, we're hosted on Heroku which does not have local file storage, so we need to create the final PDF as something displayed in their browser that can (hopefully) be saved to local disk.
Does anyone know of a technique that would let us easily add (or replace) text to an existing PDF document?
I'm not finding anything for editing PDFs in ruby. I would just look into using something like prawn to generate them, even if that is a bit overkill when only a few words are different between each.
If efficiency is an issue, you could convert the pre-made part into a PNG and then just add the text on top. It feels dirty, but it'd probably be quicker than full generation and I don't know what other options you have, since it doesn't seem like anyone has implemented a true PDF editor in ruby yet.
As far as local storage, keep in mind that you do have write access to tmp/ on Heroku, so you can use that as long as you're only going to use the file during a single request.
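The single-request use of temporary storage could be sketched like this with Ruby's Tempfile. The with_request_tempfile helper is hypothetical, and it writes to the system temp directory (Dir.tmpdir) rather than the app's tmp/ folder, which is an assumption made to keep the example self-contained.

```ruby
require "tempfile"

# Writes data to a temporary file, yields its path, and removes the
# file afterwards so nothing outlives the request.
def with_request_tempfile(data, extension = ".pdf")
  file = Tempfile.new(["request", extension])
  file.binmode
  file.write(data)
  file.flush
  yield file.path
ensure
  file&.close! # close and delete the file
end

# Usage: the block's return value is passed back to the caller.
size = with_request_tempfile("%PDF-1.4 example") { |path| File.size(path) }
puts size
```

Inside the block, the path could be handed to whatever generates or sends the PDF. Once the block returns, the file is gone.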
