I've gone through the playground demos and am trying to build a prototype for the Letter of Credit use case. How can attachments be handled within assets and/or transactions?
We don't yet have a great solution for this, but we are interested in requirements and use cases. You could model the attachment as a URL and a hash (two strings in your model) and host the attachments on an external secure document store.
You submit a transaction which carries the URL and the hash of the document to Composer, which updates the asset with the attachment. When the attachment for an asset is required you can query the asset via the REST API, retrieve the linked document from the secure doc store and then compare its hash with the hash on the asset.
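As a rough illustration of the URL-plus-hash idea, here is a minimal Python sketch. It does not use Composer or any real document store: the `Attachment` class, the function names, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Attachment:
    """Hypothetical reference stored on the asset: just two strings."""
    url: str
    sha256_hex: str


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the document bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def make_attachment(url: str, document: bytes) -> Attachment:
    """Build the URL-and-hash pair that a transaction would carry."""
    return Attachment(url=url, sha256_hex=sha256_hex(document))


def verify_attachment(attachment: Attachment, fetched: bytes) -> bool:
    """Check that bytes fetched from the document store match the stored hash."""
    return sha256_hex(fetched) == attachment.sha256_hex
```

A client would call `make_attachment` when submitting the document and `verify_attachment` after downloading it again. A mismatch means the stored document no longer matches what was recorded on the asset.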
In my application, I have a textarea input where users can type a note.
When they click Save, there is an AJAX call to Web Api that saves the note to the database.
I would like users to be able to attach multiple files to this note (Gmail style) before saving it. It would be nice if each upload could start as soon as the file is attached, before the note is saved.
What is the best strategy for this?
P.S. I can't use jQuery fineuploader plugin or anything like that because I need to give the files unique names on the server before uploading them to Azure.
Is what I'm trying to do possible, or do I have to make the whole 'Note' a normal form post instead of an API call?
This approach is file-based, but you can apply the same logic to Azure Blob Storage containers if you wish.
What I normally do is give the user a unique GUID when they GET the AddNote page. I create a folder called:
Then any files the user uploads at this stage get assigned to this folder:
When the user does a POST and I have confirmed that all validation has passed, I simply copy the files to the completed folder, with the newly generated note ID:
Then I delete the C:\TemporaryUploads\UNIQUE-USER-GUID folder.
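The staging flow above could look roughly like this in Python. This is only a sketch: `temp_root`, `completed_root` and all function names are hypothetical, and the real folder layout is whatever your application uses.

```python
import shutil
import uuid
from pathlib import Path


def new_upload_session(temp_root: Path) -> Path:
    """Create a staging folder named after a fresh GUID (done on GET)."""
    folder = temp_root / str(uuid.uuid4())
    folder.mkdir(parents=True)
    return folder


def stage_file(session_dir: Path, filename: str, data: bytes) -> Path:
    """Save an upload into the staging folder, keeping only the base name."""
    target = session_dir / Path(filename).name
    target.write_bytes(data)
    return target


def finalize_uploads(session_dir: Path, completed_root: Path, note_id: int) -> Path:
    """Copy staged files to a folder named after the note ID, then delete staging."""
    destination = completed_root / str(note_id)
    shutil.copytree(session_dir, destination)
    shutil.rmtree(session_dir)
    return destination
```

`finalize_uploads` would only be called after validation of the POST has passed, so a failed save leaves the staged files in place for a retry.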
Cleaning Up
Now. That's all well and good for users who actually go ahead and save a note, but what about the ones who uploaded a file and closed the browser? There are two options at this stage:
Have a background service clean up these files on a schedule (daily, weekly, etc.). This is a good job for Azure WebJobs.
Clean up the old files from the web app each time a new note is saved. This is not a great approach, as you're doing file I/O when there may be no files to delete.
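For the scheduled clean-up option, the job body could be as small as the sketch below. The function name and the age threshold are assumptions. It uses each folder's modification time, which the file system updates when files are added to or removed from the folder.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def remove_stale_folders(temp_root: Path, max_age_seconds: float,
                         now: Optional[float] = None) -> List[str]:
    """Delete staging folders not modified within max_age_seconds.

    Returns the sorted names of the folders that were removed.
    """
    now = time.time() if now is None else now
    removed: List[str] = []
    if not temp_root.is_dir():
        return removed
    for folder in temp_root.iterdir():
        # Only folders are considered; stray files in temp_root are left alone.
        if folder.is_dir() and now - folder.stat().st_mtime > max_age_seconds:
            shutil.rmtree(folder)
            removed.append(folder.name)
    return sorted(removed)
```

Pick a threshold comfortably longer than the time a user could reasonably spend writing a note, so that uploads still in progress are not deleted.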
Building on RGraham's answer, here's another approach you could take:
Create a blob container for storing note attachments. Let's call it note-attachments.
When the user opens the screen for creating a note, assign a GUID to the note.
When the user uploads a file, you just prefix the file name with this note id. So if a user uploads a file called file1.txt, it gets saved into blob storage as note-attachments/{note id}/file1.txt.
Depending on your requirements, once you save the note you may move this blob to another blob container or simply leave it where it is. Since the blob has the note id in its name, searching for the attachments of a note is easy.
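To make the naming scheme concrete, here is a small Python sketch. It only builds and filters blob names and does not call the Azure SDK. The function names are made up for the example, and the container is assumed to be note-attachments as above.

```python
import posixpath
import uuid
from typing import Iterable, List


def new_note_id() -> str:
    """Assign a GUID to the note when the create-note screen is opened."""
    return str(uuid.uuid4())


def blob_name(note_id: str, filename: str) -> str:
    """Prefix the uploaded file's base name with the note id."""
    # Normalise Windows-style separators, then drop any client-side path.
    base = posixpath.basename(filename.replace("\\", "/"))
    return f"{note_id}/{base}"


def blobs_for_note(note_id: str, all_blob_names: Iterable[str]) -> List[str]:
    """Find a note's attachments by matching on the '<note id>/' prefix."""
    prefix = f"{note_id}/"
    return [name for name in all_blob_names if name.startswith(prefix)]
```

Including the trailing slash in the prefix matters: without it, a note id that happens to be the start of another id would match that note's blobs too.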
For uploading files, I would recommend doing it directly from the browser to blob storage making use of AJAX, CORS and Shared Access Signature. This way you will avoid data going through your servers. You may find these blog posts useful:
Revisiting Windows Azure Shared Access Signature
Windows Azure Storage and Cross-Origin Resource Sharing (CORS) – Lets Have Some Fun
I believe this question is platform/technology independent, however I am using Ruby on Rails with the carrierwave gem.
Users upload documents to my site, and I need to keep them private. I am exploring the different options available to me, along with their advantages and disadvantages.
Option 1
Obfuscate urls to images to make them impossible to guess.
This would be relatively simple to implement and fast to serve up. However, if a url was made public by whatever means, security is lost.
Option 2
Have documents accessed through some sort of intermediate step that requires authentication. This would have improved security over option 1, but would place additional load on the server. A page containing previews of a number of uploaded documents would hammer the server.
Are there any other options available to me? Have I made any mistakes with my claims, or missed any important points?
I think the best option you have is to use a "key" for your documents. You can generate a key with a certain lifetime, and when a request comes in for /document/name/access_key, you find the matching record and return the file associated with it. That way you never expose the real URL.
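A minimal sketch of such a key, in Python rather than Rails and with an in-memory dictionary standing in for the database table. The names `issue_access_key` and `resolve_access_key` are hypothetical.

```python
import secrets
import time
from typing import Dict, Optional, Tuple

# Stand-in for a database table: access key -> (storage path, expiry timestamp).
_access_records: Dict[str, Tuple[str, float]] = {}


def issue_access_key(storage_path: str, lifetime_seconds: float,
                     now: Optional[float] = None) -> str:
    """Generate an unguessable key that maps to a document for a limited time."""
    now = time.time() if now is None else now
    key = secrets.token_urlsafe(32)
    _access_records[key] = (storage_path, now + lifetime_seconds)
    return key


def resolve_access_key(key: str, now: Optional[float] = None) -> Optional[str]:
    """Return the storage path for a valid key, or None if unknown or expired."""
    now = time.time() if now is None else now
    record = _access_records.get(key)
    if record is None:
        return None
    storage_path, expires_at = record
    if now >= expires_at:
        # Expired keys are removed so they cannot be reused later.
        del _access_records[key]
        return None
    return storage_path
```

The controller behind /document/name/access_key would call `resolve_access_key` and stream the file only when a path comes back. If a link leaks, it stops working once the lifetime has passed.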
I am working on a Rails web application, running on a Heroku stack, that manages documents attached to a Rails database object. For example, suppose we have an object called product_i of class/table Product/products, and product_i_prospectus.pdf is the associated product prospectus, where each product has a single prospectus.
Since I am working on Heroku, and thus do not have root access, I plan to use Amazon S3 to store the static resource associated with product_i. So far, so good.
Now suppose that product_i_attributes.txt is also a file I want to upload, and indeed I want to actually fill out information in the product_i object (i.e. the row in the table corresponding to product_i), based on information in the file product_i_attributes.txt.
In a sentence: I want to create, or alter, database objects, based on the content of static text files uploaded to my S3 bucket.
Strictly speaking, I don't have to be able to access the files once they are in the bucket; I just need to create some records from a text file.
I have done something similar with csv files. I would not try to process the file directly at upload as it can be resource intensive.
My solution was to upload the file to S3 and then call a background job (delayed_job, resque, etc.) that processed the CSV after the upload. You can then issue a delete once the job has finished, to remove the file from S3 if you no longer need it after processing.
For Heroku this will require that you add a worker (if you don't already have one) to process the background jobs that will process the text files.
Take a look at the aws-sdk-for-ruby gem. This will allow you to access your S3 bucket.
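The body of such a background job might look like the sketch below. It is written in Python with a plain dictionary standing in for the S3 bucket, so no real AWS calls are made. The `name: value` file format and both function names are assumptions, since the question does not say what product_i_attributes.txt contains.

```python
from typing import Dict


def parse_attributes(text: str) -> Dict[str, str]:
    """Parse 'name: value' lines, skipping blank lines and '#' comments."""
    attributes: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons.
        name, value = line.split(":", 1)
        attributes[name.strip()] = value.strip()
    return attributes


def process_attributes_job(bucket: Dict[str, bytes], key: str,
                           product: Dict[str, str]) -> None:
    """Job body: read the uploaded file, update the record, delete the file."""
    text = bucket[key].decode("utf-8")
    product.update(parse_attributes(text))
    # The file is no longer needed once the record has been updated.
    del bucket[key]
```

In the real application the dictionary lookups would be replaced by S3 reads and deletes, and `product.update` by an update of the database row.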
I'm designing an app which allows users to upload images (max 500k per image, roughly 20 images) from their hard drive to the site, so that they can make custom board games (e.g. snakes and ladders) in PDF format. These will be created with prawn on the fly and then made available for immediate download.
Neither the images uploaded nor the pdfs created need to be saved on my apps side permanently. The moment the user downloads the pdf they are no longer needed.
Heroku doesn't support saving files to the file system (it does allow writing to the tmp directory, but says you shouldn't rely on it, which rules it out for me). I'm wondering what tools or services I should be looking into to get around this. I've looked into paperclip and am wondering whether it is right for this type of job.
Paperclip is on the right track, but the key insight is you need to use the S3 storage backend (Paperclip uses the FS by default which as you've noticed is no good on Heroku). It's pretty handy; instead of flushing writes out to the file system, it uses the AWS::S3 gem to upload them to S3. You can read more about it in the rdoc here: http://github.com/thoughtbot/paperclip/blob/master/lib/paperclip/storage/s3.rb
Here's how the flow would work:
I'd let your users upload their multiple source images. Here's an article on allowing multiple attachments to one model with paperclip: http://www.cordinc.com/blog/2009/04/multiple-attachments-with-vali.html.
Then when you're ready to generate the PDF (probably in a background job, right?), what you do is download all the source images to somewhere in tmp/ (make sure the directory is based on your model id or something so if two people do this at once, the files don't get stepped on). Once you've got all the images downloaded, you can generate your PDF. I know this is using the file system, but as long as you do all your filesystem interactions in one request or job cycle, it will work, your files will still be there. I use this method in a couple production web apps. You can't count on tmp/ being there between requests, but within one it's reliably there.
Storing your generated PDF on S3 with paperclip makes sense too, since then you can just hand your users the S3 URL. If you want you can make something to clear the files off every so often if you don't want to pay the S3 costs, but they should be trivial.
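The "do all the file system work inside one job" step can be sketched like this, in Python rather than Ruby. `fetch_image` and `build_pdf` are hypothetical callables standing in for the S3 download and the prawn rendering, and the directory prefix is made up.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List, Optional, Sequence


def generate_pdf_job(model_id: int,
                     image_keys: Sequence[str],
                     fetch_image: Callable[[str], bytes],
                     build_pdf: Callable[[List[Path]], bytes],
                     tmp_root: Optional[str] = None) -> bytes:
    """Download images to a per-model temp dir, build the PDF, then clean up."""
    # The model id plus mkdtemp's random suffix keeps concurrent jobs apart.
    work_dir = Path(tempfile.mkdtemp(prefix=f"boardgame-{model_id}-", dir=tmp_root))
    try:
        paths: List[Path] = []
        for index, key in enumerate(image_keys):
            # The index prefix avoids clashes between images with the same name.
            path = work_dir / f"{index:03d}-{Path(key).name}"
            path.write_bytes(fetch_image(key))
            paths.append(path)
        return build_pdf(paths)
    finally:
        # Nothing is left behind for a later request to depend on.
        shutil.rmtree(work_dir, ignore_errors=True)
```

Because the directory is created and removed within the same call, the job never relies on tmp/ surviving between requests.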
Paperclip sounds like an ideal candidate. It will save images in RAILS_ROOT/public/system/, which is both persistent and private (shouldn't be able to be enumerated on shared hosting).
You can configure it to produce thumbnails of your images if you wish.
And it can remove the images it manages when the associated model is destroyed - after your user downloads their PDF, and you delete the record from the database.
Prawn might not be appropriate, depending on the complexity of the PDFs you need to generate. If you have $$$, go for PrinceXML and the princely gem. I've had some success with wkhtmltopdf, which generates PDFs from a Webkit render of HTML/CSS - but it doesn't support any of the advanced page manipulation stuff that Prince does.
I decided to use Amazon S3 for document storage for an app I am creating. One issue I've run into is that, while I need to upload the files to S3, I also need to create a document object in my app so my users can perform CRUD actions.
One solution is to allow for a double upload. A user uploads a document to the server my Rails app lives on. I validate and create the object, then pass it on to S3. One issue with this is that progress indicators become more complicated. Most out-of-the-box plugins would show the client that the file has finished uploading because it is on my server, but then there would be a noticeable delay while the file goes from my server to S3. This also uses extra bandwidth, which does not seem necessary.
The other solution I am thinking about is to upload the file directly to S3 with one AJAX request, and when that is successful, make a second AJAX request to store the object in my database. One issue here is that I would have to validate the file after it is uploaded which means I have to run some clean up code in S3 if the validation fails.
Both seem equally messy.
Does anyone have something more elegant working that they would not mind sharing? I would imagine this is a common situation with "cloud storage" being quite popular today. Maybe I am looking at this wrong.
Unless there's a particular reason not to use paperclip, I'd highly recommend it. When it is used in conjunction with delayed job and delayed paperclip, the user uploads the file to your server's filesystem, where you perform whatever validation you need. A delayed job then processes the file and stores it on S3. It's really easy to set up and gives a better user experience.