MVC Azure storage: auto-delete storage after a certain time - asp.net-mvc

I'm developing an Azure website where users can upload blobs and metadata. I want the uploaded content to be deleted after some time.
The only way I can think of is to go for a cloud app instead of a website, with a worker role that checks every hour or so whether an uploaded file has expired and, if so, deletes it. However, I'm going for a simple website here without worker roles.
I have a function that checks whether an uploaded item should be deleted, and if the user does something on the page I can easily call this function. But if the user isn't doing anything and the time runs out, the function is never called and the storage is never deleted. How would you solve this?
Thanks

Too broad to give one right answer, as you can solve this in many ways. But since you're using Web Sites, I'd suggest you look at WebJobs and see if they might be the right tool for you, as they give you the ability to run periodic jobs without the bulk of extra VMs in a web/worker configuration. You'll still need a way to manage your metadata so you know what to delete.
Regarding other Azure-specific built-in mechanisms, you can also consider queueing delete messages, with an invisibility time equal to the time the content is to be available. After that time expires, the queue message becomes visible, and any queue consumer will then see the message and be able to act on it. That consumer can be your WebJob (which has SDK support for queues) or really any other mechanism you build.
Again, a very broad question with no single right answer, so I'm just pointing out the Azure-specific mechanisms that could help solve this particular problem.
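To make the queue idea above a bit more concrete, here's a rough sketch assuming the classic WindowsAzure.Storage SDK; the queue name, the message format and the 24-hour lifetime are illustrative assumptions, not anything from the question.
    using System;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Queue;

    public static class DelayedDelete
    {
        public static void ScheduleDelete(string connectionString, string containerName, string blobName)
        {
            var queue = CloudStorageAccount.Parse(connectionString)
                .CreateCloudQueueClient()
                .GetQueueReference("delete-requests");   // assumed queue name
            queue.CreateIfNotExists();

            // The message stays invisible until the content's lifetime has elapsed;
            // only then will a consumer (e.g. a WebJob) see it and delete the blob.
            queue.AddMessage(
                new CloudQueueMessage(containerName + "/" + blobName),
                timeToLive: null,                                // keep the default TTL
                initialVisibilityDelay: TimeSpan.FromHours(24)); // assumed content lifetime
        }
    }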

Like David said in his answer, there can be many solutions to your problem. One solution is to rely on the blob itself: periodically fetch the list of blobs in the container and decide whether each blob should be removed. The periodic fetching could be done through an Azure WebJob (if the application is deployed as a website) or through an Azure worker role. The worker role approach is independent of how your main application is deployed; the application could be a cloud service or a website.
With that, there are two possible approaches you can take (a rough sketch of both follows the list):
Rely on the blob's Last Modified date: whenever a blob is updated, its Last Modified property is updated. You can use that to decide whether the blob should be deleted. This approach works best if the uploaded blob is never modified afterwards.
Rely on the blob's custom metadata: when a blob is uploaded, store the upload date/time in the blob's metadata. When you fetch the list of blobs, compare that metadata value with the current date/time and decide whether the blob should be deleted.
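Here's a minimal sketch of such a sweep using the classic WindowsAzure.Storage SDK; the container name, the 24-hour lifetime and the "UploadedOn" metadata key are assumptions for illustration.
    using System;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    public static class ExpiredBlobSweep
    {
        public static void Run(string connectionString)
        {
            var container = CloudStorageAccount.Parse(connectionString)
                .CreateCloudBlobClient()
                .GetContainerReference("uploads");            // assumed container name

            var cutoff = DateTimeOffset.UtcNow.AddHours(-24); // assumed lifetime

            foreach (var item in container.ListBlobs(useFlatBlobListing: true))
            {
                var blob = item as CloudBlockBlob;
                if (blob == null) continue;

                blob.FetchAttributes(); // populates Properties and Metadata

                // Approach 1: rely on the Last Modified property
                bool expired = blob.Properties.LastModified < cutoff;

                // Approach 2: rely on custom metadata written at upload time
                string uploadedOn;
                if (blob.Metadata.TryGetValue("UploadedOn", out uploadedOn))
                    expired = DateTimeOffset.Parse(uploadedOn) < cutoff;

                if (expired)
                    blob.Delete();
            }
        }
    }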

Another approach might be to encode the expiry date in the container name.
This can make deletion easier, as you could then simply remove expired containers.
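A hedged sketch of that idea, assuming containers are named with an "expires-yyyyMMdd" convention (the convention itself is made up here):
    using System;
    using System.Globalization;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    public static class ExpiredContainerSweep
    {
        public static void Run(string connectionString)
        {
            var client = CloudStorageAccount.Parse(connectionString).CreateCloudBlobClient();

            foreach (var container in client.ListContainers(prefix: "expires-"))
            {
                DateTime expiry;
                var datePart = container.Name.Substring("expires-".Length);

                if (DateTime.TryParseExact(datePart, "yyyyMMdd",
                        CultureInfo.InvariantCulture, DateTimeStyles.None, out expiry)
                    && expiry.Date < DateTime.UtcNow.Date)
                {
                    container.Delete(); // removes the container and every blob in it
                }
            }
        }
    }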

Related

Notify users that a PDF is ready and can be downloaded

Users can create a PDF in my app, which takes some time to generate, so it has to be done in a background job. No problem, but then there is a delay and the user must be notified that the PDF is ready.
So the first choice is between sending an email with a download link and a push notification in the app itself. My preference is the push notification, so I guess ActionCable is the way to go? My app runs on Heroku, so is ActionCable still a good choice, or is another solution preferable?
Then there is another consideration: where to store the generated PDF until the user downloads it? I could upload it to Azure/S3/etc. with ActiveStorage, or I could store it temporarily in an app folder and delete it after download. My preference is the latter, because the PDF is only there for a few minutes, so the hassle of storing it in the cloud doesn't really seem needed?
You have a very broad question here, which is very much dependent on the overall user needs and experience you want them to have.
I'll start with the simplest part, in terms of temporary storage of the PDF. There are several things to bear in mind here.
I would say that, from a scalability and application-security standpoint, storing the PDF in the cloud is the way to go. Opening up writable directories on your application server carries a risk. Also, if you ever need to scale to more than one server, this will not work. Deleting items from cloud storage is not hard with the appropriate APIs.
Is it essential for the user to be authenticated in some way to download the PDF? This is more challenging if you push the PDF to a cloud bucket (unless you give the PDF a very complex, unguessable name, with that name only accessible through the authenticated application). If the data is less sensitive, then your email notification can show the link directly, but you won't easily know whether a user has retrieved the PDF and it is now ready to be deleted.
In terms of notification, I'd go with email for several reasons. Simplicity is the main one. Do you have experience with ActionCable? It appears simple on the surface, but there are many things to bear in mind when using it: infrastructure and UI being the major ones. Also, from a user-experience perspective, are users likely to hang around in the application waiting for the PDF to be completed? What happens if they log out? How will they know the PDF is available?
If the timescale for generating the PDF is short and absolutely optimized scalability is not a big deal, you could consider a simpler mechanism that checks for user notifications (a simple query against a user_notifications table, for example) on every user action, and use a flash or some other session flag that the UI can check and use to asynchronously retrieve the notification.
Just ideas. Impossible to give definitive answers.

Preventing Rails from connecting to the database during initialization

I am quite new to Ruby/Rails. I am building a service that makes an API available to users and ends up with some files created in the local filesystem, without any need to connect to a database. Then, once every few hours, I want to run a piece of Ruby code that takes these local files, uploads them to Amazon S3 and registers their location in a Postgres database.
Right now both pieces of code live together in the same project. I am observing that every time a user does something, the system connects to the database. I have seen this answer, which recommends eliminating all traces of ActiveRecord from my code, but given that I want my background bookkeeping process to connect to the database, I am stuck on what to do.
Is it possible to define two different profiles (one with a database and one without) and specify which profile a certain function call should run under? Would this work?
I'm a bit confused by this: the app does not magically connect to the database for kicks on every request; it does so because a specific request requires it, generally through ActiveRecord but not exclusively.
If your system is connecting every time you make a request, that implies you have some sort of user metric or authorisation-based code in there. Just killing off the database will cause this to fail, and you'll likely have to find it anyway to get your system to work, so I'd advise locating it.
Things to look for are before_filters in controllers or database-backed session management, for example. Also look at what is in the logs - the query should appear there - and that will tell you what is being loaded, modified or whatnot.
It might even work to stop your database, then perform a user activity and see where the error leads you. Rinse and repeat until the user activity works without the database.

Attaching/uploading files to a not-yet-saved Note - what is the best strategy for this?

In my application, I have a textarea input where users can type a note.
When they click Save, there is an AJAX call to Web Api that saves the note to the database.
I would like users to be able to attach multiple files to this note (Gmail style) before saving the Note. It would be nice if the upload could start as soon as a file is attached, before the note is saved.
What is the best strategy for this?
P.S. I can't use the jQuery Fine Uploader plugin or anything like that, because I need to give the files unique names on the server before uploading them to Azure.
Is what I'm trying to do possible, or do I have to make the whole 'Note' a normal form post instead of an API call?
Thanks!
This approach is file-based, but you can apply the same logic to Azure Blob Storage containers if you wish.
What I normally do is give the user a unique GUID when they GET the AddNote page. I create a folder called:
C:\TemporaryUploads\UNIQUE-USER-GUID\
Then any files the user uploads at this stage get assigned to this folder:
C:\TemporaryUploads\UNIQUE-USER-GUID\file1.txt
C:\TemporaryUploads\UNIQUE-USER-GUID\file2.txt
C:\TemporaryUploads\UNIQUE-USER-GUID\file3.txt
When the user does a POST and I have confirmed that all validation has passed, I simply copy the files to the completed folder, with the newly generated note ID:
C:\NodeUploads\Note-100001\file1.txt
Then I delete the C:\TemporaryUploads\UNIQUE-USER-GUID folder.
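A rough sketch of that promote-on-save step; the paths match the example above, while the method name and parameters are just illustrative.
    using System;
    using System.IO;

    public static class NoteUploads
    {
        public static void Promote(Guid userUploadGuid, int noteId)
        {
            var tempDir = Path.Combine(@"C:\TemporaryUploads", userUploadGuid.ToString());
            var finalDir = Path.Combine(@"C:\NodeUploads", "Note-" + noteId);

            if (!Directory.Exists(tempDir)) return;
            Directory.CreateDirectory(finalDir);

            // Copy every uploaded file across, then drop the temporary folder
            foreach (var file in Directory.GetFiles(tempDir))
                File.Copy(file, Path.Combine(finalDir, Path.GetFileName(file)), overwrite: true);

            Directory.Delete(tempDir, recursive: true);
        }
    }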
Cleaning Up
Now, that's all well and good for users who actually go ahead and save a note, but what about the ones who upload a file and then close the browser? There are two options at this stage:
Have a background service clean up these files on a scheduled basis - daily, weekly, etc. This is a good job for Azure WebJobs (a rough sketch follows this list).
Clean up the old files via the web app each time a new note is saved. Not a great approach as you're doing File IO when there are potentially no files to delete
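For the first option, the cleanup job itself could be as simple as something like this; the 24-hour threshold is an assumption.
    using System;
    using System.IO;

    public static class TemporaryUploadCleanup
    {
        // Intended to be run on a schedule, e.g. by an Azure WebJob
        public static void Run()
        {
            var cutoff = DateTime.UtcNow.AddHours(-24);

            foreach (var dir in Directory.GetDirectories(@"C:\TemporaryUploads"))
            {
                // Folders untouched for longer than the threshold are assumed abandoned
                if (Directory.GetLastWriteTimeUtc(dir) < cutoff)
                    Directory.Delete(dir, recursive: true);
            }
        }
    }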
Building on RGraham's answer, here's another approach you could take:
Create a blob container for storing note attachments. Let's call it note-attachments.
When the user comes to the screen of creating a note, assign a GUID to the note.
When the user uploads a file, you just prefix the file name with this note ID. So if a user uploads a file called file1.txt, it gets saved into blob storage as note-attachments/{note id}/file1.txt.
Depending on your requirements, once you save the note you may move this blob to another blob container or keep it where it is. Since the blob has the note ID in its name, searching for a note's attachments is easy.
For uploading files, I would recommend doing it directly from the browser to blob storage, making use of AJAX, CORS and a Shared Access Signature. This way you avoid data going through your servers; a sketch of issuing such a signature follows the links below. You may find these blog posts useful:
Revisiting Windows Azure Shared Access Signature
Windows Azure Storage and Cross-Origin Resource Sharing (CORS) – Lets Have Some Fun
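To give a feel for the Shared Access Signature part, here's a minimal server-side sketch using the classic WindowsAzure.Storage SDK; the method name and the 15-minute expiry are illustrative assumptions.
    using System;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    public static class AttachmentSas
    {
        // Returns a URL the browser can PUT the file to directly via AJAX
        public static string GetUploadUrl(string connectionString, string noteId, string fileName)
        {
            var blob = CloudStorageAccount.Parse(connectionString)
                .CreateCloudBlobClient()
                .GetContainerReference("note-attachments")
                .GetBlockBlobReference(noteId + "/" + fileName);

            var sas = blob.GetSharedAccessSignature(new SharedAccessBlobPolicy
            {
                Permissions = SharedAccessBlobPermissions.Write,
                SharedAccessExpiryTime = DateTimeOffset.UtcNow.AddMinutes(15) // assumed window
            });

            return blob.Uri.AbsoluteUri + sas;
        }
    }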

ASP.NET MVC - how would I maintain user state in Azure in my application?

I know that there are a few questions like this, but this one is more about this specific situation.
I'm developing a platform for taking tests online. A test is a set of images and accompanying questions. It's being hosted on Azure and uses MVC 4.
What I would love is that if the user has taken half the test and the browser crashes, or something else makes them leave the test, they get the option to resume when they come back.
I have one idea myself, but would like to know if there are other options. I was considering using localStorage: when a user starts a test, the information for the test is saved in localStorage, and every time they move on to a new image the local state is updated. Then, when the test player is loaded, it checks whether any ongoing tests are available.
How could I do it? Has anyone dealt with a similar problem/solution?
Local storage is not a good choice, because it is specific to each instance. That means if you have two instances of a Web Role (the recommended minimum), each instance would have its own local storage. They are not shared, and there is no way to access local storage on a specific machine.
You really have two options: a database like SQL Azure, or Azure caching. Azure caching is probably easier, since it's super easy to serialize/deserialize complex objects, but the downside is that the cache is only valid for 72 hours. If a cached object isn't accessed/updated within 72 hours, it gets purged.
I would not recommend storing this information in the client browser. The user has access to local storage, cookies, etc. and could modify it. You could store the test start time in your database on the server; then, every time the user sends a request to answer a question, you verify whether the test is still active or the maximum allowed time has elapsed.
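A hedged sketch of that server-side check in an MVC action; the TestRepository, its TestProgress record and the 60-minute limit are all assumptions for illustration, not anything from the question.
    using System;
    using System.Net;
    using System.Web.Mvc;

    public class TestController : Controller
    {
        [HttpPost]
        public ActionResult Answer(int testId, int questionIndex, string answer)
        {
            // Hypothetical repository call: progress (including the start time)
            // was stored on the server when the user began the test.
            var progress = TestRepository.GetProgress(testId, User.Identity.Name);
            if (progress == null) return HttpNotFound();

            // Reject answers once the maximum allowed time has elapsed
            if (DateTime.UtcNow - progress.StartedOnUtc > TimeSpan.FromMinutes(60))
                return new HttpStatusCodeResult(HttpStatusCode.Forbidden, "Test time has elapsed");

            progress.CurrentQuestionIndex = questionIndex; // the resume point
            TestRepository.SaveAnswer(progress, questionIndex, answer);

            return Json(new { resumeFrom = progress.CurrentQuestionIndex });
        }
    }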

Searching for a song using multiple APIs

I'm going to attempt to create an open project which compares the most common MP3 download providers.
This will require a user to enter a track/album/artist name, e.g. Deadmau5; this will then pull the relevant prices from the APIs.
I have a few questions that some of you may have encountered before:
Should I have one server-side page that requests all the data, so it is all loaded simultaneously? If so, how would you deal with timeouts or any other problems that may arise? Or should the page load first and each price then get pulled in one by one (AJAX)? What are your experiences when running a comparison check?
The main feature will be to compare prices, but how can I be sure that the products are the same? I was thinking of using running time and track numbers, but I would still have to set one source as my primary.
I'm making this a wiki; please add and edit any issues that you can think of.
Thanks for your help. Look out for a future blog!
I would check Amazon first: they will give you a SKU (the barcode on the back of the album; I think Amazon calls it an EAN). If the other providers use this, you can make sure they are looking at the same item.
I would cache all results in a database and expire them after a reasonable time. This way, when you get 100 requests for Britney Spears, you don't have to hammer the other sites and slow down your application.
You should also make sure you are multithreading whatever requests you are doing server side. cURL, for instance, allows you to pull multiple URLs and assign a user-defined callback. I'd have the callback send back some data so you can update your page as the results come in: the callback returns some data for each URL while the connection is open, and you parse it on the client side.
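In case the server side happens to be .NET rather than cURL-based, here's a rough sketch of the same fan-out idea; the provider endpoints and the 10-second timeout are placeholders.
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Net.Http;
    using System.Threading.Tasks;

    public static class PriceLookup
    {
        public static async Task<Dictionary<string, string>> FetchAllAsync(string query)
        {
            var providers = new Dictionary<string, string>
            {
                { "providerA", "https://example.com/a/search?q=" }, // placeholder endpoints
                { "providerB", "https://example.com/b/search?q=" }
            };

            using (var http = new HttpClient { Timeout = TimeSpan.FromSeconds(10) })
            {
                // Start every provider request at once rather than one after the other
                var tasks = providers.ToDictionary(
                    p => p.Key,
                    p => http.GetStringAsync(p.Value + Uri.EscapeDataString(query)));

                try { await Task.WhenAll(tasks.Values); }
                catch { /* a slow or failed provider only loses its own result */ }

                // Keep only the lookups that completed within the timeout
                return tasks
                    .Where(t => t.Value.Status == TaskStatus.RanToCompletion)
                    .ToDictionary(t => t.Key, t => t.Value.Result);
            }
        }
    }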
