Send large data with Rails

I need to send customers the raw source of emails from my Rails app.
When they click a link, a new page must open where they can see the source code of an email. In many cases the emails are really big (even 40-50 MB), and it takes the server a long time to send them.
E.g. I have an email with 3 attachments and a total size of 30 MB. My controller method takes 700 ms to process it and retrieve the raw source from the IMAP server, but in the browser it takes up to 5 seconds (2.5 to the first byte, 2.5 to download it).
Right now I just send the string with the render method. Is there a better way? Where am I losing all that time?
To be more clear:
By 'send' I mean the server sending the source code to the browser so the user can view it.

What about storing attachments on your server and including only links to them in the email? That way your emails will be blazingly fast and clients can download the attachments separately. If you don't want to store files on your own server, you can use Amazon S3 or some other cloud storage (there are many these days).
To ease file uploading, I'd recommend the CarrierWave library; Amazon S3 integration comes out of the box.
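A minimal sketch of what that could look like with CarrierWave backed by S3 through the fog-aws gem; the uploader name, store_dir layout, and environment variables are illustrative:

# config/initializers/carrierwave.rb -- S3 credentials via fog-aws (values are placeholders)
CarrierWave.configure do |config|
  config.fog_provider    = "fog/aws"
  config.fog_credentials = {
    provider:              "AWS",
    aws_access_key_id:     ENV["AWS_ACCESS_KEY_ID"],
    aws_secret_access_key: ENV["AWS_SECRET_ACCESS_KEY"],
    region:                ENV["AWS_REGION"]
  }
  config.fog_directory = ENV["S3_BUCKET"]
end

# app/uploaders/attachment_uploader.rb
class AttachmentUploader < CarrierWave::Uploader::Base
  storage :fog                # store files on S3 instead of the local filesystem

  def store_dir
    "attachments/#{model.id}" # one folder per record
  end
end

The email body then only needs to include attachment.file.url (a plain S3 link) instead of the attachment itself.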

I would suggest using Delayed Job: https://github.com/collectiveidea/delayed_job or Sidekiq: https://github.com/mperham/sidekiq
While the job is being processed you can mark that email as "Sending...", and once the background job has completed you can mark it as "Sent".
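For instance, a rough Sidekiq worker along these lines could fetch the raw source in the background and flip the status flag; the Email model, its columns, and the ImapClient wrapper are assumptions:

# app/workers/fetch_raw_email_worker.rb
class FetchRawEmailWorker
  include Sidekiq::Worker

  def perform(email_id)
    email = Email.find(email_id)
    email.update(status: "sending")           # what the user sees while the job runs
    raw = ImapClient.new.fetch_raw(email.uid) # hypothetical wrapper around Net::IMAP
    email.update(raw_source: raw, status: "sent")
  end
end

# Enqueued from the controller with:
# FetchRawEmailWorker.perform_async(email.id)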
Hope this helps

I think you could benefit from looking into solutions for HTTP streaming, which would allow you to start sending data while still processing the request.
This is difficult with Rack based servers, so Rails might have some issues with this approach.
Another approach is to try and split the raw source into chunks and request each chunk using Ajax.
This will allow your app to be more responsive and offer a better user experience. This is also known as a perceived performance approach, since the user experiences the app as more responsive even if it takes the same amount of time to load.
If I were trying to resolve the issue, I would look into an Ajax solution that would allow me to leverage IMAP's partial fetch feature, referenced in its RFC.
It's possible to write a simple server-side API that fetches part of an email, or returns a signal when there is nothing more to download, and then use JavaScript to request data from the server until the 'no more data' value is received.
This would allow you to display the downloaded data as it's being received.
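A minimal sketch of that server side, assuming Net::IMAP's partial FETCH syntax (BODY[]<offset.size> from RFC 3501); the controller, route, and credential handling are all illustrative:

require "net/imap"

class RawEmailsController < ApplicationController
  CHUNK_SIZE = 256 * 1024 # 256 KB per Ajax request

  # GET /raw_emails/:id/chunk?offset=0
  def chunk
    offset = params.fetch(:offset, 0).to_i
    data   = fetch_raw_chunk(params[:id], offset, CHUNK_SIZE)

    if data.nil? || data.empty?
      head :no_content   # the JS loop stops polling on 204
    else
      render plain: data # the JS loop appends this to the viewer and requests the next offset
    end
  end

  private

  def fetch_raw_chunk(uid, offset, size)
    imap = Net::IMAP.new(ENV["IMAP_HOST"], ssl: true)
    imap.login(ENV["IMAP_USER"], ENV["IMAP_PASSWORD"])
    imap.select("INBOX")
    fetched = imap.uid_fetch(uid.to_i, "BODY.PEEK[]<#{offset}.#{size}>")&.first
    # The slice comes back under a BODY[]<offset> attribute key; grab it without
    # relying on the exact key the server echoes back.
    fetched && fetched.attr.find { |key, _| key.start_with?("BODY[]") }&.last
  ensure
    imap&.logout
    imap&.disconnect
  end
end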

Related

How to dynamically and efficiently pull information from database (notifications) in Rails

I am working on a Rails application, and below is a scenario requiring a solution.
I'm running some time-consuming processes in the background using Sidekiq and saving the related information in the database. When each of these processes completes, we would like to show a notification in a separate area saying that the process has finished.
So the notification area (which will be available on every page) really needs to pull things from the back-end and show them dynamically. I thought Ajax must be an option, but I don't know how to trigger it for a particular area only. Or is there any other option by which the client can fetch dynamic content from the server efficiently without creating much traffic?
I know this is a broad topic, but any relevant info would be greatly appreciated. Thanks :)
You're looking at a perpetual connection (using either SSEs or WebSockets), something Rails has started to address with ActionController::Live
Live
You're looking for "live" connectivity:
"Live" functionality works by keeping a connection open between your app and the server. Rails is an HTTP request-based framework, meaning it only sends responses to requests. The way to send live data is to keep the response open (using a perpetual connection), which allows you to send updated data to your page on its own timescale.
The way to do this is to use a front-end method to keep the connection "live" and a back-end stack to serve the updates. The front-end will need either SSEs or a WebSocket, which you'll connect to with JS.
SSEs and WebSockets basically give you access to the server outside the scope of "normal" requests (they use the text/event-stream content/MIME type).
Recommendation
We use a service called Pusher.
This basically provides a third-party WebSocket service to which you can push updates. Once the service receives an update, it sends it to any channels connected to it. You can split the channels it broadcasts to using the pub/sub pattern.
I'd recommend using this service directly (they have a Rails gem, and I'm not affiliated with them); it also provides a super simple API.
Other than that, you should look at the ActionController::Live functionality of Rails
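For the ActionController::Live route, a bare-bones SSE endpoint might look roughly like this; the Notification model, the read flag, and the naive sleep loop are assumptions (a real app would use Redis pub/sub rather than polling inside the stream):

# app/controllers/notifications_controller.rb
class NotificationsController < ApplicationController
  include ActionController::Live

  def stream
    response.headers["Content-Type"] = "text/event-stream"
    sse = ActionController::Live::SSE.new(response.stream, event: "notification")

    10.times do
      sse.write({ unread: Notification.where(read: false).count })
      sleep 3   # crude: check a few times, then let the browser's EventSource reconnect
    end
  rescue ActionController::Live::ClientDisconnected
    # the browser closed the connection; nothing to clean up beyond the stream
  ensure
    sse&.close
  end
end

On the client, new EventSource("/notifications/stream") with a listener for the "notification" event keeps the page area updated.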
The answer suggested in the comment by #h0lyalg0rithm is one option to go with.
However, more primitive options are:
Use setInterval in JavaScript to perform a task every x seconds, i.e. polling.
Use jQuery or native ajax to poll for information to a controller/action via route and have the controller push data as JSON.
Use document.getElementById or jQuery to update data on the page.
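As a sketch of the polling variant, the controller side can be as small as this; the Notification model, the read column, and the route are assumptions:

# app/controllers/notifications_controller.rb -- hit every few seconds from setInterval
class NotificationsController < ApplicationController
  def poll
    notifications = current_user.notifications.where(read: false)
    render json: notifications.as_json(only: [:id, :message, :created_at])
  end
end

# Client side, roughly:
#   setInterval(function () {
#     $.getJSON("/notifications/poll", updateNotificationArea);
#   }, 5000);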

Writing a spec for sending a mail via API

I want an e-mail to be sent by a background process whenever an Invite is generated.
What I currently have is this approach: the Invite model has the method send_mail, which sends an e-mail using the Mandrill API and gem. It also has the method queue_mail, which adds InviteMailer with the invite's ID to the queue using Resque.
However… since I'm having a really hard time writing specs for this, I assume it might not be the best approach to sending mails.
What I mainly want and need to test:
was the mail added to the queue?
is InviteMailer working properly?
does the mail contain the correct vital information?
The vital information is: it is sent to the correct person, and it contains a link to a specific site plus some specific data/text. I'm also not sure how to get the current host into the link.
I don't think this is a rare thing to do, so I wonder what the best practices are.
My testing environment: RSpec, Capybara, Factory Girl. I have already added VCR to cache the API requests.
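For the "was the mail added to the queue?" part, a minimal RSpec sketch could stub Resque directly, assuming queue_mail calls Resque.enqueue; the factory and method names are illustrative:

# spec/models/invite_spec.rb
require "rails_helper"

RSpec.describe Invite do
  describe "#queue_mail" do
    it "enqueues the invite mailer job with the invite's ID" do
      invite = create(:invite)
      expect(Resque).to receive(:enqueue).with(InviteMailer, invite.id)
      invite.queue_mail
    end
  end
end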
You can use Mailcatcher to fake your mail server, and check received mail via web API:
Features
Catches all mail and stores it for display.
Shows HTML, Plain Text and Source version of messages, as applicable.
Rewrites HTML enabling display of embedded, inline images/etc and open links in a new window. (currently very basic)
Can send HTML for analysis by Fractal.
Lists attachments and allows separate downloading of parts.
Download original email to view in your native mail client(s).
Command line options to override the default SMTP/HTTP IP and port settings.
Mail appears instantly if your browser supports WebSockets, otherwise updates every thirty seconds.
Growl notifications when you receive a new message.
Runs as a daemon in the background.
Sendmail-analogue command, catchmail, makes using mailcatcher from PHP a lot easier.
Written super-simply in EventMachine, easy to dig in and change.
How
gem install mailcatcher
mailcatcher
Go to http://localhost:1080/
Send mail through smtp://localhost:1025
API
A fairly RESTful URL schema means you can download a list of messages in JSON from /messages, each message's metadata with /messages/:id.json, and then the pertinent parts with /messages/:id.html and /messages/:id.plain for the default HTML and plain text versions, /messages/:id/:cid for individual attachments by CID, or the whole message with /messages/:id.source.
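As a sketch, a spec can then hit that JSON API to assert on the delivered mail; this assumes the test environment delivers over SMTP to localhost:1025 while MailCatcher is running, and that the factory and field names below exist:

# spec/features/invite_mail_spec.rb
require "rails_helper"
require "net/http"
require "json"

RSpec.describe "invite mail" do
  def last_caught_message
    body = Net::HTTP.get(URI("http://localhost:1080/messages"))
    JSON.parse(body).last
  end

  it "is delivered to the invitee" do
    invite = create(:invite, email: "guest@example.com")
    invite.send_mail
    message = last_caught_message
    # MailCatcher lists recipients in angle-bracket form, e.g. "<guest@example.com>"
    expect(message["recipients"].join).to include("guest@example.com")
  end
end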

Posting multipart form data with Ruby

I'm building a Rails app that interacts with a 3rd-party API.
When a user uploads a file to rails, it should be forwarded on to the 3rd party site via an HTTP POST.
In some cases, the upload can be several hundred MBs.
At the moment, I've just been re-posting to the API using Net::HTTP, accessing the multipart form object like so:
@tempfile = params[:video][:file_upload].tempfile
This is hella slow though and feels kinda dirty.
Is there a better way to do this?
Is it possible to have the user post directly to the 3rd party service or do you have to handle the API through your Rails stack? Ideally you would be able to do this and would not have to load the file into your stack and then re-post it to the API. If you can't post directly, I would recommend seeing if the API has a streaming service so that you can send parts of the file instead of the entire thing at once. Either way I think you'll start running into Timeout errors on your side and on the API side with large files, so you'll have to increase your own timeouts or create a different type of streaming file uploader.
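If the file does have to pass through your Rails stack, one way to avoid building the whole request body in memory is Net::HTTP's body_stream, which sends the tempfile in chunks; the endpoint, content type, and timeout below are placeholders:

require "net/http"

def forward_upload(tempfile)
  uri = URI("https://api.example.com/videos")    # hypothetical third-party endpoint

  request = Net::HTTP::Post.new(uri)
  request["Content-Type"]   = "application/octet-stream"
  request["Content-Length"] = tempfile.size.to_s # required when streaming a raw body
  request.body_stream = tempfile                 # Net::HTTP reads and sends it in chunks

  Net::HTTP.start(uri.host, uri.port, use_ssl: true, read_timeout: 600) do |http|
    http.request(request)
  end
end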
Spin up a background job using DelayedJob. In the delayed job, you could try rails redirect_to.
https://github.com/tobi/delayed_job
http://apidock.com/rails/ActionController/Base/redirect_to

How to send many emails via ASP.NET without delaying response

Following a specific action the user takes on my website, a number of messages must be sent to different emails. Is it possible to have a separate thread or worker take care of sending multiple emails so as to avoid having the response from the server take a while to return if there are a lot of emails to send?
I would like to avoid using system processes, scheduled tasks, or email queues.
You can definitely spawn off a background thread in your controller to handle the emails asynchronously.
I know you want to avoid queues, but another thing I have done in the past is write a Windows service that pulls email from a DB queue and processes it at certain intervals. This way you can separate the two applications if there is a lot of email to be sent.
This can be done in many different ways, depending on how large your application is and what kind of reliability you want. Any of these ways should help you achieve what you want (in ascending order based on complexity):
If you're using IIS SMTP Server or another mail server that supports a pickup directory option, you can go with that. With this option, instead of sending the emails directly, they are saved first in the pickup directory. Your call will immediately return after the email is saved in the pickup directory, so the user won't have to wait until the email is sent. On the other hand, the server will try to send the email as soon as it's saved in the pickup directory so it's almost immediate (just without blocking the call).
You can use a background thread like described in other answers. You'll need to be careful with this option as the thread can end unexpectedly before it finishes its job. You'll need to add some code to make sure this works reliably (personally, I'd prefer not to use this option).
Using a messaging queue server like MSMQ. This is more work and you probably should only look into this if you have a large scale application or have good reasons not to use the first option with the pickup directory.
There are a few ways you could do this.
You could store enough details about the message in the database, and write a windows service to loop through them and send the email. When the user submits the form it just inserts the required data about the message and trusts the service will pick it up. Almost an email queue which you said you didn't want, but you're going to end up in a queue situation with almost any solution.
Another option would be to drop in NServiceBus. Use that for these kinds of tasks.
I typically compile the message body and store that in a table in the db along with the from and to addresses, a subject, and a timestamp indicating when the email was sent. Then I have a background task check the table periodically and pull any that haven't been sent. This task attempts to send each email and updates the timestamp accordingly. One advantage of storing the compiled message body up front is that the background task doesn't have to do any processing of context-specific data, and therefore can be pretty darn simple.
Whenever an operation like this is contingent upon an event, there is always the possibility that something will go wrong.
In ASP.NET you can spawn multiple threads and have those threads do the work. Make sure you tell the thread it's a background thread; otherwise ASP.NET might wait for the thread to finish before rendering your page:
myThread.IsBackground = true;
I know you said you didn't want to use system processes or scheduled tasks, but a Windows service would be a viable approach to this as well. The approach would be to use MSMQ, or to save the actions needing to be done in a database table, and then have a Windows service check every minute or so and perform those actions.
This way, if something fails (e.g. the email server is down), those emails/actions can still be done later.
They will also be recorded for audits (which is very nice to have).
This method allows your web site to function as a website while offloading these tasks to another service. The last thing you need is multiple ASP.NET processes tied up waiting for emails to send. Let something else handle that.

Why would you upload assets directly to S3?

I have seen quite a few code samples/plugins that promote uploading assets directly to S3. For example, if you have a user object with an avatar, the file upload field would load directly to S3.
The only way I see this being possible is if the user object is already created in the database and your S3 bucket + path is something like
user_avatars.domain.com/some/id/partition/medium.jpg
But then if you had an image tag that tried to access that URL when an avatar was not uploaded, it would yield a bad result. How would you handle checking for existence?
Also, it seems like this would not work well for most has_many associations. For example, if a user had many songs/mp3s, where would you store those and how would you access them?
Also, your validations will be shot.
I am having trouble thinking of situations where direct upload to S3 (or any cloud) is a good idea and was hoping people could clarify either proper use cases, or tell me why my logic is incorrect.
Why pay for storage/bandwidth/backups/etc. when you can have somebody in the cloud handle it for you?
S3 (and other Cloud-based storage options) handle all the headaches for you. You get all the storage you need, a good distribution network (almost definitely better than you'd have on your own unless you're paying for a premium CDN), and backups.
Allowing users to upload directly to S3 takes even more of the bandwidth load off of you. I can see the tracking concerns, but S3 makes it pretty easy to handle that situation. If you look at the direct upload methods, you'll see that you can force a redirect on a successful upload.
Amazon will then pass the following to the redirect handler: bucket, key, etag
That should give you what you need to track the uploaded asset after success. Direct uploads give you the best of both worlds. You get your tracking information and it unloads your bandwidth.
Check this link for details: Amazon S3: Browser-Based Uploads using POST
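For reference, a rough sketch of generating such a browser-upload form server-side with the aws-sdk-s3 gem's presigned_post; the bucket name, key prefix, redirect URL, and size cap are placeholders:

require "aws-sdk-s3"

bucket = Aws::S3::Resource.new(region: "us-east-1").bucket("user-avatars-example")

post = bucket.presigned_post(
  key: "uploads/${filename}",                                  # S3 substitutes the uploaded filename
  success_action_redirect: "https://example.com/uploads/done", # receives bucket, key and etag
  content_length_range: 0..(50 * 1024 * 1024)                  # cap uploads at 50 MB
)

# post.url is the form's action attribute; post.fields are the hidden inputs (policy,
# signature, etc.) the browser submits straight to S3, so the file never touches your app server.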
If you are hosting your Rails application on Heroku, the reason could very well be that Heroku doesn't allow file-uploads larger than 4MB:
http://docs.heroku.com/s3#direct-upload
So if you would like your users to be able to upload large files, this is the only way forward.
Remember how web servers work.
Unless you're using a sort of async web setup like you could achieve with Node.JS or Erlang (just 2 examples), then every upload request your web application serves ties up an entire process or thread while the file is being uploaded.
Imagine that you're uploading a file that's several megabytes large. Most internet users don't have tremendously fast uplinks, so your web server spends a lot of time doing nothing. While it's doing all of that nothing, it can't service any other requests. Which means your users start to get long delays and/or error responses from the server. Which means they start using some other website to get the same thing done. You can always have more processes and threads running, but each of those costs additional memory which eventually means additional $.
By uploading straight to S3, in addition to the bandwidth savings that Justin Niessner mentioned and the Heroku workaround that Thomas Watson mentioned, you let Amazon worry about that problem. You can have a single-process webserver effectively handle very large uploads, since it punts that actual functionality over to Amazon.
So yeah, it's more complicated to set up, and you have to handle the callbacks to track things, but if you deal with anything other than really small files (and even in those cases), why cost yourself more money?
Edit: fixing typos
