I am building a website with Rails on AWS and I am trying to determine the best ways to stress-test it while also getting some rough idea of the cost I will be paying per user. I have looked at tools like Selenium and I am curious if I could do something similar with Postman.
My objectives are:
Observe what kind of load the server would be under during the test, and how CPU and memory are affected.
See how the generated load would affect the CPU cycles on the system, which is what generates cost to me from AWS.
Through Postman I can easily generate REST calls to my Rails server and simulate user interaction. If I created some kind of multithreaded application that would make many such calls to the server, would that be an efficient way to measure these objectives?
If not, is there a tool that would help me with either (or both) of these objectives?
thanks,
You can use BlazeMeter to do the load test.
This AWS blog post shows you how you can do it.
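If you want to try the do-it-yourself route from the question first, a plain Ruby script with threads and Net::HTTP can generate a rough first approximation of load while you watch CPU and memory in CloudWatch. A minimal sketch, where the endpoint, thread count, and request count are all placeholder assumptions:

```ruby
# Minimal multithreaded load generator. The endpoint, thread count, and
# request count below are placeholders; adjust them to your app.
require "net/http"
require "uri"

TARGET     = URI("https://staging.example.com/api/widgets") # placeholder endpoint
THREADS    = 10
CALLS_EACH = 100

latencies = Queue.new

workers = THREADS.times.map do
  Thread.new do
    CALLS_EACH.times do
      started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      Net::HTTP.get_response(TARGET) # simple GET; POSTs with JSON bodies work the same way
      latencies << Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    end
  end
end
workers.each(&:join)

samples = Array.new(latencies.size) { latencies.pop }
puts "requests: #{samples.size}, mean latency: #{(samples.sum / samples.size * 1000).round(1)} ms"
```

A single client machine will hit its own network and CPU limits well before a real user base would, which is exactly why hosted tools like BlazeMeter exist, but this is often enough to see how your instance's CloudWatch metrics respond to a given request rate.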
Are there any recommended load-test tools or services that are able to cycle through AWS Application Load Balancer logs stored in S3, preferably using the timestamps to perform piano-roll-type replay functionality?
aws-log-replay seems to be something you're looking for; it can replay requests with a defined concurrency.
As for more or less popular load testing tools, I can only think of Apache JMeter with the Access Log Sampler, which supports access log files from Tomcat, WebLogic, Resin, and SunONE out of the box; however, you can come up with your own implementation of the Generator class, or dynamically populate HTTP Request sampler fields using a JSR223 PreProcessor, as described in the "Stop Making Assumptions! Learn How to Replay Your Production Traffic With JMeter" guide.
Actually, I don't think you will be able to produce realistic load by replaying your access logs. It might work for something simple like static content; however, if your application involves authentication, sessions, complex workflows, etc., I'm afraid your "replay" attempt will get stuck at the login page.
So instead of trying to replay complex scenarios from the logs, I would suggest sticking to the load testing tool of your choice and creating the scenarios from scratch. Access logs can be used to identify workload distribution (like X% of users are normally doing this, Y% are doing that, etc.) and anticipated concurrency (like at X time we had Y online users).
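As a rough illustration of that last point, here is a sketch of pulling workload distribution out of ALB access logs with plain Ruby. It assumes the log files have already been downloaded from S3 and gunzipped locally, and that the standard ALB access-log layout applies (timestamp in the second field, the request line in the first quoted field):

```ruby
# Sketch: summarise workload distribution from ALB access logs.
# Assumes the .log files were downloaded from S3 and gunzipped into ./alb-logs,
# and follow the documented ALB layout (timestamp second, request in quotes).
require "time"
require "uri"

per_minute = Hash.new(0)
per_path   = Hash.new(0)

Dir.glob("alb-logs/*.log").each do |file|
  File.foreach(file) do |line|
    fields    = line.split(" ")
    timestamp = Time.parse(fields[1]) rescue next
    match     = line.match(/"([A-Z]+) (\S+) HTTP/) or next
    verb, url = match.captures
    path      = (URI(url).path rescue url)

    per_minute[timestamp.strftime("%Y-%m-%d %H:%M")] += 1
    per_path["#{verb} #{path}"] += 1
  end
end

puts "Peak requests per minute: #{per_minute.values.max}"
per_path.sort_by { |_, count| -count }.first(10).each { |req, count| puts "#{count}\t#{req}" }
```

The per-minute peaks and the top request mix are exactly the numbers you need to parameterise thread groups in whichever load testing tool you end up choosing.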
I created an application that uses Watir to automate logging in and performing a couple of functions within a site.
Right now it's written purely in Ruby classes that I was just executing in irb, but I want to put it into a Rails application and put it online. I haven't been able to find much information about using something like Capybara or Watir for anything other than testing. Is this because of how slow they are, or is it a capabilities issue?
Would I be able to run a background process that opens a browser with Watir and performs a few functions for each user in production?
Another question I have is how to keep the session over a longer period of time. There are two sites that require 2FA that my app logs into. If I wanted to log in and perform a function once an hour with a Watir browser, I could create it as a background process (if that works). But when the process is done the browser closes and when the background process runs again in an hour it requires 2FA again.
My other worry is speed. If I have 50 users that all need to run a Watir browser at the same time, I imagine that will be slow. I am not as worried about speed as long as they run, collect the data, and perform the few actions we need; I am more worried about how it will affect the application's integrity.
WATIR is specifically designed as a testing tool. WATIR stands for Web Application Testing In Ruby. Its design is centered around interacting with a browser the way a user would, effectively simulating the same actions a user would take when using a site. That it would be sub-optimal for other tasks is to be expected. Because scraping and testing involve very similar activities, there are a number of people who use Watir for that task, but it is not designed for that purpose, and it is unlikely that the WATIR developers would ever add features specific to data scraping versus testing.
With the things you are contemplating, you should ask yourself if you are doing the equivalent of using a socket wrench as a hammer, and if there might be a better tool you could use.
If the sites you are interacting with support an API, then that would be the preferred way to interact with them to get information from the site. If that is not supported, you may want to consider looking at other gems that would let you request the site's HTML or parse the HTML directly (e.g. Nokogiri).
You should also inspect the terms of service for the sites you are interacting with (if you don't own them) to ensure that there are not prohibitions against using 'robots' or other automated means to access the site. If so, then using Watir in the way you propose may end up getting you banned from access to the site if the pattern of your access is obviously the result of an automated process.
This is actually done more often than people think. Maybe not specifically with Watir, but running a browser automation task in a job. The jobs should be queued and run asynchronously, preferably in a different process than your main web app.
I wrote about this strategy here: https://blogstephenarifin.wordpress.com/2018/08/23/integrating-a-third-party-without-an-api-using-rails-5-and-heroku/
If you find yourself having to use Watir, then the best way is to use it to render the page (for example in headless mode, so JavaScript runs), save the result, and then use Nokogiri to process it. APIs are suggested a lot by people who don't scrape and can't see a use for scraping, but at times it is necessary and perfectly legitimate (you may even be scraping your own data). APIs are not a universal option.
Second, you should probably relegate its use to a background job. If you have end users (and you really shouldn't have many simultaneous users), note that many services inform customers that data will be available in a few hours to a few days.
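A minimal sketch of that combination, rendering with headless Watir inside a queued job and handing the HTML to Nokogiri; the URL, selectors, and model are all placeholders, and the exact way to pass the headless flag varies a little between Watir versions:

```ruby
# Sketch: render with headless Watir inside a queued job, then parse with Nokogiri.
# The URL, selectors, and Report model are placeholders for whatever site/data you automate.
require "watir"
require "nokogiri"

class ScrapeDashboardJob < ApplicationJob
  queue_as :scraping

  def perform(account_id)
    browser = Watir::Browser.new(:chrome, options: { args: ["--headless", "--disable-gpu"] })
    browser.goto("https://example.com/dashboard")            # placeholder URL
    browser.wait_until { |b| b.div(id: "report").present? }  # let the JavaScript finish rendering

    doc  = Nokogiri::HTML(browser.html)                      # hand the rendered page to Nokogiri
    rows = doc.css("#report table tr").map { |tr| tr.css("td").map(&:text) }

    Report.create!(account_id: account_id, rows: rows)       # placeholder persistence
  ensure
    browser&.close
  end
end
```

Because each job spins up its own browser, keep the queue's concurrency low so fifty simultaneous users translate into a short backlog of jobs rather than fifty Chrome processes on one box.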
I currently have an API for one of my projects and a service that is responsible for generating export files as CSVs, archiving them, and storing them somewhere in the cloud.
Since my API is written in Rails and my service in plain Ruby, I use the Her gem in the service to interact with the API. But I find my current implementation not very performant, since I do a Model.all in my service, which in turn triggers a request that may contain way too many objects in the response.
I am curious on how to improve this whole task. Here's what I've thought of:
implement pagination at API level and call Model.where(page: xxx) from my service;
generate the actual CSV at API level and send the CSV back to the service (this may be done sync or async).
If I were to use the first approach, how many objects should I retrieve per page? How big should a response be?
If I were to use the second approach, this would bring quite an overhead to the request (and I guess API requests shouldn't take that long) and I also wonder whether it's really the API's job to do this.
What approach should I follow? Or, is there something better that I'm missing?
You need to pass a lot of information through a Ruby process; that's never simple, and I don't think you're missing anything here.
If you decide to generate CSVs at the API level, then what do you gain by maintaining the service? You could just ditch the service altogether, because replacing it with an nginx proxy would do the same thing better (if you're just streaming the response from the API host).
If you decide to paginate, there will be a performance reduction for sure, but nobody can tell you exactly how much you should paginate. Bigger pages will be faster and consume more memory (reducing throughput by being able to run fewer workers); smaller pages will be slower and consume less memory but demand more workers because of IO wait times. The exact numbers will depend on the IO response times of your API app, the cloud, and your infrastructure. I'm afraid no one can give you a simple answer you can follow without experimenting with a stress test, and once you set up a stress test, you will get a number of your own anyway - better than anybody's estimate.
A suggestion: write a bit more about your problem, the constraints you are working under, etc., and maybe someone can help you with a more radical solution. For some reason I get the feeling that what you're really looking for is a background processor like Sidekiq or Delayed Job, or maybe connecting your service to the DB directly through a DB view if you are anxious to keep your apps decoupled, or an nginx proxy for API responses, or nothing at all... but I really can't tell without more information.
I think it really depends on how you want to define 'performance' and what your goal for your API is. If you want to make sure no request to your API takes longer than 20 ms to respond, then adding pagination would be a reasonable approach, especially if the CSV generation is just an edge case and the API is really built for other services. The number of items per page would then be limited by the speed at which you can deliver them. Your service would not become particularly more performant (arguably even less so), since it needs to call the API multiple times.
Creating an async call (maybe with a webhook as callback) would be worth adding to your API if you think it is a valid use case for services to dump the whole record set.
Having said that, I think strictly speaking it is the job of the API to be quick and responsive. So maybe try to figure out how caching can improve response times, so paging through all the records is reasonable. On the other hand it is the job of the service to be mindful of the amount of calls to the API, so maybe store old records locally and only poll for updates instead of dumping the whole set of records each time.
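To make the pagination option concrete, here is a sketch of pulling records page by page through Her and streaming them into a CSV. It assumes the API accepts page/per_page parameters and returns an empty collection past the last page; the page size and column names are placeholders to be tuned:

```ruby
# Sketch of the pagination approach: pull records page by page through Her
# and stream them into a CSV without holding the whole set in memory.
# Assumes the API accepts page/per_page params and returns an empty page past the end;
# the column names are placeholders.
require "csv"

PER_PAGE = 500 # tune this with a stress test, as discussed above

CSV.open("export.csv", "w") do |csv|
  csv << %w[id name created_at] # placeholder columns
  page = 1
  loop do
    batch = Model.where(page: page, per_page: PER_PAGE).to_a
    break if batch.empty?

    batch.each { |record| csv << [record.id, record.name, record.created_at] }
    page += 1
  end
end
```

Since only one page is in memory at a time, the service's footprint stays flat regardless of how many records the export contains; the cost you pay is the extra round trips to the API.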
I am interested in understanding the different approaches to handling large file uploads in a Rails application, for 2-5 GB files.
I understand that in order to transfer a file of this size it will need to be broken down into smaller parts. I have done some research, and here is what I have so far:
Server-side configuration will be required to accept large POST requests, and probably a 64-bit machine to handle anything over 4 GB.
AWS supports multipart upload.
HTML5 FileSystemAPI has a persistent uploader that uploads the file in chunks.
A library for BitTorrent, although this requires a Transmission client, which is not ideal.
Can all of these methods be resumed like FTP? The reason I don't want to use FTP is that I want to keep everything in the web app, if that is possible. I have used CarrierWave and Paperclip, but I am looking for something that can be resumed, as uploading a 5 GB file could take some time!
Of the approaches I have listed, I would like to understand what has worked well and whether there are other approaches I may be missing. No plugins if possible; I would rather not use Java applets or Flash. Another concern is that some of these solutions hold the file in memory while uploading; that is also a constraint I would rather avoid if possible.
I've dealt with this issue on several sites, using a few of the techniques you've illustrated above and a few that you haven't. The good news is that it is actually pretty realistic to allow massive uploads.
A lot of this depends on what you actually plan to do with the file after you have uploaded it... The more work you have to do on the file, the closer you are going to want it to your server. If you need to do immediate processing on the upload, you probably want to do a pure rails solution. If you don't need to do any processing, or it is not time-critical, you can start to consider "hybrid" solutions...
Believe it or not, I've actually had pretty good luck just using mod_porter. Mod_porter makes apache do a bunch of the work that your app would normally do. It helps not tie up a thread and a bunch of memory during the upload. It results in a file local to your app, for easy processing. If you pay attention to the way you are processing the uploaded files (think streams), you can make the whole process use very little memory, even for what would traditionally be fairly expensive operations. This approach requires very little actual setup to your app to get working, and no real modification to your code, but it does require a particular environment (apache server), as well as the ability to configure it.
I've also had good luck using jQuery-File-Upload, which supports good stuff like chunked and resumable uploads. Without something like mod_porter, this can still tie up an entire thread of execution during upload, but it should be decent on memory, if done right. This also results in a file that is "close" and, as a result, easy to process. This approach will require adjustments to your view layer to implement, and will not work in all browsers.
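For reference, the server side of a chunked upload along those lines can stay quite small. A sketch of a Rails controller action, assuming the client sends a Content-Range header per chunk (as jQuery-File-Upload's chunked mode does) plus a client-generated upload_id param; the param names and storage path are placeholders:

```ruby
# Sketch of a Rails controller action receiving chunked uploads. Assumes the client
# sends a Content-Range header per chunk (as jQuery-File-Upload's chunked mode does)
# and a client-generated :upload_id param; the storage path is a placeholder.
require "fileutils"

class UploadsController < ApplicationController
  def create
    chunk  = params[:file]                      # ActionDispatch::Http::UploadedFile
    target = Rails.root.join("tmp", "uploads", params[:upload_id].to_s)
    range  = request.headers["Content-Range"]   # e.g. "bytes 0-9999999/5368709120"
    offset = range ? range[/bytes (\d+)-/, 1].to_i : 0

    FileUtils.mkdir_p(target.dirname)
    FileUtils.touch(target)
    File.open(target, "r+b") do |f|
      f.seek(offset)                            # makes retried chunks idempotent
      IO.copy_stream(chunk.tempfile, f)         # stream; never load the chunk into memory
    end

    head :ok
  end
end
```

Writing at the offset taken from Content-Range is also what makes resuming possible: a client can ask which byte the server already has and re-send only the remaining chunks.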
You mentioned FTP and BitTorrent as possible options. These are not as bad as you might think, as you can still get the files pretty close to the server. They are not even mutually exclusive, which is nice, because (as you pointed out) they do require an additional client that may or may not be present on the uploading machine. The way this works is, basically, you set up an area for them to dump into that is visible to your app. Then, if you need to do any processing, you run a cron job (or whatever) to monitor that location for uploads and trigger your server's processing method. This does not get you the immediate response the methods above can provide, but you can set the interval small enough to get pretty close. The only real advantage to this method is that the protocols used are better suited to transferring large files; the additional client requirement and fragmented process usually outweigh any benefit from that, in my experience.
If you don't need any processing at all, your best bet may be to simply go straight to S3 with them. This solution falls down the second you actually need to do anything with the files other than serve them as static assets...
I do not have any experience using the HTML5 FileSystemAPI in a rails app, so I can't speak to that point, although it seems that it would significantly limit the clients you are able to support.
Unfortunately, there is not one real silver bullet - all of these options need to be weighed against your environment in the context of what you are trying to accomplish. You may not be able to configure your web server or permanently write to your local file system, for example. For what it's worth, I think jQuery-File-Upload is probably your best bet in most environments, as it only really requires modification to your application, so you could move an implementation to another environment most easily.
This project is a new protocol over HTTP to support resumable uploads for large files. It bypasses Rails by providing its own server.
http://tus.io/
http://www.jedi.be/blog/2009/04/10/rails-and-large-large-file-uploads-looking-at-the-alternatives/ has some good comparisons of the options, including some outside of Rails.
Please go through it; it was helpful in my case.
Another site to go to is:
http://bclennox.com/extremely-large-file-uploads-with-nginx-passenger-rails-and-jquery
Please let me know if any of this does not work out.
I would bypass the Rails server and post your large files (split into chunks) directly from the browser to Amazon Simple Storage Service (S3). Take a look at this post on splitting files with JavaScript. I'm a little curious how performant this setup would be, and I feel like tinkering with it this weekend.
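For the server-side pieces of such a direct-to-S3 upload, a sketch using the aws-sdk-s3 gem might look like the following; the bucket, key, part count, and the ETags posted back by the browser are all placeholders:

```ruby
# Sketch of the server-side pieces of a direct-to-S3 multipart upload using the
# aws-sdk-s3 gem. Bucket, key, part count, and the ETags posted back by the browser
# are all placeholders; each chunk is PUT by the browser to its presigned URL.
require "aws-sdk-s3"
require "securerandom"

s3        = Aws::S3::Client.new(region: "us-east-1")
presigner = Aws::S3::Presigner.new(client: s3)
bucket    = "my-uploads-bucket"                        # placeholder bucket
key       = "uploads/#{SecureRandom.uuid}/video.mp4"   # placeholder key

# 1. Start the multipart upload and hand the upload_id to the client.
upload_id = s3.create_multipart_upload(bucket: bucket, key: key).upload_id

# 2. Presign one URL per part; the browser uploads each chunk directly to S3.
part_count = 10 # e.g. a 5 GB file split into ~500 MB chunks
part_urls = (1..part_count).map do |part_number|
  presigner.presigned_url(:upload_part,
                          bucket: bucket, key: key,
                          upload_id: upload_id, part_number: part_number,
                          expires_in: 3600)
end
# part_urls would be returned to the browser, e.g. as JSON.

# 3. The browser reports back the ETag S3 returned for each part; finish the upload.
etags_by_part = { 1 => "etag-from-client" }            # placeholder; collected from the client
s3.complete_multipart_upload(
  bucket: bucket, key: key, upload_id: upload_id,
  multipart_upload: { parts: etags_by_part.map { |n, etag| { part_number: n, etag: etag } } }
)
```

Because S3 keeps the already-uploaded parts around until the multipart upload is completed or aborted, this approach is also naturally resumable: a browser that loses its connection only has to re-upload the parts it had not finished.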
I think that Brad Werth nailed the answer.
Just one more approach could be to upload directly to S3 (and even if you do need some reprocessing afterwards, you could theoretically use AWS Lambda to notify your app ... but to be honest I'm just guessing here; I'm about to solve the same problem myself, and I'll expand on this later).
http://aws.amazon.com/articles/1434
If you use CarrierWave:
https://github.com/dwilkie/carrierwave_direct_example
Uploading large files on Heroku with Carrierwave
Let me also pin down a few options that might help others looking for a real-world solution.
I have a Rails 6 app with Ruby 2.7, and the main purpose of this app is to create a Google Drive-like environment where users can upload images and videos and then process them again for high quality.
Obviously we did try local processing using Sidekiq background jobs, but it was overwhelming during large uploads of 1 GB and more.
We did try tus.io, but personally I think it is not as easy to set up as jQuery File Upload.
So we experimented with AWS, moving through the steps listed below, and it worked like a charm: uploading directly to S3 from the browser.
Using a React dropzone uploader, we upload multiple files to S3.
We set up an AWS Lambda on an input bucket, triggered for all types of object creation on that bucket.
This Lambda converts the file, uploads the reprocessed version to another (output) bucket, and notifies us using AWS SNS to keep track of what worked and what failed.
On the Rails side, we just dynamically use the new output bucket and serve it through an AWS CloudFront distribution.
You may check the AWS notes on MediaConvert for a step-by-step guide; they also have well-written GitHub repos for all sorts of experimentation.
So, from the user's point of view, they can upload one large file with Transfer Acceleration enabled on S3; the React library shows upload progress, and once the file is uploaded, a Rails callback API verifies its existence in the S3 bucket (e.g. mybucket/user_id/file_uploaded_slug), and then it is confirmed to the user through a simple flash message.
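That existence check can be a one-liner against S3. A sketch using the aws-sdk-s3 gem, with the bucket name and key layout taken from the hypothetical example above:

```ruby
# Sketch of the Rails-side existence check described above, using the aws-sdk-s3 gem.
# The bucket name and key layout are the hypothetical ones from the example.
require "aws-sdk-s3"

def uploaded_to_s3?(user_id, file_uploaded_slug)
  client = Aws::S3::Client.new(region: "us-east-1")
  client.head_object(bucket: "mybucket", key: "#{user_id}/#{file_uploaded_slug}")
  true
rescue Aws::S3::Errors::NotFound
  false
end
```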
You can also configure Lambda to notify the end user on successful upload/encoding, if needed.
Refer to this documentation: https://github.com/mike1011/aws-media-services-vod-automation/tree/master/MediaConvert-WorkflowWatchFolderAndNotification
Hope it helps someone here.
I've been working on a rails project that's unusual for me in a sense that it's not going to be using a MySQL database and instead will roll with mongoDB + Redis.
The app is pretty simple: "boot up" data from MongoDB into Redis, after which point Rails will be ready to take requests from users, which will consist mainly of pulling data from Redis (I was told it'd be pretty darn fast at this), doing a quick calculation, and sending some of the data back out to the user.
This will be happening ~1500-4500 times per second, with any luck.
Before the might of the user army comes down on the server, I was wondering if there was a way to "simulate" the page requests somehow internally - like running a rake task to simply execute that page N times per second or something of the sort?
If not, is there a way to test that load and then time the requests to get a rough idea of the delay most users will be looking at?
Caveat
Performance testing is a very broad topic, and the right tool often depends on the type and quality of results that you need. As just one example of the issues you have to deal with, consider what happens if you write a benchmark spec for a specific controller action, and call that method 1000 times in a row. This might give a good idea of performance of that controller method, but it might be making the same redis or mongo query 1000 times, the results of which the database driver may be caching. This also ignores the time it'll take your web server to respond and serve up the static assets that are part of the request (this may be okay, especially if you have other tests for this).
Basic Tools
ab, or ApacheBench, is a simple commandline tool that you can use to test the throughput and speed of your app. I usually go to this first when I want to send a thousand requests at a web server, or test how many simultaneous requests my app can handle (e.g. when comparing mongrel, unicorn, thin, and goliath). Because all requests originate from the same server, this is good for a small number of requests, but as the number of requests grow, you'll be limited by the resources on your testing machine (network stack, cpu, and maybe memory).
Benchmark is a standard ruby class, and is great for quickly spitting out some profiling information. It can also be used with Test::Unit and RSpec. If you want a rake task for doing some quick benchmarking, this is probably the place to start.
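A sketch of the kind of rake task the question asks about, built on Benchmark and Net::HTTP; the endpoint and request count are placeholders, and keep the caching caveat above in mind when interpreting the numbers:

```ruby
# Sketch of a quick benchmarking rake task: hit one endpoint N times and print timing.
# The default URL and count are placeholders; remember the caching caveat above.
require "benchmark"
require "net/http"
require "uri"

namespace :perf do
  desc "Time N sequential requests against a single endpoint"
  task :quick, [:url, :count] do |_t, args|
    url   = URI(args[:url] || "http://localhost:3000/lookup") # placeholder endpoint
    count = (args[:count] || 1000).to_i

    elapsed = Benchmark.realtime { count.times { Net::HTTP.get_response(url) } }

    puts format("%d requests in %.2fs (%.1f req/s, %.1f ms avg)",
                count, elapsed, count / elapsed, elapsed / count * 1000)
  end
end
```

Because the requests are sequential, this measures latency rather than concurrency; for simultaneous-request behaviour you still want ab or one of the heavier tools below.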
mechanize - I like using mechanize for quickly scripting an interaction with a page. It handles cookies and forms, but won't go and grab assets like images by default. It can be a good tool if you're rolling your own tests, but shouldn't be the first one to go to.
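For example, a short mechanize script for a logged-in interaction might look like this; the URLs and form field names are assumptions to adapt to your own login form:

```ruby
# Sketch of scripting a logged-in interaction with mechanize. The URLs and form
# field names are assumptions; adapt them to your login form.
require "mechanize"

agent = Mechanize.new
page  = agent.get("http://localhost:3000/login")        # placeholder login page

form = page.form_with(action: "/session")               # placeholder form selector
form.field_with(name: "email").value    = "load-test@example.com"
form.field_with(name: "password").value = "secret"
dashboard = agent.submit(form)

puts dashboard.title                                    # confirm we landed where we expected
```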
There are also some tools that will simulate actual users interacting with the site (they'll download assets as a browser would, and can be configured to simulate several different users). Most notable are The Grinder and Tsung. While still very much in development, I'm currently working on tsung-rails to make it easier to automate rails load testing with tsung, and would love some help if you choose to go in this direction :)
Rails Profiling Links
Good overview for writing performance tests
Great slide deck covering most of the latest tools for profiling at various levels