I've been working on a rails project that's unusual for me in a sense that it's not going to be using a MySQL database and instead will roll with mongoDB + Redis.
The app is pretty simple - "boot up" data from mongoDB to Redis, after which point rails will be ready to take requests from users which will consist mainly of pulling data from redis, (I was told it'd be pretty darn fast at this) doing a quick calculation and sending some of the data back out to the user.
This will be happening ~1500-4500 times per second, with any luck.
Before the might of the user army comes down on the server, I was wondering if there was a way to "simulate" the page requests somehow internally - like running a rake task to simply execute that page N times per second or something of the sort?
If not, is there a way to test that load and then time the requests to get a rough idea of the delay most users will be looking at?
Caveat
Performance testing is a very broad topic, and the right tool often depends on the type and quality of results that you need. As just one example of the issues you have to deal with, consider what happens if you write a benchmark spec for a specific controller action, and call that method 1000 times in a row. This might give a good idea of performance of that controller method, but it might be making the same redis or mongo query 1000 times, the results of which the database driver may be caching. This also ignores the time it'll take your web server to respond and serve up the static assets that are part of the request (this may be okay, especially if you have other tests for this).
Basic Tools
ab, or ApacheBench, is a simple commandline tool that you can use to test the throughput and speed of your app. I usually go to this first when I want to send a thousand requests at a web server, or test how many simultaneous requests my app can handle (e.g. when comparing mongrel, unicorn, thin, and goliath). Because all requests originate from the same server, this is good for a small number of requests, but as the number of requests grow, you'll be limited by the resources on your testing machine (network stack, cpu, and maybe memory).
Benchmark is a standard ruby class, and is great for quickly spitting out some profiling information. It can also be used with Test::Unit and RSpec. If you want a rake task for doing some quick benchmarking, this is probably the place to start
mechanize - I like using mechanize for quickly scripting an interaction with a page. It handles cookies and forms, but won't go and grab assets like images by default. It can be a good tool if you're rolling your own tests, but shouldn't be the first one to go to.
There are also some tools that will simulate actual users interacting with the site (they'll download assets as a browser would, and can be configured to simulate several different users). Most notable are The Grinder and Tsung. While still very much in development, I'm currently working on tsung-rails to make it easier to automate rails load testing with tsung, and would love some help if you choose to go in this direction :)
Rails Profiling Links
Good overview for writing performance tests
Great slide deck covering most of the latest tools for profiling at various levels
Related
I created an application that uses Watir to automate logging in and perform a couple of functions within a site.
Right now it's written 100% purely in ruby classes that I was just executing in irb, but I want to put it into a Rails application and put it online. I haven't been able to find much information about using something like Capybara or Watir for anything other than testing. Is this because of how slow they are or is it a capabilities issue?
Would I be able to run a background process that opens a browser with Watir and performs a few functions for each user in production?
Another question I have is how to keep the session over a longer period of time. There are two sites that require 2FA that my app logs into. If I wanted to log in and perform a function once an hour with a Watir browser, I could create it as a background process (if that works). But when the process is done the browser closes and when the background process runs again in an hour it requires 2FA again.
My other worry is speed. If I have 50 users that all need to run a Watir browser at the same time I imagine that will be slow. I am not worried as much about speed as long as they run and collect the data and perform the few actions we need, but how it will effect the applications integrity.
WATIR is specifically designed as a testing tool.. WATIR stands for Web Application Testing In Ruby. It's design is centered around interacting with a browser the way a user would, effectively simulating the same actions a user would take when using a site. That it would be sub-optimal for other tasks is to be expected. Due to scraping and testing having very similar activities, there a a number of people that use watir for that task, but it is not designed for that purpose, and it is unlikely that the WATIR developers would ever add features specific to data scraping verses testing.
With the things you are contemplating, you should ask yourself if you are doing the equivalent of using a socket wrench as a hammer, and if there might be a better tool you could use.
If the sites you are interacting with support an API, then that would be the preferred way to interact with them, to get information from the site. If that is not supported, you may want to consider looking at other gems that would let you request the site HTML or parse the HTML directly (e.g. Nokogiri)
You should also inspect the terms of service for the sites you are interacting with (if you don't own them) to ensure that there are not prohibitions against using 'robots' or other automated means to access the site. If so, then using Watir in the way you propose may end up getting you banned from access to the site if the pattern of your access is obviously the result of an automated process.
This is actually done more often than people think. Maybe not specifically with Watir, but running a browser automation task in a job. The jobs should be queued and run asynchronously, preferably in a different process than your main web app.
I wrote about this strategy here: https://blogstephenarifin.wordpress.com/2018/08/23/integrating-a-third-party-without-an-api-using-rails-5-and-heroku/
If you find yourself having to use Watir then the best way is to use it to render the page (for example in headless mode for javascript), save it and then use Nokogiri to process it. Apis are suggested a lot by people who don't and can't find a use for scraping but at times is necessary and perfectly legit (you may even be scraping your own data). Apis are not a universal option.
Second you should probably regulate its use to a background job. If you have end users (and you really shouldn't have many simultaneous users) many services inform customers data will be available in a few hours to a few days
I am building a website with rails on AWS and I am trying to determine the best ways to stress-test while also getting some idea of the cost I will be paying by user (very roughly). I have looked at tools like Selerium and I am curious if I could do something similar with Postman.
My objects are:
Observe what kind of load the server would be under during the test, how the cpu and memory are affected.
See how the load generated would affect the cpu cycles on the system that would generate cost to me by AWS.
Through Postman I can easily generate REST calls to my rails server and simulate user interaction, If I created some kind of multithreaded application that would make many calls like to the server, would that be an efficient way to measure these objectives?
If not, is there a tool that would help me either either (or both) of these objectives?
thanks,
You can use BlazeMeter to do the load test.
This AWS blog post show you how you can do it.
For web development I'd like to mix rails and node.js since I want to get the best out of both worlds (rails for fast web development and node for concurrency). I know that some people choose to just use full ruby stack with eventmachine that is integrated into rails controller so that every request can be nonblocking by using fiber in event-loop model. I have been able to understand how that works in a big picture.
At this moement however I want to try doing nonblocking request processing with rails and node.js with message queue concept. I heard that this can be achieved by using redis as an intermediary. I'm still having trouble trying to figure out how that works as of now. From what I can understand: so we have 2 apps A (rails) and B (node.js) and redis. rails app will handle requests from users that go through controllers in REST manner, and then from there rails will pass that through redis, and then redis will form queues and node.js app will pick up that queue and do whatever necessary afterhand (write or read from backend db).
My questions:
So how would that improve concurrency and scalability? from what i
know since rails handle the requests through controllers
synchronously, and then write to redis, the requests will be
blocking still, even though node.js end can pickup the queue
asynchronously. (I have a feeling that it's not asynchronous yet if it's not end to end
non-blocking).
Would node.js be considered a proxy or an application here if redis
is the intermediary?
I'm new to redis and learning it still. If I'm using 100% noSQL
solution for my backend database, such as mongoDB or couchDB, are they replaceable by redis entirely or is redis more seen as a
messaging queue tool like rabbitMQ?
Is messaging queue a different concurrency concept than threading or
event-loop model or is it supposed to supplement them?
That's all my question. I'm new to message queue concept. Will appreciate any help and pointers to right direction and articles that help me learn more. thanks.
You are mixing some things here that don't go together.
Let's first make sure we are on the same page regarding the strengths/weaknesses of the involved technologies
Rails: Used for it's web-development simplicity and perfect for serving database-backed web-applications.
Not very performant when having to serve a large number of long running requests as you'd run out of threads on your Ruby workers - but well suited for anything that can scale horizontally with more web-nodes (multiple web-servers - 1 db).
Node.js: Great for high-concurrency scenarios. Not as easy as rails to write a regular web-application in it. But can handle near an insane amount of long-running low-cpu tasks efficiently.
Redis: A Key-Value Store that supports operations on it's data-structures (increment/decrement values, append/prepent push/pop to lists - all operations that make this DB work consistently with multiple clients writing at once)
Now as you can see, there is no benefit in having Rails AND Node serve the same request - communicating through Redis. Going through the Rails Stack would not provide any benefit if the requests ends up being handled by the Node server.
And even if you only offload some processing to the node server, it's still the Rails webserver that handles the requests and has to wait for a response from node - killing the desired scalability. It simply makes no sense.
Where you would a setup with Node and Rails together is in certain areas of your app that have drastically different scaling requirements.
If you are for example writing a Website that displays live stats for Football games you can easily see that there are two different concerns in your app: The "normal" Site that contains signup, billing and profile stuff that screams for a quick implementation through rails. And the "live" portion of the site where users see live results and you expect to handle a lot of clients at once - all waiting for something to happen (low cpu - high concurrency).
In such a case it may be beneficial to actually seperate the two parts of the site into a Ruby and a Node app, with then sharing data about the user through a store like Redis (but actually you just need some shared state that both can look at and write to for synchronization purposes).
So you would use for example Rails for the Signup/Login portions - once signed up write the session cookie into redis alongside with the permissions of the user (what game is he allowed to follow) and hand the user off to the Node.js app.
There the Node app can read the session information from Redis and serve the user.
Word of advice:
You don't get scalability by simply throwing Node.js into your Toolbox. You really have to find out what Node.js is good at (low-cpu high-io concurrent operations) and how you can leverage that to remedy some of the problems your currently chosen technology has.
I can answer 3 for you. Redis does not guarantee that when you perform an operation that result will actually be on disk, also transaction handling it a bit "different". It also requires for the whole database to be in memory. Depending on the situation this can be an issue or not. It is however incredibly fast. It is not a messaging queue, you can easily make a queue out of it, but it is not it's purpose. If you want to have a queuing system only you can probably do better with something else.
I asked this on HN but didn't get much advice.
I'm a n00b in web app development. Nevertheless, I've been working on an app (in Ruby on Rails + deployed with Heroku) that has gotten some really positive feedback, so I'd like to dedicated more resources to it.
However, I'm not a sysadmin or anything of the sort, so I'm unsure as to what steps to take to ensure my app is robust and can handle unexpected traffic spikes without crashing.
Essentially, I'd like to prepare for the worst-case scenario in terms of handling unexpected traffic spikes etc.
Any specific pointers (especially with Heroku) will be helpful!
What is the overall distribution of http requests for loading a typical page on your app? Open that in Mozilla Firebug / Chrome Dev Tool and analyze http requests being made.
If you see that there are LOTS of static content being loaded (Like CSS / images/ JS ) for each page then it would indicate a cache issue (static content are not getting cached).
You could even move static content to a CDN (http://en.wikipedia.org/wiki/Content_delivery_network) These two are low hanging fruits.
next step is to ensure that your app can be hosted on multiple machines (E.g. it does not depend on same http session across each host and similar things). This way you can add more hosts to serve the demand.
To ensure your app is stable during all development and deployments, write unit tests, functional tests and integration tests (there are a lot of gems to handle this, like shoulda, rspec, cucumber, capybara, selenium...).
I would also use Hoptoad for error notification. There is a free offer for 1 project ( https://hoptoadapp.com/account/new/Egg )
The next thing to monitor your site could be NewRelic ( http://newrelic.com/ ). It gives you an overview what queries took long and where is a bottleneck in your app.
A very short answer: In theory you need to learn about capacity, scalability and performance.
On practice you can spin up more instances on heroku (or engines) but you'll be paying more money for it.
I'm writing a Rails web service that interacts with various pieces of hardware scattered throughout the country.
When a call is made to the web service, the Rails app then attempts to contact the appropriate piece of hardware, get the needed information, and reply to the web client. The time between the client's call and the reply may be up to 10 seconds, depending upon lots of factors.
I do not want to split the web service call in two (ask for information, answer immediately with a pending reply, then force another api call to get the actual results).
I basically see two options. Either run JRuby and use multithreading or else run several regular Ruby instances and hope that not many people try to use the service at a time. JRuby seems like the much better solution, but it still doesn't seem to be mainstream and have out of the box support at Heroku and EngineYard. The multiple instance solution seems like a total kludge.
1) Am I right about my two options? Is there a better one I'm missing?
2) Is there an easy deployment option for JRuby?
I do not want to split the web service call in two (ask for information, answer immediately with a pending reply, then force another api call to get the actual results).
From an engineering perspective, this seems like it would be the best alternative.
Why don't you want to do it?
There's a third option: If you host your Rails app with Passenger and enable global queueing, you can do this transparently. I have some actions that take several minutes, with no issues (caveat: some browsers may time out, but that may not be a concern for you).
If you're worried about browser timeout, or you cannot control the deployment environment, you may want to process it in the background:
User requests data
You enter request into a queue
Your web service returns a "ticket" identifier to check the progress
A background process processes the jobs in the queue
The user polls back, referencing the "ticket" id
As far as hosting in JRuby, I've deployed a couple of small internal applications using the glassfish gem, but I'm not sure how much I would trust it for customer-facing apps. Just make sure you run config.threadsafe! in production.rb. I've heard good things about Trinidad, too.
You can also run the web service call in a delayed background job so that it's not hogging up a web-server and can even be run on a separate physical box. This is also a much more scaleable approach. If you make the web call using AJAX then you can ping the server every second or two to see if your results are ready, that way your client is not held in limbo while the results are being calculated and the request does not time out.