Grails: detemine number of calls to vaious API Services - grails

I have this working grails app, and i want to determine how many calls various parts of it's API gets per time unit.
This app offers 100+ web services, and 10k+ open connections at any time, so it is pretty busy.
It's written in grails 1.3.6 (developed over 10 years).
My overall target is to gain knowledge about which interfaces are actually used and how often.
I see several poor possibilities:
Log everything, parse the logs and 'grep -c'.... (Bad idea)
tcpdump... (Bad idea)
Increment global variables (thread safe) where i want to instrument it (least bad idea)
How do i get grails to tell me which services are called and how often?
...without killing performance?
Any ideas/pointers will be appreciated :-)

Related

Large percent of requests in CLRThreadPoolQueue

We have an ASP.NET MVC application hosted in an azure app-service. After running the profiler to help diagnose possible slow requests, we were surprised to see this:
An unusually high % of slow requests in the CLRThreadPoolQueue. We've now run multiple profile sessions each come back having between 40-80% in the CLRThreadPoolQueue (something we'd never seen before in previous profiles). CPU each time was below 40%, and after checking our metrics we aren't getting sudden spikes in requests.
The majority of the requests listed as slow are super simple api calls. We've added response caching and made them async. The only thing they do is hit a database looking for a single record result. We've checked the metrics on the database and the query avg run time is around 50ms or less. Looking at application insights for these requests confirms this, and shows that the database query doesn't take place until the very end of the request time line (I assume this is the request sitting in the queue).
Recently we started including SignalR into a portion of our application. Its not fully in use but it is in the code base. We since switched to using Azure SignalR Service and saw no changes. The addition of SignalR is the only "major" change/addition we've made since encountering this issue.
I understand we can scale up and/or increase the minWorkerThreads. However, this feels like I'm just treating the symptom not the cause.
Things we've tried:
Finding the most frequent requests and making them async (they weren't before)
Response caching to frequent requests
Using Azure SignalR service rather than hosting it on the same web
Running memory dumps and contacting azure support (they
found nothing).
Scaling up to an S3
Profiling with and without thread report
-- None of these steps have resolved our issue --
How can we determine what requests and/or code is causing requests to pile up in the CLRThreadPoolQueue?
We encountered a similar problem, I guess internally SignalR must be using up a lot of threads or some other contended resource.
We did three things that helped a lot:
Call ThreadPool.SetMinThreads(400, 1) on app startup to make sure that the threadpool has enough threads to handle all the incoming requests from the start
Create a second App Service with the same code deployed to it. In the javascript, set the SignalR URL to point to that second instance. That way, all the SignalR requests go to one app service, and all the app's HTTP requests go to the other. Obviously this requires a SignalR backplane to be set up, but assuming your app service has more than 1 instance you'll have had to do this anyway
Review the code for any synchronous code paths (eg. making a non-async call to the database or to an API) and convert them to async code paths

"OutOfMemoryError: PermGen space" with Grails webapps

Our team has been working with Grails (version 2.3.5) for a little under a year now, and the delivery team managing our servers has little to no experience related to applications written in Grails.
We have several Tomcat 7 instances, both in the test and production environments, with a certain amount of webapps. While some of the instances only containing webapps developed in Java (w/ Spring, Hibernate) sometimes get up to something like 20 contexts with no major issue, it seems like anything past 6 Grails applications (applications much similar to their Java counterparts) starts regularly causing the dreaded PermGen space issue.
The PermGen allocated is currently 536Mb, and the delivery team obviously suggests either using a separate instance for the new applications, or increasing the allocated memory; at the same time they are urging us to verify how these few apps are saturating the memory.
Our impression is that this is normal with Grails apps, but not having any senior Grails developer we have no way to confirm it from experience or better knowledge.
Is 536Mb too little allocated space in PermGen for 8 "regular" Grails webapps?
Update:
To specify what I mean by "regular", these are all couples of front-end + back-end for different services, where the front-end has nothing much more than a list of requests, a wizard to go from zero to a completed request, validates data, persists it, calls a webservice to get a protocol number, and in a couple cases calls an external payment gateway.
The back-end is used to manage requests and performs similar operations.
Every app has maybe around 20 entities with respective controllers, services, and views, and on top of that we have a few classes to handle security w/ Spring Security and an external infrastructure.
That's how it is. You have basically two options.
Migrate on Java 8 (see http://www.infoq.com/articles/Java-PERMGEN-Removed)
Increase the PermGen space further.
And a quick background. Unlike regular Java with Spring, Groovy and Grails generate quite a lot of classes in the runtime (GSPs being one example). Groovy also generate huge amount of classes itself - each closure is a class. All this put pressure on the permgen.
To ease off the pressure get rid of all unnecessary plugins, consolidate GSPs, rethink closures, use AOP only when absolutely needed etc.
We used to have similar problems, so our team started using one tomcat per app. We also separated credentials from security purposes. Now it's easier to manage them, monitor logs and make periodical updates.
Hint: it's easier (imho and cleaner) to train your admin in creating users, home_dirs with tomcats instances and just providing credentials.

Async App Server versus Multiple Blocking Servers

tl;dr Many Rails apps or one Vertx/Play! app?
I've been having discussions with other members of my team on the pros and cons of using an async app server such as the Play! Framework (built on Netty) versus spinning up multiple instances of a Rails app server.
I know that Netty is asynchronous/non-blocking, meaning during a database query, network request, or something similar an async call will allow the event loop thread to switch from the blocked request to another request ready to be processed/served. This will keep the CPUs busy instead of blocking and waiting.
I'm arguing in favor or using something such as the Play! Framework or Vertx.io, something that is non-blocking... Scalable. My team members, on the other hand, are saying that you can get the same benefit by using multiple instances of a Rails app, which out of the box only comes with one thread and doesn't have true concurrency as do apps on the JVM. They are saying just use enough app instances to match the performance of one Play! application (or however many Play! apps we use), and when a Rails app blocks the OS will switch processes to a different Rails app. In the end, they are saying that the CPUs will be doing the same amount of work and we will get the same performance.
So here are my questions:
Are there any logical fallacies in the arguments above? Would the OS manage the Rails app instances as well as Netty (which also runs on the JVM, which maps threads to cores very well) manages requests in its event loop?
Would the OS be as performant in switching on blocking calls as would something like Netty or Vertx, or even something built on Ruby's own EventMachine?
With enough Rails app instances to match the performance Play! apps, would there be a cost noticeable cost difference in running the servers? If there are no cost difference it wouldn't really matter what method is used, in my opinion. Shoot if it was cheaper financially to run up a million Rails apps than one Play! app I would rather do that.
What are some other benefits to using either of these approaches that I may be failing to ask about?
Both approaches can and have worked. So if switching would incur a high development cost and/or schedule hit then it's probably not worth the effort...yet. Make the switch when the costs become unacceptably high. Think of using microservices as a gradual switching strategy.
If you are early on in your development cycle then making the switch early may make sense. Rewriting is a pain.
Or perhaps you'll never have to switch and rails will work for your use case like a charm. And you've been so successful at making your customers happy that the cash is just rolling in.
Some of the downsides of a blocking single server approach:
Increased memory usage. Sources: multiple processes, memory leaks, lack of shared datastructures (which increases communication costs and brings up consistency issues).
Lack of parallelism. This has two consequences: more boxes and more latency. You'll need potentially a much larger box count to handle the same load. So if you need to scale and have money concerns then this can be a problem. If it isn't a concern then it doesn't matter. In the server it means increased latency, the sort of latency which can't be improved by multiplying processes, which may be a killer argument depending on your app.
Some examples of those who had made such a switch from rails to node.js and golang:
LinkedIn Moved From Rails To Node: 27 Servers Cut And Up To 20x Faster : http://highscalability.com/blog/2012/10/4/linkedin-moved-from-rails-to-node-27-servers-cut-and-up-to-2.html
Why Timehop Chose Go to Replace Our Rails App : https://medium.com/building-timehop/why-timehop-chose-go-to-replace-our-rails-app-2855ea1912d
How We Moved Our API From Ruby to Go and Saved Our Sanity : http://blog.parse.com/learn/how-we-moved-our-api-from-ruby-to-go-and-saved-our-sanity/
How We Went from 30 Servers to 2: Go : http://www.iron.io/blog/2013/03/how-we-went-from-30-servers-to-2-go.html
These posts represent arguments that are probably illustrative of what your group is going through. The decision is unfortunately not an obvious one.
It depends on the nature of what you are building, the nature of your team, the nature of resources, the nature of your skills, the nature of your goals and how you value all the different tradeoffs.
Would costs really drop? Isn't the same amount of computation done no matter the number of servers?
Depends on the type and scale of the work being done. Typically web services are IO bound, waiting on responses from other services like databases, caches, etc.
If you are using a single threaded server the process is blocked on IO a lot so it is doing nothing a lot. In contrast the nonblocking server will be able to handle many many requests while the single threaded server is blocked. You can keep adding processes, but there are only so many processes a single machine can run. A nonblocking server can have the same number of processes while keeping the CPU busy as possible handling requests. It's often possible to handle higher loads on smaller cheaper machines when using nonblocking servers.
If your expected request rate can be handled by an acceptable number of boxes and you don't expect huge spikes then you would be fine with single threaded servers. Nonblocking servers are great at soaking up load spikes without necessarily having to add machines.
If your work is such that response latencies don't really matter then you can get by with fewer nodes.
If your workload is CPU bound then you'll need more boxes anyway because there won't be the same opportunity for parallelism because the servers won't be blocking on IO.

What makes erlang scalable?

I am working on an article describing fundamentals of technologies used by scalable systems. I have worked on Erlang before in a self-learning excercise. I have gone through several articles but have not been able to answer the following questions:
What is in the implementation of Erlang that makes it scalable? What makes it able to run concurrent processes more efficiently than technologies like Java?
What is the relation between functional programming and parallelization? With the declarative syntax of Erlang, do we achieve run-time efficiency?
Does process state not make it heavy? If we have thousands of concurrent users and spawn and equal number of processes as gen_server or any other equivalent pattern, each process would maintain a state. With so many processes, will it not be a drain on the RAM?
If a process has to make DB operations and we spawn multiple instances of that process, eventually the DB will become a bottleneck. This happens even if we use traditional models like Apache-PHP. Almost every business application needs DB access. What then do we gain from using Erlang?
How does process restart help? A process crashes when something is wrong in its logic or in the data. OTP allows you to restart a process. If the logic or data does not change, why would the process not crash again and keep crashing always?
Most articles sing praises about Erlang citing its use in Facebook and Whatsapp. I salute Erlang for being scalable, but also want to technically justify its scalability.
Even if I find answers to these queries on an existing link, that will help.
Regards,
Yash
Shortly:
It's unmutable. You have no variables, only terms, tuples and atoms. Program execution can be divided by breakpoint at any place. Fully transactional model.
Processes are even lightweight than .NET threads and isolated.
It's made for communications. Millions of connections? Fully asynchronous? Maximum thread safety? Big cross-platform environment, which built only for one purpose — scale&communicate? It's all Ericsson language — first in this sphere.
You can choose some impersonators like F#, Scala/Akka, Haskell — they are trying to copy features from Erlang, but only Erlang born from and born for only one purpose — telecom.
Answers to other questions you can find on erlang.com and I'm suggesting you to visit handbook. Erlang built for other aims, so it's not for every task, and if you asking about awful things like php, Erlang will not be your language.
I'm no Erlang developer (yet) but from what I have read about it some of the features that makes it very scalable is that Erlang has its own lightweight processes that are using message passing to communicate with each other. Because of this there is no such thing as shared state and locking which is the case when using for example a multi threaded Java application.
Another difference compared to Java is that the Erlang VM does garbage collection on every little process that is running which does not take any time at all compared to Java which does garbage collection only per VM.
If you get problem with bottlenecks from database connection you could start by using a database pooling app running against maybe a replicated PostgreSQL cluster or if you still have bottlenecks use a multi replicated NoSQL setup with Mnesia, Riak or CouchDB.
I think process restarts can be very useful when you are experiencing rare bugs that only appear randomly and only when specific criteria is fulfilled. Bugs that cause the application to crash as soon as you restart the app should optimally be fixed or taken care of with a circuit breaker so that it does not spread further.
Here is one way process restart helps. By not having to deal with all possible error cases. Say you have a program that divides numbers. Some guy enters a zero to divide by. Instead of checking for that possible error (and tons more), just code the "happy case" and let process crash when he enters 3/0. It just restarts, and he can figure out what he did wrong.
You an extend this into an infinite number of situations (attempting to read from a non-existent file because the user misspelled it, etc).
The big reason for process restart being valuable is that not every error happens every time, and checking that it worked is verbose.
Error handling is verbose typically, so writing it interspersed with the logic handling doing a task can make it harder to understand the code. Moving that logic outside of the task allows you to more clearly distinguish between "doing things" code, and "it broke" code. You just let the thing that had a problem fail, and handle it as needed by a supervising party.
Since most errors don't mean that the entire program must stop, only that that particular thing isn't working right, by just restarting the part that broke, you can keep operating in a state of degraded functionality, instead of being down, while you repair the problem.
It should also be noted that the failure recovery is bounded. You have to lay out the limits for how much failure in a certain period of time is too much. If you exceed that limit, the failure propagates to another level of supervision. Each restart includes doing any needed process initialization, which is sometimes enough to fix the problem. For example, in dev, I've accidentally deleted a database file associated with a process. The crashes cascaded up to the level where the file was first created, at which point the problem rectified itself, and everything carried on.

Best way to run rails with long delays

I'm writing a Rails web service that interacts with various pieces of hardware scattered throughout the country.
When a call is made to the web service, the Rails app then attempts to contact the appropriate piece of hardware, get the needed information, and reply to the web client. The time between the client's call and the reply may be up to 10 seconds, depending upon lots of factors.
I do not want to split the web service call in two (ask for information, answer immediately with a pending reply, then force another api call to get the actual results).
I basically see two options. Either run JRuby and use multithreading or else run several regular Ruby instances and hope that not many people try to use the service at a time. JRuby seems like the much better solution, but it still doesn't seem to be mainstream and have out of the box support at Heroku and EngineYard. The multiple instance solution seems like a total kludge.
1) Am I right about my two options? Is there a better one I'm missing?
2) Is there an easy deployment option for JRuby?
I do not want to split the web service call in two (ask for information, answer immediately with a pending reply, then force another api call to get the actual results).
From an engineering perspective, this seems like it would be the best alternative.
Why don't you want to do it?
There's a third option: If you host your Rails app with Passenger and enable global queueing, you can do this transparently. I have some actions that take several minutes, with no issues (caveat: some browsers may time out, but that may not be a concern for you).
If you're worried about browser timeout, or you cannot control the deployment environment, you may want to process it in the background:
User requests data
You enter request into a queue
Your web service returns a "ticket" identifier to check the progress
A background process processes the jobs in the queue
The user polls back, referencing the "ticket" id
As far as hosting in JRuby, I've deployed a couple of small internal applications using the glassfish gem, but I'm not sure how much I would trust it for customer-facing apps. Just make sure you run config.threadsafe! in production.rb. I've heard good things about Trinidad, too.
You can also run the web service call in a delayed background job so that it's not hogging up a web-server and can even be run on a separate physical box. This is also a much more scaleable approach. If you make the web call using AJAX then you can ping the server every second or two to see if your results are ready, that way your client is not held in limbo while the results are being calculated and the request does not time out.

Resources