Correct way to implement standalone Grails batch processes? - grails

I want to implement the following:
Web application in Grails going to MongoDB database
Long-running batch processes populating and updating that database in the background
I would like for both of them to reuse the same Grails services and same GORM domain classes (using mongodb plugin for Grails).
For the Web application everything should work fine, including the dynamic GORM finder methods.
But I cannot figure out how to implement the batch processes.
a. If I implement them as Grails service methods, their long-running nature will be a problem. Even wrapping them in some async executors will unnecessarily complicate everything, as I'd like them each to be a separate Java process so they can be monitored and stopped easily and separately.
b. If I implement them as src/groovy scripts and try to launch them from the command line, I cannot inject the Grails services properly (the ApplicationHolder approach throws an NPE) or get the GORM finder methods to work. The standalone GORM guides all have Hibernate in mind, and overall it doesn't seem like the right route to pursue.
c. I considered the 'batch-launcher' Grails plugin but it failed to install and seems a bit abandoned.
d. I considered the 'run-script' Grails command to run the scripts from src/groovy; it seems it might actually work in development, but it doesn't seem like the right thing to do in production.
I cannot be the only person with such a problem - so how is it generally solved?
How do people run standalone scripts sharing the code base and DB with their Grails applications?

Since you want the job processing to be in a separate JVM from your front-end application, the easiest way to do that is to have two instances of Grails running: one for the front-end that serves web requests, and the other to deal with job processing.
Thankfully, the rich ecosystem of plugins for Grails makes this sort of thing quite easy, though perhaps not the most efficient, since running an entire Grails application just for processing is a bit overkill.
The way I tend to go about it is to write my application as one app, with services that take care of the job processing. These services are tied to the RabbitMQ plugin, so the general flow is that the web requests (or quartz scheduled jobs) put jobs into a work queue, and then the worker services take care of processing them.
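To make that concrete, here is a minimal sketch of one of those worker services, assuming the consumer convention of the Grails RabbitMQ plugin (a service with a static rabbitQueue property gets subscribed to that queue and receives messages through handleMessage); the queue name, payload shape, WorkRequest domain class, and reportService are made up for illustration:

// grails-app/services/JobProcessingService.groovy
class JobProcessingService {

    // Plugin convention (as I recall it): subscribes this service to the named queue.
    static rabbitQueue = 'workQueue'

    // Injected like any other Grails service, so the worker side has full access
    // to the same services and GORM domain classes as the web side.
    def reportService

    def handleMessage(message) {
        // The message is whatever the producer put on the queue, e.g. [requestId: 42]
        def request = WorkRequest.get(message.requestId)   // hypothetical domain class
        reportService.process(request)
    }
}

On the producing side, a controller or Quartz job just drops work onto the queue, e.g. rabbitSend 'workQueue', [requestId: request.id] - rabbitSend being the helper method the same plugin injects, if I remember its API correctly.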
The advantage with this is that, since it's one application, I have full access to all of the domain objects, etc., and I can leverage the disconnected nature of a message queue to scale out my front- and back-ends separately without needing more than one application. Instead, I can just install the same application multiple times and configure the number of threads dedicated to processing jobs and/or the queues that the job processors are looking at.
So, with this setup, for development, I will usually just set the number of job processing threads to whatever makes sense for the development work I'm doing, and then just a simple grails run-app, and I have a fully functional system (assuming I have a RabbitMQ server running to use as well).
Then, when I go to deploy into production, I deploy 2 instances of the application, one for the front-end work and the other for the back-end work. I just configure the front-end instance to have 1 or 0 threads for processing jobs, and the back-end instance I give many more threads. This lets me update either portion as needed or spin up more instances if I need to scale one part or the other.
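The split between "1 or 0 threads" on the front-end and "many more" on the back-end can be driven from configuration rather than separate code bases. A sketch, assuming the plugin exposes a concurrentConsumers setting (check the plugin docs for the actual key; this and the app.role system property are assumptions on my part):

// grails-app/conf/Config.groovy (or an externalized per-instance config file)
rabbitmq {
    connectionfactory {
        username = 'guest'              // illustrative credentials
        password = 'guest'
        hostname = 'rabbit.internal'    // illustrative host
    }
    // Front-end instance: started with -Dapp.role=web, so it gets a single consumer thread.
    // Back-end instance: started with -Dapp.role=worker, so it gets many more.
    concurrentConsumers = (System.getProperty('app.role') == 'worker') ? 25 : 1
}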
I'm sure there are other ways to do this, but I've found this to be both really easy to develop (since it's all one application), and also really easy to deploy, scale, and maintain.

Related

Durable tasks sub orchestration with micro services

I'm attempting to use Azure durable tasks to orchestrate some microservices, but am running into a small gap in understanding how task hubs work, as well as how to coordinate the projects correctly.
I'm trying to create a main orchestrator that is in charge of kicking off sub orchestrations to do the actual work. Below is a diagram of what I'm trying to achieve.
The idea is that each .NET project will be able to scale independently of the others, so if .NET project 2 were under quite a bit of load I'd be able to scale that project only and not have to worry about the other two projects. The problem I'm running into is that, from what I understand, the task hub queue is shared by all the services, so there is no way to have each process focus on only its own work; each project can see everything in the queue, and one project may dequeue a message intended for project 2. Is this correct?
From reading the documentation it doesn't seem clear that I can send project 2 its sub-orchestration messages as well as send project 3 its specific orchestration.
Am I thinking about this problem incorrectly, is there a different way I might want to approach this?
What you want cannot be achieved.
As of now, Azure Functions only allows orchestrator functions to call activity and sub-orchestrator functions that exist in the same function app. The main reason is a technical one: queues within a task hub are shared across all functions, so there's no way to guarantee that a message intended for FunctionAppA does not get picked up by FunctionAppB.
If cross-project communication is required, the correct method is to use HTTP or a queue.

What's the right strategy for when to create jobs and sub-jobs in Sidekiq?

So I have a system that receives messages from devices; each message then goes through 3 different servers, and countless services are run for each job. From an architecture perspective, what are the considerations in using Sidekiq to make my program async? Are there downsides to making sub-processes run using Sidekiq? Any advice?
Architecture (system design) should be based on the problems you are trying to solve. If your services are designed around distinct business domains and they are async-compatible, then you can spawn sub-jobs for each service. But if not, or if you need flexible transactions among services, then a job per request is the right choice. So you may have both of these implementations in your system, based on the requirements.
The upside to making your program async with Sidekiq is that it is easy and produces good reporting in case of an error. The downside of using Sidekiq for this task is that there is a lot of overhead in creating and executing the jobs. This could become such a problem that it represents the majority of the resources used.

"OutOfMemoryError: PermGen space" with Grails webapps

Our team has been working with Grails (version 2.3.5) for a little under a year now, and the delivery team managing our servers has little to no experience related to applications written in Grails.
We have several Tomcat 7 instances, both in the test and production environments, each with a certain number of webapps. While some of the instances containing only webapps written in Java (w/ Spring, Hibernate) sometimes get up to something like 20 contexts with no major issue, it seems like anything past 6 Grails applications (applications much like their Java counterparts) starts regularly causing the dreaded PermGen space issue.
The PermGen currently allocated is 536 MB, and the delivery team obviously suggests either using a separate instance for the new applications or increasing the allocated memory; at the same time they are urging us to verify how these few apps are saturating the memory.
Our impression is that this is normal with Grails apps, but not having any senior Grails developer we have no way to confirm it from experience or better knowledge.
Is 536 MB too little PermGen space for 8 "regular" Grails webapps?
Update:
To specify what I mean by "regular": these are all front-end/back-end pairs for different services, where the front-end has little more than a list of requests and a wizard to go from zero to a completed request; it validates data, persists it, calls a webservice to get a protocol number, and in a couple of cases calls an external payment gateway.
The back-end is used to manage requests and performs similar operations.
Every app has maybe around 20 entities with respective controllers, services, and views, and on top of that we have a few classes to handle security w/ Spring Security and an external infrastructure.
That's how it is. You have basically two options.
Migrate to Java 8 (see http://www.infoq.com/articles/Java-PERMGEN-Removed)
Increase the PermGen space further.
And a quick background: unlike regular Java with Spring, Groovy and Grails generate quite a lot of classes at runtime (GSPs being one example). Groovy itself also generates a huge number of classes - each closure is a class. All of this puts pressure on the PermGen.
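A tiny illustration of the "each closure is a class" point; the synthetic class names below are roughly what Groovy generates, though the exact naming varies by version:

class Foo {
    def bar() {
        // Each closure literal compiles to its own synthetic class, loaded into PermGen,
        // e.g. something like Foo$_bar_closure1 and Foo$_bar_closure2.
        [1, 2, 3].each { println it }
        [1, 2, 3].collect { it * 2 }
    }
}

Multiply that by every service, controller, taglib, and GSP across eight applications and the loaded-class count climbs quickly.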
To ease off the pressure, get rid of all unnecessary plugins, consolidate GSPs, rethink closures, use AOP only when absolutely needed, etc.
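If you stay on Java 7 and go with the second option, the increase is just a JVM setting on the Tomcat instances. A sketch, assuming a standard Tomcat layout; the 768m figure is arbitrary and should be tuned to your apps, and enabling class unloading under CMS helps redeploys give PermGen back:

# $CATALINA_HOME/bin/setenv.sh
export CATALINA_OPTS="$CATALINA_OPTS -XX:MaxPermSize=768m \
    -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled"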
We used to have similar problems, so our team started using one Tomcat per app. We also separated credentials for security purposes. Now it's easier to manage them, monitor logs, and make periodic updates.
Hint: it's easier (imho) and cleaner to train your admin to create users and home dirs with Tomcat instances and just provide credentials.

Ruby on Rails on a few servers

I have a big application. One part of it is high-load processing of user files. I decided to provide one dedicated server for this. It will run nginx for serving content and some programs (non-Rails) for processing the files.
I have two questions:
What is better to use on this server? (Rails or something else, maybe Sinatra)
If I use Rails, how do I deploy? I can't find any instructions. If I have one app and two servers, how do I deploy it and delegate tasks between them?
P.S. I need to authenticate users on both servers. In Rails I use Devise.
You can use Rails for this. If both servers will act as a web client to the end user then you'll need some sort of load balancer in front of the two servers. HAProxy does a great job on this.
As far as getting the two applications to communicate with each other, this will be less trivial than you may think. What you should do is use a locking mechanism when performing the tasks. Delayed_job by default will lock a job in the queue so that other workers will not try to work on the same job. You can use callbacks from ActiveJob to notify the user via web sockets whenever their job is completed.
Anything that will take time or calling an external API should usually be placed into a background processing queue so that you're not holding up the user.
If you cannot spin up more than the two servers, you should make one of them the master or at least have some clear roles of the two servers. For example, one server may be your background processing and memcache server while the other is storing your database and handles your web sockets.
There are a lot of different ways of configuring the services and anything including and beyond what I've mentioned is opinionated.
Having separate servers for handling tasks is my preference as it makes them easier to manage from a Sys Admin perspective. For example, if we find that our web sockets server is hammered, we can simply spin up a few more web socket servers and throw them into a load balancer pool. The end user would not be negatively impacted from your networking changes. Whereas, if you have your servers performing dual roles outside of your standard Rails installation, you may find yourself cloning and wasting resources. Each of my web servers usually also perform background tasks on low-intermediate priority queues while a dedicated server is left for handling mission critical jobs.

Rails best practice: background process/thread?

I'm coming from a PHP environment (at least in terms of web dev) and into the beautiful world of Ruby, so I may have some dumb questions. I imagine there are some fundamentally different options available when not using PHP.
In PHP, we use memcache to store alerts we want to display in a bar along the top of the page. When something happens that generates an alert (such as a new blog post being made), a cron script that runs once every 5 minutes or so puts that information into memcache.
Now when a user visits the site, we look in memcache to find any alerts that they haven't already dismissed and we display them.
What I'm guessing I can do differently in Rails, is to by-pass the need for a cron script, and also the need to look in memcache on every request, by using a Singleton and a polling process running in a separate thread to copy from memcache to this singleton. This would, in theory, be more optimized than checking memcache once-per-request and also encapsulate the polling logic into one place, rather than being split between a cron task and the lookup logic.
My question is: are there any caveats to having some sort of runloop in the background while a Rails app is running? I understand the implications of multithreading, from Objective-C/Java, but I'm asking specifically about the Rails (3) environment.
Basically something like:
require 'singleton'

class SiteAlertsMap < Hash
  include Singleton

  def initialize
    super
    begin_polling
  end

  # ... SNIP, any specific methods etc ...

  private

  def begin_polling
    # Create some other Thread here, which polls at set intervals
  end
end
This leads me into a similar question. We push (encrypted) tasks onto an SQS queue, for things related to e-commerce and for long-running background tasks. We don't use cron for this; rather, we have a worker daemon written in PHP, which runs in the background. Right now when we deploy, we have to shut down this worker and start it again from the new code-base. In Rails, could I somehow have this process start and stop with the Rails server (unicorn) itself? I don't think that's something I'd want running in a separate thread on the main process, since we often want to control it as a process by itself, but it would be nice if it just conveniently ran when the web application was running.
Threading for background processes in Ruby would be a terrible mistake, especially since you're using a multi-process server. Using unicorn with, say, 4 worker processes would mean that you'd be polling from each of them, which is not what you want. Ruby doesn't really have real threads: it has green threads in 1.8 and a global interpreter lock in 1.9, IIRC. Many gems and libraries are also obnoxiously thread-unsafe.
Using memcache is still your best option and, if you have it set up correctly, you should only see it adding a millisecond or two to the request time. Another option which would give you the benefit of persisting these alerts while incurring minimal additional overhead would be to store these alerts in redis. This would better protect you against things like memcache crashing or server reboots.
For the background jobs you should use a similar approach to what you have now, but there are several off the shelf handlers for this like resque, delayed_job, and a few others. If you absolutely have to use SQS as the backend queue, you might be able to find some code to help you, but otherwise you could write it yourself. This still requires the other daemon to be rebooted whenever there is a code change. In practice this isn't a huge concern as best practices dictate using a deployment system like capistrano where a rule can easily be added to bounce the daemon on deploy. I use monit to watch the daemon process, so restarting it is as easy as telling monit to restart it.
In general, Ruby is not like Java/Objective-C when it comes to threads. It follows the more Unix-like model of process-based isolation, but the community has come up with best practices and ways to make this less painful than in other languages. Ruby does require a bit more attention when setting up its stack, as it is not as simple as enabling mod_php and copying some files around, but once the choices and architecture are understood, it is easier to reason about how your application works. The process model, in my opinion, is much better for web apps as it isolates code and state from the effects of other running operations. The isolation also makes the app easier to work with in a distributed system.
