How to use a scheduler (cron) container to execute commands in other containers - docker

I've spent a fair amount of time researching and I've not found a solution to my problem that I'm comfortable with. My app is working in a dockerized environment:
one container for the database;
one or more containers for the APP itself. Each container holds a specific version of the APP.
It's a multi-tenant application, so each client (or tenant) may be related to only one version at a time (migration should be handled per client, but that's not relevant).
The problem is I would like to have another container to handle scheduling jobs, like sending e-mails, processing some data, etc. The scheduler would then execute commands in the app's containers. Projects like Ofelia look very promising, but I would have to know ahead of time which container to execute the command in. That's not possible, because I need to query the database container to discover which version the client is on, and only then can I figure out which container the command should be executed in.
Is there a tool to help me here? Should I change the structure somehow? Any tips would be welcome.
Thanks.

So your question is that you want to get the APP's version info from the database container before scheduling jobs, right?
I think this is related to the business logic, not to the dockerized environment. You have a few ways to solve the problem:
Check the network: make sure the containers can connect to each other.
I think the database should support some RPC-style function that you can use to get the version data.
You can use a tool that supports remote execution, like SSH.
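A different route from the RPC/SSH suggestions above is to let the scheduler itself look up the tenant's version and then run `docker exec` against the matching container. This is a minimal sketch under several assumptions: a `tenants` table with an `app_version` column and containers named `app-<version>` are both hypothetical, an in-memory SQLite database stands in for the real database, and the job command is made up. Actually executing it requires the Docker CLI inside the scheduler container plus the host's Docker socket mounted into it, which effectively grants the scheduler root on the host.

```python
import sqlite3
import subprocess
from typing import List, Sequence

# Hypothetical naming convention: one container per APP version, e.g. "app-v2".
CONTAINER_TEMPLATE = "app-{version}"


def container_for_tenant(conn: sqlite3.Connection, tenant_id: int) -> str:
    """Look up the tenant's APP version and map it to a container name."""
    row = conn.execute(
        "SELECT app_version FROM tenants WHERE id = ?", (tenant_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"unknown tenant {tenant_id}")
    return CONTAINER_TEMPLATE.format(version=row[0])


def exec_argv(container: str, command: Sequence[str]) -> List[str]:
    """Build a `docker exec` invocation for the chosen container."""
    return ["docker", "exec", container, *command]


def run_tenant_job(conn, tenant_id, command, dry_run=False):
    """Resolve the tenant's container at run time, then execute the job there."""
    argv = exec_argv(container_for_tenant(conn, tenant_id), command)
    if not dry_run:
        # Needs the Docker CLI and access to the Docker socket in this container.
        subprocess.run(argv, check=True)
    return argv


if __name__ == "__main__":
    # In-memory stand-in for the real tenant table (hypothetical schema).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tenants (id INTEGER PRIMARY KEY, app_version TEXT)")
    db.executemany("INSERT INTO tenants VALUES (?, ?)", [(1, "v1"), (2, "v2")])
    # "./manage send-emails" is a hypothetical job command.
    print(run_tenant_job(db, 2, ["./manage", "send-emails"], dry_run=True))
```

Because the container is resolved when the job fires rather than in the schedule definition, a tenant that migrates to a new version is picked up on its next run without changing the schedule.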

Related

Should I use a Container/Service Fabric Guest Executable for a scheduled daily workload?

This is a more general question about which types of payloads to host in a Container. In our case we will use Service Fabric guest executables. For this post I will use the word Container to refer to both, because they have similar properties and I think more people will understand a container than an SF Guest Exe.
WebAPIs/Services that need to scale are a good fit for containers, but this question is about what we call a "Batch" job. The name comes from the old .bat files, but in our case we are using .NET Framework or .NET Core executables (console apps).
Currently Windows Task Scheduler kicks off the batch running under a service account on a VM. We want the processing to happen on a certain time of day or day of the week and not before or after. There is not any real scaling here. There is one instance which may or may not be multithreaded and on average they generally run between 2-15 minutes and then stop. Some run longer some run shorter. I understand there are limitations to this approach but this is the type of payload I'm discussing here.
As we modernize the Technology stack we are looking to use the Orchestrator as much as possible. As a technologist I've always tried to understand the different tools in our tool belts and not use a tool just because that's the one I used last, instead use the correct tool for the task.
We started by no longer writing .NET console apps. Instead we put the business logic of these "batches" into Web APIs and had the task scheduler call the API whenever it needed to perform its action. If I put this into Service Fabric and host it, my concern is that system resources are consumed for the 23 hours and 45 minutes a day when they are not being used. That seems to be the opposite of what you would expect when using a container.
Now if I could spin up a Service Fabric Guest Exe/Container on demand and destroy the instance of the app after it finishes, that could fit the need. Then I would have the benefits of the orchestrator without the detriment of consuming resources all the time. I would hope to retire the Batch Server (VM), since its hardware usage is not optimized, and instead add resources to the cluster.
UPDATE
Looking at Vaclav's scalability doc, I think there might be a use case here: https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-concepts-scalability He uses a "Workload Manager Service" combined with CreateServiceAsync to spin up an instance of the service on demand. I guess I would deploy the app to the image store but not create an instance of the app until needed. Then I need to figure out how to end it: is it as simple as changing the infinite loop in Program.cs? The thing is, there doesn't seem to be a Program.cs in a Guest Executable.
This looks like a way to run a package until completion, which was released as part of 7.1. But how do we start a second execution of the service? I want to execute based on an incoming request.
https://learn.microsoft.com/en-us/azure/service-fabric/run-to-completion
Thoughts?

Wait after Docker Compose cluster is up

I'm using Docker Compose for integration tests. After starting a cluster with Docker Compose I need to wait for some time until the application and its cache is up. I see ways to make one container wait before starting another but is there a way to make the whole set up wait?
Thank you in advance!
You need to work out what you mean by the "cluster is up". Docker doesn't really care too much about what the application inside each container is doing, so long as it doesn't terminate.
If you need to wait for some state transitions inside the containers, you'll need to manage this at the application level: for instance, you could write to a file on a filesystem exposed from the container, or HTTP POST a message somewhere. Then pick up that signal and use it to start your integration tests. I'd strongly consider reusing whatever you use for your monitoring infrastructure, as this is effectively the same problem.
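One way to apply the application-level approach described in this answer is to poll a readiness signal from the test harness until every service reports ready, and only then start the tests. This is a minimal sketch that assumes each service exposes an HTTP endpoint returning 2xx once it (and its cache) is warm; the endpoints are hypothetical, and `clock`/`sleep` are injectable only so the loop can be tested without real waiting.

```python
import http.client
import time
import urllib.request
from typing import Callable, Iterable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if `url` answers with a 2xx status, False on any connection/HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_ready(
    checks: Iterable[Callable[[], bool]],
    timeout: float = 120.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until every check returns True, or raise TimeoutError after `timeout` seconds."""
    deadline = clock() + timeout
    pending = list(checks)
    while True:
        # Re-run only the checks that have not passed yet.
        pending = [check for check in pending if not check()]
        if not pending:
            return
        if clock() >= deadline:
            raise TimeoutError(f"{len(pending)} check(s) still failing after {timeout}s")
        sleep(interval)
```

In a test setup this might be called right after `docker compose up -d`, e.g. `wait_until_ready([partial(http_ok, "http://localhost:8080/health")])` with `functools.partial` and whatever readiness URL your application actually exposes.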

How does Celery discover new Nodes?

I'm running Celery, RabbitMQ and Gunicorn in Docker.
My question is this: I understand that Celery is designed for distributed processing. What I have seen no docs on at all is, assuming that I have several machines/nodes on the same LAN, how do they discover each other? Does RabbitMQ play a role? Do Celery instances somehow discover each other? Is there a list of suitable hosts somewhere? If so, how do I edit it?
Also, assuming I'm going to use only one node to handle the HTTP requests, do I still need to have gunicorn running on all nodes? I ask this because in the gunicorn start command, it has a setting for the number of workers. And, is this setting applicable only to that node, or as a max total for all connected nodes?
EDIT:
After the first answer, I started working on this. It seems that I need some sort of networking setup, either swarm or bridging etc. I should clarify that I'm using docker-compose to bring up the solution, and I see that a normal swarm setup doesn't work, and I have to use something slightly different if I go that route.
To be clear: I need a way in which I can add celery workers on separate hosts and have them be able to communicate with the "main" host so that I can increase the capacity of the system. If someone could provide a clear process for achieving this or a link to such, it'd be most helpful.
I hope I've expressed this clearly, please let me know if you need any further info.
Thanks!
I feel like @ffledgling didn't fully answer the question, so I am adding a note:
Here is a list of all events sent by the worker to the broker (in your case RabbitMq): http://docs.celeryproject.org/en/latest/userguide/monitoring.html#event-reference
As you can see, there are a few worker-related events:
worker-online
worker-heartbeat
worker-offline
All of them carry the worker's hostname as a signature. Therefore a successful "handshake" flow (not exactly a handshake, because the master doesn't respond with a message, but I'm using it as a metaphor here) may look like this:
new worker comes online --> worker sends a worker-online message to the queue --> master receives it and starts reading logs from the worker host --> master schedules tasks --> ...
Beyond that, the hostname is a standard body field in every event (both task events and worker events); here is the documentation: http://docs.celeryproject.org/en/latest/internals/protocol.html?highlight=event%20reference#standard-body-fields
For example, the task-started event also contains the hostname as a signature; this is how the master knows who picked up the task and where to read the task's log from.
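To make the event flow above concrete, here is a minimal sketch of a listener that keeps track of live workers purely from worker-online, worker-heartbeat and worker-offline events, keyed by their hostname field. It is plain Python operating on event dicts so it can run without a broker; the 60-second expiry is an arbitrary assumption, and Celery's own `app.events.State` already does this kind of bookkeeping for you.

```python
import time
from typing import Dict, List, Optional


class WorkerRegistry:
    """Track which workers are alive from Celery-style worker events.

    Each event is a dict with at least 'type' and 'hostname';
    'timestamp' is used when present, otherwise the local time.
    """

    def __init__(self, expiry: float = 60.0):
        self.expiry = expiry  # seconds without a heartbeat before a worker counts as gone
        self.last_seen: Dict[str, float] = {}

    def handle(self, event: dict) -> None:
        kind = event.get("type")
        host = event.get("hostname")
        if host is None:
            return
        if kind in ("worker-online", "worker-heartbeat"):
            self.last_seen[host] = event.get("timestamp", time.time())
        elif kind == "worker-offline":
            self.last_seen.pop(host, None)
        # task-* events carry 'hostname' too, identifying which worker took a task.

    def alive(self, now: Optional[float] = None) -> List[str]:
        """Hostnames seen within the expiry window, sorted for stable output."""
        now = time.time() if now is None else now
        return sorted(h for h, ts in self.last_seen.items() if now - ts <= self.expiry)
```

With Celery installed, the handler could be attached through the real-time event API, roughly `with app.connection() as conn: app.events.Receiver(conn, handlers={"*": registry.handle}).capture(limit=None, timeout=None, wakeup=True)`, where `app` is your Celery application.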
I understand that Celery is designed for distributed processing. What I have seen no docs on at all is, assuming that I have several machines/nodes on the same LAN, how do they discover each other? Does RabbitMQ play a role? Do celery instances somehow discover each other? Is there a list of suitable hosts somewhere? If so, how do I edit it?
Celery is a distributed task queue that works using a message brokering system such as RabbitMQ.
What essentially happens is that all Celery workers connect to a shared queue on a broker such as RabbitMQ. The master(s) dispatch work by pushing it onto the queue. Workers, which are also connected to the queue, pull work off it and attempt to execute it. Once a task is finished (successfully or otherwise), its result is stored in the configured result backend (which can itself be the broker), where the master(s) can then query it.
Given this architecture, you do not need to maintain a list of hosts; workers "auto-detect" work. You simply need to start them and make sure they can reach the queue.
A slightly more detailed explanation from another SO answer.
Link to the architecture with a diagram.
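The "workers just pull from a shared queue" architecture described above is the competing-consumers pattern. Below is a toy, in-process model of it, not Celery itself: `queue.Queue` stands in for RabbitMQ and threads stand in for Celery workers. The point it illustrates is that the producer only talks to the queue and never addresses a specific worker, so adding capacity means starting more consumers.

```python
import queue
import threading


def worker(name, tasks, results):
    """Pull jobs off the shared queue until a None sentinel arrives."""
    while True:
        job = tasks.get()
        if job is None:
            tasks.task_done()
            return
        func, args = job
        try:
            results.put((name, func(*args)))
        except Exception as exc:  # failures are reported back too, like a failed task
            results.put((name, exc))
        finally:
            tasks.task_done()


def run(jobs, num_workers=3):
    """Start `num_workers` consumers, enqueue `jobs`, and collect (worker, result) pairs."""
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}", tasks, results))
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    # The producer only puts jobs on the queue; whichever worker is free takes each one.
    for job in jobs:
        tasks.put(job)
    tasks.join()  # wait until every job has been processed
    for _ in threads:
        tasks.put(None)  # one shutdown sentinel per worker
    for t in threads:
        t.join()
    out = []
    while not results.empty():
        out.append(results.get())
    return out
```

In real Celery the equivalent of "pointing at the queue" is giving every worker host the same broker URL, for example `Celery("proj", broker="amqp://user:password@main-host:5672//")`, where the credentials and `main-host` are placeholders for your RabbitMQ host, which the other hosts must be able to reach over the network.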
Also, assuming I'm going to use only one node to handle the HTTP requests, do I still need to have gunicorn running on all nodes? I ask this because in the gunicorn start command, it has a setting for the number of workers. And, is this setting applicable only to that node, or as a max total for all connected nodes?
No, you do not need gunicorn running on all the nodes, just on the one you're using to serve HTTP requests via Python. Celery workers do not need gunicorn. The workers setting in gunicorn refers to the number of worker processes in the HTTP listener pool on that node. It is separate from, and unrelated to, the set of workers that Celery uses.
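To illustrate that the gunicorn worker count is a per-host setting, here is a sketch of a gunicorn config file (gunicorn config files are plain Python modules). It would live only on the HTTP node; the bind address and port are assumptions, and the formula is the starting point suggested in the gunicorn documentation, not a requirement.

```python
# gunicorn.conf.py: read only by the gunicorn process on the HTTP node,
# e.g. `gunicorn -c gunicorn.conf.py myproject.wsgi` (module path is hypothetical).
import multiprocessing

# Address the HTTP server listens on (assumed; adjust to your setup).
bind = "0.0.0.0:8000"

# Number of HTTP worker processes on *this host only*. The gunicorn docs suggest
# (2 x cores) + 1 as a starting point; other hosts are unaffected.
workers = multiprocessing.cpu_count() * 2 + 1
```

Celery's pool size is configured independently on each worker host, for instance with `celery -A proj worker --concurrency=4`, where `proj` stands for your Celery app module.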

Can I safely run multiple instances of the same Windows Service?

I have a windows service running on a server. It's a 'helper app' for a system we have and it does a very specific task (downloads html files based on the config) with a specific database configuration.
We're now developing a system that's very similar to the existing system (at least on the surface, where this service has an impact). I need a separate instance of the service running on the same server with a different database configuration, so it can do its task for the new system as well as for the existing one.
Can somebody tell me if it's going to cause problems if I install a second instance of the same service on the same box?
I intend to install the service from a different directory from where the original is installed.
It turns out I couldn't. I needed to give each instance of the service a unique name. Not a big deal though.
This won't be a problem at all as long as the program itself does not do things in a way that would cause the various instances to conflict - like trying to write to the same files at the same time, or the like. As long as each is configured/coded to keep to itself, it will be fine.
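Following the two answers above (unique service names, and instances that keep to themselves), a second instance could be registered with `sc.exe create` under its own name and pointed at its own configuration. This is a sketch that only builds the command line: the `HtmlDownloader` names, paths, and the `--config` argument are hypothetical, since it assumes the service executable accepts its configuration path as an argument.

```python
from typing import List


def sc_create_argv(base_name: str, instance: str, exe_path: str, config_path: str) -> List[str]:
    """Build an `sc.exe create` command for one named instance of the same executable.

    Assumes (hypothetically) that the service accepts `--config <path>`,
    so each instance gets its own database configuration and files.
    """
    service_name = f"{base_name}_{instance}"  # must be unique per instance
    bin_path = f'"{exe_path}" --config "{config_path}"'
    return [
        "sc.exe", "create", service_name,
        "binPath=", bin_path,  # sc expects "option=" followed by a separate value
        "DisplayName=", f"{base_name} ({instance})",
        "start=", "auto",
    ]
```

From an elevated prompt on the server, the resulting list could be passed to `subprocess.run(argv, check=True)`; the original instance stays registered under its existing name.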

How can I make some external code run before a Windows service starts

This probably sounds crazy, but it's a real problem: I have an ISV-provided Windows service that I cannot change. There's a bug in the service where it doesn't "clean up" some data that it should upon startup.
As a workaround, until the vendor can fix the bug, I would like to cause another process or script to always run just before this problem service starts.
For example: I could create a second "monitor" service that is tied to the problem service with a service dependency. The second service would perform this workaround/cleanup before the problem service is allowed to start. But that seems like a sledgehammer of a solution to a simple problem. Anyone else have ideas for a simpler solution?
The workaround code is trivial and could live, for example, in a PowerShell script.
Create a new service that does what you need, then force a dependency on it.
You should check out our Service Protector application which can run a pre-startup script before starting another service. It too may be overkill, but sometimes it is better to purchase a targeted utility rather than investing your programming time in a one-off/throw-away solution.
In any case, your solution of writing another service and enforcing a dependency should do the trick, provided that your new service does not report itself as "Started" until it has completed its cleanup work. Otherwise, Windows may start your real service too soon.
Good luck.
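For the dependency approach discussed above, the dependency itself can be registered with `sc.exe config <service> depend= ...`. One detail worth knowing is that `depend=` replaces the whole dependency list (entries separated by `/`), so any existing dependencies, as shown by `sc qc <service>`, have to be passed back in. This sketch only builds the command; `VendorSvc` and `CleanupSvc` are hypothetical service names, and the cleanup service must still delay reporting "Started" until its work is done.

```python
from typing import Iterable, List


def add_dependency_argv(service: str, existing: Iterable[str], new_dep: str) -> List[str]:
    """Build `sc.exe config <service> depend= A/B/...` without dropping existing dependencies.

    `depend=` overwrites the full list, so the current entries must be included.
    """
    deps = [d for d in existing if d]
    if new_dep not in deps:
        deps.append(new_dep)
    return ["sc.exe", "config", service, "depend=", "/".join(deps)]
```

For example, `add_dependency_argv("VendorSvc", ["Tcpip"], "CleanupSvc")` yields a command that makes the vendor service depend on both `Tcpip` and the cleanup service.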
