Restarting a Windows service from within itself

I have a Windows service that runs some heavy image processing whenever a user sends data to it. If more than one item arrives, the items are queued and processed in order. However, sometimes the processing goes wrong and hangs forever; I'm not sure yet why that happens. When it does, I want the service to restart itself, so that the next item in the queue is picked up after the restart. My question is: is it a good idea to restart the service from within itself? Can you even do that, or is there another way to do it?
Sapna

As Oded said in his comment, if the service has hung, it can't restart itself. It would be best to figure out why it hangs and stop it from hanging altogether, but assuming that's not possible for some reason, I can think of two options.
If the image processing is done in a thread, and only that thread hangs, you might be able to have a separate "monitoring" thread that keeps checking whether the processing thread is still healthy and, if not, kills and restarts it. Or, if the whole service hangs, you could have a separate monitoring service that does the checking and restarting.
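A minimal sketch of the monitoring-thread idea, in Python for illustration only (the original service is presumably not written in Python). It assumes the processing code can call `beat()` whenever it makes progress; the names `Heartbeat`, `watch`, and `on_stall` are hypothetical, not part of any library.

```python
import threading
import time


class Heartbeat:
    """Shared timestamp that the worker refreshes while it makes progress."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()
        self._last = clock()

    def beat(self):
        # Called by the processing thread after each unit of work
        with self._lock:
            self._last = self._clock()

    def stalled(self, timeout):
        # True if no beat has been recorded for more than `timeout` seconds
        with self._lock:
            return self._clock() - self._last > timeout


def watch(heartbeat, timeout, on_stall, stop, poll=1.0):
    """Monitoring loop: call on_stall() once if the heartbeat goes quiet."""
    while not stop.wait(poll):
        if heartbeat.stalled(timeout):
            on_stall()
            return
```

The loop would typically run in its own thread, for example `threading.Thread(target=watch, args=(hb, 30, restart_worker, stop), daemon=True)`, where `restart_worker` is a hypothetical callback. Python cannot forcibly kill a thread, so in this sketch the callback would more likely terminate a worker subprocess, or exit the whole process so that the service's recovery options (if they are configured to restart on failure) bring it back up.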

You have three tasks:
1. Detect that the service is stuck. This can be done in different ways; the first one to try would be timeouts.
2. Restart the service. This can be done by a separate monitoring service or by another thread of the same service.
3. Hand the task queue over between service instances. You need to serialize your task queue to disk so that when the service is restarted it can continue handling the queue.
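As a rough illustration of the third task, here is a sketch of a queue that is rewritten to disk on every change, so a restarted process can reload it. It is written in Python for illustration; the class name `DiskQueue`, the JSON file format, and the `peek`/`done` split are assumptions, not something the answer prescribes.

```python
import json
import os
import tempfile


class DiskQueue:
    """FIFO queue persisted as a JSON file so it survives a restart."""

    def __init__(self, path):
        self.path = path
        self.items = []
        if os.path.exists(path):
            # Reload whatever the previous service instance left behind
            with open(path, "r", encoding="utf-8") as f:
                self.items = json.load(f)

    def _save(self):
        # Write to a temp file in the same directory, then swap it in,
        # so a crash mid-write cannot leave a half-written queue file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.items, f)
        os.replace(tmp, self.path)

    def put(self, task):
        self.items.append(task)
        self._save()

    def peek(self):
        # Look at the head task without removing it
        return self.items[0] if self.items else None

    def done(self):
        # Remove the head task only after it has been processed
        self.items.pop(0)
        self._save()
```

Because a task is removed only by `done()`, a task that hung is still at the head after a restart. If the goal is to move on to the next item, the service would need to drop or set aside that head task on startup, for instance after counting failed attempts.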


Delay on requests from Google API Gateway to Cloud Run

I'm currently seeing delays of 2-3 seconds on my first requests coming into our APIs.
We've set the min instances to 1 to prevent cold starts, but the delay is still occurring.
If I check the metrics I don't see any startup latencies in the specified timeframe, so I have no insight into what is causing these delays. Tracing gives the following:
The only thing I can change is switching to "CPU is always allocated", but this isn't helping in any way.
Can somebody give more information on this?
As mentioned in this answer:
As per the docs:
Idle instances
As traffic fluctuates, Cloud Run attempts to reduce the
chance of cold starts by keeping some idle instances around to handle
spikes in traffic. For example, when a container instance has finished
handling requests, it might remain idle for a period of time in case
another request needs to be handled.
But, Cloud Run will terminate unused containers after some
time if no requests need to be handled. This means a cold start can
still occur. Container instances are scaled as needed, and it will
initialize the execution environment completely. While you can keep
idle instances permanently available using the min-instance setting,
this incurs cost even when the service is not actively serving
requests.
So, let’s say you want to minimize both cost and response time latency
during a possible cold start. You don’t want to set a minimum number
of idle instances, but you also know any additional computation needed
upon container startup before it can start listening to requests means
longer load times and latency.
Cloud Run container startup
There are a few tricks you can do to
optimize your service for container startup times. The goal here is to
minimize the latency that delays a container instance from serving
requests. But first, let’s review the Cloud Run container startup
routine.
1. Starting the service
   a. Starting the container
   b. Running the entrypoint command to start your server
   c. Checking for the open service port
You want to tune your service to minimize the time needed for step 1a.
Let’s walk through 3 ways to optimize your service for Cloud Run
response times.
1. Create a leaner service
2. Use a leaner base image
3. Use global variables
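To make the third tip concrete, here is a small Python sketch of keeping an expensive object in a global variable so it is built once per container instance and reused across requests. `load_model` and `handle_request` are hypothetical stand-ins, not Cloud Run APIs, and the sketch is not thread-safe.

```python
import time

_model = None  # module-level cache, shared by all requests on this instance


def load_model():
    """Hypothetical stand-in for expensive setup (client, model, config)."""
    time.sleep(0.05)
    return {"loaded_at": time.time()}


def get_model():
    # Lazy initialization: pay the cost on first use, then reuse the object
    global _model
    if _model is None:
        _model = load_model()
    return _model


def handle_request(payload):
    model = get_model()
    return {"echo": payload, "model_loaded_at": model["loaded_at"]}
```

Note that lazy initialization moves the setup cost from container startup to the first request that needs the object, so it may show up as a slow first request even when no cold start is reported. Calling `get_model()` at import time instead would move that cost back into startup.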
As mentioned in the documentation:
Background activity is anything that happens after your HTTP response
has been delivered. To determine whether there is background activity
in your service that is not readily apparent, check your logs for
anything that is logged after the entry for the HTTP request.
Avoid background activities if CPU is allocated only during request processing
If you need to set your service to allocate CPU only during request
processing, when the Cloud Run service finishes handling a
request, the container instance's access to CPU will be disabled or
severely limited. You should not start background threads or routines
that run outside the scope of the request handlers if you use this
type of CPU allocation. Review your code to make sure all asynchronous
operations finish before you deliver your response.
Running background threads with this kind of CPU allocation can create
unpredictable behavior because any subsequent request to the same
container instance resumes any suspended background activity.
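A minimal Python sketch of the "finish before you respond" advice quoted above: the handler starts a side task but waits for it before returning. The names `write_audit` and `handle_with_side_task` are hypothetical, and the thread pool stands in for whatever asynchronous mechanism the service actually uses.

```python
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=4)
audit_log = []


def write_audit(entry):
    """Hypothetical side task that would otherwise run after the response."""
    audit_log.append(entry)


def handle_with_side_task(payload):
    future = _executor.submit(write_audit, payload)
    response = {"status": "ok", "echo": payload}
    # Block until the side task is done, so no work is left running
    # once the response is delivered and CPU may be throttled.
    future.result(timeout=10)
    return response
```

The trade-off is that the side task's duration is added to the request latency; work that genuinely has to outlive the request would need the always-allocated CPU setting instead.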
As mentioned in this thread, the reason could be that all the operations you perform happen after the response is sent.
According to the docs, CPU is allocated only during request processing by default, so the only thing you have to change is to enable CPU allocation for background activities.
You can refer to the documentation for more information on the steps to optimize Cloud Run response times.
You can also have a look at the blog post on using Google API Gateway with Cloud Run.

Close applications safely from a service

I have 2 programs in Delphi - a service and some child processes that may run in any user session (these start when the service starts and should be closed when the service stops).
When the service stops I have to close the child applications safely, so that they get their FormClose/FormDestroy events.
The service cannot use desktop communication, so it cannot send window messages such as WM_CLOSE to those processes.
Calling TerminateProcess does not make the FormClose/FormDestroy events fire in my child processes.
So, what method of child process termination can be used here?
Currently, the only idea we have is to run taskkill.exe /im process.exe in each user session; it somehow makes the killed process run FormClose/FormDestroy. How does it work? Just by sending WM_CLOSE?
The best solution is some simple IPC. In this case, all you really need is a global manual-reset event object, as IInspectable already suggested.
However, if you aren't allowed to do it the right way, you could instead launch another child process to send window messages to the application(s) you want to close.

Sidekiq not adding jobs to queue

Some time ago I wrote a small Ruby application which uses Sidekiq to convert video files and push them on to a few online video hosting services. I use two workers and queues, one to actually convert the files and a second to publish the converted files. Jobs are pushed to the first queue by a Rails application for conversion, and after successful processing the conversion worker pushes an upload job to the second queue.
Rails -> Converter Queue -> Uploader Queue
Recently I discovered a massive memory leak in the converter library, which appears after every few jobs and overloads the whole server, so I did a little hack to avoid this: I stop the whole Sidekiq worker process using an Interrupt exception and let systemd start it again.
It worked perfectly until yesterday, when I got a notification from my client that files were not being converted. I investigated to find out what was failing and found that jobs are not being added to the Converter queue. It started failing without any changes in code or services. When Rails adds a job to the Sidekiq queue it receives a proper job ID, with no exception or warning at all, but the job simply does not appear in any queue. I checked the Redis logs, systemd logs, dmesg, every log that I could check, and did not find even the slightest warning; it seems that jobs get lost in a vacuum :/ In fact, after more digging and debugging I discovered that if one job is pushed rapidly (100 times in a loop), there is a chance that Sidekiq will add the job to the queue. Of course, sometimes it adds all the jobs, and sometimes not even a single one.
The second queue works perfectly: it picks up every single job that I add to it. When I try to add 1000 new jobs, the second queue queues them all, while the Converter queue gets 10 jobs at best. Things get really weird when I try to use another queue. I pushed 100 jobs to a new queue, all of them were added properly, and then I instructed the conversion worker to use that new queue. And it works: I can add new jobs to that queue and it seems that all of them are pushed successfully. But once the worker finishes processing all the jobs that were pushed before it was assigned to this queue, it starts failing again. Disabling the code that restarts the worker after every job didn't help at all.
The funny thing is that jobs are in fact pushed to the queue, but only when I push them multiple times, and it seems totally random whether a job is added properly. This bug appeared from nowhere: for a few months things worked perfectly, and recently it started failing without any changes in code or server. The logs are perfectly clean, and Sidekiq is used with the same Redis server without any problems by a few other applications, so it seems that only this particular worker has this problem. I did not find any references to a similar bug on the web. I spent two days trying to debug this and find the source of this weird behavior, and I found nothing; everything seems to work perfectly and jobs are simply disappearing somewhere between the push and the Redis database.

Rails - passing data from background job to main thread

In my app I am using a background job system (Sidekiq) to manage some heavy jobs that should not block the UI.
I would like to transmit data from the background job to the main thread when the job is finished, e.g. the status of the job or the data produced by the job.
At the moment I use Redis as middleware between the main thread and the background jobs. It stores the data, status, etc. of the background jobs so the main thread can read what is happening behind the scenes.
My question is: is this a good practice for managing data between the scheduled job and the main thread (using Redis or a key-value cache)? Are there other approaches? Which is best, and why?
Redis pub/sub is the thing you are looking for.
You just subscribe the main thread to a channel using the SUBSCRIBE command, and the worker announces the job status on that channel using the PUBLISH command.
As you already have Redis inside your environment, you don't need anything else to start.
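A hedged sketch of that flow, written in Python for illustration rather than Ruby. It assumes a client object shaped like the redis-py client (for example `redis.Redis()`), which offers `publish()` and `pubsub()`; the channel naming scheme and the function names are hypothetical.

```python
import json


def channel_for(job_id):
    return f"jobs:{job_id}:status"  # hypothetical naming scheme


def publish_status(client, job_id, status):
    """Worker side: announce the job's status on its channel."""
    payload = json.dumps({"job_id": job_id, "status": status})
    return client.publish(channel_for(job_id), payload)


def subscribe_to_job(client, job_id):
    """Main-thread side: subscribe before the job is enqueued."""
    pubsub = client.pubsub()
    pubsub.subscribe(channel_for(job_id))
    return pubsub


def next_status(pubsub):
    """Block until the worker publishes a status, then return it."""
    for message in pubsub.listen():
        # Skip the subscribe confirmation; only real publishes count
        if message["type"] == "message":
            return json.loads(message["data"])
    return None
```

Pub/sub messages are not stored: a status published while nobody is subscribed is lost. That is why the sketch separates subscribing from waiting, so the caller can subscribe first, enqueue the job, and only then block on `next_status`. Keeping the status in a regular Redis key as well, as the question already does, covers the case where the subscriber was not listening.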
Here are two other options that I have used in the past:
Unix sockets. This was extremely fiddly, creating and closing connections was a nuisance, but it does work. Also dealing with cleaning up sockets and interacting with the file system is a bit involved. Would not recommend.
Standard RDBMS. This is very easy to implement, and made sense for my use case, since the heavy job was associated with a specific model, so the status of the process could be stored in columns on that table. It also means that you only have one store to worry about in terms of consistency.
I have used memcached as well, which does the same thing as Redis; here's a discussion comparing their features if you're interested. I found this to work well.
If Redis is working for you then I would stick with it. As far as I can see it is a reasonable solution to this problem. The only things that might cause issues are generating unique keys (probably not that hard), and also making sure that unused cache entries are cleaned up.
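For comparison, the RDBMS option described above (status stored in columns on the model's own table) can be sketched as follows. The example uses Python with an in-memory SQLite database purely so it is self-contained; the `reports` table and its column names are hypothetical, and a real application would use the database shared by the web and worker processes.

```python
import sqlite3


def create_schema(conn):
    # Status and result live on the same row as the model itself
    conn.execute(
        "CREATE TABLE reports ("
        " id INTEGER PRIMARY KEY,"
        " job_status TEXT NOT NULL DEFAULT 'pending',"
        " job_result TEXT)"
    )


def enqueue(conn):
    """Create the row the job will work on; returns its id."""
    with conn:
        cur = conn.execute("INSERT INTO reports DEFAULT VALUES")
    return cur.lastrowid


def finish(conn, report_id, result):
    """Worker side: record the outcome when the heavy job completes."""
    with conn:
        conn.execute(
            "UPDATE reports SET job_status = 'done', job_result = ? WHERE id = ?",
            (result, report_id),
        )


def job_state(conn, report_id):
    """Main-thread side: read the current status and result."""
    return conn.execute(
        "SELECT job_status, job_result FROM reports WHERE id = ?",
        (report_id,),
    ).fetchone()
```

The main thread polls `job_state` instead of being notified, which is simpler than pub/sub but adds a polling delay.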

Getting a worker to change queues for delayed_jobs

Ok.. I have an issue with worker management.
I have 30/n clients..
Each with a pile of things/jobs to get done.
Each with their own "schema" in PostgreSQL.
None should ever block another..
So my thinking was to have a queue for each..
But then I have the problem how do I handle queues.. A worker for say 10 queues would have the same problem.. He'd not get to client 2 if he was working on client 1's stuff.
A worker for each queue.. Okay.. that's expensive.. We could create a worker per client and leave it running at all times (we got piles of dough, just like everyone else.)
So workers on the fly seems the best option.
But then we've got the build up and tear down issue, and scheduling is also a right pain.
A suggestion was put to have a single worker doing nothing but starting and stopping workers.
The issue I have is that I don't want to have the build up and tear down..
So here's my thinking.. I might have a worker on Q_A who's only got one job left for client A.. Could I switch his queue.. make him work on Q_B stuff?
I was (for a second) thinking of switching the queue that a job was assigned to, to the existing worker's queue, but then new stuff for Q_A would be behind that..
Any ideas? Alternatives to switching a worker's queue would be much appreciated.
