Manually reset Windows Service Failure Count

I configured my Windows service recovery options as follows:
First failure: Restart the Service
Second failure: Restart the Service
Subsequent failures: Take No Action
Reset fail count after: 1 day
Restart service after: 1 minute
Now I would like to reset the failure count after the third failure, so that when an admin manually restarts the service, it can be recovered automatically again.
So far, the only way I have found to reset the counter is uninstalling and reinstalling the service, which I find unclean. Any idea how to do this?

The service control manager counts the number of times each service has failed since the system booted. The count is reset to 0 if the service has not failed for dwResetPeriod seconds.
Calling ChangeServiceConfig2 with the dwResetPeriod member of SERVICE_FAILURE_ACTIONS set to 0 will reset the count.
You will need to query the original value of dwResetPeriod (see QueryServiceConfig2), set it to zero, then restore the original value to preserve the configuration.
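The same query / zero / restore dance can be done from an elevated command prompt with sc.exe, which drives the same SCM APIs. This is a sketch: MyService is a placeholder, reset= takes seconds, actions= must be re-specified whenever reset= is given, and omitting the third action/delay pair expresses "Take No Action":

```
rem Show the current failure configuration so it can be restored afterwards
sc qfailure MyService

rem reset= 0 makes the SCM consider the failure count expired immediately
sc failure MyService reset= 0 actions= restart/60000/restart/60000

rem Restore the original reset period (1 day = 86400 seconds)
sc failure MyService reset= 86400 actions= restart/60000/restart/60000
```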

AWS ECS Fargate CannotPullContainerError: ref pull has been retried failed to copy: httpReadSeeker: failed open: unexpected status code

From the ECS console I started to see this issue.
I think I understand the cause pretty well. ECS pulls these images as an anonymous user, since I haven't provided any credentials, and since the task was set to run every 5 minutes it was hitting Docker Hub's anonymous pull rate limit. No big deal: I set the task to run every 10 minutes and for now the problem is solved.
What drives me nuts is:
From the CloudWatch console you can see that the task was executed: if I graph the number of executions I see a data point every 5 minutes. This seems wrong to me, because in order to execute the task ECS first needs to pull an image, and it can't, therefore there is no execution.
I can't find the error (CannotPullContainerError: ref pull has been retried 1 time(s): failed to copy: httpReadSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/amazon/aws-for-fluent-bit/manifests/sha256:f134723f733ba4f3489c742dd8cbd0ade7eba46c75d574...) in any CloudWatch log stream.
I need to be able to monitor for this error. I only found it by chance while looking at the console. How can I monitor for this?
FYI, I use Datadog for log ingestion too. Maybe that can help?
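One way to monitor this, as a sketch (it assumes EventBridge is acceptable and that a target such as an SNS topic is wired up separately): ECS publishes Task State Change events whose detail includes the task's stoppedReason, so a rule with an event pattern like the following should match tasks stopped by this pull error:

```json
{
  "source": ["aws.ecs"],
  "detail-type": ["ECS Task State Change"],
  "detail": {
    "lastStatus": ["STOPPED"],
    "stoppedReason": [{ "prefix": "CannotPullContainerError" }]
  }
}
```

From there you can alarm on the rule's invocation metric, or forward the events into Datadog and build a monitor on them.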

In what cases does Google Cloud Run respond with "The request failed because the HTTP connection to the instance had an error."?

We've been running Google Cloud Run for a little over a month now and noticed that we periodically have cloud run instances that simply fail with:
The request failed because the HTTP connection to the instance had an error.
This message is nearly always* preceded by the following message (those are the only messages in the log):
This request caused a new container instance to be started and may thus take longer and use more CPU than a typical request.
* I cannot find, nor recall, a case where that isn't true, but I have not done an exhaustive search.
A few things that may be of importance:
Our concurrency level is set to 1 because our requests can take up to the maximum amount of memory available, 2GB.
We have received errors that we've exceeded the maximum memory, but we've dialed back our usage to obviate that issue.
This message appears to occur shortly after 30 seconds (e.g., 32, 35) and our timeout is set to 75 seconds.
In my case, this error was always thrown 120 seconds after receiving the request. I figured out that Node 12's default request timeout is 120 seconds. So if you are running a Node server, you can either change the default timeout or upgrade to Node 13, which removed the default timeout: https://github.com/nodejs/node/pull/27558.
If your logs didn't catch anything useful, the instance is most likely crashing because it runs CPU-heavy tasks. A mention of this can be found on the Google Issue Tracker:
A common cause for 503 errors on Cloud Run would be when requests use a lot of CPU and, as the container is out of resources, it is unable to process some requests.
For me, the issue was resolved by changing "FROM node:13.10.1 AS build" to "FROM node:14.10.1 AS build" in the Dockerfile, i.e. by upgrading Node.

Incomplete GCLOUD SQL UPDATE operation upon STORAGE FULL is preventing any further operations including START

I stopped the SQL service before realizing storage was full.
The UPDATE operation triggered during the stop did not complete cleanly.
Now any further operations, including START and attempting to change storage size give only the error:
"Operation failed because another operation was already in progress."
Tried from both web cloud console and gcloud command line. Same error on both.
How can I clear this incomplete UPDATE OPERATION so I can then increase storage size and start the SQL server?
It's not a perfect solution, but the stuck operation does apparently complete on its own after 2 to 3 hours. This repeated, with each operation taking 2 to 3 hours, until the operation that increases the storage size went through; that one again took 2 to 3 hours to complete, and then the interface worked normally again.
Posting this in case someone else runs into the same problem. There may still be a better solution, but giving it a lot of time does seem to work.
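While waiting, the pending operation can at least be watched from the CLI rather than by retrying the console. A sketch, with the instance name as a placeholder:

```
# See which operation is still RUNNING and blocking everything else
gcloud sql operations list --instance=my-instance --limit=5

# Block until a specific operation finishes (ID taken from the list output)
gcloud sql operations wait OPERATION_ID --timeout=unlimited
```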

Error with exit value: {function_clause,[{inet,tcp_close,[[]]},{}]} in my server when a client runs for more than 5 minutes

I spawn a server process each time a client connects. But if a client stays connected to the server for a considerable time, i.e. interacts with it repeatedly, I regularly get *"Error in process <0.111.0> with exit value: {function_clause,[{inet,tcp_close,[[]]},{run_server,function,8}]}"*. I think it comes from the inet options... can anyone give me an idea how to overcome this error?
You are passing an empty list to inet:tcp_close/1 instead of a port. A function_clause exit means no clause of the function matched its argument; here the argument was [] rather than a socket, so some code path in run_server is reaching the close call without ever having bound a real socket.
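A hedged sketch of a defensive fix, in the question's own language (handle_close and the clause shapes are hypothetical, not taken from the poster's code):

```erlang
%% Only close when we actually hold a socket; calling close on the
%% initial value [] is exactly the function_clause crash seen above.
handle_close(Socket) when is_port(Socket) ->
    gen_tcp:close(Socket);
handle_close(_NoSocket) ->
    %% e.g. this code path never accepted a connection
    ok.
```

The real fix is usually upstream: find where the variable is initialized to [] and make sure that value can never flow into the close call.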

IMAP Idle Timeout

Let's say I am using IMAP IDLE to monitor changes in a mail folder.
RFC 2177 allows servers to treat an IDLE connection as inactive after 30 minutes, so it is recommended to pick a lower interval, say 20 minutes, then cancel the IDLE and restart it.
I am wondering what would happen if the mail contents changed between the old IDLE being cancelled and the new IDLE being created: an email could potentially be missed. Given that RECENT is a bit vague, this could mean having to fetch the message list each time the old IDLE ends and a new one starts.
But that is almost the same as polling every 20 minutes, and defeats some of the benefit of IDLE.
Alternatively, a new IDLE session could be started prior to terminating the expiring one.
In any case, I suspect this problem has already been solved, so here I am asking for recommendations.
Thanks,
Paul
As you know, the purpose of the IMAP IDLE command (RFC 2177) is to make it possible for the server to transmit status updates to the client in real time. In this context, status updates mean untagged IMAP server responses such as EXISTS, RECENT, FETCH or EXPUNGE that are sent when new messages arrive, message status is updated or a message is removed.
However, these IMAP status updates can be returned by any IMAP command, not just IDLE - for example, the NOOP command (see RFC 3501 section 6.1.2) can be used to poll for server updates as well (it predates IDLE). IDLE only makes it possible to get these updates more efficiently: if you don't use the IDLE command, server updates will simply be sent when the client executes another command (or, in some cases, even when no command is in progress) - see RFC 3501 sections 5.2 and 5.3 for details.
This means that if a message is changed between the IDLE canceling and the new IDLE command, the status updates should not be lost, just as they are not lost if you never used IDLE in the first place (and use NOOP every few seconds instead, for example) - they should simply be sent after the new IDLE command is started.
Another approach would be to remember the highest UID of the folder being monitored. Whenever you think there is a chance that you missed an update, do a UID search for everything above that value, i.e. UID <last_seen+1>:*.
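That search can be sketched with Python's standard imaplib; fetch_new_uids is a hypothetical helper and the mailbox handling around it is up to you:

```python
import imaplib

def uid_search_criterion(last_seen_uid: int) -> str:
    """Build a UID range covering everything after the last UID we processed.

    '*' denotes the highest UID currently in the mailbox, so the range is
    never syntactically empty; servers may still return last_seen_uid itself
    when last_seen_uid + 1 exceeds '*', hence the filter in the caller.
    """
    return f"UID {last_seen_uid + 1}:*"

def fetch_new_uids(conn: imaplib.IMAP4, mailbox: str, last_seen_uid: int) -> list[int]:
    """Return UIDs that arrived while we were between IDLE sessions."""
    conn.select(mailbox, readonly=True)
    status, data = conn.uid("SEARCH", None, uid_search_criterion(last_seen_uid))
    if status != "OK":
        return []
    # data looks like [b'4711 4712']; drop UIDs we already know about
    return [int(u) for u in data[0].split() if int(u) > last_seen_uid]
```

Run this once right after each re-issued IDLE starts and no message can slip through the gap, since UIDs are assigned in strictly ascending order within a mailbox (for a given UIDVALIDITY).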
