I can't quite explain the problem, because I don't fully understand it myself. I'd appreciate help defining, locating, and dealing with the issue.
The Setup
I have a Windows 10 VM that the tests run against, and a Jenkins VM (Windows Server 2008) that runs those tests.
I am using a testing app called JSystem. Unfortunately, it does not officially support Windows 10, because it uses Telnet to communicate with target SUTs (systems under test) and Telnet was removed from Windows 10. So I had to build a way to use WinRM to communicate with that type of VM.
The Problem
The gist of it is that at some point the test on Jenkins just 'freezes'. The connection is still in the 'established' state, and both VMs (host and client) are still working. It does not happen every time, and it may happen a few minutes after testing starts or after a couple of hours. The test that causes it is almost never the same, but naturally it happens when there's some form of communication between the SUT and the testing VM: a file transfer, or a simple command like "dir". It can happen while the command request is being sent, or while the result is being sent back.
More Information
I gathered some more information that might help.
I have not seen it happen when I run the test from my own development environment (that is, without Jenkins as a medium), though that may just be because I was unlucky and didn't try enough times. My own environment also runs Windows 10, but it is not a VM.
In Event Viewer on the SUT, there was a "Time-Service" warning (event ID 50, an NTPClient time sync issue) one minute after the freeze. The Jenkins VM logged no events at all. That said, this event repeats often on the SUT and does not always coincide with a freeze, but it could cause interference if it happens during a communication attempt between the VMs.
I can still connect to the SUT over WinRM from other clients just fine, and it responds normally.
Rather than being frozen, it looks more like the SUT is waiting for a request from Jenkins while Jenkins is waiting for a response from the SUT. The weird thing is that these tests normally have a timeout of 30 to 60 seconds, so they should not wait longer than that (unless configured otherwise in the test, of course) before failing the test step.
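One generic way to make sure a hung remote call cannot block a test step forever is to enforce the deadline on the caller's side, independently of whatever timeout the transport is supposed to honour. The sketch below is Python and is not JSystem's API; `call_with_deadline` is a hypothetical helper, and the WinRM call it would wrap is assumed. It only stops the *caller* from waiting: Python cannot kill the worker thread, so a stuck call keeps running in the background.

```python
import concurrent.futures
import time


def call_with_deadline(fn, timeout_s, *args, **kwargs):
    """Run fn(*args, **kwargs) in a worker thread and wait at most timeout_s seconds.

    Raises TimeoutError if the call has not finished in time. The worker thread is
    NOT cancelled (Python threads cannot be killed); the caller simply stops waiting.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"call did not finish within {timeout_s} s") from None
    finally:
        # Don't block on a hung call here; just release the executor.
        executor.shutdown(wait=False)


if __name__ == "__main__":
    # time.sleep(1.0) stands in for a remote command that doesn't answer in time (hypothetical).
    try:
        call_with_deadline(time.sleep, 0.2, 1.0)
    except TimeoutError as exc:
        print("test step failed:", exc)
```

If the hang really is a connection sitting in 'established' with no data flowing, a caller-side deadline like this at least turns a silent freeze into a failed step with a timestamp you can compare against the Time-Service events.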
I can't be sure whether this is related, but I do have time sync issues between the VMs. I've asked about how to solve that in another question, so if you think that's the cause, please let me know, especially if you have a solution.
What is a good way to approach this?
Related
Do server-less functions install modules every time they are called?
I am trying to understand how serverless functions really work. I understand that a serverful setup is basically a computer that executes code, and the server code runs on it 24/7 unless it is stopped for some reason. Serverless code, on the other hand, only runs when it is called. Where is this code stored? When I call a function in a serverless application, does it install the modules (for example, from npm) every time I call the function? Is this what causes cold-start delays?
I understand that serverful is like my computer running code. How can I describe serverless using the same analogy?
My Questions:
Do server-less functions install modules every time they are called?
If there is no server, where is this code stored in serverless?
I understand that serverful is like my computer running code. How can I describe serverless using the same analogy?
No, the dependencies are a part of the deployment artifact (e.g. a ZIP file or container image in the case of AWS Lambda), so they do not have to be installed on each invocation.
I understand that serverful is like my computer running code. How can I describe serverless using the same analogy?
This won't be a perfect explanation, but hopefully it fits your analogy. Imagine that your computer is asleep, but another computer receives requests and wakes yours up whenever a new one arrives, so the request can be handled on your computer. When it finishes, your computer goes back to sleep. Now imagine that instead of a single computer there are many of them, each of which can be woken in a matter of milliseconds. Hope that makes sense.
I have a Windows service running under Topshelf that makes a lot of SQL Server queries. When the host computer is restarted, my service almost always logs errors because SQL Server stops in the middle of one of its queries. I've been asked to fix this so the logs don't fill up with errors, since these computers are restarted frequently.
Topshelf has built-in WhenShutdown logic that runs when the computer is shut down or restarted, but there is still no guarantee that my service stops before SQL Server, and judging by the error frequency, SQL Server almost always stops first. I also tried Topshelf's WhenCustomCommandReceived to listen for the Windows PreShutdown event as shown here, but when I log every custom command received and then reboot my computer, nothing is logged. I also tried adding SQL Server as a dependency of my service, but that still doesn't guarantee mine will stop before SQL Server.
I have also tried adding the logic from this solution, but again I never see any logs indicating that this code even runs during a restart. Any tips on how I can solve this better?
TL;DR: how can I ensure my Topshelf service stops before SQL Server when the computer restarts or shuts down?
Thanks!
I have been using Jenkins for CI for the past year, but sometimes the service does not start automatically even though it is configured as "Automatic" in Windows Services. No error or warning is logged in Event Viewer, and I can't work out why this happens. Is there any configuration that would prevent it?
I ran into a very similar situation and only got a successful restart after opening up connectivity to http://178.255.83.1:80. This appears to be an OCSP server, so the traffic seems legitimate.
I'm not sure if it's actually the issue you were having, but it might be something to look at.
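A quick way to test the idea from this answer is to check, from the Jenkins machine, whether a plain TCP connection to that address succeeds. This Python sketch only tests TCP reachability; it does not perform an OCSP request, and whether the Jenkins service's startup actually depends on that server is the answerer's hypothesis. `can_reach` is a hypothetical helper name.

```python
import socket


def can_reach(host, port, timeout_s=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s seconds."""
    try:
        # create_connection raises OSError (timeouts and refusals included) on failure.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

For example, running `can_reach("178.255.83.1", 80)` on the Jenkins host returns False if a firewall or proxy blocks that traffic, which would be consistent with the behaviour described above.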
I've been helplessly watching this problem for a couple of months now, and I've decided asking here is my best shot.
I'm not sure what the cause of the problem is, but I can list some of the things I'm doing. I have an iOS app that uses AFNetworking to connect to a remote server hosted by Google App Engine using HTTP POST requests.
Everything usually works great, but very sporadically and at random I get failed requests. The activity indicator spins for about a minute, and at the end I get no feedback, just a failed request. My server logs show no errors. When I try again after the failed request, it works fine, and it keeps working for the whole day. Then, at some random later time, the issue comes back, sometimes spinning for 10 seconds before failing, sometimes for a minute.
In general, what could cause this? Is it normal to get occasional random connection failures? Or is it something on my end?
The weird thing is that while the app is running on my iPhone, with the indicator spinning as it tries to connect, I can connect from the iOS Simulator just fine. Then I try again on the iPhone, and it still doesn't work.
If I close the app completely and restart it, it works again. So it sounds like it may be a software issue rather than a connection issue, but then again I have no evidence or data whatsoever.
I know it's vague, but I'm hoping someone may have had a similar problem. Anything helps.
There is a known issue with instance startup on GAE for Java. You can star issue http://code.google.com/p/googleappengine/issues/detail?id=7706.
The same problem was reported for Python, but there it is less of a problem.
I think you should check the logging level you use on App Engine and monitor all your calls. Instance startup usually takes longer, so you will be able to see how much time is spent on startup and whether it really is a timeout problem.
For the Java version, you could try changing the log level to DEBUG:
.level = DEBUG
in your logging.properties file. This will give you more information about the instance startup process.
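To follow the suggestion of monitoring all your calls, one option is to wrap each request handler (or client call) with a timer and report the slow ones, so you can see whether the failures line up with slow instance starts. The sketch is Python rather than Java; `timed` and its `report` parameter are hypothetical names, and the threshold is whatever you consider suspicious.

```python
import functools
import time


def timed(threshold_s, report=print):
    """Decorator: measure each call and report those slower than threshold_s seconds."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Runs for both normal returns and exceptions, so failed calls are timed too.
                elapsed = time.perf_counter() - start
                if elapsed > threshold_s:
                    report(f"SLOW {fn.__name__}: {elapsed:.3f} s (threshold {threshold_s} s)")
        return wrapper
    return decorate
```

For example, decorating a handler with `@timed(10.0)` prints a line for every call that takes longer than 10 seconds; comparing those lines with the instance-start entries in your logs shows whether the slow calls coincide with new instances.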
I have a fairly large application running inside a Spree extension. The issue is that all requests are very slow, even locally. For 3 to 4 seconds per request, my browser's status bar shows messages like "Waiting for localhost" or "waiting for server" before execution starts. The execution time logged in the log file looks good, but the overall response time is poor because of this initial delay. Where should I start looking to improve this?
One possible root cause for this kind of problem is that initial DNS name resolution fails before eventually succeeding. You can check whether this is the case with tcpdump (if it's available for your platform) or Wireshark: look for traffic to and from your client host on port 53 and see whether the name responses arrive in a timely fashion.
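Alongside a packet capture, you can time the lookup directly from the client. This Python sketch calls the operating system's resolver through `socket.getaddrinfo`, so it goes through the local resolver configuration (hosts file, configured DNS servers); `time_lookup` is a hypothetical helper. The OS may cache answers, so the first lookup after a cache flush tells you the most.

```python
import socket
import time


def time_lookup(hostname, port=80):
    """Resolve hostname via the OS resolver; return (seconds_taken, sorted_addresses)."""
    start = time.perf_counter()
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    elapsed = time.perf_counter() - start
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = sorted({info[4][0] for info in infos})
    return elapsed, addresses
```

If `time_lookup("localhost")` (or your server's name) regularly takes seconds rather than milliseconds, that delay is added before the browser even sends the request.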
If it turns out that this is the problem, you need to make sure the client is configured so that the first resolver it tries knows about your server addresses (I'm guessing these are local LAN addresses that are failing). Different platforms configure this in different ways. A quick hack is to put your server's address in the client's hosts file and see whether that fixes it.
Once you send your request, you will see 'waiting for host' right up until the Ruby work is done and the server starts sending a response. So if pretty much any processing work is slowing you down, you'd see this message. What you want to do is look at the functions where you see this behaviour and break them down into pieces to find out which pieces are slow. If EVERYTHING is slow, then look at what is common to every function: before filters, ApplicationController code, or something similar. When I'm just poking around to see what I need to fix, I put 'puts' statements at different stages of my code that print the current time; then I can see which stage is taking a long time.
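The "print the current time at each stage" technique from this answer can be packaged as a small checkpoint timer. This is a Python sketch of the idea rather than Ruby; `StageTimer` and the stage labels are hypothetical. It reports the time since the previous checkpoint, which is usually easier to read than raw timestamps.

```python
import time


class StageTimer:
    """Report elapsed time at named checkpoints, like timestamped `puts` calls."""

    def __init__(self, out=print):
        self.out = out
        self.start = self.last = time.perf_counter()
        self.stages = []  # list of (label, seconds since previous checkpoint)

    def mark(self, label):
        now = time.perf_counter()
        delta = now - self.last
        self.stages.append((label, delta))
        self.out(f"{label}: +{delta:.3f} s (total {now - self.start:.3f} s)")
        self.last = now

    def slowest(self):
        """Label of the stage with the largest delta (ValueError if nothing was marked)."""
        return max(self.stages, key=lambda stage: stage[1])[0]
```

Create one `StageTimer` at the start of a request, call `mark` after the before filters, the queries, and the rendering, and the stage with the largest delta is where to dig further.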