We have found extreme memory use by an instance of com.vaadin.data.provider.DataCommunicator.
Version: 8.5.2
The application is a websocket-based, event-driven UI.
It appears this happens sometimes when a user suspends their computer and Vaadin hasn't destroyed the session yet.
A heap dump shows that the Atmosphere connection is still in the CONNECTED state.
Vaadin heartbeats are set to 60 seconds.
Please advise what other information might be useful. This doesn't happen consistently enough to reproduce.
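For reference, Vaadin 8 configures the heartbeat (and whether idle sessions are closed once heartbeats stop arriving) via servlet init-params. A sketch of the relevant web.xml fragment — servlet name and class mapping are placeholders:

```xml
<servlet>
  <servlet-name>myservlet</servlet-name>
  <servlet-class>com.vaadin.server.VaadinServlet</servlet-class>
  <!-- Heartbeat interval in seconds (the default is 300; we use 60) -->
  <init-param>
    <param-name>heartbeatInterval</param-name>
    <param-value>60</param-value>
  </init-param>
  <!-- Close UIs/sessions as soon as heartbeats stop, instead of
       keeping them alive until the HTTP session timeout -->
  <init-param>
    <param-name>closeIdleSessions</param-name>
    <param-value>true</param-value>
  </init-param>
</servlet>
```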
I am totally new to web programming... I am now working on an already-implemented ASP.NET MVC application deployed in IIS. The app is bound to an application pool that has only one worker process. At the moment, I am trying to understand what happens if the worker process freezes/hangs due to an uncontrolled exception thrown by app code. Can someone explain this to me?
What we have observed is that when this happens, the application stops working correctly and we need to restart its application pool for the app to start working correctly again. After observing this behavior, I have a doubt: in the application pool's advanced configuration, under Process Model, the ping maximum response time (seconds) is set to 90. As far as I know, when the application pool pings the worker process and it does not respond because it is hung, the worker process should terminate after 90 seconds. But it does not seem to be terminating, because when this happens we need to restart the application pool for the app to work again. So why does the worker process not terminate in this case?
First off, you have "only" one worker process, and should probably keep it that way. Web gardening often causes more issues than it helps, particularly with .NET apps. Second, you say it freezes/hangs due to an "uncontrolled" (unhandled?) exception thrown by app code. Why do you think this is the case? Do you have an error page or something else indicating it's an exception? The "ping" check verifies that the process is still doing work, but not necessarily that it is finishing requests. So from the perspective of WAS, IIS is still responding.
If you want to troubleshoot, you could look into capturing a memory dump with DebugDiag and running its automated analysis on it: https://support.microsoft.com/en-us/help/919792/how-to-use-the-debug-diagnostics-tool-to-troubleshoot-a-process-that-h
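For reference, the ping settings the question mentions live on the application pool's processModel element in applicationHost.config. A sketch — the pool name is a placeholder, and the values shown are the IIS defaults:

```xml
<applicationPools>
  <add name="MyAppPool">
    <!-- WAS pings the worker process every pingInterval; if the ping
         itself goes unanswered for pingResponseTime, the process is
         considered unresponsive and terminated. -->
    <processModel pingingEnabled="true"
                  pingInterval="00:00:30"
                  pingResponseTime="00:01:30" />
  </add>
</applicationPools>
```

Note that WAS only recycles the process when the ping itself goes unanswered; a pool whose request threads are all blocked can still answer pings, which is why a "hung" app often survives the 90-second window.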
Our application uses neo4j 3.5.x (tried both community and enterprise editions) to store some data.
No matter how we set up memory in conf/neo4j.conf (we tried lots of combinations of initial/max heap settings from 4 to 16 GB), the GC process runs every 3 seconds, bringing the machine to its knees and slowing the whole system down.
There's one combination (8g/16g) that seems to make things more stable, but 20-30 minutes after our system starts being used, GC kicks in again on neo4j and goes into this "deadly" loop.
If we restart the neo4j server without restarting our system, GC starts again as soon as our system starts querying neo4j (we've observed this behavior consistently).
We had a 3.5.x Community instance that had been working fine until last week (when we tried to switch to Enterprise). We copied the data/ folder from the Enterprise instance over to the Community instance and started the Community instance... only to have it behave the same way the Enterprise instance did, running GC every 3 seconds.
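For reference, the heap keys we've been varying in conf/neo4j.conf (shown here with the 8g/16g combination). I've also included dbms.memory.pagecache.size — the other memory knob in 3.5.x — with a purely illustrative value:

```
dbms.memory.heap.initial_size=8g
dbms.memory.heap.max_size=16g
dbms.memory.pagecache.size=4g
```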
Any help is appreciated. Thanks.
Screenshot of jvisualvm with 8g/16g of heap
In debug.log, only these entries seem significant:
2019-03-21 13:44:28.475+0000 WARN [o.n.b.r.BoltConnectionReadLimiter] Channel [/127.0.0.1:50376]: client produced 301 messages on the worker queue, auto-read is being disabled.
2019-03-21 13:45:15.136+0000 WARN [o.n.b.r.BoltConnectionReadLimiter] Channel [/127.0.0.1:50376]: consumed messages on the worker queue below 100, auto-read is being enabled.
2019-03-21 13:45:15.140+0000 WARN [o.n.b.r.BoltConnectionReadLimiter] Channel [/127.0.0.1:50376]: client produced 301 messages on the worker queue, auto-read is being disabled.
I also have a neo4j.log excerpt from around the time the jvisualvm screenshot was taken, but it's 3,500 lines long, so here it is on Pastebin:
neo4j.log excerpt from around the time the jvisualvm screenshot was taken
Hope this helps. I also have the logs for the Enterprise edition if needed, though they are a bit more 'chaotic' (neo4j restarts in between) and I have no jvisualvm screenshot for them.
I'm writing an Electron app, and a few builds back testers started noticing that two electron.exe processes were consuming a lot of CPU time, all the time: one pegging a CPU core and the other using about 85% of a core.
I'm certain this was not always the case, as builds from several months ago didn't do this. But I'm at a loss as to how to debug what code changes may have introduced this, as the code base has evolved dramatically over that time.
process.getIOCounters() reports that several gigabytes of IO occur every few minutes. The application is not deadlocked and everything still works; it is just chewing through CPU. It happens any time the app is open, even if it is in the background without any user input. I have only deployed this to Windows 10 x64 systems, on Electron 1.7.9 and also 1.7.5.
Based on the behavior, I'm certain this IO is interprocess communication between the renderer and main processes, but I'm not manually performing any IPC. I think this problem is being caused by some module we've introduced that improperly resides in the renderer process.
My question: how does one debug the Electron renderer/main IPC pipe? Can it be hooked to see what the contents of the gigabytes of traffic are?
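One approach to the hooking part (a sketch, not verified against every Electron version): wrap the IPC entry points with a logging shim from the renderer's devtools console, so every outgoing message and its arguments get recorded. The wrapper itself is generic JavaScript:

```javascript
// Wrap a method on an object so every call is recorded before being forwarded.
// `calls` collects { method, args } records for later inspection.
function wrapWithLogger(target, methodName, calls) {
  const original = target[methodName];
  target[methodName] = function (...args) {
    calls.push({ method: methodName, args });
    return original.apply(this, args);
  };
  return target;
}

// In an Electron renderer's devtools console this could be applied to
// ipcRenderer (assumes the classic `require('electron')` module API):
//   const { ipcRenderer } = require('electron');
//   const calls = [];
//   wrapWithLogger(ipcRenderer, 'send', calls);
//   // ...interact with the app, then inspect `calls` for chatty channels.
```

This only catches messages your own code (or a module in the renderer) sends through ipcRenderer; traffic generated internally by Chromium won't show up here.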
Based on the past few days of attempting to debug this, I've answered the question for myself:
My question: how does one debug the Electron renderer/main IPC pipe?
Don't. Electron seemed like a good idea: write all your client and platform code in one place. But there are a lot of catches, and out of the blue libraries will have strange bugs that are costly to address because they are outside the mainstream use case. This certainly has a lot to do with me not being an Electron expert, but in the real world there are deadlines and timelines, and I can't always get up to speed as much as I would like to.
I've updated my architecture to the tried-and-true service/GUI model. I'll maintain full browser support for the client code, as well as an Electron mode with hooks for some features when Electron is detected.
This allows me to quickly identify issues that are specific to a browser, version, or platform framework. It also lets me use whichever version of NodeJS I like for the service, which has also been an issue in my case.
I still love Electron, though; I'm just going to be more careful as I use it. If I do discover the specifics of why I had this problem, I'll check back and report those details.
Update
So this issue was not directly related to Electron as I had supposed; the IPC was not between the renderer and main processes and was a red herring. It was actually a Chrome keyframe-animation issue which was causing a 60 FPS redraw rate. I'm still not sure why this caused GBs of IPC, but whatever. See https://github.com/Microsoft/vscode/issues/22900
I was able to discover this by porting the app back to the native browser (with a NodeJS service). I then ran it in Chrome, Edge, and Firefox. Only Chrome behaved this way.
In my logs I have requests to signalr/poll and signalr/connect that take around 30 seconds.
The application recently had some issues caused by thread starvation. Could these requests be the root cause, or is this expected behaviour and a normal duration?
When I request the site with Chrome I see websocket traffic, so I guess it is running fine for most clients.
The application is accessed via VPN and the connection is sometimes bad. Could this be a reason for falling back to long polling?
If you do not have enough threads, you end up in a deadlock: the app will start to error out and not work properly, and at that point you are forced to restart your app pool or web server. If your application is falling back to long polling and a client is connected but not doing anything, the poll will remain open until it gets a response, or will close when a configured timeout is reached (the default is 30 seconds, I believe). I would try restarting your app pool and see if that helps; if not, there is something wrong in the transport layer. It should only need to fall back to long polling in extreme circumstances.
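If it helps to quantify how often clients are actually falling back, a small sketch for flagging held-open long polls in your request logs. The entry shape ({ url, durationMs }) is my assumption; adapt it to your log format:

```javascript
// Flag signalr/poll and signalr/connect requests whose duration suggests a
// long poll held open near its timeout (~30s) rather than a quick
// websocket negotiation. Threshold is adjustable.
function findLongPolls(entries, thresholdMs = 25000) {
  return entries.filter(
    (e) =>
      (e.url.includes('signalr/poll') || e.url.includes('signalr/connect')) &&
      e.durationMs >= thresholdMs
  );
}
```

Keep in mind that for a client genuinely on long polling, a ~30-second request is expected behaviour: the server holds the request open until data arrives or the poll times out, so duration alone is not an error signal. What matters is how many clients are in that state.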
I have a website that hangs every 5 or 10 requests. When it works, it works fast, but if you leave the browser sitting for a couple of minutes and then click a link, it just hangs without responding. The user has to hit refresh a few times in the browser, and then it runs fast again.
I'm running .NET 3.5 and ASP.NET MVC 1.0 on IIS 7.0 (Windows Server 2008). The web app connects to a SQL Server 2005 DB running locally on the same instance. The DB has about 300 MB of RAM, and the rest is free for web requests, I presume.
It's hosted on GoGrid's cloud servers, and this instance has 1 GB of RAM and 1 core. I realize that's not much, but currently I'm the only one using the site, and I still get these hangs.
I know it's a difficult thing to troubleshoot, but I was hoping someone could point me in the right direction as to possible IIS configuration problems, or what the rough average hardware requirements would be for these technologies per 1,000 users, etc. Maybe the minimum for a web server is 2 cores, so that if it's busy you still get a response. Or maybe the Slashdot people are right and I'm an idiot for using Windows, period, lol. In my experience, though, it's usually MY algorithm/configuration error and not the underlying technology's fault.
Any insights are appreciated.
What diagnostics are available to you? Can you tell what happens when the user first hits the button? Does your application see that request and then take ages to process it, or is there a delay before your app gets going and then it works as quickly as ever? Or does that first request just get lost completely?
My guess is that there's some kind of paging going on. I believe Windows has a habit of putting non-recently-used apps out of the way and then paging them back in. Is that happening to your app, or the DB, or both?
As an experiment: what happens if you add a sneaky little "howAreYou" page to your app that does the tiniest possible amount of work, such as getting a usage count from the DB and displaying it? Have a little monitor client hit that page every minute or so, and measure performance over time. Spikes? Consistency? Does the very presence of activity maintain your application's presence and prevent paging?
Another idea: do you rely on any caching? Do you have any kind of aging on that cache?
Your application pool may be shutting down because of inactivity. There is an Idle Time-out setting per pool, in minutes (it's under the pool's Advanced Settings, Process Model). It will take some time for the application to start again once it has shut down.
Of course, it might just be the virtualization, as others have suggested, but this is worth a shot.
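For reference, this setting also lives on the pool's processModel in applicationHost.config (the default is 20 minutes); setting it to zero disables idle shutdown entirely. A sketch, with a placeholder pool name:

```xml
<add name="MyAppPool">
  <!-- 00:00:00 disables the idle timeout; the default is 00:20:00 -->
  <processModel idleTimeout="00:00:00" />
</add>
```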
Is the site getting significant traffic? If so, I'd look for poorly optimized queries or queries that are being run in loops.
Your configuration sounds fine, assuming your overall traffic is relatively low.
Too many database connections that are never released?
A connection to some service/component that is causing timeouts?
Resources not being released properly?
Network traffic?
Queries being looped, either in SQL or in code logic?