NewRelic says "No data reporting for this application" - timeout

NewRelic monitoring occasionally shows some apps as grey with the warning message "No data reporting for this application".
On drilling down into the app I can see that data has been reported that same day, but the site has simply not been active for a period (e.g. for an hour or so).
I suspect NR is designed for high-volume sites where a long period of inactivity is unexpected. However, there are use cases where sites may be accessed infrequently, such as a site that provides data for one country during business hours; out of hours, no activity would be expected.
Is there any way to amend how long a site has to be inactive before NewRelic grays it out, so that only sites inactive for over a day go gray, rather than those that simply haven't been active for the last hour or so?

The agent only sends data when the site is active (no activity means there is nothing to send), so it is normal for a site to go gray. Once the site receives traffic again it will light up green and start reporting information.
Since a site turning gray doesn't affect any metrics, alerting, or reports, there is no configuration setting for how long to wait before it goes gray.

Related

Cloud services to notify on a script not succeeding for a long time

I have a Python script on a VPS that reads financial news from a public data feed each hour and emails me when certain keywords of interest appear. That may happen only a few times a week, but such events are very important and must not be missed. On any data fetching or parsing error I should also be notified via email, and errors are of course recorded in the server's local log file.
But how do I know that my SMTP credentials haven't been blocked by the mail provider, or that my VPS hasn't been shut down by my hosting provider? In that case I would not be notified and would remain unaware of important events (and of the failure to fetch/deliver them) until I decided to log into the VPS manually and look at the logs.
Even if I used a backup notification channel, e.g. SMS or Telegram, it still would not protect against a cloud provider outage, or my account being blocked over a temporary payment issue, because there would be no running instance of the script left to deliver the message on any channel. That's why I suspect some fault-tolerant third-party service is needed, especially if I'm a freelance coder with lots of similar scripts running on a mixture of VPSes and serverless/Lambda functions, possibly for different end clients.
What best practice do you use to be notified when a script has not succeeded for long enough? I would like something reliable and ready to use; recommendations for existing monitoring services are welcome. At least, I was not able to find one that solves my particular problem straight away.
To clarify, I don't want to spend time on manual checking until it's absolutely necessary (in this case I can tolerate up to 2 hours; if it does not self-heal within that period, I need to be notified), and I obviously don't want regular, noisy reports that the service is doing fine and no interesting news was detected. Plus, I of course want to keep the costs reasonable.
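One way to picture the "fault-tolerant third-party service" idea mentioned above is a dead-man's-switch heartbeat: the script pings an external check URL only after a fully successful run, and the external service alerts you if no ping arrives within a grace period (e.g. the 2 hours mentioned). A minimal sketch, assuming a hypothetical HEARTBEAT_URL supplied by whatever service you pick:

# Sketch of the hourly job described above, extended with a dead-man's-switch
# heartbeat. HEARTBEAT_URL is a placeholder for a third-party check URL; such
# services alert you when no ping arrives within a grace period, which also
# covers SMTP failures and the VPS itself going down.
import logging
import urllib.request

HEARTBEAT_URL = "https://example-monitor.invalid/ping/MY-CHECK-ID"  # hypothetical

def fetch_and_notify():
    """Fetch the news feed, scan for keywords, send email alerts.
    Raises on any fetch/parse/SMTP error."""
    ...  # existing logic of the script

def main():
    try:
        fetch_and_notify()
    except Exception:
        logging.exception("news check failed")
        raise  # the run did not succeed, so deliberately skip the heartbeat
    # Only report success after the whole run completed without errors.
    urllib.request.urlopen(HEARTBEAT_URL, timeout=10)

if __name__ == "__main__":
    main()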

Azure Websites and ASP.NET, how much inactivity before the app pool is recycled causing a recompilation?

I have an MVC3, .NET 4.5 ASP.NET web application hosted on Azure Websites.
I am experimenting with "Free", "Shared" and "Standard" scaling configurations.
I have noticed that after a period of inactivity the compiled code gets dropped from memory, or the app pool gets recycled, forcing a JIT recompile.
My main question is: what is the time period before the compiled code gets dropped, forcing a recompile? I assume this is a result of the application pool recycling? I have come across this on standard shared hosts such as DiscountASP.
My second question is: what is the best approach to minimise this issue, as I would not like my users to run into this recompilation lag? My initial thought is precompilation.
Many thanks in advance.
EDIT:
I have found a related SO post on this here: App pool timeout for azure web sites
However, it seems that, as with standard shared hosting, one cannot change app pool recycling. One has more flexibility with the "Standard" scale option, since it is dedicated. So the likely options at present are:
1) Precompilation
2) Use of "Keep alive" ping sites.
EDIT2:
1) "Keep Alive" approach seems to be working. I have a 10 minute monitor running.
I believe the inactivity period is 20 minutes by default. I haven't used Azure Websites yet, so I'm not familiar with restrictions on changing settings, but one quick way to keep your site active is to use an uptime monitoring service like Pingdom (you can check one site for free at the time of writing); this will ping your site regularly and prevent it from becoming idle.
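A hosted monitor isn't strictly required for the keep-alive approach: any process that requests the site more often than the idle timeout will do. A minimal sketch, assuming a placeholder SITE_URL and a 10-minute interval like the monitor mentioned in EDIT2:

# Minimal keep-alive pinger, an alternative to a hosted monitor like Pingdom:
# request the site more often than the ~20-minute idle timeout so the app pool
# is never recycled for inactivity. SITE_URL and the interval are assumptions.
import time
import urllib.request

SITE_URL = "https://yoursite.azurewebsites.net/"  # placeholder
INTERVAL_SECONDS = 10 * 60                        # ping every 10 minutes

while True:
    try:
        with urllib.request.urlopen(SITE_URL, timeout=30) as resp:
            print(f"pinged {SITE_URL}: HTTP {resp.status}")
    except OSError as exc:
        # A failed ping is worth logging: the site may actually be down.
        print(f"ping failed: {exc}")
    time.sleep(INTERVAL_SECONDS)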

Publishing to live – Get status and prevent timeouts

I have the following scenario:
LR Portal 6.1.20 EE GA2 behind IBM WebSeal
Staged Sites
Custom portlet which needs to publish its contents from staging to live
The custom portlet publishes its contents with a class that extends BasePortletDataHandler and overrides the following methods:
doExportData
doImportData
doDeleteData
isAlwaysExportable
isPublishToLiveByDefault
isAlwaysStaged
This works quite well in development mode, where there is no WebSeal. In the control panel, you go to "site pages" and invoke "publish to live".
In production, however, we get WebSeal timeouts whenever this process takes more than 2 minutes. The process is still running in the background, but the user has no way of telling whether it's done, whether it worked, or whether it failed. They get no feedback about it whatsoever.
Is there a way to implement a custom portlet for the control panel which takes care of these problems? How do I get/track the status of the process and how do I keep the session alive?
I don't have any experience with Liferay, but I administer WebSEAL daily, so I can approach your question from that angle. You can increase the timeouts on individual junctions. I have encountered similar scenarios with applications in the past; we have had to go up to a 300-second timeout.
[junction:junction_name]
http-timeout = 300
https-timeout = 300
http://publib.boulder.ibm.com/infocenter/tivihelp/v2r1/index.jsp?topic=%2Fcom.ibm.itame.doc_6.1.1%2Fam611_webseal_admin95.htm
You may also need to increase the server timeouts:
[server]
client-connect-timeout = 300
http://publib.boulder.ibm.com/infocenter/tivihelp/v2r1/topic/com.ibm.itame.doc_6.1.1/am611_webseal_admin94.htm?path=3_10_3_3_1_4_0_6_5#http-https-timeouts
The problem is that the application doesn't send any data over the TCP connection, so WebSEAL times out the connection. Unless you can change the way your application works, you'll have to increase the timeouts. Preferably, you would use AJAX or a similar technique to have the client routinely query the server for a status once the procedure is kicked off; however, I had a customer integrating with us who couldn't change their application code, so I was forced to increase the timeouts for them as well.
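To illustrate the "routinely query the server for a status" suggestion, here is a minimal polling sketch. In the portal this would typically be client-side AJAX, but the idea is the same: kick off the publish, then issue short status requests well inside WebSEAL's timeout so no single request runs longer than the junction allows. The /publish-status endpoint, its JSON shape, and the 30-second interval are all assumptions:

# Poll a hypothetical status endpoint until the background publish finishes.
# Each request completes quickly, so WebSEAL never sees an idle connection.
import json
import time
import urllib.request

STATUS_URL = "https://portal.example.com/publish-status?jobId=123"  # hypothetical

def wait_for_publish(poll_seconds: int = 30, max_polls: int = 120) -> str:
    for _ in range(max_polls):
        with urllib.request.urlopen(STATUS_URL, timeout=60) as resp:
            status = json.load(resp).get("status", "UNKNOWN")
        if status in ("SUCCESS", "FAILED"):
            return status
        time.sleep(poll_seconds)
    return "TIMED_OUT"

print(wait_for_publish())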

How do you monitor your website?

My team is responsible for a high-traffic website, which is very dynamic with about 3.5 million unique URLs. We deploy our application about once per week, our CMS pushes about 100 content updates per week, our internal data source releases about once a week as well, and we consume about 10 other public web services. It is always our team's responsibility to make sure everything is up and running.
We use Pingdom to make sure some of these URLs are up, but it is limited to a few checks and does not handle as many URLs as we need.
We use Nagios as well but it is a bit of a black box and has not been fully adopted by our development team. Most of our developers are windows focused and cringe at the thought of all the configuration.
Most of what we need is just monitoring a few urls, and something that can notify me when things go down or change.
I think you should do something like unit testing that checks every website release internally before and after deployment. Your application should also have excellent exception handling, and every exception should be logged and monitored.
If you use external monitoring with a tool like Pingdom or www.downnotifier.com, you can check one URL per page type. For example: one news article, one text page, and one product page.
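As a rough illustration of the "one URL per page type" check, here is a minimal sketch; the URLs, the 5-second response budget, and the notify() stub are assumptions to be replaced with your own pages and alerting channel:

# Fetch one representative URL per page type and flag anything down or slow.
import time
import urllib.request

URLS = [
    "https://example.com/news/some-article",   # one news article
    "https://example.com/about",               # one text page
    "https://example.com/products/some-item",  # one product page
]

def notify(message: str) -> None:
    # Placeholder: hook this up to email, Slack, Nagios, etc.
    print("ALERT:", message)

for url in URLS:
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            elapsed = time.monotonic() - start
            if resp.status != 200:
                notify(f"{url} returned HTTP {resp.status}")
            elif elapsed > 5:
                notify(f"{url} took {elapsed:.1f}s to respond")
    except OSError as exc:
        notify(f"{url} is unreachable: {exc}")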

Identifying poor performance in an Application

We are in the process of building a high-performance web application.
Unfortunately, there are times when performance unexpectedly degrades and we want to be able to monitor this so that we can proactively fix the problem when it occurs, as opposed to waiting for a user to report the problem.
So far, we are putting in place system monitors for metrics such as server memory usage, CPU usage and for gathering statistics on the database.
Whilst these show the overall health of the system, they don't help us when one particular user's session is slow. We have implemented tracing into our C# application which is particularly useful when identifying issues where data is the culprit, but for performance reasons tracing will be off by default and only enabled when trying to fix a problem.
So my question is: are there any other best practices we should be considering (WMI, for instance)? Is there anything else we should consider building into our web app that will benefit us without itself becoming a performance burden?
This depends a lot on your application, but I would always suggest adding application-specific metrics to your monitoring, for example the number of recent picture uploads or the number of concurrent users - I think you get the idea. Seeing these application-specific metrics in combination with server metrics like memory or CPU sometimes gives valuable insights.
In addition to system health monitoring (using Nagios) of parameters such as load, disk space, etc., we have built in a REST service, called from Nagios, that provides statistics on:
transactions per second (which makes sense in our case)
number of active sessions
number of errors in the logs per minute
....
in short, anything that is specific to the application(s)
We also monitor the time it takes for a (dummy) round-trip transaction, as if a user or system were performing the business function.
All this data is sent back to Nagios, where we then configure alert levels and notifications (a minimal example of such an endpoint is sketched below).
We find that monitoring the number of Error entries in the logs gives some excellent short term warnings of a major crash/issue on the way for a lot of systems.
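As a rough sketch of the kind of REST service described above, the following minimal endpoint exposes a few application-specific numbers as JSON so a Nagios check (or anything else) can poll it. The metric names and stubbed values are assumptions to be wired up to the real application:

# Tiny metrics endpoint: GET / returns application-specific statistics as JSON.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def collect_metrics() -> dict:
    return {
        "transactions_per_second": 0.0,  # stub: read from your app counters
        "active_sessions": 0,            # stub
        "log_errors_last_minute": 0,     # stub: e.g. count recent error log lines
    }

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps(collect_metrics()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), MetricsHandler).serve_forever()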
Many of our customers use Systems and Application Monitor, which handles the health monitoring, along with Synthetic End User Monitor, which runs continuous synthetic transactions to show you the performance of a web application from the end-user's perspective. It works for apps both outside and behind the firewall. Users often tell us that SEUM will reveal availability problems from certain locations, or at certain times of day. You can download a free trial at SolarWinds.com.

Resources