How can I know that my server is working fine, i.e., that it is in good health?
My requirement: users are complaining that they cannot access the web application (web site) - requests take a long time and sometimes never complete.
I want to know whether my web site is in good condition before the users notice, and to get an alert message when it is not.
I want to know how to measure whether the server is responsive and whether users are facing problems. Sometimes my site takes a long time because millions of data records have to be retrieved, so in that case I cannot rely on response time alone.
Please help me with this.
Monitoring response time without any third-party software can be done with scripts like webinject. Webinject is a Perl script that executes a browsing scenario and tells you whether the result is acceptable or not.
Run a script at a regular interval, say every 10 minutes, that starts a webinject scenario. If the scenario fails (check the return code of your webinject call), your script can send you an email, an SMS, start a sound alarm, ... whatever is relevant to you.
You can also add some complexity by running a diagnostic script (check the network by pinging relevant hardware, check CPU/RAM usage of your servers, check the number of sessions in your database, ...) and send the diagnostics by email. You can also save the response times in a database (like an RRD database) to get a graphical view and be able to do some problem analysis on it.
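As a concrete illustration, here is a rough sketch in Python (rather than shell) of the kind of wrapper described above, assuming the webinject call exits with a non-zero return code when a test case fails; the scenario files, addresses and SMTP host are placeholders:

    import smtplib
    import subprocess
    from email.message import EmailMessage

    # Placeholder paths/addresses -- adjust for your environment.
    WEBINJECT_CMD = ["perl", "webinject.pl", "-c", "config.xml", "homepage_scenario.xml"]
    ALERT_FROM = "monitor@example.com"
    ALERT_TO = "oncall@example.com"
    SMTP_HOST = "localhost"

    def run_check():
        # Assumes webinject signals a failed scenario via its exit code.
        result = subprocess.run(WEBINJECT_CMD, capture_output=True, text=True)
        return result.returncode, result.stdout

    def send_alert(output):
        msg = EmailMessage()
        msg["Subject"] = "Web site check failed"
        msg["From"] = ALERT_FROM
        msg["To"] = ALERT_TO
        msg.set_content(output)
        with smtplib.SMTP(SMTP_HOST) as smtp:
            smtp.send_message(msg)

    if __name__ == "__main__":
        code, output = run_check()
        if code != 0:
            send_alert(output)

Schedule it from cron every 10 minutes (*/10 * * * *) and swap the email for an SMS gateway or sound alarm if that suits you better.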
I have a python script that resides on a VPS, reads (each hour) financial news from a public datafeed and emails me when certain keywords of interest appear. That can happen only a few times a week, but such events are very important and must not be missed. On any data fetching or parsing error, I should also be notified via email, and errors of course get recorded into the server's local log file.
But how do I know that my smtp credentials are not blocked by the mail provider, or my VPS is not shut down by my hoster? In that case, I would not be notified and would be unaware of important events (and the failure to fetch/deliver them itself) until I decided to log into VPS manually and take a look at the logs.
Even if I would use a backup notification channel, e.g., SMS or Telegram, it still would not protect against cloud provider service disruption, or my account being blocked due to temporary payment issues, as there would exist no instance of the script to deliver the message on any of the channels. That's why I suspect some 3rd party fault-tolerant service is needed. Especially if I'm a freelance coder having lots of similar scripts, running on a mixture of VPSes, serverless/Lambdas, possibly for different end clients.
What best practice are you, dear developers, using to be notified when some script has not succeeded for a long enough time? I would like something reliable and ready-to-use; maybe you can recommend some existing monitoring services. At least I was not able to find one that solves my particular problem straight away.
To clarify, I don't want to spend time on some manual checking until it's absolutely necessary (in this case, I can tolerate up to 2 hours, and if it does not self-heal within that period, then I need to be notified), and I obviously don't want to get regular annoying reports that the service is doing fine and there simply were no interesting news detected. Plus, I of course want to keep the costs reasonable.
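For concreteness, what I'm after amounts to a heartbeat or "dead man's switch": the script checks in with some external endpoint after every successful run, and an external watcher alerts me only when check-ins stop for longer than my 2-hour tolerance. A rough sketch of the check-in side in Python, where the URL is just a placeholder for whatever hosted or self-hosted service would provide it:

    import urllib.request

    # Placeholder -- the check-in URL issued by the heartbeat/monitoring service.
    HEARTBEAT_URL = "https://heartbeat.example.com/ping/my-news-script"

    def report_success():
        # Called at the end of each successful hourly run. If these pings stop
        # arriving for longer than the configured grace period (e.g. 2 hours),
        # the external watcher raises the alert through its own channels.
        try:
            urllib.request.urlopen(HEARTBEAT_URL, timeout=10)
        except OSError:
            # A failed ping is not fatal to the script itself; the absence of
            # pings is exactly what triggers the external alert.
            pass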
I had a python application, which, with the use of shell tricks, was able to send me emails when new error messages appeared in the log during an execution session, scheduled with cron.
Now I am packing it in docker and was able to reproduce most of its functionality with docker-compose.
But when it comes to emails on failures, I am not sure what the best way to implement it is.
What are your suggestions? Are there any best practices?
Update:
The app runs a couple of times a day. Previously, all prints to stderr were duplicated to stdout to preserve chronological order in the main log file. Then, the wrapper script would accumulate all stderr from a single session in another, temporary file. If that file was not empty after the session, its contents were sent in a single email from me to myself through SMTP with proper authentication. I was happy to receive those emails and able to handle them for the last few months.
Right now I see three possible solutions:
Duplicating everything worth sending to a temporary file right in the app, so that docker logs would still persist. Then sending it after the session from the entrypoint, provided there is a way to set up all the requirements in the container.
Grepping the docker log from the outside. But that somewhat misses the point of docker.
Relaying reports via the local network to another container, with something like https://hub.docker.com/r/juanluisbaptiste/postfix/, which would then send them in an email.
I was not able to properly set up postfix or use the mail utility inside the container, but python seems to work just fine: https://realpython.com/python-send-email/
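A rough sketch of what that python sending side could look like, mirroring the old wrapper behaviour (one mail per session, only if something went to stderr); the SMTP host, port, credentials and the temporary error-log path are placeholders:

    import smtplib
    from email.message import EmailMessage
    from pathlib import Path

    # Placeholders -- use your own SMTP account and error-log location.
    SMTP_HOST = "smtp.example.com"
    SMTP_PORT = 587
    SMTP_USER = "me@example.com"
    SMTP_PASSWORD = "app-password"
    ERROR_LOG = Path("/tmp/session_errors.log")

    def mail_errors_if_any():
        # Send one mail per session, and only if the session produced errors.
        if not ERROR_LOG.exists() or ERROR_LOG.stat().st_size == 0:
            return
        msg = EmailMessage()
        msg["Subject"] = "Errors from last session"
        msg["From"] = SMTP_USER
        msg["To"] = SMTP_USER
        msg.set_content(ERROR_LOG.read_text())
        with smtplib.SMTP(SMTP_HOST, SMTP_PORT) as smtp:
            smtp.starttls()
            smtp.login(SMTP_USER, SMTP_PASSWORD)
            smtp.send_message(msg)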
TL;DR look at NewRelic's free tier.
There is a lot to unpack here. First, it would be helpful to know more about what you had been doing previously with the backticks. Including some more information about what commands might change how I'd respond to this. Without that I might make some assumptions that are incorrect/not applicable.
I would consider errors via email a bad idea for a few reasons:
If many errors occur quickly, your inbox will get flooded with emails, and/or you will flood the mail server with messages or the network with traffic. It's your mailbox and your network, so you can do what you want, but it tends to fail dramatically when it happens, especially in production.
To send an email you need to relay those messages through an SMTP server/gateway/relay of some sort, and automated scripts like this often get blocked because they trigger spam detection. When that happens and there is an error, the messages get silently dropped and production issues go unreported. Again, it's your data/errors and you can do that if you want, but I wouldn't consider it best practice as far as reliability goes. I've seen it fail to alert many times in my past experience for this very reason.
It's been my experience (over 20 years in the field) that any alerting sent via email quickly gets routed to a sub-folder via a mail rule and starts getting ignored because it is noisy. As soon as people start to ignore the messages, serious errors get lost in the inbox along with them and go unnoticed.
De-duplication of error messages isn't built in, and you could get hundreds or thousands of emails in a few seconds, with the one meaningful error buried like a needle in a haystack of emails.
So I'd recommend not using email, for those reasons. If you were really set on using email, you could have a script running inside the container that tails the error log and sends the emails off using some SMTP client (which could be installed in the docker container). You'd likely have to set up credentials depending on your mail server, but it could be done. Again, I'd suggest against it.
Instead I'd recommend having the logs sent to something like AWS SQS or AWS CloudWatch, where you can set up rules to alert (via SNS, which supports email alerting) if there are N messages in N minutes. You can configure those thresholds as you see fit, and it can also handle de-duplication.
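As a rough sketch of the shipping side only (assuming boto3 with AWS credentials configured in the environment; the queue URL is a placeholder, and the "N messages in N minutes" alarm plus the SNS email subscription would be configured in AWS rather than in this code):

    import json
    import time

    import boto3  # assumes AWS credentials are configured in the environment

    # Placeholder queue URL -- create the queue and the alerting rules
    # (threshold -> SNS -> email) separately in AWS.
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/app-errors"

    sqs = boto3.client("sqs")

    def ship_error(message, source="my-container"):
        # Each error becomes one queue message; alerting and de-duplication are
        # then handled downstream of the queue, not inside the application.
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({
                "source": source,
                "timestamp": int(time.time()),
                "error": message,
            }),
        )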
If you didn't want to use AWS (or some other cloud provider), you could perhaps use something like Elasticache to store the events, with a script to check for recent events and perform de-duplication.
There are plenty of 3rd party companies that will take this data and give you an all-in-one solution for storing the logs and a nice dashboard with custom notifications (via email/SMS/etc.), some of which are free. NewRelic comes to mind and is free assuming you don't need log retention.
I don't work for any of these companies; these are just some tools I've worked with that I'd consider using before I tried to roll a cron job that sends SMTP messages (although I've done that several times in my career when I needed something quick and dirty).
I have an MVC web site, where users can search for large recordsets from SQL Server and Oracle databases. Some of these recordsets can be very large, with many thousands of records. Sadly, it is a user requirement that they do not make their searches more specific.
When a user posts their search request to the database, my web page hangs and often times out (due to the amount of time taken to query the database).
We are thinking about removing the expensive database calls from the MVC site, and sending the query to a separate process to run in the background. When the query is complete, we can notify the user.
My proposed solution is:
1) When the user completes the search form in the web page, to simply display a message that the results are being generated and will be sent when complete
2) Send the SQL query to a database which can contain a list of SQL queries that need to be processed
3) Create a Windows Service which checks this database every couple of minutes for new queries
4) This Windows Service then queries the database. When the query is completed, it will create a CSV of the results, and email this to the user
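Roughly, I imagine the Windows Service's polling loop (steps 3 to 5) looking something like the sketch below - written in Python purely for illustration, with made-up table and column names rather than our real schema, and a stubbed-out email step:

    import csv
    import sqlite3  # stand-in for the real SQL Server/Oracle connection
    import time

    POLL_INTERVAL_SECONDS = 120  # "every couple of minutes"

    def send_results_email(user_email, csv_path):
        # Placeholder: attach the CSV and send it, e.g. via smtplib.
        pass

    def process_pending_queries(conn):
        # Steps 3/4: pick up queued searches that have not been processed yet.
        pending = conn.execute(
            "SELECT id, user_email, query_text FROM pending_queries WHERE status = 'new'"
        ).fetchall()
        for query_id, user_email, query_text in pending:
            rows = conn.execute(query_text).fetchall()
            csv_path = f"results_{query_id}.csv"
            with open(csv_path, "w", newline="") as f:
                csv.writer(f).writerows(rows)
            send_results_email(user_email, csv_path)  # step 5
            conn.execute(
                "UPDATE pending_queries SET status = 'done' WHERE id = ?", (query_id,)
            )
            conn.commit()

    if __name__ == "__main__":
        while True:
            with sqlite3.connect("queue.db") as conn:
                process_pending_queries(conn)
            time.sleep(POLL_INTERVAL_SECONDS)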
I am looking for some advice and comments on the above approach. What do folks think of this as a way to process expensive database calls in the background?
Generally speaking the requests will be made infrequently, but as mentioned, will be for a great amount of data. There is a chance that two or more requests could be made at the same time, but this will be infrequent.
I will also look at optimising the databases.
Grateful for any tips.
Martin :)
Another option is to supplement the existing code to execute the query on a separate thread, so that periodic keep-alive updates can be sent to the requesting page while you wait for the query results - similar to the way insurance quote aggregator pages work.
A second option is to make the results available as a hyperlink when they are ready and then communicate that either through the website or by email to the user.
Option three: if these queries are not completely ad hoc, you could profile for the most frequent combinations and pre-compute them periodically, placing the results into new tables (sort of halfway to optimising the current database structure).
The caveat there is that the data won't be as up to date - but given the time the queries are currently taking, it probably isn't that important for it to be up to the second?
Whichever solution you choose, I think it's going to depend on user expectations. Do they know what they want, send one big query, get the results and be happy? Or do they try several queries to find the right combination of parameters? If the latter, waiting for an email delivery of results might not be acceptable to them. But if what they want is a downloadable results document and they know what they want first time, it may be. The only problem I see here is emails going astray, or taking longer than the user thinks they should and causing the request to be resubmitted multiple times, increasing the server workload - caching queries and results is probably a very good idea.
I would suggest introducing a layer of abstraction such as a message broker. The request goes into a queue, a batch layer consumes the request from the queue, and once the heavy work is done the batch layer notifies the web layer again via the message broker - the Request-Reply pattern.
In addition, on the database side it is always good to optimize the queries.
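A rough sketch of that request-reply flow, assuming a RabbitMQ broker and the pika client (the queue names and the run_expensive_query helper are placeholders for illustration; in reality the web layer and the batch layer would be separate processes, each with its own connection):

    import json
    import uuid

    import pika  # assumes a RabbitMQ broker is reachable on localhost

    connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="search_requests")
    channel.queue_declare(queue="search_replies")

    def run_expensive_query(query_text):
        # Placeholder for the heavy SQL Server/Oracle work; could return
        # e.g. the location of a generated CSV.
        return "results.csv"

    def submit_search(query_text):
        # Web layer: drop the request on the queue and return immediately.
        correlation_id = str(uuid.uuid4())
        channel.basic_publish(
            exchange="",
            routing_key="search_requests",
            properties=pika.BasicProperties(
                reply_to="search_replies",
                correlation_id=correlation_id,
            ),
            body=json.dumps({"query": query_text}),
        )
        return correlation_id  # used later to match the reply to this request

    def handle_request(ch, method, properties, body):
        # Batch layer: do the heavy work, then reply on the queue named in
        # reply_to so the web layer can notify the user.
        result = run_expensive_query(json.loads(body)["query"])
        ch.basic_publish(
            exchange="",
            routing_key=properties.reply_to,
            properties=pika.BasicProperties(correlation_id=properties.correlation_id),
            body=json.dumps({"result": result}),
        )

    channel.basic_consume(queue="search_requests", on_message_callback=handle_request, auto_ack=True)
    # channel.start_consuming()  # the batch layer blocks here, processing requests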
I'm looking for suggestions to the best way to handle network connection issues for an iPhone app (iOS9/Swift2/Xcode7), to give the best user experience since we know that mobile data networks are unreliable. I have my coding options in place but I'd like to know what's worked well for other experienced techs. There's lots of info out there but nothing I could find specific to a strategy to occur when there is a connection failure.
Here is my basic strategy dealing with failed connections I'd like to implement (along with questions):
App sends request to api.myserver.com and the request fails
Wait X second(s) and try request to api.myserver.com again (how many tries and at what time interval would you suggest?)
Try pinging some other server (i.e. google.com) to see if we can access a resource other than api.myserver.com
If we can successfully ping google.com then we know our internet is working, so we try once again to ping api.myserver.com
If this last ping fails then we alert the user that we can't communicate for some reason and to try again later
I'm using the philosophy outlined in this SO answer recommended by an Apple tech, which in general means you always check the connection to your server first, using Reachability as a separate check to ensure phone hardware is available.
At any time during this process if Reachability is false then we would put our request in a queue to be tried again when the phone hardware connection was restored.
I think I've got a handle on the code involved, but I'm looking for insights like "this is what worked for our app and gives a good user experience during connection issues... and was approved for use in the Apple app store...". I understand the concepts of trying/retrying connections in the case of failure and alerting the user (my code already does this successfully), but I'm still not solid on a good policy for how many times I should try to reconnect and at what intervals.
For most of the apps I have worked on it was useful to define a couple of categories of requests which have different rules. For each category consider if retries are appropriate and how long you can really afford to wait before considering the request(s) a failure.
At the most sensitive are blocking requests, things which the user must allow to complete before they can proceed. Sign in, checkout, some editing actions, etc. For these it is often not worth retrying(1) and failures need to be communicated to the user immediately: if the device is offline let the user decide when to try again, if the request fails you've probably already made the user wait too long. Since failures tend to block the user they usually also need to be communicated prominently.
Less sensitive are usually user-initiated but non-blocking actions: pull-to-refresh, loading details of a selected collection item, or performing a search. Your user might be waiting to see the results but is probably free to give up or navigate elsewhere in the app and check back later. Failures still need to be communicated so users can choose to try again or at least know to stop waiting, but the notification of those failures can be less prominent. Here retries start to make sense. I usually start by trying to define a time limit from the user's perspective - how long will they wait before the app feels broken - and then let that be the limit for how long a request can wait for connectivity or make any number of retries in response to failed connections.
Even less sensitive are requests triggered only indirectly by your user; polling for updates, loading non-essential images, warming caches. These you might retry but the impact of failure is often so low that it may not matter.
Of all of those requests, your retry policy really only impacts the second category, so I would make sure you actually have requests of that type before worrying about it. Assuming those do actually apply to your app...
Wait X second(s) and try request to api.myserver.com again (how many tries and at what time interval would you suggest?)
I would set some interval here (in the tens to hundreds of milliseconds depending on your normal api performance) to avoid an accidental flood of requests. I don't want to suggest a precise number when I don't have a solid justification for it.
My experience has been that optimizing this value is unlikely to make a perceptible difference to your users because requests often take hundreds of milliseconds to fail and users are only willing to wait for a few thousand milliseconds so making 1 or 5 or 10 requests in that time doesn't really change the final outcome. If you are able to set different expectations with your users then your results may vary.
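If it helps, here is a language-agnostic sketch of the kind of bounded retry loop I mean (written in Python only for brevity): exponential backoff with a little jitter between attempts, capped by an overall time budget that represents how long the user is willing to wait. The specific numbers are placeholders, not recommendations:

    import random
    import time

    def fetch_with_retries(do_request, time_budget_s=5.0, base_delay_s=0.2):
        # do_request() performs one attempt and raises on failure.
        # Retry until it succeeds or the user-facing time budget is spent.
        deadline = time.monotonic() + time_budget_s
        attempt = 0
        while True:
            try:
                return do_request()
            except Exception:
                attempt += 1
                # Exponential backoff with jitter so retries from many clients
                # don't line up, capped so a single wait never dominates.
                delay = min(base_delay_s * (2 ** (attempt - 1)), 2.0)
                delay += random.uniform(0, delay / 2)
                if time.monotonic() + delay >= deadline:
                    raise  # give up and surface the failure to the user
                time.sleep(delay)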
Try pinging some other server (i.e. google.com) to see if we can access a resource other than api.myserver.com
If we can successfully ping google.com then we know our internet is working, so we try once again to ping api.myserver.com
I would not assume that this is true nor do I think that making an extra request to a third party will help you make useful predictions about when to attempt to reach your own systems. This seems like extra work to build and maintain and likely to be a source of misleading results more than valuable information. In what scenario do you imagine this provides useful information to your app or its user?
Maybe not the answer you're looking for, hopefully it's still useful.
Disclaimer: my experience is biased toward apps with a fairly simple set of REST or RPC style network requests. If you're working on a problem which calls for streaming data, P2P connections, or some other scenario then don't start with these assumptions.
(1) One end note here because I see it as a source of failures so often: These requests should really be idempotent. Yes, even those POSTs creating new resources, checking out your cart, or whatever. When you cannot safely repeat a request you'll eventually see cases where the request completed but the client never got the acknowledgement so it looked like a failure. It's much easier to recover through a retry (automatic or user triggered) of the same request than to detect and recover from duplicate requests.
For better network performance: in my application I ping a Google server before every API request; if it is reachable I call my server's API, otherwise I show a no-network alert.
If you are on a Wi-Fi network you still have to do the same, because Wi-Fi reachability only checks for Wi-Fi connectivity, not for internet access.
I'm trying to set a reminder in a system to fire at a certain time.
This is a web based app, so it's not like it will be in memory all the time.
Ideally I'd like to avoid using a service or job on the server (mainly out of curiosity, to see if there is a more efficient way to do it).
For example, imagine how many eBay auctions are constantly ending all the time, with emails being sent out seemingly perfectly on time.
Do people reckon there is just a big loop going over and over, moving items into a queue etc.? Or is there something lower level helping out (stored procedures, triggers, etc.)?
Thanks everyone.
What you have to realize about eBay - and most large database-backed websites - is that the interactions between humans and the database that come through the web server are only a part (sometimes a very small part) of the functionality of the system.
To use eBay as an example, the emails that go out when auctions expire are not handled by a web server. They are far more likely to have that scripted - in other words, there is another program running on a number of their systems that looks at the database for ended auctions, does some processing on them, sends emails, etc.
If I were doing something similar (albeit on a much smaller scale), I'd have my web services built in the usual way, but have a job that runs automatically every few minutes to do the maintenance work. It would start up, look at the database for work, process anything that was required, then exit.
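As a rough sketch of what such a job might look like (with made-up table and column names and a stubbed-out email step; the scheduler - cron, Task Scheduler, etc. - runs it every few minutes):

    import sqlite3  # stand-in for whatever database holds the reminders
    from datetime import datetime

    def send_email(recipient, message):
        # Placeholder: deliver the reminder via SMTP or a mail API.
        pass

    def process_due_reminders(db_path="app.db"):
        # Find reminders whose time has passed and which have not been sent,
        # send them, mark them as done, then exit until the next scheduled run.
        now = datetime.utcnow().isoformat()
        with sqlite3.connect(db_path) as conn:
            due = conn.execute(
                "SELECT id, recipient, message FROM reminders "
                "WHERE fire_at <= ? AND sent = 0",
                (now,),
            ).fetchall()
            for reminder_id, recipient, message in due:
                send_email(recipient, message)
                conn.execute("UPDATE reminders SET sent = 1 WHERE id = ?", (reminder_id,))
            conn.commit()

    if __name__ == "__main__":
        process_due_reminders()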