Exception Logger: Best Practices - delphi

I have just started using an exception logger (EurekaLog) in my (Delphi) application. Now my application sends me lots of error messages via e-mail every day. Here's what I found out so far
lots of duplicate errors
multiple mails from the same PC
While this is highly valuable input to improve my application, I am slightly overwhelmed by the sheer amount of information I'm getting.
What are your best practices for handling mails from your application?

If you get to much information as it is currently the case you are not getting any information at all.
So I would say categorize your errors into groups, like WARNINGS, FATAL ERRORS, etc.
Then limit your emails to the most important messages (FATAL).
Apart from that review your logs on a regular basis (day, week ...).

What I've done with my exception logging, which uses madExcept as the core, but my own transport mechanism, is have them all go into a database. The core information is all extracted from each report and put into fields, and the whole report is stored as well. The stack trace is automatically analysed to remove the un-interesting functions, leaving a list of only my functions that have failed.
With this happening automatically, I can now "ignore" each individual message coming in, but see the bigger picture in a grid that shows me simply which functions are having the most problems. I can then focus on them, look for the causes, and fix them.
My display app is also able to filter out reports in builds before a certain number if I choose, so that I can tell it not to include "MyWidget.BadProc" before build 75 once I've fixed it.
This has helped me improve my app, and hit the problems that people found most problematic, without having to guess.

It would very much depend on what the errors are that are being sent back. The obvious one being if there are errors in your application, they need fixing and patches/updates sent to your clients.
If they are exceptions that you know can happen and do not required you to be notified you can add "Exception Filters" in the Eureaka Log options to specify how they should be handled (or ignored!).
Another option is to use EurekaLog Variables (where you can add the exception description etc) in the mail Subject line and then use your email client to filter based on this.

I did this using madExcept. It's really useful for tracking down problems we couldn't reproduce ourselves.
Which makes me ask why you are getting so many? Untrapped exceptions should be few and far between. Especially if the user sees an error dialog. I was responsible for several applications, each with hundreds of installs and I would rarely get e-mail notices.
If they are mostly from a very small number of PCs, I'd work with some of those users to find out what they're doing differently, or how their setup might be generating exceptions.
If they are from all over the place, it's probably a bug that got through your testing.
Either way, use the details to fix your code or, at the very least, anticipate known exceptions and trap them properly (no empty try..except).
Fixing the hot spot problems will cut way down on the number of e-mails you get, making the occasional notice much more manageable.

I think that you should throw out all duplicates. Leave only count of the reports. I.e. if you get, say 100 reports, but there are only 4 unique problems - leave only 4 reports, throw out other 96 reports, but use their count to sort reports by severity. For example, 6 reports for fourth problem, 10 reports for third, 20 for second and 60 for first. So, you should fix first problem with 60 reports and only then switch to second.
I believe that EurekaLog has BugID in its reports. Same problem has the same BugID. This will allow you to sort reports with duplicates. EurekaLog Viewer also can sort out duplicates.

Related

Remote Queued Logging of Issues (not exceptions)

We're writing an application which uses our http api, and occasionally it will encounter errors, failed connections, timeouts, etc.
We'd like to, at least in beta, be able to capture these incidents, and forward them to ourselves somehow. Since obviously many of these issues could be due to the actual connection being down, this would need to queue these incidents and send them when a connection is available.
I tried googling for an answer for this but to no avail, came across a bunch of solutions which catch exceptions, but not just random "incidents" (could really just be a string we log somewhere, we'd just include all the details in it).
Short of writing my own coredata (or something) backed queue, I'm at a loss at what a solution for this could be.
Does anyone know of any libs/services which could help with this?
You might want to look into Testflight, or less general purpose, Parse. Not quite sure, but maybe HockeyKit offers a solution for this, too.
You can take a look at Bugfender, it's a product we have built to solve this problem. We have found that while developing an app there are a lot of issues that are not crashes, so we decided to make our own product to help us on this.
It's easy to integrate and you can get the devices logs. Our service works offline and online, we have spent a lot of time to make it reliable and easy to use.
Compared to other products, you don't need any crash to get the logs, you choose when you want them.

How can I keep a large amount of OutputDebugString() calls from degrading my application in the Delphi 6 IDE?

This has happened to me on more than one occasion and has led to many lost hours chasing a ghost. As typical, when I am debugging some really difficult timing-related code I start adding tons of OutputDebugString() calls, so I can get a good picture of the sequence of related operations. The problem is, the Delphi 6 IDE seems to be able to only handle that situation for so long. I'll use a concrete example I just went through to avoid generalities (as much as possible).
I spent several days debugging my inter-thread semaphore locking code along with my DirectShow timestamp calculation code that was causing some deeply frustrating problems. After having eliminated every bug I could think of, I still was having a problem with Skype, which my application sends audio to.
After about 10 seconds the delay between my talking and hearing my voice come out of Skype on the second PC that I was using for testing, the far end of the call, started to grow. At around 20 - 30 seconds the delay started to grow exponentially and at that point triggered code I have that checks to see if a critical section was being held too long.
Fortunately it wasn't too late at night and having been through this before, I decided to stop relentlessly tracing and turned off the majority of the OutputDebugString(). Thankfully I had most of them wrapped in a conditional compiler define so it was easy to do. The instant I did this the problems went away, and it turned out my code was working fine.
So it looks like the Delphi 6 IDE starts to really bog down when the amount of OutputDebugstring() traffic is above some threshold. Perhaps it's just the task of adding strings to the Event Log debugger pane, which holds all the OutputDebugString() reports. I don't know, but I have seen similar problems in my applications when a TMemo or similar control starts to contain too many strings.
What have those of you out there done to prevent this? Is there a way of clearing the Event Log via some method call or at least a way of limiting its size? Also, what techniques do you use via conditional defines, IDE plug-ins, or whatever, to cope with this situation?
A similar problem happened to me before with Delphi 2007. Disable event viewing in the IDE and instead use DebugView from Sysinternals.
I hardly ever use OutputDebugString. I find it hard to analyze the output in the IDE and it takes extra effort to keep several sets of multiple runs.
I really prefer a good logging component suite (CodeSite, SmartInspect) and usually log to various files. Standard files for example are "General", "Debug" (standard debug info that I want to collect from a client installation as well), "Configuration", "Services", "Clients". These are all set up to "overflow" to a set of numbered files, which allows you to keep the logs of several runs by simply allowing more numbered files. Comparing log info from different runs becomes a whole lot easier that way.
In the situation you describe I would add debug statements that log to a separate logfile. For example "Trace". The code to make "Trace" available is between conditional defines. That makes turning it on pretty simple.
To avoid leaving in these extra debug statements, I tend to make the changes to turn on the "Trace" log without checking it out from source control. That way, the compiler of the build server will throw out "identifier not defined" errors on any statements unintentionally left in. If I want to keep these extra statements I either change them to go to the "Debug" log, or put them between conditional defines.
The first thing I would do is make certain that the problem is what you think it is. It has been a long time since I've used Delphi, so I'm not sure about the IDE limitations, but I'm a bit skeptical that the event log will start bogging down exponentially over time with the same number of debug strings being written in a period of 20-30 seconds. It seems more likely that the number of debug strings being written is increasing over time for some reason, which could indicate a bug in your application control flow that is just not as obvious with the logging disabled.
To be sure I would try writing a simple application that just runs in a loop writing out debug strings in chunks of 100 or so, and start recording the time it takes for each chunk, and see if the time starts to increase as significantly over a 20-30 second timespan.
If you do verify that this is the problem - or even if it's not - then I would recommend using some type of logging library instead. OutputDebugString really loses it's effectiveness when you use it for massive log dumps like that. Even if you do find a way to reset or limit the output window, you'd be losing all of that logging data.
IDE Fix Pack has an optimisation to improve performance of OutputDebugString
The IDE’s Debug Log View also got an optimization. The debugger now
updates the Log View only when the IDE is idle. This allows the IDE to
stay responsive when hundreds of OutputDebugString messages or other
debug messages are written to the Debug Log View.
Note that this only runs on Delphi 2007 and above.

Is it possible to tell BugzScout to stop reporting on a specific exception or set FogBugz to stop tracking that exception?

We have an exception that pops up on our website that is getting reported to BugzScout many times a day. The functionality that produces the exception still does what it's intended to do, so we just want to stop FogBugz from piling up all of these occurrences until we have a chance to dig into the issue and prevent the exception.
That said, is there a way to set up a filter on the FogBugz side of things to ignore a list of exceptions that get reported? I know I can set up some logic in our app's BugzScout class so it stops sending those messages, but it would be nice to know if FogBugz does this already before I put the time into building that filter locally. We are using the hosted FogBugz On Demand version of the product if that makes a difference.
In the BugzScout case itself, you should be able to set "Stop Reporting" for the Scout Will setting. This way, only the occurrences will increment when the exception is reported. The case will not reopen or notify anyone.
It sounds a bit from your description that there are many different exceptions reporting to the same ScoutDescription. As much as possible, you should use version numbers and exception line numbers to make sure that exceptions are reported separately. I can elaborate on this if you want.

How to fail gracefully and get notified if screen scraping fails in ruby on rails

I am working on a Rails 3 project that relies heavily on screen scraping to collect data mainly using Nokogiri. I'm aggregating essentially all the same data but I'm grabbing it from many difference sources and as time goes on I will be adding more and more. However I am acutely aware that screen scraping can be notoriously unreliable.
As such I am interested in how other people have handled the problem of verifying the data and then also getting notified if it is failing.
My current plan is as follow.
I am going to have validation on my model for most of the fields. If they fail I won't get bad data into my system. Although logging this failure in a meaningful way is still a problem.
I was thinking of some kind of counter where after so many failures from a particular source I somehow turn it off. Not sure how to keep track of that. I guess the only way is to have a field on my Source model that counts it and can be reset.
Logging is 800 pound gorilla I'm not sure how to deal with. I could just do standard writing to logs but if something fails I'd like to store the entire html so I can figure it out. Also I need to notify myself somehow so I can address the issues. I thought of maybe just creating a model for all this and storing it in the database. If I did this I'd probably have to store the html on s3 or something. I'm running this on heroku so that influences what I can do.
Setup begin and rescue blocks around every field. I was trying to figure out a to code this in a nicer ruby way so I just don't have a page of them but although I do have some fields are just straight up doc.css_at("#whatever") there are quite a number that require various formatting or calculations so I think it makes sense to try to rescue that so I can then log what went wrong. The other option is to let the exception bubble up and catch it when I try to create the model.
Anyway I'm sure I'm not even thinking of everything but that is why I'm trying to figure out how other people have handled this problem.
Our team does something similar to this, so here's some ideas:
we use a really high level begin/rescue transaction to make sure we don't get into weird half loaded states:
begin
ActiveRecord::Base.transaction do
...try to load a data source...
end
rescue
...error handling...
end
Email/page yourself when certain errors occur. We use exception_notifier but if you're sitting on Heroku the Exceptional plugin also seems like a good option. I've also heard of people having success w/ hoptoad
Capturing state is VERY important for troubleshooting issues. Something that's worked quite well for us is GMail. Our loaders effectively have two phases:
capture data and send it to our gmail account
log into gmail, download latest data and parse it
The second phase is the complex one, and if it fails a developer can simply log into the gmail account and easily inspect the failed message. This process has some limitations (per email and per mailbox storage limits, two phase pipeline, etc.) and we started out doing it because we had no other option, but it's proven shockingly resilient and convenient. Keep email in mind as a cheap/easy way to store noncritical state. We didn't start out thinking of using it that way and are now really glad we do. Logging into GMail feels better than digging through log files.
Build a dashboard UI. We have a simple dashboard with a grid of sources by day that looks like this. Each box is colored either red or green based on whether the load for that source on that day succeeded. You can go one step further and set up a monitor on this UI (mon.itor.us or equivalent) that alarms if some error threshold is met.

What information should I be logging in my web app?

I finishing up a web application and I'm trying to implement some logging. I've never seen any good examples of what to log. Is it just exceptions? Are there other things I should be logging? What type of information do you find useful for finding and fixing bugs.
Looking for some specific guidance and best practices.
Thanks
Follow up
If I'm logging exceptions what information specifically should I be logging? Should I be doing something more than _log.Error(ex.Message, ex); ?
Here is my logical breakdown of what can be logged within and application, why you might want to and how you might go about doing it. No matter what I would recommend using a logging framework such as log4net when implementing.
Exception Logging
When everything else has failed, this should not. It is a good idea to have a central means of capturing all unhanded exceptions. This shouldn't
be much harder then wrapping your entire application in a giant try/catch unless you are using more than on thread. The work doesn't end here
though because if you wait until the exception reaches you a lot of useful information would have gone out of scope. At the very least you should
try to collect specific pieces of the application state that are likely to help with debugging as the stack unwinds. Your application should always be prepared to produce this type of log output, especially in production. Make sure to take a look at ELMAH if you haven't already. I haven't tried it but I have heard great things
Application Logging
What I call application logs includes any log that captures information about what your application is doing on a conceptual level such as "Deleted Order" or "A User Signed On". This kind of information can be useful in analyzing trends, auditing the system, locking it down, testing, security and detecting bugs of coarse. It is probably a good idea to plan on leaving these logs on in production as well, perhaps at variable levels of granularity.
Trace Logging
Trace logging, to me, represents the most granular form of logging. At this level you focus less on what the application is doing and more on how it is doing it. This is one step above actually walking through the code line by line. It is probably most helpful in dealing with concurrency issues or anything for that matter which is hard to reproduce. You wouldn't want to always have this running, probably only turning it on when needed.
Lastly, as with so many other things that usually only get addressed at the very end, the best time to think about logging is at the beginning of a project so that the application can be designed with it in mind. Great question though!
Some things to log:
business actions, such as adding/deleting items. Talk to your app's business owner to come up with a list of things that are useful. These should make sense to the business, not to you (for example: when user submitted report, when user creates a new process, etc)
exceptions
exceptions
exceptions
Some things to NOT to log:
do not log information simply for tracking user usage. Use an analytics tool for that (which tracks the client in javascirpt, not in the client)
do not track passwords or hashes of passwords (huge security issue)
Maybe you should log page/resource accesses which are not yet defined in your application, but are requested by clients. That way, you may be able to find vulnerabilities.
It depends on the application and its audience. If you are managing sales or trading stocks, you probably should log more info than say a personal blog. When you need the log most is when an error is happening in your production environment, but can't reproduce it locally. Having log level and log hierarchy would help in such situations, because you can dynamically increase the log level. See log4j's documentation and log4net.
My few cents.
Besides using log severity and exceptions properly, consider structuring your log statements so that you could easily look though the log data in the future. For example - extracting meaningful info quickly, doing queries etc. There is no problem to generate an ocean of log data, the problem is to convert this data into information. So, structuring and defining it beforehand helps in later usage. If you use log4j, I would also suggest using mapped diagnostic context (MDC) - this helps a lot for tracking session contexts. Aside from trace and info, I would also use debug level where I usually keep temp. items. Those could be filtered out or disabled when not needed.
You probably shouldn't be thinking of this at this stage, rather, logging is helpful to consider at every stage of development to help diffuse potential bugs before they arise. Depending on your program, I would try to capture as much information as possible. Log everything. You can always stop logging certain components or processes if you don't reference that data enough. There is no such thing as too much information.
From my (limited) experience, if you don't want to make a specific error table for each possible error type, construct a generic database table that accepts general information as well as a string that you can populate with exception data, confirmation messages during successful yet important processes, etc. I've used a generic function with parameters for this.
You should also consider the ability to turn logging off if necessary.
Hope this helps.
I beleive when you log an exception you should also save current date and time, requested url, url refferer and user IP address.

Resources