Rhino ETL: Join operation with orphan rows - rhino-etl

I'm using Rhino ETL for the first time in a project, and I'm very
impressed by its capabilities. I use a join operation to match two
data sources.
Sometimes there might be missing data, so I override LeftOrphanRow to
"log" the error. My idea was to throw an exception and then, at
the end of the process, collect all the exceptions that occurred using
GetAllErrors().
But it seems the process is aborted with the first exception.
Is that intentional? What would be the best way to deal with
orphan rows (especially when I would like a summary of all orphan rows from all operations at the end of the process)?

Seems to me that the problem is that you're trying to use exceptions to report a non-exceptional event. That's not really what exceptions are for, and certainly when you're expecting the exception to pass through a third-party library, you shouldn't rely on that library to behave in any specific way with respect to that exception.
Can you just keep a list of orphan rows somewhere, e.g. globally, and add to it whenever you encounter one in any of your join operations? Then, after your EtlProcess is finished, just print the list out. You might also consider using log4net to accomplish this, or even simply raise an event that you subscribe to elsewhere and handle however seems appropriate.

Related

What exceptions should I catch in an ActiveRecord Transaction block?

I can't believe this has not already been discussed on SO but the helpful question completion widget didn't show one ...
The question is, akin to worrying about possible errors when attempting an HTTP connection, what "system level" exceptions should I catch when using ActiveRecord::Base.transaction? I understand about catching invalid records and statements caused by bad data; but what about all the ways in which the database connection and/or transaction might fail for reasons outside the control of my app's logic?
The whole point of the transaction is that if an error is raised at any point in the transaction block, everything is reverted. So you don't need to check for any of that.
You certainly want to see whether everything succeeded or failed, but you don't need to check individual statements in the block.
Unless I'm completely misunderstanding your question.
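
For concreteness, here is a minimal sketch of that pattern in a Rails controller context (the model and method names are invented for the example); anything raised inside the block rolls the whole transaction back before it reaches your rescue clauses:

    # Hypothetical models/methods; the point is that any exception raised
    # inside the block rolls the whole transaction back before propagating.
    begin
      ActiveRecord::Base.transaction do
        account.withdraw!(100)   # may raise ActiveRecord::RecordInvalid
        ledger.record!(entry)    # may raise ActiveRecord::StatementInvalid
      end
    rescue ActiveRecord::RecordInvalid, ActiveRecord::RecordNotSaved => e
      # Bad data: something to report back to the user.
      logger.warn("Transaction rejected: #{e.message}")
    rescue ActiveRecord::StatementInvalid, ActiveRecord::ConnectionNotEstablished => e
      # Lower-level database trouble: the transaction has already been rolled
      # back, so just log it and decide whether to retry.
      logger.error("Database failure: #{e.message}")
    end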

How to know what exceptions to rescue?

I often find myself without knowing what exceptions to rescue when using a specific library of code in Ruby.
For instance, I often use HTTParty for any HTTP requests my rails/sinatra app would make. I dug around the code for HTTParty and found a file containing the defined exceptions used. Great! I'll just rescue them when making a request.
To test it out, I put in a bogus domain name for the request, but instead of the HTTParty::ResponseError exception I expected, I got a SocketError exception.
What is the best way to deal with this? I'm aware that HTTParty is a wrapper around Ruby's own HTTP implementation, and that's probably what threw the SocketError exception. But how would I know that normally?
I could solve this by just rescuing "Exception", but that's pretty awful practice. I'd rather be well aware of the exceptions I could be causing and dealing with those.
EDIT: I should clarify that what really prompted me to create this question was that I have no idea how I can figure out the possible exceptions that CAN be raised when calling a specific function... that is, without looking through every single function call in the stack.
In general terms (I'm not a Ruby programmer), I deal with exceptions in the following way:
Can I recover from it? If the exception can happen and I know I can recover (or perhaps retry), then I handle the exception.
Does it need to be reported? If the exception can happen but I know I can't recover or retry, then I handle the exception by logging it and then passing it on to the caller. I always do this on a natural subsystem boundary, such as a major module or service. Sometimes (depending on the API) I might wrap the exception in a 'my module'-specific one so that the caller only has to deal with my exceptions (a sketch of this follows below).
Can't handle it? All exceptions that are not dealt with should be caught at the top level, (a) reported, and (b) handled so that the system remains stable and consistent. This one should always be there regardless of whether the other two are done.
Of course there is another class of exception - the ones that are so serious that they give you no chance to deal with them. For these there is only one solution - post-mortem debugging - and I find the best thing for this is logs, logs and more logs. Having worked on many systems from small to large, I would prefer to sacrifice performance for stability and recoverability (except where performance is critical) and add copious amounts of logging - introspectively if possible.
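
To illustrate the wrapping idea from point 2 in Ruby (the module, Gateway call and error class here are all invented for the example), a subsystem boundary can log the low-level failure and re-raise its own exception type, so callers only ever rescue one thing:

    require "logger"
    require "timeout"

    module Billing
      class Error < StandardError; end

      LOG = Logger.new($stderr)

      def self.charge(order)
        Gateway.charge(order)   # hypothetical lower-level call that can fail
      rescue SocketError, Timeout::Error => e
        # Report it (point 2), then pass it on wrapped in a subsystem-specific
        # exception so callers only need to rescue Billing::Error.
        LOG.error("charge failed: #{e.class}: #{e.message}")
        raise Error, "payment gateway unreachable"
      end
    end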
A SocketError is a perfectly reasonable result if you put in a bogus domain name.
After all, trying to connect to a non-existent domain causes the connection itself to fail, i.e. a SocketError.
The best way to deal with that is to use a valid domain with a bad URL in your test, but rescue SocketError in your live code.
The problem here is not that you're catching the wrong exception but that you're priming the test with bad data.
The best course of action is to understand what exceptions could happen and manage them.
When I say understand, I'm getting at: where does the URL come from? Is it entered by your user? If so, never trust it and catch everything. Does it come from your config data? Semi-trust it and log errors, unless it's mission critical that the URL is OK.
There's no right or wrong answer here but this approach will, I hope, give you a good result.
Edit: What I'm attempting to do here is advocate the mindset of a programmer who is aware of the results of their actions. We all know that trying to connect to 'thisServiceIsTotallyBogus.somethingbad.notAvalidDomain' will fail, but the mindset of a programmer should be to first ask exactly where that domain comes from. If it is entered by the user, then you must do full checks; if you know it comes from a config file only accessed by yourself or your support team, you can relax a little. Sadly, though, this is a bad example, as you should really always test URLs because sometimes the internet doesn't work!
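
As a minimal sketch of what "rescue it in your live code" might look like with HTTParty (the exact list of network errors worth rescuing depends on your Ruby version and on how much you trust the URL's source):

    require "httparty"

    def fetch(url)
      response = HTTParty.get(url)
      response.success? ? response.body : nil
    rescue SocketError, Errno::ECONNREFUSED, Timeout::Error => e
      # Network-level failures (bad DNS, nothing listening, timeouts) come
      # from Ruby's own HTTP layer rather than from HTTParty itself.
      warn "request to #{url} failed: #{e.class}: #{e.message}"
      nil
    rescue HTTParty::ResponseError => e
      # HTTParty's own exception hierarchy (the one the question expected).
      warn "bad response from #{url}: #{e.message}"
      nil
    end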
Ideally, the developer documentation for anything you use should tell you what exceptions it can throw.
For libraries or gems where the source code is publicly available, you can typically find the exception types in an exceptions.rb file. Otherwise you will have to rely on the documentation. If all else fails you can rescue StandardError, although that is a less-than-ideal practice in many cases.
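
And if the documentation really gives you nothing to go on, the last-resort pattern is a bare rescue (which catches StandardError and its subclasses, not Exception), logged and re-raised so the failure isn't silently swallowed (the call name below is just a placeholder):

    begin
      risky_library_call        # placeholder for whatever call you can't pin down
    rescue => e                 # bare rescue == StandardError, not Exception
      warn "unexpected #{e.class}: #{e.message}"
      raise                     # re-raise so the failure is still visible upstream
    end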

Is it possible to tell BugzScout to stop reporting on a specific exception or set FogBugz to stop tracking that exception?

We have an exception that pops up on our website that is getting reported to BugzScout many times a day. The functionality that produces the exception still does what it's intended to do, so we just want to stop FogBugz from piling up all of these occurrences until we have a chance to dig into the issue and prevent the exception.
That said, is there a way to set up a filter on the FogBugz side of things to ignore a list of exceptions that get reported? I know I can set up some logic in our app's BugzScout class so it stops sending those messages, but it would be nice to know if FogBugz does this already before I put the time into building that filter locally. We are using the hosted FogBugz On Demand version of the product if that makes a difference.
In the BugzScout case itself, you should be able to set "Stop Reporting" for the Scout Will setting. This way, only the occurrences will increment when the exception is reported. The case will not reopen or notify anyone.
It sounds a bit from your description that there are many different exceptions reporting to the same ScoutDescription. As much as possible, you should use version numbers and exception line numbers to make sure that exceptions are reported separately. I can elaborate on this if you want.

Core dump equivalent for the Rails exception

So I got an exception log from my application. I have the call stack, request parameters and all the other usual stuff in that log. This is a rare exception, and the info in the log doesn't contain all the details I need to resolve / reproduce the problem.
I wonder if there is some way (a gem?) to get a full dump of the Rails application state in case of an exception, including all instance and local variable values from the controller methods. I guess that a dump of the whole Ruby object space might take even a minute or so, but I don't care about disk and CPU resources in such a case.
I don't think so. The way I go about doing something like that is by using logger.error on the variables that I need more info on.
If possible, it's also not a bad idea to try running it with ruby-debug. All you would need to do is insert a call to debugger right before the error would be triggered, or stick it in a rescue clause.
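
The two suggestions combine naturally; a rough sketch (the controller action, model and method names are invented for illustration):

    # In the controller action that produces the rare exception.
    def update
      @order = Order.find(params[:id])
      @order.apply_discount!(params[:code])
    rescue => e
      # Dump the state you care about into the log...
      logger.error("update failed: #{e.class}: #{e.message}")
      logger.error("order=#{@order.inspect} params=#{params.inspect}")
      # ...and, when running under ruby-debug, drop into the debugger here.
      debugger if defined?(debugger)
      raise
    end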
I am unaware of any method to get the application state.
One thing I do on occasion is install NetBeans, which has a graphical debugger. You can hover over variables to see their values, easily walk the stack, and also have the exception trigger the debugger rather than a breakpoint.

Exception Logger: Best Practices

I have just started using an exception logger (EurekaLog) in my (Delphi) application. Now my application sends me lots of error messages via e-mail every day. Here's what I found out so far:
lots of duplicate errors
multiple mails from the same PC
While this is highly valuable input to improve my application, I am slightly overwhelmed by the sheer amount of information I'm getting.
What are your best practices for handling mails from your application?
If you get too much information, as is currently the case, you are not getting any information at all.
So I would say categorize your errors into groups, like WARNINGS, FATAL ERRORS, etc.
Then limit your emails to the most important messages (FATAL).
Apart from that, review your logs on a regular basis (daily, weekly, ...).
What I've done with my exception logging, which uses madExcept as the core but my own transport mechanism, is have all the reports go into a database. The core information is extracted from each report and put into fields, and the whole report is stored as well. The stack trace is automatically analysed to remove the uninteresting functions, leaving a list of only my functions that have failed.
With this happening automatically, I can now "ignore" each individual message coming in, but see the bigger picture in a grid that shows me simply which functions are having the most problems. I can then focus on them, look for the causes, and fix them.
My display app is also able to filter out reports in builds before a certain number if I choose, so that I can tell it not to include "MyWidget.BadProc" before build 75 once I've fixed it.
This has helped me improve my app, and hit the problems that people found most problematic, without having to guess.
It would very much depend on what the errors are that are being sent back. The obvious case is that if there are bugs in your application, they need fixing and patches/updates sent to your clients.
If they are exceptions that you know can happen and that do not require you to be notified, you can add "Exception Filters" in the EurekaLog options to specify how they should be handled (or ignored!).
Another option is to use EurekaLog variables (where you can add the exception description, etc.) in the mail subject line and then use your email client to filter based on this.
I did this using madExcept. It's really useful for tracking down problems we couldn't reproduce ourselves.
Which makes me ask why you are getting so many. Untrapped exceptions should be few and far between, especially if the user sees an error dialog. I was responsible for several applications, each with hundreds of installs, and I would rarely get e-mail notices.
If they are mostly from a very small number of PCs, I'd work with some of those users to find out what they're doing differently, or how their setup might be generating exceptions.
If they are from all over the place, it's probably a bug that got through your testing.
Either way, use the details to fix your code or, at the very least, anticipate known exceptions and trap them properly (no empty try..except).
Fixing the hot spot problems will cut way down on the number of e-mails you get, making the occasional notice much more manageable.
I think that you should throw out all duplicates and keep only a count of the reports. I.e. if you get, say, 100 reports but there are only 4 unique problems, keep only 4 reports, throw out the other 96, but use their counts to sort the reports by severity. For example: 6 reports for the fourth problem, 10 for the third, 20 for the second and 60 for the first. So you should fix the first problem, with 60 reports, and only then move on to the second.
I believe that EurekaLog includes a BugID in its reports; the same problem gets the same BugID. This will allow you to group duplicate reports. The EurekaLog Viewer can also sort out duplicates.
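
The dedup-and-rank idea itself is simple enough to sketch (shown here in Ruby with a made-up report structure; the same logic applies whatever language your tooling is in):

    # reports is a made-up list of parsed exception reports, each with a bug_id.
    reports = [
      { bug_id: "A12", message: "Access violation in FooUnit" },
      { bug_id: "B07", message: "EConvertError in BarUnit" },
      { bug_id: "A12", message: "Access violation in FooUnit" },
    ]

    # Keep one entry per bug_id, ranked by how often it occurred.
    ranked = reports.group_by { |r| r[:bug_id] }
                    .map { |id, dups| { bug_id: id, count: dups.size, sample: dups.first } }
                    .sort_by { |r| -r[:count] }

    ranked.each { |r| puts "#{r[:bug_id]}: #{r[:count]} occurrence(s)" }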
