"Barry Kelly" <✉eircom.net> wrote in message
> You show me where the reasoning breaks down:
> * If you see virtue in ignoring error conditions, you
> aren't surviving errors - you are ignoring them.
> (this point follows directly)
> I ask for an integer. The user enters "x".
> * Does my StrToInt routine have a fault for not converting it?
No. A fault would be if StrToInt returned some value without somehow
indicating that it is invalid.
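As a rough illustration of a converter that does *not* have that fault, here is a minimal Python sketch. Python is only a stand-in for Delphi here, and `str_to_int_checked` is a hypothetical name. It always tells the caller whether the value is genuine. Delphi's SysUtils has `TryStrToInt`, which follows a similar "Boolean result plus value" shape.

```python
from typing import Tuple


def str_to_int_checked(text: str) -> Tuple[bool, int]:
    """Convert a decimal string, reporting invalid input instead of guessing.

    Returns (True, value) on success and (False, 0) otherwise, so a caller
    cannot mistake a failed conversion for a genuine value.
    """
    stripped = text.strip()
    digits = stripped[1:] if stripped.startswith(("+", "-")) else stripped
    # Only ASCII '0'..'9' count as digits; anything else is rejected.
    if not digits or not all("0" <= ch <= "9" for ch in digits):
        return False, 0
    return True, int(stripped)


print(str_to_int_checked("x"))   # (False, 0)
print(str_to_int_checked("42"))  # (True, 42)
```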
> * Is the input faulty?
> I fear we're losing the semantics here; we need to define "failure",
> "fault" and responsibility for input and correction more clearly in
> order to say what's best for each case.
OK let me take a swipe at that:
To begin with the easy part: input validation. IMO this isn't a software
error. The input is generated by a human operator or another
application, or comes from a data file. If this input does not conform to
what the program expects, the program shall robustly reject that
input and, if possible, allow other input data to be presented. This
is, in my view, *not* related to error handling as such but rather a
normal function of any program which takes input from external sources.
An 'error' in this context (i.e. from the point of view of the application
software itself) would only occur if the program failed to trap bad input
data, or unexpectedly rejected valid input data; i.e. the application is
not fulfilling its 'contract' or specification. If the user types an 'x'
where '0'-'9' was expected, it is a user error but not an error in the
application as such (i.e. the application did nothing 'wrong').
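To show "reject and allow other input" as ordinary program behaviour rather than error handling, here is a small Python sketch. Python stands in for Delphi, and `read_int` and its `max_attempts` limit are hypothetical. The entries come from an iterable so the sketch does not depend on a console.

```python
from itertools import islice
from typing import Iterable, Optional


def read_int(entries: Iterable[str], max_attempts: int = 3) -> Optional[int]:
    """Return the first entry that parses as a whole number, else None.

    Rejecting bad input and asking again is ordinary input handling;
    nothing inside the program has gone wrong when the user types 'x'.
    """
    for entry in islice(entries, max_attempts):
        try:
            return int(entry)
        except ValueError:
            print(f"{entry!r} is not a whole number, please try again")
    return None  # the operator gave up or never supplied valid input


print(read_int(["x", "12"]))  # prints a rejection message for 'x', then 12
```

At a console, something like `read_int(iter(lambda: input("Number: "), None))` would feed it interactively, with `islice` capping the number of attempts.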
Going from there: a 'fault' would be a design (specification) and/or
implementation error in the application which allows the application to
accept invalid input and treat it as if it were valid. So: the user enters
'x' and the internal converter reports value=185 with no notion of an error.
At this point, we have an *error situation* in the software. The error is
a symptom of a fault (the fault being that the converter function doesn't
react to invalid characters in its input). The consequence of the error is
failure: failure of the converter function, and illegal or inconsistent data
in your application, which may quickly lead to invalid results.
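The fault → error → failure chain can be reproduced with a deliberately naive converter. This is a hypothetical Python stand-in. It does not produce 185 for 'x' (the exact garbage depends on the implementation), but the effect is the same: a plausible-looking number with no indication that anything went wrong.

```python
def naive_str_to_int(text: str) -> int:
    """Deliberately FAULTY: treats every character as if it were a digit."""
    value = 0
    for ch in text:
        value = value * 10 + (ord(ch) - ord("0"))  # missing check: is ch in '0'..'9'?
    return value


print(naive_str_to_int("42"))  # 42: correct for valid input
print(naive_str_to_int("x"))   # 72: the fault turns bad input into a plausible number

# The error now becomes a failure further downstream:
quantity = naive_str_to_int("x")
print(quantity * 3)            # 216 units ordered, and nothing flagged it
```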
> I have an editor. It's called PFE - Programmer's File Editor. ....
> <snip> ... but that editor continuing and surviving
> long enough to save its "buffer" to disk was *lethal*. The sooner it
> could have crashed and alerted me, the better.
> I just don't believe in what you're advocating. Don't mind that it isn't
> a good idea; it isn't even ethical.
Sounds like we need a counter-example:
One of my company's software products supports communication links with
satellites while they are still on the ground, i.e. during development
and testing. Those satellites typically cost 1-2 billion euros apiece.
They are pretty vulnerable and helpless little
buggers in their infancy and will fall over and cry for mummy for the
silliest little reasons. While the system software engineers are
uploading new and more mature software versions to their on-board
computers every week, the problem is that it can take a day or more to
reboot such a satellite once it's down. A surefire way of taking the
satellite down is to abandon it for more than 5 minutes. A missing comm
link will do that for you, unless you can somehow tell it to go to "safe
mode" first, which is a hibernation state from which it will be
resurrected more quickly. After a while, distrusting the satellite's
maturity (self-sustainment) level, the engineers have little choice but
to hit the "emergency power-off" button.
If my software incurs an anomaly, e.g. due to an I/O error at file level,
I am *not* going to panic and take the whole software down in one blow,
because this doesn't offer the users a chance to tell the satellite to go
to safe mode first, for example. Or worse yet, a temperature conditioning
unit on-board may run uncontrolled and damage an expensive sensor which
has taken many years and millions to build. So, the lesson for me is:
whatever happens, 1) assess the damage; 2) isolate the damage; 3) notify
the user; 4) allow healthy parts to be used; 5) let the user decide when
it's ok to shut down.
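The five steps can be sketched as a tiny supervisor loop. This is a hypothetical Python illustration, not the product's actual design. `Supervisor`, `file_logger` and `comm_link` are invented names. A module that raises is recorded, switched off and reported, while the other modules keep running until the user asks to stop.

```python
from typing import Callable, Dict, List


class Supervisor:
    """Hypothetical host that keeps healthy modules running when one fails."""

    def __init__(self, notify: Callable[[str], None]) -> None:
        self.notify = notify                      # how the user is told about problems
        self.modules: Dict[str, Callable[[], None]] = {}
        self.failed: Dict[str, str] = {}          # module name -> recorded damage
        self.running = True

    def add(self, name: str, step: Callable[[], None]) -> None:
        self.modules[name] = step

    def tick(self) -> None:
        """Run one cycle of every module that is still healthy."""
        for name, step in self.modules.items():
            if name in self.failed:
                continue                          # 2) isolate: the damaged module stays off
            try:
                step()                            # 4) healthy modules keep doing their job
            except Exception as exc:
                self.failed[name] = repr(exc)     # 1) assess: record what broke
                self.notify(f"{name} failed: {exc!r}")  # 3) notify the user

    def user_requests_shutdown(self) -> None:
        self.running = False                      # 5) only the user ends the session


# Usage with two hypothetical modules: a file logger that hits an I/O error
# and a comm link that must keep sending keep-alives.
messages: List[str] = []
keepalives: List[str] = []


def file_logger() -> None:
    raise OSError("disk full")


supervisor = Supervisor(notify=messages.append)
supervisor.add("file_logger", file_logger)
supervisor.add("comm_link", lambda: keepalives.append("keep-alive"))
supervisor.tick()
supervisor.tick()
print(messages)         # ["file_logger failed: OSError('disk full')"]
print(len(keepalives))  # 2
```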
So what I am saying is: *of course* you should *never produce incorrect
results*. Letting a source code file become corrupted, as you described, is
a prime example of an incorrect result. But if your software is built out of
carefully self-contained modules, then an error in one part of the software
should stay confined to that part and still allow the rest of the software
to continue operating until the *user* has determined that he's as ready as
he can be to quit the application.
Another example: when the Delphi IDE crashes (which it does, regularly) I am
*very* happy that I am usually able to save my work before I quit Delphi.
This assumes, of course, that the files are saved correctly!
Now, the follow-on, if you have followed me so far, is this: when you are
trying to dam up against error propagation (by this I mean a chain
reaction of further failures following a single failure, not the
passing of information about errors), the exception mechanism, with its
propensity for automatic self-propagation (and potential disruption of
state in all the layers of software it traverses), is actually not the
kind of behaviour you want. Sure, if it is properly dealt with you need not
incur any new problems, but there is IMO a latent *risk* associated with
this propagation behaviour which you do *not* see with error codes.
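One way to "dam" propagation is to catch everything at a module boundary and turn it into an error value. This is a minimal Python sketch of that idea. `contain` and `third_party_read` are hypothetical, with the latter standing in for a library routine you do not control.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def contain(call: Callable[[], T]) -> Tuple[Optional[T], str]:
    """Run `call`; turn any exception into an error string at this boundary.

    Returns (result, "") on success or (None, message) on failure, so the
    exception never travels further up through the caller's layers.
    """
    try:
        return call(), ""
    except Exception as exc:  # deliberately broad: this is the dam
        return None, f"{type(exc).__name__}: {exc}"


def third_party_read(path: str) -> str:
    # Hypothetical stand-in for a library routine that raises on failure.
    raise FileNotFoundError(f"no such file: {path}")


data, err = contain(lambda: third_party_read("telemetry.cfg"))
if err:
    print("config unavailable, continuing with defaults:", err)
```

Catching `Exception` rather than `BaseException` still lets `KeyboardInterrupt` and `SystemExit` through, so the user can always stop the program.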
Combine this with the fact that many exceptions, especially the low-level
ones from third-party libraries and components over which you have no
control and into which you have no visibility, are often unjustified or based
on error handling schemes different from your own, and the risk to your application's
> This code:
> * Loses the concept of natural return value (that is, every
> function is effectively a procedure since the return value
> has been hijacked for another purpose).
I'm afraid you're reading me too closely here.
This is not what I actually do. With TErrorInfo, for example, I was just
trying to indicate a generalised "error info returned" without committing
myself to a particular data type. Also, more often than not, I guess the
error info is returned as a var parameter (my favorite parameter-list
closing phrase is 'var isOK:boolean);' and I even have a keyboard macro for
this one! <g>). So I am not trying to rob functions of their natural
return values. Hijacking the return value is more the case with API-related
routines or custom DLL function calls.
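Python has no `var` parameters, so this rough sketch uses a small mutable object in place of Delphi's `var isOK: boolean`. `Status` and `average` are hypothetical names. The point it shows is that the function keeps its natural return value while success or failure travels out through the extra parameter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Status:
    """Mutable stand-in for a Delphi-style `var isOK: boolean` parameter."""
    ok: bool = True
    message: str = ""


def average(values: List[float], status: Status) -> float:
    """The result stays the natural return value; success goes out via `status`."""
    if not values:
        status.ok = False
        status.message = "no values to average"
        return 0.0
    status.ok = True
    status.message = ""
    return sum(values) / len(values)


st = Status()
print(average([2.0, 4.0], st), st.ok)  # 3.0 True
print(average([], st), st.ok)          # 0.0 False
```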
> * Requires plumbing in *every* procedure, even if it doesn't have to
> do any error management itself.
> * Is trying to reinvent exceptions, but maybe you don't know that yet.
See my comments on TErrorInfo above.
> * Requires now partially redundant try..finally blocks for cleanup.
Let's put that down to a clumsy and oversimplified code example.
> I wouldn't dream of letting *my* back end near the user interface. <g>
??? You lost me there; I must be getting too serious about this. <g>
What I meant was that, at least in all operational/control classes, I
always have a messaging event which passes text information to the owner
of that object, which in turn has its own event for passing these
messages on again, and so on; eventually this upward chain of message
passing results in a line of log text displayed in the user interface.
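The upward message chain can be sketched with plain callbacks standing in for Delphi events. This is hypothetical Python: `Component`, `post` and `on_message` are invented names, and a list plays the role of the UI log window. Each owner prefixes its own name and passes the text on.

```python
from typing import Callable, List, Optional

MessageHandler = Callable[[str], None]


class Component:
    """Hypothetical control class with an upward 'message' event."""

    def __init__(self, name: str, on_message: Optional[MessageHandler] = None) -> None:
        self.name = name
        self.on_message = on_message

    def post(self, text: str) -> None:
        # Pass the text to whoever owns us; silently drop it if nobody listens.
        if self.on_message is not None:
            self.on_message(f"{self.name}: {text}")


ui_log: List[str] = []                     # stands in for the UI's log window
app = Component("App", on_message=ui_log.append)
link = Component("CommLink", on_message=app.post)
encoder = Component("Encoder", on_message=link.post)

encoder.post("frame checksum mismatch, frame dropped")
print(ui_log[-1])  # App: CommLink: Encoder: frame checksum mismatch, frame dropped
```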
> > So that's why I feel also, that just passing some low level exception on
> > to BigBoss has little actual value.
> There are at least three solutions to this.
<... examples noted... >
Interesting techniques which can make raising exceptions "look" like
error code passing.
<discussion on printing exceptions>
> So let the application handle the exception: <....>
> At least it lets the user know that something is wrong and gives
> them a clue, instead of leaving them wondering if the
> printer is plugged in.
OK, this wasn't the point here - I wasn't trying to advocate "leaving the
user in the dark". I was trying to demonstrate that there are "errors"
(leading to *exceptions*) which an application might not actually care
about. Within the spec of that application, these errors are simply not
relevant. We truly don't care. Period. This is not *sloppy programming*;
we are *not* incurring bad states. No, we simply call some bleeping API
somewhere, cross our fingers and don't care if it blows up, as long as it
doesn't blow *us* up. Printing was one example. Trying to locate a
default e-mail client or web browser is another. If it's not
there, it's not there. Too bad.
In these situations it's nice to have error codes, because you can do
what you want with them.
Exceptions, on the other hand, you *have to* catch, or they'll wallop you.
So they put extra constraints on your code.
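The asymmetry can be shown side by side in a short Python sketch. Both "print" functions are hypothetical stand-ins for a printing API. With an error code, not caring costs nothing. With an exception, not caring still has to be written down, here with the standard `contextlib.suppress`.

```python
import contextlib


def print_report_code(text: str) -> int:
    """Hypothetical error-code API: 0 means printed, nonzero means it did not."""
    return 5  # pretend no printer is attached


def print_report_exc(text: str) -> None:
    """Hypothetical exception API: raises when it cannot print."""
    raise OSError("no printer attached")


# Error-code style: not caring costs nothing; the result is simply ignored.
print_report_code("daily summary")

# Exception style: not caring must still be spelled out, or the OSError
# propagates upward and takes the caller down with it.
with contextlib.suppress(OSError):
    print_report_exc("daily summary")

reached_end = True  # execution continues in both cases
print("still running")
```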