Levels of Error Handling
Daniel, November 29th, 2009
If you have been using computers for any amount of time, you’ve come across an error message. From nearly useless “access violation” dialog boxes, to the more helpful “User names must not contain spaces” form messages, error messages are frequent on computers. Whether you’re a Mac, PC, Penguin, or Other, error messages that mean only “something broke” are frustrating.
Programmers often treat all kinds of errors the same, especially when a user interaction is involved. The problem is, not all errors have the same “meaning”, and not all errors should be handled the same way. If, as a User, I attempt to use the program for something, and it reports a message targeted at a programmer (displaying a stack trace, for instance), it can be confusing. As a High-Level programmer, if a library call fails but my code gets back few details (getting only ‘There was an SQL error’, without the SQL statement that caused the error), it can be just as frustrating.
What do you mean by low-level or high-level?
There are usually different “levels” to an application, and each level has a specific type of concern. Errors on lower levels should be filtered up in as appropriate a manner as possible, so that the higher levels can handle them as gracefully as possible. Applications can have many different layers, but conceptually they can usually be represented in four levels. What are those four levels? I’m glad you asked, because I’ve listed them below, from highest to lowest.
- The User
- This, of course, is the person using your application. Most often they are a human being, but someday users may include Artificial Intelligence or Extra Terrestrial entities. Users are often task-oriented, although some of them are naturally explorers. Task-Oriented users are more damaged by unexpected errors. Exploring users are more interested in seeing what is and isn’t possible, so errors often define a boundary for them, but these users are also most likely to find bugs in your application.
- High-Level Code
- This is sometimes called the “Domain” level. The public methods in these classes are often task-oriented, and the classes often define nouns that the User would be able to identify with. Code in this level translates between user requests and low-level requests. This is of course an idealized definition, but the closer you can make your reality to this definition, the easier your application will be to maintain.
- Low-Level Code
- This is the library or framework code. Code in this level is usually used to handle resources, and to actually get things done. Most Low-Level code is “possibility” oriented. The API it defines allows you to do simple, atomic operations that might not be meaningful on their own, but can be strung together in a useful manner.
- Resources
- These are “external” things used by the application. Some examples are memory, files, databases, and sockets. Consumable resources (memory, disk space, etc.) can run out unexpectedly, even if you “check” beforehand. Resources fail for various reasons: sockets might fail because of a disconnected wire, files might fail because of a physical disk error, and database operations might fail due to an overloaded CPU or low available memory.
Note that there can be some “blurring” of levels, depending on the project you are working on. For example, the developer of an HTTP Client library is probably writing “high-level” code, in that it represents the domain of HTTP, and provides “task-oriented” interfaces. However, when that HTTP Client library is used in an application, it becomes “low-level”, because the User probably doesn’t care about the HTTP Protocol, and instead only cares about “downloading a file” or “viewing a web-page”.
From my experience, each of the levels corresponds to a different kind of error, and those errors should all be handled differently. The likelihood of most of the types of errors can be managed, but not eliminated completely.
Error Handling
Handling User errors
This is the highest-level error. It is caused by a user doing something they shouldn’t. Most commonly this means the user entered an invalid input somewhere. Whenever possible, the User Interface should be designed to prevent user errors. For example, if you need a date then you provide a “Date Widget” and if you need a number you disallow non-numeric input.
Sometimes, it isn’t feasible to simply “prevent” user error. In this case, the user input must be validated. The best place to validate it is in the High-Level code. Since this code is closest to the user, it has more knowledge about the user. The High-Level code knows whether to pop up a dialog box, print a line to the console, or update a form with error messages.
It may sometimes be useful to abstract the “error reporting” interface, so that it can be reused by different user interfaces. For example, domain-level code might be used in a Web App, as well as a native GUI app, or even a command-line app. Having a reusable interface for reporting multiple user errors helps in porting to new user interfaces.
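As a rough sketch of what such an abstraction could look like, the Python below defines a small reporter interface and two front ends that implement it. All of the names here (`ErrorReporter`, `ConsoleReporter`, `CollectingReporter`, `register_user`) and the age check are made up for illustration; they are one possible shape, not a prescribed design.

```python
from typing import Protocol


class ErrorReporter(Protocol):
    """Hypothetical interface: anything that can show user errors."""

    def report(self, errors: list[str]) -> None: ...


class ConsoleReporter:
    """Command-line front end: print one line per error."""

    def report(self, errors: list[str]) -> None:
        for message in errors:
            print(f"error: {message}")


class CollectingReporter:
    """Web-form style front end: keep the messages so a page can render them."""

    def __init__(self) -> None:
        self.messages: list[str] = []

    def report(self, errors: list[str]) -> None:
        self.messages.extend(errors)


def register_user(name: str, age_text: str, reporter: ErrorReporter) -> bool:
    """Domain-level task: validate the input and report every problem at once."""
    errors = []
    if " " in name:
        errors.append("User names must not contain spaces")
    if not age_text.isdigit():
        errors.append("Age must be a whole number")
    if errors:
        reporter.report(errors)
        return False
    return True
```

The domain code never learns which front end it is talking to, so adding a GUI reporter later would not require touching `register_user`.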
Usually, it should be considered a bug in the high-level code if invalid user input causes a lower-level exception. Even so, Low-Level code must verify the input it receives from higher-level code.
There are times, however, when it is infeasible to verify beforehand whether an input is valid, and the validation must be handled by lower-level code. In this case, the lower-level code should report the error to the higher-level code, which should handle it and pass that information on to the user in an appropriate manner.
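One case where this comes up is a user-supplied regular expression: the only practical validator is the regex compiler itself. The sketch below uses Python's standard `re` module as the low-level code; `UserInputError` and `find_matching_lines` are hypothetical names invented for this example.

```python
import re


class UserInputError(Exception):
    """Hypothetical high-level error whose message is safe to show the user."""


def find_matching_lines(pattern_text: str, lines: list[str]) -> list[str]:
    """High-level task: filter lines by a user-supplied regular expression."""
    try:
        # Low-level code does the validation as a side effect of doing the work.
        pattern = re.compile(pattern_text)
    except re.error as exc:
        # Translate the low-level detail into wording aimed at the user,
        # keeping the original exception attached as the cause.
        raise UserInputError(
            f"The search pattern '{pattern_text}' is not valid: {exc.msg}"
        ) from exc
    return [line for line in lines if pattern.search(line)]
```

The user interface can then catch `UserInputError` and display its message directly, without ever seeing `re.error`.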
Handling Bugs
Bugs can happen in either low-level code, or high-level code.
In High-Level code, it means that some method (often a user task-oriented method) has violated some constraint, or failed to handle a lower-level error. In Low-Level code, it means that a method (part of a framework or library) has violated some constraint, or failed to properly handle a Resource Error.
First and foremost, it is rarely appropriate to show all the detail of “bug” caused errors to the user. Most users would prefer not to experience bugs at all, but because this is the real world, we have to handle the bugs somehow. Most times, it’s enough to tell the user “There was an internal error. Click ‘Report’ to send a bug report.” The bug report should include any useful state information, but absolutely must include a full stack trace. If possible, the high-level state should be recoverable, so that the user doesn’t lose any of their work, and can try either a slightly different approach, or wait for a bug fix.
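A minimal sketch of that idea in Python, assuming the user's work lives in a plain dictionary: unexpected exceptions are caught at the top of the task, the full stack trace goes into a report, the state is rolled back, and the user sees only the short message. `run_task` and `pending_reports` are hypothetical names, and a real application would need a deeper copy of its state than the shallow one used here.

```python
import traceback

# Hypothetical queue of reports waiting for the user to click 'Report'.
pending_reports: list[dict] = []


def run_task(task, state: dict) -> str:
    """Run a user task; turn any unexpected exception into a bug report."""
    snapshot = dict(state)  # shallow copy of the last known-good state
    try:
        task(state)
        return "ok"
    except Exception:
        pending_reports.append({
            "state": snapshot,
            "stack_trace": traceback.format_exc(),  # full trace, for developers
        })
        # Roll back so the user does not lose the work they had before the bug.
        state.clear()
        state.update(snapshot)
        return "There was an internal error. Click 'Report' to send a bug report."
```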
The number of errors of these kinds can be reduced by Unit Testing, improving API design, Integration Testing, and User Testing.
Resource errors
These are Exceptional Circumstances. Resource errors can be caused by bad data, connectivity issues, database constraint violations, file corruption, file-not-found, out-of-memory, and many more uncontrollable problems.
Resource errors should be reported to the user with as much detail as is likely to be useful to them. If the user specifies a file, and that file is in the wrong format, this could be treated as a user error, and the user should be informed of the file they chose, and what was wrong with it. Low-level code should report to higher-level code something along the lines of “I expected a FOO file,” and the higher-level code should report something like “The file c:\my.bar is not a FOO file.” If a file cannot be found, an appropriate message should be displayed, and so on.
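The FOO example might look something like the following in Python. The FOO format itself (three magic bytes followed by a payload) is invented for this sketch, as are `WrongFormatError`, `parse_foo`, and `open_document`. The point is only that the low-level parser knows what it expected, while the high-level code knows which file the user picked.

```python
class WrongFormatError(Exception):
    """Low-level error: knows what it expected, not which file was involved."""

    def __init__(self, expected: str) -> None:
        super().__init__(f"I expected a {expected} file")
        self.expected = expected


def parse_foo(data: bytes) -> bytes:
    """Low-level parser for the made-up FOO format: 3 magic bytes, then payload."""
    if not data.startswith(b"FOO"):
        raise WrongFormatError("FOO")
    return data[3:]


def open_document(path: str) -> tuple[bool, str]:
    """High-level task: returns (succeeded, text to show the user)."""
    try:
        with open(path, "rb") as handle:
            payload = parse_foo(handle.read())
    except FileNotFoundError:
        return False, f"The file {path} could not be found."
    except WrongFormatError as exc:
        # Combine the low-level fact with the high-level context.
        return False, f"The file {path} is not a {exc.expected} file."
    return True, payload.decode("utf-8", errors="replace")
```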
For this reason, low-level code which handles resources should report errors with as much structured information as feasible to higher-level code. The higher-level code can then query that structure and produce an error message which is appropriate for the user. For example, an SQL error at the lower level might mean different things: typically a duplicate record might be a User Error, but a syntax error is likely a Low-Level Programmer Error.
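Python's standard `sqlite3` module happens to report errors in a structured way, so it makes a convenient stand-in here (the post itself does not name a database). In this sketch a constraint violation is treated as a user error and other operational failures as an internal error; the `users` table and `add_user` function are assumptions made for the example.

```python
import sqlite3


def add_user(conn: sqlite3.Connection, name: str) -> str:
    """High-level task: returns the message to show the user."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    except sqlite3.IntegrityError:
        # Constraint violation: here that means a duplicate name, a user error.
        return f"The user name '{name}' is already taken."
    except sqlite3.OperationalError:
        # Bad SQL, a missing table, and similar: not the user's fault.
        return "There was an internal error."
    return f"Welcome, {name}!"
```

Because the exception type carries the distinction, the high-level code does not have to parse the text of an error message to decide whom to blame.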
There is far less that can be done to reduce resource errors, as they are often caused by external circumstances. Often, redundancy can reduce the frequency of fatal errors, but the errors themselves will still occur, and should still be handled appropriately.
