Daniel Pitts’ Tech Blog

Posts Tagged ‘bugs’

Levels of Error Handling

Sunday, November 29th, 2009

If you have been using computers for any amount of time, you’ve come across an error message.  From nearly useless “access violation” dialog boxes, to the more helpful “User names must not contain spaces” form messages, error messages are frequent on computers.  Whether you’re a Mac, PC, Penguin, or Other, error messages that mean only “something broke” are frustrating.

Programmers often treat all kinds of errors the same, especially when a user interaction is involved.  The problem is, not all errors have the same “meaning”, and not all errors should be handled the same way.  If, as a User, I attempt to use the program for something, and it reports a message target at a programmer (displaying a stack-trace, for instance), it can be confusing.  As a High-Level programmer, if a library call fails but my code gets back few details (getting only ‘There was an SQL error’, without the SQL statement that caused the error), it can be just as frustrating.

What do you mean by low-level or high level?

There usually different “levels” to an application, and each level has a specific type of concern.  Errors on lower levels should be filtered up in as appropriate a manor as possible, so that the higher levels can handle them as gracefully as possible.  Applications can have many different layers, but conceptually they can usually be represented in exactly four levels. What are those four layers? I’m glad you asked, because I’ve listed them below, from highest to lowest.

The User
This, of course, is the person using your application. Most often they are a human being, but someday users may include Artificial Intelligence or Extra Terrestrial entities.  Users are often task-oriented, although some of them are naturally explorers.  Task-Oriented users are more damaged by unexpected errors. Exploring users are more interested in seeing what is and isn’t possible, so errors often define a boundary for them, but these users are also most likely to find bugs in your application.
High-Level Code
This is sometimes called the “Domain” level.  The public methods in these classes are often task-oriented, and the class often define nouns that the User would be able to identify with.  Code in this level translates between user requests and low-level requests. This is of course an idealized definition, but the closer to this definition you can make your reality, the easier to maintain your application will be.
Low-Level Code
This is the library or framework code.  Code in this level usually is used to handle resources, and to actually get things done.  Most Low-Level code is “possibility” oriented. The API defined allows you to do simple, atomic, operations that might not be meaningful on their own, but can be strung together in a useful manor.
Resources
These are “external” things used by the application. Some examples are memory, files, databases, and sockets . Consumable (memory, disk-space, etc…) resources can run out unexpectedly, even if you “check” before hand.  Many resources fail for various reasons, for example sockets might fail because of a disconnected wire, files might fail because of a physical disk error, and database operations might fail due to overloaded CPU or low available memory.

Note, that there can be some “blurring” of levels, depending on the project you are working on.  For example, the developer of an HTTP Client library is probably writing “high-level” code, in that it represents the domain of HTTP, and provides “task-oriented” interfaces.  However, when that HTTP Client library is used in an application, it becomes “low-level”, because the User probably doesn’t care about the HTTP Protocol, and instead only cares about “downloading a file” or “viewing a web-page”.

From my experience, each of the levels correspond to a different kind of error, and those errors should all be handled differently. The likelihood of most of the types of errors can be managed, but not eliminated completely.

Error Handling

Handling User errors

This is the highest-level error.  It is caused by a user doing something they shouldn’t.  Most commonly this means the user entered an invalid input somewhere.  Whenever possible, the User Interface should be designed to prevent user errors.  For example, if you need a date then you provide a “Date Widget” and if you need a number you disallow non-numeric input.

Sometimes, it isn’t feasible to simply “prevent” user error.  In this case, the user input must be validated.  The best place to validate it is in the High-Level code.  Since this code is closest to the user, it has more knowledge about the user.  The High-Level code knows whether to pop-up a dialog box, print a line to the console, or update a form with error messages.

It may sometimes useful abstract the “error reporting” interface, so that it can be reused by different  UI interfaces. For example, domain level code might be used in a Web App, as well as a native GUI app, or even a command-line app.  Having an reusable interface for reporting multiple user errors helps in porting to new user interfaces.

Usually, it should be considered a bug in the high-level code if invalid user input causes a lower level exception. Low-Level code must verify the input it receives from higher-level code.

There are times, however, when it is infeasible to verify before hand whether an input is valid, and the validation must be handled a lower-level code.  In this case, the lower-level code should report the error to the higher level code, which should handle it and pass that information on to the user in an appropriate manor.

Handling Bugs

Bugs can happen in either low-level code, or high-level code.

In High-Level code, it means that some method (often a user task-oriented method) has violated some constraint, or failed to handle a lower-level error. In Low-Level code
it means that a method (part of a framework or library) has violated some constraint, or failed to properly handle a Resource Error.

First and foremost, it is rarely appropriate to show all the detail of “bug” caused errors to the user.  Most users would prefer not to experience bugs at all, but because this is the real world, we have to handle the bugs some how.  Most times, its enough to tell the user “There was an internal error.  Click ‘Report’ to send a bug report.”  The bug report should include any useful state information, but absolutely must include a full stack trace.  If possible, the high-level state should be recoverable, so that the user doesn’t lose any of their work, and can try either a slightly different approach, or wait for a bug fix.

The number of errors of these kinds can be reduced by Unit Testing, improving API design, Integration Testing, and User Testing.

Resource errors

These are Exceptional Circumstances.  Resource errors can be caused by bad data, connectivity issues, database constraint violations, file corruption, file-not-found, out-of-memory, and many more uncontrollable problems.

Resource errors should be reported to the user with as much detail as is likely to be useful to them.  If the user specifies a file, and that file is in the wrong format, this could be treated as a user error, and the user should be informed of the file they chose, and what was wrong with it.  Low-level code should report to higher-level code something along the lines of “I expected a FOO file,” and the higher-level code should report something like “The file c:\my.bar is not a FOO file. ”  If a file can not be found, an appropriate message should be displayed, etc…

For this reason, low-level code which handles resources should report errors with as much structured information as feasible to higher-level code.  The higher-level code can then query that structure and produce an error message which is appropriate for the user.  For example, an SQL error at the lower level might mean different things; Typically duplicate a record might a User Error, but a syntax error is likely Low-Level Programmer Error.

There is far less that can be done to reduce resource errors, as they are often caused by external circumstances.  Often times using redundancy can reduce the frequency of fatal errors, but the errors themselves will still occur, and should still be handled appropriately.

JavaScript and Java applets

Saturday, October 11th, 2008

I know, Applets are almost dead, but we actually had use for them recently in a project.  We needed to upload multiple images at once, and since no one on our team knows Flash, we decided to use an Applet. Anyway, the trouble we came across was the Java to JavaScript bridge is flat-out broken.  What you are supposed to be able to do is well documented at Sun’s java_js page and Mozilla’s LiveConnect page. The trouble is, that they are wrong when it comes to calling a Java method from JavaScript. They also fail to mention the security implications of that altogether.

Signed applets with unsigned JavaScript

The first challenge we came to (which is actually the easiest to solve) was that we were getting AccessControlExceptions, even though we went through the trouble of signing the Applet jar. As it turns out, the permission context used is that of the JavaScript, so you need to elevate the permission to your Applets context, using AccessController, and PrivilegedAction.
Java:

public void javascriptCallsMe() {
    AccessController.doPrivileged(new PrivilegedAction() {
      public Void run() {
         // We can now
         readOrWriteFilesOrWhatever();
         return null;
      }
    });
 }

That solved that problem.

Passing JavaScript objects to Java

The next problem, which plagued me for a week, was that getting a JSObject in Java seems to be broken. There are two ways to get an instance of the netscape.javascript.JSObject class.  The first way, and this always seems to work, is the JSObject.getWindow(Applet applet).  This will get the JSObject wrapping the “window” browser object.  This is useful if you know the “path” to the JavaScript object your code cares about.  It is akin to using a static reference, and isn’t good design. The other way, is to have a Java method that takes an Object or JSObject reference: In Java:

public class MyApplet extends JApplet {
  public void doStuff(JSObject params) {
    System.out.println(params.getMember("foo"));
  }
}

Then in JavaScript you should be able to do this: documents.applets[0].doStuff({foo: "bar"}); Unfortunately, what really happens is you get a “broken” instance of JSObject.  Debugging the Applet, I found that the JSObject instance has a field called nac, which has a value for the JSObject.getWindow(…), but is null for values passed in from JavaScript. So, what solutions and work arounds have people come up with? None that I could find.  I searched high and low. Plenty of people have discovered this bug, but none of come up with a solution. Until now!  I thought about it and realized, JSObjects I get from the “window” JSObject all work, so maybe I can put my broken object into a working object, and pull it back out to get a working object. A little experiment proved that it worked (at least on FireFox, I’ll guess it works on IE too, anyone want to verify?). So, I decided to go ahead and create a JSObject resolver, that will fix any possibly broken object:

public class MyApplet extends JApplet {
  private JSObject appletTmp;

  public JSObject resolveObject(Object o) {
    final int hashCode = System.identityHashCode(o);

    appletTmp.setMember("toResolve" + hashCode, o);
    return (JSObject) appletTmp.getMember("toResolve" + hashCode);
  }

  public void init() {
    if (appletTmp == null) {
      final JSObject window = JSObject.getWindow(this);
      final String tmpName = "_AppletTmp" + System.identityHashCode(this);
      window.eval("var "+ tmpName +" = {}");
      appletTmp = (JSObject)window.getMember(tmpName);
    }
  }
}

Granted, this code doesn’t clean up after itself, so if you use it for a long time or on a lot of JS objects, you will need to add some clean up code to it. With that, I was finally able to fully use the Java Applet the way I wanted to: As a “service provider” to the JavaScript on our existing page.

Fixing Bugs

Tuesday, May 20th, 2008

The best way to fix a bug is to prevent the bug in the first place. There are several fundamental causes of bugs which we can explore.

One very common cause of bugs is miscommunication. This is really not a bug, but an unimplemented feature; if there was a requirement that wasn’t met, it is often labeled as a bug. It really could have been a misunderstanding of what the requirement actually was, or even that there was a requirement at all. No one is to blame, but that doesn’t mean someone couldn’t have prevented it. If you think there is a missing requirement, bring it up. If you don’t understand a requirement, get it clarified.

Bad assumptions can also lead to bugs. Specifically, if you assume that some external API will handle all the cases you need it to, but for at least one case the behavior isn’t what you assumed, you have a bug. It can be hard to determine for this whether the bug is in your code or the external code. Usually that can be determined by the documentation of the third party code. If its not documented, its undefined.

Ambiguity can lead to bad assumptions. If the documentation is ambiguous, try looking in the source, or writing a unit test. Update the documentation to be less ambiguous if you have access. Also make sure to add assertions to your code.

Mistakes happen. A bug can be the result of a simple mistake. Sometimes you type one thing while you’re thinking something else. If you were (un)lucky enough to type something the compiler was okay with, a bug is born. eg, A common bug in c and c++ is the =/== bug where “if (i = 1)” is an (=) assignment, but was probably meant as an (==) comparison.

What was I writing about? Forgetfulness might also lead to bugs. “Oh, I’ll just fill this method out later,” or “I’ll write the ‘else’ case tomorrow.” Judicious use of “TODO” comments and the like can help with this problem.

“That does not compute.” Sometimes, there isn’t a bug, per se, but your algorithm fails in certain cases. Maybe the case is impossible to solve in human-scale times, maybe you’re using a heuristic that has a flaw. These kinds of “bugs” are harder to solve, since the programmer has done everything right. Research other algorithms, or put in checks to verify (as much as possible) the input will work with the algorithm.

And as such, these bugs get into the code base. The next phase should filter out bugs. The testing phase. How do bugs progress from codebase into production systems?

Murphy’s law applies a few ways to testing. There is always “the tiniest little change that couldn’t possibly cause any problems.” If you don’t test after making it, it will cause a problem.

Unit testing can do a lot to verify that one unit works for the tested use cases. Sometimes the unit test doesn’t find bugs that come from complex interactions between two separate units, or the unit test doesn’t actually test all cases.

Putting your product in front of a user can help you find more bugs. Murphy’s law as it applies: The first thing a user will do is the only combination of things you didn’t test. That combination will fail. This is not a bad thing though. It lets you know exactly what unit-test you forgot to write. Don’t just fix the bug, fix the test. Write a test that fails for that series of actions, and then make the code work correctly.

No code will ever be 100% bug free. If it is 100% bug free, it probably isn’t very useful. Of course, this is a generalization, but it is often the case. Don’t become discouraged because of bugs. Fix them one at a time, and document the ones you won’t fix in this release as “unsupported features”. If you think a poor design is causing the easy of creating bugs, or a better design would prevent them, then go ahead and consider refactoring. Just know that a new design will have new bugs too. They may be easier to spot and easier to fix, so don’t rule the redesign out flat.

So now we know the reasons bugs get created, how do we find their cause?

Causality. We know of the existence of a bug because of its effect. A number is wrong, or we get a NullPointerException, or Mr. Smith gets Mrs. Robinson’s bill. Where in the codebase should the “correct” behavior be? Look there, add debug logging or break points of some sort. Assert your assumptions and watch what happens. What if that code appears to be doing the right thing? Maybe the bug happens before that point. From where in the codebase does this behavior get invoked? Repeat the process there. Feel free to use “gut” intuition to find where you think the bug is, but verify that the bug is there before “fixing” it.

Also, no bug is so old that it can’t be fixed. Even 25 year old BSD bugs can be found and fixed.